feat(velox): Route Concat expression through Velox backend #12099

Open
minni31 wants to merge 1 commit into apache:main from minni31:oss/velox-concat-support

minni31 commented May 15, 2026

Context

Spark's Concat expression supports StringType, BinaryType, and ArrayType inputs. Currently in Gluten, Concat falls through the generic expression transformer without any type-specific handling. This PR adds explicit routing with proper edge case handling for the Velox backend.

What

Add genConcatTransformer to SparkPlanExecApi and override it in VeloxSparkPlanExecApi with:

  • StringType / BinaryType: Offloaded to Velox. Velox's concat uses defaultNullBehavior=true (returns NULL when any input is NULL), matching Spark's null-in-null-out semantics.
  • ArrayType: Falls back to Spark. Velox returns NULL if ANY input is NULL, but Spark 3.4+ (SPARK-41296) skips NULL array inputs and only returns NULL when ALL inputs are NULL.
  • Zero arguments: Falls back to Spark (Velox requires at least 1 argument).
  • Single argument: Returns the child expression directly (identity optimization; Velox requires at least 2 arguments for concat).
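The four cases above can be sketched as a single routing decision. This is a minimal, self-contained model and not the code in this PR: `DataType`, `Routing`, and `ConcatRouting.route` are hypothetical stand-ins rather than Spark or Gluten classes, and the order in which the checks run (argument count first, then ArrayType) is an assumption, since the description does not say which check takes precedence.

```scala
// Simplified stand-ins for illustration; these are NOT the real Spark types.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Possible outcomes of the routing decision for one Concat call.
sealed trait Routing
case object FallBackToSpark extends Routing
case object ReturnChildDirectly extends Routing
case object OffloadToVelox extends Routing

object ConcatRouting {
  // childTypes holds the data type of each Concat argument, in order.
  // Assumption: arity is checked before the ArrayType check.
  def route(childTypes: Seq[DataType]): Routing = childTypes match {
    // Zero arguments: nothing to offload, let Spark evaluate it.
    case Seq() => FallBackToSpark
    // Single argument: concat(x) is x, so the child can be used as-is.
    case Seq(_) => ReturnChildDirectly
    // Any array input: NULL handling differs, so stay on Spark.
    case ts if ts.exists(_.isInstanceOf[ArrayType]) => FallBackToSpark
    // Two or more string or binary inputs: offload.
    case _ => OffloadToVelox
  }
}
```

For instance, `ConcatRouting.route(Seq(StringType, StringType))` yields `OffloadToVelox`, while two `ArrayType(StringType)` inputs yield `FallBackToSpark`.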

Changes

  • SparkPlanExecApi.scala: Add genConcatTransformer with default GenericExpressionTransformer implementation
  • ExpressionConverter.scala: Add explicit case c: Concat => routing to backend API
  • VeloxSparkPlanExecApi.scala: Override with type checks and edge case handling
  • VeloxStringFunctionsSuite.scala: Enhanced tests for null handling, single-arg identity, zero-arg fallback, ArrayType fallback, and BinaryType support
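The layering across these files (a default in the backend API, an explicit case in the converter, an override in the Velox backend) might look roughly like the sketch below. Everything here is a simplified assumption for illustration: `Expr`, `Transformer`, `PlanExecApi`, `VeloxLikeApi`, and `Converter` are hypothetical names, and the real `genConcatTransformer` signature in Gluten may differ. Only the single-argument identity case is modeled in the override.

```scala
// Hypothetical, simplified model of the layering; none of these types are Gluten's.
final case class Expr(name: String, children: Seq[Expr] = Nil)

sealed trait Transformer
final case class Generic(substraitName: String, expr: Expr) extends Transformer
final case class Identity(child: Expr) extends Transformer

trait PlanExecApi {
  // Default behaviour: wrap the expression in a generic transformer.
  def genConcatTransformer(substraitName: String, concat: Expr): Transformer =
    Generic(substraitName, concat)
}

// A backend that keeps the default.
object DefaultApi extends PlanExecApi

// A backend that overrides the hook to special-case one argument.
object VeloxLikeApi extends PlanExecApi {
  override def genConcatTransformer(substraitName: String, concat: Expr): Transformer =
    concat.children match {
      case Seq(only) => Identity(only) // concat(x) is just x
      case _ => super.genConcatTransformer(substraitName, concat)
    }
}

object Converter {
  // The explicit case sends Concat to the backend hook instead of the generic path.
  def convert(api: PlanExecApi, expr: Expr): Transformer = expr.name match {
    case "concat" => api.genConcatTransformer("concat", expr)
    case other => Generic(other, expr)
  }
}
```

Keeping the default in the shared trait means a backend that does not override the hook behaves as before, while the Velox-style backend can add its own checks in one place.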

Add explicit Concat expression routing from Spark to Velox with proper
type checking and edge case handling:

- StringType and BinaryType: offloaded to Velox (null-in-null-out
  semantics match Spark)
- ArrayType: falls back to Spark (Velox returns NULL if ANY input is
  NULL, but Spark 3.4+ skips NULL arrays per SPARK-41296)
- Zero arguments: falls back to Spark (Velox requires at least 1 arg)
- Single argument: returns the child directly (identity optimization,
  Velox requires at least 2 args for concat)

Changes:
- SparkPlanExecApi: add genConcatTransformer with default implementation
- ExpressionConverter: add explicit case for Concat routing
- VeloxSparkPlanExecApi: override with type checks and edge case handling
- VeloxStringFunctionsSuite: add tests for null handling, single-arg,
  zero-arg fallback, ArrayType fallback, and BinaryType support

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Run Gluten Clickhouse CI on x86

github-actions bot added the CORE and VELOX labels on May 15, 2026
Labels

CORE, VELOX
