Skip to content

Deliberately opt eligible Incompatible expressions into codegen dispatch with test coverage #4596

@andygrove

Description

@andygrove

Background

PR #4538 introduced an opt-in CodegenDispatchFallback marker trait. When an expression reports Incompatible and the user has not enabled allowIncompatible, mixing in the trait routes it through the JVM codegen dispatcher (Spark's own doGenCode inside the Comet pipeline) so the projection stays native while matching Spark exactly, instead of falling the whole projection back to Spark.

Before that, enrollment was opt-out, which silently routed a number of native Incompatible expressions through the dispatcher without deliberate review or test coverage. Those were reverted to Spark fallback (matching main) in PR #4538. This issue tracks deliberately opting them back in, one at a time, each with explicit test coverage proving the dispatched path matches Spark.

Candidate expressions

These report Incompatible for some inputs, have a real Spark doGenCode (so the dispatcher can accept them), and currently fall back to Spark:

  • concat (CometConcat) - non-UTF8_BINARY collation
  • array_intersect (CometArrayIntersect)
  • array_except (CometArrayExcept)
  • array_join (CometArrayJoin)
  • from_utc_timestamp (CometFromUTCTimestamp)
  • to_utc_timestamp (CometToUTCTimestamp)
  • convert_timezone (CometConvertTimezone)
  • sort_array (CometSortArray) - verify dispatcher eligibility first; this one may be CodegenFallback in Spark, in which case the dispatcher refuses it and it should stay on Spark fallback

What to do per expression

  1. Confirm the dispatcher accepts it (not CodegenFallback, supported input/output types per CometBatchKernelCodegen.canHandle). If refused, leave it as Spark fallback and note it.
  2. Mix CodegenDispatchFallback into the serde that is registered in QueryPlanSerde (not a delegate; see how Reverse registers CometReverse, which delegates to CometArrayReverse).
  3. Add tests asserting native execution that matches Spark for the Incompatible inputs (checkSparkAnswerAndOperator / query in SQL file tests, replacing the expect_fallback assertion).
  4. For the timezone-sensitive expressions, test across multiple session time zones, since dispatch correctness depends on the resolved timeZoneId surviving closure serialization.

Notes

The expressions that have no native implementation and extend CometCodegenDispatch (hypot, bround, sequence, elt, ...) are out of scope: they report Compatible and already dispatch.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions