Background
PR #4538 introduced an opt-in CodegenDispatchFallback marker trait. When an expression reports Incompatible and the user has not enabled allowIncompatible, mixing in the trait routes it through the JVM codegen dispatcher (Spark's own doGenCode inside the Comet pipeline) so the projection stays native while matching Spark exactly, instead of falling the whole projection back to Spark.
Before that, enrollment was opt-out, which silently routed a number of native Incompatible expressions through the dispatcher without deliberate review or test coverage. Those were reverted to Spark fallback (matching main) in PR #4538. This issue tracks deliberately opting them back in, one at a time, each with explicit test coverage proving the dispatched path matches Spark.
Candidate expressions
These report Incompatible for some inputs, have a real Spark doGenCode (so the dispatcher can accept them), and currently fall back to Spark:
- concat (CometConcat) - non-UTF8_BINARY collation
- array_intersect (CometArrayIntersect)
- array_except (CometArrayExcept)
- array_join (CometArrayJoin)
- from_utc_timestamp (CometFromUTCTimestamp)
- to_utc_timestamp (CometToUTCTimestamp)
- convert_timezone (CometConvertTimezone)
- sort_array (CometSortArray) - verify dispatcher eligibility first; this one may be CodegenFallback in Spark, in which case the dispatcher refuses it and it should stay on Spark fallback
What to do per expression
- Confirm the dispatcher accepts it (not CodegenFallback, supported input/output types per CometBatchKernelCodegen.canHandle). If refused, leave it as Spark fallback and note it.
- Mix CodegenDispatchFallback into the serde that is registered in QueryPlanSerde (not a delegate; see how Reverse registers CometReverse, which delegates to CometArrayReverse).
- Add tests asserting native execution that matches Spark for the Incompatible inputs (checkSparkAnswerAndOperator / query in SQL file tests, replacing the expect_fallback assertion).
- For the timezone-sensitive expressions, test across multiple session time zones, since dispatch correctness depends on the resolved timeZoneId surviving closure serialization.
Notes
The expressions that have no native implementation and extend CometCodegenDispatch (hypot, bround, sequence, elt, ...) are out of scope: they report Compatible and already dispatch.
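For reviewers who want the routing rule from Background in one place, here is a minimal sketch of the decision as this issue describes it. It is a toy model, not Comet's code: SupportLevel, Route, ExprSerde, Routing.route, and the dispatcherAccepts flag are hypothetical names invented for illustration, and the local CodegenDispatchFallback trait only mimics the real marker. It assumes that enabling allowIncompatible keeps the native implementation, and it covers only expressions that have a native implementation.

```scala
// Illustrative model only: every type here is a hypothetical stand-in, not Comet's real API.
sealed trait SupportLevel
case object Compatible extends SupportLevel
case object Incompatible extends SupportLevel

sealed trait Route
case object Native extends Route
case object CodegenDispatch extends Route
case object SparkFallback extends Route

// Stand-in for the opt-in marker trait: no members, it only signals intent.
trait CodegenDispatchFallback

// Stand-in for an expression serde that has a native implementation.
trait ExprSerde {
  def supportLevel: SupportLevel
}

object Routing {
  // dispatcherAccepts models the eligibility check (not CodegenFallback, supported types).
  def route(serde: ExprSerde, allowIncompatible: Boolean, dispatcherAccepts: Boolean): Route =
    serde.supportLevel match {
      case Compatible => Native
      // Assumption: with allowIncompatible enabled, the native implementation is used.
      case Incompatible if allowIncompatible => Native
      case Incompatible =>
        serde match {
          // Opted in and eligible: stay in the Comet pipeline via the dispatcher.
          case _: CodegenDispatchFallback if dispatcherAccepts => CodegenDispatch
          // Not opted in, or refused by the dispatcher: fall back to Spark.
          case _ => SparkFallback
        }
    }
}

// Two hypothetical serdes that differ only in whether the marker is mixed in.
object NotOptedIn extends ExprSerde {
  val supportLevel: SupportLevel = Incompatible
}
object OptedIn extends ExprSerde with CodegenDispatchFallback {
  val supportLevel: SupportLevel = Incompatible
}

object RoutingDemo {
  def main(args: Array[String]): Unit = {
    println(Routing.route(NotOptedIn, allowIncompatible = false, dispatcherAccepts = true)) // SparkFallback
    println(Routing.route(OptedIn, allowIncompatible = false, dispatcherAccepts = true))    // CodegenDispatch
    println(Routing.route(OptedIn, allowIncompatible = false, dispatcherAccepts = false))   // SparkFallback
    println(Routing.route(OptedIn, allowIncompatible = true, dispatcherAccepts = true))     // Native
  }
}
```

The third call is the sort_array caveat from the candidate list: mixing in the marker does nothing if the dispatcher refuses the expression, so eligibility has to be confirmed before opting in.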