Merged
93 commits
1746bcc
feat: Arrow-direct codegen dispatcher for Spark expressions and Scala…
mbutrovich May 8, 2026
08d6b78
prettier, add new suites to CI checks.
mbutrovich May 8, 2026
557752e
make format, fix shims for 4.0+
mbutrovich May 8, 2026
896f61f
make format, fix shims for 4.0+
mbutrovich May 8, 2026
a82e160
Merge branch 'main' into codegen_scala_udf
mbutrovich May 8, 2026
2a158f4
strengthen tests for composed expressions
mbutrovich May 8, 2026
654bbad
make format, again.
mbutrovich May 8, 2026
10df7e0
fix pr_benchmark_check.yml
mbutrovich May 8, 2026
7afe69f
fix arrow shading issue in CI.
mbutrovich May 8, 2026
0dc5855
fix Spark 4.0 collation expression shim
mbutrovich May 8, 2026
43a7b0c
apply common subexpression elimination, add tests for subqueries in UDFs
mbutrovich May 8, 2026
9640897
make format
mbutrovich May 8, 2026
f0c8296
decimal fast path. document 64KB limitation right now
mbutrovich May 9, 2026
2173f40
pass through task context to get around tokio worker pool calling ove…
mbutrovich May 9, 2026
2f9585b
fix compilation on scala 2.12, fix format issue
mbutrovich May 9, 2026
582cd17
Merge branch 'main' into codegen_scala_udf
mbutrovich May 9, 2026
22f3256
decimal output, utf8 output, non-nullable output optimizations
mbutrovich May 9, 2026
7666715
optimization menu
mbutrovich May 9, 2026
0a34636
estimate binaryview and binary size
mbutrovich May 9, 2026
e94b6db
fix "CSE collapses a repeated subtree to one evaluation in the genera…
mbutrovich May 9, 2026
d0f1f27
Merge remote-tracking branch 'origin/codegen_scala_udf' into codegen_…
mbutrovich May 9, 2026
07e37ea
add some complex type support, remove #4239 code. update docs.
mbutrovich May 9, 2026
ebf77c4
split codegen input and output, basic struct WIP
mbutrovich May 9, 2026
6836c30
split massive codegen file, handle recursive nested types
mbutrovich May 9, 2026
5d91a8f
map input
mbutrovich May 9, 2026
2a28aaf
more struct support
mbutrovich May 9, 2026
0c6586a
revert some benchmark changes
mbutrovich May 9, 2026
8497fe7
cleanup part 1
mbutrovich May 10, 2026
8d703c3
cleanup part 2
mbutrovich May 10, 2026
5ec0e3f
cleanup part 3
mbutrovich May 10, 2026
a22051e
remove view support, it's dead code right now
mbutrovich May 10, 2026
421c60c
use cometplainvector part 1
mbutrovich May 10, 2026
0705dff
use cometplainvector part 2
mbutrovich May 10, 2026
9a00874
make generated class final
mbutrovich May 10, 2026
d7b43fc
clean up test names
mbutrovich May 10, 2026
034e1f5
fix format
mbutrovich May 11, 2026
317feaf
Merge branch 'main' into codegen_scala_udf
mbutrovich May 11, 2026
db1f1f2
Merge branch 'main' into codegen_scala_udf
mbutrovich May 12, 2026
caffed9
fix 2.12 mapvalues usage
mbutrovich May 12, 2026
4be8144
Remove code related to #4239.
mbutrovich May 12, 2026
6fcd81c
Merge remote-tracking branch 'apache/main' into codegen_scala_udf
mbutrovich May 14, 2026
9f8aa07
fix after merging in upstream/main.
mbutrovich May 14, 2026
17b2714
switch to taskid-keyed state for CometUDFs.
mbutrovich May 14, 2026
ff8ee79
Merge branch 'main' into codegen_scala_udf
mbutrovich May 14, 2026
7ed806a
reduce the scope to just ScalaUDF instead of general spark expression…
mbutrovich May 14, 2026
6ff5aa0
update docs
mbutrovich May 14, 2026
935aec6
reorg codegen
mbutrovich May 14, 2026
cbf96df
more tests
mbutrovich May 14, 2026
5966055
cleanup
mbutrovich May 15, 2026
748f943
document optimizations
mbutrovich May 15, 2026
f9318d8
fix tests
mbutrovich May 15, 2026
19ac9f6
try to trim comments a bit
mbutrovich May 15, 2026
13270bf
update two tests
mbutrovich May 15, 2026
1111c6f
revert unintended diff from main
mbutrovich May 15, 2026
61ae5b7
add Java UDF test
mbutrovich May 15, 2026
6643208
update stale TODO references
mbutrovich May 15, 2026
965c2ba
better input fuzz coverage
mbutrovich May 15, 2026
948f3b9
better input fuzz coverage
mbutrovich May 15, 2026
41fc046
better input fuzz coverage
mbutrovich May 15, 2026
25c2511
simplify input logic
mbutrovich May 15, 2026
a057687
fix format
mbutrovich May 15, 2026
650f619
add fallback for too many args and a test, clean up printing code
mbutrovich May 15, 2026
b1e1c55
stronger tests
mbutrovich May 15, 2026
0f6f68c
Merge branch 'main' into codegen_scala_udf
mbutrovich May 15, 2026
d967143
fix(udf): scope the dispatcher's compile cache per task to isolate bo…
mbutrovich May 15, 2026
10da742
update docs
mbutrovich May 15, 2026
23df354
add missing suite
mbutrovich May 15, 2026
b161169
synchronize per-task UDF evaluation
mbutrovich May 16, 2026
f86e70b
Merge branch 'main' into codegen_scala_udf
mbutrovich May 16, 2026
dca8b22
update spark diffs
mbutrovich May 16, 2026
b1fbbb8
Merge branch 'main' into codegen_scala_udf
mbutrovich May 18, 2026
2be5f73
upmerge main, regenerate diffs
mbutrovich May 18, 2026
4d471e1
Merge branch 'main' into codegen_scala_udf
mbutrovich May 18, 2026
e19683e
cleanup round 1
mbutrovich May 18, 2026
ec42809
cleanup round 2
mbutrovich May 18, 2026
9089fa1
remove benchmark
mbutrovich May 18, 2026
2259ff6
remove cast from JNI layer that was a bandaid for List types
mbutrovich May 18, 2026
83096e7
Merge branch 'main' into codegen_scala_udf
mbutrovich May 18, 2026
5ee1ddf
fix scala 2.12
mbutrovich May 18, 2026
2102f62
Merge remote-tracking branch 'origin/codegen_scala_udf' into codegen_…
mbutrovich May 18, 2026
e98164c
set config to false by default since it's experimental
mbutrovich May 18, 2026
ca4cd41
Update fallback message.
mbutrovich May 18, 2026
c12096e
mbutrovich May 18, 2026
a159357
roll back diff changes
mbutrovich May 19, 2026
a68ba53
Merge branch 'main' into codegen_scala_udf
mbutrovich May 19, 2026
8a651e5
Merge branch 'main' into codegen_scala_udf
mbutrovich May 19, 2026
63573ba
address PR feedback
mbutrovich May 19, 2026
c9d2960
tighten comments, fix planner.rs builder changes to align to codebase…
mbutrovich May 20, 2026
41ea025
Merge branch 'main' into codegen_scala_udf
mbutrovich May 20, 2026
3edba99
swap init and process in CometBatchKernel
mbutrovich May 20, 2026
79a4e98
fix format
mbutrovich May 20, 2026
58757cb
update shading comments after #4325
mbutrovich May 20, 2026
0b57f11
clean up more comments
mbutrovich May 20, 2026
4 changes: 4 additions & 0 deletions .github/workflows/pr_build_linux.yml
@@ -302,6 +302,7 @@ jobs:
org.apache.comet.CometFuzzAggregateSuite
org.apache.comet.CometFuzzIcebergSuite
org.apache.comet.CometFuzzMathSuite
org.apache.comet.CometCodegenFuzzSuite
org.apache.comet.DataGeneratorSuite
- name: "shuffle"
value: |
@@ -380,6 +381,9 @@ jobs:
org.apache.comet.expressions.conditional.CometIfSuite
org.apache.comet.expressions.conditional.CometCoalesceSuite
org.apache.comet.expressions.conditional.CometCaseWhenSuite
org.apache.comet.CometCodegenSuite
org.apache.comet.CometCodegenSourceSuite
org.apache.comet.CometCodegenHOFSuite
- name: "sql"
value: |
org.apache.spark.sql.CometToPrettyStringSuite
4 changes: 4 additions & 0 deletions .github/workflows/pr_build_macos.yml
@@ -155,6 +155,7 @@ jobs:
org.apache.comet.CometFuzzAggregateSuite
org.apache.comet.CometFuzzIcebergSuite
org.apache.comet.CometFuzzMathSuite
org.apache.comet.CometCodegenFuzzSuite
org.apache.comet.DataGeneratorSuite
- name: "shuffle"
value: |
@@ -232,6 +233,9 @@ jobs:
org.apache.comet.expressions.conditional.CometIfSuite
org.apache.comet.expressions.conditional.CometCoalesceSuite
org.apache.comet.expressions.conditional.CometCaseWhenSuite
org.apache.comet.CometCodegenSuite
org.apache.comet.CometCodegenSourceSuite
org.apache.comet.CometCodegenHOFSuite
- name: "sql"
value: |
org.apache.spark.sql.CometToPrettyStringSuite
18 changes: 18 additions & 0 deletions docs/source/user-guide/latest/iceberg.md
@@ -146,6 +146,24 @@ The following scenarios will fall back to Spark's native Iceberg reader:
- Dynamic Partition Pruning under Adaptive Query Execution (non-AQE DPP is supported);
see [#3510](https://github.com/apache/datafusion-comet/issues/3510)

### Iceberg UDFs

Iceberg ships several `ScalaUDF`s that surface in user queries and maintenance actions:

- `IcebergSpark.registerBucketUDF` and `registerTruncateUDF` register `bucket(N, col)` and
`truncate(W, col)` for use in `SELECT` / `JOIN` / `WHERE` predicates that align with hidden
partitioning.
- `RewriteDataFiles` with `sort-strategy=zorder` builds a tree of per-type ordered-bytes UDFs
(`INT_ORDERED_BYTES`, `LONG_ORDERED_BYTES`, ..., `INTERLEAVE_BYTES`) over the sort key columns
during compaction.

By default these UDFs cause the enclosing operator to fall back to Spark, which forces a
columnar-to-row round trip and demotes the surrounding shuffle from `CometExchange` to
`CometColumnarExchange`. Enabling the experimental
[Scala UDF and Java UDF Support](scala_java_udfs.md) feature
(`spark.comet.exec.scalaUDF.codegen.enabled=true`) routes these UDFs through native execution so
the project, exchange, and sort operators around them stay on the Comet path end-to-end.

### Task input metrics

The native Iceberg reader populates Spark's task-level `inputMetrics.bytesRead` (visible in the Spark UI Stages tab) using the `bytes_read` counter from iceberg-rust's `ScanMetrics`. This counter includes bytes read from both data files and delete files.
1 change: 1 addition & 0 deletions docs/source/user-guide/latest/index.rst
@@ -43,6 +43,7 @@ to read more.
Supported Data Types <datatypes>
Supported Operators <operators>
Supported Expressions <expressions>
Scala UDF and Java UDF Support <scala_java_udfs>
Configuration Settings <configs>
Compatibility Guide <compatibility/index>
Understanding Comet Plans <understanding-comet-plans>
61 changes: 61 additions & 0 deletions docs/source/user-guide/latest/scala_java_udfs.md
@@ -0,0 +1,61 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Scala UDF and Java UDF Support

When this feature is enabled, Comet executes Spark's Scala and Java [scalar user-defined functions (UDFs)](https://spark.apache.org/docs/latest/sql-ref-functions-udf-scalar.html) on the native Comet path. The presence of a supported UDF then no longer forces the enclosing operator off the native path; surrounding native operators stay native.

This page covers Spark's `ScalaUDF` (Scala `udf(...)`, `spark.udf.register(...)` over Scala or Java functional interfaces, and SQL `CREATE FUNCTION ... AS 'com.example.MyUDF'`). Other UDF kinds (Python / Pandas, Hive, aggregate) are out of scope and continue to fall back to Spark.

This feature is experimental and disabled by default.

## Configuration

| Key | Default | Description |
| ------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------ |
| `spark.comet.exec.scalaUDF.codegen.enabled` | `false` | When `true`, eligible `ScalaUDF`s run on the Comet path. When `false`, the enclosing operator falls back to Spark. |

## Supported

- User functions registered via `udf(...)`, `spark.udf.register(...)` (Scala or Java functional interfaces), or SQL `CREATE FUNCTION ... AS 'com.example.MyUDF'`.
- Scalar input/output types: `Boolean`, `Byte`, `Short`, `Int`, `Long`, `Float`, `Double`, `Decimal`, `String`, `Binary`, `Date`, `Timestamp`, `TimestampNTZ`.
- Complex input/output types with arbitrary nesting: `ArrayType`, `StructType`, `MapType`.
- Composition with other Catalyst expressions inside the argument tree (e.g. `myUdf(upper(s))` runs as one native unit).
- Higher-order functions (`transform`, `filter`, `exists`, `aggregate`, `zip_with`, `map_filter`, `map_zip_with`, etc.) inside the argument tree.
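
As a rough illustration of the composition bullet above, the sketch below evaluates `myUdf(upper(s))` in a single pass over a batch, with no intermediate column for `upper(s)`. It is a simplified stand-in, not Comet's generated code: `FusedUdfSketch`, the `String[]` / `Integer[]` batches, and the `Function`-typed user function are all assumptions made for the example, whereas the real kernel reads and writes Arrow vectors.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch: evaluates udf(upper(s)) row by row without an intermediate column. */
final class FusedUdfSketch {

    static Integer[] evaluate(String[] input, Function<String, Integer> udf) {
        Integer[] out = new Integer[input.length];
        for (int i = 0; i < input.length; i++) {
            // Inner expression: upper(s); a null input stays null.
            String upper = input[i] == null ? null : input[i].toUpperCase(Locale.ROOT);
            // Outer expression: the user function consumes the intermediate value directly.
            out[i] = udf.apply(upper);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed user function: string length, or null for a null argument.
        Function<String, Integer> length = s -> s == null ? null : s.length();
        Integer[] result = evaluate(new String[] {"ab", null, "cde"}, length);
        System.out.println(Arrays.toString(result));
    }
}
```

In this sketch the user function receives `null` for a null string argument, so the function itself decides how to handle it.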

## Not supported

- Aggregate UDFs (`ScalaAggregator`, `TypedImperativeAggregate`, the legacy `UserDefinedAggregateFunction`).
- Table UDFs and generators.
- Python `@udf` and Pandas `@pandas_udf`.
- Hive `GenericUDF` and `SimpleUDF`.
- `CalendarIntervalType`, `NullType`, and `UserDefinedType` arguments and return types. UDT-typed columns fall back to Spark; for native execution, store and read the underlying representation directly (e.g. write MLlib `Vector` outputs as `Struct<type: Byte, size: Int, indices: Array<Int>, values: Array<Double>>` rather than `VectorUDT`).
- Trees whose total nested-field count (output plus all input columns the UDF tree references) exceeds `spark.sql.codegen.maxFields` (default 100). Comet refuses these at plan time and the operator falls back to Spark.

When a UDF is rejected, the reason surfaces through Comet's standard fallback diagnostics; the query still runs on Spark.
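
The field-count limit in the list above can be pictured with a small recursive count. This is a minimal sketch under assumptions: the toy `Type` classes are hypothetical stand-ins for Spark's `DataType` hierarchy, and the counting rule (one per leaf, recursing through arrays, maps, and structs) is assumed to resemble the one Spark's whole-stage codegen uses; Comet's actual check may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a nested-field count checked against a maxFields limit. */
final class NestedFieldCount {
    interface Type {}

    static final class Leaf implements Type {} // e.g. Int, String, Decimal

    static final class ArrayT implements Type {
        final Type element;
        ArrayT(Type element) { this.element = element; }
    }

    static final class MapT implements Type {
        final Type key;
        final Type value;
        MapT(Type key, Type value) { this.key = key; this.value = value; }
    }

    static final class StructT implements Type {
        final List<Type> fields;
        StructT(List<Type> fields) { this.fields = fields; }
    }

    /** Counts leaf fields, recursing through structs, maps, and arrays. */
    static int count(Type t) {
        if (t instanceof StructT) {
            int n = 0;
            for (Type f : ((StructT) t).fields) n += count(f);
            return n;
        }
        if (t instanceof MapT) return count(((MapT) t).key) + count(((MapT) t).value);
        if (t instanceof ArrayT) return count(((ArrayT) t).element);
        return 1;
    }

    /** True when the output plus all referenced inputs do not exceed the limit. */
    static boolean withinLimit(Type output, List<Type> inputs, int maxFields) {
        int total = count(output);
        for (Type in : inputs) total += count(in);
        return total <= maxFields;
    }

    public static void main(String[] args) {
        Type leaf = new Leaf();
        // struct<a: leaf, b: array<leaf>, c: map<leaf, leaf>> has 1 + 1 + 2 = 4 nested fields.
        Type struct = new StructT(Arrays.asList(leaf, new ArrayT(leaf), new MapT(leaf, leaf)));
        System.out.println(count(struct));
        System.out.println(withinLimit(leaf, Arrays.asList(struct), 100));
    }
}
```

Under this rule a wide struct argument can push a UDF over the limit even when the UDF takes a single column.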

## Behavior

- Non-deterministic expressions referenced from the argument tree (`rand`, `uuid`, `monotonically_increasing_id`) produce per-partition sequences consistent with Spark.
- `TaskContext.get()` inside the user function returns the driving Spark task's context.
- The user function must be closure-serializable; any function that already runs on Spark's executors works here.
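
The per-partition behavior of non-deterministic expressions can be pictured with a small sketch of an `init`-then-`process` contract. This is an illustrative stand-in rather than Comet's generated kernel: `RandKernelSketch` is a hypothetical class, `java.util.Random` substitutes for Spark's `XORShiftRandom`, and a `double[]` substitutes for the Arrow output vector. The `seed + partitionIndex` reseeding follows the description in the `CometBatchKernel.init` Javadoc.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of a kernel that reseeds its generator once per partition. */
final class RandKernelSketch {
    private final long seed;
    private Random rng; // stand-in for Spark's XORShiftRandom

    RandKernelSketch(long seed) {
        this.seed = seed;
    }

    /** Partition-dependent initialization: reseed from seed + partitionIndex. */
    void init(int partitionIndex) {
        rng = new Random(seed + partitionIndex);
    }

    /** Processes one batch; the sequence continues across batches of the same partition. */
    void process(double[] output, int numRows) {
        if (rng == null) {
            throw new IllegalStateException("init must run before the first process call");
        }
        for (int i = 0; i < numRows; i++) {
            output[i] = rng.nextDouble();
        }
    }

    public static void main(String[] args) {
        RandKernelSketch a = new RandKernelSketch(42L);
        RandKernelSketch b = new RandKernelSketch(42L);
        a.init(3);
        b.init(3);
        double[] outA = new double[4];
        double[] outB = new double[4];
        a.process(outA, 4);
        b.process(outB, 4);
        // Same seed and partition index give the same sequence.
        System.out.println(Arrays.equals(outA, outB));
    }
}
```

Because the generator is reseeded once per partition rather than once per batch, splitting a partition into differently sized batches does not change the values it produces in this sketch.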

## Known limitations

- Each query containing a `ScalaUDF` pays a one-time codegen cost on its first batch and reuses the compiled kernel for subsequent batches, matching Spark's whole-stage codegen behavior. Bytecode is deduplicated JVM-wide via the same `CodeGenerator` cache, so structurally identical queries across a session share the compiled class.
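
The compile-once, reuse-afterwards pattern described above can be sketched with a map keyed by generated source. This is a minimal illustration, not Spark's `CodeGenerator` cache or Comet's dispatcher: `KernelCacheSketch` and `CompiledKernel` are hypothetical names, the cache here is unbounded, and a counter stands in for the expensive compile step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: structurally identical source compiles once and is then shared. */
final class KernelCacheSketch {

    /** Stand-in for a compiled class; only remembers the source it was built from. */
    static final class CompiledKernel {
        final String source;
        CompiledKernel(String source) { this.source = source; }
    }

    private final ConcurrentHashMap<String, CompiledKernel> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();

    /** Returns the cached kernel for this source, compiling it only on first use. */
    CompiledKernel getOrCompile(String generatedSource) {
        return cache.computeIfAbsent(generatedSource, src -> {
            compilations.incrementAndGet(); // stands in for the one-time codegen cost
            return new CompiledKernel(src);
        });
    }

    int compilations() {
        return compilations.get();
    }

    public static void main(String[] args) {
        KernelCacheSketch cache = new KernelCacheSketch();
        CompiledKernel first = cache.getOrCompile("udf(upper(col0))");
        CompiledKernel second = cache.getOrCompile("udf(upper(col0))");
        System.out.println(first == second);      // both lookups share one kernel
        System.out.println(cache.compilations()); // the compile step ran once
    }
}
```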
13 changes: 10 additions & 3 deletions native/core/src/execution/planner.rs
@@ -211,9 +211,9 @@ impl PhysicalPlanner {
self
}

/// Attach the Spark `TaskContext` global reference captured at `createPlan` time. Cloned
/// into every `JvmScalarUdfExpr` the planner builds so the JNI bridge can install it as
/// the thread-local on the Tokio worker driving the UDF.
/// Attach a propagated Spark `TaskContext` global reference. Called by the JNI `executePlan`
/// entry with whatever was captured at `createPlan` time. The planner clones this `Option`
/// into every `JvmScalarUdfExpr` it builds.
pub fn with_task_context(
mut self,
task_context: Option<Arc<Global<JObject<'static>>>>,
@@ -742,6 +742,13 @@ impl PhysicalPlanner {
to_arrow_datatype(udf.return_type.as_ref().ok_or_else(|| {
GeneralError("JvmScalarUdf missing return_type".to_string())
})?);
// Invariant: task_context is propagated for every JvmScalarUdfExpr built during
// normal execution. The TEST_EXEC_CONTEXT_ID path is the only context in which
// task_context may legitimately be None (unit tests, direct native driver runs).
debug_assert!(
self.task_context.is_some() || self.exec_context_id == TEST_EXEC_CONTEXT_ID,
"task_context must be set for non-test execution"
);
Ok(Arc::new(JvmScalarUdfExpr::new(
udf.class_name.clone(),
args,
32 changes: 19 additions & 13 deletions native/spark-expr/src/jvm_udf/mod.rs
@@ -59,6 +59,10 @@ impl JvmScalarUdfExpr {
return_nullable: bool,
task_context: Option<Arc<Global<JObject<'static>>>>,
) -> Self {
debug_assert!(
!class_name.is_empty(),
"JvmScalarUdfExpr requires a non-empty class name"
);
Self {
class_name,
args,
@@ -120,10 +124,10 @@ impl PhysicalExpr for JvmScalarUdfExpr {
}

fn evaluate(&self, batch: &RecordBatch) -> DFResult<ColumnarValue> {
// Step 1: evaluate child expressions to get Arrow arrays. Scalar children
// (e.g. literal patterns) are sent as length-1 vectors rather than expanded
// to batch-row count, so the JVM bridge does not pay an O(rows) copy for
// values that never vary across the batch.
// Scalar children (e.g. literal patterns) are sent as length-1 vectors rather than
// expanded to batch-row count, so the JVM bridge does not pay an O(rows) copy for
// values that never vary across the batch. The JVM side gets `numRows` directly via
// the bridge so it doesn't need the scalar to carry batch length.
let arrays: Vec<ArrayRef> = self
.args
.iter()
@@ -133,7 +137,6 @@ impl PhysicalExpr for JvmScalarUdfExpr {
})
.collect::<DFResult<_>>()?;

// Step 2: allocate FFI structs on the Rust heap and collect their raw pointers.
// The JVM writes into the out_array/out_schema slots and reads from the in_ slots.
let in_ffi_arrays: Vec<Box<FFI_ArrowArray>> = arrays
.iter()
@@ -157,7 +160,13 @@ impl PhysicalExpr for JvmScalarUdfExpr {
.map(|b| b.as_ref() as *const FFI_ArrowSchema as i64)
.collect();

// Allocate output FFI slots.
debug_assert!(!self.class_name.is_empty(), "class_name must not be empty");
debug_assert_eq!(
in_arr_ptrs.len(),
in_sch_ptrs.len(),
"input array and schema pointer counts must match"
);

let mut out_array = Box::new(FFI_ArrowArray::empty());
let mut out_schema = Box::new(FFI_ArrowSchema::empty());
let out_arr_ptr = out_array.as_mut() as *mut FFI_ArrowArray as i64;
@@ -166,7 +175,6 @@ impl PhysicalExpr for JvmScalarUdfExpr {
let class_name = self.class_name.clone();
let n_args = arrays.len();

// Step 3: attach a JNI env for this thread and call the static bridge method.
JVMClasses::with_env(|env| {
let bridge = JVMClasses::get().comet_udf_bridge.as_ref().ok_or_else(|| {
CometError::from(ExecutionError::GeneralError(
@@ -176,12 +184,10 @@ impl PhysicalExpr for JvmScalarUdfExpr {
))
})?;

// Build the JVM String for the class name.
let jclass_name = env
.new_string(&class_name)
.map_err(|e| CometError::JNI { source: e })?;

// Build the long[] arrays for input pointers.
let in_arr_java = env
.new_long_array(n_args)
.map_err(|e| CometError::JNI { source: e })?;
@@ -196,9 +202,10 @@ impl PhysicalExpr for JvmScalarUdfExpr {
.set_region(env, 0, &in_sch_ptrs)
.map_err(|e| CometError::JNI { source: e })?;

// Pass a null jobject when no TaskContext was propagated so the bridge's null-guard
// leaves the worker thread's current TaskContext.get() in place. The borrow must
// outlive `call_static_method_unchecked`.
// Resolve the TaskContext reference once before building the arg array so the
// borrow lives until `call_static_method_unchecked` returns. When no TaskContext
// was propagated, pass a null object so the bridge's null-guard leaves the thread-
// local alone.
let null_task_context = JObject::null();
let task_context_ref: &JObject = match &self.task_context {
Some(gref) => gref.as_obj(),
@@ -229,7 +236,6 @@ impl PhysicalExpr for JvmScalarUdfExpr {
Ok(())
})?;

// Step 4: import the result from the FFI slots filled by the JVM.
// SAFETY: `*out_array` moves the FFI_ArrowArray out of the Box (the heap
// allocation is freed by the move), and `from_ffi` wraps it in an Arc that
// keeps the JVM-installed release callback alive until the resulting
61 changes: 61 additions & 0 deletions spark/src/main/java/org/apache/comet/codegen/CometBatchKernel.java
@@ -0,0 +1,61 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.comet.codegen;

import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.ValueVector;

/**
* Abstract base extended by the Janino-compiled batch kernel emitted by {@code
* CometBatchKernelCodegen}. The generated subclass extends {@code CometInternalRow} (so Spark's
* {@code BoundReference.genCode} can call {@code this.getUTF8String(ord)} directly) and carries
* typed input fields baked at codegen time, one per input column. Expression evaluation plus Arrow
* read/write fuse into one method per expression tree.
*/
public abstract class CometBatchKernel extends CometInternalRow {

protected final Object[] references;

protected CometBatchKernel(Object[] references) {
this.references = references;
}

/**
* Run partition-dependent initialization. The generated subclass overrides this to execute
* statements collected via {@code CodegenContext.addPartitionInitializationStatement}, e.g.
* reseeding {@code Rand}'s {@code XORShiftRandom} from {@code seed + partitionIndex}.
* Deterministic expressions leave this as a no-op.
*
* <p>The caller invokes this before the first {@code process} call of each partition. The
* generated subclass is not thread-safe across concurrent {@code process} calls. The dispatcher
* allocates one per partition and serializes calls.
*/
public void init(int partitionIndex) {}

/**
* Process one batch.
*
* @param inputs Arrow input vectors. Length and concrete classes match the schema the kernel was
* compiled against.
* @param output Arrow output vector. Caller allocates to the expression's {@code dataType}.
* @param numRows number of rows in this batch
*/
public abstract void process(ValueVector[] inputs, FieldVector output, int numRows);
}
11 changes: 11 additions & 0 deletions spark/src/main/scala/org/apache/comet/CometConf.scala
@@ -362,6 +362,17 @@ object CometConf extends ShimCometConf {
.booleanConf
.createWithDefault(false)

val COMET_SCALA_UDF_CODEGEN_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.exec.scalaUDF.codegen.enabled")
.category(CATEGORY_EXEC)
.doc("Experimental. Whether to route Spark `ScalaUDF` expressions through Comet's " +
"Arrow-direct codegen dispatcher. When enabled, a supported ScalaUDF is compiled into " +
"a per-batch kernel that reads and writes Arrow vectors directly from native " +
"execution. When disabled, plans containing a ScalaUDF fall back to Spark for the " +
"enclosing operator.")
.booleanConf
.createWithDefault(false)

val COMET_EXEC_SHUFFLE_WITH_HASH_PARTITIONING_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.native.shuffle.partitioning.hash.enabled")
.category(CATEGORY_SHUFFLE)