Add Java vectorized scalar function support #630
Merged
9 commits
- 4707012 add java vectorized scalar function support (lfkpoa)
- c12a32e refine scalar udf type parsing and isolate jni scalar bridge (lfkpoa)
- c83bdcf Align Java scalar UDF registration with C API-first wrapper flow (lfkpoa)
- 925981b Refine Java scalar UDF callback API and migrate tests (lfkpoa)
- 211d7c1 Implement and rework Java scalar UDF APIs around context callbacks (lfkpoa)
- cfa37ba Use long indices in scalar chunk readers (lfkpoa)
- 590aa3d Sync Windows export list with current JNI symbols (lfkpoa)
- 68a7f91 Refine scalar UDF callback API, runtime model, and 128-bit support (lfkpoa)
- caa02cd Apply formatting-only cleanup after format-check (lfkpoa)
@@ -0,0 +1,202 @@

# Java Scalar Functions (UDF)

Use `DuckDBFunctions.scalarFunction()` to build and register scalar functions.
`register(java.sql.Connection)` returns a `DuckDBRegisteredFunction` with metadata about the registered scalar function.

Registered functions are also tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
This registry is bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.

## Recommended API (Functional Interfaces)

Use these overloads for simple functions:

- `withFunction(Supplier<?>)` for zero arguments
- `withFunction(Function<?, ?>)` for one argument
- `withFunction(BiFunction<?, ?, ?>)` for two arguments
- `withIntFunction(IntUnaryOperator | IntBinaryOperator)` for `INTEGER` unary/binary functions
- `withLongFunction(LongUnaryOperator | LongBinaryOperator)` for `BIGINT` unary/binary functions
- `withDoubleFunction(DoubleUnaryOperator | DoubleBinaryOperator)` for `DOUBLE` unary/binary functions

### Simple example (`withIntFunction`)

```java
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
    DuckDBRegisteredFunction function = DuckDBFunctions.scalarFunction()
        .withName("java_add_one")
        .withParameter(Integer.class)
        .withReturnType(Integer.class)
        .withIntFunction(x -> x + 1)
        .register(conn);
}
```

```sql
SELECT java_add_one(41);
```

### Slightly more complex example (`withDoubleFunction`)

```java
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
    DuckDBFunctions.scalarFunction()
        .withName("java_weighted_sum")
        .withParameter(Double.class)
        .withParameter(Double.class)
        .withReturnType(Double.class)
        .withDoubleFunction((x, w) -> x * w + 10.0)
        .register(conn);
}
```

```sql
SELECT java_weighted_sum(2.5, 4.0);
```

Behavior:

- `Function` and `BiFunction` callbacks receive `null` for NULL inputs; implement null handling in Java callback logic.
- `withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)` run with null propagation enabled.
- For `withVectorizedFunction(...)`, use `ctx.propagateNulls(true)` when you want stream-level null skipping and automatic NULL output.
- For `Supplier`, returning `null` writes NULL output.
- `Function` and `BiFunction` are fixed arity only (no varargs).
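
Because `Function` and `BiFunction` callbacks receive `null` for NULL inputs, the callback has to decide what a NULL means. The following is a minimal sketch of such callback logic in plain Java, independent of the driver. The class name `NullSafeCallbacks` and both constants are hypothetical, and the sketch assumes that returning `null` from a `Function` writes NULL in the same way the list above documents for `Supplier`.

```java
import java.util.Locale;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical holder class for illustration; it uses only the JDK.
final class NullSafeCallbacks {
    // One-argument callback: a NULL input arrives as null, so pass it through as null.
    static final Function<String, String> TRIM_UPPER =
        s -> s == null ? null : s.trim().toUpperCase(Locale.ROOT);

    // Two-argument callback: a NULL second argument is treated as "no suffix".
    // This is a design choice made in the callback, not a rule imposed by the driver.
    static final BiFunction<String, String, String> CONCAT_OR_LEFT =
        (a, b) -> a == null ? null : (b == null ? a : a + b);

    public static void main(String[] args) {
        System.out.println(TRIM_UPPER.apply("  launch "));
        System.out.println(TRIM_UPPER.apply(null));
        System.out.println(CONCAT_OR_LEFT.apply("a", null));
    }
}
```

A callback written this way could then be handed to `withFunction(...)` with matching `withParameter(String.class)` and `withReturnType(String.class)` declarations.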

Runtime error model:

- Callback-time reader/writer/context type and value failures throw `DuckDBFunctionException`.
- Invalid row/column indexes throw `IndexOutOfBoundsException`.
- `SQLException` remains for registration-time API usage and type declaration/validation.

## Type declaration and mapping

`withParameter(...)` and `withReturnType(...)` accept:

- `Class<?>`
- `DuckDBColumnType`
- `DuckDBLogicalType`

Common class mappings include:

- `Integer` -> `INTEGER`
- `Long` -> `BIGINT`
- `String` -> `VARCHAR`
- `BigDecimal` -> `DECIMAL`
- `BigInteger` -> `HUGEINT`
- `LocalDate` and `java.sql.Date` -> `DATE`
- `LocalDateTime`, `java.sql.Timestamp`, and `java.util.Date` -> `TIMESTAMP`

Notes:

- `UHUGEINT` is supported through explicit `DuckDBColumnType.UHUGEINT`/`DuckDBLogicalType` declarations.
- Java class auto-mapping for `BigInteger` remains `HUGEINT`.
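
Since `BigInteger` is unbounded while `HUGEINT` and `UHUGEINT` are 128-bit types, a callback may want to check a value before returning it. The sketch below uses only the JDK; the class `Int128Ranges` and its methods are hypothetical helpers, and the bounds assume the full two's-complement signed range for `HUGEINT` and the full unsigned range for `UHUGEINT`. How the driver itself reacts to an out-of-range value is not covered here.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration; not part of the driver API.
final class Int128Ranges {
    // Signed 128-bit: -2^127 .. 2^127 - 1. bitLength() excludes the sign bit,
    // so any value in that range needs at most 127 bits.
    static boolean fitsHugeint(BigInteger v) {
        return v.bitLength() <= 127;
    }

    // Unsigned 128-bit: 0 .. 2^128 - 1.
    static boolean fitsUhugeint(BigInteger v) {
        return v.signum() >= 0 && v.bitLength() <= 128;
    }

    public static void main(String[] args) {
        BigInteger twoPow127 = BigInteger.ONE.shiftLeft(127);
        System.out.println(fitsHugeint(twoPow127));  // outside the signed range
        System.out.println(fitsUhugeint(twoPow127)); // inside the unsigned range
    }
}
```

Values that pass only `fitsUhugeint` are the case where an explicit `UHUGEINT` declaration would matter, because the `BigInteger` class mapping stays `HUGEINT`.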

Use `DuckDBLogicalType.decimal(width, scale)` for explicit DECIMAL precision/scale.
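
For a DECIMAL type, `width` is the total number of digits and `scale` is the number of digits after the decimal point. As a rough illustration of what that means for a `BigDecimal` value, the sketch below checks whether a value can be represented exactly for a given `width` and `scale`. `DecimalFit` is a hypothetical helper built on the JDK only; whether the driver rounds or rejects values that do not fit is not stated in this document.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration; not part of the driver API.
final class DecimalFit {
    // True when `value` can be stored exactly with `scale` fractional digits
    // and at most `width` digits in total.
    static boolean fits(BigDecimal value, int width, int scale) {
        try {
            // UNNECESSARY makes setScale throw if digits would have to be dropped.
            BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
            return scaled.precision() <= width;
        } catch (ArithmeticException e) {
            return false; // more fractional digits than `scale` allows
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(new BigDecimal("123.45"), 5, 2));
        System.out.println(fits(new BigDecimal("123.45"), 4, 2));
    }
}
```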

## Varargs

Declare varargs type with `withVarArgs(DuckDBLogicalType)`.

For functional varargs, use `withVarArgsFunction(Function<Object[], ?>)`:

```java
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
     DuckDBLogicalType intType = DuckDBLogicalType.of(DuckDBColumnType.INTEGER)) {
    DuckDBFunctions.scalarFunction()
        .withName("java_sum_varargs")
        .withParameter(Integer.class) // fixed argument(s)
        .withVarArgs(intType)         // variadic argument type
        .withReturnType(Integer.class)
        .withVarArgsFunction(args -> {
            int sum = 0;
            for (Object arg : args) {
                sum += (Integer) arg;
            }
            return sum;
        })
        .register(conn);
}
```

```sql
SELECT java_sum_varargs(1, 2, 3, 4);
```

Notes:

- `withFunction(Function)` and `withFunction(BiFunction)` reject varargs.
- `withVarArgsFunction(...)` requires `withVarArgs(...)` first.
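
The varargs callback receives its arguments as an `Object[]`, so it does the casting and any NULL handling itself. A defensive variant of the summing callback is sketched below. It assumes, without confirmation from this document, that NULL arguments may arrive as `null` elements. `VarArgsCallbacks` and `SUM_INTS` are hypothetical names, and the code depends only on the JDK.

```java
import java.util.function.Function;

// Hypothetical holder class for illustration; not part of the driver API.
final class VarArgsCallbacks {
    // Sums Integer arguments. Returns null as soon as any argument is null,
    // and throws ArithmeticException on int overflow instead of wrapping silently.
    static final Function<Object[], Object> SUM_INTS = args -> {
        int sum = 0;
        for (Object arg : args) {
            if (arg == null) {
                return null;
            }
            sum = Math.addExact(sum, (Integer) arg);
        }
        return sum;
    };

    public static void main(String[] args) {
        System.out.println(SUM_INTS.apply(new Object[] {1, 2, 3, 4}));
        System.out.println(SUM_INTS.apply(new Object[] {1, null}));
    }
}
```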

## Builder methods

- `withName(String)`
- `withParameter(Class<?> | DuckDBColumnType | DuckDBLogicalType)`
- `withParameters(Class<?>...)`
- `withReturnType(Class<?> | DuckDBColumnType | DuckDBLogicalType)`
- `withFunction(Supplier | Function | BiFunction)`
- `withIntFunction(IntUnaryOperator | IntBinaryOperator)`
- `withLongFunction(LongUnaryOperator | LongBinaryOperator)`
- `withDoubleFunction(DoubleUnaryOperator | DoubleBinaryOperator)`
- `withVarArgs(DuckDBLogicalType)`
- `withVarArgsFunction(Function<Object[], ?>)`
- `withVectorizedFunction(DuckDBScalarFunction)`
- `withVolatile()`
- `withSpecialHandling()`
- `register(java.sql.Connection)`

## Registered Function Metadata And Registry

`DuckDBRegisteredFunction` exposes immutable metadata about the successful registration result:

- `name()`
- `functionKind()`
- `isScalar()`
- parameter and return type metadata
- callback and flags used at registration time

To inspect Java-side registrations:

```java
List<DuckDBRegisteredFunction> functions = DuckDBDriver.registeredFunctions();
```

The returned list is read-only. Duplicate function names may appear in the registry.

## Advanced API (`DuckDBScalarFunction`)

Use `withVectorizedFunction(...)` for full context control through `DuckDBScalarContext`.

Example with multiple input types (`TIMESTAMP`, `VARCHAR`, `DOUBLE`) and `VARCHAR` output:

```java
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
     DuckDBLogicalType tsType = DuckDBLogicalType.of(DuckDBColumnType.TIMESTAMP);
     DuckDBLogicalType strType = DuckDBLogicalType.of(DuckDBColumnType.VARCHAR);
     DuckDBLogicalType dblType = DuckDBLogicalType.of(DuckDBColumnType.DOUBLE)) {
    DuckDBFunctions.scalarFunction()
        .withName("java_event_label")
        .withParameter(tsType)
        .withParameter(strType)
        .withParameter(dblType)
        .withReturnType(strType)
        .withVectorizedFunction(ctx -> {
            ctx.propagateNulls(true).stream().forEachOrdered(row -> {
                String value = row.getLocalDateTime(0) + " | "
                    + row.getString(1).trim().toUpperCase()
                    + " | " + row.getDouble(2);
                row.setString(value);
            });
        })
        .register(conn);
}
```

```sql
SELECT java_event_label(TIMESTAMP '2026-04-04 12:00:00', 'launch', 4.5);
```

Lifecycle rules:

- `DuckDBScalarContext`, `DuckDBScalarRow`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
- `DuckDBReadableVector` and `DuckDBWritableVector` are abstract callback runtime types (not interfaces).
- Write exactly one output value per input row for each callback invocation.
- With `propagateNulls(true)`, `DuckDBScalarContext.stream()` skips rows that contain NULL in any input column and writes NULL to the output for those rows.
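
The last two rules can be pictured with a small plain-Java model: every input row yields exactly one output slot, and a row with a NULL in any column is never passed to the callback but still gets a NULL output. This is only a sketch of the documented semantics, not the driver's implementation. `NullPropagationModel` and its `apply` method are hypothetical and use plain arrays in place of the real vector types.

```java
import java.util.function.Function;

// Hypothetical model of the documented behavior; not the driver's code.
final class NullPropagationModel {
    // rows[i] holds the input values of row i; the result has one entry per row.
    static Object[] apply(Object[][] rows, Function<Object[], Object> rowFn) {
        Object[] out = new Object[rows.length];
        for (int i = 0; i < rows.length; i++) {
            boolean hasNull = false;
            for (Object value : rows[i]) {
                if (value == null) {
                    hasNull = true;
                    break;
                }
            }
            // Rows containing NULL are skipped and their output stays NULL.
            out[i] = hasNull ? null : rowFn.apply(rows[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        Object[][] rows = {{"a", 1.0}, {null, 2.0}, {"c", 3.0}};
        for (Object o : apply(rows, r -> r[0] + "|" + r[1])) {
            System.out.println(o);
        }
    }
}
```

In this model the row function never has to null-check its inputs, which mirrors why the `java_event_label` example can call `row.getString(1).trim()` directly once `propagateNulls(true)` is set.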