Merged
101 changes: 54 additions & 47 deletions UDF.MD
@@ -1,11 +1,6 @@
 # Java Scalar Functions (UDF)
 
 Use `DuckDBFunctions.scalarFunction()` to build and register scalar functions.
-`register(java.sql.Connection)` returns a `DuckDBRegisteredFunction` with metadata about the registered scalar function.
-
-Registered functions are also tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
-This registry is bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
-`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
 
 ## Recommended API (Functional Interfaces)
 
@@ -22,12 +17,12 @@ Use these overloads for simple functions:
 
 ```java
 try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
-    DuckDBRegisteredFunction function = DuckDBFunctions.scalarFunction()
-        .withName("java_add_one")
-        .withParameter(Integer.class)
-        .withReturnType(Integer.class)
-        .withIntFunction(x -> x + 1)
-        .register(conn);
+    DuckDBFunctions.scalarFunction()
+        .withName("java_add_one")
+        .withParameter(int.class)
+        .withReturnType(int.class)
+        .withIntFunction(x -> x + 1)
+        .register(conn);
 }
 ```
 
@@ -41,9 +36,8 @@ SELECT java_add_one(41);
 try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
     DuckDBFunctions.scalarFunction()
         .withName("java_weighted_sum")
-        .withParameter(Double.class)
-        .withParameter(Double.class)
-        .withReturnType(Double.class)
+        .withParameters(double.class, double.class)
+        .withReturnType(double.class)
         .withDoubleFunction((x, w) -> x * w + 10.0)
         .register(conn);
 }
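The primitive overloads above pair a lambda with the SQL query that calls it (`java_add_one(41)`, `java_weighted_sum(2.5, 4.0)`). The arithmetic can be checked without a database. This sketch assumes the lambdas have the shapes of the standard `java.util.function.IntUnaryOperator` and `DoubleBinaryOperator` interfaces, which this diff does not confirm; `PrimitiveCallbackSketch` is a hypothetical name.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in: evaluates the example lambdas outside DuckDB.
// Assumption: the callbacks behave like these standard primitive interfaces.
class PrimitiveCallbackSketch {
    // Same body as the java_add_one callback.
    static final IntUnaryOperator ADD_ONE = x -> x + 1;

    // Same body as the java_weighted_sum callback.
    static final DoubleBinaryOperator WEIGHTED_SUM = (x, w) -> x * w + 10.0;

    public static void main(String[] args) {
        // Mirrors: SELECT java_add_one(41);
        System.out.println(ADD_ONE.applyAsInt(41)); // prints 42
        // Mirrors: SELECT java_weighted_sum(2.5, 4.0);
        System.out.println(WEIGHTED_SUM.applyAsDouble(2.5, 4.0)); // prints 20.0
    }
}
```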
@@ -53,32 +47,38 @@ try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
 SELECT java_weighted_sum(2.5, 4.0);
 ```
 
-Behavior:
+NULL handling behavior:
 
-- `Function` and `BiFunction` callbacks receive `null` for NULL inputs; implement null handling in Java callback logic.
-- `withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)` run with null propagation enabled.
-- For `withVectorizedFunction(...)`, use `ctx.propagateNulls(true)` when you want stream-level null skipping and automatic NULL output.
-- For `Supplier`, returning `null` writes NULL output.
-- `Function` and `BiFunction` are fixed arity only (no varargs).
+- Java functions that take object arguments (`Function` and `BiFunction` callbacks) are registered with
+  `duckdb_scalar_function_set_special_handling()` C API option enabled - `null` Java arguments are passed for `NULL` input values
+- when `withNullInNullOut()` option is set (skips the `duckdb_scalar_function_set_special_handling()` call on function registration),
+  then DuckDB engine may skip the function call completely (and set `NULL` to the call result automatically),
+  but this does not happen in all cases (for example, when the function is applied to a relation where some rows are `NULL`),
+  so the function still can be passed `null` arguments and need to check them and return `null` accordingly
+- Java functions that take primitive arguments (`withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)`)
+  are never passed `null` arguments as an input - even when the DuckDB engine calls such function with `NULL` value -
+  the call will be skipped on Java side and `null` value will be returned (returning `null` writes NULL output)
 
 Runtime error model:
 
-- Callback-time reader/writer/context type and value failures throw `DuckDBFunctionException`.
+- Callback-time reader/writer/context type and value failures throw `DuckDBFunctions.CallException extends RuntimeException`.
 - Invalid row/column indexes throw `IndexOutOfBoundsException`.
-- `SQLException` remains for registration-time API usage and type declaration/validation.
+- `SQLException` is used for registration-time API usage and type declaration/validation.
 
 ## Type declaration and mapping
 
 `withParameter(...)` and `withReturnType(...)` accept:
 
-- `Class<?>`
+- `Class<?>` (Object or primitive class, like `int.class` or `Integer.TYPE`)
 - `DuckDBColumnType`
 - `DuckDBLogicalType`
 
 Common class mappings include:
 
-- `Integer` -> `INTEGER`
-- `Long` -> `BIGINT`
+- `int` -> `INTEGER`
+- `long` -> `BIGINT`
+- `float` -> `FLOAT`
+- `double` -> `DOUBLE`
 - `String` -> `VARCHAR`
 - `BigDecimal` -> `DECIMAL`
 - `BigInteger` -> `HUGEINT`
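The NULL-handling rules in this hunk separate object callbacks, which may be called with `null`, from primitive callbacks, which the Java side never calls with `null` and instead answers with a NULL result. A minimal model of those two rules follows. `NullHandlingSketch`, `objectCall` and `primitiveCall` are hypothetical names, not driver API, and the code says nothing about how the driver implements this internally. The `safeAddOne` lambda follows the documented advice to check for `null` even when `withNullInNullOut()` is set, since DuckDB does not skip every NULL call.

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// Hypothetical model of the documented NULL rules; not the driver's code.
class NullHandlingSketch {

    // Object callbacks (Function/BiFunction) receive null for NULL input,
    // so the callback itself must decide what to return.
    static Integer objectCall(Function<Integer, Integer> fn, Integer arg) {
        return fn.apply(arg);
    }

    // Primitive callbacks are never invoked with null: a NULL input
    // short-circuits to a null (SQL NULL) result on the Java side.
    static Integer primitiveCall(IntUnaryOperator fn, Integer arg) {
        if (arg == null) {
            return null;
        }
        return fn.applyAsInt(arg);
    }

    public static void main(String[] args) {
        // Null-aware object callback: maps NULL to NULL explicitly.
        Function<Integer, Integer> safeAddOne = x -> x == null ? null : x + 1;
        System.out.println(objectCall(safeAddOne, null)); // prints null
        System.out.println(objectCall(safeAddOne, 41));   // prints 42

        // Primitive callback: no null check needed inside the lambda.
        IntUnaryOperator addOne = x -> x + 1;
        System.out.println(primitiveCall(addOne, null));  // prints null
        System.out.println(primitiveCall(addOne, 41));    // prints 42
    }
}
```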
@@ -140,12 +140,16 @@ Notes:
 - `withVarArgsFunction(Function<Object[], ?>)`
 - `withVectorizedFunction(DuckDBScalarFunction)`
 - `withVolatile()`
-- `withSpecialHandling()`
+- `withNullInNullOut()`
 - `register(java.sql.Connection)`
 
 ## Registered Function Metadata And Registry
 
-`DuckDBRegisteredFunction` exposes immutable metadata about the successful registration result:
+Registered functions, returned from `.register()` call, are additionally tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
+This registry provides bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
+`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
+
+`DuckDBFunctions.RegisteredFunction` exposes immutable metadata about the successful registration result:
 
 - `name()`
 - `functionKind()`
@@ -156,47 +160,50 @@ Notes:
 To inspect Java-side registrations:
 
 ```java
-List<DuckDBRegisteredFunction> functions = DuckDBDriver.registeredFunctions();
+List<DuckDBFunctions.RegisteredFunction> functions = DuckDBDriver.registeredFunctions();
 ```
 
 The returned list is read-only. Duplicate function names may appear in the registry.
 
 ## Advanced API (`DuckDBScalarFunction`)
 
-Use `withVectorizedFunction(...)` for full context control through `DuckDBScalarContext`.
+Use `withVectorizedFunction(...)` to access multiple input rows (when the scalar function is applied
+to a row set from a relation) in a single call. DuckDB engine splits the input row set into "data chunks"
+(represented in Java as `DuckDBDataChunkReader`) each chunk containing up to 2048 rows.
+
+`DuckDBScalarFunction` callback receives the `DuckDBDataChunkReader` as an `input` argument.
+Data chunk contains a number of "data vectors" (`DuckDBReadableVector`) - single vector for each input column.
+Vectors can be accessed using `input.vector(columnIndex)`. It writes the results into the `output` `DuckDBWritableVector`.
+Results in the `output` vector must be set **on the same `row` indices** that are used to read `input`.
+
+`input.stream()` call returns a `LongStream` of `row` indices that can be used with Java Streams API.
+
+- `DuckDBDataChunkReader`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
 
 Example with multiple input types (`TIMESTAMP`, `VARCHAR`, `DOUBLE`) and `VARCHAR` output:
 
-```java
-try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
+```
+try (Connection conn = DriverManager.getConnection("jdbc:duckdb:"); Statement stmt = conn.createStatement();
      DuckDBLogicalType tsType = DuckDBLogicalType.of(DuckDBColumnType.TIMESTAMP);
      DuckDBLogicalType strType = DuckDBLogicalType.of(DuckDBColumnType.VARCHAR);
      DuckDBLogicalType dblType = DuckDBLogicalType.of(DuckDBColumnType.DOUBLE)) {
     DuckDBFunctions.scalarFunction()
         .withName("java_event_label")
-        .withParameter(tsType)
-        .withParameter(strType)
-        .withParameter(dblType)
+        .withParameters(tsType, strType, dblType)
         .withReturnType(strType)
-        .withVectorizedFunction(ctx -> {
-            ctx.propagateNulls(true).stream().forEachOrdered(row -> {
-                String value = row.getLocalDateTime(0) + " | "
-                    + row.getString(1).trim().toUpperCase()
-                    + " | " + row.getDouble(2);
-                row.setString(value);
+        .withVectorizedFunction((input, output) -> {
+            input.stream().forEach(row -> {
+                String value = input.vector(0).getLocalDateTime(row) + " | " +
+                    String.valueOf(input.vector(1).getString(row)).trim().toUpperCase() + " | " +
+                    input.vector(2).getDouble(row, 0.0d);
+                output.setString(row, value);
             });
         })
         .register(conn);
+    ...
 }
 ```
 
 ```sql
 SELECT java_event_label(TIMESTAMP '2026-04-04 12:00:00', 'launch', 4.5);
 ```
-
-Lifecycle rules:
-
-- `DuckDBScalarContext`, `DuckDBScalarRow`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
-- `DuckDBReadableVector` and `DuckDBWritableVector` are abstract callback runtime types (not interfaces).
-- Write exactly one output value per input row for each callback invocation.
-- With `propagateNulls(true)`, `DuckDBScalarContext.stream()` skips rows that contain NULL in any input column and writes NULL to the output for those rows.
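The vectorized callback reads one vector per input column, iterates the row indices from `input.stream()`, and writes each result at the row index it read from. A self-contained sketch of that loop follows, with plain Java arrays standing in for `DuckDBReadableVector` and `DuckDBWritableVector`. The `ChunkSketch` class is hypothetical; the real vectors exist only during the callback and a chunk holds up to 2048 rows.

```java
import java.time.LocalDateTime;
import java.util.stream.LongStream;

// Hypothetical array-backed stand-in for one data chunk; not the driver's API.
class ChunkSketch {
    final LocalDateTime[] ts; // column 0 (TIMESTAMP)
    final String[] label;     // column 1 (VARCHAR)
    final double[] weight;    // column 2 (DOUBLE)
    final String[] output;    // result vector (VARCHAR)

    ChunkSketch(LocalDateTime[] ts, String[] label, double[] weight) {
        this.ts = ts;
        this.label = label;
        this.weight = weight;
        this.output = new String[ts.length];
    }

    // Mirrors input.stream(): one index per row in the chunk.
    LongStream stream() {
        return LongStream.range(0, ts.length);
    }

    // Same per-row formatting as the java_event_label example; each result is
    // written at the same row index that was used to read the inputs.
    void apply() {
        stream().forEach(row -> {
            int i = Math.toIntExact(row);
            output[i] = ts[i] + " | "
                    + String.valueOf(label[i]).trim().toUpperCase() + " | "
                    + weight[i];
        });
    }

    public static void main(String[] args) {
        ChunkSketch chunk = new ChunkSketch(
                new LocalDateTime[] {LocalDateTime.of(2026, 4, 4, 12, 0)},
                new String[] {"launch"},
                new double[] {4.5});
        chunk.apply();
        System.out.println(chunk.output[0]); // prints 2026-04-04T12:00 | LAUNCH | 4.5
    }
}
```

Because every row writes only its own output slot, the result does not depend on the order in which rows are visited.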
22 changes: 14 additions & 8 deletions src/main/java/org/duckdb/DuckDBDataChunkReader.java
@@ -3,6 +3,8 @@
 import static org.duckdb.DuckDBBindings.*;
 
 import java.nio.ByteBuffer;
+import java.util.stream.LongStream;
+import org.duckdb.DuckDBFunctions.FunctionException;
 
 /**
  * Reader over callback input data chunks.
@@ -17,12 +19,18 @@ public final class DuckDBDataChunkReader {
 
     DuckDBDataChunkReader(ByteBuffer chunkRef) {
         if (chunkRef == null) {
-            throw new DuckDBFunctionException("Invalid data chunk reference");
+            throw new FunctionException("Invalid data chunk reference");
         }
         this.chunkRef = chunkRef;
         this.rowCount = duckdb_data_chunk_get_size(chunkRef);
         this.columnCount = duckdb_data_chunk_get_column_count(chunkRef);
         this.vectors = new DuckDBReadableVector[Math.toIntExact(columnCount)];
+
+        for (long columnIndex = 0; columnIndex < columnCount; columnIndex++) {
+            ByteBuffer vectorRef = duckdb_data_chunk_get_vector(chunkRef, columnIndex);
+            int arrayIndex = Math.toIntExact(columnIndex);
+            vectors[arrayIndex] = new DuckDBReadableVector(vectorRef, rowCount);
+        }
     }
 
     public long rowCount() {
@@ -33,17 +41,15 @@ public long columnCount() {
         return columnCount;
     }
 
+    public LongStream stream() {
+        return LongStream.range(0, rowCount);
+    }
+
     public DuckDBReadableVector vector(long columnIndex) {
         if (columnIndex < 0 || columnIndex >= columnCount) {
             throw new IndexOutOfBoundsException("Column index out of bounds: " + columnIndex);
         }
         int arrayIndex = Math.toIntExact(columnIndex);
-        DuckDBReadableVector vector = vectors[arrayIndex];
-        if (vector == null) {
-            ByteBuffer vectorRef = duckdb_data_chunk_get_vector(chunkRef, columnIndex);
-            vector = new DuckDBReadableVectorImpl(vectorRef, rowCount);
-            vectors[arrayIndex] = vector;
-        }
-        return vector;
+        return vectors[arrayIndex];
     }
 }
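After this change the reader builds every column vector in its constructor, so `vector(...)` reduces to a bounds check plus an array read, and `stream()` is `LongStream.range(0, rowCount)`. The same eager-initialization pattern in isolation is sketched below; `EagerColumns` is a hypothetical class, and its `LongFunction` factory stands in for the native `duckdb_data_chunk_get_vector` call.

```java
import java.util.function.LongFunction;
import java.util.stream.LongStream;

// Hypothetical sketch of eager per-column initialization; the factory stands
// in for whatever creates a column view (native calls in the real reader).
class EagerColumns<V> {
    private final long rowCount;
    private final Object[] vectors;

    EagerColumns(long rowCount, long columnCount, LongFunction<V> factory) {
        this.rowCount = rowCount;
        this.vectors = new Object[Math.toIntExact(columnCount)];
        // Build every column view up front, once per chunk.
        for (long columnIndex = 0; columnIndex < columnCount; columnIndex++) {
            vectors[Math.toIntExact(columnIndex)] = factory.apply(columnIndex);
        }
    }

    // Row indices 0 .. rowCount-1, like the reader's stream().
    LongStream stream() {
        return LongStream.range(0, rowCount);
    }

    @SuppressWarnings("unchecked")
    V vector(long columnIndex) {
        if (columnIndex < 0 || columnIndex >= vectors.length) {
            throw new IndexOutOfBoundsException("Column index out of bounds: " + columnIndex);
        }
        // No null check or lazy creation: the slot was filled in the constructor.
        return (V) vectors[Math.toIntExact(columnIndex)];
    }

    public static void main(String[] args) {
        int[] created = {0};
        EagerColumns<String> cols = new EagerColumns<String>(3, 2, i -> {
            created[0]++;
            return "column-" + i;
        });
        System.out.println(created[0]);            // prints 2
        System.out.println(cols.vector(1));        // prints column-1
        System.out.println(cols.stream().count()); // prints 3
    }
}
```

Compared with the removed lazy path, every column view is now constructed even if a callback never reads that column; in exchange, `vector(...)` no longer checks for `null` or writes to the array.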
7 changes: 4 additions & 3 deletions src/main/java/org/duckdb/DuckDBDriver.java
@@ -16,6 +16,7 @@
 import java.util.concurrent.ThreadFactory;
 import java.util.concurrent.locks.ReentrantLock;
 import java.util.logging.Logger;
+import org.duckdb.DuckDBFunctions.RegisteredFunction;
 import org.duckdb.io.LimitedInputStream;
 
 public class DuckDBDriver implements java.sql.Driver {
@@ -42,7 +43,7 @@ public class DuckDBDriver implements java.sql.Driver {
     private static boolean pinnedDbRefsShutdownHookRegistered = false;
     private static boolean pinnedDbRefsShutdownHookRun = false;
 
-    private static final ArrayList<DuckDBRegisteredFunction> functionsRegistry = new ArrayList<>();
+    private static final ArrayList<RegisteredFunction> functionsRegistry = new ArrayList<>();
     private static final ReentrantLock functionsRegistryLock = new ReentrantLock();
 
     private static final Set<String> supportedOptions = new LinkedHashSet<>();
@@ -266,7 +267,7 @@ public static boolean shutdownQueryCancelScheduler() {
         return true;
     }
 
-    public static List<DuckDBRegisteredFunction> registeredFunctions() {
+    public static List<RegisteredFunction> registeredFunctions() {
         functionsRegistryLock.lock();
         try {
             return Collections.unmodifiableList(new ArrayList<>(functionsRegistry));
@@ -284,7 +285,7 @@ public static void clearFunctionsRegistry() {
         }
     }
 
-    static void registerFunction(DuckDBRegisteredFunction function) {
+    static void registerFunction(RegisteredFunction function) {
         functionsRegistryLock.lock();
         try {
             functionsRegistry.add(function);
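The driver keeps registrations in a static `ArrayList` guarded by a `ReentrantLock`, and `registeredFunctions()` returns `Collections.unmodifiableList(new ArrayList<>(functionsRegistry))`, i.e. a read-only snapshot. The diff cuts off before the unlock code, so the `try`/`finally` placement below is an assumption. `RegistrySketch` and `Registration` are hypothetical stand-ins for the driver class and `DuckDBFunctions.RegisteredFunction`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a lock-guarded, snapshot-on-read registry.
class RegistrySketch {
    // Stand-in for the registered-function metadata (name only, for brevity).
    static final class Registration {
        final String name;

        Registration(String name) {
            this.name = name;
        }
    }

    private static final ArrayList<Registration> registry = new ArrayList<>();
    private static final ReentrantLock lock = new ReentrantLock();

    static void register(Registration r) {
        lock.lock();
        try {
            registry.add(r); // duplicates are allowed, as in the documented registry
        } finally {
            lock.unlock();
        }
    }

    // Returns a read-only copy: later registrations do not change it.
    static List<Registration> registered() {
        lock.lock();
        try {
            return Collections.unmodifiableList(new ArrayList<>(registry));
        } finally {
            lock.unlock();
        }
    }

    // Drops Java-side bookkeeping only.
    static void clear() {
        lock.lock();
        try {
            registry.clear();
        } finally {
            lock.unlock();
        }
    }
}
```

As the documentation notes, clearing such a registry only drops Java-side bookkeeping and does not de-register anything from DuckDB.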
13 changes: 0 additions & 13 deletions src/main/java/org/duckdb/DuckDBFunctionException.java

This file was deleted.
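With the top-level `DuckDBFunctionException` deleted, callback-time failures move to an exception type nested in `DuckDBFunctions`: the reader now throws `DuckDBFunctions.FunctionException`, and the updated error-model text names `DuckDBFunctions.CallException extends RuntimeException`. The split that text describes (checked `SQLException` at registration, unchecked exceptions inside callbacks, `IndexOutOfBoundsException` for bad indexes) is sketched below with hypothetical names (`FunctionsSketch`, `CallbackFailure`, `validateName`, `readLabel`). Treating a NULL value as a callback failure is an illustrative choice, not documented behavior.

```java
import java.sql.SQLException;

// Hypothetical sketch: checked errors at registration, unchecked in callbacks.
final class FunctionsSketch {
    private FunctionsSketch() {}

    // Unchecked and nested in the holder class, so callback lambdas can throw
    // it without a throws clause.
    static final class CallbackFailure extends RuntimeException {
        CallbackFailure(String message) {
            super(message);
        }
    }

    // Registration-time validation uses the checked JDBC exception.
    static void validateName(String name) throws SQLException {
        if (name == null || name.isEmpty()) {
            throw new SQLException("Function name must not be empty");
        }
    }

    // Callback-time read: bad index -> IndexOutOfBoundsException,
    // bad value (here: NULL, an illustrative rule) -> CallbackFailure.
    static String readLabel(String[] column, int row) {
        if (row < 0 || row >= column.length) {
            throw new IndexOutOfBoundsException("Row index out of bounds: " + row);
        }
        String value = column[row];
        if (value == null) {
            throw new CallbackFailure("NULL value at row " + row);
        }
        return value;
    }
}
```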
