Commit be93d43

Scalar functions post-integration changes
This is a follow-up to PR duckdb#630. It makes the following changes to the newly added Scalar Functions Java API:

- moves exception and registered shell classes into the `DuckDBFunction.java`
- removes abstract classes for vector reader and writer
- removes `DuckDBScalarContext` and `DuckDBScalarRow` in favour of streaming plain indices (as a `LongStream`) from the input data chunk; the row object interface turned out to have the unintended overhead of creating a Java object for every input row, which we would like to avoid. Without it, the context abstraction appeared to be unnecessary.

Null-propagation handling is changed in the following way:

- null propagation on the Java side is enabled only for primitive callbacks (set automatically) and is not exposed to the user; null propagation support for object callbacks is removed
- null propagation on the DuckDB engine side (skipping the `duckdb_scalar_function_set_special_handling` C API call) is also enabled automatically only for primitive callbacks, but it is additionally exposed to users as the `withNullInNullOut()` builder call (replaces the awkwardly named `withSpecialHandling()`); in some cases NULLs can still be passed to callbacks when `withNullInNullOut()` is set, so the callback must still check for nulls

Testing: more tests added around the null handling.
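The Java-side null propagation for primitive callbacks described above can be pictured with a minimal, self-contained sketch. It does not use the driver's classes: `NullSkippingIntCallback` and its `apply` method are hypothetical names, and a boxed `Integer[]` stands in for a DuckDB vector with a validity mask. It only illustrates the idea that the primitive callback is never invoked for a NULL input and that NULL is written to the output instead.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for the Java-side NULL skip; this is not the driver's code.
final class NullSkippingIntCallback {
    // Applies fn to every non-NULL input row. NULL rows produce NULL output
    // and the primitive callback is not invoked for them.
    static Integer[] apply(IntUnaryOperator fn, Integer[] input) {
        Integer[] output = new Integer[input.length];
        for (int row = 0; row < input.length; row++) {
            Integer value = input[row];
            output[row] = (value == null) ? null : Integer.valueOf(fn.applyAsInt(value));
        }
        return output;
    }

    public static void main(String[] args) {
        Integer[] result = apply(x -> x + 1, new Integer[] {41, null, 0});
        System.out.println(Arrays.toString(result)); // prints [42, null, 1]
    }
}
```

Object callbacks get no such wrapper in this model, which is why, per the commit message, they have to check for `null` arguments themselves.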
1 parent f97500d commit be93d43

18 files changed: +2231 -2781 lines changed

UDF.MD

Lines changed: 54 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,6 @@
11
# Java Scalar Functions (UDF)
22

33
Use `DuckDBFunctions.scalarFunction()` to build and register scalar functions.
4-
`register(java.sql.Connection)` returns a `DuckDBRegisteredFunction` with metadata about the registered scalar function.
5-
6-
Registered functions are also tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
7-
This registry is bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
8-
`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
94

105
## Recommended API (Functional Interfaces)
116

@@ -22,12 +17,12 @@ Use these overloads for simple functions:
2217

2318
```java
2419
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
25-
DuckDBRegisteredFunction function = DuckDBFunctions.scalarFunction()
26-
.withName("java_add_one")
27-
.withParameter(Integer.class)
28-
.withReturnType(Integer.class)
29-
.withIntFunction(x -> x + 1)
30-
.register(conn);
20+
DuckDBFunctions.scalarFunction()
21+
.withName("java_add_one")
22+
.withParameter(int.class)
23+
.withReturnType(int.class)
24+
.withIntFunction(x -> x + 1)
25+
.register(conn);
3126
}
3227
```
3328

@@ -41,9 +36,8 @@ SELECT java_add_one(41);
4136
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
4237
DuckDBFunctions.scalarFunction()
4338
.withName("java_weighted_sum")
44-
.withParameter(Double.class)
45-
.withParameter(Double.class)
46-
.withReturnType(Double.class)
39+
.withParameters(double.class, double.class)
40+
.withReturnType(double.class)
4741
.withDoubleFunction((x, w) -> x * w + 10.0)
4842
.register(conn);
4943
}
@@ -53,32 +47,38 @@ try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
5347
SELECT java_weighted_sum(2.5, 4.0);
5448
```
5549

56-
Behavior:
50+
NULL handling behavior:
5751

58-
- `Function` and `BiFunction` callbacks receive `null` for NULL inputs; implement null handling in Java callback logic.
59-
- `withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)` run with null propagation enabled.
60-
- For `withVectorizedFunction(...)`, use `ctx.propagateNulls(true)` when you want stream-level null skipping and automatic NULL output.
61-
- For `Supplier`, returning `null` writes NULL output.
62-
- `Function` and `BiFunction` are fixed arity only (no varargs).
52+
- Java functions that take object arguments (`Function` and `BiFunction` callbacks) are registered with
53+
`duckdb_scalar_function_set_special_handling()` C API option enabled - `null` Java arguments are passed for `NULL` input values
54+
- when `withNullInNullOut()` option is set (skips the `duckdb_scalar_function_set_special_handling()` call on function registration),
55+
then DuckDB engine may skip the function call completely (and set `NULL` to the call result automatically),
56+
but this does not happen in all cases (for example, when the function is applied to a relation where some rows are `NULL`),
57+
so the function still can be passed `null` arguments and need to check them and return `null` accordingly
58+
- Java functions that take primitive arguments (`withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)`)
59+
are never passed `null` arguments as an input - even when the DuckDB engine calls such function with `NULL` value -
60+
the call will be skipped on Java side and `null` value will be returned (returning `null` writes NULL output)
6361

6462
Runtime error model:
6563

66-
- Callback-time reader/writer/context type and value failures throw `DuckDBFunctionException`.
64+
- Callback-time reader/writer/context type and value failures throw `DuckDBFunctions.CallException extends RuntimeException`.
6765
- Invalid row/column indexes throw `IndexOutOfBoundsException`.
68-
- `SQLException` remains for registration-time API usage and type declaration/validation.
66+
- `SQLException` is used for registration-time API usage and type declaration/validation.
6967

7068
## Type declaration and mapping
7169

7270
`withParameter(...)` and `withReturnType(...)` accept:
7371

74-
- `Class<?>`
72+
- `Class<?>` (Object or primitive class, like `int.class` or `Integer.TYPE`)
7573
- `DuckDBColumnType`
7674
- `DuckDBLogicalType`
7775

7876
Common class mappings include:
7977

80-
- `Integer` -> `INTEGER`
81-
- `Long` -> `BIGINT`
78+
- `int` -> `INTEGER`
79+
- `long` -> `BIGINT`
80+
- `float` -> `FLOAT`
81+
- `double` -> `DOUBLE`
8282
- `String` -> `VARCHAR`
8383
- `BigDecimal` -> `DECIMAL`
8484
- `BigInteger` -> `HUGEINT`
@@ -140,12 +140,16 @@ Notes:
140140
- `withVarArgsFunction(Function<Object[], ?>)`
141141
- `withVectorizedFunction(DuckDBScalarFunction)`
142142
- `withVolatile()`
143-
- `withSpecialHandling()`
143+
- `withNullInNullOut()`
144144
- `register(java.sql.Connection)`
145145

146146
## Registered Function Metadata And Registry
147147

148-
`DuckDBRegisteredFunction` exposes immutable metadata about the successful registration result:
148+
Registered functions, returned from `.register()` call, are additionally tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
149+
This registry provides bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
150+
`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
151+
152+
`DuckDBFunctions.RegisteredFunction` exposes immutable metadata about the successful registration result:
149153

150154
- `name()`
151155
- `functionKind()`
@@ -156,47 +160,50 @@ Notes:
156160
To inspect Java-side registrations:
157161

158162
```java
159-
List<DuckDBRegisteredFunction> functions = DuckDBDriver.registeredFunctions();
163+
List<DuckDBFunctions.RegisteredFunction> functions = DuckDBDriver.registeredFunctions();
160164
```
161165

162166
The returned list is read-only. Duplicate function names may appear in the registry.
163167

164168
## Advanced API (`DuckDBScalarFunction`)
165169

166-
Use `withVectorizedFunction(...)` for full context control through `DuckDBScalarContext`.
170+
Use `withVectorizedFunction(...)` to access multiple input rows (when the scalar function is applied
171+
to a row set from a relation) in a single call. DuckDB engine splits the input row set into "data chunks"
172+
(represented in Java as `DuckDBDataChunkReader`) each chunk containing up to 2048 rows.
173+
174+
`DuckDBScalarFunction` callback receives the `DuckDBDataChunkReader` as an `input` argument.
175+
Data chunk contains a number of "data vectors" (`DuckDBReadableVector`) - single vector for each input column.
176+
Vectors can be accessed using `input.vector(columnIndex)`. It writes the results into the `output` `DuckDBWritableVector`.
177+
Results in the `output` vector must be set **on the same `row` indices** that are used to read `input`.
178+
179+
`input.stream()` call returns a `LongStream` of `row` indices that can be used with Java Streams API.
180+
181+
- `DuckDBDataChunkReader`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
167182

168183
Example with multiple input types (`TIMESTAMP`, `VARCHAR`, `DOUBLE`) and `VARCHAR` output:
169184

170-
```java
171-
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
185+
```
186+
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:"); Statement stmt = conn.createStatement();
172187
DuckDBLogicalType tsType = DuckDBLogicalType.of(DuckDBColumnType.TIMESTAMP);
173188
DuckDBLogicalType strType = DuckDBLogicalType.of(DuckDBColumnType.VARCHAR);
174189
DuckDBLogicalType dblType = DuckDBLogicalType.of(DuckDBColumnType.DOUBLE)) {
175190
DuckDBFunctions.scalarFunction()
176191
.withName("java_event_label")
177-
.withParameter(tsType)
178-
.withParameter(strType)
179-
.withParameter(dblType)
192+
.withParameters(tsType, strType, dblType)
180193
.withReturnType(strType)
181-
.withVectorizedFunction(ctx -> {
182-
ctx.propagateNulls(true).stream().forEachOrdered(row -> {
183-
String value = row.getLocalDateTime(0) + " | "
184-
+ row.getString(1).trim().toUpperCase()
185-
+ " | " + row.getDouble(2);
186-
row.setString(value);
194+
.withVectorizedFunction((input, output) -> {
195+
input.stream().forEach(row -> {
196+
String value = input.vector(0).getLocalDateTime(row) + " | " +
197+
String.valueOf(input.vector(1).getString(row)).trim().toUpperCase() + " | " +
198+
input.vector(2).getDouble(row, 0.0d);
199+
output.setString(row, value);
187200
});
188201
})
189202
.register(conn);
203+
...
190204
}
191205
```
192206

193207
```sql
194208
SELECT java_event_label(TIMESTAMP '2026-04-04 12:00:00', 'launch', 4.5);
195209
```
196-
197-
Lifecycle rules:
198-
199-
- `DuckDBScalarContext`, `DuckDBScalarRow`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
200-
- `DuckDBReadableVector` and `DuckDBWritableVector` are abstract callback runtime types (not interfaces).
201-
- Write exactly one output value per input row for each callback invocation.
202-
- With `propagateNulls(true)`, `DuckDBScalarContext.stream()` skips rows that contain NULL in any input column and writes NULL to the output for those rows.
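
The index-streaming pattern that the updated UDF.MD describes (read every input column and write the output at the same `row` index) can be tried without the driver. In the sketch below plain Java arrays stand in for the input vectors and the output vector, and `RowIndexStreaming` and `labelRows` are hypothetical names. Only `LongStream.range` matches what `DuckDBDataChunkReader.stream()` returns in this commit.

```java
import java.util.Locale;
import java.util.stream.LongStream;

// Arrays stand in for DuckDB vectors; this is an illustration, not driver code.
final class RowIndexStreaming {
    // Builds "<LABEL> | <weight>" for each row, writing the result at the same
    // row index that was used to read the inputs. No per-row object is created
    // for the row itself, only a primitive long index is streamed.
    static String[] labelRows(String[] labels, double[] weights) {
        int rowCount = labels.length;
        String[] output = new String[rowCount];
        LongStream.range(0, rowCount).forEach(row -> {
            int i = (int) row;
            // String.valueOf turns a null label into the text "null", mirroring
            // the null-safe read used in the UDF.MD example.
            String label = String.valueOf(labels[i]).trim().toUpperCase(Locale.ROOT);
            output[i] = label + " | " + weights[i];
        });
        return output;
    }

    public static void main(String[] args) {
        String[] out = labelRows(new String[] {" launch "}, new double[] {4.5});
        System.out.println(out[0]); // prints LAUNCH | 4.5
    }
}
```

Because each row writes only to its own slot, the order in which indices are visited does not matter in this sketch.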

src/main/java/org/duckdb/DuckDBDataChunkReader.java

Lines changed: 8 additions & 2 deletions
@@ -3,6 +3,8 @@
33
import static org.duckdb.DuckDBBindings.*;
44

55
import java.nio.ByteBuffer;
6+
import java.util.stream.LongStream;
7+
import org.duckdb.DuckDBFunctions.FunctionException;
68

79
/**
810
* Reader over callback input data chunks.
@@ -17,7 +19,7 @@ public final class DuckDBDataChunkReader {
1719

1820
DuckDBDataChunkReader(ByteBuffer chunkRef) {
1921
if (chunkRef == null) {
20-
throw new DuckDBFunctionException("Invalid data chunk reference");
22+
throw new FunctionException("Invalid data chunk reference");
2123
}
2224
this.chunkRef = chunkRef;
2325
this.rowCount = duckdb_data_chunk_get_size(chunkRef);
@@ -33,6 +35,10 @@ public long columnCount() {
3335
return columnCount;
3436
}
3537

38+
public LongStream stream() {
39+
return LongStream.range(0, rowCount);
40+
}
41+
3642
public DuckDBReadableVector vector(long columnIndex) {
3743
if (columnIndex < 0 || columnIndex >= columnCount) {
3844
throw new IndexOutOfBoundsException("Column index out of bounds: " + columnIndex);
@@ -41,7 +47,7 @@ public DuckDBReadableVector vector(long columnIndex) {
4147
DuckDBReadableVector vector = vectors[arrayIndex];
4248
if (vector == null) {
4349
ByteBuffer vectorRef = duckdb_data_chunk_get_vector(chunkRef, columnIndex);
44-
vector = new DuckDBReadableVectorImpl(vectorRef, rowCount);
50+
vector = new DuckDBReadableVector(vectorRef, rowCount);
4551
vectors[arrayIndex] = vector;
4652
}
4753
return vector;

src/main/java/org/duckdb/DuckDBDriver.java

Lines changed: 4 additions & 3 deletions
@@ -16,6 +16,7 @@
1616
import java.util.concurrent.ThreadFactory;
1717
import java.util.concurrent.locks.ReentrantLock;
1818
import java.util.logging.Logger;
19+
import org.duckdb.DuckDBFunctions.RegisteredFunction;
1920
import org.duckdb.io.LimitedInputStream;
2021

2122
public class DuckDBDriver implements java.sql.Driver {
@@ -42,7 +43,7 @@ public class DuckDBDriver implements java.sql.Driver {
4243
private static boolean pinnedDbRefsShutdownHookRegistered = false;
4344
private static boolean pinnedDbRefsShutdownHookRun = false;
4445

45-
private static final ArrayList<DuckDBRegisteredFunction> functionsRegistry = new ArrayList<>();
46+
private static final ArrayList<RegisteredFunction> functionsRegistry = new ArrayList<>();
4647
private static final ReentrantLock functionsRegistryLock = new ReentrantLock();
4748

4849
private static final Set<String> supportedOptions = new LinkedHashSet<>();
@@ -266,7 +267,7 @@ public static boolean shutdownQueryCancelScheduler() {
266267
return true;
267268
}
268269

269-
public static List<DuckDBRegisteredFunction> registeredFunctions() {
270+
public static List<RegisteredFunction> registeredFunctions() {
270271
functionsRegistryLock.lock();
271272
try {
272273
return Collections.unmodifiableList(new ArrayList<>(functionsRegistry));
@@ -284,7 +285,7 @@ public static void clearFunctionsRegistry() {
284285
}
285286
}
286287

287-
static void registerFunction(DuckDBRegisteredFunction function) {
288+
static void registerFunction(RegisteredFunction function) {
288289
functionsRegistryLock.lock();
289290
try {
290291
functionsRegistry.add(function);
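
The registry in `DuckDBDriver` hands out a read-only copy taken under a lock. A reduced sketch of that pattern, assuming `String` entries in place of `RegisteredFunction` and an instance rather than static state, shows why a returned list stays unchanged after later registrations. `SnapshotRegistry` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Reduced model of a lock-guarded registry that returns read-only snapshots.
final class SnapshotRegistry {
    private final ArrayList<String> entries = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    // Adds an entry; duplicates are allowed, as in the Java-side registry.
    void register(String name) {
        lock.lock();
        try {
            entries.add(name);
        } finally {
            lock.unlock();
        }
    }

    // Returns an unmodifiable copy, so callers never observe later changes.
    List<String> registered() {
        lock.lock();
        try {
            return Collections.unmodifiableList(new ArrayList<>(entries));
        } finally {
            lock.unlock();
        }
    }

    // Clears only this bookkeeping list.
    void clear() {
        lock.lock();
        try {
            entries.clear();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotRegistry registry = new SnapshotRegistry();
        registry.register("java_add_one");
        List<String> snapshot = registry.registered();
        registry.register("java_add_one");
        System.out.println(snapshot.size() + " " + registry.registered().size()); // prints 1 2
    }
}
```

Copying on every read costs an allocation per call, but it keeps the lock scope short and avoids exposing the mutable list.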

src/main/java/org/duckdb/DuckDBFunctionException.java

Lines changed: 0 additions & 13 deletions
This file was deleted.

src/main/java/org/duckdb/DuckDBFunctions.java

Lines changed: 108 additions & 1 deletion
@@ -1,14 +1,121 @@
11
package org.duckdb;
22

33
import java.sql.SQLException;
4+
import java.util.ArrayList;
5+
import java.util.Collections;
6+
import java.util.List;
47

58
public final class DuckDBFunctions {
6-
public enum DuckDBFunctionKind { SCALAR }
9+
public enum Kind { SCALAR }
710

811
private DuckDBFunctions() {
912
}
1013

1114
public static DuckDBScalarFunctionBuilder scalarFunction() throws SQLException {
1215
return new DuckDBScalarFunctionBuilder();
1316
}
17+
18+
static RegisteredFunction createRegisteredFunction(String name, List<DuckDBLogicalType> parameterTypes,
19+
List<DuckDBColumnType> parameterColumnTypes,
20+
DuckDBLogicalType returnType, DuckDBColumnType returnColumnType,
21+
DuckDBScalarFunction function, DuckDBLogicalType varArgType,
22+
boolean volatileFlag, boolean specialHandlingFlag,
23+
boolean propagateNullsFlag) {
24+
return new RegisteredFunction(name, Kind.SCALAR, Collections.unmodifiableList(new ArrayList<>(parameterTypes)),
25+
Collections.unmodifiableList(new ArrayList<>(parameterColumnTypes)), returnType,
26+
returnColumnType, function, varArgType, volatileFlag, specialHandlingFlag,
27+
propagateNullsFlag);
28+
}
29+
30+
public static class FunctionException extends RuntimeException {
31+
private static final long serialVersionUID = 1L;
32+
33+
public FunctionException(String message) {
34+
super(message);
35+
}
36+
37+
public FunctionException(String message, Throwable cause) {
38+
super(message, cause);
39+
}
40+
}
41+
42+
public static final class RegisteredFunction {
43+
private final String name;
44+
private final Kind functionKind;
45+
private final List<DuckDBLogicalType> parameterTypes;
46+
private final List<DuckDBColumnType> parameterColumnTypes;
47+
private final DuckDBLogicalType returnType;
48+
private final DuckDBColumnType returnColumnType;
49+
private final DuckDBScalarFunction function;
50+
private final DuckDBLogicalType varArgType;
51+
private final boolean volatileFlag;
52+
private final boolean nullInNullOutFlag;
53+
private final boolean propagateNullsFlag;
54+
55+
private RegisteredFunction(String name, Kind functionKind, List<DuckDBLogicalType> parameterTypes,
56+
List<DuckDBColumnType> parameterColumnTypes, DuckDBLogicalType returnType,
57+
DuckDBColumnType returnColumnType, DuckDBScalarFunction function,
58+
DuckDBLogicalType varArgType, boolean volatileFlag, boolean nullInNullOutFlag,
59+
boolean propagateNullsFlag) {
60+
this.name = name;
61+
this.functionKind = functionKind;
62+
this.parameterTypes = parameterTypes;
63+
this.parameterColumnTypes = parameterColumnTypes;
64+
this.returnType = returnType;
65+
this.returnColumnType = returnColumnType;
66+
this.function = function;
67+
this.varArgType = varArgType;
68+
this.volatileFlag = volatileFlag;
69+
this.nullInNullOutFlag = nullInNullOutFlag;
70+
this.propagateNullsFlag = propagateNullsFlag;
71+
}
72+
73+
public String name() {
74+
return name;
75+
}
76+
77+
public Kind functionKind() {
78+
return functionKind;
79+
}
80+
81+
public List<DuckDBLogicalType> parameterTypes() {
82+
return parameterTypes;
83+
}
84+
85+
public List<DuckDBColumnType> parameterColumnTypes() {
86+
return parameterColumnTypes;
87+
}
88+
89+
public DuckDBLogicalType returnType() {
90+
return returnType;
91+
}
92+
93+
public DuckDBColumnType returnColumnType() {
94+
return returnColumnType;
95+
}
96+
97+
public DuckDBScalarFunction function() {
98+
return function;
99+
}
100+
101+
public DuckDBLogicalType varArgType() {
102+
return varArgType;
103+
}
104+
105+
public boolean isVolatile() {
106+
return volatileFlag;
107+
}
108+
109+
public boolean isNullInNullOut() {
110+
return nullInNullOutFlag;
111+
}
112+
113+
public boolean propagateNulls() {
114+
return propagateNullsFlag;
115+
}
116+
117+
public boolean isScalar() {
118+
return functionKind == Kind.SCALAR;
119+
}
120+
}
14121
}
