
Commit 4793340

Scalar functions post-integration changes
This is a follow-up to PR duckdb#630. It makes the following changes to the newly added Scalar Functions Java API:

- moves the exception and registered shell classes into `DuckDBFunctions.java`
- removes the abstract classes for the vector reader and writer
- renames `DuckDBScalarContext` to `DuckDBScalarFunctionCallData`
- removes `DuckDBScalarRow` in favour of streaming plain indices of the input vector rows (as a `LongStream`); the row object interface turned out to have the unintended overhead of creating a Java object for every input row, which we would like to avoid.
1 parent f97500d commit 4793340

18 files changed

Lines changed: 1718 additions & 2375 deletions

UDF.MD

Lines changed: 35 additions & 32 deletions
@@ -1,11 +1,6 @@
 # Java Scalar Functions (UDF)
 
 Use `DuckDBFunctions.scalarFunction()` to build and register scalar functions.
-`register(java.sql.Connection)` returns a `DuckDBRegisteredFunction` with metadata about the registered scalar function.
-
-Registered functions are also tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
-This registry is bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
-`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
 
 ## Recommended API (Functional Interfaces)
 
@@ -22,12 +17,12 @@ Use these overloads for simple functions:
 
 ```java
 try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
-    DuckDBRegisteredFunction function = DuckDBFunctions.scalarFunction()
-        .withName("java_add_one")
-        .withParameter(Integer.class)
-        .withReturnType(Integer.class)
-        .withIntFunction(x -> x + 1)
-        .register(conn);
+    DuckDBFunctions.scalarFunction()
+        .withName("java_add_one")
+        .withParameter(Integer.class)
+        .withReturnType(Integer.class)
+        .withIntFunction(x -> x + 1)
+        .register(conn);
 }
 ```
 
@@ -41,8 +36,7 @@ SELECT java_add_one(41);
 try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
     DuckDBFunctions.scalarFunction()
         .withName("java_weighted_sum")
-        .withParameter(Double.class)
-        .withParameter(Double.class)
+        .withParameters(Double.class, Double.class)
         .withReturnType(Double.class)
         .withDoubleFunction((x, w) -> x * w + 10.0)
         .register(conn);
@@ -55,15 +49,16 @@ SELECT java_weighted_sum(2.5, 4.0);
 
 Behavior:
 
-- `Function` and `BiFunction` callbacks receive `null` for NULL inputs; implement null handling in Java callback logic.
+- When `.withSpecialHandling()` is set (results in `duckdb_scalar_function_set_special_handling()` C API call),
+then `Function` and `BiFunction` callbacks receive `null` for NULL inputs; null handling needs to be implemented in Java callback logic.
 - `withIntFunction(...)`, `withLongFunction(...)`, and `withDoubleFunction(...)` run with null propagation enabled.
-- For `withVectorizedFunction(...)`, use `ctx.propagateNulls(true)` when you want stream-level null skipping and automatic NULL output.
+- For `withVectorizedFunction(...)`, use `data.propagateNulls(true)` when you want stream-level null skipping and automatic NULL output when **any** of input arguments is NULL.
 - For `Supplier`, returning `null` writes NULL output.
 - `Function` and `BiFunction` are fixed arity only (no varargs).
 
 Runtime error model:
 
-- Callback-time reader/writer/context type and value failures throw `DuckDBFunctionException`.
+- Callback-time reader/writer/context type and value failures throw `DuckDBFunctions.CallException`.
 - Invalid row/column indexes throw `IndexOutOfBoundsException`.
 - `SQLException` remains for registration-time API usage and type declaration/validation.
 
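The runtime error model in the hunk above separates checked exceptions at registration time from unchecked exceptions at callback time. The sketch below illustrates that split using only the JDK. `ErrorModelSketch`, `CallbackFailure`, `validateName`, and `apply` are hypothetical names invented for this illustration and are not part of the DuckDB JDBC API; how the driver itself treats exceptions thrown by user callbacks is not shown in this commit.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

final class ErrorModelSketch {
    // Hypothetical unchecked exception standing in for a callback-time failure type.
    static final class CallbackFailure extends RuntimeException {
        private static final long serialVersionUID = 1L;

        CallbackFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Registration-time validation: reported through the checked SQLException.
    static String validateName(String name) throws SQLException {
        if (name == null || name.trim().isEmpty()) {
            throw new SQLException("Function name must not be blank");
        }
        return name;
    }

    // Callback-time execution: IntUnaryOperator cannot declare checked exceptions,
    // so value failures are reported through an unchecked exception.
    static int[] apply(int[] input, IntUnaryOperator callback) {
        int[] output = new int[input.length];
        for (int row = 0; row < input.length; row++) {
            try {
                output[row] = callback.applyAsInt(input[row]);
            } catch (ArithmeticException e) {
                throw new CallbackFailure("Callback failed at row " + row, e);
            }
        }
        return output;
    }

    public static void main(String[] args) throws SQLException {
        validateName("java_add_one");
        int[] result = apply(new int[] {1, 2}, x -> Math.addExact(x, 1));
        System.out.println(Arrays.toString(result));
    }
}
```

Keeping the callback-time type unchecked lets it propagate out of lambdas passed to standard functional interfaces, which is presumably why the exception class in this commit extends `RuntimeException`.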

@@ -145,7 +140,11 @@ Notes:
 
 ## Registered Function Metadata And Registry
 
-`DuckDBRegisteredFunction` exposes immutable metadata about the successful registration result:
+Registered functions, returned from `.register()` call, are additionally tracked in a Java-side registry exposed by `DuckDBDriver.registeredFunctions()`.
+This registry provides bookkeeping for functions registered through the JDBC API, not an authoritative view of the DuckDB catalog.
+`DuckDBDriver.clearFunctionsRegistry()` clears only the Java-side registry and does not de-register functions from DuckDB.
+
+`DuckDBFunctions.RegisteredFunction` exposes immutable metadata about the successful registration result:
 
 - `name()`
 - `functionKind()`
@@ -156,14 +155,21 @@ Notes:
 To inspect Java-side registrations:
 
 ```java
-List<DuckDBRegisteredFunction> functions = DuckDBDriver.registeredFunctions();
+List<DuckDBFunctions.RegisteredFunction> functions = DuckDBDriver.registeredFunctions();
 ```
 
 The returned list is read-only. Duplicate function names may appear in the registry.
 
 ## Advanced API (`DuckDBScalarFunction`)
 
-Use `withVectorizedFunction(...)` for full context control through `DuckDBScalarContext`.
+Use `withVectorizedFunction(...)` for full context control through `DuckDBScalarFunctionCallData`
+that allows to access input data vectors (single vector for each function argument), and
+the output vector.
+
+`data.stream()` call returns a `LongStream` of the indices of the input rows.
+Input vectors that can be accessed using `data.input(columnIndex) -> DuckDBReadableVector` method.
+The result of the function invocation for each input `rowIndex` must be set on the
+`data.output() -> DuckDBWritableVector` vector using the same `rowIndex`.
 
 Example with multiple input types (`TIMESTAMP`, `VARCHAR`, `DOUBLE`) and `VARCHAR` output:
 
@@ -174,16 +180,14 @@ try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
      DuckDBLogicalType dblType = DuckDBLogicalType.of(DuckDBColumnType.DOUBLE)) {
     DuckDBFunctions.scalarFunction()
         .withName("java_event_label")
-        .withParameter(tsType)
-        .withParameter(strType)
-        .withParameter(dblType)
+        .withParameters(tsType, strType, dblType)
         .withReturnType(strType)
-        .withVectorizedFunction(ctx -> {
-            ctx.propagateNulls(true).stream().forEachOrdered(row -> {
-                String value = row.getLocalDateTime(0) + " | "
-                        + row.getString(1).trim().toUpperCase()
-                        + " | " + row.getDouble(2);
-                row.setString(value);
+        .withVectorizedFunction(data -> {
+            data.propagateNulls(true).stream().forEach(rowIndex -> {
+                String value = data.input(0).getLocalDateTime(rowIndex) + " | "
+                        + data.input(1).getString(rowIndex).trim().toUpperCase()
+                        + " | " + data.input(2).getDouble(rowIndex);
+                data.output().setString(rowIndex, value);
             });
         })
         .register(conn);
@@ -196,7 +200,6 @@ SELECT java_event_label(TIMESTAMP '2026-04-04 12:00:00', 'launch', 4.5);
 
 Lifecycle rules:
 
-- `DuckDBScalarContext`, `DuckDBScalarRow`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
-- `DuckDBReadableVector` and `DuckDBWritableVector` are abstract callback runtime types (not interfaces).
-- Write exactly one output value per input row for each callback invocation.
-- With `propagateNulls(true)`, `DuckDBScalarContext.stream()` skips rows that contain NULL in any input column and writes NULL to the output for those rows.
+- `DuckDBScalarFunctionCallData`, `DuckDBReadableVector`, and `DuckDBWritableVector` are valid only during callback execution.
+- Write exactly one output value per input row for each callback invocation *on the same `rowIndex`*.
+- With `propagateNulls(true)`, `DuckDBScalarFunctionCallData.stream()` skips rows that contain NULL in **any** input column and writes NULL to the output for those rows.
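The index-streaming model described in the lifecycle rules can be pictured with plain Java arrays. The following is a minimal sketch under stated assumptions, not the driver's implementation: the arrays stand in for the input and output vectors, a `null` element stands in for SQL NULL, and `NullPropagationSketch` and `label` are hypothetical names. It shows a `LongStream` of row indices that skips any row with a NULL input and writes each result at the same index it was read from, without creating a per-row object.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

final class NullPropagationSketch {
    // Hypothetical stand-in for a vectorized callback over two input columns.
    static String[] label(String[] names, Double[] scores) {
        // Slots that are never written stay null, modelling automatic NULL output.
        String[] output = new String[names.length];
        LongStream.range(0, names.length)
            // Skip rows where any input column is NULL.
            .filter(row -> names[(int) row] != null && scores[(int) row] != null)
            // Write exactly one value per remaining row, at the same row index.
            .forEach(row -> {
                int i = (int) row;
                output[i] = names[i].trim().toUpperCase() + " | " + scores[i];
            });
        return output;
    }

    public static void main(String[] args) {
        String[] out = label(new String[] {" launch ", null, "land"}, new Double[] {4.5, 1.0, null});
        System.out.println(Arrays.toString(out)); // prints [LAUNCH | 4.5, null, null]
    }
}
```

Only the primitive row index flows through the stream, which is the overhead reduction the commit message gives as the reason for dropping `DuckDBScalarRow`.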

src/main/java/org/duckdb/DuckDBDataChunkReader.java

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,7 @@
 import static org.duckdb.DuckDBBindings.*;
 
 import java.nio.ByteBuffer;
+import org.duckdb.DuckDBFunctions.FunctionException;
 
 /**
  * Reader over callback input data chunks.
@@ -17,7 +18,7 @@ public final class DuckDBDataChunkReader {
 
     DuckDBDataChunkReader(ByteBuffer chunkRef) {
         if (chunkRef == null) {
-            throw new DuckDBFunctionException("Invalid data chunk reference");
+            throw new FunctionException("Invalid data chunk reference");
         }
         this.chunkRef = chunkRef;
         this.rowCount = duckdb_data_chunk_get_size(chunkRef);
@@ -41,7 +42,7 @@ public DuckDBReadableVector vector(long columnIndex) {
         DuckDBReadableVector vector = vectors[arrayIndex];
         if (vector == null) {
             ByteBuffer vectorRef = duckdb_data_chunk_get_vector(chunkRef, columnIndex);
-            vector = new DuckDBReadableVectorImpl(vectorRef, rowCount);
+            vector = new DuckDBReadableVector(vectorRef, rowCount);
             vectors[arrayIndex] = vector;
         }
         return vector;

src/main/java/org/duckdb/DuckDBDriver.java

Lines changed: 4 additions & 3 deletions
@@ -16,6 +16,7 @@
 import java.util.concurrent.ThreadFactory;
 import java.util.concurrent.locks.ReentrantLock;
 import java.util.logging.Logger;
+import org.duckdb.DuckDBFunctions.RegisteredFunction;
 import org.duckdb.io.LimitedInputStream;
 
 public class DuckDBDriver implements java.sql.Driver {
@@ -42,7 +43,7 @@ public class DuckDBDriver implements java.sql.Driver {
     private static boolean pinnedDbRefsShutdownHookRegistered = false;
     private static boolean pinnedDbRefsShutdownHookRun = false;
 
-    private static final ArrayList<DuckDBRegisteredFunction> functionsRegistry = new ArrayList<>();
+    private static final ArrayList<RegisteredFunction> functionsRegistry = new ArrayList<>();
     private static final ReentrantLock functionsRegistryLock = new ReentrantLock();
 
     private static final Set<String> supportedOptions = new LinkedHashSet<>();
@@ -266,7 +267,7 @@ public static boolean shutdownQueryCancelScheduler() {
         return true;
     }
 
-    public static List<DuckDBRegisteredFunction> registeredFunctions() {
+    public static List<RegisteredFunction> registeredFunctions() {
         functionsRegistryLock.lock();
         try {
             return Collections.unmodifiableList(new ArrayList<>(functionsRegistry));
@@ -284,7 +285,7 @@ public static void clearFunctionsRegistry() {
         }
     }
 
-    static void registerFunction(RegisteredFunction function) {
+    static void registerFunction(RegisteredFunction function) {
         functionsRegistryLock.lock();
         try {
             functionsRegistry.add(function);
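The registry hunks above combine a `ReentrantLock` with `Collections.unmodifiableList(new ArrayList<>(...))`, so callers receive a read-only snapshot rather than a live view. The sketch below reproduces that pattern with plain strings to show the resulting behavior. `RegistrySketch` and its method names are hypothetical, and the bodies of `clear()` and the `finally` blocks are assumptions, because the diff truncates those parts of `DuckDBDriver`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class RegistrySketch {
    private static final ArrayList<String> registry = new ArrayList<>();
    private static final ReentrantLock lock = new ReentrantLock();

    // Appends without de-duplication, so repeated names may appear.
    static void register(String name) {
        lock.lock();
        try {
            registry.add(name);
        } finally {
            lock.unlock();
        }
    }

    // Returns a read-only copy taken under the lock; later changes do not affect it.
    static List<String> registered() {
        lock.lock();
        try {
            return Collections.unmodifiableList(new ArrayList<>(registry));
        } finally {
            lock.unlock();
        }
    }

    // Clears only this bookkeeping list (assumed behavior of the truncated method).
    static void clear() {
        lock.lock();
        try {
            registry.clear();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        register("java_add_one");
        register("java_add_one");
        List<String> snapshot = registered();
        clear();
        System.out.println(snapshot.size() + " " + registered().size()); // prints 2 0
    }
}
```

Copying before wrapping is what makes the result a snapshot: wrapping the live list alone would give a read-only view that still changes whenever the registry does.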

src/main/java/org/duckdb/DuckDBFunctionException.java

Lines changed: 0 additions & 13 deletions
This file was deleted.
Lines changed: 108 additions & 1 deletion
@@ -1,14 +1,121 @@
 package org.duckdb;
 
 import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
 
 public final class DuckDBFunctions {
-    public enum DuckDBFunctionKind { SCALAR }
+    public enum Kind { SCALAR }
 
     private DuckDBFunctions() {
     }
 
     public static DuckDBScalarFunctionBuilder scalarFunction() throws SQLException {
         return new DuckDBScalarFunctionBuilder();
     }
+
+    static RegisteredFunction createRegisteredFunction(String name, List<DuckDBLogicalType> parameterTypes,
+                                                       List<DuckDBColumnType> parameterColumnTypes,
+                                                       DuckDBLogicalType returnType, DuckDBColumnType returnColumnType,
+                                                       DuckDBScalarFunction function, DuckDBLogicalType varArgType,
+                                                       boolean volatileFlag, boolean specialHandlingFlag,
+                                                       boolean propagateNullsFlag) {
+        return new RegisteredFunction(name, Kind.SCALAR, Collections.unmodifiableList(new ArrayList<>(parameterTypes)),
+                                      Collections.unmodifiableList(new ArrayList<>(parameterColumnTypes)), returnType,
+                                      returnColumnType, function, varArgType, volatileFlag, specialHandlingFlag,
+                                      propagateNullsFlag);
+    }
+
+    public static class FunctionException extends RuntimeException {
+        private static final long serialVersionUID = 1L;
+
+        public FunctionException(String message) {
+            super(message);
+        }
+
+        public FunctionException(String message, Throwable cause) {
+            super(message, cause);
+        }
+    }
+
+    public static final class RegisteredFunction {
+        private final String name;
+        private final Kind functionKind;
+        private final List<DuckDBLogicalType> parameterTypes;
+        private final List<DuckDBColumnType> parameterColumnTypes;
+        private final DuckDBLogicalType returnType;
+        private final DuckDBColumnType returnColumnType;
+        private final DuckDBScalarFunction function;
+        private final DuckDBLogicalType varArgType;
+        private final boolean volatileFlag;
+        private final boolean specialHandlingFlag;
+        private final boolean propagateNullsFlag;
+
+        private RegisteredFunction(String name, Kind functionKind, List<DuckDBLogicalType> parameterTypes,
+                                   List<DuckDBColumnType> parameterColumnTypes, DuckDBLogicalType returnType,
+                                   DuckDBColumnType returnColumnType, DuckDBScalarFunction function,
+                                   DuckDBLogicalType varArgType, boolean volatileFlag, boolean specialHandlingFlag,
+                                   boolean propagateNullsFlag) {
+            this.name = name;
+            this.functionKind = functionKind;
+            this.parameterTypes = parameterTypes;
+            this.parameterColumnTypes = parameterColumnTypes;
+            this.returnType = returnType;
+            this.returnColumnType = returnColumnType;
+            this.function = function;
+            this.varArgType = varArgType;
+            this.volatileFlag = volatileFlag;
+            this.specialHandlingFlag = specialHandlingFlag;
+            this.propagateNullsFlag = propagateNullsFlag;
+        }
+
+        public String name() {
+            return name;
+        }
+
+        public Kind functionKind() {
+            return functionKind;
+        }
+
+        public List<DuckDBLogicalType> parameterTypes() {
+            return parameterTypes;
+        }
+
+        public List<DuckDBColumnType> parameterColumnTypes() {
+            return parameterColumnTypes;
+        }
+
+        public DuckDBLogicalType returnType() {
+            return returnType;
+        }
+
+        public DuckDBColumnType returnColumnType() {
+            return returnColumnType;
+        }
+
+        public DuckDBScalarFunction function() {
+            return function;
+        }
+
+        public DuckDBLogicalType varArgType() {
+            return varArgType;
+        }
+
+        public boolean isVolatile() {
+            return volatileFlag;
+        }
+
+        public boolean hasSpecialHandling() {
+            return specialHandlingFlag;
+        }
+
+        public boolean propagateNulls() {
+            return propagateNullsFlag;
+        }
+
+        public boolean isScalar() {
+            return functionKind == Kind.SCALAR;
+        }
+    }
 }
