|
| 1 | +# User-Defined Functions (Java) |
| 2 | + |
| 3 | +This guide shows how to use Java Scalar UDFs and Table Functions with `DuckDBConnection`. |
| 4 | + |
| 5 | +## Scalar UDF |
| 6 | + |
| 7 | +Scalar UDF callbacks use a vectorized contract: |
| 8 | + |
| 9 | +```java |
| 10 | +ScalarUdf.apply(UdfContext ctx, UdfReader[] args, UdfScalarWriter out, int rowCount) |
| 11 | +``` |
| 12 | + |
| 13 | +Loop over rows `0` through `rowCount - 1` and write exactly one output value per row. |
| 14 | + |
| 15 | +### Basic example |
| 16 | + |
| 17 | +```java |
| 18 | +try (DuckDBConnection conn = DriverManager.getConnection("jdbc:duckdb:").unwrap(DuckDBConnection.class); |
| 19 | + Statement stmt = conn.createStatement()) { |
| 20 | + |
| 21 | + conn.registerScalarUdf("add_one", DuckDBColumnType.INTEGER, DuckDBColumnType.INTEGER, |
| 22 | + (ctx, args, out, rowCount) -> { |
| 23 | + for (int row = 0; row < rowCount; row++) { |
| 24 | + out.setInt(row, args[0].getInt(row) + 1); |
| 25 | + } |
| 26 | + }); |
| 27 | + |
| 28 | + try (ResultSet rs = stmt.executeQuery("SELECT add_one(41)")) { |
| 29 | + rs.next(); |
| 30 | + System.out.println(rs.getInt(1)); // 42 |
| 31 | + } |
| 32 | +} |
| 33 | +``` |
| 34 | + |
| 35 | +### Registration forms |
| 36 | + |
| 37 | +You can register scalar UDFs with: |
| 38 | + |
| 39 | +- `DuckDBColumnType` signatures (`registerScalarUdf`) |
| 40 | +- `Class<?>` signatures (`registerScalarUdf`) |
| 41 | +- explicit `UdfLogicalType` signatures (`registerScalarUdf`) |
| 42 | +- varargs signatures (`registerScalarUdfVarArgs`) |
| 43 | + |
| 44 | +For decimal precision/scale, prefer explicit logical types: |
| 45 | + |
| 46 | +```java |
| 47 | +conn.registerScalarUdf( |
| 48 | + "mul_decimal", |
| 49 | + new UdfLogicalType[] {UdfLogicalType.decimal(20, 4), UdfLogicalType.decimal(20, 4)}, |
| 50 | + UdfLogicalType.decimal(38, 8), |
| 51 | + (ctx, args, out, rowCount) -> { |
| 52 | + for (int row = 0; row < rowCount; row++) { |
| 53 | + out.setBigDecimal(row, args[0].getBigDecimal(row).multiply(args[1].getBigDecimal(row))); |
| 54 | + } |
| 55 | + } |
| 56 | +); |
| 57 | +``` |
| 58 | + |
| 59 | +### Options |
| 60 | + |
| 61 | +`UdfOptions` controls scalar behavior: |
| 62 | + |
| 63 | +- `deterministic(true|false)`: marks whether equal inputs always produce equal output. Use `false` for non-deterministic logic (for example random/time-based behavior). |
| 64 | +- `nullSpecialHandling(true|false)`: when `true`, your callback receives rows that contain `NULL` input values; when `false`, DuckDB handles null propagation before callback execution. |
| 65 | +- `returnNullOnException(true|false)`: when `true`, a Java exception thrown by the callback produces `NULL` for the affected rows; when `false`, the query fails with an error. |
| 66 | +- `varArgs(true|false)`: enables varargs registration (normally used via `registerScalarUdfVarArgs`). |
| 67 | + |
| 68 | +Example: |
| 69 | + |
| 70 | +```java |
| 71 | +UdfOptions options = new UdfOptions() |
| 72 | + .deterministic(true) |
| 73 | + .nullSpecialHandling(true) |
| 74 | + .returnNullOnException(false); |
| 75 | + |
| 76 | +conn.registerScalarUdf("safe_add", DuckDBColumnType.INTEGER, DuckDBColumnType.INTEGER, |
| 77 | + (ctx, args, out, rowCount) -> { |
| 78 | + for (int row = 0; row < rowCount; row++) { |
| 79 | + if (args[0].isNull(row)) { |
| 80 | + out.setNull(row); |
| 81 | + } else { |
| 82 | + out.setInt(row, args[0].getInt(row) + 1); |
| 83 | + } |
| 84 | + } |
| 85 | + }, options); |
| 86 | +``` |
| 87 | + |
| 88 | +## UdfReader / UdfScalarWriter object mappings |
| 89 | + |
| 90 | +| DuckDB type | Reader object | Writer object | |
| 91 | +| --- | --- | --- | |
| 92 | +| `BOOLEAN` | `Boolean` | `Boolean` | |
| 93 | +| `TINYINT`, `SMALLINT`, `INTEGER`, `UTINYINT`, `USMALLINT` | `Integer` | `Integer` | |
| 94 | +| `BIGINT`, `UINTEGER`, `UBIGINT` | `Long` | `Long` | |
| 95 | +| `FLOAT` | `Float` | `Float` | |
| 96 | +| `DOUBLE` | `Double` | `Double` | |
| 97 | +| `DECIMAL` | `BigDecimal` | `BigDecimal` | |
| 98 | +| `VARCHAR` | `String` | `String` | |
| 99 | +| `BLOB` | `byte[]` | `byte[]` | |
| 100 | +| `DATE` | `LocalDate` or `Date` | `LocalDate` or `Date` | |
| 101 | +| `TIME`, `TIME_NS` | `LocalTime` | `LocalTime` | |
| 102 | +| `TIME_WITH_TIME_ZONE` | `OffsetTime` | `OffsetTime` | |
| 103 | +| `TIMESTAMP`, `TIMESTAMP_S`, `TIMESTAMP_MS`, `TIMESTAMP_NS` | `LocalDateTime` | `LocalDateTime` or `Date` | |
| 104 | +| `TIMESTAMP_WITH_TIME_ZONE` | `OffsetDateTime` | `OffsetDateTime` or `Date` | |
| 105 | +| `UUID` | `UUID` | `UUID` | |
| 106 | +| `HUGEINT`, `UHUGEINT` | `byte[]` | `byte[]` | |
| 107 | + |
| 108 | +`UdfScalarWriter` supports typed setters (such as `setInt` and `setBigDecimal`) and `setObject(...)`. |
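
The table maps `HUGEINT` and `UHUGEINT` to `byte[]`, but this guide does not state the byte layout. The sketch below only illustrates how such bytes could be converted to and from `java.math.BigInteger`, assuming a 16-byte big-endian encoding (two's complement for `HUGEINT`). `HugeIntBytes` and its methods are hypothetical helpers, not part of the DuckDB API; check the actual layout before relying on this.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not part of the DuckDB API.
// ASSUMPTION: values are exchanged as 16 bytes in big-endian order.
final class HugeIntBytes {
    private static final int WIDTH = 16;

    // Interprets the bytes as a signed two's-complement value (HUGEINT).
    static BigInteger toSigned(byte[] bytes) {
        return new BigInteger(bytes);
    }

    // Interprets the bytes as an unsigned magnitude (UHUGEINT).
    static BigInteger toUnsigned(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    // Encodes a signed value into exactly 16 big-endian bytes, sign-extending as needed.
    static byte[] fromSigned(BigInteger value) {
        byte[] minimal = value.toByteArray();
        if (minimal.length > WIDTH) {
            throw new ArithmeticException("value does not fit in 128 bits");
        }
        byte[] out = new byte[WIDTH];
        byte fill = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, WIDTH - minimal.length, fill);
        System.arraycopy(minimal, 0, out, WIDTH - minimal.length, minimal.length);
        return out;
    }
}
```

Under these assumptions, `HugeIntBytes.toSigned(args[0].getBytes(row))` would be the shape of a call inside a callback, where `getBytes` is itself a placeholder for whichever reader accessor returns the `byte[]`.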
| 109 | + |
| 110 | +## Table Function |
| 111 | + |
| 112 | +A table function implements three callbacks: |
| 113 | + |
| 114 | +- `bind(BindContext ctx, Object[] parameters) -> TableBindResult` |
| 115 | +- `init(InitContext ctx, TableBindResult bind) -> TableState` |
| 116 | +- `produce(TableState state, UdfOutputAppender out) -> int` |
| 117 | + |
| 118 | +What each callback does: |
| 119 | + |
| 120 | +- `bind`: runs once per invocation to validate/interpret parameters, define output schema, and create bind state. |
| 121 | +- `init`: runs after bind to initialize execution state (cursor/counters/chunk state). |
| 122 | +- `produce`: runs repeatedly to emit rows in chunks; return the number of rows produced in that call. |
| 123 | + |
| 124 | +### Basic example |
| 125 | + |
| 126 | +```java |
| 127 | +conn.registerTableFunction( |
| 128 | + "range_java", |
| 129 | + new TableFunction() { |
| 130 | + @Override |
| 131 | + public TableBindResult bind(BindContext ctx, Object[] parameters) { |
| 132 | + long end = ((Number) parameters[0]).longValue(); |
| 133 | + return new TableBindResult( |
| 134 | + new String[] {"i"}, |
| 135 | + new UdfLogicalType[] {UdfLogicalType.of(DuckDBColumnType.BIGINT)}, |
| 136 | + new long[] {0L, end} |
| 137 | + ); |
| 138 | + } |
| 139 | + |
| 140 | + @Override |
| 141 | + public TableState init(InitContext ctx, TableBindResult bind) { |
| 142 | + return new TableState(bind.getBindState()); |
| 143 | + } |
| 144 | + |
| 145 | + @Override |
| 146 | + public int produce(TableState state, UdfOutputAppender out) { |
| 147 | + long[] st = (long[]) state.getState(); |
| 148 | + long current = st[0]; |
| 149 | + long end = st[1]; |
| 150 | + int produced = 0; |
| 151 | + |
| 152 | + while (produced < 256 && current < end) { |
| 153 | + out.beginRow().append(current).endRow(); |
| 154 | + current++; |
| 155 | + produced++; |
| 156 | + } |
| 157 | + |
| 158 | + st[0] = current; |
| 159 | + return produced; |
| 160 | + } |
| 161 | + }, |
| 162 | + new TableFunctionDefinition().withParameterTypes(new DuckDBColumnType[] {DuckDBColumnType.BIGINT}), |
| 163 | + new TableFunctionOptions().threadSafe(false).maxThreads(1) |
| 164 | +); |
| 165 | +``` |
| 166 | + |
| 167 | +### Bind parameter object mappings |
| 168 | + |
| 169 | +In `bind`, parameters are materialized as Java objects. Common mappings: |
| 170 | + |
| 171 | +- `DECIMAL -> BigDecimal` |
| 172 | +- `DATE -> LocalDate` |
| 173 | +- `TIME, TIME_NS -> LocalTime` |
| 174 | +- `TIMESTAMP* -> LocalDateTime` |
| 175 | +- `TIME_WITH_TIME_ZONE -> OffsetTime` |
| 176 | +- `TIMESTAMP_WITH_TIME_ZONE -> OffsetDateTime` |
| 177 | +- `UUID -> UUID` |
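
Because `bind` receives an untyped `Object[]`, it can help to check each parameter's runtime class before casting. The following is a minimal sketch using only JDK types. `BindParams` and `require` are hypothetical helpers, not part of the DuckDB API, and the runtime classes are assumed to match the mappings listed above. The `null` check is defensive, since this guide does not say how SQL `NULL` parameters are passed.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

// Hypothetical helper for validating bind parameters; not part of the DuckDB API.
final class BindParams {
    // Returns parameters[index] cast to the expected type, or throws with a descriptive message.
    static <T> T require(Object[] parameters, int index, Class<T> type) {
        if (index < 0 || index >= parameters.length) {
            throw new IllegalArgumentException("missing parameter " + index);
        }
        Object value = parameters[index];
        if (value == null) {
            throw new IllegalArgumentException("parameter " + index + " is NULL");
        }
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("parameter " + index + " must be "
                + type.getSimpleName() + " but was " + value.getClass().getSimpleName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        // Stand-in for the Object[] that bind(...) would receive.
        Object[] parameters = {new BigDecimal("12.5000"), LocalDate.of(2024, 1, 31)};
        BigDecimal amount = require(parameters, 0, BigDecimal.class);
        LocalDate day = require(parameters, 1, LocalDate.class);
        System.out.println(amount + " on " + day);
    }
}
```

Failing early in `bind` with a clear message keeps type errors out of `produce`, where they would surface once per chunk.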
| 178 | + |
| 179 | +### Output writing with UdfOutputAppender |
| 180 | + |
| 181 | +`UdfOutputAppender` supports: |
| 182 | + |
| 183 | +- primitive/object `append(...)` for one column at a time |
| 184 | +- `setObject(...)` and typed setters (`setBigDecimal`, `setLocalDate`, etc.) |
| 185 | +- nested output objects for container types: |
| 186 | + - `LIST`/`ARRAY`: Java arrays or `Collection` |
| 187 | + - `MAP`: `Map` |
| 188 | + - `STRUCT`: positional `List`/array or named `Map<String, Object>` |
| 189 | + - `UNION`: `AbstractMap.SimpleEntry<String, Object>` |
| 190 | + - `ENUM`: `String` |
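
The Java shapes listed above for nested values can be built with plain JDK collections. The sketch below only constructs the values; how they are passed to `UdfOutputAppender` is not shown. The class name, field names, and union tag are illustrative assumptions, as is reading the `SimpleEntry` key as the union member tag.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: builds Java values in the shapes listed above.
final class NestedValues {
    // LIST/ARRAY value expressed as a Collection.
    static List<Integer> listValue() {
        return Arrays.asList(1, 2, 3);
    }

    // MAP value expressed as a java.util.Map.
    static Map<String, Integer> mapValue() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        return m;
    }

    // STRUCT value in its named form; LinkedHashMap keeps the field insertion order.
    static Map<String, Object> structValue() {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("id", 7L);
        s.put("name", "duck");
        return s;
    }

    // UNION value. ASSUMPTION: the entry key names the member tag, the value is the payload.
    static AbstractMap.SimpleEntry<String, Object> unionValue() {
        return new AbstractMap.SimpleEntry<>("num", 42);
    }
}
```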
| 191 | + |
| 192 | +### Table function options |
| 193 | + |
| 194 | +`TableFunctionOptions`: |
| 195 | + |
| 196 | +- `threadSafe(false|true)` |
| 197 | +- `maxThreads(int >= 1)` |
| 198 | + |
| 199 | +`TableFunctionDefinition`: |
| 200 | + |
| 201 | +- `withParameterTypes(...)` |
| 202 | +- `withProjectionPushdown(true|false)` |
| 203 | + |
| 204 | +## Unsupported in scalar signatures |
| 205 | + |
| 206 | +Scalar UDF signatures do not support `INTERVAL` or the nested/container logical types (`LIST`, `STRUCT`, `MAP`, `ARRAY`, `UNION`, `ENUM`). |
| 207 | + |
| 208 | +## Practical recommendations |
| 209 | + |
| 210 | +- Use chunk-oriented loops (`rowCount`) for scalar UDF throughput. |
| 211 | +- Avoid executing SQL on the same `DuckDBConnection` from inside callbacks. |
| 212 | +- Use explicit logical types for decimal-sensitive workloads. |