
[PECOBLR-1746] Implementing support for listing procedures #1238

Merged
msrathore-db merged 17 commits into databricks:main from msrathore-db:PECOBLR-1746 on Mar 16, 2026

Conversation

msrathore-db (Collaborator) commented Feb 27, 2026

Description

Implements getProcedures and getProcedureColumns in DatabricksDatabaseMetaData by querying information_schema.routines and information_schema.parameters via SQL.

Unlike other metadata operations that use SHOW commands or Thrift RPCs, these use direct SQL SELECT queries against information_schema views. This works for both Thrift and SEA transports.

getProcedures:

  • Queries information_schema.routines filtered by routine_type = 'PROCEDURE'
  • Returns 9-column JDBC-spec result set (PROCEDURE_CAT, PROCEDURE_SCHEM, PROCEDURE_NAME, reserved x3, REMARKS, PROCEDURE_TYPE, SPECIFIC_NAME)
  • PROCEDURE_TYPE is always procedureNoResult (1)
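The query shape described above can be sketched roughly as follows. This is an illustrative approximation only: the class and column names here are assumptions, and the actual builder lives in CommandConstants and (after review) binds patterns via ? placeholders rather than inlining them.

```java
// Illustrative sketch of the getProcedures query shape. Not the driver's
// actual code; names and selected columns are simplified assumptions.
public class ProceduresSqlSketch {
    public static String build(String catalog, String schemaPattern, String namePattern) {
        // NULL catalog resolves to the cross-catalog system schema
        String prefix = (catalog == null) ? "system" : "`" + catalog + "`";
        StringBuilder sql = new StringBuilder()
            .append("SELECT routine_catalog, routine_schema, routine_name, comment")
            .append(" FROM ").append(prefix).append(".information_schema.routines")
            .append(" WHERE routine_type = 'PROCEDURE'");
        if (schemaPattern != null) sql.append(" AND routine_schema LIKE ?");
        if (namePattern != null) sql.append(" AND routine_name LIKE ?");
        return sql.toString();
    }
}
```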

getProcedureColumns:

  • Queries information_schema.parameters JOINed with information_schema.routines to filter for procedures
  • Returns 20-column JDBC-spec result set with parameter metadata
  • Maps parameter_mode (IN/OUT/INOUT) to JDBC COLUMN_TYPE constants (1/4/2)
  • Maps Databricks type names to java.sql.Types codes via existing getCode()
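The parameter_mode mapping in the second bullet corresponds to the standard java.sql.DatabaseMetaData constants. A minimal sketch (the class name is hypothetical; the PR's real method is mapParameterModeToColumnType):

```java
import java.sql.DatabaseMetaData;

// Sketch of mapping information_schema parameter_mode values to the JDBC
// COLUMN_TYPE constants: IN=1, INOUT=2, OUT=4, unknown=0.
public class ParamModeSketch {
    public static short map(String parameterMode) {
        if (parameterMode == null) {
            return (short) DatabaseMetaData.procedureColumnUnknown; // 0
        }
        switch (parameterMode.toUpperCase()) {
            case "IN":
                return (short) DatabaseMetaData.procedureColumnIn; // 1
            case "INOUT":
                return (short) DatabaseMetaData.procedureColumnInOut; // 2
            case "OUT":
                return (short) DatabaseMetaData.procedureColumnOut; // 4
            default:
                return (short) DatabaseMetaData.procedureColumnUnknown; // 0
        }
    }
}
```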

Catalog resolution:

  • NULL catalog → queries system.information_schema.routines (cross-catalog)
  • Specific catalog → queries <catalog>.information_schema.routines
  • Empty string → returns empty result set
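The three-way catalog resolution above can be sketched like this (illustrative only; returning null stands in for "emit an empty result set without querying"):

```java
// Sketch of catalog resolution for the information_schema queries.
// Hypothetical helper; mirrors the getCatalogPrefix logic discussed below.
public class CatalogResolutionSketch {
    public static String resolveRoutinesTable(String catalog) {
        if (catalog != null && catalog.isEmpty()) {
            return null; // empty string: caller returns an empty result set
        }
        String prefix = (catalog == null) ? "system" : "`" + catalog + "`";
        return prefix + ".information_schema.routines";
    }
}
```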

Shared SQL builders in CommandConstants eliminate duplication between SDK and Thrift clients.

Testing

Unit tests (DatabricksMetadataSdkClientTest):

  • 4 parameterized tests for listProcedures SQL generation (catalog+schema+name, null schema, null name, null catalog)
  • 3 parameterized tests for listProcedureColumns SQL generation (all filters, partial filters, all nulls)

Integration test (MetadataIntegrationTests#testGetProceduresAndProcedureColumns):

  • Creates a procedure jdbc_test_compute_area(x DOUBLE, y DOUBLE, OUT area DOUBLE) with COMMENT
  • Verifies getProcedures returns correct name, schema, catalog, remarks, type
  • Verifies getProcedureColumns returns 3 params with correct COLUMN_TYPE (IN=1, OUT=4), DATA_TYPE (DOUBLE=8), ordinal positions
  • Tests column name filtering
  • Cleans up procedure after test
  • WireMock stubs recorded for REPLAY mode

Existing tests: All 248 DatabricksDatabaseMetaDataTest + 51 DatabricksMetadataSdkClientTest pass.

Additional Notes to the Reviewer

  • information_schema.parameters is used (not routine_columns) because routine_columns contains table-valued function output columns, not procedure parameters.
  • NULLABLE is always procedureNullableUnknown (2) since the server does not track parameter nullability.
  • When catalog is NULL, system.information_schema is queried which requires system table access permissions. If the user lacks this permission, the driver returns an empty result set. This will be addressed server-side in a future release.

sql.append(" AND routine_schema LIKE '").append(schemaPattern).append("'");
}
if (procedureNamePattern != null) {
sql.append(" AND routine_name LIKE '").append(procedureNamePattern).append("'");
Reviewer:

Is it possible to use parameterized statements here and provide user provided values as parameters? This will prevent SQL injection issues

Author:

Done. Now using ? placeholders with ImmutableSqlParameter for server-side binding.
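The fix described here, appending a ? placeholder and recording the binding instead of concatenating user input, might look like the sketch below. This is illustrative only: the real code uses the driver's ImmutableSqlParameter, for which a plain Map stands in here.

```java
import java.util.Map;

// Illustrative sketch of placeholder-based filtering. The params map is a
// stand-in for the PR's Map<Integer, ImmutableSqlParameter> bindings.
public class PlaceholderSketch {
    public static String build(String schemaPattern, Map<Integer, String> params) {
        StringBuilder sql = new StringBuilder()
            .append("SELECT routine_name FROM system.information_schema.routines")
            .append(" WHERE routine_type = 'PROCEDURE'");
        int paramIndex = 1;
        if (schemaPattern != null) {
            sql.append(" AND routine_schema LIKE ?");
            params.put(paramIndex++, schemaPattern); // bound server-side, never inlined
        }
        return sql.toString();
    }
}
```

Because the pattern is sent as a bound parameter, hostile input never becomes part of the SQL text.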

}

private static String getCatalogPrefix(String catalog) {
return (catalog == null) ? "system" : "`" + catalog + "`";
Reviewer:

what if catalogName already contains a backtick character? Should we escape that?

Author:

Catalog names cannot contain backticks, so the user input would be invalid in that case. No escaping needed.

return (short) procedureColumnUnknown;
}
switch (parameterMode.toUpperCase()) {
case "IN":
Reviewer:

declare as constants

Author:

Done. Extracted PARAM_MODE_IN, PARAM_MODE_OUT, PARAM_MODE_INOUT, and IS_RESULT_YES as constants.

try {
Object val = resultSet.getObject(columnName);
return val != null ? val.toString() : null;
} catch (SQLException e) {
Reviewer:

add logging

Author:

Done. Added debug logging to getRowsForProcedures, getRowsForProcedureColumns, and mapParameterModeToColumnType.

if (val == null) return null;
if (val instanceof Number) return ((Number) val).intValue();
return Integer.parseInt(val.toString());
} catch (SQLException | NumberFormatException e) {
Reviewer:

add logging

Author:

Done — see above.
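A self-contained sketch of the null-safe reader discussed in this thread, with the debug logging the reviewer asked for. Assumptions: java.util.logging stands in for the driver's LOGGER, and the coercion is split into its own method purely so it can be exercised without a live ResultSet.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Logger;

// Sketch of a null-safe integer reader over a ResultSet column.
public class IntReaderSketch {
    private static final Logger LOGGER = Logger.getLogger(IntReaderSketch.class.getName());

    // Coercion used by getIntOrNull, split out for direct testing.
    static Integer coerce(Object val) {
        if (val == null) return null;
        if (val instanceof Number) return ((Number) val).intValue();
        try {
            return Integer.parseInt(val.toString());
        } catch (NumberFormatException e) {
            LOGGER.fine("Could not parse integer from value: " + e);
            return null;
        }
    }

    public static Integer getIntOrNull(ResultSet resultSet, String columnName) {
        try {
            return coerce(resultSet.getObject(columnName));
        } catch (SQLException e) {
            LOGGER.fine("Could not read column " + columnName + ": " + e);
            return null;
        }
    }
}
```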

if (val == null) return null;
if (val instanceof Number) return ((Number) val).shortValue();
return Short.parseShort(val.toString());
} catch (SQLException | NumberFormatException e) {
Reviewer:

add logging

Author:

Done — see above.

Integer charMaxLength = getIntOrNull(resultSet, "character_maximum_length");
Integer charOctetLength = getIntOrNull(resultSet, "character_octet_length");
row.add(numericPrecision != null ? numericPrecision : charMaxLength); // PRECISION
row.add(charOctetLength != null ? charOctetLength : numericPrecision); // LENGTH
Reviewer:

but numericPrecision can be null also

Author:

Correct — if both numericPrecision and charMaxLength are null, COLUMN_SIZE returns null, which is the expected behavior per JDBC spec for types where precision is not applicable.
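The fallback chain the author defends can be written out as a trivial helper (hypothetical name, same logic as the quoted line):

```java
// Sketch of the PRECISION / COLUMN_SIZE fallback: numeric precision when
// present, else character maximum length, else null (JDBC permits a null
// precision for types where it does not apply).
public class PrecisionSketch {
    public static Integer columnSize(Integer numericPrecision, Integer charMaxLength) {
        return numericPrecision != null ? numericPrecision : charMaxLength;
    }
}
```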

Integer numericPrecision = getIntOrNull(resultSet, "numeric_precision");
Integer charMaxLength = getIntOrNull(resultSet, "character_maximum_length");
Integer charOctetLength = getIntOrNull(resultSet, "character_octet_length");
row.add(numericPrecision != null ? numericPrecision : charMaxLength); // PRECISION
Reviewer:

charMaxLength itself can be null

Author:

Same as above — null fallback is intentional. Returns null when neither numeric precision nor character length applies to the type.

Integer charMaxLength = getIntOrNull(resultSet, "character_maximum_length");
Integer charOctetLength = getIntOrNull(resultSet, "character_octet_length");
row.add(numericPrecision != null ? numericPrecision : charMaxLength); // PRECISION
row.add(charOctetLength != null ? charOctetLength : numericPrecision); // LENGTH
Reviewer:

why is length being fallback to precision?

Author:

Fixed. BUFFER_LENGTH now uses charOctetLength directly (null for non-character types) instead of falling back to numericPrecision.

Arguments.of(
"SELECT p.specific_catalog, p.specific_schema, p.specific_name,"
+ " p.parameter_name, p.parameter_mode, p.is_result,"
+ " p.data_type, p.full_data_type,"
Reviewer:

where is full_data_type used?

Author:

Removed. It was unused — numeric_precision and numeric_scale are available as separate columns so full_data_type is not needed.

msrathore-db and others added 13 commits March 16, 2026 02:10
…SQL queries

Implement JDBC-compliant getProcedures and getProcedureColumns by querying
information_schema.ROUTINES and information_schema.parameters via SQL.
This approach works for both thrift and SEA transports without using thrift RPC.

- getProcedures: queries ROUTINES filtered by routine_type='PROCEDURE'
- getProcedureColumns: queries parameters joined with ROUTINES for procedures
- Null catalog uses system.information_schema; specific catalog uses <catalog>.information_schema
- Shared SQL builders in CommandConstants to avoid duplication across SDK/Thrift clients
- Maps parameter_mode (IN/OUT/INOUT) to JDBC COLUMN_TYPE constants
- Maps Databricks type names to java.sql.Types via existing getCode()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…line helpers

- Extract table names, column lists, and filter into named constants in CommandConstants
- Remove getProcedureColumnPrecision/getProcedureColumnLength helpers with unused dataType param
- Inline precision/length logic directly in getRowsForProcedureColumns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ocedureColumns

Unit tests (DatabricksMetadataSdkClientTest):
- testListProcedures: parameterized tests for SQL generation with various
  catalog/schema/name combinations including null catalog (system prefix)
- testListProcedureColumns: parameterized tests for SQL generation with
  JOIN to routines table, all filter combinations

Integration test (MetadataIntegrationTests):
- testGetProceduresAndProcedureColumns: creates a test procedure with
  IN and OUT params, verifies getProcedures returns correct metadata
  (name, remarks, type), verifies getProcedureColumns returns correct
  parameter info (column types IN=1/OUT=4, data types, ordinal positions),
  tests column name filtering, cleans up procedure after test

Also renames ROUTINES -> routines in CommandConstants for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…I changes

Add GET_PROCEDURES and GET_PROCEDURE_COLUMNS to MetadataOperationType enum.
Update getResultSet/executeStatement calls to pass the new required
MetadataOperationType parameter after upstream API signature change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
String.format treats the single quote before %s as a flag character,
causing an UnknownFormatConversionException when patterns like '%' are passed.
Replace String.format with StringBuilder.append for SQL LIKE clauses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… '%'

The LOGGER.error(e, "message {}", e) call uses String.format internally.
When the exception message contains SQL with LIKE '%', the '%' is
interpreted as a format specifier causing UnknownFormatConversionException.
Use plain log message without exception interpolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
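The failure mode described in the two commit messages above is easy to reproduce: once a string containing a SQL LIKE '%' literal reaches String.format (directly, or via a logger that formats internally), the %' sequence is parsed as an invalid conversion. A minimal demonstration (hypothetical helper name):

```java
// Demonstrates why formatting a message that already contains a SQL
// LIKE '%' literal throws: %' is not a valid format conversion.
public class FormatPitfall {
    public static boolean throwsOnFormat(String message) {
        try {
            String.format(message);
            return false;
        } catch (java.util.IllegalFormatException e) {
            return true; // e.g. UnknownFormatConversionException
        }
    }
}
```

This is why the fix logs a plain message instead of interpolating the exception text into a format string.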
…m values

Add GET_PROCEDURES and GET_PROCEDURE_COLUMNS to the enum count assertion
and parameterized header value tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace string concatenation with ? placeholders and ImmutableSqlParameter
for all LIKE clause values in buildProceduresSQL and buildProcedureColumnsSQL.
This prevents SQL injection via user-provided schema/procedure/column patterns.

The build methods now accept a Map<Integer, ImmutableSqlParameter> that they
populate with the parameter bindings. Callers pass this map through to
executeStatement instead of an empty HashMap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o prevent injection

Use ? placeholders with ImmutableSqlParameter for LIKE clause values in
buildProceduresSQL and buildProcedureColumnsSQL. Parameters are passed to
the server via executeStatement's parameter map for server-side binding,
the same mechanism used by PreparedStatement.

Also resolves merge conflict with upstream rename of DatabricksMetadataSdkClient
to DatabricksMetadataQueryClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract string constants for parameter mode values (IN/OUT/INOUT/YES)
- Add debug logging to getRowsForProcedures, getRowsForProcedureColumns,
  and mapParameterModeToColumnType
- Fix BUFFER_LENGTH: use charOctetLength directly instead of falling back
  to numericPrecision which is semantically wrong
- Remove unused full_data_type from PARAMETERS_SELECT_COLUMNS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ries

The SQL now uses ? placeholders with server-side parameter binding,
so the WireMock stubs need to match the new request format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and fix column count

The PARAMETERS_SELECT_COLUMNS no longer includes full_data_type (removed
per PR review), so unit test expected SQL strings and column count mock
need to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
int paramIndex = 1;

StringBuilder sql = new StringBuilder();
sql.append("SELECT ").append(PARAMETERS_SELECT_COLUMNS);
Reviewer:

can we use utility like mybatis-3 for creating such SQL strings?

List<List<Object>> rows = new ArrayList<>();
while (resultSet.next()) {
List<Object> row = new ArrayList<>();
row.add(getStringOrNull(resultSet, "routine_catalog")); // PROCEDURE_CAT
Reviewer:

declare as constants, all of hard coded values

Author:

Done. Extracted all column name strings to private static final constants (COL_ROUTINE_CATALOG, COL_SPECIFIC_NAME, COL_DATA_TYPE, etc.).

LOGGER.debug("Building rows for getProcedureColumns result set");
List<List<Object>> rows = new ArrayList<>();
while (resultSet.next()) {
String dataType = getStringOrNull(resultSet, "data_type");
Reviewer:

declare as constants

Author:

Done — see above.

String isResult = getStringOrNull(resultSet, "is_result");

List<Object> row = new ArrayList<>();
row.add(getStringOrNull(resultSet, "specific_catalog")); // PROCEDURE_CAT
Reviewer:

is null value allowed for catalog and schema?

Author:

Yes, per the JDBC spec both PROCEDURE_CAT and PROCEDURE_SCHEM are nullable. From the spec: "PROCEDURE_CAT String => procedure catalog (may be null)" and "PROCEDURE_SCHEM String => procedure schema (may be null)".

row.add(getStringOrNull(resultSet, "parameter_default")); // COLUMN_DEF
row.add(null); // SQL_DATA_TYPE (reserved)
row.add(null); // SQL_DATETIME_SUB (reserved)
row.add(getIntOrNull(resultSet, "character_octet_length")); // CHAR_OCTET_LENGTH
Reviewer:

is this same as buffer_length? and you are reading it again

Author:

Good catch. Fixed — now reusing the charOctetLength variable for both BUFFER_LENGTH (col 9) and CHAR_OCTET_LENGTH (col 17) instead of calling getIntOrNull twice.

…e charOctetLength

- Extract all hardcoded column name strings to private static final constants
  (COL_ROUTINE_CATALOG, COL_SPECIFIC_NAME, COL_DATA_TYPE, etc.)
- Reuse charOctetLength variable for CHAR_OCTET_LENGTH instead of calling
  getIntOrNull a second time
- Add nullable annotations in comments per JDBC spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
msrathore-db and others added 2 commits March 17, 2026 00:29
…eColumns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@msrathore-db msrathore-db merged commit 28d6a13 into databricks:main Mar 16, 2026
14 of 15 checks passed
github-merge-queue bot pushed a commit to adbc-drivers/databricks that referenced this pull request Mar 24, 2026
## Summary
- Implement `getProcedures`, `getProcedureColumns` metadata operations
- Follow the existing `getTables`/`getColumns` pattern with trait
methods, SQL builders, SeaClient impl, FFI functions, and C header
declarations
- Add comprehensive E2E metadata tests covering all metadata operations

### getProcedures
- Queries `information_schema.routines` filtered by `routine_type =
'PROCEDURE'`
- Catalog resolution: `NULL` → `system` (cross-catalog), specific value
→ that catalog, empty string → empty result
- Pattern filtering via SQL `LIKE` on `routine_schema` and
`routine_name`

### getProcedureColumns
- Queries `information_schema.parameters` joined with
`information_schema.routines`
- Same catalog resolution and pattern filtering as getProcedures
- Selects all columns needed for ODBC `SQLProcedureColumns`:
parameter_name, parameter_mode, is_result, data_type, full_data_type,
numeric_precision/scale, character_maximum/octet_length,
ordinal_position, parameter_default, comment

### getCrossReferences
- Uses `SHOW FOREIGN KEYS` (same query as `getForeignKeys`)
- Takes all 6 ODBC `SQLCrossReferences` parameters
(pk_catalog/schema/table + fk_catalog/schema/table)
- Parent table filtering is done client-side by the ODBC C++ layer

### Design references
- JDBC implementation: databricks/databricks-jdbc#1238
- Design doc: [SQLProcedures and SQLProcedureColumns
Support](https://docs.google.com/document/d/1WV1hOiJA8Obs9q3o47u7cvZCwnRvKg8kWI8gZhuFFaY)

## Test plan
- [x] 12 unit tests for SQL builders (catalog resolution, pattern
filters, escaping, column selection)
- [x] 12 E2E metadata tests (`#[ignore]`, require real Databricks
connection) covering all operations: catalogs, schemas, tables, columns,
primary keys, foreign keys, cross-references, procedures, procedure
columns
- [x] `cargo test` — 237 tests pass
- [x] `cargo clippy --all-targets -- -D warnings` — clean
- [x] `cargo +stable fmt --all` — clean
- [ ] E2E validation against live workspace (blocked on token — tests
compile and are properly `#[ignore]`d)

This pull request was AI-assisted by Isaac.