Skip to content

Conversation

@sreekanth-db
Copy link
Collaborator

@sreekanth-db sreekanth-db commented Dec 22, 2025

Description

This PR enhances geospatial datatype handling to include SRID (Spatial Reference System Identifier) information in column type names and fixes multiple issues related to complex datatype handling across different result formats.

Key Changes

  1. Geospatial Type Name Enhancement

    • Column type names now include SRID: GEOMETRY(4326) instead of GEOMETRY
    • Applies to both GEOMETRY and GEOGRAPHY types
    • Preserves full type information in metadata for better type identification
  2. SEA Inline Mode Complex Type Fix

    • Fixed issue where complex types (ARRAY, MAP, STRUCT) were not returned as complex objects in SEA Inline mode (JSON array result format)
    • Now properly converts to complex datatype objects when EnableComplexDatatypeSupport=true
  3. Thrift CloudFetch Metadata Enhancement

    • Fixed error when extracting type details (e.g., INT from ARRAY<INT>) in Thrift CloudFetch mode
    • Enhanced getColumnInfoFromTColumnDesc() to use Arrow schema metadata alongside TColumnDesc
    • Arrow schema provides complete type information (e.g., ARRAY<INT>) while TColumnDesc only contains base type (e.g., ARRAY)
  4. Arrow Metadata Extraction

    • Added DatabricksThriftUtil.getArrowMetadata() to deserialize Arrow schema from TGetResultSetMetadataResp
    • Fixed null arrow metadata issue in DatabricksResultSet constructor for Thrift CloudFetch mode

Testing

Unit Tests

  • All existing unit tests pass and additional tests are added for new methods

Integration Tests

  • GeospatialTests.java - Comprehensive E2E integration test
    • Tests geospatial types (GEOMETRY and GEOGRAPHY)
    • Validates 24 configuration combinations:
      • Protocol: Thrift / SEA
      • Serialization: Arrow / Inline
      • CloudFetch: Enabled / Disabled (only with Arrow, as CloudFetch requires Arrow)
      • GeoSpatial Support: Enabled / Disabled
      • Complex Type Support: Enabled / Disabled
    • Validates metadata: column types, type names, class names
    • Validates values: WKT representation, SRID
    • Validates behavior when geospatial objects are enabled vs. disabled (STRING fallback)
    • All 24 tests pass

Additional Notes to the Reviewer

Other required details are mentioned in comments in the diff

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
return parser.parseJsonStringToDbStruct(object.toString(), arrowMetadata);
}

private static AbstractDatabricksGeospatial convertToGeospatial(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing duplicate code, using the same logic present in GeospatialConverter

long rowSize = executionResult.getRowCount();
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();
Copy link
Collaborator Author

@sreekanth-db sreekanth-db Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

((ArrowStreamResult) executionResult).getArrowMetadata() will always be null for cloud fetch mode at this point in the code flow. So getting arrow metadata from TGetResultSetMetadataResp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will getResultSetMetadata always be present?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should have TGetResultSetMetadataResp object in TFetchResultsResp. Anyways we are handling this case in getArrowMetadata method to check for null values

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is arrow metadata consistently populated in TGetResultSetMetadataResp?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when inline result. But that is limitation of complex data types as well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when inline result

Arrow metadata is populated from thrift always when format is CSV/JSON_ARRAY or ARROW.

The only time arrow metadata is not present is for when format is HIVE.
Is inline result in hive format ?

* @return true if the type name starts with ARRAY, MAP, STRUCT, GEOMETRY, or GEOGRAPHY, false
* otherwise
*/
private static boolean isComplexType(String typeName) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this method to DatabricksTypeUtil for using in other classes

return complexDatatypeSupport ? obj : obj.toString();
}

private Object handleComplexDataTypesForSEAInline(Object obj, String columnName)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this method to always return complex datatype, if complex mode is disabled, caller will convert to string

typeText = "STRING";
}

// store base type eg. DECIMAL instead of DECIMAL(7,2) except for geospatial datatypes
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For geospatial, we want SRID info (in paranthesis) to be present. eg: GEOMETRY(4326)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a case that SRID info might be missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, even for SRID 0, we get GEOMETRY(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

srid will always be present

columnIndex++) {
TColumnDesc columnDesc = resultManifest.getSchema().getColumns().get(columnIndex);

ColumnInfo columnInfo = getColumnInfoFromTColumnDesc(columnDesc);
Copy link
Collaborator Author

@sreekanth-db sreekanth-db Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TColumnDesc doesn't have complete information about the column metadata, hence enhancing it using the arrowMetadata extracted from the TGetResultSetMetadataResp. This is required for methods which use ColumnInfo later in the flow (thrift + cloudfetch + complex)

.columnTypeClassName("java.lang.String")
.columnType(Types.OTHER)
.columnTypeText(VARIANT);
} else if (isGeometryColumn(arrowMetadata, columnIndex)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled with updates to the column info, no separate handling is required

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db sreekanth-db changed the title geospatial column type name with srid SRID in geospatial column type name Dec 22, 2025
@sreekanth-db sreekanth-db changed the title SRID in geospatial column type name [PECOBLR-1377] SRID in geospatial column type name Dec 30, 2025

if (arrowMetadata != null && isComplexType(arrowMetadata)) {
typeText = arrowMetadata;
if (arrowMetadata.startsWith(GEOMETRY)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to check if Geospatial flag is enabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an inner layer, so flag check is not required here. we are checking for geospatial flags in DatabricksResultSetMetaData which is the interface through which users get metadata info

.map(e -> e.get(ARROW_METADATA_KEY))
.collect(Collectors.toList());
} catch (IOException e) {
throw new DatabricksSQLException(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add logging

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added error log

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Copy link
Collaborator

@vikrantpuppala vikrantpuppala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! can we also run the jdbc comparator in different modes (sea/arrow) across different column types to ensure things are consistent


### Fixed
- Fixed complex types not being returned as objects in SEA Inline mode when `EnableComplexDatatypeSupport=true`.
- Fixed errors with complex data types in Thrift CloudFetch mode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be more descriptive on what these errors are?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the change log with more details

long rowSize = executionResult.getRowCount();
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is arrow metadata consistently populated in TGetResultSetMetadataResp?

return parser.parseJsonStringToDbStruct(obj.toString(), columnName);
} else if (columnName.startsWith(GEOMETRY)) {
return obj;
return new GeospatialConverter().toDatabricksGeometry(obj);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is toDatabricksGeometry not a static method? why do we need to init an object to use it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the code to use cached converter similar to other datatype's converters

long rowSize = executionResult.getRowCount();
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not when inline result

Arrow metadata is populated from thrift always when format is CSV/JSON_ARRAY or ARROW.

The only time arrow metadata is not present is for when format is HIVE.
Is inline result in hive format ?

return parser.parseJsonStringToDbStruct(obj.toString(), columnName);
} else if (columnName.startsWith(GEOMETRY)) {
return obj;
return new GeospatialConverter().toDatabricksGeometry(obj);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why didnt we handle this before ?
what is the impact of not having this in 3.0.6 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently complex data types are not supported in SEA inline flow, so geospatial type is returned as a string (till 3.0.7). This is consistent with our documentation. With this change we are supporting complex types for SEA inline mode as well. Will update the documentation accordingly.

typeText = "STRING";
}

// store base type eg. DECIMAL instead of DECIMAL(7,2) except for geospatial datatypes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

srid will always be present

* and EnableComplexDatatypeSupport are enabled AND not in Thrift+Inline mode. Otherwise, returns as
* STRING.
*/
public class GeospatialTests {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets test geometry any as well:
example query:

%sql
SELECT * FROM VALUES
(ST_GeomFromText('POINT(17 7)', 4326)),
(ST_GeomFromText('POINT(5 5)', 0)) AS t(geom)

expected type name: GEOMETRY(ANY) expected SRIDs per row: 4326 and 0. lets assert on everything, that both row level values are correct and column metadata is correct

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the tests for GEOMETRY(ANY)

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db sreekanth-db enabled auto-merge (squash) January 21, 2026 18:32
@sreekanth-db sreekanth-db disabled auto-merge January 21, 2026 18:44
@sreekanth-db sreekanth-db merged commit 809c0b3 into main Jan 21, 2026
13 of 15 checks passed
@sreekanth-db sreekanth-db deleted the geospatial-enhancements branch January 21, 2026 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants