
Conversation

@sreekanth-db (Collaborator)

Description

Fixed an IndexOutOfBoundsException that occurred when executing DDL statements (e.g., CREATE DATABASE) over the Thrift protocol. The bug manifests when the number of Thrift column descriptors does not match the number of Arrow schema fields.

Root Cause

When executing DDL statements, the Databricks server behavior is:

  • Thrift Protocol: Returns column descriptors including a "Result" status column (1 column)
  • Arrow Schema: Returns an empty schema with 0 fields (no actual data)
  • The Bug: The code attempted to access arrowMetadata.get(0) without checking whether the list was empty

This mismatch caused an IndexOutOfBoundsException when the driver tried to read Arrow metadata at index 0 of an empty list.
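
For illustration, a minimal, self-contained sketch of the failure mode (the class name and hard-coded values are hypothetical, not the driver's actual code): a Thrift column count of 1 paired with an empty Arrow metadata list, read with only a null check.

import java.util.Collections;
import java.util.List;

public class MetadataMismatchDemo {
  public static void main(String[] args) {
    // Thrift reports one column ("Result"), but the Arrow schema has no fields,
    // so the per-column Arrow metadata list is empty.
    int thriftColumnCount = 1;
    List<String> arrowMetadata = Collections.emptyList();

    for (int columnIndex = 0; columnIndex < thriftColumnCount; columnIndex++) {
      // Pre-fix behavior: null check only, no bounds check.
      String columnArrowMetadata =
          arrowMetadata != null ? arrowMetadata.get(columnIndex) : null; // throws IndexOutOfBoundsException
      System.out.println(columnArrowMetadata);
    }
  }
}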

Debug Evidence

TColumnDesc (Thrift):

Column[0]:
  name: Result
  type: STRING_TYPE
  position: 1
  Full TColumnDesc: TColumnDesc(columnName:Result, typeDesc:TTypeDesc(...), position:1, comment:)

Arrow Schema:

Arrow schema bytes length: 72
Deserialized Arrow schema, field count: 0  ← Empty!
Arrow metadata list: size=0

Changes Made

Added bounds checking in the two locations where Arrow metadata is accessed:

  1. ArrowUtil.java:247 - Used by StreamingInlineArrowResult
  2. DatabricksResultSetMetaData.java:195 - Used for result set metadata construction

Before:

String columnArrowMetadata =
    arrowMetadata != null ? arrowMetadata.get(columnIndex) : null;

After:

String columnArrowMetadata =
    arrowMetadata != null && columnIndex < arrowMetadata.size()
        ? arrowMetadata.get(columnIndex)
        : null;

Testing

Manual Testing

Test Case: Execute CREATE DATABASE statement

String sqlQuery = "CREATE DATABASE IF NOT EXISTS hive_metastore.test_db";
boolean hasResultSet = stmt.execute(sqlQuery);

Before Fix: IndexOutOfBoundsException: Index 0 out of bounds for length 0
After Fix: Executes successfully, returns hasResultSet=false
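
For reference, a fuller, hedged version of this manual check; the JDBC URL, http path, and token below are placeholders, not values used in this PR:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DdlManualCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details; substitute a real Databricks JDBC URL and credentials.
    String url =
        "jdbc:databricks://<host>:443/default;transportMode=http;ssl=1;httpPath=<http-path>";
    try (Connection conn = DriverManager.getConnection(url, "token", "<personal-access-token>");
        Statement stmt = conn.createStatement()) {
      String sqlQuery = "CREATE DATABASE IF NOT EXISTS hive_metastore.test_db";
      boolean hasResultSet = stmt.execute(sqlQuery);
      // With the fix, execution completes and no result set is reported.
      System.out.println("hasResultSet=" + hasResultSet); // expected: false
    }
  }
}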

Additional Notes to the Reviewer

NO_CHANGELOG=true

…ents

When executing DDL statements (like CREATE DATABASE), the Thrift protocol
returns column descriptors but the Arrow schema is empty. This caused
IndexOutOfBoundsException when accessing arrowMetadata list without bounds
checking.

Changes:
- Add bounds check in DatabricksResultSetMetaData.java (line 195)
- Add bounds check in ArrowUtil.java (line 247)

Both locations now verify columnIndex < arrowMetadata.size() before accessing
the list to handle cases where Thrift column count != Arrow schema field count.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@vikrantpuppala (Collaborator) left a comment

thanks for the fix, can we add ddl commands into any of the repo tests so that such failures are caught earlier?
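
(For illustration, a hedged sketch of what such a regression test could look like; the test class, JUnit 5 usage, and the getTestConnection() helper are hypothetical, not existing repo fixtures.)

import static org.junit.jupiter.api.Assertions.assertFalse;

import java.sql.Connection;
import java.sql.Statement;
import org.junit.jupiter.api.Test;

public class DdlStatementTest {

  @Test
  void createDatabaseReturnsNoResultSet() throws Exception {
    try (Connection conn = getTestConnection();
        Statement stmt = conn.createStatement()) {
      boolean hasResultSet =
          stmt.execute("CREATE DATABASE IF NOT EXISTS hive_metastore.test_db");
      assertFalse(hasResultSet, "DDL statements should not produce a result set");
      stmt.execute("DROP DATABASE IF EXISTS hive_metastore.test_db");
    }
  }

  // Placeholder: in practice this would reuse the repo's existing test connection setup.
  private Connection getTestConnection() {
    throw new UnsupportedOperationException("wire up to the test connection helper");
  }
}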

@sreekanth-db sreekanth-db merged commit 1893a40 into databricks:main Jan 23, 2026
12 of 13 checks passed