@vikrantpuppala vikrantpuppala commented Jan 19, 2026

Summary

Implements proactive prefetching with a sliding window for both Thrift columnar and inline Arrow results, eliminating blocking at batch boundaries and improving throughput.

Key Components

New Streaming Infrastructure

  • ThriftStreamingProvider<T>: Generic type-safe streaming provider with background prefetch thread and configurable sliding window
  • StreamingBatch<T>: Type-safe batch container with lifecycle management and error handling
  • ThriftResponseProcessor<T>: Interface for pluggable response processors
    • ColumnarResponseProcessor: Processes Thrift columnar results
    • InlineArrowResponseProcessor: Processes inline Arrow results with schema caching
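The PR description names these types but does not show their signatures. As a rough sketch of how a generic processor and batch container can keep results typed end to end, the following uses hypothetical stand-ins (`ResponseProcessorSketch`, `BatchSketch`, `CsvRowProcessorSketch`, and a `String` in place of a Thrift response). None of these names or methods come from the driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a pluggable response processor; not the driver's interface. */
interface ResponseProcessorSketch<T> {
  T process(String rawResponse);
}

/** Hypothetical stand-in for a typed batch container with a simple release lifecycle. */
final class BatchSketch<T> {
  private T data;
  private boolean released;

  BatchSketch(T data) {
    this.data = data;
  }

  T getData() {
    if (released) {
      throw new IllegalStateException("batch already released");
    }
    return data;
  }

  void release() {
    data = null; // drop the reference so the payload can be garbage collected
    released = true;
  }

  boolean isReleased() {
    return released;
  }
}

/** Illustrative processor: splits a comma-separated payload into a list of values. */
final class CsvRowProcessorSketch implements ResponseProcessorSketch<List<String>> {
  @Override
  public List<String> process(String rawResponse) {
    return new ArrayList<>(Arrays.asList(rawResponse.split(",")));
  }
}

final class TypeSafeBatchDemo {
  /** The type parameter flows from processor to batch, so callers never cast. */
  static <T> BatchSketch<T> toBatch(ResponseProcessorSketch<T> processor, String rawResponse) {
    return new BatchSketch<>(processor.process(rawResponse));
  }

  public static void main(String[] args) {
    BatchSketch<List<String>> batch = toBatch(new CsvRowProcessorSketch(), "a,b,c");
    System.out.println(batch.getData().size());
    batch.release();
  }
}
```

Swapping in a different processor changes the batch's element type at compile time, which is the property the "type-safe" wording above refers to.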

Result Implementations

  • StreamingInlineArrowResult: High-throughput streaming implementation for inline Arrow results with background prefetching
  • StreamingColumnarResult: Streaming implementation for Thrift columnar results with prefetch

Supporting Classes

  • ThriftBatchFetcher / ThriftBatchFetcherImpl: Abstraction for fetching batches from the Thrift server

Configuration

| Parameter | Description | Default |
|---|---|---|
| EnableInlineStreaming | Toggle streaming mode for inline results | 1 (enabled) |
| ThriftMaxBatchesInMemory | Sliding window size (max batches kept in memory) | 3 |
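To make the defaults concrete, here is a small sketch that reads the two parameters from a semicolon-delimited connection URL and falls back to the documented defaults. `StreamingConfigSketch` and its parsing rules (exact-case keys, values other than `1` treated as disabled, window size of at least 1) are assumptions for illustration, not the driver's actual parsing code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the two streaming settings; not part of the driver. */
final class StreamingConfigSketch {
  final boolean inlineStreamingEnabled;
  final int maxBatchesInMemory;

  private StreamingConfigSketch(boolean inlineStreamingEnabled, int maxBatchesInMemory) {
    this.inlineStreamingEnabled = inlineStreamingEnabled;
    this.maxBatchesInMemory = maxBatchesInMemory;
  }

  /** Parses ";key=value" pairs after the base URL; the parsing rules here are assumptions. */
  static StreamingConfigSketch fromUrl(String jdbcUrl) {
    Map<String, String> params = new HashMap<>();
    String[] parts = jdbcUrl.split(";");
    for (int i = 1; i < parts.length; i++) { // parts[0] is the base URL
      int eq = parts[i].indexOf('=');
      if (eq > 0) {
        params.put(parts[i].substring(0, eq).trim(), parts[i].substring(eq + 1).trim());
      }
    }
    // Defaults taken from the table above: streaming on, window of 3 batches.
    boolean enabled = "1".equals(params.getOrDefault("EnableInlineStreaming", "1"));
    int window = Integer.parseInt(params.getOrDefault("ThriftMaxBatchesInMemory", "3"));
    if (window < 1) {
      throw new IllegalArgumentException("ThriftMaxBatchesInMemory must be at least 1");
    }
    return new StreamingConfigSketch(enabled, window);
  }

  public static void main(String[] args) {
    StreamingConfigSketch cfg =
        fromUrl("jdbc:databricks://example-host:443/default;ThriftMaxBatchesInMemory=5");
    System.out.println(cfg.inlineStreamingEnabled + " " + cfg.maxBatchesInMemory);
  }
}
```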

Key Features

  1. Background Prefetching: Dedicated thread fetches batches ahead of consumption
  2. Sliding Window: Configurable memory limit prevents unbounded memory growth
  3. Type Safety: Generic ThriftStreamingProvider<T> eliminates unsafe casting
  4. Graceful Error Handling:
    • Try-catch around resource cleanup to prevent cascading failures
    • Timeout on batch creation wait to prevent indefinite blocking
  5. Comprehensive Logging: Debug/error logging for troubleshooting
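The first, second, and fourth features can be illustrated together with a bounded queue: a background thread fills it, the bound acts as the sliding window, and the consumer waits with a timeout. This is a minimal sketch using `java.util.concurrent`, not the PR's `ThriftStreamingProvider`; the class `SlidingWindowPrefetcher`, the `Iterator` source, and the timeout value are all assumptions.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical prefetcher: a bounded queue filled by a background thread. */
final class SlidingWindowPrefetcher<T> implements AutoCloseable {
  private static final Object END = new Object(); // end-of-stream marker

  private final BlockingQueue<Object> window;
  private final Thread worker;
  private final long timeoutMillis;
  private volatile RuntimeException failure;

  SlidingWindowPrefetcher(Iterator<T> source, int maxBatchesInMemory, long timeoutMillis) {
    BlockingQueue<Object> queue = new ArrayBlockingQueue<>(maxBatchesInMemory);
    this.window = queue;
    this.timeoutMillis = timeoutMillis;
    this.worker = new Thread(() -> {
      try {
        try {
          while (source.hasNext()) {
            queue.put(source.next()); // blocks while the window is full
          }
        } catch (RuntimeException e) {
          failure = e; // reported to the consumer when it reaches END
        }
        queue.put(END);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // closed early; stop fetching
      }
    }, "prefetch-sketch");
    this.worker.setDaemon(true);
    this.worker.start();
  }

  /** Returns the next batch, or null once the stream has ended. */
  @SuppressWarnings("unchecked")
  T next() {
    Object item;
    try {
      item = window.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a batch", e);
    }
    if (item == null) {
      throw new IllegalStateException("timed out waiting for a batch");
    }
    if (item == END) {
      if (failure != null) {
        throw new IllegalStateException("prefetch failed", failure);
      }
      return null;
    }
    return (T) item;
  }

  @Override
  public void close() {
    worker.interrupt();
  }

  public static void main(String[] args) {
    try (SlidingWindowPrefetcher<Integer> p =
        new SlidingWindowPrefetcher<>(Arrays.asList(1, 2, 3, 4, 5).iterator(), 3, 2000)) {
      Integer batch;
      while ((batch = p.next()) != null) {
        System.out.println("consumed batch " + batch);
      }
    }
  }
}
```

In this sketch the bound applies to queued batches only: the producer may hold one more batch while blocked in `put`, and the consumer holds the one it is reading. A real implementation would need to decide whether those count toward the limit.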

Testing

  • Updated ExecutionResultFactoryTest for new factory logic
  • Updated DatabricksThriftServiceClientTest for CloudFetch control
  • Existing integration tests cover streaming behavior

Usage

Streaming is enabled by default. To disable it and fall back to lazy loading:

jdbc:databricks://host:port/default;EnableInlineStreaming=0;...

To adjust the sliding window size:

jdbc:databricks://host:port/default;ThriftMaxBatchesInMemory=5;...

Copilot AI review requested due to automatic review settings January 19, 2026 11:01
@vikrantpuppala vikrantpuppala force-pushed the non-cloud-latency branch 2 times, most recently from 4c16fd6 to a9bd3cf Compare January 19, 2026 11:05
Copilot AI left a comment

Pull request overview

This PR implements proactive prefetching with a sliding window for Thrift columnar and inline Arrow results to eliminate blocking at batch boundaries. It adds streaming infrastructure with background prefetch threads and configurable memory management.

Changes:

  • Adds new streaming infrastructure with generic type-safe batch providers and processors
  • Introduces two new JDBC parameters: EnableInlineStreaming (default: 1) and ThriftMaxBatchesInMemory (default: 3)
  • Changes IGNORE_TRANSACTIONS default from "0" to "1" (breaking change)

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.

| File | Description |
|---|---|
| DatabricksThriftServiceClient.java | Adds CloudFetch control via isCloudFetchEnabled() |
| DatabricksJdbcUrlParams.java | Adds streaming parameters and changes IGNORE_TRANSACTIONS default |
| IDatabricksConnectionContext.java | Adds interface methods for streaming configuration |
| ThriftBatch.java | New batch container with lifecycle management |
| ThriftBatchFetcher.java, ThriftBatchFetcherImpl.java | New abstraction for fetching batches |
| ThriftBatchProvider.java | Streaming provider with prefetch thread (appears unused/dead code) |
| ThriftStreamingProvider.java | Generic type-safe streaming provider |
| ThriftResponseProcessor.java | Interface for pluggable processors |
| StreamingBatch.java | Generic batch container |
| InlineArrowResponseProcessor.java, ColumnarResponseProcessor.java | Concrete processor implementations |
| StreamingThriftResult.java, StreamingInlineArrowResult.java | Streaming result implementations |
| LazyThriftInlineArrowResult.java | New lazy loading implementation |
| InlineChunkProvider.java | Removes Thrift-based constructor (moved to lazy result) |
| ArrowStreamResult.java | Refactors complex type handling into shared method |
| ExecutionResultFactory.java | Adds factory logic to choose between streaming and lazy |
| DatabricksResultSet.java | Adds metadata handling for LazyThriftInlineArrowResult |
| DatabricksConnectionContext.java | Implements new configuration methods |
| LazyThriftInlineArrowResultTest.java | Comprehensive unit tests for lazy implementation |
| Test files | Updates for API changes and new test coverage |


Implements proactive prefetching with sliding window for both Thrift columnar
and inline Arrow results, eliminating blocking at batch boundaries.

Key components:
- ThriftStreamingProvider<T>: Generic streaming provider with type-safe batches
- StreamingBatch<T>: Type-safe batch container with lifecycle management
- ThriftResponseProcessor<T>: Pluggable processors for Columnar and Arrow
- StreamingColumnarResult: Streaming variant for Thrift columnar results
- StreamingInlineArrowResult: Streaming variant for inline Arrow results

Configuration:
- EnableInlineStreaming: Toggle streaming (default: enabled)
- ThriftMaxBatchesInMemory: Sliding window size (default: 3)

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
- Add null validation for required parameters in ThriftStreamingProvider
- Wrap batchFetcher.close() and batch.release() in try-catch blocks
- Add timeout to waitForBatchCreation to prevent indefinite waiting
- Add logging for error conditions in StreamingInlineArrowResult
- Add logging for error conditions in InlineArrowResponseProcessor
- Extract timeout constant in ExecutionResultFactory
- Update NEXT_CHANGELOG.md to document streaming prefetch feature

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
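The cleanup and timeout items in the commit above follow a common pattern, sketched here with plain JDK types. `CleanupAndTimeoutSketch`, `closeAllQuietly`, and `awaitWithTimeout` are hypothetical names; the PR's own `waitForBatchCreation` and `batch.release()` code is not shown in this conversation, so this only illustrates the general idea.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helpers illustrating guarded cleanup and a bounded wait. */
final class CleanupAndTimeoutSketch {

  /**
   * Closes each resource in its own try-catch so one failure does not
   * prevent the remaining resources from being closed.
   * Returns the number of resources that failed to close.
   */
  static int closeAllQuietly(AutoCloseable... resources) {
    int failures = 0;
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue;
      }
      try {
        resource.close();
      } catch (Exception e) {
        failures++;
        System.err.println("cleanup failed: " + e.getMessage());
      }
    }
    return failures;
  }

  /**
   * Waits for the latch for at most timeoutMillis.
   * Returns false on timeout or interruption instead of blocking forever.
   */
  static boolean awaitWithTimeout(CountDownLatch ready, long timeoutMillis) {
    try {
      return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return false;
    }
  }

  public static void main(String[] args) {
    int failures = closeAllQuietly(
        () -> { throw new IllegalStateException("fetcher close failed"); },
        () -> System.out.println("batch released"));
    System.out.println("failures: " + failures);
  }
}
```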
Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.


Merge latest main branch to incorporate CloudFetch disable feature (PR databricks#1183)

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
- Make cachedSchema volatile for thread visibility in InlineArrowResponseProcessor
- Add null checks for getData() in StreamingInlineArrowResult
- Add null checks for currentBatch and getData() in StreamingColumnarResult

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
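For context on the `volatile` change in the commit above: when a prefetch thread writes a cached value and another thread reads it, a plain field gives no visibility guarantee. The sketch below shows the general pattern only. `SchemaCacheSketch` and its `String` schema are hypothetical, and how `InlineArrowResponseProcessor` actually uses `cachedSchema` is not shown in this conversation.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache showing why the cached field is declared volatile. */
final class SchemaCacheSketch {
  // volatile: a write by one thread is visible to later reads by other threads.
  private volatile String cachedSchema;
  private final AtomicInteger parses = new AtomicInteger();

  /** Returns the cached schema, parsing the raw form only if nothing is cached yet. */
  String schemaFor(String rawSchema) {
    String local = cachedSchema; // single volatile read
    if (local == null) {
      local = parse(rawSchema);
      cachedSchema = local; // volatile write publishes the parsed value
    }
    return local;
  }

  /** Stand-in for an expensive parse; counts invocations for illustration. */
  private String parse(String rawSchema) {
    parses.incrementAndGet();
    return rawSchema.trim().toLowerCase(Locale.ROOT);
  }

  int parseCount() {
    return parses.get();
  }

  public static void main(String[] args) {
    SchemaCacheSketch cache = new SchemaCacheSketch();
    System.out.println(cache.schemaFor(" ID:INT "));
    System.out.println(cache.schemaFor("ignored once cached"));
    System.out.println(cache.parseCount());
  }
}
```

Note that `volatile` alone guarantees visibility, not single execution: two threads racing on an empty cache may both parse. That is harmless when the parse is idempotent, as assumed here.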
Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 7 comments.


Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…cloud-latency

# Conflicts:
#	src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java
#	src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala vikrantpuppala merged commit 1a68271 into databricks:main Jan 22, 2026
13 checks passed