-
Notifications
You must be signed in to change notification settings - Fork 27
feat: Implement streaming prefetch for Thrift inline results #1184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement streaming prefetch for Thrift inline results #1184
Conversation
4c16fd6 to
a9bd3cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements proactive prefetching with sliding window for Thrift columnar and inline Arrow results to eliminate blocking at batch boundaries. The implementation adds a comprehensive streaming infrastructure with background prefetch threads and configurable memory management.
Changes:
- Adds new streaming infrastructure with generic type-safe batch providers and processors
- Introduces two new JDBC parameters: EnableInlineStreaming (default: 1) and ThriftMaxBatchesInMemory (default: 3)
- Changes IGNORE_TRANSACTIONS default from "0" to "1" (breaking change)
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| DatabricksThriftServiceClient.java | Adds CloudFetch control via isCloudFetchEnabled() |
| DatabricksJdbcUrlParams.java | Adds streaming parameters and changes IGNORE_TRANSACTIONS default |
| IDatabricksConnectionContext.java | Adds interface methods for streaming configuration |
| ThriftBatch.java | New batch container with lifecycle management |
| ThriftBatchFetcher.java, ThriftBatchFetcherImpl.java | New abstraction for fetching batches |
| ThriftBatchProvider.java | Streaming provider with prefetch thread (appears unused/dead code) |
| ThriftStreamingProvider.java | Generic type-safe streaming provider |
| ThriftResponseProcessor.java | Interface for pluggable processors |
| StreamingBatch.java | Generic batch container |
| InlineArrowResponseProcessor.java, ColumnarResponseProcessor.java | Concrete processor implementations |
| StreamingThriftResult.java, StreamingInlineArrowResult.java | Streaming result implementations |
| LazyThriftInlineArrowResult.java | New lazy loading implementation |
| InlineChunkProvider.java | Removes Thrift-based constructor (moved to lazy result) |
| ArrowStreamResult.java | Refactors complex type handling into shared method |
| ExecutionResultFactory.java | Adds factory logic to choose between streaming and lazy |
| DatabricksResultSet.java | Adds metadata handling for LazyThriftInlineArrowResult |
| DatabricksConnectionContext.java | Implements new configuration methods |
| LazyThriftInlineArrowResultTest.java | Comprehensive unit tests for lazy implementation |
| Test files | Updates for API changes and new test coverage |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/com/databricks/jdbc/api/impl/thrift/ThriftBatchProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/ExecutionResultFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/ThriftBatchProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/ExecutionResultFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/ThriftBatchProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/ThriftBatchProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/ThriftBatchProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftResponseProcessor.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/ExecutionResultFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/InlineArrowResponseProcessor.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/InlineArrowResponseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/InlineArrowResponseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Outdated
Show resolved
Hide resolved
Implements proactive prefetching with sliding window for both Thrift columnar and inline Arrow results, eliminating blocking at batch boundaries. Key components: - ThriftStreamingProvider<T>: Generic streaming provider with type-safe batches - StreamingBatch<T>: Type-safe batch container with lifecycle management - ThriftResponseProcessor<T>: Pluggable processors for Columnar and Arrow - StreamingColumnarResult: Streaming variant for Thrift columnar results - StreamingInlineArrowResult: Streaming variant for inline Arrow results Configuration: - EnableInlineStreaming: Toggle streaming (default: enabled) - ThriftMaxBatchesInMemory: Sliding window size (default: 3) Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
- Add null validation for required parameters in ThriftStreamingProvider - Wrap batchFetcher.close() and batch.release() in try-catch blocks - Add timeout to waitForBatchCreation to prevent indefinite waiting - Add logging for error conditions in StreamingInlineArrowResult - Add logging for error conditions in InlineArrowResponseProcessor - Extract timeout constant in ExecutionResultFactory - Update NEXT_CHANGELOG.md to document streaming prefetch feature Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
f3fc271 to
db7ad68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/com/databricks/jdbc/api/impl/streaming/InlineArrowResponseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Outdated
Show resolved
Hide resolved
Merge latest main branch to incorporate CloudFetch disable feature (PR databricks#1183) Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
- Make cachedSchema volatile for thread visibility in InlineArrowResponseProcessor - Add null checks for getData() in StreamingInlineArrowResult - Add null checks for currentBatch and getData() in StreamingColumnarResult Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
0a2b5b7 to
8d592e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Show resolved
Hide resolved
src/test/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResultTest.java
Outdated
Show resolved
Hide resolved
src/test/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResultTest.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/api/impl/streaming/ThriftStreamingProvider.java
Show resolved
Hide resolved
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…cloud-latency # Conflicts: # src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java # src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Summary
Implements proactive prefetching with a sliding window for both Thrift columnar and inline Arrow results, eliminating blocking at batch boundaries and improving throughput.
Key Components
New Streaming Infrastructure
ThriftStreamingProvider<T>: Generic type-safe streaming provider with background prefetch thread and configurable sliding windowStreamingBatch<T>: Type-safe batch container with lifecycle management and error handlingThriftResponseProcessor<T>: Interface for pluggable response processorsColumnarResponseProcessor: Processes Thrift columnar resultsInlineArrowResponseProcessor: Processes inline Arrow results with schema cachingResult Implementations
StreamingInlineArrowResult: High-throughput streaming implementation for inline Arrow results with background prefetchingStreamingColumnarResult: Streaming implementation for Thrift columnar results with prefetchSupporting Classes
ThriftBatchFetcher/ThriftBatchFetcherImpl: Abstraction for fetching batches from the Thrift serverConfiguration
EnableInlineStreaming1(enabled)ThriftMaxBatchesInMemory3Key Features
ThriftStreamingProvider<T>eliminates unsafe castingTesting
ExecutionResultFactoryTestfor new factory logicDatabricksThriftServiceClientTestfor CloudFetch controlUsage
Streaming is enabled by default. To disable and use lazy loading instead:
To adjust the sliding window size: