@andygrove andygrove commented Feb 10, 2026

This PR depends on #3446, which needs to be merged first.

Rationale

BatchReader is annotated @IcebergApi and must be kept for Iceberg compatibility, but Comet's own production code no longer uses it. The prefetch feature was entirely built on BatchReader and is dead code now that the native_iceberg_compat path uses NativeBatchReader. I checked the Iceberg source and confirmed that the prefetch feature was not used there either. Comet also no longer accelerates V2 Parquet scans, so CometParquetScan and CometParquetPartitionReaderFactory are dead code.

Summary

  • Marks BatchReader as @Deprecated (since 0.14.0) — kept for Iceberg compatibility via @IcebergApi
  • Removes all prefetch internals from BatchReader (fields, methods, PrefetchTask inner class)
  • Removes COMET_SCAN_PREFETCH_ENABLED and COMET_SCAN_PREFETCH_THREAD_NUM configs
  • Removes CometPrefetchThreadPool
  • Deletes CometParquetPartitionReaderFactory and CometParquetScan (V2 Parquet scan dead code)
  • Simplifies CometScanExec.prepareRDD to always use newFileScanRDD
  • Cleans up EliminateRedundantTransitions V2 dead code path
  • Removes prefetch tests and BatchReader benchmark case
  • Cleans up CometParquetScan references in tests
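As a rough illustration of the first bullet, the sketch below shows how a class can be marked `@Deprecated(since = "0.14.0")` and still carry a compatibility marker annotation. It is not the actual Comet source: `LegacyReader` and the locally declared `IcebergApi` annotation are hypothetical stand-ins, and the retention and target chosen for the marker are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class DeprecationSketch {

    // Hypothetical stand-in for Comet's @IcebergApi marker annotation.
    // RUNTIME retention is an assumption made so the check below can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface IcebergApi {}

    // Hypothetical stand-in for BatchReader: deprecated for internal use,
    // but retained because an external project still depends on it.
    @Deprecated(since = "0.14.0")
    @IcebergApi
    static class LegacyReader {
        int nextBatch() {
            return 0; // placeholder body; the real reader logic is not shown here
        }
    }

    // Returns the version recorded in the @Deprecated annotation.
    static String deprecatedSince() {
        Deprecated d = LegacyReader.class.getAnnotation(Deprecated.class);
        return d.since();
    }

    // Reports whether the class still carries the compatibility marker.
    static boolean keptForIceberg() {
        return LegacyReader.class.isAnnotationPresent(IcebergApi.class);
    }

    public static void main(String[] args) {
        System.out.println("deprecated since: " + deprecatedSince());
        System.out.println("kept for Iceberg: " + keptForIceberg());
    }
}
```

The `since` element of `@Deprecated` requires Java 9 or later. On older language levels, a plain `@Deprecated` plus a Javadoc `@deprecated` tag would be the usual alternative.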

🤖 Generated with Claude Code

@andygrove andygrove changed the title Remove prefetch feature, V2 scan dead code, and deprecate BatchReader chore: Remove prefetch feature, V2 scan dead code, and deprecate BatchReader Feb 10, 2026
@andygrove andygrove changed the title chore: Remove prefetch feature, V2 scan dead code, and deprecate BatchReader chore: Remove all remaining uses of BatchReader from Comet Feb 10, 2026
@andygrove andygrove changed the title chore: Remove all remaining uses of BatchReader from Comet chore: Remove all remaining uses of legacy BatchReader from Comet Feb 10, 2026
BatchReader is annotated @IcebergApi and kept for Iceberg compatibility,
but Comet's own production code no longer uses it. The prefetch feature
was entirely built on BatchReader and is dead code now that the
native_iceberg_compat path uses NativeBatchReader. V2 Parquet scan
acceleration (CometParquetScan) is also no longer active.

This commit:
- Marks BatchReader as @Deprecated (since 0.14.0)
- Removes all prefetch internals from BatchReader (fields, methods, inner class)
- Removes COMET_SCAN_PREFETCH_ENABLED and COMET_SCAN_PREFETCH_THREAD_NUM configs
- Removes CometPrefetchThreadPool
- Deletes CometParquetPartitionReaderFactory and CometParquetScan
- Simplifies CometScanExec.prepareRDD to always use newFileScanRDD
- Removes dead BatchReader code path from CometParquetFileFormat
- Cleans up EliminateRedundantTransitions V2 dead code path
- Removes prefetch tests, BatchReader-only tests, and BatchReader benchmark case
- Cleans up CometParquetScan references in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>