[paimon-common] Fix ApplyBitmapIndexRecordReader to stop readBatch after bitmap selection is exhausted. by wwj6591812 · Pull Request #7994 · apache/paimon

wwj6591812 · 2026-05-27T10:22:53Z

Problem

When file-index.read.enabled=true (default) and a query has only LIMIT N (no filter), Paimon pushes the limit down as a bitmap selection and wraps the ORC reader with ApplyBitmapIndexRecordReader.

After row N, ApplyBitmapIndexFileRecordIterator.next() returns null because position > last. However, RecordReaderIterator treats iterator null as batch exhaustion, not reader exhaustion, and keeps calling readBatch() until EOF. This causes unnecessary full-file ORC I/O even though only N rows are needed.

This affects any caller using RecordReader.toCloseableIterator() or forEachRemaining(), not only Flink.

Related Flink-side fix: #7991
That PR stops the dedicated split path from calling hasNext() after the limit is reached in ReadOperator. This PR fixes the root cause in the core reader layer and benefits all read paths.

Root Cause

ApplyBitmapIndexFileRecordIterator stops yielding rows once position > last, but ApplyBitmapIndexRecordReader.readBatch() had no global exhausted state. So RecordReaderIterator.advanceIfNeeded() interpreted the null as "current batch finished, open the next batch" and looped until EOF.

Fix

Track bitmap selection exhaustion in ApplyBitmapIndexRecordReader with an AtomicBoolean:

When ApplyBitmapIndexFileRecordIterator sees position > last, it sets exhausted = true.
Subsequent readBatch() calls return null immediately, so RecordReaderIterator stops without scanning the rest of the file.

Minimal change: only ApplyBitmapIndexRecordReader and ApplyBitmapIndexFileRecordIterator are modified.

Relationship to #7991

PR	Layer	Scope
#7991	Flink `ReadOperator`	Dedicated split path; stops extra `hasNext()` after limit
This PR	Core `ApplyBitmapIndexRecordReader`	All callers of `toCloseableIterator()` / `forEachRemaining()`

The two PRs are independent and can be reviewed/merged separately. Either one fixes the reported LIMIT N performance issue on the dedicated path; together they provide defense in depth.

Testing

ApplyBitmapIndexRecordReaderTest — mock reader with 100 rows (batch size 20), limit 10 → 10 rows returned, underlying readBatch called once; covers toCloseableIterator() and forEachRemaining(); sparse bitmap case
AppendOnlySimpleTableTest.testLimitWithCloseableIterator — 5000-row Parquet table, limit 10 via toCloseableIterator()
BatchFileStoreITCase.testDedicatedPathLimitTenOnManyRows — 100 rows INSERT, dedicated split + LIMIT 10 → 10 rows

mvn test -pl paimon-common -Dtest=ApplyBitmapIndexRecordReaderTest

mvn test -pl paimon-core -Dtest=AppendOnlySimpleTableTest#testLimitWithCloseableIterator

mvn test -pl paimon-flink/paimon-flink-common \
  -Dtest=BatchFileStoreITCase#testDedicatedPathLimitTenOnManyRows

leaves12138

Thanks for the fix. I think the current exhausted flag does not cover the real RawFileSplitRead path yet.

In RawFileSplitRead#createFileReader, the same BitmapIndexResult is first passed into FormatReaderContext and then DataFileRecordReader#readBatchInternal applies it again via iterator.selection(selection) before the reader is wrapped by ApplyBitmapIndexRecordReader. Because of that inner selection, the wrapped iterator can return null as soon as the bitmap iterator is exhausted, without ever exposing a row whose returnedPosition() > last to ApplyBitmapIndexFileRecordIterator. In that case exhausted remains false, and RecordReaderIterator still keeps calling readBatch() until EOF.

The new unit test does not model this because its underlying reader returns unfiltered rows. I added a local variant where CountingFileRecordReader#readBatch() returns batch.selection(selection), which is equivalent to the DataFileRecordReader behavior above. With totalRows = 100, batchSize = 20, and limit = 10, the result rows are correct, but the underlying readBatchCount() is 6 instead of 1.

A minimal fix could be to mark the reader exhausted when returning the last selected row, for example when bitmap.contains(position) and position >= last, or otherwise avoid/apply only one layer of bitmap selection and add a test that simulates the already-selected underlying iterator.

…ter bitmap selection is exhausted.

wwj6591812 · 2026-05-29T01:43:39Z

Thanks for the fix. I think the current exhausted flag does not cover the real RawFileSplitRead path yet.

In RawFileSplitRead#createFileReader, the same BitmapIndexResult is first passed into FormatReaderContext and then DataFileRecordReader#readBatchInternal applies it again via iterator.selection(selection) before the reader is wrapped by ApplyBitmapIndexRecordReader. Because of that inner selection, the wrapped iterator can return null as soon as the bitmap iterator is exhausted, without ever exposing a row whose returnedPosition() > last to ApplyBitmapIndexFileRecordIterator. In that case exhausted remains false, and RecordReaderIterator still keeps calling readBatch() until EOF.

The new unit test does not model this because its underlying reader returns unfiltered rows. I added a local variant where CountingFileRecordReader#readBatch() returns batch.selection(selection), which is equivalent to the DataFileRecordReader behavior above. With totalRows = 100, batchSize = 20, and limit = 10, the result rows are correct, but the underlying readBatchCount() is 6 instead of 1.

A minimal fix could be to mark the reader exhausted when returning the last selected row, for example when bitmap.contains(position) and position >= last, or otherwise avoid/apply only one layer of bitmap selection and add a test that simulates the already-selected underlying iterator.

Thanks again for the careful review and for pointing out the gap in the real RawFileSplitRead path. Your feedback was very helpful.

You were right that our original exhausted flag did not cover the case where the underlying iterator is already selection-filtered in DataFileRecordReader. In that path, the wrapped iterator can return null once the selected rows in the current batch are consumed, without ever reaching position > last, so readBatch() could still continue until EOF.

We have updated the fix accordingly:

We still keep the original reader-level exhausted guard in ApplyBitmapIndexRecordReader.
In ApplyBitmapIndexFileRecordIterator, we now also mark the reader exhausted when returning the last selected row (position >= last), in addition to the existing position > last handling.
We added a unit test where the underlying reader returns batch.selection(selection), matching the DataFileRecordReader behavior you described. With totalRows = 100, batchSize = 20, and limit = 10, the result is still 10 rows, and readBatch() is now called only once.

We really appreciate you taking the time to review this in detail. Could you please take another look when convenient? Thx.

@leaves12138

wwj6591812 mentioned this pull request May 27, 2026

[Flink] Fix ReadOperator to stop reading after LIMIT on dedicated split path. #7991

Open

leaves12138 requested changes May 28, 2026

View reviewed changes

[paimon-common] Fix ApplyBitmapIndexRecordReader to stop readBatch af…

6e7bb2d

…ter bitmap selection is exhausted.

wwj6591812 force-pushed the fix_bug_of_ApplyBitmapIndexRecordReader_0527 branch from 8b47063 to 6e7bb2d Compare May 29, 2026 00:50

wwj6591812 requested a review from leaves12138 May 29, 2026 03:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[paimon-common] Fix ApplyBitmapIndexRecordReader to stop readBatch after bitmap selection is exhausted.#7994

[paimon-common] Fix ApplyBitmapIndexRecordReader to stop readBatch after bitmap selection is exhausted.#7994
wwj6591812 wants to merge 1 commit into
apache:masterfrom
wwj6591812:fix_bug_of_ApplyBitmapIndexRecordReader_0527

wwj6591812 commented May 27, 2026

Uh oh!

leaves12138 left a comment

Uh oh!

wwj6591812 commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wwj6591812 commented May 27, 2026

Problem

Root Cause

Fix

Relationship to #7991

Testing

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

wwj6591812 commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants