Skip to content

[FLINK-38539][table] Fix integer overflow in AbstractBytesMultiMap wh…#27510

Open
nateab wants to merge 2 commits intoapache:masterfrom
nateab:FLINK-38539-fix-v2
Open

[FLINK-38539][table] Fix integer overflow in AbstractBytesMultiMap wh…#27510
nateab wants to merge 2 commits intoapache:masterfrom
nateab:FLINK-38539-fix-v2

Conversation

@nateab
Copy link
Copy Markdown
Contributor

@nateab nateab commented Feb 3, 2026

…en data exceeds 2GB

What is the purpose of the change

This pull request fixes an IndexOutOfBoundsException in AbstractBytesMultiMap that occurs when window aggregate queries (using TUMBLE TVF with LAST_VALUE aggregations) process large amounts of data with the
memory state backend. When the value/key area exceeds ~2GB (Integer.MAX_VALUE bytes), casting the long offset to int overflows to a negative value, causing invalid array access.

Brief change log

  • Added bounds checking in writePointer() to validate offset before casting long to int
  • Added bounds checking in skipPointer() to validate offset before casting long to int
  • Added bounds checking in appendRecord() to validate key area offset before casting
  • Removed redundant post-cast checks in appendValue() and appendRecord() since writePointer() now handles validation
  • Ensures EOFException is thrown with a clear warning message before any overflow can occur

Verifying this change

This change added tests and can be verified as follows:

  • Added testThrowsEOFExceptionWhenFull() test to BytesMultiMapTestBase that verifies EOFException is thrown when the map exceeds capacity (not IndexOutOfBoundsException which occurred before the fix)
  • Existing tests (BytesMultiMapTest, WindowBytesMultiMapTest) continue to pass, verifying the fix doesn't break normal operation

The fix adds validation that throws EOFException earlier in the code path. The existing EOFException handling in RecordsWindowBuffer.addElement() already properly handles this case by flushing the buffer and
retrying.

##Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no (only adds a comparison before existing cast)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Copy link
Copy Markdown
Collaborator

flinkbot commented Feb 3, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@nateab nateab force-pushed the FLINK-38539-fix-v2 branch 2 times, most recently from e235731 to cbcb3fa Compare February 3, 2026 09:30
* handling path works correctly when the map is filled to capacity.
*/
@Test
void testThrowsEOFExceptionWhenFull() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test with exactly the maximum, one under and one over (or as close as is feasible with the 4 byte pointer) to make sure that the boundary is exactly policed as expected please.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I've added flink-table/flink-table-runtime/src/test/java/org/apache/flink/table/runtime/util/collections/binary/AbstractBytesMultiMapBoundaryTest.java to test this now!

…en data exceeds 2GB

This fixes an IndexOutOfBoundsException in AbstractBytesMultiMap that occurs when
window aggregate queries process large amounts of data with the memory state backend.
When the key/value area exceeds ~2GB (Integer.MAX_VALUE bytes), casting the long
offset to int overflows to a negative value, causing invalid array access.

The fix adds bounds checking in writePointer(), skipPointer(), and appendRecord()
to validate offsets before casting long to int, ensuring EOFException is thrown
with a clear warning message before any overflow can occur.

Changes:
- Added bounds checking in writePointer() to validate offset before casting
- Added bounds checking in skipPointer() to validate offset before casting
- Added bounds checking in appendRecord() to validate key area offset
- Added AbstractBytesMultiMapBoundaryTest with boundary tests at Integer.MAX_VALUE:
  - Test at MAX - 4 (should succeed, pointer fits)
  - Test at MAX - 3 with alignment skip (should succeed)
  - Test at exactly MAX (should succeed, boundary is >)
  - Test at MAX + 1 (should throw EOFException)
  - Test at 2*MAX (should throw EOFException)
- Added testThrowsEOFExceptionWhenFull() to BytesMultiMapTestBase
@nateab nateab force-pushed the FLINK-38539-fix-v2 branch from cbcb3fa to 2ff9b1e Compare February 3, 2026 11:04
@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Feb 3, 2026
@nateab nateab requested a review from davidradl February 5, 2026 20:40
@nateab nateab marked this pull request as draft February 16, 2026 21:24
… reflection

The previous test replicated the overflow check logic locally instead
of calling the actual writePointer/skipPointer methods, which meant it
passed regardless of whether the production fix was present. Rewrite
to use reflection to invoke the real AbstractBytesMultiMap methods,
properly verifying the overflow protection works.
@nateab nateab marked this pull request as ready for review February 16, 2026 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants