[core] Fix OOM when writing/compacting table with large records#7621
Open
yugan95 wants to merge 1 commit into apache:master from
Purpose
Linked issue: close #7620
Fix OOM when writing a table with large records (100MB+) and many buckets (e.g. 256), caused by unbounded buffer growth in the sort, merge, and compaction paths. Each bucket's writer independently holds its own sort buffer, merge channels, and compaction readers. When a large record inflates an internal reuse buffer, that bloated buffer is retained per bucket, so memory usage quickly exceeds the available heap.
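To make the failure mode concrete, here is a minimal standalone sketch (a hypothetical class, not Paimon code) of a per-bucket reuse buffer that grows to fit the largest record seen and is never shrunk, with sizes scaled down so it runs in a small heap:

```java
// Hypothetical sketch (not Paimon code) of the per-bucket failure mode:
// each writer keeps a reuse buffer that grows to fit the largest record
// it has seen and is never shrunk afterwards.
public class GrowOnlyReuseBuffer {
    private byte[] buffer = new byte[1024];

    // Grow-only: in the real code, reset() only rewinds a cursor,
    // so the inflated capacity sticks for the lifetime of the writer.
    public void write(byte[] record) {
        if (record.length > buffer.length) {
            buffer = new byte[record.length];
        }
        System.arraycopy(record, 0, buffer, 0, record.length);
    }

    public int retainedBytes() {
        return buffer.length;
    }

    // Total bytes pinned after one large record per bucket, followed by
    // small records that release nothing.
    public static long retainedAcrossBuckets(int buckets, int largeRecordSize) {
        long total = 0;
        for (int i = 0; i < buckets; i++) {
            GrowOnlyReuseBuffer writer = new GrowOnlyReuseBuffer();
            writer.write(new byte[largeRecordSize]); // one large record inflates the buffer
            writer.write(new byte[128]);             // later small records reuse it as-is
            total += writer.retainedBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        // Scaled down: 16 buckets and 1MB records. The PR's scenario is
        // 256 buckets and 100MB+ records, i.e. roughly 25GB of pinned buffers.
        System.out.println(retainedAcrossBuckets(16, 1 << 20)); // 16777216
    }
}
```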
Heap dump analysis identified four independent root causes:
1. Sort path — `RowHelper` internal buffer never shrinks

   `RowHelper.reuseWriter` grows its internal `MemorySegment` list for large records, but `BinaryRowWriter.reset()` only resets the cursor without releasing oversized segments. Additionally, `InternalRowSerializer.serialize()` can exit via `EOFException` (a normal signal when the sort buffer is full), skipping any cleanup of the bloated buffer.

2. Merge path — `BinaryRowSerializer.deserialize(reuse)` only grows, never shrinks

   Each merge channel holds a `BinaryRow` reuse instance. When a large record is deserialized, the backing `MemorySegment` grows to fit it but is never shrunk for subsequent small records. With `max-num-file-handles` (default 128) channels each retaining a 100MB+ buffer, memory usage explodes.

3. Compaction read path — `HeapBytesVector.reserveBytes()` integer overflow

   `reserveBytes()` computes `newCapacity * 2` using plain multiplication. When `newCapacity` exceeds ~1.07 billion bytes, this overflows `Integer.MAX_VALUE`, causing `NegativeArraySizeException` or silent data corruption.

4. Parquet write — statistics and page-size-check config not passed through

   `RowDataParquetBuilder` does not pass through `parquet.statistics.truncate.length`, `parquet.columnindex.truncate.length`, `parquet.page.size.row.check.min`, and `parquet.page.size.row.check.max`. Without these, users cannot tune Parquet behavior for large-record scenarios, leading to multi-GB pages and bloated footers.
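Root cause 3 is plain int arithmetic. A small standalone sketch (illustrative names, not the actual `HeapBytesVector` code) shows the wrap and the capped growth; note that `<< 1` wraps exactly like `* 2`, so the real protection is detecting the wrap and clamping:

```java
// Illustrative sketch of root cause 3 and its fix; names are hypothetical,
// not the actual HeapBytesVector implementation.
public class SafeGrowth {
    // Largest array size most JVMs will allocate.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Buggy: once newCapacity passes ~1.07 billion, doubling wraps negative,
    // and `new byte[negative]` throws NegativeArraySizeException.
    static int buggyNextCapacity(int newCapacity) {
        return newCapacity * 2;
    }

    // Fixed: double with a shift, detect the wrap, clamp to the largest
    // allocatable array, and reject requests that can never be satisfied.
    static int safeNextCapacity(int newCapacity) {
        if (newCapacity < 0 || newCapacity > MAX_ARRAY_SIZE) {
            throw new OutOfMemoryError("Requested capacity too large: " + newCapacity);
        }
        int doubled = newCapacity << 1; // may wrap negative, checked below
        if (doubled < 0 || doubled > MAX_ARRAY_SIZE) {
            return MAX_ARRAY_SIZE; // cap instead of overflowing
        }
        return doubled;
    }

    public static void main(String[] args) {
        System.out.println(buggyNextCapacity(1_200_000_000)); // -1894967296 (wrapped)
        System.out.println(safeNextCapacity(1_200_000_000));  // 2147483639 (capped)
        System.out.println(safeNextCapacity(1_000));          // 2000 (normal doubling)
    }
}
```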
Changes

- `RowHelper`: add `resetIfTooLarge()` — release internal buffer when segments exceed 4MB
- `InternalRowSerializer`: call `resetIfTooLarge()` in `finally` block of `serialize()` and `serializeToPages()` to handle `EOFException` exit path
- `BinaryRowSerializer`: add shrink logic in `deserialize(reuse)` — reallocate when existing buffer > 4MB threshold
- `HeapBytesVector`: use bit-shift (`<< 1`) instead of `* 2`, cap at `MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8`, throw clear error on overflow
- `RowDataParquetBuilder`: pass through `statistics.truncate.length`, `columnindex.truncate.length`, `min-row-count-for-page-size-check`, `max-row-count-for-page-size-check` from config
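The buffer-related changes share one policy: keep reusing a buffer while it is small, release it once it crosses a threshold. A hedged sketch of that policy follows (hypothetical class using the PR's 4MB threshold, not the actual Paimon code):

```java
// Hypothetical sketch of the shrink-on-reset policy described above;
// the 4MB threshold matches the PR, the class itself is illustrative.
public class ShrinkableReuseBuffer {
    public static final int SHRINK_THRESHOLD = 4 * 1024 * 1024; // 4MB, as in the PR
    public static final int DEFAULT_SIZE = 64 * 1024;

    private byte[] buffer = new byte[DEFAULT_SIZE];

    // Grow on demand, like the existing reuse buffers.
    public byte[] borrow(int requiredSize) {
        if (requiredSize > buffer.length) {
            buffer = new byte[requiredSize];
        }
        return buffer;
    }

    // Analogue of resetIfTooLarge(): release oversized buffers, keep small ones.
    public void resetIfTooLarge() {
        if (buffer.length > SHRINK_THRESHOLD) {
            buffer = new byte[DEFAULT_SIZE];
        }
    }

    public int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        ShrinkableReuseBuffer reuse = new ShrinkableReuseBuffer();
        reuse.borrow(100 * 1024 * 1024);  // a 100MB record inflates the buffer
        reuse.resetIfTooLarge();          // oversized: released
        System.out.println(reuse.capacity()); // 65536, back to DEFAULT_SIZE
        reuse.borrow(1024 * 1024);        // 1MB stays under the threshold
        reuse.resetIfTooLarge();          // small enough: kept for reuse
        System.out.println(reuse.capacity()); // 1048576, retained
    }
}
```

Calling the release in a `finally` block (as the PR does in `InternalRowSerializer`) matters because the `EOFException` exit path, a normal full-sort-buffer signal, would otherwise skip it.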
Tests

- `RowHelperTest` — validates `resetIfTooLarge()` releases oversized buffers (> 4MB) and preserves small ones
- `BinaryRowSerializerShrinkTest` — validates `deserialize(reuse)` shrinks oversized buffers and preserves small ones
- `HeapBytesVectorReserveBytesTest` — validates overflow-safe `reserveBytes()` growth and data correctness

API and Format
N/A — no public API or format changes.
Documentation
N/A