branch-4.1: [fix](mc) fix memory leak and optimize large data write for MaxCompute connector (#61245) #61745

Merged
morningman merged 1 commit into apache:branch-4.1 from morningman:41_bp61245
Mar 27, 2026

@morningman
Contributor

bp #61245

…e connector (apache#61245)

### What problem does this PR solve?

Fix:
- Fix potential memory leak in MaxComputeJniScanner by closing
  currentSplitReader in close().
- Fix potential memory leak in MaxComputeJniWriter by restructuring
  close() with try-finally to ensure allocator is always closed even
  when batchWriter.commit() throws. Also close VectorSchemaRoot after
  each batch write.
- Fix maxWriteBatchRows parameter key mismatch between BE
  ("max_write_batch_rows") and JNI ("mc.max_write_batch_rows"),
  which caused user-customized values to be silently ignored.
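The two leak fixes above follow a common resource-handling pattern, sketched here in plain Java. This is a minimal illustration under stated assumptions, not the connector's actual code: `WriterCloseSketch`, `Step`, and the `closed` log are hypothetical names, and the lambdas stand in for the real batch writer, `VectorSchemaRoot`, and allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JNI writer; not the real MaxComputeJniWriter.
final class WriterCloseSketch implements AutoCloseable {
    /** A unit of work that may fail, e.g. a batch write or a commit. */
    interface Step { void run() throws Exception; }

    /** Records which resources were released, in order. */
    final List<String> closed = new ArrayList<>();
    private final Step commit;

    WriterCloseSketch(Step commit) { this.commit = commit; }

    /** The per-batch root is released after every write, even if the write fails. */
    void writeBatch(Step write) throws Exception {
        try (AutoCloseable root = () -> closed.add("root")) {
            write.run();
        }
    }

    /** The allocator is released in finally, so a failing commit cannot skip it. */
    @Override
    public void close() throws Exception {
        try {
            commit.run();
        } finally {
            closed.add("allocator");
        }
    }
}
```

With a commit step that throws, the exception still reaches the caller, but `closed` ends up holding both `root` and `allocator`, which is the property the fix is after.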

Optimization:
- Split large Arrow batches into smaller chunks (controlled by
  mc.max_write_batch_rows, default 4096) to avoid HTTP 413 Request
  Entity Too Large errors from MaxCompute Storage API.
- Skip unnecessary SORT node for static partition INSERT, since all
  data goes to a single known partition and no dynamic routing is
  needed.
- Enable ZSTD compression for Arrow data transfer to reduce network
  bandwidth.
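The batch splitting described above comes down to cutting a row count into ranges no larger than the configured limit. The sketch below shows only that arithmetic; `BatchSplitSketch` and `split` are hypothetical names, and how each range is turned into an Arrow batch and sent is left out because the PR text does not describe it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates the chunking arithmetic only.
final class BatchSplitSketch {
    /**
     * Returns {start, length} pairs that cover rows [0, rowCount),
     * each with length at most maxRows.
     */
    static List<int[]> split(int rowCount, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        while (start < rowCount) {
            int length = Math.min(maxRows, rowCount - start);
            ranges.add(new int[] {start, length});
            start += length; // never exceeds rowCount, so no overflow
        }
        return ranges;
    }
}
```

For example, 10000 rows with the default limit of 4096 give three ranges: two of 4096 rows and a final one of 1808 rows.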

New catalog properties:
- mc.max_write_batch_rows: max rows per Arrow batch for write
  (default: 4096)
- mc.max_field_size_bytes: max field size in bytes for write session
  (default: 8MB)
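A rough sketch of how these two properties might be read with their defaults, and of why the key mismatch listed under Fix was silent: a lookup under the wrong key simply falls back to the default. `WriteParamsSketch` and its methods are hypothetical, and treating "8MB" as 8 * 1024 * 1024 bytes is an assumption.

```java
import java.util.Map;

// Hypothetical parameter reader; key names and defaults come from the PR text.
final class WriteParamsSketch {
    static final String MAX_WRITE_BATCH_ROWS = "mc.max_write_batch_rows";
    static final String MAX_FIELD_SIZE_BYTES = "mc.max_field_size_bytes";
    static final int DEFAULT_MAX_WRITE_BATCH_ROWS = 4096;
    // Assumption: "8MB" is read as 8 MiB.
    static final long DEFAULT_MAX_FIELD_SIZE_BYTES = 8L * 1024 * 1024;

    static int maxWriteBatchRows(Map<String, String> params) {
        String raw = params.get(MAX_WRITE_BATCH_ROWS);
        return raw == null ? DEFAULT_MAX_WRITE_BATCH_ROWS : Integer.parseInt(raw.trim());
    }

    static long maxFieldSizeBytes(Map<String, String> params) {
        String raw = params.get(MAX_FIELD_SIZE_BYTES);
        return raw == null ? DEFAULT_MAX_FIELD_SIZE_BYTES : Long.parseLong(raw.trim());
    }
}
```

If the sender uses the unprefixed key `max_write_batch_rows`, `maxWriteBatchRows` sees no entry under `mc.max_write_batch_rows` and returns 4096, regardless of what the user configured.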

Co-authored-by: daidai <changyuwei@selectdb.com>

morningman requested a review from yiguolei as a code owner on March 26, 2026, 03:54
@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 78.44% (1786/2277) |
| Line Coverage | 64.24% (32026/49856) |
| Region Coverage | 65.05% (16014/24619) |
| Branch Coverage | 55.58% (8528/15344) |

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

github-actions bot added the approved (Indicates a PR has been approved by one committer.) and reviewed labels on Mar 26, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

morningman changed the title from "[fix](mc) fix memory leak and optimize large data write for MaxCompute connector (#61245)" to "branch-4.1: [fix](mc) fix memory leak and optimize large data write for MaxCompute connector (#61245)" on Mar 26, 2026
@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/27) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.84% (19587/37069) |
| Line Coverage | 36.13% (182504/505092) |
| Region Coverage | 32.45% (141040/434601) |
| Branch Coverage | 33.65% (61906/183954) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/27) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.02% (25764/36277) |
| Line Coverage | 53.73% (270443/503379) |
| Region Coverage | 51.18% (224445/438572) |
| Branch Coverage | 52.64% (97115/184480) |

morningman merged commit 1ba5519 into apache:branch-4.1 on Mar 27, 2026
26 of 30 checks passed