feat(write): add write pipeline with DataFusion INSERT INTO/OVERWRITE support#234

Open
JingsongLi wants to merge 2 commits into apache:main from JingsongLi:writer

Conversation

@JingsongLi (Contributor)

Purpose

Subtask of #232

Add TableWrite for writing Arrow RecordBatches to Paimon append-only tables. Each (partition, bucket) pair gets its own DataFileWriter with direct writes (matching delta-rs DeltaWriter pattern). File rolling uses tokio::spawn for background close, and prepare_commit uses try_join_all for parallel finalization across partition writers.

Key components:

  • TableWrite: routes batches by partition/bucket, holds DataFileWriters
  • DataFileWriter: manages parquet file lifecycle with rolling support
  • WriteBuilder: creates TableWrite and TableCommit instances
  • PaimonDataSink: DataFusion DataSink integration for INSERT/OVERWRITE
  • FormatFileWriter: extended with flush() and in_progress_size()
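The routing described above (one writer per (partition, bucket) pair, created lazily) can be sketched as follows. This is a minimal, self-contained simplification; `DataFileWriter` here is a stand-in counter, not the real parquet-backed writer, and the field and method names are illustrative only.

```rust
use std::collections::HashMap;

// Stand-in for the real parquet-backed writer: just counts rows.
#[derive(Default)]
struct DataFileWriter {
    rows_written: usize,
}

// Simplified TableWrite: routes each batch to the writer owned by
// its (partition, bucket) pair, creating the writer on first use.
#[derive(Default)]
struct TableWrite {
    writers: HashMap<(String, u32), DataFileWriter>,
}

impl TableWrite {
    fn write(&mut self, partition: &str, bucket: u32, num_rows: usize) {
        let writer = self
            .writers
            .entry((partition.to_string(), bucket))
            .or_default();
        writer.rows_written += num_rows;
    }
}

fn main() {
    let mut tw = TableWrite::default();
    tw.write("dt=2026-04-11", 0, 100);
    tw.write("dt=2026-04-11", 1, 50);
    tw.write("dt=2026-04-11", 0, 25);
    // Two distinct (partition, bucket) pairs -> two writers.
    assert_eq!(tw.writers.len(), 2);
    assert_eq!(tw.writers[&("dt=2026-04-11".to_string(), 0)].rows_written, 125);
}
```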

Configurable options via CoreOptions:

  • file.compression (default: zstd)
  • target-file-size (default: 256MB)
  • write.parquet-buffer-size (default: 256MB)
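A hedged sketch of how options like these could be resolved with their defaults. The keys come from the list above; the `HashMap`-based options store and the parsing helper are illustrative assumptions, not the actual CoreOptions API.

```rust
use std::collections::HashMap;

// Illustrative helper: read a byte-size option, falling back to a default
// when the key is absent or unparseable.
fn size_option(options: &HashMap<String, String>, key: &str, default: u64) -> u64 {
    options.get(key).and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    let options: HashMap<String, String> = HashMap::new();

    // With no overrides, the documented defaults apply.
    let compression = options
        .get("file.compression")
        .map(String::as_str)
        .unwrap_or("zstd");
    let target_file_size = size_option(&options, "target-file-size", 256 * 1024 * 1024);

    assert_eq!(compression, "zstd");
    assert_eq!(target_file_size, 268_435_456); // 256MB
}
```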

Includes E2E integration tests for unpartitioned, partitioned, fixed-bucket, multi-commit, column projection, and bucket filtering.

Brief change log

Tests

API and Format

Documentation

let row = BinaryRow::from_serialized_bytes(&msg.partition)?;
let mut spec = HashMap::new();
for (i, key) in partition_keys.iter().enumerate() {
    if let Some(datum) = extract_datum(&row, i, &data_types[i])? {
@littlecoder04 (Contributor), Apr 11, 2026
This will drop NULL partition keys from the overwrite predicate. I reproduced a case where overwriting the NULL partition also deletes other partitions.
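A minimal sketch of the fix this comment suggests: keep a NULL partition value in the spec as `None` instead of skipping the key, so the overwrite predicate can still target the NULL partition explicitly. `build_spec` and its `Option`-valued inputs are hypothetical simplifications of the real extractor.

```rust
use std::collections::HashMap;

// Build the partition spec for an overwrite, preserving NULL values.
// `values` stands in for the results of the real extract_datum calls.
fn build_spec(
    partition_keys: &[&str],
    values: &[Option<String>],
) -> HashMap<String, Option<String>> {
    let mut spec = HashMap::new();
    for (key, value) in partition_keys.iter().zip(values) {
        // A NULL value stays in the map as None rather than being dropped,
        // so the predicate still constrains this partition key.
        spec.insert(key.to_string(), value.clone());
    }
    spec
}

fn main() {
    let spec = build_spec(&["dt"], &[None]);
    // The NULL partition is still addressable by the overwrite predicate.
    assert_eq!(spec.get("dt"), Some(&None));
    assert_eq!(spec.len(), 1);
}
```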

@JingsongLi (Contributor, Author)
Good catch.

let datum = extract_datum_from_arrow(batch, row_idx, field_idx, field.data_type())?;
if let Some(d) = datum {
    datums.push((d, field.data_type().clone()));
}
Contributor
This will drop NULL bucket-key fields before hashing. Java preserves NULL positions here; see FixedBucketRowKeyExtractorTest.testUnCompactDecimalAndTimestampNullValueBucketNumber.
https://github.com/apache/paimon/blob/master/paimon-core/src/test/java/org/apache/paimon/table/sink/FixedBucketRowKeyExtractorTest.java
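The NULL-preserving behavior the comment asks for can be sketched like this: hash the `Option` value itself so a NULL field still occupies its position in the hash input instead of being skipped. `DefaultHasher` is only a stand-in for Paimon's actual bucket hash function, and the signature is illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash all bucket-key fields, keeping NULLs in place. Hashing the Option
// distinguishes None from any concrete value and preserves field positions.
fn bucket_of(fields: &[Option<i64>], num_buckets: u64) -> u64 {
    let mut h = DefaultHasher::new();
    for f in fields {
        f.hash(&mut h);
    }
    h.finish() % num_buckets
}

fn main() {
    // If the None were dropped before hashing, (1, NULL) and (NULL, 1)
    // would produce identical hash inputs; preserved, they stay distinct.
    let a = bucket_of(&[Some(1), None], 16);
    let b = bucket_of(&[None, Some(1)], 16);
    println!("buckets: {a} {b}");
    // The assignment is deterministic for a given row.
    assert_eq!(bucket_of(&[Some(1), None], 16), a);
}
```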
