Skip to content

feat(table): Add commit pipeline with SnapshotCommit abstraction#233

Merged
JingsongLi merged 3 commits intoapache:mainfrom
JingsongLi:commit
Apr 10, 2026
Merged

feat(table): Add commit pipeline with SnapshotCommit abstraction#233
JingsongLi merged 3 commits intoapache:mainfrom
JingsongLi:commit

Conversation

@JingsongLi
Copy link
Copy Markdown
Contributor

Purpose

Subtask of #232

Implement the table write and commit infrastructure, including:

  • SnapshotCommit trait with RenamingSnapshotCommit (filesystem) and RESTSnapshotCommit (REST catalog) implementations
  • TableCommit with retry logic, append/overwrite/truncate support, partition statistics generation, and row tracking
  • WriteBuilder as the entry point for creating TableCommit instances, with overwrite mode configured at construction time
  • RESTEnv to hold REST catalog context (API client, identifier, uuid)
  • CommitMessage, PartitionStatistics, and ManifestList types
  • SnapshotManager extensions for atomic snapshot commit and latest hint
  • BinaryRow write_datum and datums_to_binary_row utilities
  • CoreOptions accessors for bucket, commit retry, and row tracking

Reference: pypaimon write/commit pipeline

Brief change log

Tests

API and Format

Documentation

@JingsongLi JingsongLi force-pushed the commit branch 2 times, most recently from 982f180 to 6b120a5 Compare April 10, 2026 05:29
@JingsongLi JingsongLi changed the title [WIP] feat(table): Add write/commit pipeline with SnapshotCommit abstraction feat(table): Add commit pipeline with SnapshotCommit abstraction Apr 10, 2026
Implement the table write and commit infrastructure, including:
- SnapshotCommit trait with RenamingSnapshotCommit (filesystem) and
  RESTSnapshotCommit (REST catalog) implementations
- TableCommit with retry logic, append/overwrite/truncate support,
  partition statistics generation, and row tracking
- WriteBuilder as the entry point for creating TableCommit instances,
  with overwrite mode configured at construction time
- RESTEnv to hold REST catalog context (API client, identifier, uuid)
- CommitMessage, PartitionStatistics, and ManifestList types
- SnapshotManager extensions for atomic snapshot commit and latest hint
- BinaryRow write_datum and datums_to_binary_row utilities
- CoreOptions accessors for bucket, commit retry, and row tracking

Reference: pypaimon write/commit pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@jerry-024 jerry-024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit_snapshot rename path silently overwrites on local filesystem

opendal's rename uses POSIX semantics — it silently overwrites the target if it already exists. Java Paimon's FileIO.rename() returns false when the target exists, and additionally wraps the operation in lock.runWithLock(() -> !fileIO.exists(newSnapshotPath) && callable.call()).

paimon-rust does neither check. On local filesystem with concurrent writers:

  1. Writer A writes temp-A, renames to snapshot-1 → succeeds
  2. Writer B writes temp-B, renames to snapshot-1 → also succeeds (POSIX overwrite)
  3. Both return Ok(true), try_commit stops retrying for both
  4. Writer A's snapshot content is gone, its manifest lists are orphaned — silent data loss

Suggested fix: Add an exists() check before rename, matching Java's pattern:

// Check before rename to avoid silent overwrite (opendal uses POSIX rename semantics)
if self.file_io.exists(&target_path).await? {
    let _ = self.file_io.delete_file(&tmp_path).await;
    return Ok(false);
}
match self.file_io.rename(&tmp_path, &target_path).await {
    ...
}

This narrows the race window significantly. For full mutual exclusion on backends without atomic rename, an external lock mechanism (like Java's Lock interface) would be needed.

Copy link
Copy Markdown

@jerry-024 jerry-024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit db71611 into apache:main Apr 10, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants