FastAppendAction drops delete-only manifests, causing deleted files to reappear #2148

@drbothen

Description

FastAppendAction::existing_manifest() in crates/iceberg/src/transaction/append.rs filters manifest list entries with:

.filter(|entry| entry.has_added_files() || entry.has_existing_files())

This drops manifests that contain only Deleted entries, that is, entries for which has_deleted_files() is true but both has_added_files() and has_existing_files() are false.

Impact

After a rewrite_files operation (or any operation that creates a delete-only manifest to mark old files as removed), a subsequent fast_append drops the delete-only manifest from the new snapshot's manifest list. The older manifests still carry Added entries for the removed files, but no manifest with Deleted entries remains to exclude them, so the deleted files reappear as live data.

This causes compounding data duplication: each subsequent append or rewrite cycle adds another copy of the "ghost" files, producing exponential row growth:

Cycle 1: 72 rows
Cycle 2: 145 rows
Cycle 3: 297 rows
...
Cycle 12: 235,026 rows

Root Cause

The filter in existing_manifest() was intended to skip empty manifests, but it inadvertently skips delete-only manifests as well. A delete-only manifest is not empty: it records which file paths were removed and must be preserved until expire_snapshots cleans it up.

Fix

Add || entry.has_deleted_files() to the filter:

.filter(|entry| {
    entry.has_added_files()
        || entry.has_existing_files()
        || entry.has_deleted_files()
})
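The difference between the two filters can be sketched in isolation. This is a minimal model, not the code in crates/iceberg: the `Entry` struct, its integer counters, and the helpers `carry_forward_current`, `carry_forward_fixed`, and `sample_manifest_list` are hypothetical stand-ins, and only the three predicate names are taken from the issue. The real predicates may treat missing counts differently.

```rust
// Hypothetical stand-in for a manifest list entry. The real type carries
// more fields; only the three predicates named in the issue are modelled.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    path: &'static str,
    added: u32,
    existing: u32,
    deleted: u32,
}

impl Entry {
    fn has_added_files(&self) -> bool {
        self.added > 0
    }
    fn has_existing_files(&self) -> bool {
        self.existing > 0
    }
    fn has_deleted_files(&self) -> bool {
        self.deleted > 0
    }
}

/// Filter as quoted in the issue: delete-only entries are dropped.
fn carry_forward_current(entries: &[Entry]) -> Vec<Entry> {
    entries
        .iter()
        .filter(|entry| entry.has_added_files() || entry.has_existing_files())
        .cloned()
        .collect()
}

/// Filter with the proposed fix: only truly empty entries are dropped.
fn carry_forward_fixed(entries: &[Entry]) -> Vec<Entry> {
    entries
        .iter()
        .filter(|entry| {
            entry.has_added_files()
                || entry.has_existing_files()
                || entry.has_deleted_files()
        })
        .cloned()
        .collect()
}

/// Illustrative manifest list: one manifest with added files, one
/// delete-only manifest, and one empty manifest.
fn sample_manifest_list() -> Vec<Entry> {
    vec![
        Entry { path: "m0-added.avro", added: 3, existing: 0, deleted: 0 },
        Entry { path: "m1-delete-only.avro", added: 0, existing: 0, deleted: 3 },
        Entry { path: "m2-empty.avro", added: 0, existing: 0, deleted: 0 },
    ]
}
```

On the sample list, `carry_forward_current` keeps only `m0-added.avro`, while `carry_forward_fixed` also keeps `m1-delete-only.avro`. Both still skip the empty manifest, which matches the filter's stated intent.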

Reproduction

  1. Create a table and append data files
  2. Perform a rewrite_files operation (replaces old files with a compacted file)
  3. Perform a fast_append with new data files
  4. Scan the table: the files deleted in step 2 reappear as live data
  5. Repeat steps 2-4: the duplication compounds exponentially

Notes

  • Currently, rewrite_files is not yet on main, so this bug is latent. It becomes immediately triggerable once any operation that produces delete-only manifests lands.
  • The Iceberg spec requires delete manifests to persist across snapshots until they are cleaned up by expire_snapshots. Dropping them prematurely violates snapshot isolation guarantees.
