
fix(transaction): preserve delete-only manifests in fast_append#2149

Open
drbothen wants to merge 1 commit into apache:main from drbothen:fix/append-drops-delete-manifests


@drbothen

Summary

  • Fix FastAppendAction::existing_manifest() to carry forward delete-only manifests
  • Without this fix, manifests containing only Deleted entries are dropped when a new snapshot is produced by fast_append, causing previously deleted files to reappear as live data

Closes #2148

Details

The existing_manifest() filter checked has_added_files() || has_existing_files() but not has_deleted_files(). A delete-only manifest (created by operations like rewrite_files to record file removals) would be silently dropped on the next fast_append, breaking snapshot isolation and causing compounding data duplication.

The fix adds || entry.has_deleted_files() so delete-only manifests persist until expire_snapshots cleans them up, consistent with the Iceberg spec's requirement that delete tracking survives across snapshot boundaries.
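A minimal, self-contained sketch of the filter change, for illustration only. `ManifestSummary`, its count fields, and `carried_forward` are hypothetical stand-ins for the manifest-list entries that existing_manifest() inspects. Only the three has_*_files() predicate names come from this PR, and the real iceberg-rust types and signatures may differ.

```rust
/// Hypothetical, simplified stand-in for a manifest-list entry. The real
/// iceberg-rust type may be shaped differently; here the three predicates
/// are derived from plain per-status file counts.
#[derive(Debug)]
struct ManifestSummary {
    path: &'static str,
    added_files_count: u32,
    existing_files_count: u32,
    deleted_files_count: u32,
}

impl ManifestSummary {
    fn has_added_files(&self) -> bool {
        self.added_files_count > 0
    }
    fn has_existing_files(&self) -> bool {
        self.existing_files_count > 0
    }
    fn has_deleted_files(&self) -> bool {
        self.deleted_files_count > 0
    }
}

/// Filter before the fix: a delete-only manifest fails both checks and is dropped.
fn keep_before_fix(m: &ManifestSummary) -> bool {
    m.has_added_files() || m.has_existing_files()
}

/// Filter after the fix: delete-only manifests are carried forward as well.
fn keep_after_fix(m: &ManifestSummary) -> bool {
    m.has_added_files() || m.has_existing_files() || m.has_deleted_files()
}

/// Returns the paths of the manifests that the given filter would carry
/// into the new snapshot.
fn carried_forward(
    manifests: &[ManifestSummary],
    keep: fn(&ManifestSummary) -> bool,
) -> Vec<&'static str> {
    manifests.iter().filter(|m| keep(m)).map(|m| m.path).collect()
}

fn main() {
    let current = vec![
        // Manifest with live data files.
        ManifestSummary { path: "m-data.avro", added_files_count: 2, existing_files_count: 0, deleted_files_count: 0 },
        // Delete-only manifest, e.g. one written to record file removals.
        ManifestSummary { path: "m-deletes.avro", added_files_count: 0, existing_files_count: 0, deleted_files_count: 1 },
    ];
    println!("before fix: {:?}", carried_forward(&current, keep_before_fix));
    println!("after fix:  {:?}", carried_forward(&current, keep_after_fix));
}
```

With the original predicate only m-data.avro survives. Adding the has_deleted_files() term keeps m-deletes.avro as well.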

Test plan

  • All existing fast_append tests pass (cargo test -p iceberg --lib -- append)
  • Full unit test suite passes (cargo test -p iceberg --lib, 1,211 tests)
  • An integration test demonstrating the bug requires rewrite_files (planned for the PR that adds rewrite_files)

FastAppendAction::existing_manifest() filters which manifests from the
current snapshot are carried forward to the new snapshot. The filter
only checked has_added_files() and has_existing_files(), which drops
manifests that contain only Deleted entries.

After a rewrite_files operation, a delete-only manifest records which
file paths were removed. If a subsequent fast_append drops this manifest,
the deleted files reappear as live in the new snapshot, because the old
manifests still carry their Added entries and no corresponding Deleted
entries remain to exclude them. This causes data duplication that compounds
with each subsequent operation.

Add has_deleted_files() to the filter so delete-only manifests survive
across snapshot boundaries until expire_snapshots cleans them up.
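The failure mode described in the commit message can be modeled in a few lines. This is a simplified sketch of that scenario, not of Iceberg scan planning: `Status`, `Manifest`, and `live_files` are hypothetical, and liveness is approximated as "listed as Added or Existing in some carried-forward manifest and not listed as Deleted in any of them", which is the model the description above relies on.

```rust
use std::collections::BTreeSet;

/// Hypothetical entry status mirroring the statuses named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Added,
    Existing,
    Deleted,
}

/// A manifest reduced to (status, data file path) pairs.
type Manifest = Vec<(Status, &'static str)>;

/// Simplified liveness model: a path is live if some carried-forward manifest
/// lists it as Added or Existing and no carried-forward manifest lists it as Deleted.
fn live_files(manifests: &[Manifest]) -> BTreeSet<&'static str> {
    let mut live = BTreeSet::new();
    let mut deleted = BTreeSet::new();
    for manifest in manifests {
        for &(status, path) in manifest {
            match status {
                Status::Added | Status::Existing => {
                    live.insert(path);
                }
                Status::Deleted => {
                    deleted.insert(path);
                }
            }
        }
    }
    live.difference(&deleted).copied().collect()
}

fn main() {
    // Old manifest that still carries Added entries for both files.
    let old: Manifest = vec![(Status::Added, "a.parquet"), (Status::Added, "b.parquet")];
    // Delete-only manifest recording that b.parquet was removed.
    let deletes: Manifest = vec![(Status::Deleted, "b.parquet")];

    let kept = vec![old.clone(), deletes];
    let dropped = vec![old];
    println!("delete-only manifest kept:    {:?}", live_files(&kept));
    println!("delete-only manifest dropped: {:?}", live_files(&dropped));
}
```

Under this model, dropping the delete-only manifest makes b.parquet live again, which is the reappearance the fix prevents.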

Linked issue: FastAppendAction drops delete-only manifests, causing deleted files to reappear