Description
Apache Iceberg version
main (development)
Please describe the bug 🐞
An OVERWRITE snapshot can end up with wrong summary fields when only part of the table is overwritten.
`update_snapshot_summaries` assumes that every OVERWRITE operation is a full-table overwrite:
```python
truncate_full_table=self._operation == Operation.OVERWRITE,
```
iceberg-python/pyiceberg/table/snapshots.py, lines 358 to 359 in 322ebdd:

```python
if truncate_full_table and summary.operation == Operation.OVERWRITE and previous_summary is not None:
    summary = _truncate_table_summary(summary, previous_summary)
```
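To make the effect of the flag concrete, here is a hypothetical, simplified model of how a record total could be derived with and without full-table truncation. It is not pyiceberg's implementation: the function name `next_record_totals` is made up for illustration, and only record counts are modelled. The keys `added-records`, `deleted-records` and `total-records` are real Iceberg snapshot summary properties.

```python
from typing import Dict


def next_record_totals(
    previous_total: int,
    added_records: int,
    deleted_records: int,
    truncate_full_table: bool,
) -> Dict[str, int]:
    """Hypothetical, simplified model of snapshot-summary record totals.

    Only record counts are modelled; the real pyiceberg code also tracks
    data files, delete files, file sizes and position/equality deletes.
    """
    if truncate_full_table:
        # Full-table overwrite: everything that existed before is counted as
        # deleted, so the new total is only what this commit added.
        deleted_records = previous_total
        total_records = added_records
    else:
        # Incremental change: carry the previous total forward.
        total_records = previous_total + added_records - deleted_records
    return {
        "added-records": added_records,
        "deleted-records": deleted_records,
        "total-records": total_records,
    }


# A partial delete on a 100-row table rewrites one 10-row data file,
# keeping 7 of its rows (10 records removed, 7 records re-added).
correct = next_record_totals(100, added_records=7, deleted_records=10, truncate_full_table=False)
truncated = next_record_totals(100, added_records=7, deleted_records=10, truncate_full_table=True)
print(correct["total-records"])    # 97
print(truncated["total-records"])  # 7
```

In this model, treating the partial delete as a full-table overwrite reports 7 records for the whole table instead of 97. The exact numbers pyiceberg produces may differ, but the point is the same: truncation discards the totals carried over from the previous snapshot.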
This is likely an oversight from when partial writes were implemented.
Fortunately, the table/transaction `overwrite` function is currently implemented as a delete followed by an append.
The only place where the OVERWRITE operation is used is during partial deletes:
iceberg-python/pyiceberg/table/__init__.py, line 678 in 322ebdd:

```python
with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as overwrite_snapshot:
```
Original thread: apache/iceberg-go#356 (comment) (thanks @arnaudbriche and @zeroshade)
The partial overwrite is reproduced in #1840.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time