refactor: consolidate snapshot expiration into MaintenanceTable #2143
Merged
Fokko
merged 77 commits into
apache:main
from
ForeverAngry:refactor/consolidate-snapshot-expiration
Aug 12, 2025
0a94d96
Added initial units tests and Class for Removing a Snapshot
ForeverAngry 5f0b62b
Added methods needed to expire snapshots by id, and optionally cleanu…
ForeverAngry f995daa
Update test_expire_snapshots.py
ForeverAngry 65365e1
Added the builder method to __init__.py, updated the snapshot api wit…
ForeverAngry e28815f
Snapshots are not being transacted on, but need to re-assign refs
ForeverAngry 4628ede
Fixed the test case.
ForeverAngry e80c41c
adding print statements to help with debugging
ForeverAngry cb9f0c9
Draft ready
ForeverAngry ebcff2b
Applied suggestions to Fix CICD
ForeverAngry 97399bf
Merge branch 'main' into main
ForeverAngry 95e5af2
Rebuild the poetry lock file.
ForeverAngry 5ab5890
Merge branch 'main' into main
ForeverAngry 5acd690
Refactor implementation of `ExpireSnapshots`
ForeverAngry d30a08c
Fixed format and linting issues
ForeverAngry e62ab58
Merge branch 'main' into main
ForeverAngry 1af3258
Fixed format and linting issues
ForeverAngry 352b48f
Merge branch 'main' of https://github.com/ForeverAngry/iceberg-python
ForeverAngry 382e0ea
Merge branch 'main' into main
ForeverAngry 549c183
rebased: from main
ForeverAngry 386cb15
fixed: typo
ForeverAngry 12729fa
removed errant files
ForeverAngry ce3515c
Added: public method signature to the init table file.
ForeverAngry 28fce4b
Removed: `expire_snapshots_older_than` method, in favor of implementi…
ForeverAngry 2c3153e
Update tests/table/test_expire_snapshots.py
ForeverAngry 27c3ece
Removed: unrelated changes, Added: logic to expire snapshot method.
ForeverAngry fe73a34
feat: implement deduplication of data files in Iceberg table and remo…
ForeverAngry 8dfa038
Closes:
ForeverAngry 42e55c9
refactor: remove obsolete `expire_snapshots_older_than` method
ForeverAngry e1627c4
### Features & Enhancements
ForeverAngry 0e6d45c
feat: enhance table maintenance with deduplication and snapshot reten…
ForeverAngry 311c442
feat: update maintenance features with deduplication and retention st…
ForeverAngry fba592d
Update .gitignore
ForeverAngry b837f86
Update test_writes.py
ForeverAngry 4605a04
Merge branch 'main' into refactor/consolidate-snapshot-expiration
ForeverAngry 536528e
refactor: remove obsolete test file for snapshot expiration
ForeverAngry 6036e12
wip: enhance deduplication logic and improve data file handling in ma…
ForeverAngry 9dc9c82
wip - refactor: update deduplication tests to use file names instead …
ForeverAngry 635a1d9
fix(table): correct deduplication logic for data files in Maintenance…
ForeverAngry 73658e0
fix(tests): ensure commit_table is not called when no snapshots are e…
ForeverAngry a9a01ee
refactor: remove unused expire_snapshots method and clean up transact…
ForeverAngry 8c906d2
refactor: streamline data file retrieval in MaintenanceTable and enha…
ForeverAngry 0e72ccc
Reverted changes back to prior commit version for `_get_all_datafiles`
ForeverAngry cfb4061
refactor: simplify snapshot expiration logic and clean up unused imports
ForeverAngry 9371bca
Merge branch 'main' into refactor/consolidate-snapshot-expiration
ForeverAngry 881fab9
fix: add missing newline in API documentation for clarity
ForeverAngry acb70da
refactor: update license header in test_retention_strategies.py
ForeverAngry 54c1f7f
feat: add license header to test_overwrite_files.py
ForeverAngry 4c6f86c
Update test_literals.py
ForeverAngry 03acf03
fix: update typing-extensions and mkdocs-material versions
ForeverAngry 55a156f
fix: update mkdocs-material and typing-extensions versions
ForeverAngry 6cf08b5
Commit Summary
ForeverAngry 3a5c8e4
fix: remove unused parameter from _get_protected_snapshot_ids method
ForeverAngry 2fc6758
SQLite Connection Cleanup: Added proper cleanup of SQLAlchemy engine …
ForeverAngry 2e7e4cb
fix: remove unnecessary whitespace and improve code formatting in mai…
ForeverAngry 93a79b9
Merge branch 'main' into refactor/consolidate-snapshot-expiration
ForeverAngry 6cfc329
Moved the deduplicate logic found here: https://github.com/apache/ice…
ForeverAngry ee94c47
Update inspect.py
ForeverAngry cee4017
Fixed linting issues for the CI/CD Process
ForeverAngry dfd8a93
chore: remove unused ruff configuration and test file
ForeverAngry 6f1d1a7
add back ruff.toml
kevinjqliu 8c66829
refactor: introduce ExpireSnapshots builder for snapshot expiration
ForeverAngry c30ea6e
reverted `inspect.py` to be at parity with the main branch
ForeverAngry 3591c39
reverted `api.md` changes
ForeverAngry f13227e
implemented the changes suggest by @kevinjqliu to
ForeverAngry c5ff202
docs: add table maintenance section with snapshot expiration examples
ForeverAngry af78e52
refactor: rename expire_snapshot_by_id to by_id to align back with wh…
ForeverAngry 1c2f631
feat: implement snapshot expiration functionality with tests for prot…
ForeverAngry 64ba8f0
feat: add methods to expire snapshots by IDs and older than a timesta…
ForeverAngry bf9427a
refactor: rename snapshot expiration methods for clarity and consistency
ForeverAngry 1b3a95c
refactor: update snapshot expiration method calls for consistency and…
ForeverAngry cab890f
fixed: unrelated changes
ForeverAngry 4df0e83
fixed: unrelated changes
ForeverAngry 44da743
fix: update error messages for protected snapshot expiration tests fo…
ForeverAngry 36f89e6
refactor: remove outdated snapshot expiration documentation for clarity
ForeverAngry d6ec64d
linted update
ForeverAngry 3ba85f0
resolve merge conflict
kevinjqliu c980a16
extra?
kevinjqliu
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -1287,6 +1287,47 @@ with table.manage_snapshots() as ms: | |||||||
| ms.create_branch(snapshot_id1, "Branch_A").create_tag(snapshot_id2, "tag789") | ||||||||
| ``` | ||||||||
|
|
||||||||
| ## Table Maintenance | ||||||||
|
|
||||||||
| PyIceberg provides table maintenance operations through the `table.maintenance` API. This provides a clean interface for performing maintenance tasks like snapshot expiration. | ||||||||
|
|
||||||||
| ### Snapshot Expiration | ||||||||
|
|
||||||||
| Expire old snapshots to clean up table metadata and reduce storage costs: | ||||||||
|
|
||||||||
| ```python | ||||||||
| # Basic usage - expire a specific snapshot by ID | ||||||||
| table.maintenance.expire_snapshots().by_id(12345).commit() | ||||||||
|
|
||||||||
| # Context manager usage (recommended for multiple operations) | ||||||||
| with table.maintenance.expire_snapshots() as expire: | ||||||||
| expire.by_id(12345) | ||||||||
| expire.by_id(67890) | ||||||||
| # Automatically commits when exiting the context | ||||||||
|
|
||||||||
| # Method chaining | ||||||||
| table.maintenance.expire_snapshots().by_id(12345).commit() | ||||||||
|
Comment on lines +1307 to +1309
Contributor
This is the same example as above?
Suggested change
|
||||||||
| ``` | ||||||||
|
|
||||||||
| #### Real-world Example | ||||||||
|
|
||||||||
| ```python | ||||||||
| def cleanup_old_snapshots(table_name: str, snapshot_ids: list[int]): | ||||||||
| """Remove specific snapshots from a table.""" | ||||||||
| catalog = load_catalog("production") | ||||||||
| table = catalog.load_table(table_name) | ||||||||
|
|
||||||||
| # Use context manager for safe transaction handling | ||||||||
| with table.maintenance.expire_snapshots() as expire: | ||||||||
| for snapshot_id in snapshot_ids: | ||||||||
| expire.by_id(snapshot_id) | ||||||||
|
Comment on lines +1322 to +1323
Contributor
Why not use the
Suggested change
|
||||||||
|
|
||||||||
| print(f"Expired {len(snapshot_ids)} snapshots from {table_name}") | ||||||||
|
|
||||||||
| # Usage | ||||||||
| cleanup_old_snapshots("analytics.user_events", [12345, 67890, 11111]) | ||||||||
| ``` | ||||||||
|
|
||||||||
| ## Views | ||||||||
|
|
||||||||
| PyIceberg supports view operations. | ||||||||
|
|
||||||||
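The Snapshot Expiration examples in the hunk above use the builder in two ways: a chained call ending in an explicit `commit()`, and a `with` block that commits on exit. The diff does not include the `ExpireSnapshots` implementation, so the following is only a minimal sketch of how such a builder could behave. `FakeExpireSnapshots` and its `pending` and `committed` attributes are hypothetical stand-ins, not PyIceberg classes, and exception handling is deliberately not modeled.

```python
from typing import List


class FakeExpireSnapshots:
    """Hypothetical stand-in for the builder; NOT PyIceberg's implementation."""

    def __init__(self) -> None:
        self.pending: List[int] = []    # snapshot IDs staged for expiration
        self.committed: List[int] = []  # snapshot IDs applied by commit()

    def by_id(self, snapshot_id: int) -> "FakeExpireSnapshots":
        # Stage one snapshot ID and return self so calls can be chained.
        self.pending.append(snapshot_id)
        return self

    def commit(self) -> None:
        # Apply everything staged so far, then reset the staging list.
        self.committed.extend(self.pending)
        self.pending.clear()

    def __enter__(self) -> "FakeExpireSnapshots":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Assumption: leaving the `with` block commits whatever was staged.
        self.commit()


# Chained form: stage one ID, then commit explicitly.
chained = FakeExpireSnapshots()
chained.by_id(12345).commit()

# Context-manager form: stage several IDs, commit once on exit.
with FakeExpireSnapshots() as expire:
    expire.by_id(12345)
    expire.by_id(67890)
```

Under this model the "Basic usage" and "Method chaining" snippets in the docs are the same call, which is what the first review comment points out; the context-manager form differs only in that the commit happens implicitly and once.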
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
| from pyiceberg.table import Table | ||
| from pyiceberg.table.update.snapshot import ExpireSnapshots | ||
|
|
||
|
|
||
| class MaintenanceTable: | ||
| tbl: Table | ||
|
|
||
| def __init__(self, tbl: Table) -> None: | ||
| self.tbl = tbl | ||
|
|
||
| def expire_snapshots(self) -> ExpireSnapshots: | ||
| """Return an ExpireSnapshots builder for snapshot expiration operations. | ||
|
|
||
| Returns: | ||
| ExpireSnapshots builder for configuring and executing snapshot expiration. | ||
| """ | ||
| from pyiceberg.table import Transaction | ||
| from pyiceberg.table.update.snapshot import ExpireSnapshots | ||
|
|
||
| return ExpireSnapshots(transaction=Transaction(self.tbl, autocommit=True)) |
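`MaintenanceTable.expire_snapshots()` hands the builder a `Transaction` created with `autocommit=True`. The diff does not show what that flag does inside `Transaction`, so the sketch below illustrates just one plausible reading: with autocommit on, each applied update is pushed to the table immediately, without a separate commit call. `FakeTransaction`, `apply`, and the string updates are all hypothetical names invented for illustration, not PyIceberg's API.

```python
class FakeTransaction:
    """Hypothetical stand-in; NOT PyIceberg's Transaction class."""

    def __init__(self, table_log: list, autocommit: bool = False) -> None:
        self.table_log = table_log  # stands in for the table's committed state
        self.autocommit = autocommit
        self.staged: list = []      # updates applied but not yet committed

    def apply(self, update: str) -> None:
        # Stage the update; with autocommit, commit it right away (assumption).
        self.staged.append(update)
        if self.autocommit:
            self.commit_transaction()

    def commit_transaction(self) -> None:
        # Move all staged updates into the table's committed state.
        self.table_log.extend(self.staged)
        self.staged = []


manual_log: list = []
auto_log: list = []

manual = FakeTransaction(manual_log)  # caller must commit explicitly
manual.apply("remove-snapshot 12345")
staged_before_commit = list(manual_log)  # still empty at this point
manual.commit_transaction()

auto = FakeTransaction(auto_log, autocommit=True)  # commits on apply
auto.apply("remove-snapshot 12345")
```

If the real flag behaves along these lines, it would explain why `table.maintenance.expire_snapshots()` can be used standalone without the caller opening a transaction first.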