-
Notifications
You must be signed in to change notification settings - Fork 412
Manage snapshots mutability issue #2431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manage snapshots mutability issue #2431
Conversation
…tests for concurrent operations
…larity and type safety
|
It looks like there's other cases of this throughout the codebase, such as |
|
Yeah, @kevinjqliu started an issue to track this work here:
|
| _updates: Tuple[TableUpdate, ...] = () | ||
| _requirements: Tuple[TableRequirement, ...] = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same nit as #2430 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaheshwar-pltr i applied the changes, take a look 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't think the changes are wrong, and I prefer initializing the fields in the constructor, the tests are still passing with the original code (Python 3.10) 🤔 I think this is because a tuple is immutable, so we always create a new instance.
tests/table/test_manage_snapshots.py
Outdated
| @@ -0,0 +1,365 @@ | |||
| # Licensed to the Apache Software Foundation (ASF) under one | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I check out your branch, and revert the changes:
--- a/pyiceberg/table/update/snapshot.py
+++ b/pyiceberg/table/update/snapshot.py
@@ -810,13 +810,11 @@ class ManageSnapshots(UpdateTableMetadata["ManageSnapshots"]):
ms.create_tag(snapshot_id1, "Tag_A").create_tag(snapshot_id2, "Tag_B")
"""
- _updates: Tuple[TableUpdate, ...]
- _requirements: Tuple[TableRequirement, ...]
+ _updates: Tuple[TableUpdate, ...] = ()
+ _requirements: Tuple[TableRequirement, ...] = ()
def __init__(self, transaction: Transaction) -> None:
super().__init__(transaction)
- self._updates = ()
- self._requirements = ()
def _commit(self) -> UpdatesAndRequirements:
"""Apply the pending changes and commit."""
While running the tests, everything still passes:
============================= test session starts ==============================
collecting ... collected 5 items
tests/table/test_manage_snapshots.py::test_manage_snapshots_thread_safety_fix PASSED [ 20%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_operations PASSED [ 40%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_different_tables PASSED [ 60%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_cross_table_isolation PASSED [ 80%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_same_table_different_operations PASSED [100%]
============================== 5 passed in 0.05s ===============================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch. I didn't think to test the test on the original code to make sure they failed, before writing and running them on the bug fix. I probably shield have taken a TDD approach on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have built up quite a collection of tests over the years, they take longer and longer to run. So we want to make sure that they address underlying issues. I think we can drop the tests for this change. I don't think this class is subject to the thread-safe issue because it uses tuples rather than mutable collections like set and list. However, it is important to follow the best practice of setting the fields in the constructor.
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @ForeverAngry 🙌 Thanks @tonycox and @smaheshwar-pltr for the reviews 🚀
Related to #2409, and partially closes #2427
Rationale for this change
This PR fixes a thread safety issue in the
ManageSnapshotsclass similar to the one identified inExpireSnapshots(#2409). While the original issue specifically mentionedExpireSnapshots, the same thread safety vulnerability exists inManageSnapshotsdue to identical problematic design patterns. The same testing methodology used in #2430 was adapted for this.Root Cause:
The
ManageSnapshotsclass had class-level attributes (_updates,_requirements) that were shared across all instances. When multiple threads created differentManageSnapshotsinstances for concurrent operations, they all shared the same underlying tuple objects for tracking updates and requirements.Potential Impact:
table1.manage_snapshots().create_tag(...)adds updates to shared tupletable2.manage_snapshots().create_branch(...)adds updates to same shared tupleSolution:
Applied the same fix as ExpireSnapshots - moved the shared class-level attributes to instance-level attributes in the
__init__method, ensuring eachManageSnapshotsinstance has its own isolated state.Relationship to #2409:
While #2409 specifically reported ExpireSnapshots thread safety issues, this PR proactively addresses the same vulnerability pattern in ManageSnapshots to prevent similar issues from occurring with snapshot management operations (tags, branches, etc.).
Are these changes tested?
Yes, comprehensive test coverage has been added with a dedicated test file
test_manage_snapshots_thread_safety.py:test_manage_snapshots_thread_safety_fix()- Verifies that different ManageSnapshots instances have separate update/requirement tuplestest_manage_snapshots_concurrent_operations()- Tests concurrent operations don't contaminate each othertest_manage_snapshots_concurrent_different_tables()- Tests concurrent operations on different tables work correctlytest_manage_snapshots_cross_table_isolation()- Validates no cross-contamination of operations between tablestest_manage_snapshots_concurrent_same_table_different_operations()- Tests concurrent operations on the same tableAll tests demonstrate that the thread safety fix works correctly and that concurrent ManageSnapshots operations maintain proper isolation.
Are there any user-facing changes?
No breaking changes. The public API remains identical:
ManageSnapshotsmethods work the same way (create_tag,create_branch,delete_tag, etc.)Behavioral improvement:
manage_snapshots()operations on different tables now work correctly without interferenceThis is a pure bug fix.