Skip to content

Conversation

@ForeverAngry
Copy link
Contributor

@ForeverAngry ForeverAngry commented Sep 4, 2025

Related to #2409, and partially closes #2427

Rationale for this change

This PR fixes a thread safety issue in the ManageSnapshots class similar to the one identified in ExpireSnapshots (#2409). While the original issue specifically mentioned ExpireSnapshots, the same thread safety vulnerability exists in ManageSnapshots due to identical problematic design patterns. The same testing methodology used in #2430 was adapted for this.

Root Cause:
The ManageSnapshots class had class-level attributes (_updates, _requirements) that were shared across all instances. When multiple threads created different ManageSnapshots instances for concurrent operations, they all shared the same underlying tuple objects for tracking updates and requirements.

Potential Impact:

  • Thread 1: table1.manage_snapshots().create_tag(...) adds updates to shared tuple
  • Thread 2: table2.manage_snapshots().create_branch(...) adds updates to same shared tuple
  • Result: Both threads would have mixed updates, potentially causing incorrect operations or failures

Solution:
Applied the same fix as ExpireSnapshots - moved the shared class-level attributes to instance-level attributes in the __init__ method, ensuring each ManageSnapshots instance has its own isolated state.

Relationship to #2409:
While #2409 specifically reported ExpireSnapshots thread safety issues, this PR proactively addresses the same vulnerability pattern in ManageSnapshots to prevent similar issues from occurring with snapshot management operations (tags, branches, etc.).

Are these changes tested?

📢 🔥 Big shout-out to @QlikFrederic, as the testing methodology was largely derived from the testing and analysis done by the user! 🔥 📢

Yes, comprehensive test coverage has been added with a dedicated test file test_manage_snapshots_thread_safety.py:

  • test_manage_snapshots_thread_safety_fix() - Verifies that different ManageSnapshots instances have separate update/requirement tuples
  • test_manage_snapshots_concurrent_operations() - Tests concurrent operations don't contaminate each other
  • test_manage_snapshots_concurrent_different_tables() - Tests concurrent operations on different tables work correctly
  • test_manage_snapshots_cross_table_isolation() - Validates no cross-contamination of operations between tables
  • test_manage_snapshots_concurrent_same_table_different_operations() - Tests concurrent operations on the same table

All tests demonstrate that the thread safety fix works correctly and that concurrent ManageSnapshots operations maintain proper isolation.

Are there any user-facing changes?

No breaking changes. The public API remains identical:

  • All existing ManageSnapshots methods work the same way (create_tag, create_branch, delete_tag, etc.)
  • Method signatures are unchanged
  • Behavior is identical except for the thread safety improvement

Behavioral improvement:

  • Concurrent manage_snapshots() operations on different tables now work correctly without interference
  • No risk of mixed updates/requirements between different ManageSnapshots instances in multi-threaded environments
  • Improved reliability for applications using ManageSnapshots in concurrent scenarios

This is a pure bug fix.

@ForeverAngry ForeverAngry marked this pull request as ready for review September 4, 2025 23:47
@Anton-Tarazi
Copy link
Contributor

It looks like there's other cases of this throughout the codebase, such as MaintenanceTable as well as locations with both class variables and instance variables of the same name. It might be worth searching the repo and cleaning this up

@ForeverAngry
Copy link
Contributor Author

ForeverAngry commented Sep 5, 2025

Yeah, @kevinjqliu started an issue to track this work here:
#2427

It looks like there's other cases of this throughout the codebase, such as MaintenanceTable as well as locations with both class variables and instance variables of the same name. It might be worth searching the repo and cleaning this up

Comment on lines -813 to -814
_updates: Tuple[TableUpdate, ...] = ()
_requirements: Tuple[TableRequirement, ...] = ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same nit as #2430 (comment))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaheshwar-pltr i applied the changes, take a look 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't think the changes are wrong, and I prefer initializing the fields in the constructor, the tests are still passing with the original code (Python 3.10) 🤔 I think this is because a tuple is immutable, so we always create a new instance.

@@ -0,0 +1,365 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

@Fokko Fokko Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I check out your branch, and revert the changes:

--- a/pyiceberg/table/update/snapshot.py
+++ b/pyiceberg/table/update/snapshot.py
@@ -810,13 +810,11 @@ class ManageSnapshots(UpdateTableMetadata["ManageSnapshots"]):
        ms.create_tag(snapshot_id1, "Tag_A").create_tag(snapshot_id2, "Tag_B")
     """
 
-    _updates: Tuple[TableUpdate, ...]
-    _requirements: Tuple[TableRequirement, ...]
+    _updates: Tuple[TableUpdate, ...] = ()
+    _requirements: Tuple[TableRequirement, ...] = ()
 
     def __init__(self, transaction: Transaction) -> None:
         super().__init__(transaction)
-        self._updates = ()
-        self._requirements = ()
 
     def _commit(self) -> UpdatesAndRequirements:
         """Apply the pending changes and commit."""

While running the tests, everything still passes:

============================= test session starts ==============================
collecting ... collected 5 items

tests/table/test_manage_snapshots.py::test_manage_snapshots_thread_safety_fix PASSED [ 20%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_operations PASSED [ 40%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_different_tables PASSED [ 60%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_cross_table_isolation PASSED [ 80%]
tests/table/test_manage_snapshots.py::test_manage_snapshots_concurrent_same_table_different_operations PASSED [100%]

============================== 5 passed in 0.05s ===============================

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch. I didn't think to test the test on the original code to make sure they failed, before writing and running them on the bug fix. I probably shield have taken a TDD approach on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have built up quite a collection of tests over the years, they take longer and longer to run. So we want to make sure that they address underlying issues. I think we can drop the tests for this change. I don't think this class is subject to the thread-safe issue because it uses tuples rather than mutable collections like set and list. However, it is important to follow the best practice of setting the fields in the constructor.

@ForeverAngry ForeverAngry requested a review from Fokko September 24, 2025 23:56
@Fokko Fokko changed the title Manage snapshots thread safety issue 2409 Manage snapshots mutability issue Sep 26, 2025
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ForeverAngry 🙌 Thanks @tonycox and @smaheshwar-pltr for the reviews 🚀

@Fokko Fokko merged commit cff0c64 into apache:main Sep 26, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

check for class with mutable state as class attributes

5 participants