Skip to content

Conversation

@kumarUjjawal
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

DynamicFilterPhysicalExpr violates the Hash/Eq contract because the Hash and PartialEq implementations each call self.current() which acquires separate RwLock::read() locks. This allows the underlying expression to change between
hash() and eq() calls via update(), causing:

  • HashMap key instability (keys "disappear" after update)
  • Potential infinite loops during HashMap operations
  • Corrupted HashMap state during concurrent access

What changes are included in this PR?

Replaced content-based Hash/Eq with identity-based implementations:

  • Hash: Uses Arc::as_ptr(&self.inner) instead of hashing the mutable expression content
  • PartialEq: Uses Arc::ptr_eq(&self.inner) instead of comparing expression content via locks

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jan 6, 2026
@kumarUjjawal
Copy link
Contributor Author

cc @adriangb

@adriangb
Copy link
Contributor

adriangb commented Jan 6, 2026

Thanks for working on this.

Were there any alternatives considered? I've thought about it a little bit and think this is probably the best path forward, but maybe there are other alternatives; it would be good to document why we chose this option. At least one would be to make Hash panic and say you need to call snapshot_physical_expr to be explicit. Wdyt?

@kumarUjjawal
Copy link
Contributor Author

Thanks for the feedback, I did some analysis before implementation to best of my understanding;

Approach Pros Cons
Identity-based (chosen) Stable hash, no breaking change, correct semantics for filter identity Different instances with the same expression aren’t equal
Panic in Hash Explicit, forces use of snapshot_physical_expr() Breaking change — any HashSet<Arc<dyn PhysicalExpr>> containing dynamic filters would panic
Include generation in hash Content-aware Still violates the contract — race between hash() and eq()
Document only No code change Silent runtime bugs remain

Current Implementation:

  • PhysicalExpr requires Hash Generic code (e.g., equivalence classes) uses HashMap<Arc<dyn PhysicalExpr>, _>. Panicking would break this for any tree containing DynamicFilterPhysicalExpr.

  • Two dynamic filters with different inner Arcs are different filters (independent update lifecycles), even if they have the same expression at some moment.

  • with_new_children() does the right thing Filters created via tree transformation share the inner value and compare equal, which is exactly what filter pushdown expects.

The panic approach is reasonable but would be a breaking change(I think). Happy to discuss further if you think the explicitness outweighs the compatibility benefit.

@adriangb
Copy link
Contributor

adriangb commented Jan 6, 2026

Sound good to me. Could we look for any potential call sites and update them to use snapshot_physical_expr if they care about differentiating between two DynamicFilterPhysicalExpr? E.g. in #19639 I needed to differentiate them because if a DynamicFilterPhysicalExpr updates it's expression I want to treat it differently than the previous version.

@adriangb
Copy link
Contributor

adriangb commented Jan 6, 2026

I wonder if we could somehow use Inner::generation to our advantage? I think it runs into the same issues as structural comparisons...

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving because this seems like the best choice and is better than the status quo. I imagine there may be uses where a different behavior is desired, we've tried to think through them but can't pretend to have covered them all. For future readers: if a different behavior is required please comment on this PR or open a new issue.

@adriangb
Copy link
Contributor

adriangb commented Jan 6, 2026

(let's wait a day before merging this)

@alamb
Copy link
Contributor

alamb commented Jan 6, 2026

I think we should include this in datafusion-52 so I added it to the list on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DynamicFilterPhysicalExpr violates Hash/Eq contract

3 participants