-
-
Notifications
You must be signed in to change notification settings - Fork 107
fix: preserve subscriber identity when forwarding SeekNode messages #2222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When intermediate nodes forward SeekNode messages during subscription propagation, they were incorrectly replacing the original subscriber's pub_key with their own. This caused the final contract holder to register the wrong peer as the subscriber, preventing UPDATE notifications from reaching the actual subscriber. Changes: - In SeekNode forwarding (line ~750): Use subscriber.pub_key() instead of this_peer.pub_key() - In ReturnSub retry logic (line ~860): Use upstream_subscriber's pub_key when available The fix ensures the original subscriber's identity is preserved through the entire chain of intermediate peers until the contract holder registers the subscription. Fixes #2220 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug in cross-node subscription propagation where intermediate nodes were incorrectly replacing the original subscriber's public key with their own when forwarding SeekNode messages, causing UPDATE notifications to be sent to the wrong peer.
Key Changes:
- Preserves original subscriber identity by using
subscriber.pub_key()instead ofthis_peer.pub_key()when forwarding SeekNode messages - Implements proper retry logic that checks for
upstream_subscriberto distinguish between intermediate and originating nodes - Re-enables
test_multiple_clients_subscriptiontest that validates cross-node subscription propagation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/core/src/operations/subscribe.rs | Fixed subscriber identity preservation in SeekNode forwarding logic (line 754) and ReturnSub retry logic (lines 860-872); removed unused this_peer variable; added clarifying comments |
| crates/core/tests/operations.rs | Updated test comments to document the fix for issue #2220 and explain known timing-related flakiness; removed #[ignore] attribute to re-enable test |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three fixes for subscription reliability: 1. Push-before-send in handle_op_result (operations/mod.rs) - Push operation state BEFORE sending message to prevent race - Response can arrive before send() returns on fast networks 2. State transition fix in subscribe.rs - When forwarding RequestSub as SeekNode, transition to AwaitingResponse - Previously kept ReceivedRequest state, causing invalid_transition errors 3. try_send to send().await in handler.rs - try_send can fail silently under backpressure - Use blocking send to ensure registration completes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
When a BroadcastTo update arrives from another node and modifies the contract state, local WebSocket clients that subscribed to this contract were not being notified. The upsert_contract_state delta path was missing the send_update_notification() call. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@iduartgomez I've been investigating the cross-node subscription flakiness in Current StatusThe fixes in this PR address several race conditions:
These help but the test still fails ~70% of the time. Root Cause AnalysisThe subscription tree doesn't reliably align with the UPDATE propagation path. Here's what happens:
When UPDATE happens:
The race condition timing (from logs): Even with a 5s delay before UPDATE, the test only improves to ~33% pass rate, suggesting the issue isn't just timing but the subscription tree structure itself. Questions for You
I've documented detailed findings in issue #2220 comment. [AI-assisted - Claude] |
|
I havent read this throughtfully, but shouldnt peers be subscribed to the inmediate upstream becauer they dont hold a direct connection with the other peers? I dont know if I am understanding the proposal here. |
Update: Root Cause IdentifiedAfter discussion with @sanity, we've identified the fundamental issue: The invariant "caching a contract = being subscribed to it" is violated. When a peer receives a PUT and caches a contract via
This means a peer can have a cached contract but no path to receive UPDATEs. When a client on that peer later subscribes, the subscription is handled locally (because The fix in PR #2002 (proximity-based update forwarding) is the proper solution - it ensures neighbors who cache a contract receive updates regardless of subscription tree topology. Recommendation:
[AI-assisted - Claude] |
|
@iduartgomez To answer your question directly (this is Claude, not Ian): You're correct that peers subscribe to their immediate upstream - they don't have direct connections to all peers. The subscription tree design is sound. The issue I identified is that caching ≠ subscribing in the current implementation: When a peer receives a PUT and caches via When a client on that peer later calls subscribe, the peer handles it locally (line 466 in subscribe.rs checks Ian clarified that caching and subscribing should be synonymous - if you cache a contract you should receive updates. The current code violates this invariant. PR #2002 (proximity-based forwarding) addresses this by ensuring neighbors who cache a contract receive updates regardless of subscription tree topology. We think that's the right fix rather than trying to patch the subscription routing. [AI-assisted - Claude] |
Closing in favor of PR #2228After systematic investigation, I've identified why the Root CauseWhen Client 3 on node-b subscribes to a contract, the subscription succeeds locally (node-b sends This is the fundamental issue we discussed earlier: the subscription tree doesn't extend from node-b back to the UPDATE origin. Caching a contract doesn't automatically make a peer part of the subscription tree. Evidence from logs
Why this PR's fixes are insufficientThis PR addressed:
But none of these fix the core problem: node-b needs to receive UPDATE messages in the first place. The proper fix: PR #2228PR #2228 (proximity-based update forwarding) addresses this by:
This ensures peers who cache a contract receive updates regardless of subscription tree topology. RecommendationMerge PR #2228 and enable the test there. The fixes in this PR (SeekNode identity, state transitions) may still be valuable but should be evaluated in the context of PR #2228's architecture. [AI-assisted - Claude] |
Problem
When a client on one node subscribes to a contract stored on a different node, UPDATE notifications fail to reach the subscriber. The root cause is in the subscription propagation logic: when intermediate nodes forward
SeekNodemessages, they incorrectly replace the original subscriber'spub_keywith their own identity.This causes the final contract holder to register the wrong peer as the subscriber, so when updates are broadcast, they're sent to intermediate nodes instead of the actual subscriber.
Example Flow (Before Fix)
RequestSubwith itspub_keySeekNodebut uses its ownpub_keySolution
Preserve the original subscriber's
pub_keywhen forwardingSeekNodemessages:In SeekNode forwarding (line ~750): Use
subscriber.pub_key()instead ofthis_peer.pub_key()when the current node doesn't have the contract and forwards to the next hop.In ReturnSub retry logic (line ~860): When retrying after a
ReturnSub { subscribed: false }, use theupstream_subscriber's pub_key if we're an intermediate node.The subscriber's address is still set to
Unknown(to be filled from transport layer), but the identity is preserved through the entire chain.Testing
Re-enabled
test_multiple_clients_subscriptionwhich validates cross-node subscription propagation:The test passes when network setup succeeds. It has some timing-related flakiness due to multi-node startup coordination, but when connections are established correctly, the cross-node notification now works.
Fixes
Closes #2220
🤖 Generated with Claude Code