Hitless upgrades #3142
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
- static-code-analysis-semgrep-pro
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
❌ Security scan failed: Branch hitless-upgrades does not exist in the remote repository
An SMIGRATING notification should result in increased command and socket timeouts for the given connection
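The behaviour described in this commit title can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: `Timeouts`, `ConnectionTimeouts`, `onPush`, and the millisecond values are all hypothetical names and numbers chosen for the example.

```typescript
// Hypothetical shapes for illustration only; these are not the client's real types.
interface Timeouts {
  commandTimeoutMs: number;
  socketTimeoutMs: number;
}

class ConnectionTimeouts {
  private readonly relaxed: Timeouts;
  private current: Timeouts;

  constructor(normal: Timeouts, relaxed: Timeouts) {
    this.relaxed = relaxed;
    this.current = normal;
  }

  // Called when a push notification arrives on this particular connection.
  // Only the connection that received SMIGRATING switches to the relaxed values.
  onPush(kind: string): void {
    if (kind === 'SMIGRATING') {
      this.current = this.relaxed;
    }
  }

  // Timeouts currently in effect for this connection.
  get active(): Timeouts {
    return this.current;
  }
}

const timeouts = new ConnectionTimeouts(
  { commandTimeoutMs: 1000, socketTimeoutMs: 2000 },
  { commandTimeoutMs: 5000, socketTimeoutMs: 10000 }
);
timeouts.onPush('SMIGRATING');
```

Keeping the state per connection means other connections keep their normal timeouts until they receive their own notification.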
Buffer.equals() was failing when reply[0] was a string instead of a Buffer, causing hangs on push notifications. Now converts strings to Buffers before comparison in PubSub and commands-queue handlers.

Changes:
- PubSub.isStatusReply: convert reply[0] to Buffer if string
- PubSub.isShardedUnsubscribe: convert reply[0] to Buffer if string
- PubSub.handleMessageReply: convert reply[0] to Buffer if string
- commands-queue PONG handler: convert reply[0] to Buffer if string
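A minimal sketch of the normalisation this commit describes, assuming the reply's first element may arrive as either a string or a Buffer. `toBuffer`, `isPongReply`, and the `PONG` constant are hypothetical names for illustration; the real handlers live in PubSub and commands-queue and are not reproduced here.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: accept a string or a Buffer and always return a Buffer.
function toBuffer(value: string | Buffer): Buffer {
  return typeof value === 'string' ? Buffer.from(value) : value;
}

// Illustrative keyword; the real handlers compare against their own constants.
const PONG = Buffer.from('pong');

// Returns true when the first element of the reply is "pong",
// regardless of whether it was decoded as a string or as a Buffer.
function isPongReply(reply: Array<string | Buffer>): boolean {
  const first = reply[0];
  if (first === undefined) {
    return false;
  }
  return PONG.equals(toBuffer(first));
}
```

Converting before the comparison keeps a single code path for both decoder outputs instead of branching on the type at every call site.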
Test Infrastructure:
- Migrate maintenance tests from maintenance.spec.ts to dedicated e2e test files
- Add maintenance.e2e.ts for direct RE cluster testing with testWithRECluster helper
- Add maintenance.proxy.e2e.ts for proxy-based cluster testing
- Dynamically generate tests based on available action triggers from fault injector API

Fault Injector Client:
- Add listActionTriggers() to query available triggers by action and effect
- Add selectDbConfig() and createAndSelectDatabase() for database context management
- Auto-resolve bdb_id from selected database when not explicitly provided
- Support trigger-specific database configurations from requirements

Test Utils:
- Export REClusterTestOptions interface
- Refactor testWithRECluster to reset cluster state before each test
- Add cluster reset and cleanup between tests for isolation

RESP Decoder & Socket:
- Add wire-level debug logging for troubleshooting

Cluster:
- Add debug logging for command execution and MOVED error handling
- Add debug logging for slot discovery and client routing

Enterprise Maintenance Manager:
- Add debug logging for push message handling
…e tests
- Extract parseSMigratedPush into static method with proper type definitions
- Add Address, Destination, and SMigratedEntry interfaces for better type safety
- Support multiple source entries in SMIGRATED events (previously only handled single source)
- Add comprehensive test suite covering single slots, ranges, multiple sources/destinations
- Update cluster-slots to iterate over all entries in SMIGRATED event
- Remove debug console.log statements from production code
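The commit mentions tests for both single slots and slot ranges. As a rough illustration of that one parsing step, the sketch below turns a slot specification into explicit ranges. It assumes a comma-separated text form such as `100-200,300`, which this page does not confirm; `SlotRange` and `parseSlotSpec` are hypothetical names and are not part of parseSMigratedPush.

```typescript
// Hypothetical type: an inclusive range of hash slots.
interface SlotRange {
  start: number;
  end: number;
}

// Parses an ASSUMED format: comma-separated items, each either "N" or "N-M".
function parseSlotSpec(spec: string): SlotRange[] {
  return spec.split(',').map((part) => {
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (match === null) {
      throw new Error(`Invalid slot spec: "${part}"`);
    }
    const start = Number(match[1]);
    // A single slot is represented as a range whose start equals its end.
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start > end) {
      throw new Error(`Invalid slot range: "${part}"`);
    }
    return { start, end };
  });
}
```

Representing a single slot as a one-element range lets the caller iterate over every entry the same way.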
…loop
- Track all moving slots and destination nodes during destinations loop
- Wait for inflight commands AFTER all destinations are processed
- Extract commands and handle source cleanup once per entry, not per destination
- Unpause all destination nodes at the end of entry processing

This fixes an issue where source nodes were being unpaused prematurely when multiple destinations existed, potentially allowing new commands to queue before all slot migrations were complete.
- Wrap entry processing in try-catch to handle async operation failures
- Unpause source node in catch block to prevent deadlock on error
- Move destination unpause to finally block to ensure cleanup always runs
- Re-throw error after cleanup to propagate failures
- Remove debug console.log statements
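The control flow described in the last two commits (track destinations in the loop, wait for inflight commands once, unpause the source on error, always unpause destinations, re-throw) might look roughly like this. It is a sketch under stated assumptions: `PausableNode`, `MigrationEntry`, `MigrationOps`, and `processEntry` are hypothetical names, and the real cluster-slots code is not shown on this page.

```typescript
// Hypothetical minimal node handle; the real cluster client types differ.
interface PausableNode {
  id: string;
  paused: boolean;
}

interface MigrationEntry {
  source: PausableNode;
  destinations: PausableNode[];
}

// Hypothetical operations supplied by the caller.
interface MigrationOps {
  moveSlots(source: PausableNode, destination: PausableNode): Promise<void>;
  waitForInflight(source: PausableNode): Promise<void>;
  cleanupSource(source: PausableNode): void;
}

async function processEntry(entry: MigrationEntry, ops: MigrationOps): Promise<void> {
  // Track every destination touched so all of them can be unpaused at the end.
  const pausedDestinations: PausableNode[] = [];
  entry.source.paused = true;
  try {
    for (const destination of entry.destinations) {
      destination.paused = true;
      pausedDestinations.push(destination);
      await ops.moveSlots(entry.source, destination);
    }
    // Wait once, after ALL destinations are processed, not inside the loop.
    await ops.waitForInflight(entry.source);
    // Source cleanup happens once per entry, not once per destination.
    ops.cleanupSource(entry.source);
  } catch (err) {
    // Unpause the source so a failed step cannot leave it blocked forever.
    entry.source.paused = false;
    throw err; // propagate the failure after cleanup
  } finally {
    // Runs on success and on failure.
    for (const destination of pausedDestinations) {
      destination.paused = false;
    }
  }
}
```

Putting the destination unpause in `finally` means a failure partway through the loop still releases the destinations that were already paused.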
Description
Checklist
Does npm test pass with this change (including linting)?