@nkaradzhov
Collaborator

Description

Describe your pull request here


Checklist

  • Does npm test pass with this change (including linting)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

@nkaradzhov force-pushed the hitless-upgrades branch 4 times, most recently from a6aa6bd to 9e052d3 (December 5, 2025 14:31)
@jit-ci jit-ci bot left a comment

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog
  • static-code-analysis-semgrep-pro

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

jit-ci bot commented Jan 16, 2026

❌ Security scan failed

Security scan failed: Branch hitless-upgrades does not exist in the remote repository


💡 Need to bypass this check? Comment @sera bypass to override.

1 similar comment

nkaradzhov and others added 21 commits January 29, 2026 13:17
SMIGRATING notification should result in increased command and socket
timeouts for the given connection
Buffer.equals() was failing when reply[0] was a string instead of
a Buffer, causing hangs on push notifications. Now converts strings
to Buffers before comparison in PubSub and commands-queue handlers.

Changes:
 - PubSub.isStatusReply: convert reply[0] to Buffer if string
 - PubSub.isShardedUnsubscribe: convert reply[0] to Buffer if string
 - PubSub.handleMessageReply: convert reply[0] to Buffer if string
 - commands-queue PONG handler: convert reply[0] to Buffer if string
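
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `toBuffer` and `isPongReply` are hypothetical names, and the only library behavior relied on is that Node's `Buffer.prototype.equals` expects a `Buffer` or `Uint8Array` argument, so a string has to be converted first.

```typescript
// Hypothetical helper (not from the PR): normalise a reply element that the
// decoder may hand over as either a string or a Buffer.
function toBuffer(value: string | Buffer): Buffer {
  return typeof value === 'string' ? Buffer.from(value) : value;
}

const PONG = Buffer.from('pong');

// Hypothetical check: true when the first element of a reply is "pong",
// regardless of whether it arrived as a string or a Buffer.
function isPongReply(reply: Array<string | Buffer>): boolean {
  const first = reply[0];
  if (first === undefined) return false;
  // Convert before comparing so equals() always receives a Buffer.
  return PONG.equals(toBuffer(first));
}
```

The same conversion would apply at each of the four call sites listed in the commit message; the sketch shows only one of them.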
Test Infrastructure:
- Migrate maintenance tests from maintenance.spec.ts to dedicated e2e test files
- Add maintenance.e2e.ts for direct RE cluster testing with testWithRECluster helper
- Add maintenance.proxy.e2e.ts for proxy-based cluster testing
- Dynamically generate tests based on available action triggers from fault injector API

Fault Injector Client:
- Add listActionTriggers() to query available triggers by action and effect
- Add selectDbConfig() and createAndSelectDatabase() for database context management
- Auto-resolve bdb_id from selected database when not explicitly provided
- Support trigger-specific database configurations from requirements

Test Utils:
- Export REClusterTestOptions interface
- Refactor testWithRECluster to reset cluster state before each test
- Add cluster reset and cleanup between tests for isolation

RESP Decoder & Socket:
- Add wire-level debug logging for troubleshooting

Cluster:
- Add debug logging for command execution and MOVED error handling
- Add debug logging for slot discovery and client routing

Enterprise Maintenance Manager:
- Add debug logging for push message handling
…e tests

- Extract parseSMigratedPush into static method with proper type definitions
- Add Address, Destination, and SMigratedEntry interfaces for better type safety
- Support multiple source entries in SMIGRATED events (previously only handled single source)
- Add comprehensive test suite covering single slots, ranges, multiple sources/destinations
- Update cluster-slots to iterate over all entries in SMIGRATED event
- Remove debug console.log statements from production code
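
To illustrate the "single slots, ranges" cases mentioned above, here is a small sketch of slot parsing. The wire format of the SMIGRATED payload is not shown in this PR conversation, so the comma-separated `"5,100-200"` form, the `SlotRange` type, and the `parseSlotSpec` name are all assumptions made for illustration; they are not the real `parseSMigratedPush` signature or types.

```typescript
// Hypothetical type: an inclusive range of slots. A single slot is
// represented as a range whose start and end are equal.
interface SlotRange {
  start: number;
  end: number;
}

// Hypothetical parser for an ASSUMED format such as "5,100-200".
function parseSlotSpec(spec: string): SlotRange[] {
  return spec.split(',').map((part) => {
    // Accept either "N" or "N-M" made of decimal digits only.
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (match === null) {
      throw new Error(`Invalid slot spec: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start > end) {
      throw new Error(`Invalid slot range: "${part}"`);
    }
    return { start, end };
  });
}
```

Representing single slots as one-element ranges keeps the downstream loop uniform, which is one plausible way to handle multiple entries per event.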
…loop

- Track all moving slots and destination nodes during destinations loop
- Wait for inflight commands AFTER all destinations are processed
- Extract commands and handle source cleanup once per entry, not per destination
- Unpause all destination nodes at the end of entry processing

This fixes an issue where source nodes were being unpaused prematurely
when multiple destinations existed, potentially allowing new commands
to queue before all slot migrations were complete.
- Wrap entry processing in try-catch to handle async operation failures
- Unpause source node in catch block to prevent deadlock on error
- Move destination unpause to finally block to ensure cleanup always runs
- Re-throw error after cleanup to propagate failures
- Remove debug console.log statements
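
The ordering described in the last two commit messages can be sketched as below. This models only the control flow (process all destinations, wait once, clean up the source once, unpause the source on error, always unpause destinations, re-throw); `NodeLike`, `EntryLike`, and `processEntry` are hypothetical names, and the real cluster-slots code also extracts and re-routes queued commands, which is omitted here.

```typescript
// Hypothetical minimal node interface; the real client types differ.
interface NodeLike {
  pause(): void;
  unpause(): void;
  waitForInflight(): Promise<void>;
}

// Hypothetical shape of one SMIGRATED entry: one source, many destinations.
interface EntryLike {
  source: NodeLike;
  destinations: NodeLike[];
}

async function processEntry(entry: EntryLike): Promise<void> {
  // Track every destination touched so all of them can be unpaused later.
  const pausedDestinations: NodeLike[] = [];
  try {
    entry.source.pause();
    for (const destination of entry.destinations) {
      destination.pause();
      pausedDestinations.push(destination);
    }
    // Wait once, AFTER all destinations have been processed.
    await entry.source.waitForInflight();
    // Source cleanup happens once per entry, not once per destination.
    entry.source.unpause();
  } catch (err) {
    // Unpause the source so a failure cannot leave it paused forever.
    entry.source.unpause();
    throw err; // propagate the failure; the finally block still runs
  } finally {
    // Destinations are unpaused whether or not processing succeeded.
    for (const destination of pausedDestinations) {
      destination.unpause();
    }
  }
}
```

In this sketch the `finally` block runs before the re-thrown error reaches the caller, so callers never observe a failure while destinations are still paused.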
