
@nvdtf (Member) commented Jan 30, 2026

🚀 Dual-Worker Architecture

Architecture Upgrade: Replaces the single interval-based worker with a dual-worker system (SchedulerHandler + WorkerHandler), enabling parallel request processing and crash recovery


📋 Summary

This PR migrates the FlowYieldVaultsEVM worker design from a single interval-based worker that processed one request at a time to a dual-worker architecture. The new system enables parallel processing of multiple requests with automatic scheduling and crash recovery.


🔄 What Changed

Before

  • Single interval-based worker:
    • One PENDING request processed per execution
    • Fixed interval between executions
    • Sequential, one-at-a-time processing
    • No panic recovery

After

  • Dual-worker architecture:
    • SchedulerHandler: Recurrent worker that runs at a fixed interval (1.0s)
      • Performs crash recovery for failed WorkerHandlers
      • Checks for pending requests
      • Preprocesses requests (PENDING → PROCESSING)
      • Spawns zero or more WorkerHandlers, based on available capacity and pending requests
      • Passes a request ID to each worker as its assigned work
    • WorkerHandler: Per-request worker that processes individual requests (see the sketch after this list)
      • Each request gets its own scheduled WorkerHandler
      • Processes the request and finalizes its status on EVM
      • Removes itself from tracking after completion
      • Safe to panic; will be recovered by the SchedulerHandler
  • Parallel processing: Up to maxProcessingRequests (default: 3) requests processed concurrently
  • FlowTransactionScheduler: Native Flow scheduling infrastructure for automatic execution
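For orientation, a minimal sketch of the per-request handler shape. The resource body is illustrative, not this PR's code, and the TransactionHandler interface members (executeTransaction, getViews, resolveView) are assumed from the flow-core-contracts FlowTransactionScheduler:

```cadence
import "FlowTransactionScheduler"

access(all) contract WorkerSketch {

    // Illustrative per-request handler. The PR's real WorkerHandler also
    // finalizes the request status on EVM and removes itself from
    // scheduledRequests after completion.
    access(all) resource WorkerHandler: FlowTransactionScheduler.TransactionHandler {

        // Invoked by FlowTransactionScheduler at the scheduled timestamp;
        // `data` carries the request ID assigned by the SchedulerHandler.
        access(FlowTransactionScheduler.Execute)
        fun executeTransaction(id: UInt64, data: AnyStruct?) {
            let requestId = (data ?? panic("expected a request ID")) as! UInt256
            // ... process the single request, then finalize COMPLETED/FAILED ...
        }

        access(all) view fun getViews(): [Type] { return [] }

        access(all) fun resolveView(_ view: Type): AnyStruct? { return nil }
    }
}
```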

✨ New Features

| Feature | Description |
| --- | --- |
| Parallel Processing | Multiple requests processed concurrently (up to maxProcessingRequests) |
| 🔄 Dual-Worker System | SchedulerHandler orchestrates, WorkerHandler executes |
| 🔍 Preprocessing | SchedulerHandler validates requests before scheduling (PENDING → PROCESSING) |
| 🔁 Crash Recovery | Detects panicked WorkerHandlers and marks requests as FAILED |
| 📊 Capacity Control | Configurable concurrent processing limit (maxProcessingRequests: 3) |
| 🎯 Sequential Offsetting | Multiple requests from the same EVM address are offset sequentially to avoid randomization |

🏗️ Architecture Changes

Before: Single Worker

┌─────────────────────────────────────┐
│   Single Interval-Based Worker      │
│   - Runs at fixed interval          │
│   - Processes 1 request at a time   │
│   - Sequential execution            │
└─────────────────────────────────────┘

After: Dual-Worker System

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Flow EVM                                       │
│  ┌──────────────┐         ┌───────────────────────────┐                     │
│  │   EVM User   │────────▶│  FlowYieldVaultsRequests  │                     │
│  │              │◀────────│   (Request Queue + Escrow)│                     │
│  └──────────────┘         └─────────────┬─────────────┘                     │
└─────────────────────────────────────────┼───────────────────────────────────┘
                                          │ COA Bridge
┌─────────────────────────────────────────┼───────────────────────────────────┐
│                              Flow Cadence                                   │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       FlowYieldVaultsEVM (Worker)                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │     FlowYieldVaultsEVMWorkerOps (NEW)                                 │  │
│  │  ┌─────────────────────┐    ┌─────────────────────────────────────┐   │  │
│  │  │   SchedulerHandler  │───▶│         WorkerHandler               │   │  │
│  │  │   (Recurrent)       │    │   (Per-request, scheduled)          │   │  │
│  │  │   - Crash recovery  │    │   - Processes 1 request             │   │  │
│  │  │   - Checks queue    │    │   - Finalizes status                │   │  │
│  │  │   - Preprocesses    │    │                                     │   │  │
│  │  │   - Spawns 0-N      │    │                                     │   │  │
│  │  │     WorkerHandlers  │    │                                     │   │  │
│  │  └─────────────────────┘    └─────────────────────────────────────┘   │  │
│  │                                                                       │  │
│  │  State: scheduledRequests, isSchedulerPaused                          │  │
│  │  Config: schedulerWakeupInterval (1s), maxProcessingRequests (3)      │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

Request Processing Flow:

  1. User submits request → FlowYieldVaultsRequests escrows funds (status: PENDING)
  2. SchedulerHandler (runs every 1.0s):
    • Performs crash recovery for failed WorkerHandlers
    • Calculates available capacity: maxProcessingRequests - current_in_flight
    • Fetches pending requests from EVM
    • Calls preprocessRequests() to validate and transition PENDING → PROCESSING
    • Spawns 0 to N WorkerHandlers (one per request, up to capacity limit)
  3. WorkerHandler (scheduled per request):
    • Processes single request via processRequest()
    • Executes Cadence operation (create/deposit/withdraw/close YieldVault)
    • Calls completeProcessing() to mark COMPLETED/FAILED on EVM
    • Removes itself from scheduledRequests tracking
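The capacity computation and same-user offsetting above can be expressed as a pure planning function. A minimal sketch, using illustrative names (PlannedWork, planTick) rather than the PR's actual API:

```cadence
access(all) struct PlannedWork {
    access(all) let requestId: UInt256
    access(all) let delay: UFix64

    init(requestId: UInt256, delay: UFix64) {
        self.requestId = requestId
        self.delay = delay
    }
}

// Decide which pending requests to schedule this tick, and with what delay.
// `owners[i]` is the EVM address that submitted `pendingIds[i]`.
access(all) fun planTick(
    pendingIds: [UInt256],
    owners: [String],
    inFlight: Int,
    maxProcessing: Int,
    baseDelay: UFix64
): [PlannedWork] {
    let plan: [PlannedWork] = []
    if inFlight >= maxProcessing {
        return plan // no capacity this tick; try again next wakeup
    }
    var capacity = maxProcessing - inFlight
    let perUserCount: {String: UFix64} = {}

    var i = 0
    while i < pendingIds.length && capacity > 0 {
        let owner = owners[i]
        // Same-user requests get delays baseDelay + 0s, + 1s, + 2s, ...,
        // mirroring the sequential offsetting described above.
        let count = perUserCount[owner] ?? 0.0
        plan.append(PlannedWork(requestId: pendingIds[i], delay: baseDelay + count))
        perUserCount[owner] = count + 1.0
        capacity = capacity - 1
        i = i + 1
    }
    return plan
}
```

For example, with inFlight = 1 and maxProcessing = 3 at most two requests are planned, and a second request from the same EVM address lands one second after the first.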

🧪 Testing

New Test Suite: run_worker_tests.sh

Comprehensive E2E tests covering:

  • ✅ SchedulerHandler initialization and auto-scheduling
  • ✅ Automated request processing via FlowTransactionScheduler
  • ✅ Parallel processing (multiple WorkerHandlers running concurrently)
  • ✅ Capacity limits (maxProcessingRequests respected)
  • ✅ Pause/unpause scheduler functionality
  • ✅ Stop all scheduled transactions
  • ✅ Multi-user automated processing
  • ✅ Crash recovery for panicked WorkerHandlers
  • ✅ Sequential offsetting for same-user requests

Run Tests:

./local/setup_and_run_emulator.sh && ./local/deploy_full_stack.sh
./local/run_worker_tests.sh

  • ✅ A new CI test executes ./local/run_worker_tests.sh.

📖 Documentation Updates

  • Updated README with dual-worker architecture diagram
  • Updated design document with SchedulerHandler/WorkerHandler flow
  • Added worker tests documentation
  • Updated configuration tables

✅ Issues Resolved

nvdtf and others added 2 commits February 4, 2026 15:42
Claude was using `gh api` instead of `gh pr comment` to post reviews,
but `gh api` is not in the allowed tools list, causing the comment
to fail silently.

Changes:
- Add explicit warning to use `gh pr comment` and not `gh api`
- Clarify the command format with a proper code block

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
liobrasil and others added 13 commits February 4, 2026 20:11
claude bot commented Feb 9, 2026

Pull Request Review: New Worker Design

📊 Overview

This PR introduces a significant architectural upgrade, transitioning from a single interval-based worker to a dual-worker system (SchedulerHandler + WorkerHandler) that enables parallel request processing with crash recovery. The changes are well-documented and thoughtfully designed.

Summary:

  • Architecture: Well-designed dual-worker pattern with clear separation of concerns
  • ⚠️ Potential Issues: A few edge cases and timing considerations need attention
  • Code Quality: Clean, well-documented code following Cadence best practices
  • Test Coverage: Comprehensive new test suite in run_worker_tests.sh
  • ⚠️ Performance: Consider tuning parameters for production workloads

🎯 Major Strengths

1. Excellent Architecture Design

The dual-worker pattern is a clean solution:

  • SchedulerHandler: Orchestrates work, handles crash recovery, manages capacity
  • WorkerHandler: Processes individual requests in parallel
  • Clear separation enables horizontal scaling and fault tolerance

2. Robust Crash Recovery

The panic detection mechanism in _checkForFailedWorkerRequests() is well-designed:

  • Monitors transaction status after expected execution time
  • Properly marks panicked requests as FAILED
  • Cleans up state consistently

3. Proper Use of Blockchain Atomicity

The code correctly leverages transaction atomicity:

FlowYieldVaultsEVMWorkerOps.scheduledRequests.remove(key: requestId)
processResult = worker.processRequest(request)  // if this panics, removal reverts too

This pattern is safe and correct under the blockchain execution model.

4. Preprocessing Validation

Early validation in preprocessRequests() prevents wasted resources (a shape sketch follows this list):

  • Validates CREATE_YIELDVAULT parameters before scheduling workers
  • Batches status updates efficiently
  • Fails fast for invalid requests
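The shape sketch (struct and helper names here are assumptions, not this PR's actual types):

```cadence
// Split requests into those to schedule and those to fail fast, so both
// status transitions can be batched once per scheduler tick.
access(all) struct PreprocessResult {
    access(all) let toProcess: [UInt256]   // transition PENDING -> PROCESSING
    access(all) let toFail: [UInt256]      // mark FAILED without spawning a worker

    init(toProcess: [UInt256], toFail: [UInt256]) {
        self.toProcess = toProcess
        self.toFail = toFail
    }
}

access(all) fun splitRequests(ids: [UInt256], isValid: fun(UInt256): Bool): PreprocessResult {
    let ok: [UInt256] = []
    let bad: [UInt256] = []
    for id in ids {
        if isValid(id) {
            ok.append(id)
        } else {
            bad.append(id)
        }
    }
    return PreprocessResult(toProcess: ok, toFail: bad)
}
```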

⚠️ Potential Issues & Recommendations

1. Race Condition in Crash Recovery (Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:454

Issue:

if getCurrentBlock().timestamp <= request.workerScheduledTimestamp {
    continue  // Expected timestamp is not reached yet, skip
}

The condition uses <=, so the check is skipped only up to the exact workerScheduledTimestamp. If the worker's execution is delayed by even one block, it may not have run yet when the next scheduler pass happens, and the request could be flagged as panicked prematurely.

Recommendation:

// Add a grace period before checking for panics
let gracePeriod = 5.0  // seconds
if getCurrentBlock().timestamp < request.workerScheduledTimestamp + gracePeriod {
    continue
}

This gives workers adequate time to execute before being flagged as panicked.

2. Sequential Offsetting May Cause Unfairness (Low-Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:511-527

Issue:

// Offset delay by user request count
let delay = baseDelay + UFix64(userScheduleOffset[key]!)

If a user has many pending requests, their later requests could be delayed significantly while other users' requests are processed immediately. This creates a first-come-first-served bias within a user's requests rather than globally.

Recommendation:
Consider a round-robin approach that interleaves requests from different users (a concrete sketch follows the pseudo-code):

// Pseudo-code:
// Sort requests by (user, timestamp)
// For each batch, take one request per user until capacity filled
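A self-contained sketch of that idea in Cadence (names are illustrative; dictionary key order in Cadence is unspecified, so a production version would need a deterministic user ordering):

```cadence
// Round-robin scheduling: take one request per user per pass until capacity
// is filled. `queues` maps each user to their request IDs in arrival order.
access(all) fun interleaveByUser(queues: {String: [UInt256]}, capacity: Int): [UInt256] {
    let picked: [UInt256] = []
    var round = 0
    var tookAny = true
    while tookAny && picked.length < capacity {
        tookAny = false
        for user in queues.keys {
            let queue = queues[user]!
            if round < queue.length && picked.length < capacity {
                picked.append(queue[round])
                tookAny = true
            }
        }
        round = round + 1
    }
    return picked
}
```

With queues {"A": [1, 2, 3], "B": [4]} and capacity 3 this interleaves both users' first requests before A's second (e.g. [1, 4, 2]), instead of serving all of A first.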

3. Missing Validation: Duplicate Request Scheduling (Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:548

Issue:

FlowYieldVaultsEVMWorkerOps.scheduledRequests.insert(key: request.id, scheduledRequest)

There is no check if request.id already exists in scheduledRequests. If a request is somehow scheduled twice (e.g., due to a bug), it could cause inconsistent state.

Recommendation:
Add defensive check:

pre {
    FlowYieldVaultsEVMWorkerOps.scheduledRequests[request.id] == nil:
        "Request \(request.id) is already scheduled"
}

4. Unbounded Loop in Crash Recovery (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:450

Issue:

for requestId in FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys {
    // Check each scheduled request...
}

If scheduledRequests grows large (e.g., during a system pause or high load), this loop could consume significant gas/computation.

Recommendation:
Add a circuit breaker or process in batches:

let maxRecoveryChecks = 20
var checksPerformed = 0
for requestId in FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys {
    if checksPerformed >= maxRecoveryChecks {
        break  // Resume in next scheduler execution
    }
    // ... check logic ...
    checksPerformed = checksPerformed + 1
}

5. Fee Estimation May Underestimate Costs (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:589-596

Issue:

let estimate = FlowTransactionScheduler.estimate(
    data: data,
    timestamp: future,
    priority: priority,
    executionEffort: mediumExecutionEffort
)
let fees <- vaultRef.withdraw(amount: estimate.flowFee ?? 0.0)

Using ?? 0.0 means if estimate.flowFee is nil, no fees are withdrawn. This could cause transaction failures.

Recommendation:

let feeAmount = estimate.flowFee ?? panic("Fee estimation returned nil")
let fees <- vaultRef.withdraw(amount: feeAmount)

6. Potential Memory Leak in stopAll() (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:227-234

Issue:
The stopAll() function iterates over scheduledRequests.keys and removes items during iteration, which is generally safe in Cadence but could miss items if the iteration order changes.

Recommendation:
Collect keys first, then iterate:

let requestIds = FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys
for scheduledRequestId in requestIds {
    // ... cancel and remove ...
}

🔍 Code Quality Observations

Positive:

  1. Excellent documentation - Every function has clear purpose and flow documentation
  2. Comprehensive events - Good observability with detailed event emissions
  3. Proper access controls - Well-structured use of access(self) and access(all)
  4. Type safety - Good use of enums and structs for type-safe state management

Minor Improvements:

  1. Magic Numbers - Some hardcoded values could be constants:

     let baseDelay = 1.0  // Line 505 - consider making this configurable
     let mediumExecutionEffort = 7500 as UInt64  // Line 587 - add comment explaining value

  2. Error Messages - Some could be more actionable:

     // Current:
     return "No capacity available"
     // Better:
     return "No capacity available: \(FlowYieldVaultsEVMWorkerOps.scheduledRequests.length) / \(FlowYieldVaultsEVMWorkerOps.maxProcessingRequests) slots in use"

🧪 Test Coverage Assessment

Strengths:

  • ✅ Comprehensive E2E test suite in run_worker_tests.sh
  • ✅ Tests cover automated processing, parallelism, crash recovery
  • ✅ Multi-user scenarios validated
  • ✅ Pause/unpause functionality tested

Gaps:

  • ⚠️ Edge Case: What happens if SchedulerHandler panics? (who recovers the scheduler?)
  • ⚠️ Load Testing: No tests for behavior at maxProcessingRequests boundary
  • ⚠️ Timing Tests: No validation of sequential offsetting correctness
  • ⚠️ Fee Exhaustion: What happens if FlowToken vault runs out of fees?

🔒 Security Considerations

1. Access Control

  • Proper use of onlyAuthorizedCOA modifier in Solidity
  • Admin functions protected by resource-based access control in Cadence
  • No unauthorized state modification paths identified

2. Reentrancy

  • Solidity uses ReentrancyGuard on all state-modifying functions
  • Cadence transaction atomicity prevents reentrancy by design

3. Denial of Service ⚠️

  • Potential Issue: A malicious user could create maxPendingRequestsPerUser requests and never process them, blocking the queue
  • Mitigation: The current maxPendingRequestsPerUser = 10 limit helps, but consider adding request expiration
  • Recommendation: Add an admin function to force-expire old pending requests (sketched below)
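A hypothetical sketch of that admin path, assuming a COA-gated forceExpire(uint256) were added to the Solidity requests contract (no such function exists in this PR); the EVM calls themselves use the standard Flow EVM contract API:

```cadence
import "EVM"

// Hypothetical admin transaction force-expiring a stale pending request.
// Assumes the Solidity side gained a COA-gated `forceExpire(uint256)`.
transaction(requestsContract: String, requestId: UInt256) {
    prepare(admin: auth(BorrowValue) &Account) {
        let coa = admin.storage.borrow<auth(EVM.Call) &EVM.CadenceOwnedAccount>(
            from: /storage/evm
        ) ?? panic("missing COA in /storage/evm")

        let result = coa.call(
            to: EVM.addressFromString(requestsContract),
            data: EVM.encodeABIWithSignature("forceExpire(uint256)", [requestId]),
            gasLimit: 100_000,
            value: EVM.Balance(attoflow: 0)
        )
        assert(result.status == EVM.Status.successful, message: "forceExpire reverted")
    }
}
```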

4. Fund Safety

  • Two-phase commit pattern (startProcessing → completeProcessing) is sound
  • Refunds properly credited to claimableRefunds on failure
  • Pull-over-push pattern for refunds prevents reentrancy

⚡ Performance Considerations

1. Parallel Processing Capacity

Current: maxProcessingRequests = 3

Recommendation:

  • 3 is conservative for production
  • Consider load testing with higher values (5-10) to find optimal throughput
  • Add monitoring for queue depth vs processing capacity

2. Scheduler Interval

Current: schedulerWakeupInterval = 1.0s

Analysis:

  • 1 second is aggressive and ensures low latency
  • May cause unnecessary scheduler executions if queue is empty
  • Consider a dynamic interval based on queue depth (sketched below)
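As a hedged sketch of the simplest version (function name and bounds are illustrative, not this PR's config):

```cadence
// Depth-based wakeup interval: stay aggressive while work is queued,
// back off when idle to avoid empty scheduler executions.
access(all) fun nextWakeupInterval(queueDepth: Int, minInterval: UFix64, maxInterval: UFix64): UFix64 {
    if queueDepth > 0 {
        return minInterval // e.g. 1.0s while requests are pending
    }
    return maxInterval     // e.g. 10.0s when the queue is empty
}
```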

3. Batch Size

let fetchCount = pendingRequestCount > capacity ? capacity : pendingRequestCount

Recommendation:

  • Currently fetches exactly up to capacity, which is optimal
  • Consider prefetching +1 for next round if preprocessing is expensive

🎯 Final Recommendations

Critical (Must Address Before Merge):

  1. ✅ None - No blocking issues identified

High Priority (Should Address):

  1. Add grace period to crash recovery timing check (Issue #1)
  2. Add duplicate request scheduling validation (Issue #3)
  3. Fix fee estimation to panic on nil (Issue #5)

Medium Priority (Consider Addressing):

  1. Review sequential offsetting fairness (Issue #2)
  2. Add bounded iteration to crash recovery (Issue #4)
  3. Add edge case test coverage
  4. Add migration and monitoring documentation

Low Priority (Future Enhancements):

  1. Make baseDelay and execution effort configurable
  2. Add request expiration mechanism for DoS prevention
  3. Consider dynamic scheduler interval based on load
  4. Add performance metrics dashboard

✅ Approval Status

Recommendation: APPROVE with minor changes

This is a well-designed architectural upgrade that significantly improves the system's scalability and fault tolerance. The identified issues are minor and can be addressed either before or shortly after merge. The comprehensive test suite provides good confidence in the implementation.

Suggested Action:

  1. Address the high-priority issues (#1, #3, #5) before merge
  2. Create follow-up issues for medium-priority items
  3. Plan monitoring strategy for production deployment

Great work on this substantial improvement! 🚀


Reviewed by: Claude Code
Review Date: 2026-02-13
Commit: Latest changes on navid/new-worker-arch branch

ci: restore Claude PR commenting and enforce sticky output

test: trigger claude review on navid branch
@nvdtf nvdtf marked this pull request as ready for review February 10, 2026 23:33
let future = getCurrentBlock().timestamp + 1.0

// Estimate fees and withdraw payment
let estimate = FlowTransactionScheduler.estimate(


The estimate call can be replaced by a call to the cheaper calculateFee; the same is possible for the other estimate calls here.
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L1465
The Flow CLI currently ships an older version of FlowTransactionScheduler without that function, but it can be copied:
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L699
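A hedged sketch of the suggested swap; the parameter name dataSizeInMB in particular is an assumption from this thread and may differ in the published contract:

```cadence
// Hedged sketch: parameter names (especially dataSizeInMB) are assumptions
// from this thread, not the verified core-contract signature.
let fee = FlowTransactionScheduler.calculateFee(
    dataSizeInMB: 0.000032,            // a UInt256 request id is 32 bytes
    priority: FlowTransactionScheduler.Priority.Medium,
    executionEffort: 7500 as UInt64    // same effort as the estimate call above
)
let fees <- vaultRef.withdraw(amount: fee) // vaultRef as in the original snippet
```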

nvdtf (Member, Author) replied:

Nice suggestion! I didn't know about it.

To verify my understanding, I would pass dataSizeInMB as:

  • 0 when data is nil
  • 0.000032 when passing requestId (UInt256), because it's 32 bytes

Or is it calculated differently?
