
@nvdtf (Member) commented Jan 30, 2026

🚀 Dual-Worker Architecture

Architecture Upgrade: Replaces the single interval-based worker with a dual-worker system (SchedulerHandler + WorkerHandler), enabling parallel request processing and crash recovery


📋 Summary

This PR migrates the FlowYieldVaultsEVM worker design from a single interval-based worker that processed one request at a time to a dual-worker architecture. The new system enables parallel processing of multiple requests with automatic scheduling and crash recovery.


🔄 What Changed

Before

  • Single interval-based worker:
    • One PENDING request processed per execution
    • Fixed interval between executions
    • Sequential, one-at-a-time processing
    • No panic recovery

After

  • Dual-worker architecture:
    • SchedulerHandler: Recurrent worker that runs at a fixed interval (1.0s)
      • Performs crash recovery for failed WorkerHandlers
      • Checks for pending requests
      • Preprocesses requests (PENDING → PROCESSING)
      • Spawns zero or more WorkerHandlers, based on available capacity and pending requests
      • Passes a request ID to each worker as its assigned work
    • WorkerHandler: Per-request worker that processes individual requests (see the sketch after this list)
      • Each request gets its own scheduled WorkerHandler
      • Processes the request and finalizes its status on EVM
      • Removes itself from tracking after completion
      • Safe to panic; will be recovered by the SchedulerHandler
  • Parallel processing: Up to maxProcessingRequests (default: 3) requests processed concurrently
  • FlowTransactionScheduler: Native Flow scheduling infrastructure for automatic execution
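For orientation, a minimal sketch of the per-request handler shape. The resource body is illustrative, not this PR's code, and the TransactionHandler interface members (executeTransaction, getViews, resolveView) are assumed from the flow-core-contracts FlowTransactionScheduler:

```cadence
import "FlowTransactionScheduler"

access(all) contract WorkerSketch {

    // Illustrative per-request handler. The PR's real WorkerHandler also
    // finalizes the request status on EVM and removes itself from
    // scheduledRequests after completion.
    access(all) resource WorkerHandler: FlowTransactionScheduler.TransactionHandler {

        // Invoked by FlowTransactionScheduler at the scheduled timestamp;
        // `data` carries the request ID assigned by the SchedulerHandler.
        access(FlowTransactionScheduler.Execute)
        fun executeTransaction(id: UInt64, data: AnyStruct?) {
            let requestId = (data ?? panic("expected a request ID")) as! UInt256
            // ... process the single request, then finalize COMPLETED/FAILED ...
        }

        access(all) view fun getViews(): [Type] { return [] }

        access(all) fun resolveView(_ view: Type): AnyStruct? { return nil }
    }
}
```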

✨ New Features

| Feature | Description |
| --- | --- |
| Parallel Processing | Multiple requests processed concurrently (up to maxProcessingRequests) |
| 🔄 Dual-Worker System | SchedulerHandler orchestrates, WorkerHandler executes |
| 🔍 Preprocessing | SchedulerHandler validates requests before scheduling (PENDING → PROCESSING) |
| 🔁 Crash Recovery | Detects panicked WorkerHandlers and marks requests as FAILED |
| 📊 Capacity Control | Configurable concurrent processing limit (maxProcessingRequests: 3) |
| 🎯 Sequential Offsetting | Multiple requests from the same EVM address are offset sequentially to avoid randomization |

🏗️ Architecture Changes

Before: Single Worker

┌─────────────────────────────────────┐
│   Single Interval-Based Worker      │
│   - Runs at fixed interval          │
│   - Processes 1 request at a time   │
│   - Sequential execution            │
└─────────────────────────────────────┘

After: Dual-Worker System

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Flow EVM                                       │
│  ┌──────────────┐         ┌───────────────────────────┐                     │
│  │   EVM User   │────────▶│  FlowYieldVaultsRequests  │                     │
│  │              │◀────────│   (Request Queue + Escrow)│                     │
│  └──────────────┘         └─────────────┬─────────────┘                     │
└─────────────────────────────────────────┼───────────────────────────────────┘
                                          │ COA Bridge
┌─────────────────────────────────────────┼───────────────────────────────────┐
│                              Flow Cadence                                   │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       FlowYieldVaultsEVM (Worker)                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │     FlowYieldVaultsEVMWorkerOps (NEW)                                 │  │
│  │  ┌─────────────────────┐    ┌─────────────────────────────────────┐   │  │
│  │  │   SchedulerHandler  │───▶│         WorkerHandler               │   │  │
│  │  │   (Recurrent)       │    │   (Per-request, scheduled)          │   │  │
│  │  │   - Crash recovery  │    │   - Processes 1 request             │   │  │
│  │  │   - Checks queue    │    │   - Finalizes status                │   │  │
│  │  │   - Preprocesses    │    │                                     │   │  │
│  │  │   - Spawns 0-N      │    │                                     │   │  │
│  │  │     WorkerHandlers  │    │                                     │   │  │
│  │  └─────────────────────┘    └─────────────────────────────────────┘   │  │
│  │                                                                       │  │
│  │  State: scheduledRequests, isSchedulerPaused                          │  │
│  │  Config: schedulerWakeupInterval (1s), maxProcessingRequests (3)      │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

Request Processing Flow:

  1. User submits request → FlowYieldVaultsRequests escrows funds (status: PENDING)
  2. SchedulerHandler (runs every 1.0s):
    • Performs crash recovery for failed WorkerHandlers
    • Calculates available capacity: maxProcessingRequests - current_in_flight
    • Fetches pending requests from EVM
    • Calls preprocessRequests() to validate and transition PENDING → PROCESSING
    • Spawns 0 to N WorkerHandlers (one per request, up to capacity limit)
  3. WorkerHandler (scheduled per request):
    • Processes single request via processRequest()
    • Executes Cadence operation (create/deposit/withdraw/close YieldVault)
    • Calls completeProcessing() to mark COMPLETED/FAILED on EVM
    • Removes itself from scheduledRequests tracking
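The capacity computation and same-user offsetting above can be expressed as a pure planning function. A minimal sketch, using illustrative names (PlannedWork, planTick) rather than the PR's actual API:

```cadence
access(all) struct PlannedWork {
    access(all) let requestId: UInt256
    access(all) let delay: UFix64

    init(requestId: UInt256, delay: UFix64) {
        self.requestId = requestId
        self.delay = delay
    }
}

// Decide which pending requests to schedule this tick, and with what delay.
// `owners[i]` is the EVM address that submitted `pendingIds[i]`.
access(all) fun planTick(
    pendingIds: [UInt256],
    owners: [String],
    inFlight: Int,
    maxProcessing: Int,
    baseDelay: UFix64
): [PlannedWork] {
    let plan: [PlannedWork] = []
    if inFlight >= maxProcessing {
        return plan // no capacity this tick; try again next wakeup
    }
    var capacity = maxProcessing - inFlight
    let perUserCount: {String: UFix64} = {}

    var i = 0
    while i < pendingIds.length && capacity > 0 {
        let owner = owners[i]
        // Same-user requests get delays baseDelay + 0s, + 1s, + 2s, ...,
        // mirroring the sequential offsetting described above.
        let count = perUserCount[owner] ?? 0.0
        plan.append(PlannedWork(requestId: pendingIds[i], delay: baseDelay + count))
        perUserCount[owner] = count + 1.0
        capacity = capacity - 1
        i = i + 1
    }
    return plan
}
```

For example, with inFlight = 1 and maxProcessing = 3 at most two requests are planned, and a second request from the same EVM address lands one second after the first.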

🧪 Testing

New Test Suite: run_worker_tests.sh

Comprehensive E2E tests covering:

  • ✅ SchedulerHandler initialization and auto-scheduling
  • ✅ Automated request processing via FlowTransactionScheduler
  • ✅ Parallel processing (multiple WorkerHandlers running concurrently)
  • ✅ Capacity limits (maxProcessingRequests respected)
  • ✅ Pause/unpause scheduler functionality
  • ✅ Stop all scheduled transactions
  • ✅ Multi-user automated processing
  • ✅ Crash recovery for panicked WorkerHandlers
  • ✅ Sequential offsetting for same-user requests

Run Tests:

./local/setup_and_run_emulator.sh && ./local/deploy_full_stack.sh
./local/run_worker_tests.sh

  • ✅ A new CI test executes ./local/run_worker_tests.sh.

📖 Documentation Updates

  • Updated README with dual-worker architecture diagram
  • Updated design document with SchedulerHandler/WorkerHandler flow
  • Added worker tests documentation
  • Updated configuration tables

✅ Issues Resolved

nvdtf and others added 2 commits February 4, 2026 15:42
Claude was using `gh api` instead of `gh pr comment` to post reviews,
but `gh api` is not in the allowed tools list, causing the comment
to fail silently.

Changes:
- Add explicit warning to use `gh pr comment` and not `gh api`
- Clarify the command format with a proper code block

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
liobrasil and others added 13 commits February 4, 2026 20:11
claude bot commented Feb 9, 2026

Pull Request Review: New Worker Design

📊 Overview

This PR introduces a significant architectural upgrade, transitioning from a single interval-based worker to a dual-worker system (SchedulerHandler + WorkerHandler) that enables parallel request processing with crash recovery. The changes are well-documented and thoughtfully designed.

Summary:

  • Architecture: Well-designed dual-worker pattern with clear separation of concerns
  • ⚠️ Potential Issues: A few edge cases and timing considerations need attention
  • Code Quality: Clean, well-documented code following Cadence best practices
  • Test Coverage: Comprehensive new test suite in run_worker_tests.sh
  • ⚠️ Performance: Consider tuning parameters for production workloads

🎯 Major Strengths

1. Excellent Architecture Design

The dual-worker pattern is a clean solution:

  • SchedulerHandler: Orchestrates work, handles crash recovery, manages capacity
  • WorkerHandler: Processes individual requests in parallel
  • Clear separation enables horizontal scaling and fault tolerance

2. Robust Crash Recovery

The panic detection mechanism in _checkForFailedWorkerRequests() is well-designed:

  • Monitors transaction status after expected execution time
  • Properly marks panicked requests as FAILED
  • Cleans up state consistently

3. Proper Use of Blockchain Atomicity

The code correctly leverages transaction atomicity:

FlowYieldVaultsEVMWorkerOps.scheduledRequests.remove(key: requestId)
processResult = worker.processRequest(request)  // if this panics, removal reverts too

This pattern is safe and correct under the blockchain execution model.

4. Preprocessing Validation

Early validation in preprocessRequests() prevents wasted resources (a shape sketch follows this list):

  • Validates CREATE_YIELDVAULT parameters before scheduling workers
  • Batches status updates efficiently
  • Fails fast for invalid requests
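The shape sketch (struct and helper names here are assumptions, not this PR's actual types):

```cadence
// Split requests into those to schedule and those to fail fast, so both
// status transitions can be batched once per scheduler tick.
access(all) struct PreprocessResult {
    access(all) let toProcess: [UInt256]   // transition PENDING -> PROCESSING
    access(all) let toFail: [UInt256]      // mark FAILED without spawning a worker

    init(toProcess: [UInt256], toFail: [UInt256]) {
        self.toProcess = toProcess
        self.toFail = toFail
    }
}

access(all) fun splitRequests(ids: [UInt256], isValid: fun(UInt256): Bool): PreprocessResult {
    let ok: [UInt256] = []
    let bad: [UInt256] = []
    for id in ids {
        if isValid(id) {
            ok.append(id)
        } else {
            bad.append(id)
        }
    }
    return PreprocessResult(toProcess: ok, toFail: bad)
}
```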

⚠️ Potential Issues & Recommendations

1. Race Condition in Crash Recovery (Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:454

Issue:

if getCurrentBlock().timestamp <= request.workerScheduledTimestamp {
    continue  // Expected timestamp is not reached yet, skip
}

The condition uses <=, so the check is skipped only up to the exact workerScheduledTimestamp. If the worker's execution is delayed by even one block, it may not have run yet when the next scheduler pass happens, and the request could be flagged as panicked prematurely.

Recommendation:

// Add a grace period before checking for panics
let gracePeriod = 5.0  // seconds
if getCurrentBlock().timestamp < request.workerScheduledTimestamp + gracePeriod {
    continue
}

This gives workers adequate time to execute before being flagged as panicked.

2. Sequential Offsetting May Cause Unfairness (Low-Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:511-527

Issue:

// Offset delay by user request count
let delay = baseDelay + UFix64(userScheduleOffset[key]!)

If a user has many pending requests, their later requests could be delayed significantly while other users' requests are processed immediately. This creates a first-come-first-served bias within a user's requests rather than globally.

Recommendation:
Consider a round-robin approach that interleaves requests from different users (a concrete sketch follows the pseudo-code):

// Pseudo-code:
// Sort requests by (user, timestamp)
// For each batch, take one request per user until capacity filled
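A self-contained sketch of that idea in Cadence (names are illustrative; dictionary key order in Cadence is unspecified, so a production version would need a deterministic user ordering):

```cadence
// Round-robin scheduling: take one request per user per pass until capacity
// is filled. `queues` maps each user to their request IDs in arrival order.
access(all) fun interleaveByUser(queues: {String: [UInt256]}, capacity: Int): [UInt256] {
    let picked: [UInt256] = []
    var round = 0
    var tookAny = true
    while tookAny && picked.length < capacity {
        tookAny = false
        for user in queues.keys {
            let queue = queues[user]!
            if round < queue.length && picked.length < capacity {
                picked.append(queue[round])
                tookAny = true
            }
        }
        round = round + 1
    }
    return picked
}
```

With queues {"A": [1, 2, 3], "B": [4]} and capacity 3 this interleaves both users' first requests before A's second (e.g. [1, 4, 2]), instead of serving all of A first.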

3. Missing Validation: Duplicate Request Scheduling (Medium Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:548

Issue:

FlowYieldVaultsEVMWorkerOps.scheduledRequests.insert(key: request.id, scheduledRequest)

There is no check if request.id already exists in scheduledRequests. If a request is somehow scheduled twice (e.g., due to a bug), it could cause inconsistent state.

Recommendation:
Add defensive check:

pre {
    FlowYieldVaultsEVMWorkerOps.scheduledRequests[request.id] == nil:
        "Request \(request.id) is already scheduled"
}

4. Unbounded Loop in Crash Recovery (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:450

Issue:

for requestId in FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys {
    // Check each scheduled request...
}

If scheduledRequests grows large (e.g., during a system pause or high load), this loop could consume significant gas/computation.

Recommendation:
Add a circuit breaker or process in batches:

let maxRecoveryChecks = 20
var checksPerformed = 0
for requestId in FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys {
    if checksPerformed >= maxRecoveryChecks {
        break  // Resume in next scheduler execution
    }
    // ... check logic ...
    checksPerformed = checksPerformed + 1
}

5. Fee Estimation May Underestimate Costs (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:589-596

Issue:

let estimate = FlowTransactionScheduler.estimate(
    data: data,
    timestamp: future,
    priority: priority,
    executionEffort: mediumExecutionEffort
)
let fees <- vaultRef.withdraw(amount: estimate.flowFee ?? 0.0)

Using ?? 0.0 means if estimate.flowFee is nil, no fees are withdrawn. This could cause transaction failures.

Recommendation:

let feeAmount = estimate.flowFee ?? panic("Fee estimation returned nil")
let fees <- vaultRef.withdraw(amount: feeAmount)

6. Potential Memory Leak in stopAll() (Low Severity)

Location: FlowYieldVaultsEVMWorkerOps.cdc:227-234

Issue:
The stopAll() function iterates over scheduledRequests.keys and removes items during iteration, which is generally safe in Cadence but could miss items if the iteration order changes.

Recommendation:
Collect keys first, then iterate:

let requestIds = FlowYieldVaultsEVMWorkerOps.scheduledRequests.keys
for scheduledRequestId in requestIds {
    // ... cancel and remove ...
}

🔍 Code Quality Observations

Positive:

  1. Excellent documentation - Every function has clear purpose and flow documentation
  2. Comprehensive events - Good observability with detailed event emissions
  3. Proper access controls - Well-structured use of access(self) and access(all)
  4. Type safety - Good use of enums and structs for type-safe state management

Minor Improvements:

  1. Magic Numbers - Some hardcoded values could be constants:

     let baseDelay = 1.0  // Line 505 - consider making this configurable
     let mediumExecutionEffort = 7500 as UInt64  // Line 587 - add comment explaining value

  2. Error Messages - Some could be more actionable:

     // Current:
     return "No capacity available"
     // Better:
     return "No capacity available: \(FlowYieldVaultsEVMWorkerOps.scheduledRequests.length) / \(FlowYieldVaultsEVMWorkerOps.maxProcessingRequests) slots in use"

🧪 Test Coverage Assessment

Strengths:

  • ✅ Comprehensive E2E test suite in run_worker_tests.sh
  • ✅ Tests cover automated processing, parallelism, crash recovery
  • ✅ Multi-user scenarios validated
  • ✅ Pause/unpause functionality tested

Gaps:

  • ⚠️ Edge Case: What happens if SchedulerHandler panics? (who recovers the scheduler?)
  • ⚠️ Load Testing: No tests for behavior at maxProcessingRequests boundary
  • ⚠️ Timing Tests: No validation of sequential offsetting correctness
  • ⚠️ Fee Exhaustion: What happens if FlowToken vault runs out of fees?

🔒 Security Considerations

1. Access Control

  • Proper use of onlyAuthorizedCOA modifier in Solidity
  • Admin functions protected by resource-based access control in Cadence
  • No unauthorized state modification paths identified

2. Reentrancy

  • Solidity uses ReentrancyGuard on all state-modifying functions
  • Cadence transaction atomicity prevents reentrancy by design

3. Denial of Service ⚠️

  • Potential Issue: A malicious user could create maxPendingRequestsPerUser requests and never process them, blocking the queue
  • Mitigation: The current maxPendingRequestsPerUser = 10 limit helps, but consider adding request expiration
  • Recommendation: Add an admin function to force-expire old pending requests (sketched below)
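A hypothetical sketch of that admin path, assuming a COA-gated forceExpire(uint256) were added to the Solidity requests contract (no such function exists in this PR); the EVM calls themselves use the standard Flow EVM contract API:

```cadence
import "EVM"

// Hypothetical admin transaction force-expiring a stale pending request.
// Assumes the Solidity side gained a COA-gated `forceExpire(uint256)`.
transaction(requestsContract: String, requestId: UInt256) {
    prepare(admin: auth(BorrowValue) &Account) {
        let coa = admin.storage.borrow<auth(EVM.Call) &EVM.CadenceOwnedAccount>(
            from: /storage/evm
        ) ?? panic("missing COA in /storage/evm")

        let result = coa.call(
            to: EVM.addressFromString(requestsContract),
            data: EVM.encodeABIWithSignature("forceExpire(uint256)", [requestId]),
            gasLimit: 100_000,
            value: EVM.Balance(attoflow: 0)
        )
        assert(result.status == EVM.Status.successful, message: "forceExpire reverted")
    }
}
```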

4. Fund Safety

  • Two-phase commit pattern (startProcessing → completeProcessing) is sound
  • Refunds properly credited to claimableRefunds on failure
  • Pull-over-push pattern for refunds prevents reentrancy

⚡ Performance Considerations

1. Parallel Processing Capacity

Current: maxProcessingRequests = 3

Recommendation:

  • 3 is conservative for production
  • Consider load testing with higher values (5-10) to find optimal throughput
  • Add monitoring for queue depth vs processing capacity

2. Scheduler Interval

Current: schedulerWakeupInterval = 1.0s

Analysis:

  • 1 second is aggressive and ensures low latency
  • May cause unnecessary scheduler executions if queue is empty
  • Consider a dynamic interval based on queue depth (sketched below)
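As a hedged sketch of the simplest version (function name and bounds are illustrative, not this PR's config):

```cadence
// Depth-based wakeup interval: stay aggressive while work is queued,
// back off when idle to avoid empty scheduler executions.
access(all) fun nextWakeupInterval(queueDepth: Int, minInterval: UFix64, maxInterval: UFix64): UFix64 {
    if queueDepth > 0 {
        return minInterval // e.g. 1.0s while requests are pending
    }
    return maxInterval     // e.g. 10.0s when the queue is empty
}
```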

3. Batch Size

let fetchCount = pendingRequestCount > capacity ? capacity : pendingRequestCount

Recommendation:

  • Currently fetches exactly up to capacity, which is optimal
  • Consider prefetching +1 for next round if preprocessing is expensive

🎯 Final Recommendations

Critical (Must Address Before Merge):

  1. ✅ None - No blocking issues identified

High Priority (Should Address):

  1. Add grace period to crash recovery timing check (Issue #1)
  2. Add duplicate request scheduling validation (Issue #3)
  3. Fix fee estimation to panic on nil (Issue #5)

Medium Priority (Consider Addressing):

  1. Review sequential offsetting fairness (Issue #2)
  2. Add bounded iteration to crash recovery (Issue #4)
  3. Add edge case test coverage
  4. Add migration and monitoring documentation

Low Priority (Future Enhancements):

  1. Make baseDelay and execution effort configurable
  2. Add request expiration mechanism for DoS prevention
  3. Consider dynamic scheduler interval based on load
  4. Add performance metrics dashboard

✅ Approval Status

Recommendation: APPROVE with minor changes

This is a well-designed architectural upgrade that significantly improves the system's scalability and fault tolerance. The identified issues are minor and can be addressed either before or shortly after merge. The comprehensive test suite provides good confidence in the implementation.

Suggested Action:

  1. Address the high-priority issues (#1, #3, #5) before merge
  2. Create follow-up issues for medium-priority items
  3. Plan monitoring strategy for production deployment

Great work on this substantial improvement! 🚀


Reviewed by: Claude Code
Review Date: 2026-02-13
Commit: Latest changes on navid/new-worker-arch branch

ci: restore Claude PR commenting and enforce sticky output

test: trigger claude review on navid branch
@nvdtf nvdtf marked this pull request as ready for review February 10, 2026 23:33
let future = getCurrentBlock().timestamp + 1.0

// Estimate fees and withdraw payment
let estimate = FlowTransactionScheduler.estimate(


The estimate call can be replaced by a call to the cheaper calculateFee; the same is possible for the other estimate calls here.
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L1465
The Flow CLI currently ships an older version of FlowTransactionScheduler without that function, but it can be copied:
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L699
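A hedged sketch of the suggested swap; the parameter name dataSizeInMB in particular is an assumption from this thread and may differ in the published contract:

```cadence
// Hedged sketch: parameter names (especially dataSizeInMB) are assumptions
// from this thread, not the verified core-contract signature.
let fee = FlowTransactionScheduler.calculateFee(
    dataSizeInMB: 0.000032,            // a UInt256 request id is 32 bytes
    priority: FlowTransactionScheduler.Priority.Medium,
    executionEffort: 7500 as UInt64    // same effort as the estimate call above
)
let fees <- vaultRef.withdraw(amount: fee) // vaultRef as in the original snippet
```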

nvdtf (Member, Author) replied:

Nice suggestion! I didn't know about it.

To verify my understanding, I would pass dataSizeInMB as:

  • 0 when data is nil
  • 0.000032 when passing requestId (UInt256), because it's 32 bytes

Or is it calculated differently?
