Skip to content

New Worker Design#44

Open
nvdtf wants to merge 42 commits intomainfrom
navid/new-worker-arch
Open

New Worker Design#44
nvdtf wants to merge 42 commits intomainfrom
navid/new-worker-arch

Conversation

@nvdtf
Copy link
Member

@nvdtf nvdtf commented Jan 30, 2026

🚀 Dual-Worker Architecture

Architecture Upgrade: Replaces single interval-based worker with dual-worker system (SchedulerHandler + WorkerHandler) enabling parallel request processing and crash recovery


✅ Issues Resolved


📋 Summary

This PR migrates the FlowYieldVaultsEVM worker design from a single interval-based worker that processed one request at a time to a dual-worker architecture. The new system enables parallel processing of multiple requests with automatic scheduling and crash recovery.


🔄 What Changed

Before

  • Single interval-based worker: Processed requests sequentially
    • One PENDING request processed per execution
    • Fixed interval between executions
    • Sequential, one-at-a-time processing
    • No panic recovery

After

  • Dual-worker architecture:
    • SchedulerHandler: Recurrent worker that runs at fixed intervals (1.0s)
      • Crash recovery for failed WorkerHandlers
      • Checks for pending requests
      • Preprocesses requests (PENDING → PROCESSING)
      • Spawns 0 to multiple WorkerHandlers based on available capacity and pending requests
      • Passes request ID to each worker as assigned work
    • WorkerHandler: Per-request worker that processes individual requests
      • Each request gets its own scheduled WorkerHandler
      • Processes request and finalizes status on EVM
      • Removes itself from tracking after completion
      • Safe to panic, will be recovered by SchedulerHandler
  • Parallel processing: Up to maxProcessingRequests (default: 3) requests processed concurrently
  • FlowTransactionScheduler: Native Flow scheduling infrastructure for automatic execution

✨ New Features

Feature Description
Parallel Processing Multiple requests processed concurrently (up to maxProcessingRequests)
🔄 Dual-Worker System SchedulerHandler orchestrates, WorkerHandler executes
🔍 Preprocessing SchedulerHandler validates requests before scheduling (PENDING → PROCESSING)
🔁 Crash Recovery Detects panicked WorkerHandlers and marks requests as FAILED
📊 Capacity Control Configurable concurrent processing limit (maxProcessingRequests: 3)
🎯 Sequential Offsetting Multiple requests from same EVM address offset sequentially to avoid randomization

🏗️ Architecture Changes

Before: Single Worker

┌─────────────────────────────────────┐
│   Single Interval-Based Worker      │
│   - Runs at fixed interval          │
│   - Processes 1 request at a time   │
│   - Sequential execution            │
└─────────────────────────────────────┘

After: Dual-Worker System

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Flow EVM                                       │
│  ┌──────────────┐         ┌───────────────────────────┐                     │
│  │   EVM User   │────────▶│  FlowYieldVaultsRequests  │                     │
│  │              │◀────────│   (Request Queue + Escrow)│                     │
│  └──────────────┘         └─────────────┬─────────────┘                     │
└─────────────────────────────────────────┼───────────────────────────────────┘
                                          │ COA Bridge
┌─────────────────────────────────────────┼───────────────────────────────────┐
│                              Flow Cadence                                   │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       FlowYieldVaultsEVM (Worker)                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │     FlowYieldVaultsEVMWorkerOps (NEW)                                 │  │
│  │  ┌─────────────────────┐    ┌─────────────────────────────────────┐   │  │
│  │  │   SchedulerHandler  │───▶│         WorkerHandler               │   │  │
│  │  │   (Recurrent)       │    │   (Per-request, scheduled)          │   │  │
│  │  │   - Crash recovery  │    │   - Processes 1 request             │   │  │
│  │  │   - Checks queue    │    │   - Finalizes status                │   │  │
│  │  │   - Preprocesses    │    │                                     │   │  │
│  │  │   - Spawns 0-N      │    │                                     │   │  │
│  │  │     WorkerHandlers  │    │                                     │   │  │
│  │  └─────────────────────┘    └─────────────────────────────────────┘   │  │
│  │                                                                       │  │
│  │  State: scheduledRequests, isSchedulerPaused                          │  │
│  │  Config: schedulerWakeupInterval (1s), maxProcessingRequests (3)      │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

Request Processing Flow:

  1. User submits request → FlowYieldVaultsRequests escrows funds (status: PENDING)
  2. SchedulerHandler (runs every 1.0s):
    • Perform crash recovery for failed WorkerHandlers
    • Calculates available capacity: maxProcessingRequests - current_in_flight
    • Fetches pending requests from EVM
    • Calls preprocessRequests() to validate and transition PENDING → PROCESSING
    • Spawns 0 to N WorkerHandlers (one per request, up to capacity limit)
  3. WorkerHandler (scheduled per request):
    • Processes single request via processRequest()
    • Executes Cadence operation (create/deposit/withdraw/close YieldVault)
    • Calls completeProcessing() to mark COMPLETED/FAILED on EVM
    • Removes itself from scheduledRequests tracking

🧪 Testing

New Test Suite: run_worker_tests.sh

Comprehensive E2E tests covering:

  • ✅ SchedulerHandler initialization and auto-scheduling
  • ✅ Automated request processing via FlowTransactionScheduler
  • ✅ Parallel processing (multiple WorkerHandlers running concurrently)
  • ✅ Capacity limits (maxProcessingRequests respected)
  • ✅ Pause/unpause scheduler functionality
  • ✅ Stop all scheduled transactions
  • ✅ Multi-user automated processing
  • ✅ Crash recovery for panicked WorkerHandlers
  • ✅ Sequential offsetting for same-user requests

Run Tests:

./local/setup_and_run_emulator.sh && ./local/deploy_full_stack.sh
./local/run_worker_tests.sh
  • ✅ New CI test added that executes ./local/run_worker_tests.sh.

📖 Documentation Updates

  • Updated README with dual-worker architecture diagram
  • Updated design document with SchedulerHandler/WorkerHandler flow
  • Added worker tests documentation
  • Updated configuration tables

@onflow onflow deleted a comment from claude bot Feb 2, 2026
@onflow onflow deleted a comment from claude bot Feb 2, 2026
@onflow onflow deleted a comment from claude bot Feb 4, 2026
nvdtf and others added 2 commits February 4, 2026 15:42
Claude was using `gh api` instead of `gh pr comment` to post reviews,
but `gh api` is not in the allowed tools list, causing the comment
to fail silently.

Changes:
- Add explicit warning to use `gh pr comment` and not `gh api`
- Clarify the command format with a proper code block

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
liobrasil and others added 13 commits February 4, 2026 20:11
Claude was using `gh api` instead of `gh pr comment` to post reviews,
but `gh api` is not in the allowed tools list, causing the comment
to fail silently.

Changes:
- Add explicit warning to use `gh pr comment` and not `gh api`
- Clarify the command format with a proper code block

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 9, 2026

Pull Request Review: Dual-Worker Architecture

Summary

This PR introduces a significant architectural upgrade by replacing the single interval-based worker with a dual-worker system consisting of a SchedulerHandler (recurrent orchestrator) and WorkerHandler (per-request processor). The implementation enables parallel request processing, crash recovery, and improved throughput.

Overall Assessment: ✅ APPROVED with minor suggestions

The implementation is well-designed, thoroughly tested, and follows blockchain best practices. The code quality is excellent with comprehensive documentation and proper error handling.


🎯 Architecture Review

Strengths

  1. Clean Separation of Concerns

    • SchedulerHandler: Orchestration, capacity management, crash recovery
    • WorkerHandler: Individual request processing
    • Clear contract boundaries between FlowYieldVaultsEVM.cdc and FlowYieldVaultsEVMWorkerOps.cdc
  2. Robust State Machine

    • Well-defined request lifecycle: PENDING → PROCESSING → COMPLETED/FAILED
    • Atomic transitions prevent race conditions
    • Two-phase commit pattern (startProcessing/completeProcessing) ensures consistency
  3. Crash Recovery

    • SchedulerHandler monitors WorkerHandler transaction status
    • Automatic failure detection when transaction status is not "Scheduled" after expected timestamp
    • Proper cleanup of failed requests (FlowYieldVaultsEVMWorkerOps.cdc:446-486)
  4. Capacity Management

    • Configurable maxProcessingRequests (default: 3)
    • Dynamic capacity calculation: capacity = maxProcessingRequests - scheduledRequests.length
    • Sequential offsetting for same-user requests prevents randomization

🔒 Security Analysis

✅ Security Strengths

  1. Transaction Atomicity (Critical)

    • All operations are atomic per blockchain execution model
    • If processRequest() panics, all state changes revert including removal from scheduledRequests
    • Two-phase commit prevents partial state corruption
  2. Access Control

    • Admin-only operations properly gated (pause/unpause/stopAll)
    • COA-only operations in Solidity contract
    • No privilege escalation vectors identified
  3. Reentrancy Protection

    • Solidity contract uses ReentrancyGuard on all external functions
    • State updates follow checks-effects-interactions pattern
    • Fund transfers happen after state changes
  4. Sentinel Value Handling

    • NO_YIELDVAULT_ID = type(uint64).max properly validated
    • NATIVE_FLOW = 0xFFF...FFF correctly identified
    • Prevents sentinel value registration (FlowYieldVaultsRequests.sol:1660)

🔍 Security Observations (Not Issues)

  1. Preprocessing Batch Size

    • Line FlowYieldVaultsEVMWorkerOps.cdc:405: fetchCount = min(pendingRequestCount, capacity)
    • Good: Prevents overloading scheduler with too many requests
    • Consideration: Large pending queues will take multiple scheduler cycles to clear
  2. WorkerHandler Panic Detection

    • Lines 446-486: Relies on timestamp check + transaction status
    • Works correctly because transactions execute sequentially
    • Alternative: Could add explicit timeout parameter, but current approach is simpler
  3. Fund Handling in completeProcessing

    • Lines 1008-1038 in FlowYieldVaultsRequests.sol
    • Native FLOW: Must send exact amount via msg.value
    • ERC20: Pull pattern (requires approval)
    • Good: Forces COA to return funds on failure, preventing fund loss

🐛 Potential Issues

⚠️ Minor Issues

  1. Scheduler Transaction Fee Management (FlowYieldVaultsEVMWorkerOps.cdc:589-607)

    Code:

    let estimate = FlowTransactionScheduler.estimate(...)
    let fees <- vaultRef.withdraw(amount: estimate.flowFee ?? 0.0)

    Observation: Uses ?? 0.0 fallback for nil fee estimates

    Impact: Low - Fee estimation should always return value in production

    Suggestion: Consider adding precondition to fail fast if estimate.flowFee is nil

  2. Global Pending Array Removal (FlowYieldVaultsRequests.sol:1816-1834)

    Code:

    for (uint256 j = indexInGlobal; j < globalLength - 1; ) {
        pendingRequestIds[j] = pendingRequestIds[j + 1];
        _requestIndexInGlobalArray[pendingRequestIds[j]] = j;
        ...
    }

    Observation: O(n) complexity for array shifts to maintain FIFO order

    Impact: Gas costs increase linearly with queue size

    Assessment: This is intentional and documented (line 1815: "FIFO order is critical for DeFi fairness")

    Suggestion: Document maximum expected queue size in CLAUDE.md or deployment docs


📊 Code Quality

Excellent Practices

  1. Documentation

    • Comprehensive inline comments explaining design decisions
    • Clear function documentation with @notice/@dev/@param tags
    • Architecture diagrams in PR description
    • Updated CLAUDE.md with critical blockchain execution model notes
  2. Testing (run_worker_tests.sh)

    • 1358 lines of comprehensive E2E tests
    • Tests automated processing, pause/unpause, crash recovery
    • Multi-user scenarios
    • Sequential offsetting verification
    • Excellent: New CI workflow added (.github/workflows/worker_tests.yml)
  3. Error Messages

    • Descriptive error messages throughout
    • Event emissions for debugging (WorkerHandlerPanicDetected, SchedulerQueueUpdated)
    • Proper failure propagation
  4. Gas Optimization

    • Uses unchecked blocks appropriately for loop increments
    • Minimizes storage reads via local variables
    • Efficient data structures (mappings for O(1) lookups)

Minor Code Style Suggestions

  1. Magic Numbers (FlowYieldVaultsEVMWorkerOps.cdc:505, 527)

    let baseDelay = 1.0
    let delay = baseDelay + UFix64(userScheduleOffset[key]!)

    Suggestion: Consider making baseDelay a configurable constant or contract variable

  2. Event Parameter Ordering (FlowYieldVaultsEVMWorkerOps.cdc:106-109)

    access(all) event SchedulerHandlerExecuted(
        transactionId: UInt64,
        nextTransactionId: UInt64,
        message: String,
    )

    Observation: Trailing comma after last parameter

    Impact: None (valid Cadence syntax)

    Suggestion: Remove for consistency (other events do not have trailing commas)


🧪 Testing Review

Test Coverage: Excellent ✅

From run_worker_tests.sh and PR description:

  • ✅ SchedulerHandler initialization
  • ✅ Automated request processing
  • ✅ Parallel processing (multiple WorkerHandlers)
  • ✅ Capacity limits respected
  • ✅ Pause/unpause functionality
  • ✅ Stop all scheduled transactions
  • ✅ Multi-user scenarios
  • ✅ Crash recovery
  • ✅ Sequential offsetting

New CI Workflow: .github/workflows/worker_tests.yml added - excellent addition!

Testing Suggestions

  1. Edge Case: Test behavior when maxProcessingRequests is reduced while requests are in-flight
  2. Stress Test: Test with queue size > maxProcessingRequests to verify batch processing
  3. Race Condition: Test rapid pause/unpause cycles to verify state consistency

📝 Documentation Review

Strengths

  1. CLAUDE.md Updates

    • Added "Blockchain Execution Model" section explaining transaction atomicity and sequential execution
    • Clear explanation of why panic-based error handling is correct
    • Updated architecture section with Scheduler/Worker split
  2. Contract Documentation

    • FlowYieldVaultsEVMWorkerOps.cdc has comprehensive module-level documentation (lines 7-43)
    • Clear explanation of state transitions
    • Flow diagrams in comments
  3. PR Description

    • Excellent "Before/After" comparison
    • Architecture diagrams
    • Clear migration notes

Documentation Suggestions

  1. Add to CLAUDE.md: Maximum recommended maxProcessingRequests value based on block gas limits
  2. Add to CLAUDE.md: Expected scheduler wakeup interval ranges and tradeoffs
  3. Add Example: Document how to manually trigger crash recovery if needed

🔄 Migration & Deployment

Deployment Checklist

  • ✅ New contract FlowYieldVaultsEVMWorkerOps.cdc added to flow.json
  • ✅ Existing contract FlowYieldVaultsEVM.cdc modified
  • ✅ Removed old FlowYieldVaultsTransactionHandler.cdc
  • ✅ Updated deployment scripts
  • ✅ Testnet addresses updated in CLAUDE.md

Migration Concerns: None Critical

The deployment appears to be a fresh deployment rather than an upgrade, so no migration path needed.


🎯 Performance Analysis

Improvements

  1. Parallel Processing: Up to 3x throughput (default maxProcessingRequests = 3)
  2. Crash Recovery: No manual intervention needed for panicked transactions
  3. Queue Management: Preprocessing moves invalid requests to FAILED early

Performance Characteristics

  • Best Case: All requests valid, full parallelism → ~3 requests/interval
  • Worst Case: All requests invalid → Still processed and marked FAILED efficiently
  • Scheduler Overhead: Runs every 1.0s, low overhead (just checks + scheduling)

Suggestions

  1. Metrics: Consider adding metrics/events for scheduler cycle duration
  2. Tuning: Document how to tune schedulerWakeupInterval vs maxProcessingRequests for different load patterns

🚀 Final Recommendations

Required Before Merge: None

All critical concerns are addressed. The implementation is production-ready.

Suggested Improvements (Non-blocking)

  1. Documentation: Add tuning guide for maxProcessingRequests and schedulerWakeupInterval
  2. Testing: Add stress test for large queue sizes (>10 pending requests)
  3. Monitoring: Consider adding scheduler health metrics event
  4. Code Style: Remove trailing commas from events for consistency

Post-Merge Follow-up

  1. Monitor scheduler execution timing in testnet
  2. Tune maxProcessingRequests based on observed block gas limits
  3. Consider making baseDelay configurable if offsetting needs tuning

📌 Summary Checklist

Category Status Notes
Architecture ✅ Excellent Clean separation, robust state machine
Security ✅ Strong Atomic transactions, proper access control, reentrancy protection
Code Quality ✅ Excellent Well-documented, proper error handling
Testing ✅ Comprehensive E2E tests cover all scenarios, new CI workflow
Performance ✅ Improved 3x throughput with crash recovery
Documentation ✅ Thorough Updated CLAUDE.md, inline comments, PR description
Migration ✅ Clean Fresh deployment, no migration concerns

Overall Grade: A+

This PR represents a significant improvement to the FlowYieldVaultsEVM architecture. The dual-worker design is well-executed with proper attention to blockchain-specific concerns like atomicity, crash recovery, and FIFO ordering. The testing is comprehensive and documentation is excellent.

Recommendation: APPROVED ✅

Great work on this architectural upgrade! 🎉


Review generated by Claude Code on 2026-02-13

…rkflow

ci: restore Claude PR commenting and enforce sticky output

test: trigger claude review on navid branch
@onflow onflow deleted a comment from claude bot Feb 10, 2026
@nvdtf nvdtf marked this pull request as ready for review February 10, 2026 23:33
let future = getCurrentBlock().timestamp + 1.0

// Estimate fees and withdraw payment
let estimate = FlowTransactionScheduler.estimate(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

estimate call can be replaced by a call to cheaper calculateFee, same is possible for other estimate calls here.
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L1465
Currently the flow cli has an older version of FlowTransactionScheduler without that function, but it can be copied:
https://github.com/onflow/flow-core-contracts/blob/2cbc73095e4a2f2d777190e939b813c704f69897/contracts/FlowTransactionScheduler.cdc#L699

Copy link
Member Author

@nvdtf nvdtf Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion! I didn't know about it, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the current version of FlowTransactionScheduler in emulator doesn't support this function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants