
h30s commented Jan 30, 2026

Summary

Fixes a race condition where a resume request could fail with "Paused execution not found" if it arrived immediately after a workflow paused but before the paused state was fully persisted.

This PR ensures atomic persistence of paused executions so resume requests are handled reliably, even under high-throughput or near-simultaneous pause/resume scenarios.
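The failure window can be modeled with a small in-memory store in which staged writes stay invisible to readers until they are committed. This is a hypothetical sketch: FakeStore, stage, commit, resume and tryResume are illustrative names, not the project's code. It only shows why a resume that lands between "paused" and "persisted" sees no row.

```typescript
// Hypothetical model of the pause/resume race; names are illustrative only.
type PausedRow = { executionId: string };

class FakeStore {
  private committed = new Map<string, PausedRow>();
  private staged = new Map<string, PausedRow>();

  // A write that has happened but is not yet visible to other readers.
  stage(row: PausedRow): void {
    this.staged.set(row.executionId, row);
  }

  // Publish every staged write at once, like a transaction commit.
  commit(): void {
    this.staged.forEach((row, id) => this.committed.set(id, row));
    this.staged.clear();
  }

  // Readers only ever see committed rows.
  find(id: string): PausedRow | undefined {
    return this.committed.get(id);
  }
}

// A resume handler that fails, like the reported bug, when the row is missing.
function resume(store: FakeStore, executionId: string): string {
  const row = store.find(executionId);
  if (!row) throw new Error("Paused execution not found");
  return `resumed ${row.executionId}`;
}

// Returns either the success message or the error message.
function tryResume(store: FakeStore, executionId: string): string {
  try {
    return resume(store, executionId);
  } catch (err) {
    return (err as Error).message;
  }
}

const store = new FakeStore();
store.stage({ executionId: "exec-1" });
const early = tryResume(store, "exec-1"); // arrives inside the window
store.commit();
const late = tryResume(store, "exec-1"); // arrives after the commit
console.log(early); // Paused execution not found
console.log(late); // resumed exec-1
```

Making the paused state visible in a single commit shrinks this window to the commit itself, which is what the transaction in this PR is meant to achieve.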

Fixes #3081


Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

How this was tested

  • Added unit tests covering the pause–resume race condition
  • Verified that database operations are executed within a transaction
  • Ensured queued resume requests are processed only after the transaction commits

What reviewers should focus on

  • Transaction wrapping in persistPauseResult
  • Correct sequencing of processQueuedResumes
  • Test coverage for concurrent pause/resume scenarios

Test Results

  • All unit tests passing (Vitest)
  • No TypeScript or linting errors
  • No behavioral changes outside pause/resume flow

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the
    Contributor License Agreement (CLA)

Screenshots/Videos

Not applicable: backend-only change with no UI impact.


vercel bot commented Jan 30, 2026

@h30s is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.


greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

Fixed a race condition in pause/resume flow where resume requests could fail with "Paused execution not found" if they arrived before the paused state was fully persisted to the database.

Key Changes:

  • Wrapped the database insert/update in persistPauseResult within a transaction (lines 125-161)
  • Moved processQueuedResumes call after transaction commits to ensure row visibility
  • Added unit tests verifying transaction usage and call sequencing
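The ordering these changes establish (write inside a transaction, then drain queued resumes only after the commit) can be sketched against a minimal transactional interface. Db, Tx, upsertPausedExecution and FakeDb are assumptions for illustration; only the names persistPauseResult and processQueuedResumes and the commit-then-drain order come from the PR. The real code presumably goes through the project's own database layer.

```typescript
// Hypothetical sketch: persist inside a transaction, drain queued resumes after commit.
interface Tx {
  upsertPausedExecution(executionId: string, state: unknown): Promise<void>;
}

interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

async function persistPauseResult(
  db: Db,
  executionId: string,
  state: unknown,
  processQueuedResumes: (executionId: string) => Promise<void>,
): Promise<void> {
  // All writes for the pause happen in one transaction, so readers see
  // either nothing or the complete paused state.
  await db.transaction(async (tx) => {
    await tx.upsertPausedExecution(executionId, state);
  });
  // Only reached once the commit has resolved, so the row is visible.
  await processQueuedResumes(executionId);
}

// In-memory stand-in that records the order of operations.
class FakeDb implements Db {
  readonly log: string[] = [];

  async transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
    this.log.push("begin");
    try {
      const result = await fn({
        upsertPausedExecution: async (id) => {
          this.log.push(`upsert ${id}`);
        },
      });
      this.log.push("commit");
      return result;
    } catch (err) {
      this.log.push("rollback");
      throw err;
    }
  }
}

const db = new FakeDb();
const done = persistPauseResult(db, "exec-1", { step: 3 }, async (id) => {
  db.log.push(`processQueuedResumes ${id}`);
});
done.then(() => console.log(db.log.join(" -> ")));
// begin -> upsert exec-1 -> commit -> processQueuedResumes exec-1
```

A unit test in the spirit of the ones described here can assert on exactly this log: "commit" must appear before the processQueuedResumes entry.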

How it works:
The transaction ensures atomic persistence of paused executions. When a concurrent resume request arrives via enqueueOrStartResume, it uses SELECT ... FOR UPDATE, which waits for the transaction to commit and so avoids the "not found" error. After the transaction commits and the row is visible, queued resumes are processed.
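The waiting behavior can be imitated in memory with a per-key lock: the pause transaction holds the lock until it commits, and the resume request must take the same lock before it reads. KeyedLock, pauseTx and resumeRequest are hypothetical names, not the project's API, and the model assumes the pause transaction already holds the lock when the resume arrives. In PostgreSQL, SELECT ... FOR UPDATE waits on rows locked by another open transaction, whereas a row inserted by a transaction that has not yet committed is not visible to other readers at all.

```typescript
// Hypothetical model of a row lock that makes a reader wait for a commit.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  // Runs fn only after every earlier holder of `key` has finished.
  async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    // The next waiter queues behind both the previous holder and us.
    this.tails.set(key, previous.then(() => current));
    await previous;
    try {
      return await fn();
    } finally {
      release();
    }
  }
}

type Row = { executionId: string };
const committed = new Map<string, Row>();
const lock = new KeyedLock();
const events: string[] = [];

// Pause transaction: holds the lock across the write and the commit.
async function pauseTx(id: string): Promise<void> {
  await lock.withLock(id, async () => {
    events.push("pause: write");
    await new Promise((r) => setTimeout(r, 10)); // work inside the transaction
    committed.set(id, { executionId: id });
    events.push("pause: commit");
  });
}

// Resume request: must take the same lock before it reads.
async function resumeRequest(id: string): Promise<string> {
  return lock.withLock(id, async () => {
    events.push("resume: read");
    return committed.has(id) ? "found" : "Paused execution not found";
  });
}

const pausing = pauseTx("exec-1");
const resuming = resumeRequest("exec-1"); // arrives while the pause holds the lock
Promise.all([pausing, resuming]).then(([, outcome]) => {
  console.log(outcome); // found
  console.log(events.join(" -> ")); // pause: write -> pause: commit -> resume: read
});
```

Without the shared lock, resumeRequest would read before the 10 ms of in-transaction work finished and return "Paused execution not found".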

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk: it's a focused bug fix with proper transaction handling
  • The implementation correctly addresses the race condition by wrapping the database operation in a transaction, ensuring atomic visibility. The fix is minimal, focused, and follows the existing transaction patterns in the codebase. Unit tests verify the correct behavior.
  • No files require special attention

Important Files Changed

Filename | Overview
apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts | Wrapped pause persistence in transaction to prevent race condition where resume requests fail if they arrive before pause state is committed
apps/sim/lib/workflows/executor/pause-resume-race-condition.test.ts | Added unit tests verifying transaction usage and processQueuedResumes call sequence

Sequence Diagram

sequenceDiagram
    participant WF as Workflow Executor
    participant PM as PauseResumeManager
    participant DB as Database
    participant API as Resume API
    participant Queue as processQueuedResumes

    Note over WF,Queue: Race Condition Scenario (Fixed)

    WF->>PM: persistPauseResult()
    activate PM
    
    PM->>DB: BEGIN TRANSACTION
    activate DB
    
    PM->>DB: INSERT/UPDATE paused_executions
    Note over PM,DB: Row locked in transaction
    
    par Concurrent Resume Request
        API->>PM: enqueueOrStartResume()
        activate PM
        PM->>DB: SELECT ... FOR UPDATE
        Note over PM,DB: Waits for transaction commit
    end
    
    PM->>DB: COMMIT TRANSACTION
    DB-->>PM: Commit acknowledged
    deactivate DB
    
    Note over PM: Transaction committed,<br/>row now visible
    
    DB-->>PM: paused execution found
    PM->>DB: Insert/update resume queue
    PM-->>API: Resume started/queued
    deactivate PM
    
    PM->>Queue: processQueuedResumes(executionId)
    activate Queue
    Queue->>DB: SELECT pending resumes
    Queue->>PM: startResumeExecution()
    deactivate Queue
    
    deactivate PM


@greptile-apps greptile-apps bot left a comment


2 files reviewed, no comments


h30s force-pushed the fix/pause-resume-race-condition-3081 branch from f0e3c67 to 0365b3b on January 31, 2026, 04:25
