-
Notifications
You must be signed in to change notification settings - Fork 3.5k
fix: add retry logic for resume race condition with pause persistence #3577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -164,10 +164,20 @@ export class PauseResumeManager { | |
| static async enqueueOrStartResume(args: EnqueueResumeArgs): Promise<EnqueueResumeResult> { | ||
| const { executionId, contextId, resumeInput, userId } = args | ||
|
|
||
| return await db.transaction(async (tx) => { | ||
| const pausedExecution = await tx | ||
| .select() | ||
| .from(pausedExecutions) | ||
| // Retry to handle race condition where resume request arrives | ||
| // before persistPauseResult commits the paused execution row. | ||
| // The INSERT in persistPauseResult is awaited, so the race window | ||
| // is only between the method call and the await returning (~10-50ms). | ||
| const MAX_RETRIES = 3 | ||
| const RETRY_DELAY_MS = 200 | ||
| let lastError: Error | null = null | ||
|
|
||
| for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) { | ||
| try { | ||
| return await db.transaction(async (tx) => { | ||
| const pausedExecution = await tx | ||
| .select() | ||
| .from(pausedExecutions) | ||
| .where(eq(pausedExecutions.executionId, executionId)) | ||
| .for('update') | ||
| .limit(1) | ||
|
|
@@ -277,7 +287,22 @@ export class PauseResumeManager { | |
| resumeInput, | ||
| userId, | ||
| } | ||
| }) | ||
| }) | ||
| } catch (err: any) { | ||
| lastError = err | ||
| const isNotFound = err.message?.includes('Paused execution not found') | ||
| const isLastAttempt = attempt === MAX_RETRIES | ||
|
|
||
| if (!isNotFound || isLastAttempt) { | ||
| throw err | ||
| } | ||
|
Comment on lines
+291
to
+298
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fragile string-based error matching The retry selector A more robust approach is to use a typed custom error class so the check is structural rather than textual: class PausedExecutionNotFoundError extends Error {
constructor() {
super('Paused execution not found or already resumed')
this.name = 'PausedExecutionNotFoundError'
}
}
// At throw site (line 187):
throw new PausedExecutionNotFoundError()
// In the catch block:
const isNotFound = err instanceof PausedExecutionNotFoundErrorThis makes the intent explicit and refactor-safe. |
||
|
|
||
| await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS)) | ||
| } | ||
| } | ||
|
|
||
| // This should never be reached due to the for loop logic, but TypeScript needs it | ||
| throw lastError ?? new Error('enqueueOrStartResume failed after retries') | ||
| } | ||
|
|
||
| static async startResumeExecution(args: StartResumeExecutionArgs): Promise<void> { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-by-one: 4 attempts, not 3
The loop condition
attempt <= MAX_RETRIESwithMAX_RETRIES = 3results in 4 total attempts (attempts 0, 1, 2, 3), not 3 as stated in the PR description and comments. The variable nameMAX_RETRIEStypically means the number of retries after the initial try, so 3 retries would mean 4 total attempts. However, the comment says "3 attempts" which implies only 3 total. This discrepancy can lead to confusion and unexpected behavior (an extra 200ms delay on the third retry).If the intent is 3 total attempts:
If the intent is 3 retries (4 total attempts), update the comment to say "4 attempts (1 initial + 3 retries)".