scheduler: add force flag to resumeJob for failed-job revival (#128) #144
Open
truffle-dev wants to merge 1 commit into
…right#128)

When a scheduled job hits MAX_CONSECUTIVE_ERRORS the executor flips it to status='failed' and stops touching it. Until now the only path back was a raw SQLite UPDATE; resumeJob refused anything that wasn't paused.

Shape (1) from the issue: resumeJob gains an optional `{ force?: boolean }` that lets paused-or-failed jobs flip back to active, recomputes next_run_at from the stored schedule, and clears consecutive_errors. Default behaviour is unchanged (paused-only). `completed` stays rejected even with force, because at-kind one-shots may have already deleted themselves via the deleteAfterRun path.

UI surface: POST /ui/api/scheduler/:id/resume accepts an optional `{ force: true }` JSON body. Empty body keeps the old semantics. Audit log records `resume:force` when the flag is used so operators can tell forced revivals apart from normal ones.

Tests cover paused-default, failed-without-force (no-op), failed-with-force (revive + clear errors + recompute next_run_at), completed-with-force (still refused), and the UI body-plumbing both ways.
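The status rules in the commit message can be read as a small state transition. The sketch below is a simplified in-memory model, not the code in `src/scheduler/service.ts`: the `Job` shape, the `resumeJobSketch` name, the injected `computeNextRunAt` stand-in, and returning `null` for a refused resume are all assumptions made for illustration. It also assumes the recompute-and-clear step applies to both the paused and the forced path, which the text leaves open.

```typescript
// Hypothetical, simplified model of the resume rules described above.
// The real implementation persists to SQLite; this sketch is pure and in-memory.

type JobStatus = "active" | "paused" | "failed" | "completed";

interface Job {
  id: string;
  status: JobStatus;
  schedule: string; // stored schedule; the real format is not shown here
  next_run_at: number | null;
  consecutive_errors: number;
}

interface ResumeOptions {
  force?: boolean;
}

// Stand-in for the real computeNextRunAt; its actual signature may differ.
type ComputeNextRunAt = (schedule: string) => number | null;

function resumeJobSketch(
  job: Job,
  computeNextRunAt: ComputeNextRunAt,
  opts: ResumeOptions = {},
): Job | null {
  // paused is always resumable; failed only when force is explicitly true.
  const resumable =
    job.status === "paused" || (opts.force === true && job.status === "failed");

  // active and completed are refused, with or without force.
  if (!resumable) return null;

  // Assumption: both paths recompute next_run_at and clear the error counter.
  return {
    ...job,
    status: "active",
    next_run_at: computeNextRunAt(job.schedule),
    consecutive_errors: 0,
  };
}

const failedJob: Job = {
  id: "job-1",
  status: "failed",
  schedule: "every 5m",
  next_run_at: null,
  consecutive_errors: 10,
};
const fixedNextRun: ComputeNextRunAt = () => 1700000000000;

const withoutForce = resumeJobSketch(failedJob, fixedNextRun);
const withForce = resumeJobSketch(failedJob, fixedNextRun, { force: true });
```

Here `withoutForce` is `null` (the default stays paused-only), while `withForce` is an active job with `consecutive_errors` reset to 0 and `next_run_at` taken from the stand-in.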
Closes #128.
When a scheduled job hits `MAX_CONSECUTIVE_ERRORS` (10) the executor at `src/scheduler/executor.ts` flips it to `status='failed'`, sets `next_run_at=NULL`, and stops touching it. The public API has no path back: `resumeJob` at `src/scheduler/service.ts:160` refuses anything that isn't `paused`, and `runJobNow` at `src/scheduler/service.ts:226` refuses anything that isn't `active`. Until now the only recovery was a raw SQLite `UPDATE`.

I picked shape (1) from the issue body: `resumeJob` gains an optional `{ force?: boolean }`. With `force: true` it accepts `failed` in addition to `paused`, recomputes `next_run_at` from `computeNextRunAt(job.schedule)`, and clears `consecutive_errors`. Default behaviour is unchanged (paused-only). `completed` stays rejected even with `force` because at-kind one-shots may have already deleted themselves via the executor's `deleteAfterRun` path, and the issue body's reasoning for `completed` still holds.

UI surface
`POST /ui/api/scheduler/:id/resume` accepts an optional `{ "force": true }` JSON body. Empty body keeps the old semantics. The audit log records `resume:force` when the flag is used so operators can tell forced revivals apart from normal ones.

Tests
Service-level (`src/scheduler/__tests__/service.test.ts`):

UI API (`src/ui/api/__tests__/scheduler.test.ts`):

- `{ force: true }` revives a failed job and writes `resume:force` to the audit log

All 32 service tests and 35 UI scheduler tests pass. `tsc --noEmit` and `biome check src/` are clean.

What this doesn't do

`armTimer` already fires from inside `resumeJob`, so the in-memory wake-up problem the issue body alludes to is handled the same way a normal resume handles it. Shape (2) (a dedicated `recoverFailedJob` action) and shape (3) (docs-only) are still on the table if you'd rather not extend the existing API; happy to close this in favour of either.
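For reviewers who want the UI body-plumbing in one place, here is a rough sketch of how the optional JSON body on `POST /ui/api/scheduler/:id/resume` could map to the force flag and the audit action. The helper names `parseForceFlag` and `auditAction` are hypothetical and not taken from this PR. Treating malformed JSON like an empty body is an assumption (the real handler may reject it instead), as is the plain `resume` action name for the non-forced case. Only `resume:force` is confirmed by the description above.

```typescript
// Hypothetical helpers illustrating the body-plumbing; not the PR's handler code.

// Decide whether a resume request asked for a forced revival.
// An empty body, or any body other than one with force === true,
// keeps the old paused-only semantics.
function parseForceFlag(rawBody: string): boolean {
  if (rawBody.trim() === "") return false;
  try {
    const parsed: unknown = JSON.parse(rawBody);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { force?: unknown }).force === true
    );
  } catch (_err) {
    // Assumption: malformed JSON falls back to the non-forced path.
    return false;
  }
}

// Pick the audit-log action so forced revivals are distinguishable.
// "resume:force" comes from the PR description; "resume" is assumed.
function auditAction(force: boolean): string {
  return force ? "resume:force" : "resume";
}

const forced = parseForceFlag('{ "force": true }');
const plain = parseForceFlag("");
const actions = [auditAction(forced), auditAction(plain)];
```

Requiring a strict `force === true` means truthy-but-wrong values such as `"yes"` or `1` do not accidentally revive a failed job.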