Expiry watchdog re-emits LEASE_EXPIRED on already-terminated jobs (§9.5) #89

@nficano

Category: spec-conformance
Severity: minor
Location: src/Arcp.Runtime/Internal/JobSubmitFlow.fs:55-89
Spec: ARCP v1.1 §9.5

What

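Spec §9.5 says the runtime MUST emit job.error with LEASE_EXPIRED 'if the job is still active when its lease expires'. The watchdog callback, however, only checks that the job record still exists, not that the job is still active. Because JobManager.Terminate does not remove the record from byId, jobs.TryGet still returns Some for a job that has already succeeded or been cancelled. When the lease later expires, the watchdog fires and emits a spurious job.error after the job's legitimate terminal envelope (for a successful job, its job.result).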

Evidence

```fsharp
let private buildWatchdog ... =
    constraints |> Option.map (fun c ->
        let w = new ExpiryWatchdog(timeProvider)
        w.Start(c.ExpiresAt, fun () ->
            let payload: JobErrorPayload = { FinalStatus = JobStatus.Error; Code = "LEASE_EXPIRED"; ... }
            match jobs.TryGet jobId with
            | Some r -> ignore (task { do! jobs.EmitErrorAsync(r, payload); ... })
            | None -> ())
```

Proposed fix

In the watchdog callback, check r.Status against the terminal states (Success, Error, Cancelled, TimedOut) and skip EmitErrorAsync if the job has already terminated. Also call w.Stop() inside JobManager.Terminate so the timer is disposed at the terminal transition instead of being left to fire later.
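A minimal sketch of both halves of the proposed fix, written as a self-contained F# script. JobStatus, JobRecord, JobStore, isTerminal, onLeaseExpired and the emitError callback are hypothetical stand-ins, not the actual Arcp.Runtime types; the real JobManager, ExpiryWatchdog and EmitErrorAsync signatures may differ.

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-ins: the real JobStatus, job record and JobManager
// in Arcp.Runtime may have different shapes.
[<RequireQualifiedAccess>]
type JobStatus =
    | Running
    | Success
    | Error
    | Cancelled
    | TimedOut

type JobRecord = { JobId: string; mutable Status: JobStatus }

/// Terminal states named in the proposed fix.
let isTerminal (status: JobStatus) =
    match status with
    | JobStatus.Success | JobStatus.Error | JobStatus.Cancelled | JobStatus.TimedOut -> true
    | JobStatus.Running -> false

/// Guarded watchdog callback: emit LEASE_EXPIRED only for a job that is both
/// still known and still active. `emitError` stands in for jobs.EmitErrorAsync.
let onLeaseExpired (tryGet: string -> JobRecord option) (emitError: JobRecord -> unit) (jobId: string) =
    match tryGet jobId with
    | Some r when not (isTerminal r.Status) -> emitError r
    | _ -> () // already terminated (or unknown): the lease no longer applies

/// Hypothetical in-memory store mirroring the proposed JobManager.Terminate change.
type JobStore() =
    let byId = Dictionary<string, JobRecord>()
    let watchdogs = Dictionary<string, IDisposable>()

    member _.Add(r: JobRecord) = byId.[r.JobId] <- r

    member _.TryGet(jobId: string) : JobRecord option =
        let found, r = byId.TryGetValue jobId
        if found then Some r else None

    member _.AttachWatchdog(jobId: string, watchdog: IDisposable) =
        watchdogs.[jobId] <- watchdog

    /// Record the terminal status (the record stays in byId, as today), then
    /// stop/dispose the job's watchdog so its timer cannot fire afterwards.
    member _.Terminate(jobId: string, status: JobStatus) =
        let found, r = byId.TryGetValue jobId
        if found then r.Status <- status
        let hasWatchdog, w = watchdogs.TryGetValue jobId
        if hasWatchdog then
            w.Dispose()
            watchdogs.Remove(jobId) |> ignore
```

The status check and the emit are still two separate steps. If Terminate can run concurrently with the watchdog callback, the real JobManager would presumably need to serialize them (for example under the same lock) to close the race completely; disposing the watchdog in Terminate narrows the window but does not stop a callback that is already in flight.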

Acceptance criteria

  • A job that completes successfully before its lease expires never produces a follow-on LEASE_EXPIRED job.error envelope.
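The acceptance criterion lends itself to a deterministic regression test driven by a manually advanced clock rather than real time. The sketch below models the scenario with hypothetical types (ManualClock, JobRecord, onLeaseExpired, runScenario); in the real test suite, a fake TimeProvider handed to ExpiryWatchdog would presumably play the clock's role.

```fsharp
open System

// Hypothetical minimal model, independent of Arcp.Runtime.
[<RequireQualifiedAccess>]
type JobStatus =
    | Running
    | Success
    | Error
    | Cancelled
    | TimedOut

type JobRecord = { JobId: string; mutable Status: JobStatus }

/// Guarded callback: Running is the only non-terminal state in this model,
/// so only a running job may report LEASE_EXPIRED.
let onLeaseExpired (job: JobRecord) (emit: string -> unit) =
    if job.Status = JobStatus.Running then emit "LEASE_EXPIRED"

/// Manually advanced clock: scheduled callbacks fire once Advance passes their deadline.
type ManualClock(start: DateTimeOffset) =
    let mutable now = start
    let mutable pending: (DateTimeOffset * (unit -> unit)) list = []
    member _.Now = now
    member _.Schedule(at: DateTimeOffset, callback: unit -> unit) =
        pending <- (at, callback) :: pending
    member _.Advance(by: TimeSpan) =
        now <- now + by
        let due, notDue = pending |> List.partition (fun (at, _) -> at <= now)
        pending <- notDue
        for (_, callback) in due do
            callback ()

/// One scenario: a lease of `lease`; the job succeeds after `finishAfter` (if given).
/// Returns the error codes emitted for the job, in order.
let runScenario (lease: TimeSpan) (finishAfter: TimeSpan option) =
    let clock = ManualClock(DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero))
    let job = { JobId = "job-1"; Status = JobStatus.Running }
    let emitted = ResizeArray<string>()
    clock.Schedule(clock.Now + lease, fun () -> onLeaseExpired job emitted.Add)
    match finishAfter with
    | Some t ->
        clock.Advance t
        job.Status <- JobStatus.Success
    | None -> ()
    clock.Advance(lease + lease) // move well past the lease deadline
    List.ofSeq emitted
```

Covering both the negative case (job finishes first) and the positive case (job still running at expiry) guards against a "fix" that simply suppresses the watchdog altogether.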

Metadata

Labels: audit/spec-conformance (Audit finding: ARCP v1.1 spec conformance), sev/minor (Severity: minor)
