1 Purpose
Provide a reliable backend workflow that turns an uploaded media file or a remote video URL into a transcript. The public REST interface remains unchanged; all orchestration lives inside the workflow engine.
2 Scope of Work
| Phase | Description | Public status |
|---|---|---|
| Acquire input | • If source is a URL, download it into processing storage, applying internal retries (e.g. 3 attempts, exponential back-off). • If source is an uploaded file, persist it immediately. | DOWNLOADING |
| Prepare | Sanity checks or lightweight conversions required by the transcription component. | PREPARING |
| Transcribe | Run the existing transcriber, producing JSON-segment output. | TRANSCRIBING |
| Finish | All steps succeed → COMPLETED. Any fatal, non-retryable error → FAILED, recording the step and message. | COMPLETED / FAILED |
The workflow is all-or-nothing; partial success is not exposed to the client.
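The phase table above can be read as a small state machine. The following is a minimal sketch of that reading, not the workflow engine's actual API: `Job`, `run_workflow`, and `FatalStepError` are hypothetical names, the job lives in memory, and the assumption that a job starts in `DOWNLOADING` is not stated in the ticket.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


class FatalStepError(Exception):
    """Hypothetical marker for a fatal, non-retryable step failure."""


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Job:
    job_id: str
    status: str = "DOWNLOADING"  # assumption: a job starts in the first phase
    error: Optional[dict] = None
    updated_at: datetime = field(default_factory=_now)

    def set_status(self, status: str) -> None:
        self.status = status
        self.updated_at = _now()


def run_workflow(job: Job, steps: list[tuple[str, Callable[[], None]]]) -> Job:
    """Run steps in order; each entry pairs a public status with a step callable."""
    for status, step in steps:
        job.set_status(status)
        try:
            step()
        except FatalStepError as exc:
            # All-or-nothing: record the failing step and message, then stop.
            job.error = {"step": status, "message": str(exc)}
            job.set_status("FAILED")
            return job
    job.set_status("COMPLETED")
    return job


def failing_transcriber() -> None:
    raise FatalStepError("unsupported codec")


job = run_workflow(
    Job("job_123"),
    [
        ("DOWNLOADING", lambda: None),
        ("PREPARING", lambda: None),
        ("TRANSCRIBING", failing_transcriber),
    ],
)
print(job.status, job.error)
```

Pairing each step with its public status keeps the status transitions in one place, so a client polling the job never sees a state that is not in the table.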
3 API Behaviour (no new endpoints)
- `POST /media/parse` – unchanged; returns `202` + `jobId`.
- `GET /media/job/{id}` – returns:
{
"jobId": "job_123",
"status": "TRANSCRIBING",
"createdAt": "2025-05-26T12:00:00Z",
"updatedAt": "2025-05-26T12:05:03Z",
"progress": { "currentStep": "TRANSCRIBING" },
"error": null // Populated only when status = "FAILED"
}
- `GET /media/job/{id}/result`
  - `status = COMPLETED` → returns raw JSON-segment transcript (the “most granular” representation).
  - Optional query `?format=srt|vtt|txt` converts on-the-fly and responds with the chosen variant (`Content-Type` set accordingly).
  - For any other status → `404`.
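The result endpoint's rules can be sketched as a pure function, which also makes them easy to unit-test. This is an illustration under stated assumptions, not the service's real handler: the segment shape (`start` and `end` in seconds, plus `text`) is assumed because the ticket does not define it, the SRT media type `application/x-subrip` is a common but unregistered choice, and answering `400` for an unknown `format` is a guess the ticket does not cover. `get_result` and `render` are hypothetical names.

```python
import json
from typing import Optional

# Assumed Content-Type per format; None means "format omitted".
CONTENT_TYPES = {
    None: "application/json",
    "srt": "application/x-subrip",
    "vtt": "text/vtt",
    "txt": "text/plain",
}


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def _cue(segment: dict, sep: str) -> str:
    span = f"{_timestamp(segment['start'], sep)} --> {_timestamp(segment['end'], sep)}"
    return f"{span}\n{segment['text']}"


def render(segments: list[dict], fmt: Optional[str]) -> str:
    """Convert JSON segments into the requested representation."""
    if fmt is None:
        return json.dumps(segments)
    if fmt == "txt":
        return "\n".join(s["text"] for s in segments)
    if fmt == "srt":
        # SRT cues are numbered from 1.
        cues = [f"{i}\n{_cue(s, ',')}" for i, s in enumerate(segments, start=1)]
        return "\n\n".join(cues) + "\n"
    if fmt == "vtt":
        return "WEBVTT\n\n" + "\n\n".join(_cue(s, ".") for s in segments) + "\n"
    raise ValueError(f"unsupported format: {fmt}")


def get_result(status: str, segments: list[dict], fmt: Optional[str] = None) -> tuple[int, str, str]:
    """Return (HTTP status code, Content-Type, body)."""
    if status != "COMPLETED":
        return 404, "text/plain", ""
    if fmt not in CONTENT_TYPES:
        return 400, "text/plain", f"unsupported format: {fmt}"  # assumption
    return 200, CONTENT_TYPES[fmt], render(segments, fmt)


segments = [
    {"start": 0.0, "end": 1.5, "text": "Hello"},
    {"start": 1.5, "end": 3.25, "text": "world"},
]
print(get_result("COMPLETED", segments, "srt")[2])
```

Storing only the JSON segments and deriving the other formats on request means there is a single source of truth for the transcript.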
4 Non-functional Requirements
- Retries & non-retryable errors
  - Retries are internal; the public `status` never shows “retrying”.
  - Steps must flag errors as retryable vs non-retryable; a non-retryable error ends the job in `FAILED` immediately, whereas a retryable one does so only once the configured retry limit is exhausted.
- Storage
  - Persist original media, working copies, and transcripts in the designated bucket/folder.
  - Automatic cleanup after the configured TTL.
- Idempotency
  - No implementation work in this ticket, but design must not preclude future support (e.g. allow passing an `Idempotency-Key` header without error).
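The retry rules in this section can be illustrated with a small helper. This is a sketch under assumptions rather than the engine's built-in retry policy: `RetryableError`, `NonRetryableError`, and `run_with_retries` are hypothetical names, and the defaults (3 attempts, delay doubling from 1 second) simply mirror the example figures given for the download phase.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical transient failure, e.g. a network timeout."""


class NonRetryableError(Exception):
    """Hypothetical fatal failure, e.g. an unsupported media type."""


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call step, retrying retryable errors with exponential back-off.

    A NonRetryableError raised by the step propagates immediately.
    Exhausting the attempts converts the last RetryableError into a
    NonRetryableError, so the caller has one failure type to map to FAILED.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except RetryableError as exc:
            if attempt == max_attempts:
                raise NonRetryableError(
                    f"gave up after {attempt} attempts: {exc}"
                ) from exc
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a download that fails twice, then succeeds on the third attempt.
delays: list[float] = []
calls = {"n": 0}


def flaky_download() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError("connection reset")
    return "media.mp4"


result = run_with_retries(flaky_download, sleep=delays.append)
print(result, delays)
```

Because the helper absorbs every intermediate failure, the public `status` stays at the current phase throughout; injecting `sleep` keeps unit tests fast.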
5 Acceptance Criteria
- Workflow executes the four steps above with correct public `status` transitions.
- Internal retries operate as configured; exceeding the limit moves the job to `FAILED`.
- `GET /media/job/{id}` returns accurate timestamps, current step, and error information when relevant.
- `GET /media/job/{id}/result`
  - returns JSON segments when `format` is omitted,
  - returns SRT/VTT/TXT when `format` is specified,
  - responds `404` until the job is `COMPLETED`.
- All artifacts are deleted after the retention period.
- Unit tests cover happy path, retry exhaustion, and non-retryable failures.
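The retention criterion is easier to verify when the expiry decision is a pure function, separate from the storage client. A minimal sketch, assuming each artifact records a creation time and that the TTL counts from that time (the ticket does not say which timestamp the TTL is measured from); `expired_keys` and the storage keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expired_keys(artifacts: dict[str, datetime], ttl: timedelta, now: datetime) -> list[str]:
    """Return the storage keys whose age has reached the TTL, in sorted order."""
    cutoff = now - ttl
    return sorted(key for key, created in artifacts.items() if created <= cutoff)


now = datetime(2025, 5, 26, 12, 0, tzinfo=timezone.utc)
artifacts = {
    "job_123/original.mp4": now - timedelta(days=8),
    "job_123/transcript.json": now - timedelta(days=8),
    "job_456/original.mp4": now - timedelta(days=1),
}
print(expired_keys(artifacts, ttl=timedelta(days=7), now=now))
```

A cleanup job would then delete the returned keys from the bucket; passing `now` in explicitly avoids clock-dependent tests.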
6 Out of scope
Might come later
- Optional query ?format=srt|vtt|txt converts on-the-fly and responds with the chosen variant (Content-Type set accordingly).
- Webhook callbacks
- Manual cancel/retry endpoints
- Analysis steps
- Full idempotency (duplicate POST requests)
Completely unrelated
- Authentication
- Payments
Not planned
- Streaming results