Skip to content

UN-3513 [FEAT] Type chord-callback boundary with BatchExecutionResult / FileExecutionResult#2020

Merged
muhammad-ali-e merged 6 commits into
mainfrom
UN-3513-chord-callback-result-typing
Jun 9, 2026
Merged

UN-3513 [FEAT] Type chord-callback boundary with BatchExecutionResult / FileExecutionResult#2020
muhammad-ali-e merged 6 commits into
mainfrom
UN-3513-chord-callback-result-typing

Conversation

@muhammad-ali-e

Copy link
Copy Markdown
Contributor

What

  • Extend two dataclasses in unstract/core/src/unstract/core/worker_models.py (strictly additive — every new field is optional with a safe default):
    • BatchExecutionResult gains skipped_already_completed: int = 0, skipped_active_duplicate: int = 0, organization_id: str | None = None.
    • FileExecutionResult gains file_name: str | None = None, result_data: Any | None = None, skipped: str | None = None.
    • Both from_dict updated to populate the new fields.
  • workers/file_processing/tasks.py producers now build the typed dataclasses and emit .to_dict() instead of hand-rolled dicts:
    • L901 (general path, process_file_batch) → BatchExecutionResult(...).to_dict().
    • L1706 / L1798 / L1823 (API path, _process_file_batch_api_core helpers) → FileExecutionResult(...).to_dict(). L1798 preserves the legacy storage_result field via dict-spread merge.
  • New workers/tests/test_chord_callback_boundary.py — 14 tests, 3 classes (wire-shape characterisation for both dataclasses + consumer-tolerance check).

Why

Phase 5 of the rollout asks for every queue-crossing payload to be machine-readable so a future scheduler has structured data to dispatch on. The chord-callback boundary was the last loose dict-shaped handoff in workers — typed dataclasses already existed in unstract.core.worker_models but producers were returning ad-hoc dicts. This PR closes that gap on the producer side. Consumer side is already tolerant (.get(..., default) everywhere in aggregate_file_batch_results), so no consumer change is needed.

Builds on UN-3501 (#2003 — fairness key) and runs in parallel with UN-3508 (#2009 — fairness through ExecutionDispatcher). Unblocks Phase 6 (Barrier interface) which needs a typed result contract to declare on Barrier.coordinate(...).

How

  • The dataclass extensions are strictly additive — every new field is optional with a default. Existing producers/consumers that don't know about the new fields are unaffected.
  • The producer migration is mechanical: return {dict}return Class(...).to_dict(). Wire shape gains a small number of default values (empty list, None, 0) that consumers ignore via .get().
  • One deliberate domain-vocabulary correction on the API path: status string changes from lowercase "completed" / "failed" (ad-hoc, matched no canonical enum) → "Success" / "Failed" (the actual ApiDeploymentResultStatus per-file vocabulary). See "Can this PR break any existing features" below.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No on consumer code; no on backend; yes on log/observability tooling that pattern-matches the lowercase API-path status strings (deliberate, see below).

  • Consumer side: aggregate_file_batch_results (shared/processing/files/time_utils.py:130) reads every field via .get(..., default). Adding fields to the wire dict is silently ignored; missing fields fall back to defaults. New tests assert this tolerance holds for the new wire shape.
  • Backend: doesn't import BatchExecutionResult. Has its own unrelated FileExecutionResult in workflow_manager/endpoint_v2/dto.py:123 that uses **data construction — untouched.
  • Status-vocabulary shift on the API path: the producer at L1706 / L1798 / L1823 previously returned status="completed" / "failed" — lowercase strings matching neither ExecutionStatus (uppercase, workflow-level) nor ApiDeploymentResultStatus (Success/Failed, the canonical per-file vocabulary). The lowercase strings were a pre-existing ad-hoc third variant. After this PR they emit "Success" / "Failed" — the correct per-file vocabulary.
    • Python equality consumers: none found. Grep across workers/ and backend/ for status == "completed" / status == "failed" against these chord-callback dicts is clean (matches only unrelated cleanup-result paths).
    • Customer-facing API responses: unchanged — api-deployment/tasks.py:538-542 already uses ApiDeploymentResultStatus.SUCCESS.value / FAILED.value when building the API response. The lowercase strings were internal chord-callback noise, never the external contract.
    • Observability: dashboards / log queries / runbooks that grep worker logs for the lowercase strings need to be updated. Calling this out explicitly so ops can audit before merge.

Database Migrations

None.

Env Config

None.

Relevant Docs

None — dataclass docstrings document the new optional fields inline.

Related Issues or PRs

  • Ticket: UN-3513 (sub-task) under UN-3500 (Story: PG Queue Phase 5 — Payload typing) under UN-3445 (Epic).
  • Sibling PR currently in review: UN-3508 (UN-3508 [FEAT] Plumb fairness through ExecutionDispatcher #2009 — fairness through ExecutionDispatcher). No file overlap with this PR.
  • Unblocks: UN-3504 (Phase 6 — Barrier interface needs a typed result contract).

Dependencies Versions

None.

Notes on Testing

cd workers
.venv/bin/python -m pytest tests/test_chord_callback_boundary.py
# 14 passed
cd unstract/sdk1
uv run pytest tests/test_execution.py
# 80 passed (existing worker_models round-trip tests confirm the additive extensions don't break anything)

Dev-tested locally against a running worker stack — workflow execution completes end-to-end; callback aggregates correctly; API-path responses unchanged.

Screenshots

N/A (no UI surface).

Checklist

I have read and understood the Contribution Guidelines.

… / FileExecutionResult

Producers in workers/file_processing/tasks.py now build typed
dataclasses (from unstract.core.worker_models) and emit their
``.to_dict()`` instead of hand-rolled dicts. Locks the wire shape to
the dataclass schema so downstream refactors fail loud.

Scope

Producer-side typing only. Consumer (workers/callback/tasks.py +
aggregate_file_batch_results) already reads via ``.get(..., default)``
— tolerant by construction — so no consumer-side change needed.

Dataclass extensions (unstract.core.worker_models, additive only)

* BatchExecutionResult gains 3 optional fields: skipped_already_completed,
  skipped_active_duplicate, organization_id.
* FileExecutionResult gains 3 optional fields for the API path's legacy
  dict vocabulary: file_name (alias for file), result_data (alias for
  result), skipped (marker like "already_completed").
* Both from_dict updated to populate the new fields.

Producer migrations (workers/file_processing/tasks.py)

* L901 (general path, process_file_batch return):
  BatchExecutionResult(...).to_dict(). Wire dict gains file_results: []
  and errors: [] defaults — strictly additive.
* L1706, L1798, L1823 (API path returns from _process_file_batch_api_core
  helpers): FileExecutionResult(...).to_dict(). L1798 preserves the
  legacy storage_result field via dict-spread merge.

Domain-vocabulary correction on the API path

API-path producers previously returned status="completed" / "failed" —
lowercase strings matching neither ExecutionStatus (workflow-level,
uppercase) nor ApiDeploymentResultStatus (per-file, Success/Failed,
the canonical per-file vocab). Producers now emit "Success" / "Failed"
via FileExecutionResult.

Audit: no Python equality consumer was found reading the lowercase
variants (grep clean). Observability tooling pattern-matching the
old strings would need updating; this is a domain-correctness fix.

Tests

New tests/test_chord_callback_boundary.py — 14 tests, 3 classes:
* Wire-shape characterisation for BatchExecutionResult.
* Wire-shape characterisation for FileExecutionResult with alias
  fields and canonical Success/Failed vocab.
* Consumer tolerance: aggregate_file_batch_results-style .get() reads
  return expected values from the new wire shape.

sdk1's 80 worker_models tests still pass — the dataclass extensions
are strictly additive.

Regression risk: zero on consumer side, zero on backend
(doesn't import these classes; has its own FileExecutionResult in
dto.py — untouched). Status-vocab shift on API path is a deliberate
domain correction.

Test count: workers boundary suite +14 (new); sdk1 dispatcher 80/80.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 385c1058-12b2-46f0-8dd4-ca606bb6c487

📥 Commits

Reviewing files that changed from the base of the PR and between 8559e39 and fb3500e.

📒 Files selected for processing (3)
  • unstract/core/src/unstract/core/worker_models.py
  • workers/file_processing/tasks.py
  • workers/tests/test_chord_callback_boundary.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • workers/file_processing/tasks.py

Summary by CodeRabbit

Release Notes

  • New Features
    • Added skip reason tracking for files (e.g., already completed, active duplicate)
    • Enhanced batch processing responses with detailed execution metrics and file-level results
    • Batch results now include skip counters for visibility into skipped file breakdown
    • Standardized result format across file processing API endpoints

Walkthrough

The PR introduces typed result models (SkipReason, FileExecutionResult, BatchExecutionResult) to standardize the worker execution wire contract. Skip reasons are tracked per-file and aggregated at batch level; legacy alias fields enable backwards-compatible serialization. Producers now emit typed dicts instead of raw dictionaries. A comprehensive test suite validates round-trips and contract boundaries.

Changes

Skip-reason vocabulary and typed result models

Layer / File(s) Summary
Skip-reason vocabulary and FileExecutionResult extensions
unstract/core/src/unstract/core/worker_models.py
SkipReason enum defines ALREADY_COMPLETED and ACTIVE_DUPLICATE values. FileExecutionResult gains legacy alias fields (file_name, result_data, skipped, storage_result) with defaults; _parse_skipped() leniently converts wire values to typed enum or None with warning; from_dict() populates aliases and derives typed skipped via _parse_skipped().
BatchExecutionResult extensions and counter derivation
unstract/core/src/unstract/core/worker_models.py
BatchExecutionResult adds skipped-file sub-counters (skipped_already_completed, skipped_active_duplicate) and organization_id; to_dict() strips None values from nested file_results; from_dict() populates new fields with defaults; from_file_results() derives success/failed/skipped totals from typed FileExecutionResult items.
Public export of SkipReason
unstract/core/src/unstract/core/__init__.py
Re-exports SkipReason in import and __all__ list.
Producer: single-file API result standardization
workers/file_processing/tasks.py
Expands imports to include typed models; _process_single_file_api() emits FileExecutionResult(...).to_dict() for all branches: already-completed with SUCCESS + ALREADY_COMPLETED, success with SUCCESS + result_data/storage_result, failure with FAILED + error.
Producer: batch aggregation and timing
workers/file_processing/tasks.py
process_file_batch_api() tracks batch_start_time, computes execution_time, converts per-file results via FileExecutionResult.from_dict(), derives skipped_already_completed. _compile_batch_result() returns BatchExecutionResult(...).to_dict() with counts, timing, and organization scope.
Consumer contract tests and producer validation
workers/tests/test_chord_callback_boundary.py
New test module with TestBatchExecutionResultWireShape, TestFileExecutionResultWireShape, TestProducerBinding, and TestRealConsumerTolerance classes. Validates serialization round-trips, lenient parsing of unknown skip values, producer wire shape emission, and real consumer tolerance for typed dicts.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: typing the chord-callback boundary with BatchExecutionResult and FileExecutionResult dataclasses.
Description check ✅ Passed The description is comprehensive and well-structured, covering all required template sections including What, Why, How, breaking changes assessment, testing notes, and checklist acknowledgment.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch UN-3513-chord-callback-result-typing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@muhammad-ali-e muhammad-ali-e left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Toolkit — multi-agent review of UN-3513 (devil's-advocate stance)

Ran 6 agents (code-reviewer, silent-failure-hunter, type-design-analyzer, pr-test-analyzer, comment-analyzer, code-simplifier) against the single commit d5a6e87f. Findings cross-validated against the actual source — links verified, not just agent assertions.

Top-line

  • The dataclass extensions are mostly additive as claimed, and consumer tolerance via .get() largely holds. The PR is mergeable in spirit.
  • But one BLOCKER reappeared in 5/6 agents independently: __post_init__ silently overwrites the constructor-passed status argument. Every migrated call site looks like it's setting status; none of them are. Today it's accidentally correct because each site happens to match the error-presence rule, but the contract is a lie. from_dict has the same pathology one layer deeper — it discards the wire status and re-derives.
  • One HIGH that contradicts the PR's stated scope: _process_file_batch_api at tasks.py:1659 still returns a hand-rolled {successful_files, failed_files} 2-key dict — the actual API batch boundary feeding process_batch_callback_api. The PR description says "L1706/L1798/L1823 (API path returns from _process_file_batch_api_core helpers)" but those are per-file helpers, not the batch return. The boundary is half-typed.
  • One HIGH that's subtle: serialize_dataclass_to_dict drops None-valued keys (data_models.py:333). That means the new alias fields (file_name, result_data, skipped) DISAPPEAR from the wire whenever a producer doesn't set them. The test guarantees about aliases hold only because the tests explicitly populate both sides; any future call site that omits the alias silently changes the wire-shape contract for downstream consumers doing if "file_name" in wire membership checks.
  • Other findings cluster on: missing producer-side tests, hollow consumer-tolerance test, pre-existing dead status == "error" branch in time_utils.py:181 (PR neither breaks nor fixes), no deprecation marker on aliases, no invariant on total_files == sum(...).

Prioritized summary

# Severity File:line Issue
1 BLOCKER worker_models.py:268 __post_init__ ignores status= arg — silently coerces
2 BLOCKER worker_models.py:296-301 (via :313) from_dict discards wire status, recomputes from error
3 HIGH tasks.py:1659 (anchored at :913) API batch return still hand-rolled 2-key dict — boundary half-typed
4 HIGH worker_models.py:264 serialize_dataclass_to_dict strips Nones — aliases drop off wire silently
5 HIGH tasks.py:1806 Dict-spread merge can wrap storage soft-failures as SUCCESS
6 HIGH tasks.py:1709 Already-completed skip indistinguishable from SUCCESS at aggregation
7 MEDIUM tasks.py:1839 (time_utils.py:181) Pre-existing dead status == "error" consumer branch
8 MEDIUM test_chord_callback_boundary.py:25 No producer-side tests; legacy-dict regression at producer is invisible
9 MEDIUM test_chord_callback_boundary.py:184 Hollow consumer-tolerance test re-implements .get(); doesn't drive real consumer
10 LOW worker_models.py:246 No deprecation marker / migration plan on aliases — becomes permanent
11 LOW worker_models.py:347 No invariant total_files == sum(...); organization_id is context, not outcome
12 NIT test_chord_callback_boundary.py:207 Flag pre-existing skipped_files read/no-write asymmetry

Inline comments below carry the per-site detail and suggested fix.

Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py Outdated
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread workers/file_processing/tasks.py Outdated
Comment thread workers/file_processing/tasks.py
Comment thread workers/tests/test_chord_callback_boundary.py
Comment thread workers/tests/test_chord_callback_boundary.py Outdated
Comment thread workers/tests/test_chord_callback_boundary.py Outdated
…-binding tests)

A+B from the triage on PR #2020:

* tasks.py:1659 (API-path BATCH return) — migrated to
  BatchExecutionResult.to_dict(). Fixes the half-typed boundary the
  reviewer flagged. file_results, total_files, skipped_already_completed
  and organization_id are now on the wire. Successful/skipped counter
  semantic preserved (separating them is deferred to a follow-up).

* New SkipReason StrEnum (worker_models.py) with ALREADY_COMPLETED +
  ACTIVE_DUPLICATE — mirrors the batch-level skip counters on
  BatchExecutionResult. FileExecutionResult.skipped is now
  SkipReason | None. from_dict coerces. Producer uses the enum;
  the ACTIVE_DUPLICATE value has no current per-file producer but
  is exercised end-to-end via a round-trip test.

* TODO(UN-3516) marker on the three alias fields (file_name,
  result_data, skipped) — sunset ticket filed.

* Tests strengthened:
  - TestProducerBinding drives real _compile_batch_result with a
    minimal SimpleNamespace context, and drives _process_single_file_api
    via mocked api_client for the already-completed branch.
  - TestRealConsumerTolerance imports the real
    aggregate_file_batch_results — producer-consumer contract driven
    end-to-end.
  - test_none_valued_optional_fields_stripped_from_wire documents
    serialize_dataclass_to_dict's None-strip behaviour.
  - test_active_duplicate_skip_reason_round_trips proves the second
    enum value isn't dead.
  - SonarCloud python:S1244 fixed — pytest.approx.
  - skipped_files==0 NIT assertion removed.

Test count: workers boundary suite 14 -> 18; sdk1 worker_models 80/80
still green.

Deferred (separate tickets to follow): __post_init__ silent status
clobber, from_dict status discard, BatchExecutionResult invariant,
storage soft-failure, dead aggregator branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@muhammad-ali-e

Copy link
Copy Markdown
Contributor Author

Review feedback addressed — 8e16eb60

12/12 toolkit findings + 1 SonarCloud responded to. Triage summary:

Fixed in this commit (7)

# Finding What changed
4 (LOW) Aliases lack sunset plan TODO(UN-3516) markers on the three alias fields. UN-3516 filed with full audit/migration/removal acceptance criteria.
6 (HIGH) L1659 not migrated — half-typed boundary Migrated to BatchExecutionResult.to_dict(). file_results, total_files, skipped_already_completed, organization_id now on the API-path wire.
10 (MEDIUM) Tests don't drive real producers New TestProducerBinding drives _compile_batch_result + _process_single_file_api directly. A revert at the producer site now fails loud.
11 (MEDIUM) Hollow consumer-tolerance test Replaced with TestRealConsumerTolerance that imports the real aggregate_file_batch_results.
12 (NIT) skipped_files == 0 reads as endorsement Assertion removed.
— (SonarCloud python:S1244) Float equality in test pytest.approx.
— (your typo-safety question) "already_completed" as bare string New SkipReason StrEnum with ALREADY_COMPLETED and ACTIVE_DUPLICATE (mirroring the batch-counter vocab on BatchExecutionResult). Producer + tests use the enum.

Partially fixed (2)

# Finding What changed
3 (HIGH) serialize_dataclass_to_dict strips None Touching shared infra is out of scope; added test_none_valued_optional_fields_stripped_from_wire to document and assert the strip behaviour. Membership-check consumers now get a loud test failure if the wire shape ever diverges.
7 (HIGH) Skipped invisible + reprocessing on except Skipped-invisible fixed via #6. Reprocessing on except Exception at L1719 is a pre-existing latent bug — deferred.

Pushed back / deferred (5) — separate tickets to follow

# Finding Why deferred
1 (BLOCKER) __post_init__ silently clobbers status Pre-existing dataclass design; touches a class shared with backend. Own design discussion.
2 (BLOCKER) from_dict discards wire status Same pathology as #1; bundled into the same ticket.
5 (MEDIUM) No total_files invariant + organization_id data-model question + add_file_result not updated Strict invariant requires separating successful/skipped counter semantic — that's a behavioural regression for the API path's existing counter. Bundled as a follow-up.
8 (HIGH) Storage soft-failure → SUCCESS Pre-existing pattern (legacy dict had the same behaviour). This PR preserves it via dict-spread — does not make it worse. Separate ticket.
9 (MEDIUM) Dead aggregator branch (status == "error") Pre-existing dead code surfaced by the vocab shift, not caused by it. Bundled with the other latent-bug ticket.

Tests

  • tests/test_chord_callback_boundary.py: 14 → 18 (added TestProducerBinding, the None-strip doc test, test_active_duplicate_skip_reason_round_trips, and TestRealConsumerTolerance's second multi-batch case)
  • sdk1 tests/test_execution.py: 80/80 still green
  • SonarCloud float-equality finding: resolved

Branch state

UN-3513-chord-callback-result-typing at 8e16eb60e16e2dd55f49163678267b20eb7c2467. Two commits on the branch: d5a6e87f (initial) + 8e16eb60 (this review-fix round). CI / SonarCloud / Greptile / CodeRabbit re-running on the new HEAD.

@muhammad-ali-e muhammad-ali-e left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second-pass review — three new findings that escaped the first round. All correctness-level: one data-loss / broken-contract at the batch boundary, one rolling-deploy regression on the consumer read path, and a test-coverage gap where the TestProducerBinding docstring overpromises relative to what the class actually exercises.

Comment thread workers/file_processing/tasks.py Outdated
Comment thread unstract/core/src/unstract/core/worker_models.py Outdated
Comment thread workers/tests/test_chord_callback_boundary.py
…ipped + missing producer tests)

Three findings from the second review round on PR #2020:

* HIGH — storage_result silent data loss at batch boundary. The
  per-file dict-spread at tasks.py:1816 preserved storage_result on
  the immediate return, but the value was dropped when wrapped into
  BatchExecutionResult.file_results (from_dict didn't know the key).
  Promoted to a typed FileExecutionResult.storage_result: Any | None
  field; producer now emits via the constructor; from_dict reads it
  back. The round-trip preserves it end-to-end.

* HIGH — strict SkipReason parsing would crash entire batches during
  rolling deploys if a newer producer ever emitted an unknown value.
  Added FileExecutionResult._parse_skipped, which catches ValueError
  + logs a warning + falls back to None. Standard "strict on emit,
  lenient on receive" posture for wire compat.

* MEDIUM — TestProducerBinding only covered 2 of 5 producer branches.
  Added three more tests:
  - _process_single_file_api success branch (asserts storage_result
    survives the typed wire — would catch the dict-spread revert).
  - _process_single_file_api failure branch (asserts canonical
    "Failed" vocab — catches reverts to the legacy lowercase
    "failed").
  - process_file_batch_api batch wrapper via task.apply() with an
    in-memory result_backend (asserts BatchExecutionResult shape +
    skipped_already_completed counter derived from
    SkipReason.ALREADY_COMPLETED.value).
  Strengthened the existing already-completed branch test to assert
  result_data + metadata propagation.

Bug caught by the new batch-wrapper test: process_file_batch_api was
missing execution_time on its BatchExecutionResult(...) call —
BatchExecutionResult.execution_time is a required positional, so the
API-path batch task would have crashed with TypeError on every run.
Introduced batch_start_time = time.time() at task entry and pass
execution_time = time.time() - batch_start_time. The new test would
have caught this immediately at PR time; logging it here as the
exact value of producer-binding coverage.

Test count: 18 -> 21; all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@muhammad-ali-e muhammad-ali-e marked this pull request as ready for review June 8, 2026 09:17
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR completes the chord-callback payload typing for the worker layer by replacing ad-hoc dict returns with typed dataclasses (BatchExecutionResult / FileExecutionResult) at every producer site. All new fields are optional with safe defaults, so the change is wire-additive and requires no consumer changes.

  • worker_models.py: Adds SkipReason enum, extends both dataclasses with optional fields, fixes BatchExecutionResult.to_dict() to strip None from nested file-result dicts so standalone and batch-embedded wire shapes are symmetric, and adds a from_file_results factory method.
  • tasks.py: Migrates three producer sites from hand-rolled dicts to FileExecutionResult(...).to_dict() and one site to BatchExecutionResult(...).to_dict(); status vocabulary on the API path is corrected from ad-hoc lowercase strings to the canonical ApiDeploymentResultStatus values.
  • test_chord_callback_boundary.py: 14 new tests across three layers (wire-shape round-trip, real producer binding, real consumer tolerance) lock the on-wire contract.

Confidence Score: 5/5

Safe to merge; all new fields are optional with safe defaults, the consumer already uses .get() everywhere, and the status-vocabulary change is intentional and well-scoped.

Every new field in both dataclasses is optional with a safe default, making the wire change additive. The None-stripping fix in BatchExecutionResult.to_dict() was the last remaining asymmetry flagged in prior review and is now covered by a dedicated test. The status-vocabulary correction is deliberate and the PR description explicitly calls out that no Python equality consumer matches the old lowercase strings. Three layers of tests (wire-shape, producer binding, consumer tolerance) provide strong regression coverage for the changed paths.

No files require special attention. The most complex change is the BatchExecutionResult.to_dict() None-stripping fixup, which is well-tested and clearly documented.

Important Files Changed

Filename Overview
unstract/core/src/unstract/core/worker_models.py Adds SkipReason enum, extends FileExecutionResult/BatchExecutionResult with optional fields, fixes None-stripping asymmetry in BatchExecutionResult.to_dict(), and introduces from_file_results factory; all changes are strictly additive and well-commented.
workers/file_processing/tasks.py Migrates three _process_single_file_api return paths and _compile_batch_result from hand-rolled dicts to typed dataclass .to_dict() calls; status vocabulary corrected to canonical ApiDeploymentResultStatus values; batch_start_time added to enable execution-time tracking on the API path.
workers/tests/test_chord_callback_boundary.py New 771-line test file with 14 tests across three classes: wire-shape characterisation, real producer binding, and real consumer tolerance. Covers round-trips, JSON safety, None stripping symmetry, canonical vocab validation, and lenient unknown-SkipReason handling.
unstract/core/src/unstract/core/init.py Adds SkipReason to the module's public import and all list; purely mechanical re-export change.

Sequence Diagram

sequenceDiagram
    participant GW as General Path<br/>(process_file_batch)
    participant API as API Path<br/>(process_file_batch_api)
    participant SF as _process_single_file_api
    participant CR as _compile_batch_result
    participant CB as Chord Callback<br/>(aggregate_file_batch_results)

    GW->>CR: context (WorkflowContextData)
    CR->>CB: "BatchExecutionResult(...).to_dict()<br/>{total_files, successful_files, skipped_*, org_id, ...}"

    API->>SF: file_data (per file)
    SF-->>API: "FileExecutionResult(...).to_dict()<br/>{file, file_name, status=Success/Failed, skipped?, ...}"
    API->>API: FileExecutionResult.from_dict(r) for each result
    API->>CB: "BatchExecutionResult(...).to_dict()<br/>{file_results: [stripped-None dicts], ...}"

    CB->>CB: "aggregate via .get(..., default)<br/>(consumer tolerance preserved)"
Loading

Reviews (4): Last reviewed commit: "UN-3513 [FIX] Address vishnuszipstack re..." | Re-trigger Greptile

…rministic callback healthcheck picker

Greptile P2 #2 — None-stripping was asymmetric for nested
FileExecutionResult objects. ``serialize_dataclass_to_dict`` only
filters None at the outermost level, so a standalone
``FileExecutionResult.to_dict()`` would omit unset optional fields
while ``batch.to_dict()["file_results"][i]`` would carry explicit
``"file_name": None`` etc. for the same input. A consumer doing
``"x" in result`` membership checks would behave differently
depending on whether it read the standalone wire or the nested-in-
batch wire — a real contract divergence.

Fixed locally on ``BatchExecutionResult.to_dict()`` (not by touching
the shared ``serialize_dataclass_to_dict`` infra): post-process
``wire["file_results"]`` to drop None-valued keys, mirroring the
top-level strip. ``BatchExecutionResult.from_dict`` was already
tolerant via ``.get(...)`` so the round-trip stays clean.

Greptile P2 #1 (``status`` constructor parameter clobbered by
``__post_init__``) is the same pathology I flagged as BLOCKER #1 in
the first review round — deferred to a separate ticket with the
shared-infra dataclass redesign.

Test coverage: extended the existing
``test_none_valued_optional_fields_stripped_from_wire`` to also
assert nested symmetry — same test method, no new method added.
This keeps the pytest collection profile stable (a separate test
method would perturb celery's shared task-registry insertion
order during pytest collection and amplify a pre-existing flake
in ``test_callback_sanity.py``).

Test infra fix (bundled because it would have flaked CI on this
PR's HEAD): ``test_callback_sanity.TestEagerHealthcheckRoundTrip``
selected the healthcheck task via
``endswith(".healthcheck")`` against ``eager_app.tasks``. That
registry is a shared celery global with at least 5 worker modules
registering ``healthcheck`` (callback, executor, file_processing,
log_consumer, scheduler). ``next(...)`` returned whichever was
inserted first, which depends on pytest module-collection order
across the whole suite. The test would assert
``worker_type == "callback"`` and intermittently get ``"executor"``
or ``"file_processing"`` instead — empirically a ~10% flake rate
on this branch's HEAD, climbing to ~90% with any test-collection
perturbation. Replaced with an exact-name lookup
(``name == "callback.worker.healthcheck"``); 30/30 green across
deterministic + randomised probes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py
muhammad-ali-e added a commit that referenced this pull request Jun 8, 2026
…test-infra fix)

Seven of Vishnu's findings against ``524ae9184`` addressed. Three
flagged IMPORTANT (silent-failure + missing test coverage), four
SUGGESTION (drift hazard, comment/behaviour mismatch, hollow canary,
duplicate-test cleanup). The other three (hand-built fixture,
SDK1 ``dict[str, Any]`` boundary, ``as_header`` TypedDict refactor)
deferred — see PR thread acknowledgments.

* **#11a (SUGGESTION, drift)** — ``workers/queue_backend/dispatch.py``
  still hand-built the fairness header instead of calling
  ``fairness.as_header()``. Wire-format encoding now has a single
  source so the two sites can't drift. ``FAIRNESS_HEADER_NAME``
  import dropped (no longer used here).

* **#9 (SUGGESTION, comment/behaviour mismatch)** —
  ``ExecutionDispatcher._build_send_kwargs`` was forwarding ``{}``
  as-is despite the docstring calling it "likely indicates a
  producer-side build bug". Changed ``if headers is not None`` ->
  ``if headers``: falsy is dropped so the on-wire shape matches the
  no-headers baseline and a miswired producer surfaces immediately.
  Docstring rewritten to describe the new contract.

* **#3 (SUGGESTION, wording)** — ``canary_helpers`` docstring
  overstated adoption. Softened to "intended single home" and
  named the two characterisation walkers still inlining the logic
  as a known follow-up.

* **#5 + #6 (IMPORTANT cleanup + SUGGESTION fixture)** — six
  structurally-identical ``*_forwards_headers`` /
  ``*_omits_headers_when_none`` tests collapsed to three
  parametrized methods over the three dispatch entry points.
  Fixture now uses ``FairnessKey(...).as_header()`` rather than
  hand-built dicts, so the wire shape exercised matches what real
  producers emit (including ``pipeline_priority``). Net: ~60 LOC
  removed, per-method failure granularity preserved via parametrize
  IDs. Also added empty-dict drop assertions covering #9.

* **#7 (SUGGESTION, missing combined test)** — new
  ``test_dispatch_with_callback_combines_headers_and_callbacks``
  passes ``on_success``, ``on_error``, ``task_id``, and ``headers``
  together and asserts all four land on the same ``send_task``
  call. A key-merge regression in ``_build_send_kwargs`` would
  have slipped through the single-kwarg forwarding tests.

* **#8 (SUGGESTION, hollow canary)** — the
  ``execute_extraction`` dispatch canary only ever asserted the
  empty (passing) case against the live tree. Added a positive-
  detection unit test feeding ``ast.parse`` of a known-bad snippet
  and a blind-spot lock test (constant ref, f-string,
  ``apply_async`` all evade the detector — documenting the scope
  so a future widening intentionally trips the asserts).

* **#4 (IMPORTANT, untested helper)** — ``_fairness_headers`` in
  ``structure_tool_task`` was untested; a regression flipping
  ``NON_API`` -> ``API`` or dropping ``headers=`` at any of the
  three call sites would have stayed green. Added focused unit
  tests in new ``test_structure_tool_task.py`` (wire shape,
  org_id propagation, ``NON_API`` not ``API``) and extended
  ``test_sanity_phase5.TestStructureToolSingleDispatch`` to assert
  ``dispatch.call_args.kwargs["headers"]`` carries the expected
  shape.

* **#2 (IMPORTANT, vacuous-pass)** — ``iter_production_trees``
  warned-and-continued on ``SyntaxError`` but neither canary
  module promoted the warning to error. A botched merge in a
  production file would have dropped silently from the audit set
  and every canary would have passed vacuously over a smaller
  tree. Added ``pytestmark = pytest.mark.filterwarnings(
  "error::UserWarning")`` on both ``test_executor_dispatch`` and
  ``test_fairness_key``, plus a new ``test_canary_helpers.py``
  that unit-tests both the warn-on-broken behaviour and the
  promote-to-error contract the canary modules rely on.

**Bundled test-infra fix (unrelated but unblocks CI):** the
``test_callback_sanity.TestEagerHealthcheckRoundTrip`` tests
selected the healthcheck task via ``endswith(".healthcheck")``
against ``eager_app.tasks``, which is a shared celery global
registry containing ``callback.worker.healthcheck``,
``executor.worker.healthcheck``,
``file_processing.worker.healthcheck`` etc. The bare ``next(...)``
returned whichever was inserted first — non-deterministic across
pytest module-collection orders. Without this fix, the new tests
added in this commit perturb the collection profile enough to
flip the failure rate from ~10% to nearly 100%. Replaced with
exact-name lookup ``name == "callback.worker.healthcheck"``.
Identical fix already landed on the UN-3513 branch (see #2020).

Test count: 31 -> 42 on the UN-3508-touched modules. Full workers
suite: 6 failures pre-existing baseline, unchanged by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
muhammad-ali-e added a commit that referenced this pull request Jun 8, 2026
* UN-3508 [FEAT] Plumb fairness through ExecutionDispatcher

Phase 5.2 of the PG Queue rollout (epic UN-3445). Adds fairness-header
support to the third dispatch path (sdk1's ExecutionDispatcher) so
``execute_extraction`` tasks emitted by file_processing carry the same
routing metadata as workflow-execution dispatches that go through
queue_backend.

What

* sdk1/execution/dispatcher.py: ``dispatch``, ``dispatch_async``,
  ``dispatch_with_callback`` all accept an optional ``headers`` kwarg.
  When non-None, forwarded to Celery's send_task; when None, omitted
  so the call shape stays identical to pre-Phase-5.2 for callers that
  don't opt in (sdk1's existing tests remain green unchanged).
* queue_backend/fairness.py: new ``FairnessKey.as_header()`` method
  returns the wire-ready ``{"x-fairness-key": ...}`` dict. Producers
  no longer need to reference ``FAIRNESS_HEADER_NAME`` directly —
  keeps the additive-only canary in test_fairness_key.py happy.
* file_processing/structure_tool_task.py: small ``_fairness_headers``
  helper builds the header (defaulting workload_type to NON_API;
  propagating the real type is Phase 6 work). All three
  ``dispatcher.dispatch(...)`` sites (lines 468, 507, 720) now pass
  ``headers=_fairness_headers(organization_id)``.
* tests/test_executor_dispatch.py: new file. Covers header forwarding
  through all three dispatcher methods (including the "omit when
  None" pre-existing shape preservation), the FairnessKey.as_header()
  shape, and an AST inventory canary that forbids raw
  ``*.send_task("execute_extraction", ...)`` outside
  ExecutionDispatcher.

Why

UN-3501 plumbed fairness on bare dispatch() call sites. The
``execute_extraction`` task is the most workflow-execution-y dispatch
in the codebase but bypasses queue_backend (uses ExecutionDispatcher
directly), so it had no fairness header. The canary in
test_fairness_key.py audits only bare-name dispatch() and missed it.

No regression risk

* Additive: ``headers`` is optional and defaults to None on all three
  dispatcher methods; the existing 78 sdk1 tests pass unchanged.
* Producer-side only — no consumer reads ``x-fairness-key`` yet.
* No queue routing, task name, or args/kwargs change.

Test count: workers seam suite 53 -> 60 (new test_executor_dispatch.py
with 7 tests). sdk1 dispatcher suite 80/80 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [REFACTOR] Extract shared canary helpers to drop SonarCloud duplication

SonarCloud flagged 7.8% duplicated lines in the new
test_executor_dispatch.py — the file-walking helper and skip-dir
constants were copy-pasted from test_fairness_key.py.

Move them into tests/canary_helpers.py:

* WORKERS_ROOT, DEFAULT_SKIP_TOP_DIRS constants.
* iter_production_trees(skip_top_dirs=…) generator.

Both canary tests use relative imports (from .canary_helpers import …)
to keep one canonical import path — tests/ is already a package via
__init__.py, no pyproject change needed. (An earlier attempt added
pythonpath = ["tests"], reverted — it would have created a second
top-level import path for every test file and a dual-module-object
hazard.)

The fairness canary widens its skip set with ``queue_backend`` (where
the seam legitimately defines fairness constants); the executor canary
keeps the default. Tests stay at 60/60 — pure dedup, no behavioural
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [FIX] Address 14 PR review findings (HIGH/MED/NIT)

* dispatcher.py: factor _build_send_kwargs helper; document headers kwarg on dispatch_with_callback; reference FAIRNESS_HEADER_NAME symbol instead of bare string; document empty-dict caller-bug semantic
* structure_tool_task.py: narrow _fairness_headers return type; replace 'Phase 6 work' with TODO(UN-3504) anchor
* fairness.py: concrete as_header() docstring with explicit shape
* canary_helpers.py: surface SyntaxError via UserWarning (real silent-failure bug; canaries no longer pass vacuously on unparseable files)
* test_executor_dispatch.py: switch to dict[str, Any] dropping type-ignore; use WorkloadType.NON_API.value instead of invalid 'etl' literal; new test_dispatch_async_omits_headers_when_none; tighten canary docstring + note blind spots; drop plan-stage vocab; reorder relative import; new test_fairness_header_shape_orgless for org_id=None case

Tests: workers 60 -> 62, sdk1 dispatcher 80/80 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [DOCS] Fix iter_production_trees docstring: 'Yield' -> 'Return a list'

Greptile P2: function builds and returns a list — it is not a
generator — but the docstring opened with 'Yield ...', which would
mislead a reader into expecting lazy consumption / generator semantics
(early break, send(), etc.).

Pure docstring fix, no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3508 [FIX] Address vishnuszipstack review (7 real fixes + bundled test-infra fix)

Seven of Vishnu's findings against ``524ae9184`` addressed. Three
flagged IMPORTANT (silent-failure + missing test coverage), four
SUGGESTION (drift hazard, comment/behaviour mismatch, hollow canary,
duplicate-test cleanup). The other three (hand-built fixture,
SDK1 ``dict[str, Any]`` boundary, ``as_header`` TypedDict refactor)
deferred — see PR thread acknowledgments.

* **#11a (SUGGESTION, drift)** — ``workers/queue_backend/dispatch.py``
  still hand-built the fairness header instead of calling
  ``fairness.as_header()``. Wire-format encoding now has a single
  source so the two sites can't drift. ``FAIRNESS_HEADER_NAME``
  import dropped (no longer used here).

* **#9 (SUGGESTION, comment/behaviour mismatch)** —
  ``ExecutionDispatcher._build_send_kwargs`` was forwarding ``{}``
  as-is despite the docstring calling it "likely indicates a
  producer-side build bug". Changed ``if headers is not None`` ->
  ``if headers``: falsy is dropped so the on-wire shape matches the
  no-headers baseline and a miswired producer surfaces immediately.
  Docstring rewritten to describe the new contract.

* **#3 (SUGGESTION, wording)** — ``canary_helpers`` docstring
  overstated adoption. Softened to "intended single home" and
  named the two characterisation walkers still inlining the logic
  as a known follow-up.

* **#5 + #6 (IMPORTANT cleanup + SUGGESTION fixture)** — six
  structurally-identical ``*_forwards_headers`` /
  ``*_omits_headers_when_none`` tests collapsed to three
  parametrized methods over the three dispatch entry points.
  Fixture now uses ``FairnessKey(...).as_header()`` rather than
  hand-built dicts, so the wire shape exercised matches what real
  producers emit (including ``pipeline_priority``). Net: ~60 LOC
  removed, per-method failure granularity preserved via parametrize
  IDs. Also added empty-dict drop assertions covering #9.

* **#7 (SUGGESTION, missing combined test)** — new
  ``test_dispatch_with_callback_combines_headers_and_callbacks``
  passes ``on_success``, ``on_error``, ``task_id``, and ``headers``
  together and asserts all four land on the same ``send_task``
  call. A key-merge regression in ``_build_send_kwargs`` would
  have slipped through the single-kwarg forwarding tests.

* **#8 (SUGGESTION, hollow canary)** — the
  ``execute_extraction`` dispatch canary only ever asserted the
  empty (passing) case against the live tree. Added a positive-
  detection unit test feeding ``ast.parse`` of a known-bad snippet
  and a blind-spot lock test (constant ref, f-string,
  ``apply_async`` all evade the detector — documenting the scope
  so a future widening intentionally trips the asserts).

* **#4 (IMPORTANT, untested helper)** — ``_fairness_headers`` in
  ``structure_tool_task`` was untested; a regression flipping
  ``NON_API`` -> ``API`` or dropping ``headers=`` at any of the
  three call sites would have stayed green. Added focused unit
  tests in new ``test_structure_tool_task.py`` (wire shape,
  org_id propagation, ``NON_API`` not ``API``) and extended
  ``test_sanity_phase5.TestStructureToolSingleDispatch`` to assert
  ``dispatch.call_args.kwargs["headers"]`` carries the expected
  shape.

* **#2 (IMPORTANT, vacuous-pass)** — ``iter_production_trees``
  warned-and-continued on ``SyntaxError`` but neither canary
  module promoted the warning to error. A botched merge in a
  production file would have dropped silently from the audit set
  and every canary would have passed vacuously over a smaller
  tree. Added ``pytestmark = pytest.mark.filterwarnings(
  "error::UserWarning")`` on both ``test_executor_dispatch`` and
  ``test_fairness_key``, plus a new ``test_canary_helpers.py``
  that unit-tests both the warn-on-broken behaviour and the
  promote-to-error contract the canary modules rely on.

**Bundled test-infra fix (unrelated but unblocks CI):** the
``test_callback_sanity.TestEagerHealthcheckRoundTrip`` tests
selected the healthcheck task via ``endswith(".healthcheck")``
against ``eager_app.tasks``, which is a shared celery global
registry containing ``callback.worker.healthcheck``,
``executor.worker.healthcheck``,
``file_processing.worker.healthcheck`` etc. The bare ``next(...)``
returned whichever was inserted first — non-deterministic across
pytest module-collection orders. Without this fix, the new tests
added in this commit perturb the collection profile enough to
flip the failure rate from ~10% to nearly 100%. Replaced with
exact-name lookup ``name == "callback.worker.healthcheck"``.
Identical fix already landed on the UN-3513 branch (see #2020).

Test count: 31 -> 42 on the UN-3508-touched modules. Full workers
suite: 6 failures pre-existing baseline, unchanged by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@vishnuszipstack vishnuszipstack left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR Review — PR Review Toolkit (6 agents)

Ran Comment Analyzer, PR Test Analyzer, Silent Failure Hunter, Type Design Analyzer, Code Reviewer, and Code Simplifier over the chord-callback typing change.

Headline: Code Reviewer found no correctness bugs — the to_dict/from_dict round-trips preserve storage_result/result_data/file_name/skipped, execution_time is computed on every return path of process_file_batch_api, and to_api_dict is never reached by the chord producers, so the canonical-vs-alias field split causes no live data loss. The PR is functionally safe to merge.

The inline comments below are design hardening, silent-failure robustness, and test-coverage findings — none are blocking bugs. Highest-leverage items: make status a derived property and collapse the alias pairs (removes two classes of representable-illegal-state), and add the missing _parse_skipped unknown-value test (the one documented crash-prevention path with zero coverage).

Severity legend: [Important] = should fix, [Suggestion] = nice to have.

Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread unstract/core/src/unstract/core/worker_models.py Outdated
Comment thread unstract/core/src/unstract/core/worker_models.py
Comment thread workers/file_processing/tasks.py
Comment thread workers/file_processing/tasks.py
Comment thread workers/tests/test_chord_callback_boundary.py
Comment thread workers/tests/test_chord_callback_boundary.py
Comment thread workers/tests/test_chord_callback_boundary.py
…ing nit)

Seven of Vishnu's PR review findings addressed, all backward-compat with
main-branch consumers. The three [Important] design-redesign findings
(#1 status __post_init__, #2 alias-pair invariant, #3 to_api_dict/to_json
dead code) are deferred to a follow-up shared-infra dataclass ticket
because they would either fire warning noise on existing call sites
(``worker_base.py:211/222``, ``worker_patterns.py:241`` pass wrong-enum
status) or change the wire/cache contract — neither acceptable
mid-flight while keeping zero regression on main.

Changes in this commit are either:
  * Pure additive (test methods, docstrings, observability)
  * Or provably equivalent wire output (the typed-count refactor)

So a rolling deploy where old workers and new workers run concurrently
sees identical wire shapes and identical behaviour for all current
valid data; the only observable differences are log content (better
context on the existing warning) and the presence of a new opt-in
classmethod that nothing currently calls.

* **Vishnu #8 [Suggestion]** — ``SkipReason`` docstring claimed
  "StrEnum semantics" but the class is ``(str, Enum)``, not
  ``enum.StrEnum``. The two differ on ``__str__``. Rewrote the
  docstring to describe the actual behaviour.

* **Vishnu #4a [Important — log context]** — ``_parse_skipped``
  now accepts an optional ``file_execution_id`` kwarg that
  ``from_dict`` threads through. The warning emitted for unknown
  wire values now carries the file identifier, so a real
  rolling-deploy incident is debuggable rather than a context-free
  warning. Optional kwarg with default — any existing caller passing
  one positional arg still works.

* **Vishnu #9 [Suggestion]** — added
  ``BatchExecutionResult.from_file_results(...)`` classmethod that
  derives counters from typed file results. Purely additive: no
  existing caller uses it; the constructor signature is unchanged
  so producers that need their own counter semantics keep working.

* **Vishnu #11 [Suggestion]** — ``process_file_batch_api`` was
  computing ``skipped_already_completed`` by string-matching the
  wire dicts AFTER already calling ``from_dict`` on them. Refactored
  to count from the typed list (single ``from_dict`` pass, enum
  compare). Provably equivalent for all current wire data.

* **Vishnu #4 [Important — test gap]** — added
  ``test_from_dict_unknown_skipped_is_lenient`` covering the one
  documented crash-prevention path. A regression to bare
  ``SkipReason(raw)`` would have re-introduced the rolling-deploy
  crash and kept every other test green.

* **Vishnu #5 [Important — failure-aggregation gap]** — added
  ``test_process_file_batch_api_batch_wrapper_failure_aggregation``
  that drives one success + one failure through the batch wrapper.
  The existing success-only test never exercised
  ``failed_files += 1``.

* **Vishnu #6 [Important — populated round-trip gap]** — added
  ``test_round_trip_with_populated_file_results`` and
  ``test_from_file_results_derives_counters``. The existing
  ``BatchExecutionResult`` round-trip test used
  ``file_results=[]``, so the list-comprehension in ``from_dict``
  that rebuilds nested ``FileExecutionResult`` objects was never
  executed with a populated list.

* **Vishnu #13 [Suggestion]** — replaced hardcoded line reference
  in test docstring with a symbol reference.

Deferred to follow-up shared-infra dataclass-redesign ticket:
  * #1 ``__post_init__`` status clobber — would emit warning noise
    on every existing wrong-enum call site
  * #2 alias-pair invariant — back-fill via __post_init__ would
    change the wire shape (file_name no longer None → no longer
    stripped at the top level)
  * #3 ``to_api_dict``/``to_json`` dead code — looks like a public
    SDK surface; changing the body could surprise external consumers
  * #7 recursive ``None``-strip in ``serialize_value`` — touches
    every dataclass in the codebase
  * #10 ``Any`` typing tightening — low value, mypy tightening could
    trip downstream
  * #12 producer redundant kwargs — depends on #2's reconciliation

Tests: workers chord-callback boundary suite 21 -> 25; full workers
suite 622 -> 627 (no new failures; 6 pre-existing baseline
unchanged). Five deterministic-order runs of the full suite returned
exactly 627 passed / 6 pre-existing failed — zero flakiness from
this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 9, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Unstract test results

Per-group results

Status Group Tier Passed Failed Errors Skipped Duration (s)
unit-connectors unit 64 12 0 3 16.6
unit-core unit 0 0 2 0 1.2
unit-platform-service unit 9 0 1 0 1.3
unit-prompt-service unit 15 0 0 0 19.5
unit-rig unit 53 0 0 0 3.3
unit-runner unit 11 0 0 0 3.1
unit-sdk1 unit 381 0 0 0 23.0
unit-tool-registry unit 0 0 1 0 1.3
unit-workers unit 0 0 0 0 17.3
TOTAL 533 12 4 3 86.7

Critical paths

⚠️ Critical paths not yet covered

  • auth-login — User can log in and obtain a session cookie. (entry: POST /api/v1/auth/login; declared coverage: no groups declared)
  • adapter-register-llm — Register and validate an LLM adapter. (entry: POST /api/v1/adapter/; declared coverage: no groups declared)
  • workflow-create-execute — Create a workflow, configure source+destination, execute, poll, fetch result. (entry: POST /api/v1/workflow/{id}/execute/; declared coverage: e2e-workflow)
  • api-deployment-run — Deploy a workflow as an API, POST a document, receive structured JSON. (entry: POST /deployment/api/{org}/{name}/; declared coverage: e2e-api-deployment)
  • prompt-studio-fetch-response — Prompt Studio: create project, add prompt, run single-pass, get response. (entry: POST /api/v1/prompt-studio/prompt-studio-tool/{id}/fetch_response/; declared coverage: e2e-prompt-studio)
  • pipeline-etl-execute — Run an ETL pipeline from source connector to destination. (entry: POST /api/v1/pipeline/{id}/execute/; declared coverage: no groups declared)
  • usage-token-tracking — Per-execution token usage is recorded and retrievable. (entry: GET /api/v1/usage/get_token_usage/; declared coverage: no groups declared)
  • workflow-execution-fan-out — Multi-file workflow execution fans out to file-processing workers and rejoins. (entry: internal: backend → rabbitmq → workers/file_processing; declared coverage: no groups declared)
  • callback-result-delivery — Async results are posted back via the callback worker. (entry: internal: workers/callback → backend /internal endpoints; declared coverage: no groups declared)
✅ Covered critical paths
  • tool-sandbox-exec — covered by unit-runner

@muhammad-ali-e muhammad-ali-e merged commit a4f321b into main Jun 9, 2026
9 checks passed
@muhammad-ali-e muhammad-ali-e deleted the UN-3513-chord-callback-result-typing branch June 9, 2026 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants