feat(sprint-2734): runtime StyleCapturer + componentStyles surface + falsifier harness (RFC 0001 Task B) by Narrator · Pull Request #50 · patchorbit/domscribe

Narrator · 2026-06-07T13:59:44Z

Summary

Sprint 2734 Task B — ships the runtime half of RFC 0001's two-tier component-style attribution and the falsifier measurement instrument that gates sprint close. PE marked the test-fixtures harness as a hard sprint-close gate; this PR includes it.

What's in here

Runtime + relay + core (RFC 0001 runtime half)

@domscribe/core — extends RuntimeContextSchema with optional componentStyles (additive; no schema bump until Task A's build-time styleSource lands).
@domscribe/runtime — new StyleCapturer reads a ≤32-property computed-CSS allowlist + resolves --* custom properties from element through ancestors. Wired into ContextCapturer behind the domscribe.config.captureStyles flag (off by default; preserves v0.x payload size). Honors the existing ≤4 KB per-element serialization budget.
@domscribe/relay — extends query.bySource response + the MCP tool surface with componentStyles. Tool description rewritten to nudge agents to call this first for styling annotations and to prefer token names over raw values when resolved.
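The allowlist filter and the per-element byte budget described above can be sketched roughly as follows. This is an illustrative sketch only, not the StyleCapturer source: `StyleRecord`, `byteSize`, and `fitToBudget` are hypothetical names, and the flat record shape (custom properties keyed by their `--*` name) is an assumption. The tail-dropping order mirrors the spec title "drops custom-property tail entries until payload fits maxBytes".

```typescript
// Hypothetical flat shape: allowlisted computed properties plus --* custom properties.
type StyleRecord = Record<string, string>;

// UTF-8 byte length of the serialized payload.
function byteSize(styles: StyleRecord): number {
  return new TextEncoder().encode(JSON.stringify(styles)).length;
}

// Keep allowlisted computed properties, append custom properties in their
// original order, then drop custom properties from the tail until the
// serialized payload fits maxBytes (or no custom properties remain).
function fitToBudget(
  raw: StyleRecord,
  allowlist: readonly string[],
  maxBytes = 4096, // assumed to correspond to the 4 KB per-element budget
): StyleRecord {
  const allowed = new Set(allowlist);
  const result: StyleRecord = {};
  const customNames: string[] = [];

  for (const name of Object.keys(raw)) {
    if (name.startsWith("--")) {
      customNames.push(name);
    } else if (allowed.has(name)) {
      result[name] = raw[name];
    }
  }
  for (const name of customNames) {
    result[name] = raw[name];
  }

  while (customNames.length > 0 && byteSize(result) > maxBytes) {
    const dropped = customNames.pop() as string;
    delete result[dropped];
  }
  return result;
}
```

In this sketch the computed properties are never dropped; only the custom-property tail shrinks, so a payload whose allowlisted properties alone exceed the budget would still be oversized. Whether the real capturer handles that case differently is not stated in this PR.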

Falsifier harness (HARD SPRINT GATE)

packages/domscribe-test-fixtures/styling/tailwind-app/ — Vite React + Tailwind v3 fixture.
packages/domscribe-test-fixtures/styling/styled-app/ — Vite React + styled-components v6 fixture.
styling/annotations.json — 10 annotations across both fixtures: padding, color tokens, border-radius, typography, flex-gap.
styling/baselines/ — canonical-after PNGs (committed; recorded via --mode=record).
styling/scripts/falsifier.ts — Playwright + pixelmatch harness with three modes:
- --mode=self-test (CI default): asserts the mechanism works end-to-end.
- --mode=record: regenerate baselines after editing an /after route.
- --mode=measure --agent-output=<dir>: grade an external agent's screenshots.
package.json script: test:falsifier. Nx target: falsifier.

Emits one JSON object to stdout: { mode, total, passes, fails, oneShotRate, annotations: [...] }. Exits non-zero on any failure (except record mode).
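For readers consuming that output, the report shape and exit rule can be sketched as below. The top-level field names come from the line above; everything else is an assumption for illustration, including the per-annotation fields (`id`, `pass`), the helper names `buildReport` and `exitCodeFor`, and defining `oneShotRate` as `passes / total` (which is at least consistent with the 10/10 = 1.0 and 1/10 = 0.1 figures in the test plan).

```typescript
type FalsifierMode = "self-test" | "record" | "measure";

// Hypothetical per-annotation entry; the real entries may carry more fields.
interface AnnotationResult {
  id: string;
  pass: boolean;
}

interface FalsifierReport {
  mode: FalsifierMode;
  total: number;
  passes: number;
  fails: number;
  oneShotRate: number;
  annotations: AnnotationResult[];
}

// Aggregate per-annotation results into the single JSON object written to stdout.
function buildReport(mode: FalsifierMode, annotations: AnnotationResult[]): FalsifierReport {
  const total = annotations.length;
  const passes = annotations.filter((a) => a.pass).length;
  const fails = total - passes;
  // Assumed definition: fraction of annotations that passed.
  const oneShotRate = total === 0 ? 0 : passes / total;
  return { mode, total, passes, fails, oneShotRate, annotations };
}

// Non-zero on any failure, except in record mode.
function exitCodeFor(report: FalsifierReport): number {
  return report.mode !== "record" && report.fails > 0 ? 1 : 0;
}
```

A caller would typically `JSON.stringify` the report to stdout and pass `exitCodeFor(report)` to `process.exit`.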

Task A coordination

Task A's schema PR has not landed at the time of this PR. To keep this PR shippable without blocking, I added only the additive optional componentStyles field. ANNOTATION_SCHEMA_VERSION is unchanged. When Task A's styleSource PR lands, the two additions stack cleanly (both optional). Per the PM's note, this PR also owns the MCP tool description text on query.bySource; Task A owns relay schema definitions for manifest.query / resolve. No file overlap.

What this harness is and is not

It is the measurement instrument the sprint thesis requires — "did the agent's edit produce the right pixels". It is not an agent integration; that's wired post-sprint via --mode=measure. The self-test today exercises the rig itself end-to-end and reports oneShotRate = 1.0, proving the mechanism is sound before any agent metric is gathered against it.

Test plan

nx run domscribe-core:test — 100% (existing 137 tests + schema-only changes)
nx run domscribe-runtime:test — 544 tests pass (12 new StyleCapturer + 4 new ContextCapturer integration tests)
nx run domscribe-relay:test — 306 tests pass (1 updated query.bySource expectation + 1 new componentStyles flow test)
tsc --noEmit clean across core / runtime / relay / both fixture apps / falsifier
eslint clean on all new files (the 2 pre-existing lint errors in e2e/fixtures.ts are not from this PR — confirmed on main)
pnpm test:falsifier reports oneShotRate=1.0 (10/10 self-test, exit 0)
pnpm test:falsifier --mode=measure with deliberately-wrong agent output reports oneShotRate=0.1 (1/10) and exits 1 — proves the harness can fail

Self-verification (per staff-swe contract)

Schema field is additive + optional → won't conflict with Task A.
Style capture defaults off (captureStyles: false) → no payload-size regression for existing v0.x integrations.
Style capture respects existing 4 KB budget → verified by style-capturer.spec.ts "drops custom-property tail entries until payload fits maxBytes".
Falsifier baselines are deterministic across runs → animations disabled, viewport/locale/timezone/scale locked, asset filenames stable.
Pixel-diff tolerance tightened from 0.5% to 0.1% after a sanity test caught a false positive on two mostly-white-background images.
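To illustrate why the tolerance matters on mostly-white screenshots, here is a minimal sketch of a ratio-based pass check. It assumes the harness compares the mismatched-pixel count (the number pixelmatch returns) against a fraction of the total pixel count; the function name `withinTolerance`, the inclusive `<=` comparison, and the 1280×720 viewport in the usage note are all hypothetical.

```typescript
// Returns true when the fraction of mismatched pixels is within the tolerance.
// `mismatchedPixels` is assumed to be a mismatch count such as pixelmatch reports.
function withinTolerance(
  mismatchedPixels: number,
  width: number,
  height: number,
  tolerance = 0.001, // 0.1%
): boolean {
  const totalPixels = width * height;
  if (totalPixels <= 0) {
    throw new Error("image dimensions must be positive");
  }
  return mismatchedPixels / totalPixels <= tolerance;
}
```

On a hypothetical 1280×720 capture (921,600 pixels), a wrong 60×40 region changes 2,400 pixels, about 0.26% of the image. That passes at a 0.5% tolerance but fails at 0.1%, which is the kind of false positive a large uniform background makes easy to miss.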

🤖 Generated with Claude Code

…face (RFC 0001)

Adds the runtime half of RFC 0001's two-tier component-style attribution.

* @domscribe/core: extends RuntimeContextSchema with optional componentStyles (additive; schema bump deferred until Task A's build-time styleSource lands).
* @domscribe/runtime: new StyleCapturer reads a ≤32-property computed allowlist + resolves --* CSS custom properties from element through ancestors. Wired into ContextCapturer behind the domscribe.config.captureStyles flag (off by default). Honors the ≤4KB per-element serialization budget.
* @domscribe/relay: extends QueryBySourceResponse and the MCP query.bySource tool to surface componentStyles. Tool description nudges agents to call this first for styling annotations.

Tests: 16 new (12 StyleCapturer unit + 4 ContextCapturer integration), 544 runtime tests pass total, 306 relay tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…pps + harness

Adds the sprint 2734 hard-gate measurement instrument. Two Vite React fixture apps (Tailwind v3, styled-components v6), ten styling annotations, and a Playwright + pixelmatch harness that grades agent output against canonical baseline screenshots.

* styling/tailwind-app, styling/styled-app: pnpm workspaces, deterministic Vite output, animations + caret disabled for pixel-diff stability.
* styling/annotations.json: 10 annotations (5 per fixture) covering padding, color tokens, border-radius, typography, and flex-gap fixes.
* styling/baselines/: canonical-after PNGs committed (recorded via --mode=record).
* styling/scripts/falsifier.ts: three modes (self-test default for CI, record, measure --agent-output=<dir>). Emits machine-readable JSON with oneShotRate metric to stdout. Exits non-zero on fails.
* package.json: test:falsifier + test:falsifier:record scripts.
* project.json: nx falsifier + falsifier:record targets.

Self-test currently reports 10/10 oneShotRate=1.0, verifying the mechanism. Real-agent integration is plumbing for a follow-on sprint (--mode=measure already wired).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Task A (#49) landed on main while this PR was open and added the same optional componentStyles field to RuntimeContextSchema plus the COMPONENT_STYLES_ALLOWLIST constant and a v1→v2 schema version bump in @domscribe/core. Resolved annotation.ts by taking main's version (Task A's additions are a functional superset: same shape for ComponentStylesSchema and componentStyles, plus the version-history note, the allowlist constant, and ComponentStylesAllowlist type). All other changes auto-merged cleanly: this PR's StyleCapturer, relay componentStyles surface, and falsifier harness sit on top of Task A's build-time style-source attribution.

Note for follow-up: there are now two allowlists living in two places: COMPONENT_STYLES_ALLOWLIST in @domscribe/core (32 entries from Task A) and STYLE_CAPTURE_ALLOWLIST in @domscribe/runtime (31 entries from this PR). They overlap but are not identical. The schema field is record<string,string> so neither is enforced; this is a documentation/contract divergence, not a runtime defect. PE/PM should pick one canonical list in a follow-up.

Post-merge verification (Nx targets on the merge commit):
- domscribe-core:test → pass (all suites green, 100% coverage on annotation.ts)
- domscribe-runtime:test → pass (548 tests)
- domscribe-relay:test → pass
- domscribe-transform:test → pass
- typecheck across all 4 → clean
- lint across all 4 → clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
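As a possible aid for the allowlist follow-up noted in this commit, a small helper could report exactly where the two lists diverge. The sketch below is a suggestion only: `diffAllowlists` is a hypothetical function, and a real check would pass in COMPONENT_STYLES_ALLOWLIST and STYLE_CAPTURE_ALLOWLIST, whose entries are not listed in this PR.

```typescript
interface AllowlistDiff {
  onlyInFirst: string[];
  onlyInSecond: string[];
  shared: string[];
}

// Compare two property-name allowlists and report where they diverge.
// Output order follows the order of the input arrays.
function diffAllowlists(first: readonly string[], second: readonly string[]): AllowlistDiff {
  const firstSet = new Set(first);
  const secondSet = new Set(second);
  return {
    onlyInFirst: first.filter((name) => !secondSet.has(name)),
    onlyInSecond: second.filter((name) => !firstSet.has(name)),
    shared: first.filter((name) => secondSet.has(name)),
  };
}
```

A unit test asserting that both `onlyInFirst` and `onlyInSecond` are empty would turn the documentation divergence into a failing check once a canonical list is chosen.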

nx-cloud · 2026-06-07T14:09:46Z

View your CI Pipeline Execution ↗ for commit d668904

Command	Status	Duration	Result
`nx run domscribe-test-fixtures:integration--web...`	✅ Succeeded	1m 46s	View ↗
`nx run domscribe-test-fixtures:integration--web...`	✅ Succeeded	1m 36s	View ↗
`nx run domscribe-test-fixtures:integration--web...`	✅ Succeeded	1m 42s	View ↗
`nx run domscribe-test-fixtures:integration--web...`	✅ Succeeded	1m 37s	View ↗
`nx run domscribe-test-fixtures:install-fixture-...`	✅ Succeeded	46s	View ↗
`nx run domscribe-test-fixtures:install-fixture-...`	✅ Succeeded	46s	View ↗
`nx run domscribe-test-fixtures:integration--vit...`	✅ Succeeded	32s	View ↗
`nx run domscribe-test-fixtures:integration--vit...`	✅ Succeeded	34s	View ↗
`Additional runs (18)`	✅ Succeeded	...	View ↗

💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗

☁️ Nx Cloud last updated this comment at 2026-06-07 14:21:25 UTC

The Nx @nx/js/typescript plugin auto-generates a typecheck target that runs `tsc --build --emitDeclarationOnly`. The styling fixture apps (`@domscribe/styling-fixture-tailwind`, `@domscribe/styling-fixture-styled`) use a Vite app tsconfig with `noEmit: true` and neither `declaration` nor `composite`, so the auto-generated command fails with TS5069. The parent `domscribe-test-fixtures` already opts out of typecheck via `nx:noop`; these nested apps are separate Nx projects and didn't inherit that.

Adding a minimal `project.json` for each that overrides typecheck to `tsc --noEmit` (which the PR author already verified is clean) restores the CI green light without weakening type safety.

Verified locally:
npx nx run-many -t lint test build typecheck --exclude domscribe-test-fixtures
→ Successfully ran targets lint, test, build, typecheck for 14 projects

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Domscribe Staff SWE (bot) and others added 2 commits June 7, 2026 06:58

Narrator marked this pull request as ready for review June 7, 2026 13:59

Narrator merged commit a171724 into main Jun 7, 2026
24 checks passed

Narrator deleted the feat/sprint-2734-styles-runtime-and-harness branch June 7, 2026 14:23

This was referenced Jun 8, 2026

Sprint 3071 Task A — RFC 0001 baseline + @domscribe/core schema (v3) + @domscribe/verify scaffold #51

Closed

feat: verify_after_edit MCP tool (RFC 0002, sprint 3071 Task B + vendored Task A) #53

Merged

Narrator merged 4 commits into main from feat/sprint-2734-styles-runtime-and-harness
