feat(sprint-2734): runtime StyleCapturer + componentStyles surface + falsifier harness (RFC 0001 Task B)#50

Merged: Narrator merged 4 commits into main from feat/sprint-2734-styles-runtime-and-harness on Jun 7, 2026

Narrator commented Jun 7, 2026

Summary

Sprint 2734 Task B — ships the runtime half of RFC 0001's two-tier component-style attribution and the falsifier measurement instrument that gates sprint close. PE marked the test-fixtures harness as a hard sprint-close gate; this PR includes it.

What's in here

Runtime + relay + core (RFC 0001 runtime half)

  • @domscribe/core — extends RuntimeContextSchema with optional componentStyles (additive; no schema bump until Task A's build-time styleSource lands).
  • @domscribe/runtime — new StyleCapturer reads a ≤32-property computed-CSS allowlist + resolves --* custom properties from element through ancestors. Wired into ContextCapturer behind the domscribe.config.captureStyles flag (off by default; preserves v0.x payload size). Honors the existing ≤4 KB per-element serialization budget.
  • @domscribe/relay — extends query.bySource response + the MCP tool surface with componentStyles. Tool description rewritten to nudge agents to call this first for styling annotations and to prefer token names over raw values when resolved.
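
The capture behaviour described for @domscribe/runtime can be pictured with a small sketch. This is not the PR's StyleCapturer: `StyleNode`, `SKETCH_ALLOWLIST`, and `captureStyles` are hypothetical names, the node type stands in for a DOM element so the example runs without a browser, and the byte budget is approximated by the JSON string length. It only illustrates the three steps named above: allowlist filtering, ancestor resolution of `--*` properties, and trimming to the budget.

```typescript
// Hypothetical stand-in for a DOM element. A real capturer would read
// getComputedStyle(el) and walk el.parentElement instead.
interface StyleNode {
  computed: Record<string, string>;    // resolved CSS properties
  customProps: Record<string, string>; // --* properties declared on this node
  parent?: StyleNode;
}

// Illustrative subset only; the PR describes an allowlist of up to 32 entries.
const SKETCH_ALLOWLIST: string[] = ["padding", "color", "border-radius", "font-size", "gap"];

function captureStyles(node: StyleNode, maxBytes: number = 4096): Record<string, string> {
  const styles: Record<string, string> = {};

  // 1. Copy only allowlisted computed properties.
  for (const prop of SKETCH_ALLOWLIST) {
    const value = node.computed[prop];
    if (value !== undefined) styles[prop] = value;
  }

  // 2. Resolve custom properties from the element outward; the nearest declaration wins.
  const customNames: string[] = [];
  let current: StyleNode | undefined = node;
  while (current) {
    for (const name of Object.keys(current.customProps)) {
      const value = current.customProps[name];
      if (value !== undefined && name.slice(0, 2) === "--" && !(name in styles)) {
        styles[name] = value;
        customNames.push(name);
      }
    }
    current = current.parent;
  }

  // 3. Drop custom properties from the tail until the serialized payload fits.
  //    String length approximates bytes here (exact for ASCII-only payloads).
  while (customNames.length > 0 && JSON.stringify(styles).length > maxBytes) {
    delete styles[customNames.pop() as string];
  }
  return styles;
}
```

In this sketch the farthest-ancestor tokens are dropped first, on the assumption that they are the least specific to the annotated element; the actual eviction order in the PR may differ.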

Falsifier harness (HARD SPRINT GATE)

  • packages/domscribe-test-fixtures/styling/tailwind-app/ — Vite React + Tailwind v3 fixture.
  • packages/domscribe-test-fixtures/styling/styled-app/ — Vite React + styled-components v6 fixture.
  • styling/annotations.json — 10 annotations across both fixtures: padding, color tokens, border-radius, typography, flex-gap.
  • styling/baselines/ — canonical-after PNGs (committed; recorded via --mode=record).
  • styling/scripts/falsifier.ts — Playwright + pixelmatch harness with three modes:
    • --mode=self-test (CI default): asserts the mechanism works end-to-end.
    • --mode=record: regenerate baselines after editing an /after route.
    • --mode=measure --agent-output=<dir>: grade an external agent's screenshots.
  • package.json script: test:falsifier. Nx target: falsifier.

Emits one JSON object to stdout: { mode, total, passes, fails, oneShotRate, annotations: [...] }. Exits non-zero on any failure (except record mode).
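
A minimal sketch of how that report and exit code fit together, assuming the top-level keys listed above. The per-annotation fields (`id`, `pass`) and the function names are hypothetical; the PR does not show the shape of the `annotations` entries.

```typescript
type Mode = "self-test" | "record" | "measure";

// Hypothetical per-annotation entry; the real fields are not shown in the PR.
interface AnnotationResult {
  id: string;
  pass: boolean;
}

// Top-level keys as listed in the PR description.
interface FalsifierReport {
  mode: Mode;
  total: number;
  passes: number;
  fails: number;
  oneShotRate: number;
  annotations: AnnotationResult[];
}

function buildReport(mode: Mode, annotations: AnnotationResult[]): FalsifierReport {
  const total = annotations.length;
  const passes = annotations.filter((a) => a.pass).length;
  return {
    mode,
    total,
    passes,
    fails: total - passes,
    // Fraction of annotations that matched their baseline; 0 when there is nothing to grade.
    oneShotRate: total === 0 ? 0 : passes / total,
    annotations,
  };
}

// Non-zero on any failure, except in record mode, which only regenerates baselines.
function exitCodeFor(report: FalsifierReport): number {
  if (report.mode === "record") return 0;
  return report.fails > 0 ? 1 : 0;
}
```

A caller would print `JSON.stringify(buildReport(...))` to stdout and pass `exitCodeFor(...)` to the process exit.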

Task A coordination

Task A's schema PR has not landed at the time of this PR. To keep this PR shippable without blocking, I added only the additive optional componentStyles field. ANNOTATION_SCHEMA_VERSION is unchanged. When Task A's styleSource PR lands, the two additions stack cleanly (both optional). Per the PM's note, this PR also owns the MCP tool description text on query.bySource; Task A owns relay schema definitions for manifest.query / resolve. No file overlap.

What this harness is and is not

It is the measurement instrument the sprint thesis requires — "did the agent's edit produce the right pixels". It is not an agent integration; that's wired post-sprint via --mode=measure. The self-test today exercises the rig itself end-to-end and reports oneShotRate = 1.0, proving the mechanism is sound before any agent metric is gathered against it.

Test plan

  • nx run domscribe-core:test — 100% (existing 137 tests + schema-only changes)
  • nx run domscribe-runtime:test — 544 tests pass (12 new StyleCapturer + 4 new ContextCapturer integration tests)
  • nx run domscribe-relay:test — 306 tests pass (1 updated query.bySource expectation + 1 new componentStyles flow test)
  • tsc --noEmit clean across core / runtime / relay / both fixture apps / falsifier
  • eslint clean on all new files (the 2 pre-existing lint errors in e2e/fixtures.ts are not from this PR — confirmed on main)
  • pnpm test:falsifier reports oneShotRate=1.0 (10/10 self-test, exit 0)
  • pnpm test:falsifier --mode=measure with deliberately-wrong agent output reports oneShotRate=0.1 (1/10) and exits 1 — proves the harness can fail

Self-verification (per staff-swe contract)

  • Schema field is additive + optional → won't conflict with Task A.
  • Style capture defaults off (captureStyles: false) → no payload-size regression for existing v0.x integrations.
  • Style capture respects existing 4 KB budget → verified by style-capturer.spec.ts "drops custom-property tail entries until payload fits maxBytes".
  • Falsifier baselines are deterministic across runs → animations disabled, viewport/locale/timezone/scale locked, asset filenames stable.
  • Pixel-diff tolerance tightened from 0.5% to 0.1% after a sanity test caught a false positive on two mostly-white-background images.
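
To illustrate the tolerance point, here is a self-contained sketch of a ratio-based pixel check. It is not pixelmatch and not the harness code: it uses exact RGBA equality rather than a perceptual threshold, and `mismatchRatio` and `withinTolerance` are hypothetical names. One plausible reading of the false positive is that on a mostly white page a wrong edit changes only a small fraction of pixels, so a loose ratio can let it through.

```typescript
// Fraction of pixels that differ between two equally sized RGBA buffers.
function mismatchRatio(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("buffers must be RGBA and the same size");
  }
  const pixels = a.length / 4;
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    if (a[i] !== b[i] || a[i + 1] !== b[i + 1] || a[i + 2] !== b[i + 2] || a[i + 3] !== b[i + 3]) {
      differing++;
    }
  }
  return pixels === 0 ? 0 : differing / pixels;
}

// tolerance is a fraction: 0.001 corresponds to the 0.1% figure above.
function withinTolerance(a: Uint8Array, b: Uint8Array, tolerance: number): boolean {
  return mismatchRatio(a, b) <= tolerance;
}
```

For example, 30 changed pixels in a 100×100 image is a 0.3% mismatch: it passes at a 0.5% tolerance and fails at 0.1%.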

🤖 Generated with Claude Code

Domscribe Staff SWE (bot) and others added 2 commits June 7, 2026 06:58
…face (RFC 0001)

Adds the runtime half of RFC 0001's two-tier component-style attribution.

* @domscribe/core: extends RuntimeContextSchema with optional componentStyles
  (additive — schema bump deferred until Task A's build-time styleSource lands).
* @domscribe/runtime: new StyleCapturer reads a ≤32-property computed allowlist
  + resolves --* CSS custom properties from element through ancestors. Wired
  into ContextCapturer behind the domscribe.config.captureStyles flag
  (off by default). Honors the ≤4KB per-element serialization budget.
* @domscribe/relay: extends QueryBySourceResponse and the MCP query.bySource
  tool to surface componentStyles. Tool description nudges agents to call
  this first for styling annotations.

Tests: 16 new (12 StyleCapturer unit + 4 ContextCapturer integration), 544
runtime tests pass total, 306 relay tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pps + harness

Adds the sprint 2734 hard-gate measurement instrument. Two Vite React fixture
apps (Tailwind v3, styled-components v6), ten styling annotations, and a
Playwright + pixelmatch harness that grades agent output against canonical
baseline screenshots.

* styling/tailwind-app, styling/styled-app: pnpm workspaces, deterministic
  Vite output, animations + caret disabled for pixel-diff stability.
* styling/annotations.json: 10 annotations (5 per fixture) covering padding,
  color tokens, border-radius, typography, and flex-gap fixes.
* styling/baselines/: canonical-after PNGs committed (recorded via
  --mode=record).
* styling/scripts/falsifier.ts: three modes (self-test default for CI,
  record, measure --agent-output=<dir>). Emits machine-readable JSON
  with oneShotRate metric to stdout. Exits non-zero on fails.
* package.json: test:falsifier + test:falsifier:record scripts.
* project.json: nx falsifier + falsifier:record targets.

Self-test currently reports 10/10 oneShotRate=1.0, verifying the
mechanism. Real-agent integration is plumbing for a follow-on sprint
(--mode=measure already wired).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Narrator Narrator marked this pull request as ready for review June 7, 2026 13:59
Task A (#49) landed on main while this PR was open and added the same
optional componentStyles field to RuntimeContextSchema plus the
COMPONENT_STYLES_ALLOWLIST constant and a v1→v2 schema version bump in
@domscribe/core. Resolved annotation.ts by taking main's version
(Task A's additions are a functional superset — same shape for
ComponentStylesSchema and componentStyles, plus the version-history
note, the allowlist constant, and ComponentStylesAllowlist type).

All other changes auto-merged cleanly: this PR's StyleCapturer, relay
componentStyles surface, and falsifier harness sit on top of Task A's
build-time style-source attribution.

Note for follow-up: there are now two allowlists living in two places —
COMPONENT_STYLES_ALLOWLIST in @domscribe/core (32 entries from Task A)
and STYLE_CAPTURE_ALLOWLIST in @domscribe/runtime (31 entries from
this PR). They overlap but are not identical. The schema field is
record<string,string> so neither is enforced; this is a documentation/
contract divergence, not a runtime defect. PE/PM should pick one
canonical list in a follow-up.

Post-merge verification (Nx targets on the merge commit):
  - domscribe-core:test     → pass (all suites green, 100% coverage on annotation.ts)
  - domscribe-runtime:test  → pass (548 tests)
  - domscribe-relay:test    → pass
  - domscribe-transform:test → pass
  - typecheck across all 4 → clean
  - lint across all 4      → clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
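
The allowlist divergence noted in the merge commit above could be pinned down with a small comparison helper, for example inside a unit test that imports both constants. The sketch below is an assumption about how such a check might look; `diffAllowlists` is a hypothetical name and the sample arrays in the usage note are illustrative, not the real list contents.

```typescript
interface AllowlistDiff {
  onlyInFirst: string[];
  onlyInSecond: string[];
}

// Reports the entries that appear in one list but not the other.
function diffAllowlists(first: readonly string[], second: readonly string[]): AllowlistDiff {
  const firstSet = new Set(first);
  const secondSet = new Set(second);
  return {
    onlyInFirst: first.filter((prop) => !secondSet.has(prop)),
    onlyInSecond: second.filter((prop) => !firstSet.has(prop)),
  };
}
```

Calling `diffAllowlists(["color", "gap", "padding"], ["color", "padding", "margin"])` returns `{ onlyInFirst: ["gap"], onlyInSecond: ["margin"] }`; two empty arrays would mean the lists agree.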
nx-cloud Bot commented Jun 7, 2026

CI Pipeline Execution for commit d668904

| Command | Status | Duration |
| --- | --- | --- |
| nx run domscribe-test-fixtures:integration--web... | ✅ Succeeded | 1m 46s |
| nx run domscribe-test-fixtures:integration--web... | ✅ Succeeded | 1m 36s |
| nx run domscribe-test-fixtures:integration--web... | ✅ Succeeded | 1m 42s |
| nx run domscribe-test-fixtures:integration--web... | ✅ Succeeded | 1m 37s |
| nx run domscribe-test-fixtures:install-fixture-... | ✅ Succeeded | 46s |
| nx run domscribe-test-fixtures:install-fixture-... | ✅ Succeeded | 46s |
| nx run domscribe-test-fixtures:integration--vit... | ✅ Succeeded | 32s |
| nx run domscribe-test-fixtures:integration--vit... | ✅ Succeeded | 34s |
| Additional runs (18) | ✅ Succeeded | ... |



☁️ Nx Cloud last updated this comment at 2026-06-07 14:21:25 UTC

The Nx @nx/js/typescript plugin auto-generates a typecheck target that
runs `tsc --build --emitDeclarationOnly`. The styling fixture apps
(`@domscribe/styling-fixture-tailwind`, `@domscribe/styling-fixture-styled`)
use a Vite app tsconfig with `noEmit: true` and neither `declaration`
nor `composite`, so the auto-generated command fails with TS5069.

The parent `domscribe-test-fixtures` already opts out of typecheck via
`nx:noop`; these nested apps are separate Nx projects and didn't inherit
that. Adding a minimal `project.json` for each that overrides typecheck
to `tsc --noEmit` (which the PR author already verified is clean)
restores the CI green light without weakening type safety.

Verified locally:
  npx nx run-many -t lint test build typecheck --exclude domscribe-test-fixtures
  → Successfully ran targets lint, test, build, typecheck for 14 projects

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Narrator Narrator merged commit a171724 into main Jun 7, 2026
24 checks passed
@Narrator Narrator deleted the feat/sprint-2734-styles-runtime-and-harness branch June 7, 2026 14:23