feat(sprint-2734): runtime StyleCapturer + componentStyles surface + falsifier harness (RFC 0001 Task B) #50
Merged
…face (RFC 0001)

Adds the runtime half of RFC 0001's two-tier component-style attribution.

* @domscribe/core: extends RuntimeContextSchema with optional componentStyles (additive; schema bump deferred until Task A's build-time styleSource lands).
* @domscribe/runtime: new StyleCapturer reads a ≤32-property computed allowlist and resolves --* CSS custom properties from the element up through its ancestors. Wired into ContextCapturer behind the domscribe.config.captureStyles flag (off by default). Honors the ≤4KB per-element serialization budget.
* @domscribe/relay: extends QueryBySourceResponse and the MCP query.bySource tool to surface componentStyles. The tool description nudges agents to call this first for styling annotations.

Tests: 16 new (12 StyleCapturer unit + 4 ContextCapturer integration); 544 runtime tests pass in total, 306 relay tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
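The capture step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's StyleCapturer: the `StyleNode` shape, the `captureStyles` and `fitToBudget` names, and the five-entry allowlist are all hypothetical stand-ins, and a plain object replaces the DOM so the sketch runs anywhere. It assumes the nearest declaration of a custom property wins and that the budget is enforced by dropping custom properties from the tail, as the test name quoted in the PR description suggests.

```typescript
// Hypothetical stand-in for a DOM element; a real capturer would read
// getComputedStyle(el) instead of these plain records.
interface StyleNode {
  computed: Record<string, string>; // computed CSS values for this node
  customProps: Record<string, string>; // --* properties declared on this node
  parent?: StyleNode;
}

interface CapturedStyles {
  computed: Record<string, string>;
  customProperties: Record<string, string>;
}

// Illustrative subset only; the PR describes an allowlist of up to 32 entries.
const EXAMPLE_ALLOWLIST: readonly string[] = [
  "padding",
  "color",
  "border-radius",
  "font-size",
  "gap",
];

function captureStyles(
  node: StyleNode,
  allowlist: readonly string[] = EXAMPLE_ALLOWLIST
): CapturedStyles {
  // Keep only allowlisted computed properties.
  const computed: Record<string, string> = {};
  for (const prop of allowlist) {
    const value = node.computed[prop];
    if (value !== undefined) computed[prop] = value;
  }
  // Walk from the element up through its ancestors; the nearest declaration wins.
  const customProperties: Record<string, string> = {};
  for (let cur: StyleNode | undefined = node; cur; cur = cur.parent) {
    for (const name of Object.keys(cur.customProps)) {
      if (name.slice(0, 2) === "--" && !(name in customProperties)) {
        customProperties[name] = cur.customProps[name];
      }
    }
  }
  return { computed, customProperties };
}

// Size of the JSON serialization in UTF-8 bytes.
function byteSize(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Drop custom properties from the tail until the payload fits the budget.
function fitToBudget(styles: CapturedStyles, maxBytes = 4096): CapturedStyles {
  const names = Object.keys(styles.customProperties);
  let kept = names.length;
  let result = styles;
  while (byteSize(result) > maxBytes && kept > 0) {
    kept -= 1;
    const trimmed: Record<string, string> = {};
    for (const name of names.slice(0, kept)) {
      trimmed[name] = styles.customProperties[name];
    }
    result = { computed: styles.computed, customProperties: trimmed };
  }
  return result;
}

const root: StyleNode = { computed: {}, customProps: { "--brand": "#00f", "--space": "8px" } };
const card: StyleNode = {
  computed: { padding: "8px", color: "rgb(0, 0, 255)", display: "flex" },
  customProps: { "--space": "4px" },
  parent: root,
};
console.log(JSON.stringify(fitToBudget(captureStyles(card))));
```

Because ancestors are visited after the element itself, properties declared far from the element land at the tail and are the first to be dropped under this trimming order. That ordering is a design choice of the sketch, not something the commit states.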
…pps + harness

Adds the sprint 2734 hard-gate measurement instrument: two Vite React fixture apps (Tailwind v3, styled-components v6), ten styling annotations, and a Playwright + pixelmatch harness that grades agent output against canonical baseline screenshots.

* styling/tailwind-app, styling/styled-app: pnpm workspaces, deterministic Vite output, animations + caret disabled for pixel-diff stability.
* styling/annotations.json: 10 annotations (5 per fixture) covering padding, color tokens, border-radius, typography, and flex-gap fixes.
* styling/baselines/: canonical-after PNGs committed (recorded via --mode=record).
* styling/scripts/falsifier.ts: three modes (self-test default for CI, record, measure --agent-output=<dir>). Emits machine-readable JSON with a oneShotRate metric to stdout. Exits non-zero on failures.
* package.json: test:falsifier + test:falsifier:record scripts.
* project.json: nx falsifier + falsifier:record targets.

Self-test currently reports 10/10, oneShotRate=1.0, verifying the mechanism. Real-agent integration is plumbing for a follow-on sprint (--mode=measure is already wired).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
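For readers who want the report arithmetic spelled out, here is a rough sketch of how a summary with the fields named in this PR (mode, total, passes, fails, oneShotRate, annotations) could be assembled and turned into an exit code. The `AnnotationResult` shape and the `summarize` and `exitCodeFor` names are assumptions made for illustration; the actual falsifier.ts may structure this differently. The exit rule follows the PR description: non-zero on any failure, except in record mode.

```typescript
type FalsifierMode = "self-test" | "record" | "measure";

// Hypothetical per-annotation entry; the PR only shows `annotations: [...]`.
interface AnnotationResult {
  id: string;
  pass: boolean;
}

interface FalsifierReport {
  mode: FalsifierMode;
  total: number;
  passes: number;
  fails: number;
  oneShotRate: number;
  annotations: AnnotationResult[];
}

function summarize(mode: FalsifierMode, annotations: AnnotationResult[]): FalsifierReport {
  const total = annotations.length;
  const passes = annotations.filter((a) => a.pass).length;
  const fails = total - passes;
  // Fraction of annotations whose screenshot matched the baseline.
  const oneShotRate = total === 0 ? 0 : passes / total;
  return { mode, total, passes, fails, oneShotRate, annotations };
}

// Non-zero on any failure, except in record mode.
function exitCodeFor(report: FalsifierReport): number {
  return report.mode !== "record" && report.fails > 0 ? 1 : 0;
}

// Ten annotations, only the first passing: oneShotRate is 0.1 and the exit code is 1.
const sample: AnnotationResult[] = [];
for (let i = 0; i < 10; i++) {
  sample.push({ id: "annotation-" + i, pass: i === 0 });
}
const report = summarize("measure", sample);
console.log(JSON.stringify(report));
console.log("exit code:", exitCodeFor(report));
```

Keeping the exit-code rule separate from the summary makes the "harness can fail" check in the test plan easy to assert without spawning a process.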
Task A (#49) landed on main while this PR was open and added the same optional componentStyles field to RuntimeContextSchema, plus the COMPONENT_STYLES_ALLOWLIST constant and a v1→v2 schema version bump in @domscribe/core.

Resolved annotation.ts by taking main's version (Task A's additions are a functional superset: same shape for ComponentStylesSchema and componentStyles, plus the version-history note, the allowlist constant, and the ComponentStylesAllowlist type). All other changes auto-merged cleanly: this PR's StyleCapturer, relay componentStyles surface, and falsifier harness sit on top of Task A's build-time style-source attribution.

Note for follow-up: there are now two allowlists living in two places: COMPONENT_STYLES_ALLOWLIST in @domscribe/core (32 entries from Task A) and STYLE_CAPTURE_ALLOWLIST in @domscribe/runtime (31 entries from this PR). They overlap but are not identical. The schema field is record<string,string>, so neither is enforced; this is a documentation/contract divergence, not a runtime defect. PE/PM should pick one canonical list in a follow-up.

Post-merge verification (Nx targets on the merge commit):
- domscribe-core:test → pass (all suites green, 100% coverage on annotation.ts)
- domscribe-runtime:test → pass (548 tests)
- domscribe-relay:test → pass
- domscribe-transform:test → pass
- typecheck across all 4 → clean
- lint across all 4 → clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
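One way the follow-up could make the divergence between the two allowlists concrete is a small diff helper like the sketch below. The `diffAllowlists` name and the sample entries are hypothetical; the real contents of COMPONENT_STYLES_ALLOWLIST and STYLE_CAPTURE_ALLOWLIST are not shown in this PR, so short placeholder lists are used instead.

```typescript
interface AllowlistDiff {
  shared: string[];
  onlyInFirst: string[];
  onlyInSecond: string[];
}

// Compare two property allowlists and report where they disagree.
function diffAllowlists(first: readonly string[], second: readonly string[]): AllowlistDiff {
  return {
    shared: first.filter((prop) => second.indexOf(prop) !== -1),
    onlyInFirst: first.filter((prop) => second.indexOf(prop) === -1),
    onlyInSecond: second.filter((prop) => first.indexOf(prop) === -1),
  };
}

// Placeholder lists for illustration only; not the real allowlist contents.
const coreList = ["padding", "color", "gap"];
const runtimeList = ["padding", "color", "margin"];
console.log(JSON.stringify(diffAllowlists(coreList, runtimeList)));
```

A check like this could run in a unit test that imports both constants and fails while either `onlyInFirst` or `onlyInSecond` is non-empty, until one canonical list is chosen.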
The Nx @nx/js/typescript plugin auto-generates a typecheck target that runs `tsc --build --emitDeclarationOnly`. The styling fixture apps (`@domscribe/styling-fixture-tailwind`, `@domscribe/styling-fixture-styled`) use a Vite app tsconfig with `noEmit: true` and neither `declaration` nor `composite`, so the auto-generated command fails with TS5069.

The parent `domscribe-test-fixtures` already opts out of typecheck via `nx:noop`; these nested apps are separate Nx projects and didn't inherit that. Adding a minimal `project.json` for each that overrides typecheck to `tsc --noEmit` (which the PR author already verified is clean) restores the CI green light without weakening type safety.

Verified locally:
  npx nx run-many -t lint test build typecheck --exclude domscribe-test-fixtures
  → Successfully ran targets lint, test, build, typecheck for 14 projects

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Jun 8, 2026
Summary
Sprint 2734 Task B — ships the runtime half of RFC 0001's two-tier component-style attribution and the falsifier measurement instrument that gates sprint close. PE marked the test-fixtures harness as a hard sprint-close gate; this PR includes it.
What's in here
Runtime + relay + core (RFC 0001 runtime half)
- `@domscribe/core`: extends `RuntimeContextSchema` with optional `componentStyles` (additive; no schema bump until Task A's build-time `styleSource` lands).
- `@domscribe/runtime`: new `StyleCapturer` reads a ≤32-property computed-CSS allowlist and resolves `--*` custom properties from the element up through its ancestors. Wired into `ContextCapturer` behind the `domscribe.config.captureStyles` flag (off by default; preserves v0.x payload size). Honors the existing ≤4 KB per-element serialization budget.
- `@domscribe/relay`: extends the `query.bySource` response + the MCP tool surface with `componentStyles`. Tool description rewritten to nudge agents to call this first for styling annotations and to prefer token names over raw values when resolved.
Falsifier harness (HARD SPRINT GATE)
- `packages/domscribe-test-fixtures/styling/tailwind-app/`: Vite React + Tailwind v3 fixture.
- `packages/domscribe-test-fixtures/styling/styled-app/`: Vite React + styled-components v6 fixture.
- `styling/annotations.json`: 10 annotations across both fixtures: padding, color tokens, border-radius, typography, flex-gap.
- `styling/baselines/`: canonical-after PNGs (committed; recorded via `--mode=record`).
- `styling/scripts/falsifier.ts`: Playwright + pixelmatch harness with three modes:
  - `--mode=self-test` (CI default): asserts the mechanism works end-to-end.
  - `--mode=record`: regenerate baselines after editing an `/after` route.
  - `--mode=measure --agent-output=<dir>`: grade an external agent's screenshots.
- `package.json` script: `test:falsifier`. Nx target: `falsifier`.
Emits one JSON object to stdout: `{ mode, total, passes, fails, oneShotRate, annotations: [...] }`. Exits non-zero on any failure (except `record` mode).
Task A coordination
Task A's schema PR has not landed at the time of this PR. To keep this PR shippable without blocking, I added only the additive optional `componentStyles` field. `ANNOTATION_SCHEMA_VERSION` is unchanged. When Task A's `styleSource` PR lands, the two additions stack cleanly (both optional). Per the PM's note, this PR also owns the MCP tool description text on `query.bySource`; Task A owns relay schema definitions for `manifest.query/resolve`. No file overlap.
What this harness is and is not
It is the measurement instrument the sprint thesis requires: "did the agent's edit produce the right pixels?" It is not an agent integration; that's wired post-sprint via `--mode=measure`. The self-test today exercises the rig itself end-to-end and reports `oneShotRate = 1.0`, proving the mechanism is sound before any agent metric is gathered against it.
Test plan
- `nx run domscribe-core:test`: 100% (existing 137 tests + schema-only changes)
- `nx run domscribe-runtime:test`: 544 tests pass (12 new StyleCapturer + 4 new ContextCapturer integration tests)
- `nx run domscribe-relay:test`: 306 tests pass (1 updated query.bySource expectation + 1 new `componentStyles` flow test)
- `tsc --noEmit` clean across core / runtime / relay / both fixture apps / falsifier
- `eslint` clean on all new files (the 2 pre-existing lint errors in `e2e/fixtures.ts` are not from this PR; confirmed on `main`)
- `pnpm test:falsifier` reports `oneShotRate=1.0` (10/10 self-test, exit 0)
- `pnpm test:falsifier --mode=measure` with deliberately-wrong agent output reports `oneShotRate=0.1` (1/10) and exits 1, which proves the harness can fail
Self-verification (per staff-swe contract)
- captureStyles: false) → no payload-size regression for existing v0.x integrations.
- style-capturer.spec.ts "drops custom-property tail entries until payload fits maxBytes".
🤖 Generated with Claude Code