patchorbit · Narrator · Jun 8, 2026
diff --git a/docs/sprints/3071-rfc-0001-baseline.harness-self-test.json b/docs/sprints/3071-rfc-0001-baseline.harness-self-test.json
@@ -0,0 +1,83 @@
+
+> @domscribe/test-fixtures@0.0.1 test:falsifier /tmp/wt-sprint-3071-task-a/packages/domscribe-test-fixtures
+> tsx styling/scripts/falsifier.ts
+
+{
+  "mode": "self-test",
+  "total": 10,
+  "passes": 10,
+  "fails": 0,
+  "oneShotRate": 1,
+  "annotations": [
+    {
+      "id": "A001",
+      "fixture": "tailwind",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A002",
+      "fixture": "tailwind",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A003",
+      "fixture": "tailwind",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A004",
+      "fixture": "tailwind",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A005",
+      "fixture": "tailwind",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A101",
+      "fixture": "styled",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A102",
+      "fixture": "styled",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A103",
+      "fixture": "styled",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A104",
+      "fixture": "styled",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    },
+    {
+      "id": "A105",
+      "fixture": "styled",
+      "passed": true,
+      "pixelDiffRatio": 0,
+      "diffPixels": 0
+    }
+  ]
+}
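
Note that the committed `.harness-self-test.json` starts with the pnpm script banner (the two `>` lines) before the JSON object, so a strict `JSON.parse` of the whole file would throw. A minimal sketch of how a consumer might read it anyway, assuming the report is the span from the first `{` to the last `}` and that `oneShotRate` is `passes / total`. `parseHarnessReport`, `isConsistent`, and the two interfaces are hypothetical names; the fields mirror the output above.

```typescript
// Hypothetical types mirroring the fields visible in the self-test output.
interface AnnotationResult {
  id: string;
  fixture: string;
  passed: boolean;
  pixelDiffRatio: number;
  diffPixels: number;
}

interface HarnessReport {
  mode: string;
  total: number;
  passes: number;
  fails: number;
  oneShotRate: number;
  annotations: AnnotationResult[];
}

// Skips any non-JSON preamble (for example the pnpm banner) and parses the object.
function parseHarnessReport(raw: string): HarnessReport {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("no JSON object found in harness output");
  }
  return JSON.parse(raw.slice(start, end + 1)) as HarnessReport;
}

// Cross-checks the summary counters against the per-annotation rows.
// Assumption: oneShotRate is passes / total (0 when there are no annotations).
function isConsistent(report: HarnessReport): boolean {
  const passed = report.annotations.filter((a) => a.passed).length;
  const expectedRate = report.total === 0 ? 0 : passed / report.total;
  return (
    report.total === report.annotations.length &&
    report.passes === passed &&
    report.fails === report.total - passed &&
    Math.abs(report.oneShotRate - expectedRate) < 1e-9
  );
}
```

Redirecting only the script's stdout (rather than the full pnpm output) would avoid the banner at the source; the sketch above is just a tolerant reader for the file as committed.
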
diff --git a/docs/sprints/3071-rfc-0001-baseline.md b/docs/sprints/3071-rfc-0001-baseline.md
@@ -0,0 +1,100 @@
+# Sprint 3071 — RFC 0001 baseline + positioning verdict
+
+**Author:** Staff SWE (sprint run 3071, issue #51)
+**Date:** 2026-06-08
+**Decides:** the positioning language for `verify_after_edit` (RFC 0002).
+**Does not decide:** whether to ship `verify_after_edit`. That bet is made by the DOP memo and RFC 0002; this doc only settles how it is framed.
+
+---
+
+## TL;DR
+
+| Quantity                                                    | Value                                   | Source                                          |
+| ----------------------------------------------------------- | --------------------------------------- | ----------------------------------------------- |
+| RFC 0001 falsifier (≥70% agent one-shot styling completion) | **unmeasured**                          | no agent-integration harness exists on `main`   |
+| RFC 0001 mechanism self-test                                | **10/10 (100%), 0 pixel diff**          | `styling/scripts/falsifier.ts --mode=self-test` |
+| Positioning verdict                                         | **self-correction layer (<85% branch)** | conservative default in absence of measurement  |
+| Slack alert (≥85% trigger)                                  | **not posted**                          | threshold neither met nor measurable            |
+
+The lift of the comparator into `@domscribe/verify` (this PR, Task A3) is independently validated by the self-test: the harness re-imports the comparator and continues to grade all 10 baselines at 0 pixel diff.
+
+## What the harness can measure today
+
+The RFC 0001 falsifier harness (`packages/domscribe-test-fixtures/styling/scripts/falsifier.ts`) supports three modes:
+
+1. **`self-test`** — builds the Tailwind and styled-components fixture apps, screenshots each annotation's `afterRoute`, and diffs against the committed baseline. Expected pass rate is **100% by design** — this is the harness's own correctness check, not a measurement of agent capability. The README is explicit:
+
+   > It does not invoke an agent. The agent-integration loop is built on top of this — see `--mode=measure`.
+
+2. **`record`** — re-captures the baseline PNGs from the canonical `/after` routes.
+
+3. **`measure --agent-output=<dir>`** — production grading: reads one screenshot per annotation from an external directory (produced by an agent-integration harness) and diffs against the baseline. **This is the mode that would actually answer "what is the agent's one-shot styling completion rate?"**
+
+## What is missing
+
+The agent-integration loop required to run `--mode=measure` does **not** exist on `main`. Specifically, there is no harness that:
+
+- Reads each annotation from `styling/annotations.json`,
+- Drives an agent (Claude / Codex / similar) through the edit using the intent + source-file context,
+- Boots the fixture from the post-edit source,
+- Screenshots the rendered element into a per-annotation PNG,
+- Hands the directory to `falsifier.ts --mode=measure`.
+
+Until that loop exists, the inherited RFC 0001 falsifier (≥70% one-shot agent styling completion by sprint 2734+6) is **unmeasured**. The self-test pass rate is structurally **not** a substitute — the self-test screenshots the canonical-after route, not an agent's edit, so it cannot fall below 100% no matter how poorly an agent would perform.
+
+## Self-test result (mechanism-only)
+
+```
+mode=self-test, total=10, passes=10, fails=0, oneShotRate=1.0
+all annotations: pixelDiffRatio=0, diffPixels=0
+```
+
+Raw JSON: [`3071-rfc-0001-baseline.harness-self-test.json`](./3071-rfc-0001-baseline.harness-self-test.json).
+
+The 100% pass rate means:
+
+- The Vite build for both fixtures is reproducible.
+- Chromium + screenshot capture is locale/font/viewport-deterministic in this CI environment.
+- The lifted comparator in `@domscribe/verify` (this PR) diffs identically to the inline version it replaces — none of the 10 baseline diffs shifted off zero.
+
+The 100% pass rate **does not mean** the agent's one-shot styling completion rate is 100%. That number is unknown.
+
+## Methodology
+
+- **Where:** ephemeral dev sandbox; node v20.19.4; pnpm 9.12.0; playwright 1.58.2 (chrome-headless-shell 1208); locale `en-US`, timezone `UTC`, viewport `800×600`, scale 1, animations disabled (matches the harness defaults).
+- **Source:** worktree at `origin/main@a171724` (RFC 0001 Task B merge), plus the `@domscribe/verify` lift introduced by this PR.
+- **Command:** `pnpm --filter @domscribe/test-fixtures test:falsifier`.
+- **Reproducibility:** the same command on the same commit on a CI runner with the documented Playwright cache returns the same JSON. Re-recording baselines (`--mode=record`) would only be needed if the canonical-after routes or the Chromium build changes.
+
+## Positioning verdict
+
+Per RFC 0002 §Implications-for-PM and issue #51, the baseline gates how `verify_after_edit` is framed:
+
+- **≥85% → trust layer.** Verify catches the long tail; the build is conservative; PM may consider deferring relay registration (Task B) if capacity is tight.
+- **<85% → self-correction layer.** Verify is load-bearing for the value loop; the full build proceeds.
+
+The baseline is unmeasured. The conservative default in the absence of measurement is the **self-correction layer** branch — we cannot justify treating verify as a long-tail polish layer when we have no evidence the short tail is solved. The full build proceeds; the Slack alert (which fires only on ≥85%) is **not** posted.
+
+## What this means for Task B
+
+No change. Task B (runtime `ScreenshotCapturer` + relay `verify_after_edit` MCP tool) ships as planned: soft-recommended in MCP prompts, with no lifecycle gate. The package-level value of `@domscribe/verify` is independent of the agent one-shot rate: the harness already consumes it, and the relay tool will consume it on the same contract.
+
+## Follow-up — agent-integration harness (next sprint)
+
+The cleanest way to retire this measurement gap is to add `--mode=agent` (or a separate driver script under `styling/scripts/`) that:
+
+1. For each annotation in `annotations.json`, spawns the agent under test with a fixed prompt (intent + sourceFile + sourceLine + the merged RFC 0001 `styleSource` + `componentStyles`).
+2. Applies the agent's edit to a scratch copy of the fixture, builds it, screenshots `afterRoute`.
+3. Writes `<id>.png` into a deterministic agent-output directory.
+4. Invokes the existing `--mode=measure` with that directory.
+
+This is the prerequisite for measuring both the inherited RFC 0001 falsifier (≥70% one-shot) **and** the RFC 0002 falsifier (≥60% retry-resolution rate). Sized as a separate sprint task; out of scope for issue #51 (per the issue's "Out of scope" enumeration, which lists agent-side work as a P1 follow-up rather than in-scope).
+
+## References
+
+- [RFC 0001 — Two-tier component-style attribution](../rfcs/0001-component-styles-capture.md)
+- [RFC 0002 — Post-edit verification as an MCP diagnostic tool](../rfcs/0002-post-edit-verify-mcp-tool.md)
+- Issue [#51](https://github.com/patchorbit/domscribe/issues/51), Issue [#52](https://github.com/patchorbit/domscribe/issues/52)
+- PRs [#49](https://github.com/patchorbit/domscribe/pull/49), [#50](https://github.com/patchorbit/domscribe/pull/50) (RFC 0001 Tasks A and B)
+- Harness source: [`packages/domscribe-test-fixtures/styling/scripts/falsifier.ts`](../../packages/domscribe-test-fixtures/styling/scripts/falsifier.ts)
+- Harness README: [`packages/domscribe-test-fixtures/styling/README.md`](../../packages/domscribe-test-fixtures/styling/README.md)
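
The gating rule in the "Positioning verdict" section of the doc above is small enough to state as code. A minimal sketch, assuming the measured agent one-shot rate is passed as a fraction and `null` stands for "unmeasured"; `positioningVerdict`, `Verdict`, and `TRUST_THRESHOLD` are hypothetical names, not identifiers from the repository.

```typescript
type Positioning = "trust layer" | "self-correction layer";

interface Verdict {
  positioning: Positioning;
  postSlackAlert: boolean;
}

// The 85% threshold comes from the sprint doc; the constant name is illustrative.
const TRUST_THRESHOLD = 0.85;

// `agentOneShotRate` is null when no agent-integration measurement exists.
// The self-test rate must not be passed here: it is 100% by design.
function positioningVerdict(agentOneShotRate: number | null): Verdict {
  const measuredAndHigh =
    agentOneShotRate !== null && agentOneShotRate >= TRUST_THRESHOLD;
  return measuredAndHigh
    ? { positioning: "trust layer", postSlackAlert: true }
    : { positioning: "self-correction layer", postSlackAlert: false };
}
```

With today's state (`positioningVerdict(null)`) this yields the self-correction layer and no Slack alert, which matches the verdict recorded in the doc.
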
diff --git a/eslint.config.mjs b/eslint.config.mjs
@@ -43,6 +43,19 @@ export default [
                 'scope:adapter',
               ],
             },
+            {
+              // scope:test consumes the same packages adapters do — it
+              // grades them. Notably, `@domscribe/test-fixtures` now imports
+              // `@domscribe/verify` (scope:infra) so the harness and the
+              // relay verify_after_edit tool share one comparator.
+              sourceTag: 'scope:test',
+              onlyDependOnLibsWithTags: [
+                'scope:core',
+                'scope:infra',
+                'scope:build',
+                'scope:adapter',
+              ],
+            },
           ],
         },
       ],

diff --git a/packages/domscribe-core/src/lib/migrations/annotation-migrations.spec.ts b/packages/domscribe-core/src/lib/migrations/annotation-migrations.spec.ts
@@ -141,4 +141,31 @@ describe('migrateAnnotation', () => {
 
     expect(result.context.runtimeContext).toBeUndefined();
   });
+
+  it('should migrate a v2 annotation up to v3 (additive verifyHistory, no field rewrite)', () => {
+    // Simulates a v2 annotation persisted between RFC 0001 (v1→v2) and
+    // RFC 0002 (v2→v3). The v2 → v3 step is purely additive (verifyHistory
+    // is a new optional field) — pre-existing runtimeContext data must
+    // survive untouched.
+    const raw = buildRawAnnotation({
+      metadata: { schemaVersion: 2 },
+      context: {
+        pageUrl: 'http://localhost:3000',
+        pageTitle: 'Test',
+        viewport: { width: 1920, height: 1080 },
+        userAgent: 'test-agent',
+        runtimeContext: {
+          componentStyles: { computed: { padding: '16px' } },
+        },
+      },
+    });
+
+    const result = migrateAnnotation(raw);
+
+    expect(result.metadata.schemaVersion).toBe(ANNOTATION_SCHEMA_VERSION);
+    expect(result.context.runtimeContext).toEqual({
+      componentStyles: { computed: { padding: '16px' } },
+    });
+    expect(result.context.verifyHistory).toBeUndefined();
+  });
 });
diff --git a/packages/domscribe-core/src/lib/migrations/annotation-migrations.ts b/packages/domscribe-core/src/lib/migrations/annotation-migrations.ts
@@ -31,6 +31,16 @@ const migrationSteps: Record<number, (data: Record<string, unknown>) => void> =
     1: () => {
       // No-op: v1 → v2 is purely additive.
     },
+    /**
+     * v2 → v3: additive only (per RFC 0002).
+     *
+     * v3 adds optional `context.verifyHistory` (an array of `VerifyResult`
+     * records emitted by the `verify_after_edit` MCP tool). The field is
+     * absent on v2 payloads, so no field rewriting is required.
+     */
+    2: () => {
+      // No-op: v2 → v3 is purely additive.
+    },
   };
 
 /**
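
For readers unfamiliar with the pattern, here is a rough sketch of how a step table keyed by source version is typically walked. The body of `migrateAnnotation` is not part of this diff, so the loop below is an assumption about its shape rather than the actual implementation; `migrate`, `steps`, `TARGET_VERSION`, and `RawAnnotation` are hypothetical names.

```typescript
// Hypothetical raw shape: only the version field matters for the walk.
type RawAnnotation = {
  metadata: { schemaVersion: number };
  [key: string]: unknown;
};
type MigrationStep = (data: RawAnnotation) => void;

const TARGET_VERSION = 3;

// Keyed by the version being migrated *from*; both steps are additive no-ops.
const steps: Record<number, MigrationStep> = {
  1: () => {},
  2: () => {},
};

function migrate(data: RawAnnotation): RawAnnotation {
  // Work on a plain JSON copy so the caller's object is left untouched.
  const copy: RawAnnotation = JSON.parse(JSON.stringify(data));
  let version = copy.metadata.schemaVersion;
  while (version < TARGET_VERSION) {
    const step = steps[version];
    if (!step) {
      throw new Error(`no migration step from v${version}`);
    }
    step(copy);
    version += 1;
  }
  copy.metadata.schemaVersion = TARGET_VERSION;
  return copy;
}
```

Because both steps are no-ops, a v2 payload comes out with only its `schemaVersion` changed, which is the behavior the new spec case asserts for `runtimeContext`.
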

diff --git a/packages/domscribe-core/src/lib/types/annotation.spec.ts b/packages/domscribe-core/src/lib/types/annotation.spec.ts
@@ -0,0 +1,117 @@
+/**
+ * Schema tests for RFC 0002 additions to @domscribe/core.
+ *
+ * Covers the additive surface: VerifyResultSchema, AnnotationContext.verifyHistory,
+ * and the v3 schema-version bump. The pre-RFC 0002 annotation shape is exercised
+ * exhaustively in `annotation-migrations.spec.ts` and the wider integration
+ * suites; this spec is scoped to the new fields.
+ */
+
+import { describe, it, expect } from 'vitest';
+import {
+  ANNOTATION_SCHEMA_VERSION,
+  AnnotationContextSchema,
+  VerifyResultSchema,
+  VerifyVerdictSchema,
+} from './annotation.js';
+
+describe('ANNOTATION_SCHEMA_VERSION', () => {
+  it('is at v3 (RFC 0002 — verifyHistory)', () => {
+    expect(ANNOTATION_SCHEMA_VERSION).toBe(3);
+  });
+});
+
+describe('VerifyVerdictSchema', () => {
+  it.each(['match', 'partial', 'no_change', 'regression'] as const)(
+    'accepts %s',
+    (verdict) => {
+      expect(VerifyVerdictSchema.parse(verdict)).toBe(verdict);
+    },
+  );
+
+  it('rejects unknown verdicts', () => {
+    expect(() => VerifyVerdictSchema.parse('ok')).toThrow();
+  });
+});
+
+describe('VerifyResultSchema', () => {
+  it('parses a minimal match result (verdict + timestamp only)', () => {
+    const parsed = VerifyResultSchema.parse({
+      verdict: 'match',
+      timestamp: '2026-06-08T12:00:00.000Z',
+    });
+    expect(parsed.verdict).toBe('match');
+    expect(parsed.componentStylesDelta).toBeUndefined();
+    expect(parsed.screenshotRef).toBeUndefined();
+  });
+
+  it('parses a fully-populated partial result with all delta arrays', () => {
+    const parsed = VerifyResultSchema.parse({
+      verdict: 'partial',
+      timestamp: '2026-06-08T12:00:00.000Z',
+      pixelDiffRatio: 0.012,
+      componentStylesDelta: [
+        { property: 'padding', before: '16px', after: '24px' },
+      ],
+      computedStyleDelta: [
+        { property: 'background-color', before: null, after: 'rgb(0, 0, 0)' },
+      ],
+      boundingRectDelta: [{ field: 'height', before: 32, after: 40 }],
+      screenshotRef: 'blob://relay/ann_x/post-edit-1.png',
+      notes: 'padding matched intent; background-color regressed',
+    });
+    expect(parsed.componentStylesDelta).toHaveLength(1);
+    expect(parsed.boundingRectDelta?.[0]?.field).toBe('height');
+    expect(parsed.screenshotRef).toMatch(/^blob:\/\//);
+  });
+
+  it('rejects out-of-range pixelDiffRatio', () => {
+    expect(() =>
+      VerifyResultSchema.parse({
+        verdict: 'match',
+        timestamp: '2026-06-08T12:00:00.000Z',
+        pixelDiffRatio: 1.5,
+      }),
+    ).toThrow();
+  });
+
+  it('rejects unknown BoundingRectDelta fields', () => {
+    expect(() =>
+      VerifyResultSchema.parse({
+        verdict: 'partial',
+        timestamp: '2026-06-08T12:00:00.000Z',
+        boundingRectDelta: [
+          // 'depth' is not a valid BoundingRectDelta field; runtime rejection is the point
+          { field: 'depth', before: 0, after: 10 },
+        ],
+      }),
+    ).toThrow();
+  });
+});
+
+describe('AnnotationContextSchema.verifyHistory', () => {
+  it('accepts a context without verifyHistory (older clients silently ignore)', () => {
+    const parsed = AnnotationContextSchema.parse({
+      pageUrl: 'http://localhost:3000',
+      pageTitle: 'Test',
+      viewport: { width: 1920, height: 1080 },
+      userAgent: 'test-agent',
+    });
+    expect(parsed.verifyHistory).toBeUndefined();
+  });
+
+  it('accepts an append-only history of VerifyResults', () => {
+    const parsed = AnnotationContextSchema.parse({
+      pageUrl: 'http://localhost:3000',
+      pageTitle: 'Test',
+      viewport: { width: 1920, height: 1080 },
+      userAgent: 'test-agent',
+      verifyHistory: [
+        { verdict: 'partial', timestamp: '2026-06-08T12:00:00.000Z' },
+        { verdict: 'match', timestamp: '2026-06-08T12:00:05.000Z' },
+      ],
+    });
+    expect(parsed.verifyHistory).toHaveLength(2);
+    expect(parsed.verifyHistory?.[1]?.verdict).toBe('match');
+  });
+});
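
The spec above implies how `verifyHistory` is meant to grow: each verification appends a record and earlier entries are left alone. A dependency-free sketch of that append step, under stated assumptions: the four verdict values come from the `it.each` list, the `[0, 1]` range for `pixelDiffRatio` is inferred from the test that rejects `1.5`, and `appendVerifyResult`, `isVerifyVerdict`, and the trimmed interfaces are hypothetical. The real schemas live in `annotation.ts`, which this diff excerpt does not show.

```typescript
// Verdict values taken from the spec's it.each list.
const VERDICTS = ["match", "partial", "no_change", "regression"] as const;
type VerifyVerdict = (typeof VERDICTS)[number];

// Hypothetical, trimmed-down shapes for illustration only.
interface VerifyResult {
  verdict: VerifyVerdict;
  timestamp: string;
  pixelDiffRatio?: number;
}

interface ContextWithHistory {
  verifyHistory?: readonly VerifyResult[];
}

function isVerifyVerdict(value: unknown): value is VerifyVerdict {
  return (
    typeof value === "string" &&
    (VERDICTS as readonly string[]).includes(value)
  );
}

// Returns a new context with `result` appended; earlier entries are never rewritten.
function appendVerifyResult(
  context: ContextWithHistory,
  result: VerifyResult,
): ContextWithHistory {
  if (!isVerifyVerdict(result.verdict)) {
    throw new Error(`unknown verdict: ${String(result.verdict)}`);
  }
  const ratio = result.pixelDiffRatio;
  // Assumed range [0, 1]; the negated form also rejects NaN.
  if (ratio !== undefined && !(ratio >= 0 && ratio <= 1)) {
    throw new RangeError(`pixelDiffRatio out of range: ${ratio}`);
  }
  return {
    ...context,
    verifyHistory: [...(context.verifyHistory ?? []), result],
  };
}
```

Returning a fresh object rather than pushing into the existing array keeps the append-only property easy to reason about: a caller holding the old context never sees its history change.
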