Skip to content

feat: verify_after_edit MCP tool (RFC 0002, sprint 3071 Task B + vendored Task A)#53

Merged
Narrator merged 7 commits into
feat/sprint-2734-styles-runtime-and-harnessfrom
feat/sprint-3071-verify-after-edit
Jun 8, 2026
Merged

feat: verify_after_edit MCP tool (RFC 0002, sprint 3071 Task B + vendored Task A)#53
Narrator merged 7 commits into
feat/sprint-2734-styles-runtime-and-harnessfrom
feat/sprint-3071-verify-after-edit

Conversation

@Narrator
Copy link
Copy Markdown
Member

@Narrator Narrator commented Jun 8, 2026

Summary

Implements RFC 0002 (docs/rfcs/0002-post-edit-verify-mcp-tool.md) — productizes the RFC 0001 visual-snapshot diff harness as a live MCP tool. The agent calls domscribe.verify.afterEdit after annotation.respond and receives a structured VerifyResult (verdict + per-axis deltas) it can reconcile on retry.

Closes #52 (Task B). Also vendors Task A (#51) — see "Coordination" below.

What's in the box

Task A (vendored — see Coordination)

  • @domscribe/core: VerifyResultSchema, VerifyVerdictSchema, ComponentStylesDeltaSchema, BoundingRectDeltaSchema; optional Annotation.verifyHistory: VerifyResult[]; ANNOTATION_SCHEMA_VERSION 1 → 2 with a pure-stamp migration step
  • @domscribe/verify: new package, scope:infra. Pure-TS comparator lifted from packages/domscribe-test-fixtures/styling/scripts/falsifier.ts — three modules (pixel-diff, style-delta, verdict) + a top-level compare() that returns a VerifyResult
  • @domscribe/test-fixtures: first re-consumer — falsifier.ts imports diffPng + MAX_DIFF_RATIO from @domscribe/verify instead of pixelmatch directly

Task B (this sprint)

  • @domscribe/runtime (§B1): new ScreenshotCapturer alongside StyleCapturer. Driver-injected (overlay does the pixel grab), enforces the 200 KB cap from RFC 0002 §B1, returns a relay-blob reference — never inlines bytes into the result. Asserted by a cardinal-rule unit test.
  • @domscribe/relay (§B2):
    • POST /api/v1/annotations/:id/verify — Fastify route
    • AnnotationService.verifyAfterEdit — reads the pre-edit baseline from context.runtimeContext.componentStyles + interaction.boundingRect, runs the @domscribe/verify comparator, appends to verifyHistory
    • RelayHttpClient.verifyAnnotation typed client method
    • domscribe.verify.afterEdit MCP tool — registered as the 13th active tool. The nextStep hint routes match/partial to updateStatus and no_change/regression to retry with the deltas inlined into the hint string.
  • Prompts (§B3, soft-recommend, no lifecycle gate):
    • process-next.prompt lists verify as step 5 with an explicit "NOT a lifecycle gate" callout
    • annotation_respond's description + structured nextStep recommend verify before updateStatus
    • annotation_update_status is unchanged — accepts PROCESSED regardless of whether verify ran (asserted by integration test)

Acceptance checklist (from issue #52)

  • verify_after_edit registered and visible in the MCP tool list (count 12 → 13)†
  • ✅ End-to-end test: annotation → annotation_respondverify_after_edit returns a structured VerifyResult on an RFC 0001 styling fixture (A001 padding case). See packages/domscribe-relay/src/server/routes/v1/annotation-verify.route.spec.ts.
  • ✅ Blob-reference contract — screenshot payload never appears inline in Annotation. Asserted by serialization-size tests in screenshot-capturer.spec.ts, verify-after-edit.tool.spec.ts, and annotation-verify.route.spec.ts (/data:image/, /base64/ negative checks; under-budget length checks).
  • process-next.prompt and annotation_respond description updated; matched by prompts.spec.ts and annotation-respond.tool.spec.ts assertions.
  • tsc and npm test green across the monorepo (nx run-many -t test --exclude domscribe-test-fixtures — 11 projects pass)
  • ✅ Relay startup smoke-tested via createTestServer() in all integration tests including the new route

† PE memo said 13 → 14 because it assumed e8d452d (underscore-grammar) was merged. On this base it's 12 → 13. See Coordination.

Coordination — what reviewers need to know

This PR was authored by st-staff-swe for sprint 3071. The planning chain (DOP memo → RFC 0002 → PM breakdown into issues #51/#52) referenced PRs #49 / #50 (RFC 0001) and commit e8d452d (MCP underscore tool-name grammar) as merged to main. They are not merged — they live on feat/sprint-2734-styles-runtime-and-harness (RFC 0001) and a separate local branch (e8d452d).

To keep the falsifier path (verify_after_edit against an RFC 0001 fixture) testable, this PR is based on feat/sprint-2734-styles-runtime-and-harness and not on main. That feat branch is now pushed to origin so this PR can target it directly — making the visible delta this PR's actual scope, not RFC 0001 + this work bundled together.

Task A (issue #51) is not in flight — no parallel PR. Per issue #52's explicit coordination clause ("if A is still in flight when B is otherwise ready, vendor a local copy and replace in a follow-up PR"), Task A is vendored here. Specifically:

  • The schema bump is 1 → 2, not 2 → 3 as the PE memo predicted. The RFC 0001 work claimed to bump 1 → 2 in commit ea777a0 but inspection shows ANNOTATION_SCHEMA_VERSION is still 1 on the feat branch — the bump never landed. This PR makes the real first bump.
  • The MCP tool name is domscribe.verify.afterEdit (dotted), not domscribe_verify_after_edit (underscored), because e8d452d is not in this base. When the underscore grammar lands, the tool will pick up the alias layer automatically; the canonical key will become VERIFY_AFTER_EDIT: 'domscribe_verify_after_edit'.
  • Tool count is 12 → 13 today; will become 13 → 14 + alias-layer-as-24 once underscore grammar is in.
  • The RFC 0001 baseline doc (Task A1) is deliberately not in this PR. That's the gating measurement — a human should run it and decide positioning (trust-layer ≥85% vs. self-correction <85%).

Posted to $SLACK_ALERT_CHANNEL for human attention:
https://patchorbit.slack.com/archives/C0B6B781V0X/p1780921860527549

Risks self-identified

  1. Pixel-diff axis is inactive in v1. The relay's verifyAfterEdit never sees raw screenshot bytes — it receives only the opaque screenshotRef. The pixel-diff axis of the comparator therefore always returns 0, which means verdicts will be no_change, partial (style or rect changed), or technically never regression until a follow-up wires the overlay's ScreenshotCapturer through a blob-upload endpoint and decodes the referenced bytes server-side for pixel comparison. The integration tests document this explicitly. Acceptable for v1 because the falsifier-trip measurement protocol can drive verify deterministically via known-good post-edit captures.
  2. Blob storage is not yet implemented. screenshotRef is a passthrough today — the relay accepts and stores the ref string but does not own a blob endpoint. RFC 0002 §B1 implies the overlay produces refs and the relay knows how to dereference them; that's a follow-up PR. The schema contract is in place.
  3. Schema bump cadence. Two schemas-bump-only commits in two sprints (1 → 2 here; the next one for verify-screenshot-bytes when blob endpoint lands) flags the @domscribe/protocol stability policy. Both bumps are strictly additive (older clients ignore the new field), but the cadence is worth surfacing.
  4. nx sync cosmetic noise. chore: nx sync is reformatting unrelated tsconfig files. Functionally a no-op apart from @domscribe/relay/tsconfig.lib.json adding a reference to @domscribe/verify.

Test plan

  • nx run-many -t test --exclude domscribe-test-fixtures — 11 projects pass (320+ tests; coverage thresholds green)
  • nx run-many -t typecheck --exclude="domscribe-test-fixtures,@domscribe/styling-fixture-tailwind,@domscribe/styling-fixture-styled" — 13 projects clean (the two excluded fixture apps have a pre-existing emitDeclarationOnly config issue on the base branch, unrelated to this PR)
  • nx run-many -t lint --exclude=… — 13 projects clean
  • Cardinal-rule serialization assertion — verified in 3 places (capturer, MCP tool, route)
  • Manual: publish the RFC 0001 baseline (A1) to inform positioning — see Slack
  • Reviewer: decide merge sequencing — land feat/sprint-2734 first (then rebase this on main), or land this PR as one bundle and reconcile

🤖 Generated with Claude Code

Narrator added 6 commits June 8, 2026 05:31
…0002 Task A vendored)

Adds VerifyResultSchema, ComponentStylesDeltaSchema, BoundingRectDeltaSchema,
and VerifyVerdictSchema to @domscribe/core. Annotation gains an optional
verifyHistory: VerifyResult[] field so older clients ignore it.

ANNOTATION_VERIFY API path added to the API_PATHS constants alongside the
existing annotation routes.

Schema version bumps 1 → 2; migration is a pure stamp (additive only).

Per issue #52 — vendoring Task A pieces because the parallel issue #51 PR
is not in flight. Will reconcile when Task A lands.
…ator (RFC 0002 Task A vendored)

@domscribe/verify is the shared comparator behind verify_after_edit and
the RFC 0001 falsifier harness — one implementation grades both CI and
the live MCP tool.

Pure-TS, Node-friendly, no DOM dep. Three modules:
  - pixel-diff.ts: lifted from styling/scripts/falsifier.ts verbatim
  - style-delta.ts: per-CSS-property and per-rect-axis delta with
    sub-pixel tolerance for boundingRect jitter
  - verdict.ts: three-axis (pixel + style + rect) derivation of
    no_change | match | partial | regression with composeReason helper

@domscribe/test-fixtures becomes the first re-consumer — falsifier.ts now
imports diffPng + MAX_DIFF_RATIO from the package instead of pixelmatch
directly.
…2 §B1)

Adds ScreenshotCapturer alongside StyleCapturer/PropsCapturer/StateCapturer.
Driver-injected (the overlay does the actual pixel grab); the capturer is
the runtime-side contract.

Constraints honored per RFC 0002 §B1:
  - Per-capture payload ≤200 KB (SCREENSHOT_MAX_BYTES, enforced)
  - Returns a relay-blob reference; bytes never inlined into the annotation
  - Default JPEG quality 0.85 to land mid-density UI well under the cap

Cardinal-rule unit test asserts that even with a long screenshotRef and a
maximum byteSize, the serialized CaptureResult stays under 512 bytes.
…2 §B2)

Wires the verify_after_edit endpoint:
  POST /api/v1/annotations/:id/verify

  Request: { postEdit: { componentStyles?, boundingRect?, screenshotRef? } }
  Response: { success, result: VerifyResult, annotationId }

AnnotationService.verifyAfterEdit reads the pre-edit baseline from
context.runtimeContext.componentStyles + interaction.boundingRect, runs
the @domscribe/verify comparator, and appends the VerifyResult to
verifyHistory.

RelayHttpClient.verifyAnnotation typesafely calls the new endpoint.

NO lifecycle gate — annotation_update_status accepts PROCESSED with or
without a verify call. Integration test asserts this.
…(RFC 0002 §B2, §B3)

Registers domscribe.verify.afterEdit as the 13th active MCP tool.
(NOTE: PE memo's '13 → 14' assumed e8d452d underscore grammar was merged;
on this base it is 12 → 13. Will become 13 → 14 + alias layer when
e8d452d lands.)

Tool input: { annotationId, postEditComponentStyles?, postEditBoundingRect?,
screenshotRef? }. Output: structured VerifyResult with a nextStep hint —
match/partial points at updateStatus, no_change/regression points at retry
with the deltas inlined into the hint string.

SOFT-RECOMMENDED, no lifecycle gate:
  - process-next.prompt now lists verify_after_edit as step 5 with an
    explicit 'NOT a lifecycle gate' callout.
  - annotation_respond's description + nextStep recommend verify before
    updateStatus.
  - annotation_update_status accepts PROCESSED with or without verify
    (asserted via integration test in the previous commit).

Cardinal-rule unit test: even with a long screenshotRef, the serialized
tool output stays under 2 KB and never matches /base64/i or /data:image/i.
Output of `nx sync` after introducing @domscribe/verify. Adds the new
package to the root tsconfig references, wires @domscribe/relay's
tsconfig.lib.json to reference verify, refreshes pnpm-lock for the new
workspace dep. Other tsconfig files are reformatted by the syncer but
not functionally changed.
@Narrator Narrator marked this pull request as ready for review June 8, 2026 12:34
@nx-cloud
Copy link
Copy Markdown

nx-cloud Bot commented Jun 8, 2026

View your CI Pipeline Execution ↗ for commit 85b10be

Command Status Duration Result
nx run domscribe-test-fixtures:integration--web... ✅ Succeeded 1m 46s View ↗
nx run domscribe-test-fixtures:integration--web... ✅ Succeeded 1m 45s View ↗
nx run domscribe-test-fixtures:integration--web... ✅ Succeeded 1m 44s View ↗
nx run domscribe-test-fixtures:integration--web... ✅ Succeeded 1m 38s View ↗
nx run domscribe-test-fixtures:install-fixture-... ✅ Succeeded 48s View ↗
nx run domscribe-test-fixtures:install-fixture-... ✅ Succeeded 47s View ↗
nx run domscribe-test-fixtures:integration--vit... ✅ Succeeded 28s View ↗
nx run domscribe-test-fixtures:install-fixture-... ✅ Succeeded 22s View ↗
Additional runs (18) ✅ Succeeded ... View ↗

💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗


☁️ Nx Cloud last updated this comment at 2026-06-08 12:53:16 UTC

The Nx @nx/js/typescript plugin auto-infers `tsc --build --emitDeclarationOnly`
as the typecheck target for any project with a tsconfig.json. The Tailwind
and styled-components fixture apps have `noEmit: true` (they are Vite apps,
not libraries), so the inferred command fails with TS5069.

Override the inferred target with `tsc --noEmit` via the package.json `nx`
field. Type errors are still surfaced; the apps just no longer try to emit
declaration files they don't need.

Pre-existing failure on the base branch (feat/sprint-2734) — surfaced by CI
on this PR's merge gate. Reviewer-pushed fix to unblock the merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Narrator
Copy link
Copy Markdown
Member Author

Narrator commented Jun 8, 2026

Review summary (PR-review run)

Pushed one mechanical fix: ci(test-fixtures): override typecheck target for styling fixture apps.

What I reviewed

  • @domscribe/verify (new package): clean separation into pixel-diff / style-delta / verdict modules; pure-TS comparator; tests cover the three-axis verdict matrix; tolerance constants (PER_PIXEL_THRESHOLD=0.1, MAX_DIFF_RATIO=0.001) lifted bit-identical from the falsifier.
  • @domscribe/runtime ScreenshotCapturer: driver-injected (overlay does the pixel grab), enforces the 200 KB cap, asserts the no-inline-bytes rule.
  • @domscribe/relay HTTP route + service + MCP tool: route wired with Zod schemas; verifyAfterEdit reads pre-edit baseline from runtimeContext.componentStyles and interaction.boundingRect, runs the comparator, appends to verifyHistory; the tool is the 13th active MCP tool.
  • Soft-recommend prompts: process-next.prompt step 5 + annotation_respond.tool description both surface verify between respond and updateStatus, with explicit "NOT a lifecycle gate" callouts; annotation_update_status semantics unchanged (asserted by integration test).
  • Schema bump 1 → 2 in @domscribe/core with a pure no-op migration step (additive verifyHistory field).

Local verification

  • nx run-many -t lint test build typecheck --exclude domscribe-test-fixtures15 projects pass (with my fix; before the fix, the two styling fixture apps failed typecheck).
  • The cardinal "no inlined screenshot bytes" rule is asserted in three places (screenshot-capturer.spec.ts, verify-after-edit.tool.spec.ts, annotation-verify.route.spec.ts) — confirmed.

Observations (non-blocking)

  • deriveVerdict() never emits 'match' — when every axis is within tolerance it returns 'no_change'. This is intentional per the docstring (the design choice is that "edit didn't land" is the only thing every-axis-matching can mean), and the verdict schema retains 'match' as a future-reserved case (e.g. when the pixel-diff axis activates after the blob endpoint lands). The PR's nextStep hint and tests correctly handle the routing. Worth a brief note in RFC 0002 §Decision but not a blocker.
  • Base branch is feat/sprint-2734-styles-runtime-and-harness (RFC 0001) by design, per the PR body's Coordination section. The visible delta in this PR is the actual Task B scope plus the vendored Task A — matches the issue Sprint 3071 Task B — @domscribe/runtime ScreenshotCapturer + @domscribe/relay verify_after_edit MCP tool #52 explicit coordination clause.

The mechanical fix I pushed

@domscribe/styling-fixture-tailwind:typecheck and @domscribe/styling-fixture-styled:typecheck were failing on CI because the Nx @nx/js/typescript plugin auto-infers tsc --build --emitDeclarationOnly, but these are Vite app fixtures with noEmit: true, so the inferred command hits TS5069. The PR body called this out as pre-existing on the base branch; I overrode the target to tsc --noEmit via the nx field in each app's package.json. Two package.json edits, no source changes.

@Narrator Narrator merged commit 37caf27 into feat/sprint-2734-styles-runtime-and-harness Jun 8, 2026
24 checks passed
@Narrator Narrator deleted the feat/sprint-3071-verify-after-edit branch June 8, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant