feat: verify_after_edit MCP tool (RFC 0002, sprint 3071 Task B + vendored Task A)#53
Conversation
…0002 Task A vendored) Adds VerifyResultSchema, ComponentStylesDeltaSchema, BoundingRectDeltaSchema, and VerifyVerdictSchema to @domscribe/core. Annotation gains an optional verifyHistory: VerifyResult[] field so older clients ignore it. ANNOTATION_VERIFY API path added to the API_PATHS constants alongside the existing annotation routes. Schema version bumps 1 → 2; migration is a pure stamp (additive only). Per issue #52 — vendoring Task A pieces because the parallel issue #51 PR is not in flight. Will reconcile when Task A lands.
…ator (RFC 0002 Task A vendored)
@domscribe/verify is the shared comparator behind verify_after_edit and
the RFC 0001 falsifier harness — one implementation grades both CI and
the live MCP tool.
Pure-TS, Node-friendly, no DOM dep. Three modules:
- pixel-diff.ts: lifted from styling/scripts/falsifier.ts verbatim
- style-delta.ts: per-CSS-property and per-rect-axis delta with
sub-pixel tolerance for boundingRect jitter
- verdict.ts: three-axis (pixel + style + rect) derivation of
no_change | match | partial | regression with composeReason helper
@domscribe/test-fixtures becomes the first re-consumer — falsifier.ts now
imports diffPng + MAX_DIFF_RATIO from the package instead of pixelmatch
directly.
…2 §B1) Adds ScreenshotCapturer alongside StyleCapturer/PropsCapturer/StateCapturer. Driver-injected (the overlay does the actual pixel grab); the capturer is the runtime-side contract. Constraints honored per RFC 0002 §B1: - Per-capture payload ≤200 KB (SCREENSHOT_MAX_BYTES, enforced) - Returns a relay-blob reference; bytes never inlined into the annotation - Default JPEG quality 0.85 to land mid-density UI well under the cap Cardinal-rule unit test asserts that even with a long screenshotRef and a maximum byteSize, the serialized CaptureResult stays under 512 bytes.
…2 §B2)
Wires the verify_after_edit endpoint:
POST /api/v1/annotations/:id/verify
Request: { postEdit: { componentStyles?, boundingRect?, screenshotRef? } }
Response: { success, result: VerifyResult, annotationId }
AnnotationService.verifyAfterEdit reads the pre-edit baseline from
context.runtimeContext.componentStyles + interaction.boundingRect, runs
the @domscribe/verify comparator, and appends the VerifyResult to
verifyHistory.
RelayHttpClient.verifyAnnotation typesafely calls the new endpoint.
NO lifecycle gate — annotation_update_status accepts PROCESSED with or
without a verify call. Integration test asserts this.
…(RFC 0002 §B2, §B3) Registers domscribe.verify.afterEdit as the 13th active MCP tool. (NOTE: PE memo's '13 → 14' assumed e8d452d underscore grammar was merged; on this base it is 12 → 13. Will become 13 → 14 + alias layer when e8d452d lands.) Tool input: { annotationId, postEditComponentStyles?, postEditBoundingRect?, screenshotRef? }. Output: structured VerifyResult with a nextStep hint — match/partial points at updateStatus, no_change/regression points at retry with the deltas inlined into the hint string. SOFT-RECOMMENDED, no lifecycle gate: - process-next.prompt now lists verify_after_edit as step 5 with an explicit 'NOT a lifecycle gate' callout. - annotation_respond's description + nextStep recommend verify before updateStatus. - annotation_update_status accepts PROCESSED with or without verify (asserted via integration test in the previous commit). Cardinal-rule unit test: even with a long screenshotRef, the serialized tool output stays under 2 KB and never matches /base64/i or /data:image/i.
Output of `nx sync` after introducing @domscribe/verify. Adds the new package to the root tsconfig references, wires @domscribe/relay's tsconfig.lib.json to reference verify, refreshes pnpm-lock for the new workspace dep. Other tsconfig files are reformatted by the syncer but not functionally changed.
|
View your CI Pipeline Execution ↗ for commit 85b10be
💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗ ☁️ Nx Cloud last updated this comment at |
The Nx @nx/js/typescript plugin auto-infers `tsc --build --emitDeclarationOnly` as the typecheck target for any project with a tsconfig.json. The Tailwind and styled-components fixture apps have `noEmit: true` (they are Vite apps, not libraries), so the inferred command fails with TS5069. Override the inferred target with `tsc --noEmit` via the package.json `nx` field. Type errors are still surfaced; the apps just no longer try to emit declaration files they don't need. Pre-existing failure on the base branch (feat/sprint-2734) — surfaced by CI on this PR's merge gate. Reviewer-pushed fix to unblock the merge. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review summary (PR-review run)Pushed one mechanical fix: What I reviewed
Local verification
Observations (non-blocking)
The mechanical fix I pushed
|
37caf27
into
feat/sprint-2734-styles-runtime-and-harness
Summary
Implements RFC 0002 (
docs/rfcs/0002-post-edit-verify-mcp-tool.md) — productizes the RFC 0001 visual-snapshot diff harness as a live MCP tool. The agent callsdomscribe.verify.afterEditafterannotation.respondand receives a structuredVerifyResult(verdict + per-axis deltas) it can reconcile on retry.Closes #52 (Task B). Also vendors Task A (#51) — see "Coordination" below.
What's in the box
Task A (vendored — see Coordination)
@domscribe/core:VerifyResultSchema,VerifyVerdictSchema,ComponentStylesDeltaSchema,BoundingRectDeltaSchema; optionalAnnotation.verifyHistory: VerifyResult[];ANNOTATION_SCHEMA_VERSION1 → 2 with a pure-stamp migration step@domscribe/verify: new package, scope:infra. Pure-TS comparator lifted frompackages/domscribe-test-fixtures/styling/scripts/falsifier.ts— three modules (pixel-diff,style-delta,verdict) + a top-levelcompare()that returns aVerifyResult@domscribe/test-fixtures: first re-consumer —falsifier.tsimportsdiffPng+MAX_DIFF_RATIOfrom@domscribe/verifyinstead ofpixelmatchdirectlyTask B (this sprint)
@domscribe/runtime(§B1): newScreenshotCaptureralongsideStyleCapturer. Driver-injected (overlay does the pixel grab), enforces the 200 KB cap from RFC 0002 §B1, returns a relay-blob reference — never inlines bytes into the result. Asserted by a cardinal-rule unit test.@domscribe/relay(§B2):POST /api/v1/annotations/:id/verify— Fastify routeAnnotationService.verifyAfterEdit— reads the pre-edit baseline fromcontext.runtimeContext.componentStyles+interaction.boundingRect, runs the@domscribe/verifycomparator, appends toverifyHistoryRelayHttpClient.verifyAnnotationtyped client methoddomscribe.verify.afterEditMCP tool — registered as the 13th active tool. ThenextStephint routesmatch/partialtoupdateStatusandno_change/regressionto retry with the deltas inlined into the hint string.process-next.promptlists verify as step 5 with an explicit "NOT a lifecycle gate" calloutannotation_respond's description + structurednextSteprecommend verify beforeupdateStatusannotation_update_statusis unchanged — acceptsPROCESSEDregardless of whether verify ran (asserted by integration test)Acceptance checklist (from issue #52)
verify_after_editregistered and visible in the MCP tool list (count 12 → 13)†annotation_respond→verify_after_editreturns a structuredVerifyResulton an RFC 0001 styling fixture (A001 padding case). Seepackages/domscribe-relay/src/server/routes/v1/annotation-verify.route.spec.ts.Annotation. Asserted by serialization-size tests inscreenshot-capturer.spec.ts,verify-after-edit.tool.spec.ts, andannotation-verify.route.spec.ts(/data:image/,/base64/negative checks; under-budget length checks).process-next.promptandannotation_responddescription updated; matched byprompts.spec.tsandannotation-respond.tool.spec.tsassertions.tscandnpm testgreen across the monorepo (nx run-many -t test --exclude domscribe-test-fixtures— 11 projects pass)createTestServer()in all integration tests including the new route† PE memo said 13 → 14 because it assumed
e8d452d(underscore-grammar) was merged. On this base it's 12 → 13. See Coordination.Coordination — what reviewers need to know
This PR was authored by
st-staff-swefor sprint 3071. The planning chain (DOP memo → RFC 0002 → PM breakdown into issues #51/#52) referenced PRs #49 / #50 (RFC 0001) and commite8d452d(MCP underscore tool-name grammar) as merged tomain. They are not merged — they live onfeat/sprint-2734-styles-runtime-and-harness(RFC 0001) and a separate local branch (e8d452d).To keep the falsifier path (
verify_after_editagainst an RFC 0001 fixture) testable, this PR is based onfeat/sprint-2734-styles-runtime-and-harnessand not onmain. That feat branch is now pushed to origin so this PR can target it directly — making the visible delta this PR's actual scope, not RFC 0001 + this work bundled together.Task A (issue #51) is not in flight — no parallel PR. Per issue #52's explicit coordination clause ("if A is still in flight when B is otherwise ready, vendor a local copy and replace in a follow-up PR"), Task A is vendored here. Specifically:
ea777a0but inspection showsANNOTATION_SCHEMA_VERSIONis still1on the feat branch — the bump never landed. This PR makes the real first bump.domscribe.verify.afterEdit(dotted), notdomscribe_verify_after_edit(underscored), becausee8d452dis not in this base. When the underscore grammar lands, the tool will pick up the alias layer automatically; the canonical key will becomeVERIFY_AFTER_EDIT: 'domscribe_verify_after_edit'.Posted to
$SLACK_ALERT_CHANNELfor human attention:https://patchorbit.slack.com/archives/C0B6B781V0X/p1780921860527549
Risks self-identified
verifyAfterEditnever sees raw screenshot bytes — it receives only the opaquescreenshotRef. The pixel-diff axis of the comparator therefore always returns 0, which means verdicts will beno_change,partial(style or rect changed), or technically neverregressionuntil a follow-up wires the overlay'sScreenshotCapturerthrough a blob-upload endpoint and decodes the referenced bytes server-side for pixel comparison. The integration tests document this explicitly. Acceptable for v1 because the falsifier-trip measurement protocol can drive verify deterministically via known-good post-edit captures.screenshotRefis a passthrough today — the relay accepts and stores the ref string but does not own a blob endpoint. RFC 0002 §B1 implies the overlay produces refs and the relay knows how to dereference them; that's a follow-up PR. The schema contract is in place.@domscribe/protocolstability policy. Both bumps are strictly additive (older clients ignore the new field), but the cadence is worth surfacing.nx synccosmetic noise.chore: nx syncis reformatting unrelated tsconfig files. Functionally a no-op apart from@domscribe/relay/tsconfig.lib.jsonadding a reference to@domscribe/verify.Test plan
nx run-many -t test --exclude domscribe-test-fixtures— 11 projects pass (320+ tests; coverage thresholds green)nx run-many -t typecheck --exclude="domscribe-test-fixtures,@domscribe/styling-fixture-tailwind,@domscribe/styling-fixture-styled"— 13 projects clean (the two excluded fixture apps have a pre-existingemitDeclarationOnlyconfig issue on the base branch, unrelated to this PR)nx run-many -t lint --exclude=…— 13 projects clean🤖 Generated with Claude Code