Sprint 3071 Task A — RFC 0001 baseline + @domscribe/core schema (v3) + @domscribe/verify scaffold #54
Merged
A1) Harness self-test passes 10/10 at 0 pixel diff, which proves the measurement mechanism is sound. Agent one-shot styling completion is unmeasured: no agent-integration harness exists on main to drive the falsifier's `--mode=measure` path.

Conservative default applies: verify is positioned as a self-correction layer (the <85% branch); no Slack alert (threshold neither met nor measurable). Documents the agent-integration harness as the follow-up sprint task that retires this measurement gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
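The positioning rule described in this commit message can be sketched as a small decision function. This is an illustration only: `decidePositioning`, the `Positioning` shape, and the branch labels are hypothetical names, not code from the repository. The only facts taken from the message are the 85% threshold, the alert firing only at or above it, and the conservative default when the rate is unmeasured.

```typescript
// Sketch only: names and return shape are hypothetical, not the repository's API.
type Branch = 'below-85' | 'at-or-above-85';

interface Positioning {
  branch: Branch;
  slackAlert: boolean; // the alert fires only when the measured rate is >= 85%
  measured: boolean;   // false when no agent-integration measurement exists
}

const ONE_SHOT_THRESHOLD = 0.85;

// oneShotRate is a fraction in [0, 1], or null when it could not be measured.
function decidePositioning(oneShotRate: number | null): Positioning {
  if (oneShotRate === null) {
    // Conservative default: treat an unmeasured rate as the <85% branch, no alert.
    return { branch: 'below-85', slackAlert: false, measured: false };
  }
  const atOrAbove = oneShotRate >= ONE_SHOT_THRESHOLD;
  return {
    branch: atOrAbove ? 'at-or-above-85' : 'below-85',
    slackAlert: atOrAbove,
    measured: true,
  };
}
```

Under these assumptions, `decidePositioning(null)` lands on the `below-85` branch without an alert, which matches the outcome recorded for this sprint.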
…SION 2 → 3 (#51 A2)

Adds the canonical shape for the RFC 0002 verify_after_edit MCP tool output:

- VerifyVerdictSchema: match | partial | no_change | regression.
- VerifyResultSchema: verdict + timestamp + structured deltas (componentStylesDelta, computedStyleDelta, boundingRectDelta) + optional pixelDiffRatio, screenshotRef (relay blob ref, never inline bytes, which preserves the 4 KB-per-element budget), and notes.
- AnnotationContextSchema.verifyHistory: optional VerifyResult[]; older clients silently ignore it.
- ANNOTATION_SCHEMA_VERSION bumped 2 → 3; additive-only v2 → v3 migration step registered.

Schema additions are deliberately structured (not a binary pass/fail): the RFC 0002 retry-resolution falsifier requires actionable deltas for the agent to reason about its own edit. All additions are optional; pre-v3 clients see no behaviour change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
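To make "additive-only" concrete, here is a minimal plain-TypeScript sketch of the shapes and the v2 → v3 step. It is not the actual `@domscribe/core` code: the real package exports schema objects, while this sketch uses hand-written types so it runs without dependencies. The field types (for example `timestamp` as a string, deltas as open records), the `schemaVersion` and `context` property names, and the helper names are all assumptions.

```typescript
// Sketch only: a dependency-free mirror of the shapes named in the commit message.
const VERDICTS = ['match', 'partial', 'no_change', 'regression'] as const;
type VerifyVerdict = (typeof VERDICTS)[number];

interface VerifyResult {
  verdict: VerifyVerdict;
  timestamp: string;                             // assumed: ISO-8601 string
  componentStylesDelta: Record<string, unknown>; // assumed delta representation
  computedStyleDelta: Record<string, unknown>;
  boundingRectDelta: Record<string, unknown>;
  pixelDiffRatio?: number;
  screenshotRef?: string;                        // a blob reference, never inline bytes
  notes?: string;
}

// Hypothetical annotation envelope; property names are assumptions.
interface AnnotationLike {
  schemaVersion: number;
  context: { verifyHistory?: VerifyResult[]; [key: string]: unknown };
}

function isVerifyVerdict(value: unknown): value is VerifyVerdict {
  return typeof value === 'string' && (VERDICTS as readonly string[]).includes(value);
}

// Additive-only migration: bump the version and leave every existing field untouched.
// verifyHistory stays absent until a verify run appends to it.
function migrateV2ToV3(annotation: AnnotationLike): AnnotationLike {
  return { ...annotation, schemaVersion: 3 };
}
```

Because the step only bumps the version number, a v2 document round-trips with its existing context intact, which is what lets older clients ignore `verifyHistory` safely.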
…01 harness (#51 A3)

New package owns the pixel-diff comparator shared between the test-fixtures falsifier (RFC 0001) and the upcoming relay verify_after_edit MCP tool (RFC 0002):

- Lifted verbatim from styling/scripts/falsifier.ts: PER_PIXEL_THRESHOLD, MAX_DIFF_RATIO, loadPng, diff. Numeric defaults unchanged so the committed baselines stay valid.
- compareScreenshots(actual, baseline, { maxDiffRatio? }) wraps the loadPng + diff pair for the common one-shot call site (the RFC 0002 retry round uses a stricter tolerance; opt in per call).
- Pure-TS (pixelmatch + pngjs only); no DOM; consumable from Node CI and the relay runtime.
- Tagged scope:infra; depends only on @domscribe/core. eslint enforce-module-boundaries: scope:test now permitted to consume scope:infra (test-fixtures is the first such consumer).
- @domscribe/test-fixtures swaps inline comparator + pixelmatch/pngjs dev-deps for `@domscribe/verify`.

Falsifier self-test rerun: 10/10 passes, 0 pixel diff. The lift is behavior-preserving.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
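The role of a per-call `maxDiffRatio` override can be illustrated with a dependency-free sketch. This is not the lifted comparator: the real one delegates to pixelmatch and pngjs, whereas the version below swaps in a naive max-channel-delta test on raw RGBA buffers. `RawImage`, `compareRaw`, the result shape, and both numeric constants are hypothetical placeholders, not the repository's values.

```typescript
// Sketch only: a naive stand-in for a pixelmatch-based comparator.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA, 4 bytes per pixel
}

interface CompareOutcome {
  diffPixels: number;
  diffRatio: number;
  pass: boolean;
}

// Placeholder values; the repository's actual defaults are not stated in the PR.
const ASSUMED_PER_PIXEL_THRESHOLD = 0.1;
const ASSUMED_MAX_DIFF_RATIO = 0.01;

function compareRaw(
  actual: RawImage,
  baseline: RawImage,
  opts: { maxDiffRatio?: number } = {},
): CompareOutcome {
  if (actual.width !== baseline.width || actual.height !== baseline.height) {
    throw new Error('Image dimensions differ');
  }
  const maxDiffRatio = opts.maxDiffRatio ?? ASSUMED_MAX_DIFF_RATIO;
  const total = actual.width * actual.height;
  let diffPixels = 0;
  for (let p = 0; p < total; p++) {
    const i = p * 4;
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(actual.data[i + c] - baseline.data[i + c]));
    }
    // A pixel counts as different when its largest channel delta exceeds the threshold.
    if (maxDelta / 255 > ASSUMED_PER_PIXEL_THRESHOLD) diffPixels++;
  }
  const diffRatio = total === 0 ? 0 : diffPixels / total;
  return { diffPixels, diffRatio, pass: diffRatio <= maxDiffRatio };
}

// Helper for building a solid-colour test image.
function solid(width: number, height: number, rgba: [number, number, number, number]): RawImage {
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; i += 4) data.set(rgba, i);
  return { width, height, data };
}
```

A caller that needs a stricter or looser tolerance passes `{ maxDiffRatio }` for that call only, so the module-level default, and therefore any committed baselines that depend on it, stays unchanged.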
View your CI Pipeline Execution ↗ for commit 7d46466
PR-review pass — green to merge. Verified locally (worktree off head
Verified on CI: all 24 checks pass — Review notes:
Squash-merging (PR body lists three logical commits but does not request a specific merge strategy; squash is the default per the review contract). Cannot self-approve since the token belongs to the PR author — using |
Closes #51 — Sprint 3071 Task A (RFC 0001 baseline + core schema + @domscribe/verify scaffold).
Foundation PR for the RFC 0002 verify_after_edit MCP tool. Task B (#52, parallel) builds on this.
What's in this PR
Three logical commits:
- `docs/sprints/3071-rfc-0001-baseline.md` + raw harness JSON. Documents the falsifier self-test result (10/10, 0 pixel diff; the mechanism check passes) and the measurement gap: no agent-integration harness exists yet to drive the falsifier's `--mode=measure` path, so the inherited RFC 0001 agent one-shot rate is unmeasured. Conservative default applies: verify is positioned as a self-correction layer (the <85% branch); the Slack alert (which fires only on ≥85%) is not posted.
- `@domscribe/core` schema additions. `VerifyVerdictSchema`, `VerifyResultSchema` (structured deltas, not binary; the RFC 0002 retry-resolution falsifier requires actionable deltas), `AnnotationContextSchema.verifyHistory` (optional), `ANNOTATION_SCHEMA_VERSION` 2 → 3, additive v2 → v3 migration step. Pre-v3 clients silently ignore the new field.
- `@domscribe/verify` package. New `scope:infra` package. Comparator (`PER_PIXEL_THRESHOLD`, `MAX_DIFF_RATIO`, `loadPng`, `diff`, `compareScreenshots`) lifted verbatim from `styling/scripts/falsifier.ts`. `@domscribe/test-fixtures` is the first re-consumer: it drops the inline comparator + `pixelmatch`/`pngjs` dev-deps and imports from `@domscribe/verify`. ESLint `enforce-module-boundaries` extended so `scope:test` can depend on `scope:infra`.

Acceptance (issue #51)
- `docs/sprints/3071-rfc-0001-baseline.md` committed with measured number, methodology, positioning verdict (<85% / self-correction layer)
- `tsc` + `npm test` green across the monorepo (`pnpm build:all` succeeds for all 15 included projects)
- `@domscribe/test-fixtures` styling tests still pass via the lifted comparator; falsifier self-test rerun: 10/10 passes, 0 pixel diff
- `ANNOTATION_SCHEMA_VERSION === 3`; new schemas exported, typed, tree-shakeable

Coordination with Task B (#52)
Task B can consume both schema and package directly:
- `VerifyResult` / `VerifyResultSchema` exported from `@domscribe/core`.
- `@domscribe/verify` exports `compareScreenshots`, `diff`, `loadPng`, `MAX_DIFF_RATIO`, `PER_PIXEL_THRESHOLD`.
- verify_after_edit) was already landed in commit e8d452d; B just registers.

Out of scope
- `@domscribe/runtime` `ScreenshotCapturer` (Task B)
- `@domscribe/relay` `verify_after_edit` MCP tool registration (Task B)
- `@domscribe/overlay` sidebar rendering of `VerifyResult` (P1 follow-up, next sprint)

Test plan
- `pnpm run build:all` green (lint + test + build across 15 projects)
- `pnpm exec nx run domscribe-core:test`: 134 tests pass, 100% coverage on `lib/types/annotation.ts` and `lib/migrations/annotation-migrations.ts`
- `pnpm exec nx run domscribe-verify:test`: 7 tests pass, 100% coverage on `comparator.ts`
- `PLAYWRIGHT_BROWSERS_PATH=$HOME/.cache/ms-playwright pnpm --filter @domscribe/test-fixtures test:falsifier`: 10/10 passes, 0 pixel diff, confirming the lifted comparator is behavior-preserving

🤖 Generated with Claude Code