Releases: BackendStack21/ai-verification-protocol
v5.2.6 — Accuracy + Consistency Fixes
This tag bundles two reviews against v5.2.5: the original 5 accuracy fixes (editorial review) and 6 rendering/consistency fixes (#2). Every fix is verified against the protocol document's own definitions — no speculation.
Part 1 — Accuracy Patch (editorial review)
Five independently verifiable accuracy fixes found during systematic editorial review of v5.2.5.
Changes
ai-verification-protocol.md
1. §3.6 — `six sub-signals (m, o, b, f, s, t)` → `seven sub-signals (m, o, b, f, s, t, d)`
2. §8.2 — Same fix. Signal `d` (doc coverage) was added in v5.0 (§2.9, weight 0.05). Seven signals are defined in §3.1, but two sections still referenced "six."
index.html
3. Carousel slide 1 — η 0.94→0.97, ρ 0.04→0.03. Per §3.3, η ≥ 0.95 is required for AutoApprove. η=0.94 falls in the 0.80–0.95 band → HumanReviewRecommended. Cert now matches its verdict.
4. Carousel slide 4 — Removed false "PR size 1,892 LOC exceeds 1,500 hard cap". Per §0.3, the hard cap is 5,000 LOC; 1,892 falls in 1,501–5,000 (capped at HumanReviewRecommended). The actual CannotVerify trigger is ρ > 0.30.
5. Body text — $0.0015 → $0.015 per-PR generation cost. $50 ÷ $0.0015 = 33,333:1, not the claimed ~3,300:1. Fixed to $0.015 (50/0.015 = 3,333 ≈ 3,300).
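The verdict bands cited in fixes 3 and 4 can be sketched as a small lookup. This is a minimal illustration built only from the thresholds quoted in these notes, not the protocol's reference implementation: the function name `verdict` is hypothetical, the assignment of everything below 0.80 to `HumanReviewRequired` is an assumption, and the PR-size caps from §0.3 are left out.

```python
def verdict(eta: float, rho: float) -> str:
    """Map an efficiency score (eta) and correlation penalty (rho) to a verdict label."""
    if rho > 0.30:
        # Quoted in these notes: rho > 0.30 is the CannotVerify trigger.
        return "CannotVerify"
    if eta >= 0.95:
        # Quoted from §3.3: eta >= 0.95 is required for AutoApprove.
        return "AutoApprove"
    if eta >= 0.80:
        # Quoted from §3.3: the 0.80-0.95 band.
        return "HumanReviewRecommended"
    # ASSUMPTION: scores below 0.80 map to HumanReviewRequired.
    return "HumanReviewRequired"


print(verdict(0.94, 0.04))  # old slide 1 values
print(verdict(0.97, 0.03))  # corrected slide 1 values
```

Under these assumptions the old slide 1 values land in the 0.80–0.95 band, while the corrected values clear the AutoApprove threshold.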
Verification
| Fix | Source of Truth | Result |
|---|---|---|
| Signal count | §3.1 defines 7 signals | 7 ✓ |
| η=0.97 verdict | §3.3 η≥0.95 → AutoApprove | Correct ✓ |
| Size cap removed | §0.3 hard cap at 5,000 | Removed ✓ |
| Math: $0.015 | 50/0.015=3,333≈3,300 | Matches ✓ |
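The arithmetic behind fix 5 is easy to re-check. The sketch below only reproduces the two divisions quoted above; the variable name `reference_cost` is a placeholder for the $50 figure, whose meaning these notes do not spell out.

```python
def cost_ratio(reference_cost: float, generation_cost: float) -> float:
    """Return how many times larger reference_cost is than generation_cost."""
    return reference_cost / generation_cost


reference_cost = 50.0  # the $50 figure from the body text (placeholder name)

old_ratio = cost_ratio(reference_cost, 0.0015)  # value before the fix
new_ratio = cost_ratio(reference_cost, 0.015)   # value after the fix

print(f"old: {old_ratio:,.0f}:1")
print(f"new: {new_ratio:,.0f}:1")
```

Only the corrected figure is within rounding distance of the claimed ~3,300:1.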
Part 2 — Rendering & Consistency Fixes (PR #2)
Six surgical fixes from a two-pass reassessment. No protocol logic changed — these restore document self-consistency and fix markdown rendering. Finding 4 was redirected during review (the original target, Appendix A, was already correct; the real defect was the §3.5 table).
Changes
ai-verification-protocol.md
- §7.5 — Moved the mandatory same-family constraint out of the code fence; it was rendering as gray monospace with literal `**asterisks**`, now reads as body prose.
- §6.1, §6.6 — Repaired three table rows missing their leading `|` (Adversarial Surface, Documentation Coverage, Documentation generation) — they were misrendering.
- §6.7 / §7.5 — Scoped the 3-attempt repair cap explicitly as per-PR (shared budget across all auto-repairable findings), not per-finding.
- §3.5 — Added the missing `spec_independence` row to the ρ sub-signal table. The four prior rows summed to 0.25 while the stated cap is 0.30 — the fifth `+0.05` row closes the gap. (Appendix A, the schema, and §6.5.1 already listed it; the §3.5 table was the lone omission.)
- §3.5 — Added a back-reference to the §0.2 unknown-`generator_identity` fallback (family/version → maximum contribution).
- §3.8 — Annotated the `pr_size_cap` ceiling as `HumanReviewRequired` per §0.3 inside the `max_severity()` block.
index.html / og.svg — Reconciled the two version strings the accuracy patch missed (5.2.5 → 5.2.6 in the CTA footer and the social card), so the version is now consistent across every carrier.
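The per-PR scoping of the repair cap can be illustrated with a shared budget object. This is a sketch under stated assumptions rather than the protocol's own logic: `RepairBudget`, `run_repairs`, and the finding names are hypothetical, and each finding is simplified to a single repair attempt.

```python
from dataclasses import dataclass

MAX_REPAIR_ATTEMPTS_PER_PR = 3  # the 3-attempt cap, scoped to the whole PR


@dataclass
class RepairBudget:
    """One budget per PR, shared by every auto-repairable finding."""
    remaining: int = MAX_REPAIR_ATTEMPTS_PER_PR

    def try_consume(self) -> bool:
        """Spend one attempt if any are left; report whether it was granted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def run_repairs(findings: list[str]) -> dict[str, bool]:
    """Record, per finding, whether a repair attempt was allowed."""
    budget = RepairBudget()  # created once per PR, not once per finding
    return {finding: budget.try_consume() for finding in findings}


# Hypothetical finding names, for illustration only.
attempted = run_repairs(["missing-test", "missing-doc", "type-error", "stale-doc"])
print(attempted)
```

With a per-finding cap, all four findings would get attempts; with the shared per-PR budget, the fourth is refused once three attempts have been spent.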
Verification
| Finding | Source of Truth | Result |
|---|---|---|
| §7.5 prose outside fence | markdown fence balance | ✓ |
| 3 table rows repaired | leading `\|` present | ✓ |
| Cap scoped per-PR | §6.7 + §7.5 agree | ✓ |
| §3.5 table = 5 rows | contributions sum to 0.30 cap | ✓ |
| §3.5 → §0.2 cross-ref | present | ✓ |
| §3.8 ceiling annotated | HumanReviewRequired per §0.3 | ✓ |
| Version consistency | zero residual 5.2.5 | ✓ |
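A check like "zero residual 5.2.5" can be automated with a short scan. The helper below is a hypothetical sketch, not tooling shipped with this release: it reports the line numbers that still mention an old version string and skips longer version numbers that merely contain it.

```python
import re


def residual_versions(text: str, old_version: str) -> list[int]:
    """Return 1-based line numbers in text that still mention old_version."""
    pattern = re.compile(
        r"(?<!\d)(?<!\d\.)"        # not the tail of a longer version (15.2.5, 1.5.2.5)
        + re.escape(old_version)
        + r"(?!\.?\d)"             # not the head of a longer version (5.2.50, 5.2.5.1)
    )
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Illustrative input only; real use would read index.html, og.svg, and so on.
sample = "Protocol v5.2.6\nfooter: v5.2.5\nbuild 15.2.5"
print(residual_versions(sample, "5.2.5"))
```

An empty list for every carrier file would correspond to the "zero residual" result in the table.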
Protocol self-run
PR #2 was verified by running the protocol on itself (five-agent pipeline). Verdict: HumanReviewRequired — content clean (η_raw 0.93, 8/8 oracles), but ρ=0.20 from single-family self-verification (Claude verifying Claude, no provenance) pulled η to 0.73. The protocol correctly distrusted its own monoculture pipeline. Full certificate.
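The three numbers in the self-run are consistent with a subtractive penalty, η = η_raw − ρ. That form is an inference from the figures quoted here, not a formula confirmed by these notes, and the sketch below only reproduces the arithmetic.

```python
eta_raw = 0.93  # content score before the correlation penalty
rho = 0.20      # penalty from single-family self-verification

# ASSUMPTION: the penalty is subtracted from the raw score.
eta = round(eta_raw - rho, 2)

print(eta)         # 0.73
print(eta < 0.80)  # True: below the lower edge of the 0.80-0.95 band
```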
Full diff: v5.2.5...v5.2.6
v5.2.5 — The AI Verification Protocol
🚀 The AI Verification Protocol
Diagnose, repair, and measure — the operational answer to AI verification debt.
A multi-agent pipeline specification and system prompt that quantifies verification debt, derives η from observable signals, tracks Ci/Cv ratios, and orchestrates a five-agent review pipeline with provenance attestation.
📋 What It Does
| Feature | Detail |
|---|---|
| 9 verification axes | Semantic, behavioral, security, structural, fuzzing, dependency, provenance, adversarial, documentation |
| 5 pipeline agents | A (generator) → B (reviewer) → C (contract) → D (fuzzer) → E (certificate) |
| η from signals | Mechanical efficiency score from 7 observable signals (m, o, b, f, s, t, d) |
| ρ correlation penalty | Quantifies verifier-generator dependency; ρ > 0.30 → CannotVerify |
| Ci/Cv ratio | Cost-to-Verify ÷ Cost-to-Implement per PR — the metric that matters |
| Active Repair Mode | Auto-generates tests, docs, and type fixes with 5-gate verification |
| Machine-readable certificates | JSON + in-toto attestation; markdown rendering for humans |
| Meta-audit loop | 5% monthly sampling, Brier calibration, weight auto-recalibration |
🌐 Landing Page
Live at vprotocol.21no.de — includes:
- Certificate carousel showcasing all 4 verdict outcomes (AutoApprove → CannotVerify)
- Five Whys root-cause analysis with whitepaper citations
- Pipeline flow and feature cards
📖 Reads
- Protocol: `ai-verification-protocol.md` — 14 sections, 2 appendices, 1,167 lines
- Companion whitepaper: The AI Verification Debt
📦 Contents
index.html · ai-verification-protocol.md · README.md · CNAME · og.svg · LICENSE