
Releases: BackendStack21/ai-verification-protocol

v5.2.6 — Accuracy + Consistency Fixes

15 May 11:05
39aacbe



This tag bundles two reviews against v5.2.5: five accuracy fixes from the original editorial review and six rendering/consistency fixes from PR #2. Every fix is verified against the protocol document's own definitions; none relies on speculation.


Part 1 — Accuracy Patch (editorial review)

Five independently verifiable accuracy fixes found during systematic editorial review of v5.2.5.

Changes

ai-verification-protocol.md

  1. §3.6: "six sub-signals (m, o, b, f, s, t)" → "seven sub-signals (m, o, b, f, s, t, d)"
  2. §8.2 — Same fix. Signal d (doc coverage) was added in v5.0 (§2.9, weight 0.05). Seven signals are defined in §3.1, but two sections still referenced "six."

index.html
3. Carousel slide 1 — η 0.94→0.97, ρ 0.04→0.03. Per §3.3, η ≥ 0.95 is required for AutoApprove. η=0.94 falls in the 0.80–0.95 band → HumanReviewRecommended. Cert now matches its verdict.
4. Carousel slide 4 — Removed false "PR size 1,892 LOC exceeds 1,500 hard cap". Per §0.3, the hard cap is 5,000 LOC; 1,892 falls in 1,501–5,000 (capped at HumanReviewRecommended). The actual CannotVerify trigger is ρ > 0.30.
5. Body text: $0.0015 → $0.015 per-PR generation cost. $50 ÷ $0.0015 = 33,333:1, not the claimed ~3,300:1. Fixed to $0.015 (50 ÷ 0.015 = 3,333 ≈ 3,300).
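The verdict arithmetic behind fixes 3 and 4 can be checked mechanically. Below is a minimal Python sketch, assuming only the thresholds quoted above (η ≥ 0.95 → AutoApprove, the 0.80–0.95 band → HumanReviewRecommended, ρ > 0.30 → CannotVerify, 1,501–5,000 LOC capped at HumanReviewRecommended). Three further points are assumptions, not protocol text: η below 0.80 maps to HumanReviewRequired, the 0.80 edge is inclusive, and PRs of 1,500 LOC or less carry no size cap. `Verdict`, `verdict_from_scores`, `size_ceiling`, and the body of `max_severity` are hypothetical stand-ins; PRs over the 5,000 LOC hard cap are deliberately left unmodeled.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Hypothetical ordering: a higher value is a more severe verdict.
    AUTO_APPROVE = 0
    HUMAN_REVIEW_RECOMMENDED = 1
    HUMAN_REVIEW_REQUIRED = 2
    CANNOT_VERIFY = 3


def max_severity(*verdicts: Verdict) -> Verdict:
    """Illustrative stand-in: return the most severe of the given verdicts."""
    return max(verdicts)


def verdict_from_scores(eta: float, rho: float) -> Verdict:
    """Map η and ρ to a verdict using the thresholds quoted in these notes."""
    if rho > 0.30:            # ρ > 0.30 is the CannotVerify trigger
        return Verdict.CANNOT_VERIFY
    if eta >= 0.95:           # §3.3: η ≥ 0.95 is required for AutoApprove
        return Verdict.AUTO_APPROVE
    if eta >= 0.80:           # assumed inclusive lower edge of the 0.80–0.95 band
        return Verdict.HUMAN_REVIEW_RECOMMENDED
    return Verdict.HUMAN_REVIEW_REQUIRED  # assumption: η < 0.80


def size_ceiling(loc: int) -> Verdict:
    """Best verdict a PR of this size can receive under the §0.3 bands."""
    if loc <= 1_500:
        return Verdict.AUTO_APPROVE               # assumed: no size-based cap
    if loc <= 5_000:
        return Verdict.HUMAN_REVIEW_RECOMMENDED   # 1,501–5,000 LOC band
    raise ValueError("over the 5,000 LOC hard cap (§0.3); not modeled here")


def final_verdict(eta: float, rho: float, loc: int) -> Verdict:
    # The size ceiling can only make a verdict more severe, never less.
    return max_severity(verdict_from_scores(eta, rho), size_ceiling(loc))


print(verdict_from_scores(0.94, 0.04).name)  # slide 1 before the fix
print(verdict_from_scores(0.97, 0.03).name)  # slide 1 after the fix
print(size_ceiling(1_892).name)              # slide 4: size alone caps, never CannotVerify
```

The last line shows why the slide 4 claim was wrong: at 1,892 LOC the size rule only caps the verdict at HumanReviewRecommended, so a CannotVerify result has to come from ρ.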

Verification

| Fix | Source of Truth | Result |
|---|---|---|
| Signal count | §3.1 defines 7 signals | 7 ✓ |
| η=0.97 verdict | §3.3: η ≥ 0.95 → AutoApprove | Correct ✓ |
| Size cap removed | §0.3: hard cap at 5,000 LOC | Removed ✓ |
| Math: $0.015 | 50 ÷ 0.015 = 3,333 ≈ 3,300 | Matches ✓ |
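The cost-ratio check in the last row can be reproduced with exact decimal arithmetic. This is a small sketch; the helper names `ratio` and `to_hundreds` are hypothetical, and the $50 figure is used only as the numerator the notes divide by, without asserting what it measures.

```python
from decimal import Decimal, ROUND_HALF_UP


def ratio(numerator: str, denominator: str) -> Decimal:
    # Decimal avoids binary floating-point noise in currency arithmetic.
    return Decimal(numerator) / Decimal(denominator)


def to_hundreds(x: Decimal) -> Decimal:
    # Round to the nearest hundred, the precision of the "~3,300:1" claim.
    return (x / 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP) * 100


old = ratio("50", "0.0015")  # pre-fix per-PR generation cost
new = ratio("50", "0.015")   # corrected per-PR generation cost
print(int(old), int(new))    # 33333 3333
print(to_hundreds(new))      # 3300
```

Only the corrected cost rounds to the ~3,300:1 figure; the old value rounds to 33,300.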

Part 2 — Rendering & Consistency Fixes (PR #2)

Six surgical fixes from a two-pass reassessment. No protocol logic changed — these restore document self-consistency and fix markdown rendering. Finding 4 was redirected during review (the original target, Appendix A, was already correct; the real defect was the §3.5 table).

Changes

ai-verification-protocol.md

  1. §7.5 — Moved the mandatory same-family constraint out of the code fence; it was rendering as gray monospace with literal **asterisks**, now reads as body prose.
  2. §6.1, §6.6 — Repaired three table rows (Adversarial Surface, Documentation Coverage, Documentation generation) that were missing their leading `|`, which made them misrender.
  3. §6.7 / §7.5 — Scoped the 3-attempt repair cap explicitly as per-PR (shared budget across all auto-repairable findings), not per-finding.
  4. §3.5 — Added the missing spec_independence row to the ρ sub-signal table. The four prior rows summed to 0.25 while the stated cap is 0.30 — the fifth +0.05 row closes the gap. (Appendix A, the schema, and §6.5.1 already listed it; the §3.5 table was the lone omission.)
  5. §3.5 — Added a back-reference to the §0.2 unknown-generator_identity fallback (family/version → maximum contribution).
  6. §3.8 — Annotated the pr_size_cap ceiling as HumanReviewRequired per §0.3 inside the max_severity() block.
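Fix 3 scopes the repair cap per PR rather than per finding. A minimal sketch of that scoping, assuming a repair attempt is a callable that reports success: `RepairBudget`, `try_repair`, and the finding IDs are hypothetical names, and only the cap of 3 and its per-PR scope come from these notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RepairBudget:
    """Hypothetical per-PR budget: one shared pool of attempts for all findings."""
    max_attempts: int = 3                  # the 3-attempt cap from §6.7 / §7.5
    used: int = 0
    log: list[tuple[str, bool]] = field(default_factory=list)

    def try_repair(self, finding_id: str, repair: Callable[[], bool]) -> Optional[bool]:
        """Spend one attempt from the shared pool.

        Returns the repair's success, or None when the PR's pool is exhausted.
        """
        if self.used >= self.max_attempts:
            return None
        self.used += 1
        ok = bool(repair())
        self.log.append((finding_id, ok))
        return ok


budget = RepairBudget()
# Scripted outcomes: the first finding needs two attempts, the second fails once.
attempts = iter([False, True, False, True])
findings = ["missing-test", "missing-test", "stale-doc", "stale-doc"]
results = [budget.try_repair(fid, lambda: next(attempts)) for fid in findings]
print(results)  # [False, True, False, None]
```

Under a per-finding cap, `stale-doc` would still have had three attempts; with the shared per-PR pool, its second attempt is refused.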

index.html / og.svg — Reconciled the two version strings the accuracy patch missed (5.2.5 → 5.2.6 in the CTA footer and the social card), so the version is now consistent across every file that carries it.

Verification

| Finding | Source of Truth / Result |
|---|---|
| §7.5 prose outside fence | markdown fence balance |
| 3 table rows repaired | leading `\|` present |
| Cap scoped per-PR | §6.7 + §7.5 agree |
| §3.5 table = 5 rows | contributions sum to the 0.30 cap |
| §3.5 → §0.2 cross-ref | present |
| §3.8 ceiling annotated | HumanReviewRequired per §0.3 |
| Version consistency | zero residual 5.2.5 |
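The §3.5 row in this table reduces to a sum. A minimal sketch using exact decimals: `spec_independence` (+0.05) and the 0.30 cap come from these notes, while the key `four_prior_rows` is a hypothetical aggregate, because the notes give only the combined 0.25 of the pre-existing rows, not their individual values.

```python
from decimal import Decimal

RHO_CAP = Decimal("0.30")  # stated cap on ρ sub-signal contributions (§3.5)


def table_matches_cap(contributions: dict[str, Decimal], cap: Decimal = RHO_CAP) -> bool:
    """True when the listed maximum contributions sum exactly to the cap."""
    return sum(contributions.values(), Decimal("0")) == cap


# Pre-existing rows collapsed into one aggregate (individual values not restated here).
v525 = {"four_prior_rows": Decimal("0.25")}
v526 = {**v525, "spec_independence": Decimal("0.05")}
print(table_matches_cap(v525), table_matches_cap(v526))  # False True
```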

Protocol self-run

PR #2 was verified by running the protocol on itself (five-agent pipeline). Verdict: HumanReviewRequired — content clean (η_raw 0.93, 8/8 oracles), but ρ=0.20 from single-family self-verification (Claude verifying Claude, no provenance) pulled η to 0.73. The protocol correctly distrusted its own monoculture pipeline. Full certificate.
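The self-run figures are consistent with a subtractive penalty, η = η_raw − ρ (0.93 − 0.20 = 0.73); a multiplicative η_raw × (1 − ρ) would give 0.744 instead. The sketch below assumes the subtractive form only because it reproduces the quoted numbers, and assumes η below 0.80 (the lower edge of the HumanReviewRecommended band) yields HumanReviewRequired; the protocol document remains the authority on the actual formula. `penalized_eta` is a hypothetical name.

```python
from decimal import Decimal


def penalized_eta(eta_raw: Decimal, rho: Decimal) -> Decimal:
    # Assumed form: subtract the correlation penalty ρ from the raw score.
    return eta_raw - rho


eta_raw, rho = Decimal("0.93"), Decimal("0.20")
eta = penalized_eta(eta_raw, rho)
print(eta)                        # 0.73
print(eta_raw * (1 - rho))        # 0.7440 (multiplicative form, does not match)
print(rho > Decimal("0.30"))      # False: below the CannotVerify trigger
print(eta < Decimal("0.80"))      # True: below the HumanReviewRecommended band
```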


Full diff: v5.2.5...v5.2.6

v5.2.5 — The AI Verification Protocol

13 May 09:59


🚀 The AI Verification Protocol

Diagnose, repair, and measure — the operational answer to AI verification debt.

A multi-agent pipeline specification and system prompt that quantifies verification debt, derives η from observable signals, tracks Ci/Cv ratios, and orchestrates a five-agent review pipeline with provenance attestation.

📋 What It Does

| Feature | Detail |
|---|---|
| 9 verification axes | Semantic, behavioral, security, structural, fuzzing, dependency, provenance, adversarial, documentation |
| 5 pipeline agents | A (generator) → B (reviewer) → C (contract) → D (fuzzer) → E (certificate) |
| η from signals | Mechanical efficiency score from 7 observable signals (m, o, b, f, s, t, d) |
| ρ correlation penalty | Quantifies verifier-generator dependency; ρ > 0.30 → CannotVerify |
| Ci/Cv ratio | Cost-to-Verify ÷ Cost-to-Implement per PR — the metric that matters |
| Active Repair Mode | Auto-generates tests, docs, and type fixes with 5-gate verification |
| Machine-readable certificates | JSON + in-toto attestation; markdown rendering for humans |
| Meta-audit loop | 5% monthly sampling, Brier calibration, weight auto-recalibration |
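The meta-audit row names 5% sampling and Brier calibration. As a hedged illustration, assume each audited PR's η is read as a predicted probability that the PR is correct and its audit outcome is recorded as 1 (correct) or 0 (defect). Under that assumption, the Brier score is the mean squared gap between the two, and lower is better. `monthly_sample`, `brier_score`, the PR IDs, and the fixed seed are hypothetical; how the protocol actually pairs η with audit outcomes and recalibrates weights is defined in the protocol document, not here.

```python
import random
from typing import Sequence


def monthly_sample(pr_ids: Sequence[str], rate: float = 0.05, seed: int = 0) -> list[str]:
    """Draw a reproducible ~5% audit sample (at least one PR when any exist)."""
    if not pr_ids:
        return []
    k = max(1, round(len(pr_ids) * rate))
    return random.Random(seed).sample(list(pr_ids), k)


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


prs = [f"pr-{i}" for i in range(200)]
audit = monthly_sample(prs)
print(len(audit))                          # 10
print(brier_score([0.97, 0.73], [1, 0]))   # roughly 0.2669
```

A perfectly calibrated, perfectly confident scorer would reach 0.0; a persistently high score on the monthly sample is the kind of signal that would motivate reweighting the η inputs.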

🌐 Landing Page

Live at vprotocol.21no.de — includes:

  • Certificate carousel showcasing all 4 verdict outcomes (AutoApprove → CannotVerify)
  • Five Whys root-cause analysis with whitepaper citations
  • Pipeline flow and feature cards

📖 Reads

📦 Contents

index.html · ai-verification-protocol.md · README.md · CNAME · og.svg · LICENSE