## Motivation

PR #992 verified 34 reductions with dual Python scripts (constructor + adversary, 443M+ checks). PR #1010 then rejected two of those (#368, #905) as "unsound" — but re-running PR #992's scripts confirms they pass. The problem: PR #1010 implemented a different construction, got it wrong, and blamed the reduction.
This exposes a gap in the current verification methodology: brute-force enumeration + random PBT can miss structural bugs, and two independent implementations can share blind spots.
This issue proposes adding algebraic, combinatorial, and analytic verification techniques to `/verify-reduction` to catch bug classes that the current approach cannot.
## Current methodology (7 sections)

1. Symbolic overhead (SymPy)
2. Exhaustive forward/backward (n ≤ 5)
3. Solution extraction
4. Overhead formula spot-check
5. Structural properties
6. YES example
7. NO example

Plus an independent adversary with hypothesis PBT.
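For context, the adversary layer amounts to property-based testing of forward/backward equivalence on random instances. The sketch below uses a toy VertexCover→IndependentSet reduction and stdlib `random` in place of hypothesis so it is self-contained; all names (`has_vertex_cover`, `reduce_vc_to_is`, etc.) are illustrative stand-ins, not this library's API:

```python
import itertools
import random

# Toy source decider: does G have a vertex cover of size <= k?
def has_vertex_cover(n, edges, k):
    return any(all(u in c or v in c for u, v in edges)
               for r in range(k + 1)
               for c in itertools.combinations(range(n), r))

# Toy target decider: does G have an independent set of size k?
def has_independent_set(n, edges, k):
    return any(all(not (u in c and v in c) for u, v in edges)
               for c in itertools.combinations(range(n), k))

# Toy reduction: the complement of a size-k cover is a size-(n-k) independent set.
def reduce_vc_to_is(n, edges, k):
    return n, edges, n - k

# Adversary loop: random small instances, check forward/backward equivalence.
random.seed(1)
for _ in range(200):
    n = random.randint(1, 5)
    edges = [e for e in itertools.combinations(range(n), 2) if random.random() < 0.5]
    k = random.randint(0, n)
    assert has_vertex_cover(n, edges, k) == has_independent_set(*reduce_vc_to_is(n, edges, k))
```

With hypothesis, the loop would become an `@given` property over generated instances; the equivalence assertion stays the same.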
## Proposed enhancements

### A. Algebraic/Combinatorial Techniques (discrete math)

| Technique | What it catches | Applicable to |
| --- | --- | --- |
| Invariant-based induction — verify a preserved quantity (e.g., independence number = m for satisfiable instances) symbolically for general n, not just small n | | |
| Parity checks — verify odd/even solution count is preserved; extremely sensitive to off-by-one gadget errors | Single misplaced edge in gadget construction | NAE-SAT reductions, set splitting, any reduction with complementary solution pairs |
| Rank/dimension — for GF(2)/linear equation reductions, verify constraint matrix rank matches proof claims | Accidentally linearly dependent rows that only matter at large n | X3C→GF2 (#859), X3C→MinWeightLinEq (#860), any linear algebra reduction |
| Boundary instance generation — systematically test at exact feasibility thresholds (S=2T, minimum degree, single-clause) | Off-by-one in case splits, edge cases in padding | SubsetSum→Partition (#973), all padding/embedding reductions (4 rules) |
| Extremal graph testing — test on K_n, empty, cycle, star, Petersen where answers are known analytically, scale to n=50+ | Construction bugs that only appear beyond brute-force range | All 18 graph transformation reductions |
| Monotonicity — verify adding a constraint to the source predictably changes the target | Coupling bugs between gadgets | Gadget reductions, scheduling encodings (8 rules) |
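As a sketch of the extremal-graph idea: the independence numbers of K_n, C_n, and the star have standard closed forms, so any reduction touching independent sets can be checked against them at sizes far beyond enumeration. The brute-force checker below only validates the oracles at small n; all function names are illustrative:

```python
import itertools

def independent_set_size(n, edges):
    # Brute-force maximum independent set; feasible only for small n.
    for r in range(n, 0, -1):
        for cand in itertools.combinations(range(n), r):
            if all(not (u in cand and v in cand) for u, v in edges):
                return r
    return 0

# Analytic oracles for extremal families (standard closed forms).
def alpha_complete(n): return 1        # K_n: any two vertices are adjacent
def alpha_cycle(n):    return n // 2   # C_n: alternate vertices
def alpha_star(n):     return n - 1    # star on n vertices: all leaves

# Validate the oracles against brute force while n is small...
for n in range(3, 8):
    assert independent_set_size(n, [(i, (i + 1) % n) for i in range(n)]) == alpha_cycle(n)
    assert independent_set_size(n, list(itertools.combinations(range(n), 2))) == alpha_complete(n)
    assert independent_set_size(n, [(0, i) for i in range(1, n)]) == alpha_star(n)
# ...then, in a real check, compare the reduction's answer on these
# families against the closed forms at n = 50+, beyond brute-force range.
```
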
### B. Real Analysis Techniques (novel, experimental)

| Technique | What it catches | Applicable to |
| --- | --- | --- |
| Continuity of reduction mapping — for weighted problems, perturb weights continuously and verify the target changes continuously; discontinuities reveal bugs | Construction bugs in weighted reductions that discrete testing misses | MaxCut, TSP, weighted MIS, all weighted ILP reductions |
| IVT for objective value gaps — if the source objective takes values {a, b}, the target should take all corresponding values in [f(a), f(b)]; gaps indicate missing solution mappings | Incomplete solution space coverage | All Max/Min optimization reductions |
| Convergence of overhead ratios — verify overhead(n)/n converges as n→∞ using limit theorems | Overhead formulas that are correct for small n but diverge | All reductions with non-trivial overhead |
| Compactness arguments — the LP relaxation of the target ILP has a solution by compactness; verify relaxation tightness | ILP reductions where the integrality gap matters | 85 ILP reductions |
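A minimal continuity probe, under stated assumptions: `reduce_weights` is a hypothetical weighted reduction (scale by 2, add a gadget offset) and `target_objective` is a stand-in for the target optimum. A real check would perturb each weight of an actual weighted instance and bound the change in the target's optimum:

```python
import random

# Hypothetical weighted reduction: scale every weight by 2, add a gadget offset.
def reduce_weights(ws):
    return [2 * w + 1 for w in ws]

# Stand-in for the target problem's optimal objective value.
def target_objective(ws):
    return max(ws)

random.seed(0)
base = [random.uniform(0, 10) for _ in range(5)]
L = 2  # Lipschitz constant this construction should satisfy
for eps in (1e-2, 1e-4, 1e-6):
    moved = [w + eps for w in base]
    delta = abs(target_objective(reduce_weights(moved))
                - target_objective(reduce_weights(base)))
    # Continuous response: the target optimum moves by at most L*eps.
    # A jump that does not shrink with eps would flag a construction bug.
    assert delta <= L * eps + 1e-12
```

The design point is that the bound must tighten as eps shrinks; a discontinuity shows up as a `delta` that stays large while eps goes to zero.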
### C. SMT/SAT Bounded Verification

Encode "∀ instances of size ≤ n: source feasible ⟺ target feasible" as a Z3 formula and machine-check it. This is stronger than enumeration because the SMT solver reasons symbolically over all instances simultaneously.
## What does NOT work

- **Lean 4 formal proofs:** tried; it does not integrate well with this library's Python/Rust verification pipeline. Mathlib's real analysis coverage is extensive, but the formalization overhead (~500 hours per project) is prohibitive for 167 reductions.
## Questions for discussion

- Which techniques have the best effort/impact ratio? The 85 ILP reductions are mechanical — should we focus algebraic techniques on the ~80 non-ILP reductions instead?
- Should the analysis techniques (Section B) be pursued? Nobody is applying real analysis to reduction verification, so it could be novel and publishable, but it is unclear how many reductions actually benefit. They are most applicable to weighted/optimization reductions.
- Is SMT verification (Section C) worth the Z3 dependency? It is strictly stronger than brute force for bounded n, but adds complexity.
- **Integration:** Should the new techniques be additional script sections (8–11), a separate third verification layer, or upgrades to existing sections?
- **Priority:** Given that the current pipeline already caught the #368/#905 issue (PR #992 was right, PR #1010 was wrong), is the main goal (a) preventing future false rejections, (b) catching real bugs at scale, or (c) building toward a publishable verification methodology?
## Reduction landscape (for context)

## References