ReasonForge is a Bittensor subnet proposal implementing a decentralized marketplace for verifiable multi-step reasoning. This document describes the technical architecture of the MVP implementation.

                   ┌─────────────────────────┐
                   │  Interactive Dashboard  │
                   │   (React/TypeScript)    │
                   │   Embedded JS Engine    │
                   └──────────┬──────────────┘
                              │ HTTP/SSE
                   ┌──────────▼──────────────┐
                   │     FastAPI Server      │
                   │  /api/simulate          │
                   │  /api/simulate/stream   │
                   │  /api/constants         │
                   └──────────┬──────────────┘
                              │
              ┌───────────────┼──────────────────┐
              │               │                  │
    ┌─────────▼──────┐ ┌──────▼───────┐ ┌────────▼────────┐
    │ Scoring Engine │ │  Simulator   │ │ Task Generator  │
    │ (13 Formulas)  │ │ (Epoch Loop) │ │   (6 Domains)   │
    └─────────┬──────┘ └──────┬───────┘ └────────┬────────┘
              │               │                  │
              └───────────────┼──────────────────┘
                              │
                   ┌──────────▼──────────┐
                   │  Types & Constants  │
                   │  (Protocol Layer)   │
                   └─────────────────────┘

The types and constants module (`types.py`, the protocol layer) is the foundation of the system. It contains:
- 20+ Protocol Constants: Exact values from the whitepaper (CMS weights, emission splits, PEB parameters, trap thresholds, slashing parameters, etc.)
- 2 Enums: `Domain` (6 reasoning domains) and `TaskSource` (4 task origins)
- 9 Dataclasses, including `Task`, `ReasoningStep`, `MinerSubmission`, `DimensionScores`, `ValidatorScore`, `MinerState`, `ValidatorState`, and `EpochResult`
- Domain Check Weights: Per-domain weight maps for objective scoring (Eq. 11)
Design principle: All constants are defined once and imported everywhere. No magic numbers.
The scoring engine is a stateless class with 13 `@staticmethod` methods, one per whitepaper formula (Eq. 2 through Eq. 13 are listed here):

| Method | Formula | Invariants |
|---|---|---|
| `compute_cms` | Eq. 2 | Output in [0, 1] for valid inputs |
| `compute_s_epoch` | Eq. 3 | Returns 0 for empty task list |
| `compute_peb` | Eq. 4 | Returns 0 for rank > K |
| `distribute_miner_emissions` | Eq. 5 | Conservation: sum(rewards) == pool |
| `apply_breakthrough` | Eq. 6 | Only applies if CMS > 0.8 AND unsolved |
| `compute_vas` | Eq. 7 | Returns 1.0 for perfect scoring |
| `distribute_validator_emissions` | Eq. 8 | Conservation: sum(rewards) == pool |
| `compute_trap_penalty` | Eq. 9 | Returns 1.0 if no traps or above threshold |
| `compute_slash` | Eq. 10 | Quadratic penalty, 0 above threshold |
| `compute_objective_score` | Eq. 11 | Weighted sum of domain checks |
| `compute_consensus_score` | Eq. 12 | Trimmed median for … |
| `compute_final_score` | Eq. 13 | 0.60·O + 0.40·C |

Design principle: Pure functions. No side effects. Easy to test in isolation.
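
To illustrate the pure-function style, here is a minimal sketch of two of the methods. The class name `ScoringEngineSketch`, the argument names, and the exact form of the difficulty weighting are assumptions made for illustration; only the 0.60/0.40 weights and the "returns 0 for empty task list" invariant come from the table above.

```python
from typing import Sequence

W_OBJECTIVE = 0.60  # weight on the objective score O (Eq. 13)
W_CONSENSUS = 0.40  # weight on the consensus score C (Eq. 13)


class ScoringEngineSketch:
    """Hypothetical subset of a stateless scoring engine: no instance state."""

    @staticmethod
    def compute_final_score(objective: float, consensus: float) -> float:
        """Blend objective and consensus scores with fixed weights."""
        return W_OBJECTIVE * objective + W_CONSENSUS * consensus

    @staticmethod
    def compute_s_epoch(
        cms: Sequence[float],
        difficulty: Sequence[float],
        trap_penalty: float = 1.0,
    ) -> float:
        """Difficulty-weighted mean of per-task CMS, scaled by the trap penalty.

        The weighting scheme is an assumption; the real Eq. 3 may differ.
        """
        total = sum(difficulty)
        if not cms or total == 0:
            return 0.0  # invariant: empty task list yields 0
        weighted = sum(c * d for c, d in zip(cms, difficulty))
        return trap_penalty * weighted / total
```

Because neither method touches shared state, each can be checked against a hand-calculated value, for example `compute_s_epoch([0.5, 1.0], [2, 6])` gives (0.5·2 + 1.0·6) / 8.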
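The simulator is built from three main classes: miner profiles, validator profiles, and the epoch simulator.
Miner profiles are simulated miners with statistical capability profiles across 5 tiers: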
- Elite (base Q=0.88, A=0.90) — consistently high performance
- Strong (Q=0.78, A=0.80) — reliable above-average
- Mid (Q=0.65, A=0.68) — average performance with higher variance
- Weak (Q=0.45, A=0.50) — below average
- Adversarial (Q=0.20, A=0.15) — deliberately poor, high variance
Each miner also gets random per-domain bonuses, simulating domain specialization.
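
A statistical miner of this kind could be sketched as follows. The base Q/A values are the ones listed above; the class name `MinerProfileSketch`, the Gaussian noise model, the `sigma` default, and the way the domain bonus is added are assumptions, not the project's actual implementation.

```python
import random
from dataclasses import dataclass, field

# Base quality (Q) and accuracy (A) per tier, taken from the list above.
TIER_BASES = {
    "elite": (0.88, 0.90),
    "strong": (0.78, 0.80),
    "mid": (0.65, 0.68),
    "weak": (0.45, 0.50),
    "adversarial": (0.20, 0.15),
}


def clamp01(x: float) -> float:
    """Keep a sampled score inside [0, 1]."""
    return max(0.0, min(1.0, x))


@dataclass
class MinerProfileSketch:
    tier: str
    domain_bonus: dict[str, float] = field(default_factory=dict)
    sigma: float = 0.05  # assumed spread; the real per-tier variance is not given

    def sample(self, domain: str, rng: random.Random) -> tuple[float, float]:
        """Draw one (quality, accuracy) pair for a task in `domain`."""
        base_q, base_a = TIER_BASES[self.tier]
        bonus = self.domain_bonus.get(domain, 0.0)
        q = clamp01(rng.gauss(base_q + bonus, self.sigma))
        a = clamp01(rng.gauss(base_a + bonus, self.sigma))
        return q, a
```

Passing an explicit `random.Random` instance keeps a simulation run reproducible from a seed.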
Validator profiles are simulated validators, each with one of 4 accuracy profiles:
- Honest (noise=0.03, bias=0.0) — accurate scoring
- Good (noise=0.06, bias=0.0) — slightly noisy but unbiased
- Lazy (noise=0.15, bias=-0.10) — high noise, scores low
- Malicious (noise=0.25, bias=+0.20) — high noise, inflates scores
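
One plausible reading of these profiles is "true score plus a constant bias plus zero-mean Gaussian noise, clamped to [0, 1]". The sketch below follows that reading; the additive model and the function name `evaluate` are assumptions, while the noise and bias values are the ones listed above.

```python
import random

# Noise (standard deviation) and bias per validator profile, from the list above.
VALIDATOR_PROFILES = {
    "honest": {"noise": 0.03, "bias": 0.0},
    "good": {"noise": 0.06, "bias": 0.0},
    "lazy": {"noise": 0.15, "bias": -0.10},
    "malicious": {"noise": 0.25, "bias": 0.20},
}


def evaluate(true_score: float, profile: str, rng: random.Random) -> float:
    """Return the score a validator with the given profile would report."""
    params = VALIDATOR_PROFILES[profile]
    noisy = true_score + params["bias"] + rng.gauss(0.0, params["noise"])
    return max(0.0, min(1.0, noisy))  # reported scores stay in [0, 1]
```

Averaged over many evaluations of the same submission, an honest validator stays close to the true score, a lazy one drifts low, and a malicious one drifts high.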
The epoch simulator runs the main simulation loop (`run_epoch()`), which performs these steps in order:
- Generate 12 tasks (15% traps) across 6 domains
- Each of 12 miners solves each task (probabilistic scores from profiles)
- Compute objective scores per miner per task
- Assign 3 random validators per miner-task pair
- Validators evaluate (adding noise/bias per profile)
- Compute consensus score (stake-weighted average)
- Compute final score (0.60O + 0.40C)
- Compute CMS, apply breakthrough multiplier if applicable
- Compute epoch scores (difficulty-weighted average * trap penalty)
- Rank miners, compute PEB for top-10
- Distribute miner emissions (Eq. 5 — emission-conserving)
- Finalize validator VAS, reputation, slashing
- Distribute validator emissions (Eq. 8)
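
The validator-assignment and consensus steps of this loop can be sketched as below. This is a simplified illustration covering only the stake-weighted average named in the list; the function names and the zero-stake fallback are assumptions.

```python
import random
from typing import Sequence


def assign_validators(validator_ids: Sequence[str], k: int, rng: random.Random) -> list[str]:
    """Pick k distinct validators at random for one miner-task pair."""
    return rng.sample(list(validator_ids), k)


def stake_weighted_consensus(scores: Sequence[float], stakes: Sequence[float]) -> float:
    """Average validator scores, weighting each by the validator's stake."""
    total_stake = sum(stakes)
    if total_stake == 0:
        return 0.0  # assumed fallback when no stake is present
    return sum(s * w for s, w in zip(scores, stakes)) / total_stake
```

For example, scores of 0.8, 0.6, and 0.4 from validators staking 100, 50, and 50 give (80 + 30 + 20) / 200 = 0.65, so the largest staker pulls the consensus toward its own score.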
The task generator produces synthetic reasoning tasks:
- 5-7 templates per domain (30+ total across 6 domains)
- Trap tasks with known ground truth scores
- Random difficulty assignment (2-9)
- 5% chance of `previously_unsolved` for breakthrough opportunities
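
A generator with these properties might look like the sketch below. The placeholder domain names, the `SketchTask` dataclass, the round-robin domain assignment, and the choice to fix the trap count at `round(n * trap_rate)` (2 of 12) rather than flip a coin per task are all assumptions; templates are omitted.

```python
import random
from dataclasses import dataclass
from typing import Optional

DOMAINS = [f"domain_{i}" for i in range(6)]  # placeholder names for the 6 domains


@dataclass
class SketchTask:
    domain: str
    difficulty: int
    is_trap: bool
    previously_unsolved: bool


def generate_tasks(
    n: int = 12,
    trap_rate: float = 0.15,
    unsolved_rate: float = 0.05,
    rng: Optional[random.Random] = None,
) -> list[SketchTask]:
    """Generate n tasks spread across domains, with a fixed share of traps."""
    rng = rng or random.Random()
    trap_indices = set(rng.sample(range(n), round(n * trap_rate)))
    tasks = []
    for i in range(n):
        is_trap = i in trap_indices
        tasks.append(
            SketchTask(
                domain=DOMAINS[i % len(DOMAINS)],
                difficulty=rng.randint(2, 9),  # inclusive on both ends
                is_trap=is_trap,
                # Traps have known ground truth, so they are never "unsolved".
                previously_unsolved=(not is_trap) and rng.random() < unsolved_rate,
            )
        )
    return tasks
```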
Plagiarism detection in the MVP uses:
- Jaccard similarity on 3-gram token sets from reasoning steps
- Hash-based exact match via SHA-256 submission hashes
- Rolling buffer of last 30 epochs
- Threshold: 0.95 similarity = plagiarism, applies 0.5x penalty
In production, this would use embedding cosine similarity.
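
The MVP checks described above can be sketched with the standard library alone. The class and method names, whitespace tokenization, and treating the threshold as inclusive (`>=`) are assumptions; the 3-gram Jaccard measure, SHA-256 exact match, 30-epoch buffer, 0.95 threshold, and 0.5x penalty come from the list.

```python
import hashlib
from collections import deque


def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of n-token windows over whitespace-separated, lowercased tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PlagiarismDetectorSketch:
    def __init__(self, window_epochs: int = 30, threshold: float = 0.95, penalty: float = 0.5):
        # One entry per epoch; the deque drops the oldest epoch automatically.
        self.history = deque(maxlen=window_epochs)
        self.threshold = threshold
        self.penalty = penalty

    @staticmethod
    def _fingerprint(text: str):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return digest, ngrams(text)

    def penalty_for(self, text: str) -> float:
        """Return the score multiplier: `penalty` if plagiarized, else 1.0."""
        digest, grams = self._fingerprint(text)
        for epoch in self.history:
            for old_digest, old_grams in epoch:
                if digest == old_digest or jaccard(grams, old_grams) >= self.threshold:
                    return self.penalty
        return 1.0

    def record_epoch(self, texts: list[str]) -> None:
        """Store fingerprints of this epoch's submissions in the rolling buffer."""
        self.history.append([self._fingerprint(t) for t in texts])
```

The hash comparison catches verbatim copies cheaply, while the Jaccard comparison catches near-copies with minor edits. Swapping `jaccard` for an embedding cosine similarity would leave the rest of the structure unchanged.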
The API is a FastAPI server with CORS enabled for the dashboard. It exposes these endpoints:
- `GET /api/health`: Health check
- `POST /api/simulate`: Run N epochs, return full JSON results
- `GET /api/simulate/stream`: Server-Sent Events for live dashboard
- `GET /api/constants`: All protocol constants
- `GET /api/formulas`: Formula descriptions
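
For the streaming endpoint, each epoch result has to be framed as a Server-Sent Events message: a `data:` line followed by a blank line. The helper below is a hypothetical sketch of that framing only; the function name `sse_events` and the shape of the result dictionaries are assumptions.

```python
import json
from typing import Iterable, Iterator


def sse_events(epoch_results: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE-framed message per epoch result.

    json.dumps emits a single line by default, so each payload fits in
    one `data:` field; the trailing blank line terminates the event.
    """
    for result in epoch_results:
        yield f"data: {json.dumps(result)}\n\n"
```

In a FastAPI handler, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, which lets the dashboard consume it through the browser's `EventSource` API.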
The dashboard is a single-file React/TypeScript application with:
- Embedded JS simulation engine porting all Python formulas
- Miner leaderboard with sortable table, tier badges, streak indicators
- Validator health cards with VAS gauges (green/yellow/red)
- TAO distribution bar chart (recharts)
- S_epoch history line chart tracking trends over time
- CMS formula display with live values
- Miner detail panel showing quality/accuracy/novelty/efficiency breakdown
- Controls: Run Epoch, Auto-Run (2s interval), Reset, emission input
The dashboard works completely standalone, without the Python API.
Data flows through a single epoch as follows:

    Task Generator → Tasks → Miner Profiles → Submissions → Scoring Engine
                                                                  ↓
                                        Validator Profiles → Evaluations
                                                                  ↓
                                                           Consensus Score
                                                                  ↓
                                                             Final Score → CMS
                                                                  ↓
                                                            Epoch Scores
                                                                  ↓
                                                        Emission Distribution
                                                                  ↓
                                                             EpochResult → JSON

The test suite contains 67 tests across 3 test files:
- test_engine.py (40 tests): Every formula validated against hand-calculated expected values. Tests for boundary conditions (zeros, ones), conservation invariants, monotonicity.
- test_simulator.py (17 tests): Integration tests for full epoch lifecycle, emission conservation, adversarial detection, streak accumulation, validator slashing.
- test_types.py (10 tests): Dataclass properties, constant verification, computed properties.
Key invariant: Emission conservation — sum(miner_rewards) + sum(validator_rewards) == total_emission (within floating point tolerance).
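
A conservation check of this kind could be written as below. The proportional `distribute` helper and the 90/10 miner/validator split are hypothetical stand-ins used only to produce rewards to check; the real splits are protocol constants and the real distribution follows Eq. 5 and Eq. 8.

```python
import math
from typing import Sequence


def distribute(pool: float, weights: Sequence[float]) -> list[float]:
    """Split `pool` proportionally to `weights` (stand-in for Eq. 5 / Eq. 8)."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [pool * w / total for w in weights]


def emissions_conserved(
    miner_rewards: Sequence[float],
    validator_rewards: Sequence[float],
    total_emission: float,
    rel_tol: float = 1e-9,
) -> bool:
    """True if all rewards add up to the total emission within tolerance."""
    paid_out = sum(miner_rewards) + sum(validator_rewards)
    return math.isclose(paid_out, total_emission, rel_tol=rel_tol)


total_emission = 100.0
miner_share = 0.9  # hypothetical split, not the whitepaper value
miner_rewards = distribute(total_emission * miner_share, [0.9, 0.7, 0.4, 0.1])
validator_rewards = distribute(total_emission * (1 - miner_share), [1.0, 0.8, 0.3])
```

Comparing with `math.isclose` rather than `==` matters here, because summing the proportional shares can differ from the pool by a few units in the last floating-point digit.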
Key design decisions:
- Stateless engine: All scoring methods are pure static functions. State lives in dataclasses, not in the engine.
- Statistical simulation: Miners and validators are statistical profiles, not real LLMs. This demonstrates mechanism design without requiring AI inference.
- Dual-mode dashboard: The React dashboard embeds its own JS simulation engine, making it fully functional without the Python backend.
- JSON interchange: All components communicate via JSON. The CLI outputs JSON, the API returns JSON, the dashboard consumes JSON.
- Single-source constants: Protocol constants are defined once in `types.py` and imported everywhere (Python) / defined once at the top of `app.tsx` (JavaScript).