From 9407e620e17d391b5477e7f529daa0312386db5a Mon Sep 17 00:00:00 2001 From: bussyjd Date: Fri, 27 Mar 2026 06:01:52 +0000 Subject: [PATCH 1/5] docs: add autoresearch and reth-indexer issue specs with Gherkin features MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two issue specifications for carving PR #288 into focused PRs: 1. reth-erc8004-indexer Helm chart - Standalone chart with 3-tier discovery fallback - Reth indexer → BaseScan (native ERC-8004 metadata) → 8004scan - Go DiscoveryClient interface with FallbackClient 2. Autoresearch infrastructure Helm chart - Round-based reward engine with OPOW influence calculation - Anti-monopoly parity formula penalizing concentrated workers - Commit-reveal Merkle proof verification - Escrow settlement via x402 Commerce Payments Protocol (5x-audited Base contracts, zero custom Solidity) - x402-rs implications and contribution path 9 Gherkin feature files (40+ scenarios): - escrow_round_lifecycle.feature (fixed: receiver is per-PaymentInfo, uses RewardDistributor contract as single receiver) - opow_influence_calculation_with_anti_monopoly_parity.feature (fixed: penalty values verified against TIG opow.rs math, added single-challenge and phase-in scenarios) - commit_reveal_work_verification.feature - reward_pool_distribution_across_roles.feature - multi_tier_worker_discovery_with_fallback.feature - end_to_end_autoresearch_round.feature - erc8004_identity_lifecycle.feature (NEW: registration, transfer, deactivation, schema validation during active rounds) - leaderboard_api.feature (NEW: REST API, historical rounds, cumulative earnings) - round_state_continuity.feature (NEW: fund rollover, atomic transitions, worker state reset, on-chain auditability) Cross-reference analysis: - escrow_contract_cross_reference.md (AuthCaptureEscrow contract audit) - FEATURE_REVIEW.md (comprehensive gap analysis vs ERC-8004/x402) --- .../escrow_contract_cross_reference.md | 283 ++++ 
docs/issues/features/FEATURE_REVIEW.md | 547 +++++++ .../commit_reveal_work_verification.feature | 63 + .../end_to_end_autoresearch_round.feature | 59 + .../erc8004_identity_lifecycle.feature | 105 ++ .../features/escrow_round_lifecycle.feature | 87 ++ docs/issues/features/leaderboard_api.feature | 66 + ...ier_worker_discovery_with_fallback.feature | 47 + ...culation_with_anti_monopoly_parity.feature | 92 ++ ...ard_pool_distribution_across_roles.feature | 59 + .../features/round_state_continuity.feature | 68 + docs/issues/issue-autoresearch-helm-chart.md | 1288 +++++++++++++++++ .../issue-reth-erc8004-indexer-helm-chart.md | 277 ++++ 13 files changed, 3041 insertions(+) create mode 100644 docs/issues/analysis/escrow_contract_cross_reference.md create mode 100644 docs/issues/features/FEATURE_REVIEW.md create mode 100644 docs/issues/features/commit_reveal_work_verification.feature create mode 100644 docs/issues/features/end_to_end_autoresearch_round.feature create mode 100644 docs/issues/features/erc8004_identity_lifecycle.feature create mode 100644 docs/issues/features/escrow_round_lifecycle.feature create mode 100644 docs/issues/features/leaderboard_api.feature create mode 100644 docs/issues/features/multi_tier_worker_discovery_with_fallback.feature create mode 100644 docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature create mode 100644 docs/issues/features/reward_pool_distribution_across_roles.feature create mode 100644 docs/issues/features/round_state_continuity.feature create mode 100644 docs/issues/issue-autoresearch-helm-chart.md create mode 100644 docs/issues/issue-reth-erc8004-indexer-helm-chart.md diff --git a/docs/issues/analysis/escrow_contract_cross_reference.md b/docs/issues/analysis/escrow_contract_cross_reference.md new file mode 100644 index 00000000..6203ef1b --- /dev/null +++ b/docs/issues/analysis/escrow_contract_cross_reference.md @@ -0,0 +1,283 @@ +# Escrow Feature vs. 
AuthCaptureEscrow Contract: Cross-Reference Analysis + +**Date:** 2025-03-27 +**Contract:** AuthCaptureEscrow at 0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff (Base + Base Sepolia) +**Source:** https://github.com/base/commerce-payments (src/AuthCaptureEscrow.sol) +**Feature:** escrow_round_lifecycle.feature + +--- + +## 1. Multiple capture() Calls Per Authorization + +**Feature assumes:** capture() called once per worker (lines 39-40: capture for 0xAAA then 0xBBB) + +**Contract reality: YES — multiple captures ARE supported.** + +From AuthCaptureEscrow.sol line 258-260: +``` +/// @dev Can be called multiple times up to cumulative authorized amount +``` + +The logic at lines 283-291: +```solidity +if (state.capturableAmount < amount) { + revert InsufficientAuthorization(...) +} +state.capturableAmount -= uint120(amount); +state.refundableAmount += uint120(amount); +``` + +Each capture reduces capturableAmount and increases refundableAmount. The test +`test_succeeds_withMultipleCaptures` in capture.t.sol confirms two consecutive +captures work correctly. + +**Verdict: Feature assumption is correct for multiple calls. BUT SEE ISSUE #4 BELOW.** + +--- + +## 2. void() After Partial Captures (Partial Void) + +**Feature assumes:** void() returns "remaining 30 USDC" after captures (line 42) + +**Contract reality: YES — partial void works correctly.** + +From void() at lines 304-316: +```solidity +uint256 authorizedAmount = paymentState[paymentInfoHash].capturableAmount; +if (authorizedAmount == 0) revert ZeroAuthorization(paymentInfoHash); +paymentState[paymentInfoHash].capturableAmount = 0; +_sendTokens(paymentInfo.operator, paymentInfo.token, paymentInfo.payer, authorizedAmount); +``` + +After partial captures, capturableAmount holds only the REMAINING uncaptured amount. +void() returns exactly that remainder to the payer. It does NOT touch refundableAmount +(previously captured funds). + +**Verdict: Feature assumption is correct.** Capture 70 USDC, void returns 30 USDC. 
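
The authorize/capture/void accounting verified in sections 1-2 can be sketched as a toy state machine. This is illustrative Go, not the contract: it only mirrors the two per-`paymentInfoHash` counters (`capturableAmount`, `refundableAmount`) described above.

```go
package main

import "fmt"

// paymentState is a toy model of the two counters AuthCaptureEscrow keeps
// per PaymentInfo hash. Field names mirror the Solidity excerpts above.
type paymentState struct {
	capturableAmount uint64 // authorized but not yet captured
	refundableAmount uint64 // cumulative captured (refundable to payer)
}

// capture moves amount from capturable to refundable, as the contract does.
func (s *paymentState) capture(amount uint64) error {
	if s.capturableAmount < amount {
		return fmt.Errorf("InsufficientAuthorization")
	}
	s.capturableAmount -= amount
	s.refundableAmount += amount
	return nil
}

// void zeroes capturableAmount and returns the remainder (sent to the payer).
func (s *paymentState) void() uint64 {
	remainder := s.capturableAmount
	s.capturableAmount = 0
	return remainder
}

func main() {
	s := &paymentState{capturableAmount: 100} // authorize() locked 100 USDC
	_ = s.capture(70)                         // partial capture for workers
	fmt.Println(s.void())                     // void returns the remaining 30
}
```

Capture 70 of 100, void returns 30, and the 70 already captured stays in `refundableAmount` — matching the verdicts above.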
+ +--- + +## 3. reclaim() After authorizationExpiry + +**Feature assumes:** "platform wallet calls reclaim() directly" with "no operator signature required" (lines 67-69) + +**Contract reality: PARTIALLY CORRECT — important nuances.** + +From reclaim() at lines 323-340: +```solidity +function reclaim(PaymentInfo calldata paymentInfo) + external nonReentrant onlySender(paymentInfo.payer) { + + if (block.timestamp < paymentInfo.authorizationExpiry) { + revert BeforeAuthorizationExpiry(...) + } + uint256 authorizedAmount = paymentState[paymentInfoHash].capturableAmount; + if (authorizedAmount == 0) revert ZeroAuthorization(paymentInfoHash); + + paymentState[paymentInfoHash].capturableAmount = 0; + _sendTokens(paymentInfo.operator, paymentInfo.token, paymentInfo.payer, authorizedAmount); +} +``` + +Key findings: +- reclaim() is restricted to `onlySender(paymentInfo.payer)` — the PAYER must call it +- No operator involvement needed — feature is correct about "no operator signature" +- Requires `block.timestamp >= authorizationExpiry` — feature's >= condition is correct +- Returns only capturableAmount (remaining after any captures) +- The payer needs the full PaymentInfo struct to call reclaim (for hash verification) + +**ISSUE:** Feature says "platform wallet calls reclaim()". This works ONLY IF the platform +wallet address equals paymentInfo.payer. In our model, the platform locks its own USDC, +so it IS the payer. This is fine but must be explicitly ensured. + +**ISSUE:** The feature says "full 100 USDC returns." This is only true if NO captures +were made before the crash. If any captures happened, reclaim returns only the remainder. +After authorizationExpiry, capture() also stops working (block.timestamp >= check), so +there's no race — once expired, only reclaim works. + +**Verdict: Feature is correct IF platform wallet == payer in PaymentInfo.** + +--- + +## 4. 
CRITICAL: Receiver Address Per Capture Call + +**Feature assumes:** Different receiver per capture: +``` +capture() is called for "0xAAA" with 42 USDC (line 39) +capture() is called for "0xBBB" with 28 USDC (line 40) +``` + +**Contract reality: NO — receiver is FIXED in PaymentInfo. THIS IS A BLOCKING ISSUE.** + +From capture() at line 295: +```solidity +_distributeTokens(paymentInfo.token, paymentInfo.receiver, amount, feeBps, feeReceiver); +``` + +The PaymentInfo struct (lines 27-52) has a SINGLE `receiver` field: +```solidity +struct PaymentInfo { + address operator; + address payer; + address receiver; // <-- FIXED per authorization + ... +} +``` + +The PaymentInfo hash is computed from ALL fields including receiver. You CANNOT change +the receiver between capture calls because: +1. capture() takes the full PaymentInfo as calldata +2. The hash must match the authorization's hash +3. Changing receiver = different hash = InsufficientAuthorization revert + +**ALL captures from one authorization go to the SAME receiver address.** + +### Workaround Options: + +**Option A: Separate Authorization Per Worker** +Create individual PaymentInfo (with unique salt+receiver) per worker. Each gets its own +authorize() call. Pros: Direct payment to workers. Cons: N authorize() transactions per +round (gas-expensive), requires knowing workers before round starts, complex payer +signature management. + +**Option B: Distributor Contract as Receiver (RECOMMENDED)** +Set receiver to a Splitter/Distributor contract that the platform controls. Flow: +1. authorize() with receiver = DistributorContract +2. capture(fullAmount) sends all to DistributorContract +3. DistributorContract.distribute() pays individual workers +Pros: Single authorize+capture. Cons: Extra contract, extra step, workers don't see +direct on-chain escrow commitment to their individual share. + +**Option C: Platform Wallet as Receiver + Direct Transfers** +Set receiver = platform wallet. 
After capture, platform sends to workers via standard +ERC20 transfers. Pros: Simplest. Cons: Workers must trust platform for last-mile +distribution — undermines the escrow trust model. + +**Option D: Multiple Authorizations with Multicall** +Use the deployed Multicall3 (at 0xcA11bde05977b3631167028862bE2a173976CA11) to batch +multiple authorize() calls in one transaction, each with a different worker as receiver. +Requires separate payer signatures per authorization. Could work with EIP-7702 batch +or Smart Wallet batching. + +**Verdict: Feature file is INCORRECT. Must be redesigned. Option B or D recommended.** + +--- + +## 5. Refund Flow — Per-Capture or Global? + +**Feature assumes:** "refund() for 42 USDC" (line 74) — appears per-worker amount + +**Contract reality: GLOBAL refund pool, not per-capture.** + +From refund() at lines 351-374: +```solidity +uint120 captured = paymentState[paymentInfoHash].refundableAmount; +if (captured < amount) revert RefundExceedsCapture(amount, captured); +paymentState[paymentInfoHash].refundableAmount = captured - uint120(amount); +``` + +Key findings: +- refundableAmount is the CUMULATIVE total of all captures for that paymentInfoHash +- refund() can return any amount up to that cumulative total +- Refund goes to the PAYER (paymentInfo.payer), not to any specific receiver/worker +- The OPERATOR must provide the refund tokens via a tokenCollector +- Refund has its own expiry: paymentInfo.refundExpiry + +**IMPORTANT:** The operator must have liquidity to fund the refund. The contract pulls +tokens FROM the operator (via OperatorRefundCollector), not from the receiver. 
This +means: +- After capture sends tokens to the receiver, the operator can't automatically refund +- The operator needs to acquire tokens independently to execute a refund +- In our model, this means the platform (as operator) must hold enough USDC to cover + potential refunds + +**Feature says "42 USDC returns to the platform wallet."** This is correct IF +platform wallet == payer. The refund goes to paymentInfo.payer. + +**Verdict: Feature is approximately correct but oversimplifies. Refund is global, +requires operator to supply tokens, and goes to payer not receiver.** + +--- + +## 6. Reentrancy and Ordering Issues Not Covered + +### Covered by Contract +- **ReentrancyGuardTransient** on all state-changing functions (authorize, capture, void, + reclaim, refund, charge). Uses Solady's transient storage variant. +- **Single authorization** enforced: `hasCollectedPayment` flag prevents double-authorize +- **Void idempotency**: void sets capturableAmount=0, second void reverts ZeroAuthorization +- Reentrancy tests in reentrancy.t.sol confirm protection works + +### Issues NOT Covered by Feature Scenarios + +**A. Authorization Expiry Race Condition** +The feature doesn't cover the case where authorizationExpiry is reached DURING the +capture sequence. If captures for workers are sequential (not batched), the last capture +could fail because block.timestamp >= authorizationExpiry. Must set authorizationExpiry +with generous buffer (feature says "round end + 1 hour" which helps). + +**B. Refund After Partial Void** +After void(), refundableAmount still holds the captured amount. refund() can still be +called. The feature has no scenario for: capture some, void remainder, THEN refund. + +**C. Payer Reclaim vs Operator Void Race** +Both void() (operator) and reclaim() (payer) clear capturableAmount. 
If both are attempted near authorizationExpiry: +- Before expiry: only void() works (reclaim reverts) +- After expiry: BOTH work. void() has NO timestamp check, so it can be called at any time; the operator could void() even after authorizationExpiry, before the payer calls reclaim(). First one to execute wins (both set capturableAmount to 0). + +**D. Fee Rounding on Multiple Small Captures** +Multiple small captures may accumulate rounding errors in fees vs. one large capture. +The contract calculates: `feeAmount = amount * feeBps / 10_000` per capture. With many +small captures, total fees could differ by a few wei from a single large capture. + +**E. Front-Running by Operator** +The operator could theoretically front-run a payer's reclaim() with a capture() just +before authorizationExpiry. The feature doesn't cover adversarial operator behavior. +Since in our model the operator IS the platform, this is self-defeating, but worth noting. + +**F. PaymentInfo Replay Protection** +Each PaymentInfo (identified by hash) can only be authorized once (hasCollectedPayment). +A unique `salt` field prevents hash collisions across rounds. The feature doesn't +explicitly test salt uniqueness per round — if the same salt is reused, authorize() +will revert with PaymentAlreadyCollected.
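
The fee-rounding concern in item D can be checked numerically. A minimal sketch — the 250 bps fee is an arbitrary example value, but the integer (floor) division mirrors the contract's `amount * feeBps / 10_000`:

```go
package main

import "fmt"

// feeFor mirrors the contract's per-capture fee math with Solidity-style
// integer (floor) division. feeBps of 250 = 2.5% (arbitrary example value).
func feeFor(amount, feeBps uint64) uint64 {
	return amount * feeBps / 10_000
}

func main() {
	const feeBps = 250
	single := feeFor(700, feeBps) // one capture of 700 base units
	var many uint64
	for i := 0; i < 100; i++ { // one hundred captures of 7 base units each
		many += feeFor(7, feeBps)
	}
	fmt.Println(single, many) // 17 0: each small capture's fee floors to zero
}
```

An extreme case, but it shows the divergence is bounded by the number of captures (each floor loses up to one base unit of fee), not a fixed few wei.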
+ +--- + +## Summary of Feature File Accuracy + +| Check | Status | Notes | +|-------|--------|-------| +| Multiple capture() calls | CORRECT | Supported, decrements capturableAmount | +| Partial void after captures | CORRECT | Returns only remaining capturableAmount | +| reclaim() after expiry | CORRECT* | *Only if platform wallet == payer | +| Different receiver per capture | **INCORRECT** | **Receiver is fixed in PaymentInfo** | +| Refund flow | MOSTLY CORRECT | Global pool, operator must supply tokens | +| Reentrancy protection | ADEQUATE | transient reentrancy guard on all functions | + +## REQUIRED CHANGES TO FEATURE FILE + +1. **CRITICAL:** Lines 39-40 and 50 must be redesigned. Cannot capture to different + worker addresses from one authorization. Must either: + - Use a distributor/splitter contract as the single receiver + - Create separate authorizations per worker + - Use platform as receiver + off-chain distribution + +2. **Line 42:** void() call is correct but should specify it returns to the PAYER + (paymentInfo.payer), not generically to "platform wallet" + +3. **Line 67-68:** reclaim() must be called by the PAYER address. Feature should + clarify platform wallet == paymentInfo.payer + +4. **Lines 73-75:** refund() pulls tokens FROM the operator, not from the receiver. + The platform (operator) must have separate USDC to fund refunds. + +5. **Missing scenario:** Salt uniqueness per round to avoid PaymentAlreadyCollected + +6. **Missing scenario:** Authorization expiry race during sequential captures + +7. **Missing scenario:** What happens if authorizationExpiry is set too tight and + the last capture fails diff --git a/docs/issues/features/FEATURE_REVIEW.md b/docs/issues/features/FEATURE_REVIEW.md new file mode 100644 index 00000000..3c6f5379 --- /dev/null +++ b/docs/issues/features/FEATURE_REVIEW.md @@ -0,0 +1,547 @@ +# Feature File Review: Gherkin Best Practices, ERC-8004 & x402 Edge Cases + +## Files Reviewed +1. 
multi_tier_worker_discovery_with_fallback.feature (47 lines) +2. reward_pool_distribution_across_roles.feature (59 lines) +3. end_to_end_autoresearch_round.feature (59 lines) +4. opow_influence_calculation_with_anti_monopoly_parity.feature (69 lines) +5. escrow_round_lifecycle.feature (76 lines) +6. commit_reveal_work_verification.feature (63 lines) + +--- + +## 1. GHERKIN BEST PRACTICES ISSUES + +### 1a. Missing tags for filtering +- discovery.feature: has @discovery, good +- reward.feature: has @rewards, good +- e2e.feature: has @e2e @slow, good +- opow.feature: has @opow @critical, good +- escrow.feature: has @escrow @critical, good +- verification.feature: has @verification @critical, good +- ISSUE: No @erc8004 or @x402 tags on relevant scenarios. These cross-cutting + concerns should be tagged for protocol-specific test runs. + +### 1b. Over-long E2E scenario +- end_to_end_autoresearch_round.feature "Complete round with two honest workers" + has ~25 steps with inline comments. Gherkin best practice says scenarios should + be 5-10 steps max. The comments (# Round setup, # Worker experiments, etc.) + are a code smell indicating this should be split into focused scenarios or use + a scenario outline. +- RECOMMENDATION: Split into "Round initialization with escrow", "Worker + experiments and verification", "Reward settlement" as separate scenarios + chained via shared state, or keep as a single narrative but trim to essential + assertions. + +### 1c. Magic numbers without context +- escrow.feature line 39: "42 USDC" and "28 USDC" appear without showing the + math. A reader must compute 100 * 0.7 * 0.6 = 42 themselves. Add a comment + or use a scenario outline with formula reference. +- opow.feature: penalty values (0.65, 0.11, 0.05) lack formula reference. + Consider adding a comment showing the parity formula being applied. + +### 1d. 
Inconsistent Background granularity +- escrow.feature Background specifies contract address "0xBdEA0D1bcC5...", + which is good for precision. +- discovery.feature Background uses only OASF skill filter but no contract + address or chain ID. Since the ERC-8004 contract is at a fixed address on + Base, this should be specified. + +### 1e. No "Rule:" groupings in some files +- e2e.feature lacks Rule: groupings entirely. Even for an E2E feature, Rules + help organize the phases (setup, execution, settlement). + +--- + +## 2. ERC-8004 METADATA READING GAPS (Question 1) + +### What exists: +- discovery.feature mentions "ERC-8004 NFT metadata is read for each agent" + (line 24) and "workers with the model_versioning skill are returned" but + this is extremely shallow. + +### What is MISSING: + +#### 2a. No tokenURI resolution scenario +The ERC-8004 spec requires calling tokenURI(tokenId) to get the registration +JSON URL. There is no scenario testing: +- tokenURI returns a valid HTTPS URL +- tokenURI returns an IPFS URL (needs gateway resolution) +- tokenURI returns empty/malformed data +- tokenURI call reverts (token burned or contract paused) + +RECOMMENDED SCENARIO: +```gherkin +Scenario: Discovery resolves tokenURI to registration JSON + Given worker "0xW001" has ERC-8004 token ID 12345 + When the coordinator calls tokenURI(12345) + Then a valid registration JSON URL is returned + And the JSON is fetched and parsed + +Scenario: Discovery handles IPFS tokenURI with gateway fallback + Given worker "0xW001" has tokenURI "ipfs://Qm..." + When the coordinator resolves the tokenURI + Then the IPFS gateway is used to fetch the registration JSON + And the registration is successfully parsed +``` + +#### 2b. No registration JSON schema validation scenario +The ERC-8004 AgentRegistration document has specific required fields (name, +description, services[], supportedTrust[]). 
No scenario tests: +- JSON missing required fields +- JSON with unknown/extra fields +- JSON with invalid service types +- JSON with services[].endpoint that is unreachable + +RECOMMENDED SCENARIO: +```gherkin +Scenario: Discovery rejects agent with malformed registration JSON + Given agent "0xW003" has registration JSON missing "services" field + When the coordinator parses the registration + Then agent "0xW003" is excluded from discovery results + And a warning is logged with the token ID and missing field +``` + +#### 2c. No OASF taxonomy filtering scenario +The Background says `the OASF skill filter is "devops_mlops/model_versioning"` +but there is NO scenario testing: +- How the skill filter maps to registration JSON fields +- What happens when an agent has multiple skills (partial match) +- What happens when an agent has no skills listed +- Hierarchical taxonomy matching (e.g., "devops_mlops/*" wildcard) +- The ServiceOffer CRD has services[].name with types: web, A2A, MCP, OASF, + ENS, DID, email — but the feature file never references OASF as a service type + +RECOMMENDED SCENARIOS: +```gherkin +Scenario: Discovery filters agents by OASF taxonomy path + Given agent "0xW001" has OASF service with skill "devops_mlops/model_versioning" + And agent "0xW002" has OASF service with skill "security/threat_detection" + When the coordinator discovers workers with filter "devops_mlops/model_versioning" + Then only agent "0xW001" is returned + +Scenario: Discovery supports wildcard OASF taxonomy matching + Given agent "0xW001" has skill "devops_mlops/model_versioning" + And agent "0xW002" has skill "devops_mlops/container_orchestration" + When the coordinator discovers workers with filter "devops_mlops/*" + Then both agents are returned + +Scenario: Agent with no OASF service entry is excluded from skill-filtered queries + Given agent "0xW003" has only a "web" service entry (no OASF) + When the coordinator discovers workers with any OASF skill filter + Then agent 
"0xW003" is not in the results +``` + +--- + +## 3. x402 PaymentRequirements & ESCROW SCHEME (Question 2) + +### What exists: +- escrow.feature tests authorize(), capture(), void(), reclaim(), refund() +- e2e.feature mentions "x402 payments were collected" as a precondition +- ServiceOffer CRD defines payment.scheme as "exact" (only enum value) + +### What is MISSING: + +#### 3a. No PaymentRequirements generation scenario +The ServiceOffer CRD (serviceoffer-crd.yaml) defines x402 PaymentRequirements +fields (payTo, network, scheme, maxTimeoutSeconds, price) but NO feature tests: +- PaymentRequirements struct generation from ServiceOffer spec +- CAIP-2 network resolution ("base-sepolia" -> "eip155:84532") +- maxTimeoutSeconds enforcement +- Price calculation (perRequest vs perMTok vs perHour) + +RECOMMENDED SCENARIOS: +```gherkin +@x402 +Scenario: ServiceOffer generates valid x402 PaymentRequirements + Given a ServiceOffer with network "base-sepolia" and payTo "0xAAA" + And price.perRequest is "0.01" + And scheme is "exact" + When the reconciler generates PaymentRequirements + Then the network field is "eip155:84532" (CAIP-2) + And the payTo field is "0xAAA" + And the maxAmountRequired matches "0.01" in USDC base units + +Scenario: Escrow scheme PaymentRequirements includes authorization metadata + Given the escrow round manager is preparing round 7 + And the reward pool is 60 USDC + When PaymentRequirements are generated for the escrow scheme + Then the scheme field is "escrow" + And the authorizationId references the current round + And the maxAmountRequired equals 60 USDC + And the authorizationExpiry is set to round_end + grace_period +``` + +#### 3b. No "exact" vs "escrow" scheme distinction +The CRD only allows scheme: "exact". But the escrow round lifecycle clearly +uses a different payment flow (authorize/capture/void). 
There is no scenario +showing how the two schemes coexist: +- Worker earns via x402 "exact" scheme (instant per-request payments) +- Platform collects those payments, then uses escrow for reward distribution +- The feature files treat these as independent but never show the handoff + +RECOMMENDED SCENARIO: +```gherkin +Scenario: x402 exact payments flow into escrow pool for next round + Given workers served 200 x402 "exact" scheme requests in round N + And the total collected USDC is 200 + When round N+1 begins + Then the escrow pool is 200 * 30% = 60 USDC + And the escrow authorization uses the "escrow" scheme internally + And workers can verify the pool amount on-chain +``` + +--- + +## 4. MISSING EDGE CASE SCENARIOS (Question 3) + +### 4a. Worker re-registration mid-round +NO SCENARIO EXISTS. Critical gap because ERC-8004 allows updating registration +at any time. + +```gherkin +@erc8004 +Scenario: Worker updates ERC-8004 registration mid-round + Given worker "0xW001" is participating in round 5 + And worker "0xW001" updates their registration JSON to remove the + "devops_mlops/model_versioning" skill + When the round completes + Then worker "0xW001" still qualifies for round 5 (snapshot at round start) + But worker "0xW001" is NOT discovered for round 6 + +Scenario: Worker re-registers with a different address mid-round + Given worker "0xW001" is participating in round 5 + And worker "0xW001" registers a new ERC-8004 token with address "0xW001b" + When the round completes + Then only the original registration is used for round 5 settlement +``` + +### 4b. ERC-8004 NFT transfer during round +NO SCENARIO EXISTS. Since ERC-8004 tokens are NFTs, they can be transferred. 
+This could cause: +- Worker loses ownership of their identity mid-round +- New owner could try to claim rewards +- The payTo address no longer matches the NFT owner + +```gherkin +@erc8004 +Scenario: ERC-8004 NFT transferred during active round + Given worker "0xW001" owns ERC-8004 token 12345 + And worker "0xW001" is a qualifier in round 5 + When token 12345 is transferred to "0xATTACKER" during the round + Then capture() still pays "0xW001" (the address that did the work) + And the new NFT owner "0xATTACKER" receives nothing for round 5 + +Scenario: Discovery uses token ownership snapshot at round start + Given worker "0xW001" owned token 12345 at block 1000 (round start) + And token 12345 was transferred to "0xW002" at block 1005 + When the coordinator discovers workers at round start + Then "0xW001" is the registered worker, not "0xW002" +``` + +### 4c. x402 facilitator timeout +NO SCENARIO EXISTS. The ServiceOffer CRD has maxTimeoutSeconds (default: 300). + +```gherkin +@x402 +Scenario: x402 payment verification times out + Given a ServiceOffer with maxTimeoutSeconds of 300 + And a buyer sends a payment header + When the x402 facilitator does not respond within 300 seconds + Then the payment is considered failed + And the request is rejected with HTTP 402 + And no USDC is deducted from the buyer + +Scenario: x402 facilitator timeout during round does not affect escrow + Given the escrow authorization for round 5 is already locked + And an x402 facilitator timeout occurs during the round + Then the escrow authorization remains valid + And workers can still submit proofs + But the affected request is not counted toward x402 revenue for round 6 +``` + +### 4d. BaseScan rate limiting +NO SCENARIO EXISTS. BaseScan free tier is 5 req/sec. With 18,512 ERC-8004 +holders, pagination + metadata fetching will hit limits. 
+ +```gherkin +@discovery +Scenario: BaseScan API returns HTTP 429 rate limit + Given the coordinator is using BaseScan for discovery + And the BaseScan API returns HTTP 429 after 5 requests + When the coordinator discovers workers + Then the coordinator implements exponential backoff + And retries after the Retry-After header period + And eventually returns partial results with a warning + +Scenario: BaseScan rate limiting causes fallback to 8004scan + Given the coordinator is using BaseScan for discovery + And the BaseScan API consistently returns HTTP 429 + When 3 consecutive retries fail + Then the coordinator falls back to 8004scan.io + And workers are still discovered successfully +``` + +### 4e. Chain reorg affecting indexer data +NO SCENARIO EXISTS. The Reth indexer syncs Base chain data. Base block times are +~2 seconds, and shallow reorgs do happen before L1 finality. + +```gherkin +@discovery +Scenario: Chain reorg removes a recent ERC-8004 registration + Given the Reth indexer has synced to block 1000 + And agent "0xNEW" was registered at block 999 + When a 2-block reorg occurs at block 999 + And the new chain does not contain the registration transaction + Then the indexer re-processes blocks 999-1000 + And agent "0xNEW" is removed from discovery results + +Scenario: Coordinator uses confirmation depth for registration finality + Given the Reth indexer requires 12-block confirmation depth + And a new registration appears at block 1000 + When the current block is 1005 (only 5 confirmations) + Then the registration is not yet included in discovery results + When the current block reaches 1012 + Then the registration becomes discoverable +``` + +--- + +## 5. LEADERBOARD API FEATURE (Question 4) + +YES, a leaderboard feature is needed.
Currently: +- The issue doc says "Exposes leaderboard API (GET /leaderboard, GET /round/:id)" +- The e2e feature mentions it once (line 43): "the leaderboard API shows both + workers with correct earnings" +- But there is NO dedicated feature file testing the leaderboard API + +RECOMMENDED: New file `leaderboard_api.feature`: + +```gherkin +@leaderboard @api +Feature: Leaderboard API + The reward engine exposes a leaderboard API that shows per-worker + earnings, influence, and qualification status across rounds. + + Background: + Given the autoresearch chart is deployed + And rounds 1-3 have completed with verified workers + + Rule: Current round leaderboard shows live state + + Scenario: GET /leaderboard returns ranked workers by cumulative earnings + When a client requests GET /leaderboard + Then workers are returned sorted by total USDC earned descending + And each entry includes address, total_earned, rounds_participated, avg_influence + + Scenario: GET /leaderboard includes only ERC-8004 registered workers + Given worker "0xAAA" is registered on ERC-8004 + And worker "0xBBB" was registered but token was burned + When a client requests GET /leaderboard + Then worker "0xAAA" appears in results + And worker "0xBBB" does not appear + + Rule: Per-round details are queryable + + Scenario: GET /round/:id returns round-specific data + When a client requests GET /round/3 + Then the response includes: + | field | type | + | round_id | integer | + | pool_usdc | decimal | + | num_qualifiers | integer | + | num_excluded | integer | + | captures | array | + | voided_usdc | decimal | + | escrow_tx_hash | string | + + Scenario: GET /round/:id for non-existent round returns 404 + When a client requests GET /round/9999 + Then the response status is 404 + + Rule: Leaderboard reflects escrow settlement accurately + + Scenario: Leaderboard updates only after capture() confirms on-chain + Given round 5 just completed + And capture() has been called for worker "0xAAA" + But the 
transaction is still pending + When a client requests GET /leaderboard + Then worker "0xAAA" earnings do NOT include round 5 yet + When the capture transaction confirms + Then worker "0xAAA" earnings include round 5 +``` + +--- + +## 6. ROUND-OVER-ROUND STATE FEATURE (Question 5) + +YES, a round-over-round state feature is needed. Currently: +- reward.feature line 45 mentions "the unadopted share rolls into the next round" +- The issue doc describes void() returning uncaptured funds for the next round +- But there is NO feature testing the cumulative/rollover mechanics + +RECOMMENDED: New file `round_over_round_state.feature`: + +```gherkin +@rounds @state +Feature: Round-over-round state (pool rollover, cumulative earnings) + The reward engine maintains state across rounds, rolling uncaptured + funds into the next round's pool and tracking cumulative worker + performance. + + Background: + Given the reward pool percentage is 30% + And the platform wallet holds 1000 USDC + + Rule: Uncaptured funds roll into the next round + + Scenario: Voided USDC from round N increases round N+1 pool + Given round 1 collected 200 USDC in x402 payments + And the round 1 pool was 60 USDC + And only 40 USDC was captured (void returned 20 USDC) + When round 2 begins + Then the round 2 pool includes the 20 USDC rollover + And the total round 2 pool is (round2_x402_revenue * 30%) + 20 USDC + + Scenario: Fully captured round has zero rollover + Given round 1 pool was 60 USDC + And all 60 USDC was captured across workers + When round 2 begins + Then the round 2 pool is exactly (round2_x402_revenue * 30%) + + Rule: Cumulative earnings are tracked per worker + + Scenario: Worker earnings accumulate across rounds + Given worker "0xAAA" earned 42 USDC in round 1 + And worker "0xAAA" earned 35 USDC in round 2 + When the cumulative earnings are queried + Then worker "0xAAA" has total earnings of 77 USDC + + Scenario: Worker who skips a round retains prior earnings + Given worker "0xAAA" earned 
42 USDC in round 1 + And worker "0xAAA" did not participate in round 2 + When the cumulative earnings are queried after round 2 + Then worker "0xAAA" still has total earnings of 42 USDC + + Rule: Round numbering is monotonic and gap-free + + Scenario: Failed round start still increments round counter + Given round 5 completed successfully + And round 6 failed to authorize escrow (insufficient funds) + When the next successful authorization occurs + Then it is labeled round 7 (round 6 is recorded as failed) + And round 6 shows zero pool and zero captures in history +``` + +--- + +## 7. ERC-8004 IDENTITY REVOCATION/DEACTIVATION MID-ROUND (Question 6) + +NO SCENARIOS EXIST. This is a critical gap. ERC-8004 tokens can be: +- Burned (destroying the identity) +- Transferred (changing ownership) +- Have their registration JSON updated (removing services) + +RECOMMENDED SCENARIOS (add to discovery.feature or new file): + +```gherkin +@erc8004 @critical +Rule: ERC-8004 identity changes during active round + + Scenario: Worker's ERC-8004 token is burned mid-round + Given worker "0xW001" has ERC-8004 token 12345 + And worker "0xW001" is a qualifier in round 5 with verified proofs + When token 12345 is burned during the round + Then worker "0xW001" STILL receives their capture for round 5 + (work was verified before identity was revoked) + But worker "0xW001" is NOT discovered for round 6 + + Scenario: Worker's ERC-8004 registration JSON is updated to remove services + Given worker "0xW001" has OASF service "devops_mlops/model_versioning" + And worker "0xW001" is participating in round 5 + When worker "0xW001" updates their registration JSON to remove all services + Then round 5 continues using the snapshot taken at round start + And worker "0xW001" is excluded from round 6 discovery + + Scenario: ERC-8004 contract is paused during active round + Given the ERC-8004 contract at 0x8004...9432 is paused + And round 5 has already started with discovered workers + When the round 
completes + Then captures are still executed (escrow is independent of ERC-8004) + But round 6 discovery fails with "ERC-8004 contract paused" error + + Scenario: Worker address is sanctioned/blocklisted mid-round + Given worker "0xW001" is a qualifier in round 5 + When address "0xW001" appears on a sanctions list + Then the escrow round manager skips capture for "0xW001" + And the uncaptured amount is voided back to the platform + And the event is logged with the sanctions reason +``` + +--- + +## 8. ADDITIONAL MISSING SCENARIOS + +### 8a. Concurrent round edge cases +```gherkin +Scenario: Two rounds cannot be active simultaneously + Given round 5 is in progress with active escrow authorization + When the system attempts to start round 6 + Then the start is rejected with "round 5 still active" + And no new escrow authorization is created +``` + +### 8b. Gas price spike during settlement +```gherkin +Scenario: Gas price spike during capture phase + Given round 5 has 10 workers to pay + And 5 captures have succeeded + When the Base gas price exceeds the configured maxGasPrice + Then remaining captures are queued for retry + And the escrow authorization has not expired yet + And captures resume when gas price drops +``` + +### 8c. Zero-worker round +```gherkin +Scenario: Round starts but no workers respond + Given round 5 authorized 60 USDC in escrow + And no workers submitted precommitments + When the round duration expires + Then void() returns all 60 USDC to the platform + And the round is recorded with zero qualifiers +``` + +### 8d. Duplicate ERC-8004 registrations (same address, multiple tokens) +```gherkin +Scenario: Worker holds multiple ERC-8004 tokens with same skill + Given address "0xW001" owns tokens 12345 and 12346 + And both tokens have "devops_mlops/model_versioning" skill + When the coordinator discovers workers + Then "0xW001" appears only once (deduplicated by address) + And the most recent registration is used +``` + +--- + +## 9. 
SUMMARY OF RECOMMENDATIONS + +| Priority | Gap | Affected File(s) | Action | +|----------|-----|-------------------|--------| +| P0 | No tokenURI resolution testing | discovery.feature | Add 3 scenarios | +| P0 | No OASF taxonomy filtering tests | discovery.feature | Add 3 scenarios | +| P0 | No ERC-8004 identity revocation mid-round | NEW: erc8004_identity_lifecycle.feature | Create file | +| P0 | No NFT transfer during round | NEW or discovery.feature | Add 2 scenarios | +| P1 | No PaymentRequirements generation tests | NEW: x402_payment_requirements.feature | Create file | +| P1 | No leaderboard API feature | NEW: leaderboard_api.feature | Create file | +| P1 | No round-over-round state feature | NEW: round_over_round_state.feature | Create file | +| P1 | No BaseScan rate limiting scenarios | discovery.feature | Add 2 scenarios | +| P1 | No chain reorg scenarios | discovery.feature | Add 2 scenarios | +| P2 | No x402 facilitator timeout scenarios | escrow.feature or NEW | Add 2 scenarios | +| P2 | No gas spike during settlement | escrow.feature | Add 1 scenario | +| P2 | E2E scenario too long | e2e.feature | Refactor | +| P3 | Missing @erc8004 @x402 tags | All files | Add tags | +| P3 | Magic numbers without context | escrow.feature, opow.feature | Add comments | + +Total: 6 existing files need enhancements, 3-4 new feature files recommended. diff --git a/docs/issues/features/commit_reveal_work_verification.feature b/docs/issues/features/commit_reveal_work_verification.feature new file mode 100644 index 00000000..cecb9cfc --- /dev/null +++ b/docs/issues/features/commit_reveal_work_verification.feature @@ -0,0 +1,63 @@ +@verification @critical +Feature: Commit-reveal work verification + Workers commit to results via a Merkle root before learning + which nonces will be sampled. This prevents retroactive + fabrication of results. 
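The commit-reveal mechanics exercised by this feature can be sketched in a few lines of Python. This is illustrative only: the leaf encoding, hash function, and nonce-sampling rule are assumptions, not the implementation.

```python
# Illustrative commit-reveal sketch. The worker commits a Merkle root over
# all nonce results; the verifier later samples nonces deterministically
# from a seed and checks membership proofs against the committed root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling-is-left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(root: bytes, leaf: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

def sample_nonces(seed: bytes, total: int, count: int) -> list[int]:
    """Deterministic sampling: the same seed always yields the same nonces."""
    picked, counter = [], 0
    while len(picked) < count:
        n = int.from_bytes(h(seed + counter.to_bytes(4, "big")), "big") % total
        if n not in picked:
            picked.append(n)
        counter += 1
    return sorted(picked)

# Worker commits to 100 results; the verifier samples 5 and checks proofs.
results = [f"nonce={i},val_bpb=3.2".encode() for i in range(100)]
root = merkle_root(results)
sampled = sample_nonces(b"round-seed", total=100, count=5)
assert all(verify_proof(root, results[n], merkle_proof(results, n)) for n in sampled)
```

Because the root is committed before the seed is known, a worker cannot pick which results to fabricate after the fact; a tampered leaf fails `verify_proof` against the committed root.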
+ + Background: + Given the verifier is running + And the neuralnet_optimizer challenge is active + And the sample count is 5 nonces per benchmark + + Rule: Honest workers pass verification + + Scenario: Worker with valid proofs becomes a qualifier + Given worker "0xAAA" precommits a benchmark with 100 nonces + And the verifier assigns a random hash and track + When worker "0xAAA" submits a Merkle root over 100 results + And the verifier samples 5 nonces for verification + And worker "0xAAA" submits valid Merkle proofs for all 5 + Then worker "0xAAA" is recorded as a qualifier + And the benchmark quality scores are accepted + + Scenario: Re-execution confirms claimed quality + Given worker "0xAAA" claims val_bpb of 3.2 for nonce 42 + When the verifier re-executes nonce 42 with the same settings + Then the re-executed val_bpb matches the claimed 3.2 + And the proof is accepted + + Rule: Dishonest workers fail verification + + Scenario: Invalid Merkle proof is rejected + Given worker "0xCCC" submitted a Merkle root + And the verifier sampled nonces [7, 23, 45, 61, 89] + When worker "0xCCC" submits a proof for nonce 23 that does not match the root + Then the verification fails for worker "0xCCC" + And worker "0xCCC" is excluded from qualifiers for this round + And no escrow capture is made for worker "0xCCC" + + Scenario: Worker who inflates quality scores is caught + Given worker "0xCCC" claims val_bpb of 2.8 for nonce 42 + When the verifier re-executes nonce 42 with the same settings + And the re-executed val_bpb is 3.5 + Then the quality mismatch is detected + And the verification fails for worker "0xCCC" + + Scenario: Worker who times out on proof submission is excluded + Given worker "0xCCC" submitted a Merkle root + And the verifier sampled 5 nonces + When worker "0xCCC" does not submit proofs within 300 seconds + Then worker "0xCCC" is excluded from qualifiers + And the round proceeds without them + + Rule: Sampling is fair and deterministic + + Scenario: Nonce 
sampling is deterministic from the round seed + Given the same benchmark settings and random hash + When nonces are sampled twice + Then the same 5 nonces are selected both times + + Scenario: Worker cannot predict which nonces will be sampled + Given the random hash is derived from a future block hash + When the worker commits their Merkle root + Then the sampled nonces have not yet been determined diff --git a/docs/issues/features/end_to_end_autoresearch_round.feature b/docs/issues/features/end_to_end_autoresearch_round.feature new file mode 100644 index 00000000..d7b4fa35 --- /dev/null +++ b/docs/issues/features/end_to_end_autoresearch_round.feature @@ -0,0 +1,59 @@ +@e2e @slow +Feature: End-to-end autoresearch round + A complete round from escrow authorization through worker + experiments to reward distribution and settlement. + + Background: + Given the autoresearch chart is deployed with default values + And an Anvil fork of Base Sepolia is running + And the platform wallet holds 500 USDC + And 2 GPU workers are registered on ERC-8004: + | address | skill | gpu | + | 0xW001 | devops_mlops/model_versioning | NVIDIA T4 | + | 0xW002 | devops_mlops/model_versioning | NVIDIA A10 | + And 1 innovator submitted algorithm "muon-opt-v2" for neuralnet_optimizer + + Scenario: Complete round with two honest workers + # Round setup + Given 100 USDC of x402 payments were collected in the previous round + When a new round begins + Then 30 USDC is authorized in escrow + + # Worker experiments + When worker "0xW001" precommits a benchmark with 50 nonces + And worker "0xW002" precommits a benchmark with 50 nonces + And both workers submit Merkle roots over their results + And the verifier samples 5 nonces from each worker + And both workers submit valid Merkle proofs + Then both workers are recorded as qualifiers + + # Reward calculation + When the round duration expires + Then the reward engine computes influence for both workers + And both workers have balanced challenge 
participation + And influence is split proportionally to qualifier count + + # Settlement + When captures are executed + Then worker "0xW001" receives their earned USDC via the RewardDistributor + And worker "0xW002" receives their earned USDC via the RewardDistributor + And innovator "muon-opt-v2" receives adoption-weighted USDC + And the operator receives 10% of the pool + And void() returns any remainder to the platform wallet + And the leaderboard API shows both workers with correct earnings + And the next round begins with a new authorization + + Scenario: Round where one worker submits fraudulent proofs + Given 100 USDC of x402 payments were collected + When a new round begins + Then 30 USDC is authorized in escrow + + When worker "0xW001" submits valid proofs for all sampled nonces + And worker "0xW002" submits a proof with a quality mismatch + Then worker "0xW001" is a qualifier + And worker "0xW002" is excluded + + When captures are executed + Then worker "0xW001" receives the entire worker pool share + And worker "0xW002" receives nothing + And void() returns the uncaptured remainder to the platform diff --git a/docs/issues/features/erc8004_identity_lifecycle.feature new file mode 100644 index 00000000..42cb8e43 --- /dev/null +++ b/docs/issues/features/erc8004_identity_lifecycle.feature @@ -0,0 +1,105 @@ +@erc8004 @identity +Feature: ERC-8004 identity lifecycle during rounds + Workers are identified by ERC-8004 agent NFTs on Base. + The system must handle registration, metadata updates, + NFT transfers, and deactivation gracefully during + active rounds.
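The snapshot-at-acceptance rule that the scenarios below rely on can be modeled in a few lines. This is a minimal Python sketch with assumed names; the real system would read `ownerOf(tokenId)` and the payout wallet from the registration JSON.

```python
# Illustrative model of snapshot-at-acceptance: reward routing for a round
# uses the owner/payout captured when the precommit was accepted, not live
# on-chain state. Class and field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class IdentitySnapshot:
    token_id: int
    owner: str
    payout_wallet: str

class Registry:
    """Stands in for live ERC-8004 state (ownerOf + registration JSON)."""
    def __init__(self):
        self.owners: dict[int, str] = {}
        self.payouts: dict[int, str] = {}

    def snapshot(self, token_id: int) -> IdentitySnapshot:
        # Frozen copy of current state; later mutations do not affect it.
        return IdentitySnapshot(token_id, self.owners[token_id], self.payouts[token_id])

reg = Registry()
reg.owners[12345] = "0xW001"
reg.payouts[12345] = "0xPAY1"

snap = reg.snapshot(12345)        # taken at precommit acceptance

# Mid-round transfer and payout change...
reg.owners[12345] = "0xNEW"
reg.payouts[12345] = "0xPAY_NEW"

# ...does not redirect this round's reward; the next round snapshots fresh state.
assert snap.payout_wallet == "0xPAY1"
assert reg.snapshot(12345).payout_wallet == "0xPAY_NEW"
```

The frozen dataclass makes the intent explicit: once a benchmark is accepted, the routing data for that round is immutable.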
+ + Background: + Given the ERC-8004 Identity Registry is at "0x8004A169FB4a3325136EB29fA0ceB6D2e539a432" + And the OASF skill filter is "devops_mlops/model_versioning" + + Rule: Only registered agents can participate + + Scenario: Worker with valid ERC-8004 registration is discovered + Given worker "0xW001" holds agent NFT token ID 12345 + And the NFT metadata includes skill "devops_mlops/model_versioning" + And the registration JSON at .well-known/agent-registration.json is valid + When the coordinator discovers workers + Then worker "0xW001" appears in the results + And the worker's x402 endpoint is read from the registration services list + + Scenario: Worker without ERC-8004 registration is excluded + Given worker "0xW002" has no agent NFT + When the coordinator discovers workers + Then worker "0xW002" does not appear in the results + + Scenario: Worker with wrong OASF skill is filtered out + Given worker "0xW003" holds agent NFT token ID 12346 + And the NFT metadata includes skill "communication/chat" but not "devops_mlops/model_versioning" + When the coordinator discovers workers with skill filter + Then worker "0xW003" does not appear in the results + + Rule: Metadata updates are reflected in discovery + + Scenario: Worker updates best_val_bpb in registration metadata + Given worker "0xW001" registered with best_val_bpb of 3.5 + When worker "0xW001" calls URIUpdated with best_val_bpb of 3.1 + And the discovery cache TTL expires + Then the coordinator sees worker "0xW001" with best_val_bpb 3.1 + + Scenario: Worker adds a new OASF skill to their registration + Given worker "0xW004" registered with skill "data_processing/etl" + When worker "0xW004" updates metadata to add "devops_mlops/model_versioning" + Then worker "0xW004" becomes discoverable by the coordinator + + Rule: Identity is snapshotted at benchmark acceptance + + # At the moment a worker's benchmark is accepted (precommit confirmed), + # the verifier snapshots: ownerOf(tokenId), payout wallet 
(from + # registration JSON), and registration metadata. All reward routing + # for that round uses the SNAPSHOT, not live on-chain state. + # This prevents mid-round transfers or metadata updates from + # redirecting or nullifying rewards after work is accepted. + + Scenario: Snapshot captures payout wallet at benchmark acceptance + Given worker "0xW001" holds agent NFT token ID 12345 + And the registration JSON lists payout wallet "0xPAY1" + When worker "0xW001"'s precommit is accepted by the verifier + Then the verifier snapshots owner "0xW001" and payout "0xPAY1" + And rewards for this round are routed to "0xPAY1" regardless of later changes + + Scenario: NFT transferred mid-round does not redirect rewards + Given worker "0xW001" is a qualifier in the current round + And the snapshot records payout wallet "0xPAY1" + When worker "0xW001" transfers their agent NFT to "0xNEW" + And "0xNEW" updates the registration payout to "0xPAY_NEW" + And the round completes + Then the RewardDistributor sends "0xW001"'s share to "0xPAY1" + And "0xNEW" is the registered owner for subsequent rounds + + Scenario: Worker deactivates registration mid-round + Given worker "0xW001" is a qualifier in the current round + And worker "0xW001" sets registration active=false + When the round completes + Then the distribution includes "0xW001"'s verified work from this round + And "0xW001" is excluded from discovery in the next round + + Scenario: Metadata URI update mid-round does not affect current snapshot + Given worker "0xW001" is a qualifier with snapshotted best_val_bpb 3.2 + When worker "0xW001" calls URIUpdated with best_val_bpb 2.8 + Then the current round still uses the snapshotted 3.2 + And the next round's discovery will reflect 2.8 + + Scenario: Burned agent NFT removes worker from future rounds only + Given worker "0xW001" holds agent NFT token ID 12345 + And worker "0xW001" is a qualifier in the current round + When the NFT is burned (transferred to address zero) + Then the 
current round's rewards are still distributed per snapshot + And worker "0xW001" is removed from all discovery backends + And "0xW001" cannot participate in subsequent rounds + + Rule: Registration JSON schema is validated + + Scenario: Malformed registration JSON is rejected + Given worker "0xW005" has a tokenURI pointing to invalid JSON + When the discovery client fetches the registration + Then worker "0xW005" is skipped with a schema validation warning + And discovery continues with remaining workers + + Scenario: Registration JSON with missing x402 endpoint is skipped + Given worker "0xW006" has valid registration JSON + But the services list contains no x402-compatible endpoint + When the coordinator discovers workers + Then worker "0xW006" is excluded + And a warning is logged about missing x402 endpoint diff --git a/docs/issues/features/escrow_round_lifecycle.feature b/docs/issues/features/escrow_round_lifecycle.feature new file mode 100644 index 00000000..15490a9f --- /dev/null +++ b/docs/issues/features/escrow_round_lifecycle.feature @@ -0,0 +1,87 @@ +@escrow @critical +Feature: Escrow round lifecycle + The escrow round manager locks USDC in the Commerce Payments + AuthCaptureEscrow contract at the start of each round and + distributes earnings to verified workers at round end. 
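A round's escrow accounting can be checked numerically. The sketch below uses integer micro-USDC (6 decimals) and basis points, and assumes the protocol fee is deducted from the captured amount before the RewardDistributor splits the remainder; function names and the fee ordering are assumptions, not the contract spec.

```python
# Illustrative round accounting for authorize/capture/void. Amounts are
# integer micro-USDC; influence is in basis points. Assumes a 2% fee is
# taken out of the captured amount before distribution (assumption).
USDC = 1_000_000  # micro-USDC per USDC

def pool_size(x402_revenue: int, pool_bps: int = 3_000, rollover: int = 0) -> int:
    """Reward pool = configured share of last round's x402 revenue + rollover."""
    return x402_revenue * pool_bps // 10_000 + rollover

def settle(authorized: int, worker_pool: int, influence_bps: dict[str, int],
           fee_bps: int = 200) -> tuple[dict[str, int], int, int]:
    """Capture the worker pool, deduct the fee, split by influence, void the rest."""
    fee = worker_pool * fee_bps // 10_000
    net = worker_pool - fee
    payouts = {w: net * bps // 10_000 for w, bps in influence_bps.items()}
    voided = authorized - worker_pool
    return payouts, fee, voided

assert pool_size(200 * USDC) == 60 * USDC          # Background: 30% pool

payouts, fee, voided = settle(
    authorized=100 * USDC,
    worker_pool=70 * USDC,
    influence_bps={"0xAAA": 6_000, "0xBBB": 4_000},
)
assert fee == 1_400_000                            # 2% of the 70 USDC capture
assert payouts["0xAAA"] == 41_160_000              # 60% of the net 68.6 USDC
assert payouts["0xBBB"] == 27_440_000              # 40% of the net 68.6 USDC
assert voided == 30 * USDC                         # returned to the platform
```

Integer arithmetic avoids the rounding drift that floating-point USDC amounts would introduce into on-chain transfers.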
+ + Background: + Given the autoresearch chart is deployed on a k3s cluster + And an Anvil fork of Base Sepolia is running + And the platform wallet holds 1000 USDC + And the AuthCaptureEscrow contract is at "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff" + And the reward pool percentage is 30% + + Rule: Funds must be locked before any work begins + + Scenario: Round starts with successful escrow authorization + Given 200 USDC of x402 payments were collected in the previous round + When a new round begins + Then the escrow round manager calls authorize() for 60 USDC + And the AuthCaptureEscrow capturableAmount equals 60 USDC + And the authorizationExpiry is set to round end plus 1 hour grace + And workers can verify the commitment on-chain + + Scenario: Round start fails when platform wallet has insufficient USDC + Given the platform wallet holds 0 USDC + When a new round begins + Then the escrow round manager logs an authorization failure + And no work is accepted for this round + And the previous round's uncaptured funds are not affected + + Rule: Workers are paid proportionally to verified influence + + # NOTE: The AuthCaptureEscrow receiver is FIXED per PaymentInfo. + # All captures from one authorize() go to the SAME receiver address. + # We use a RewardDistributor contract as the single receiver, + # which then splits USDC to individual workers via ERC20 transfers. 
+ # + # Flow: authorize(receiver=RewardDistributor) → capture(full worker pool) + → RewardDistributor.distribute(workers[], amounts[]) + + Scenario: Two verified workers receive proportional rewards + Given a round with 100 USDC authorized in escrow + And the escrow receiver is the RewardDistributor contract + And worker "0xAAA" has 60% influence + And worker "0xBBB" has 40% influence + And both workers passed commit-reveal verification + When the round completes + Then capture() is called once for 70 USDC to the RewardDistributor + And the platform fee receiver gets 2% of the capture (1.4 USDC) + And the RewardDistributor receives the net 68.6 USDC + And the RewardDistributor transfers 41.16 USDC (60%) to "0xAAA" + And the RewardDistributor transfers 27.44 USDC (40%) to "0xBBB" + And void() is called for the remaining 30 USDC + And the remaining USDC returns to the platform wallet + + Scenario: Unverified worker receives no distribution + Given a round with 100 USDC authorized in escrow + And worker "0xAAA" passed verification with 100% influence + And worker "0xCCC" failed commit-reveal verification + When the round completes + Then capture() sends the worker pool to the RewardDistributor + And the RewardDistributor transfers funds only to "0xAAA" + And worker "0xCCC" receives nothing + And void() returns the uncaptured remainder to the platform wallet + + Scenario: Round with no verified workers voids entirely + Given a round with 100 USDC authorized in escrow + And no workers submitted valid proofs + When the round completes + Then no capture() is called + And void() returns the full 100 USDC to the platform wallet + + Rule: Funds are always recoverable + + Scenario: Platform reclaims funds after manager crash + Given a round with 100 USDC authorized in escrow + And the escrow round manager process has crashed + When the authorizationExpiry passes + Then the platform wallet calls reclaim() directly + And the full 100 USDC returns to the platform wallet + And no operator signature is required + + Scenario: Operator refunds a worker after
post-capture fraud discovery + Given worker "0xAAA" received a 42 USDC capture in round 5 + And fraud is discovered within the refund window + When the operator calls refund() for 42 USDC + Then 42 USDC returns to the platform wallet + And the refund is recorded in the round history diff --git a/docs/issues/features/leaderboard_api.feature b/docs/issues/features/leaderboard_api.feature new file mode 100644 index 00000000..e8802583 --- /dev/null +++ b/docs/issues/features/leaderboard_api.feature @@ -0,0 +1,66 @@ +@leaderboard @api +Feature: Leaderboard API + The reward engine exposes a REST API showing per-round + rankings, cumulative earnings, and worker performance + history. + + Background: + Given the reward engine is running + And 3 completed rounds exist in the history + + Rule: Current round leaderboard reflects live state + + Scenario: Leaderboard shows workers ranked by influence + Given round 4 is in progress + And worker "0xAAA" has influence 0.45 + And worker "0xBBB" has influence 0.35 + And worker "0xCCC" has influence 0.20 + When GET /leaderboard is called + Then the response contains 3 workers in descending influence order + And each entry includes worker address, influence, and estimated reward + + Scenario: Leaderboard includes innovator rankings + Given algorithm "muon-v3" has 60% adoption + And algorithm "adamw-base" has 40% adoption + When GET /leaderboard?role=innovator is called + Then the response shows innovators ranked by adoption percentage + + Rule: Historical round data is queryable + + Scenario: Completed round data includes settlement details + When GET /round/3 is called + Then the response includes: + | field | description | + | round_id | 3 | + | pool_amount | total USDC in the reward pool | + | worker_rewards | per-worker capture amounts | + | innovator_rewards | per-innovator adoption earnings | + | operator_reward | operator share | + | escrow_tx_hash | authorize() transaction hash | + | capture_tx_hashes | list of capture() 
transaction hashes| + | void_tx_hash | void() transaction hash | + | round_start | ISO 8601 timestamp | + | round_end | ISO 8601 timestamp | + + Scenario: Round history respects retention limit + Given the retention is set to 100 rounds + And 150 rounds have completed + When GET /round/10 is called + Then a 404 is returned + When GET /round/51 is called + Then the round data is returned + + Rule: Cumulative earnings are tracked per participant + + Scenario: Worker cumulative earnings span multiple rounds + Given worker "0xAAA" earned 42 USDC in round 1 + And worker "0xAAA" earned 35 USDC in round 2 + And worker "0xAAA" earned 50 USDC in round 3 + When GET /leaderboard?cumulative=true is called + Then worker "0xAAA" shows total earnings of 127 USDC + + Scenario: Leaderboard is empty before first round completes + Given no rounds have completed yet + When GET /leaderboard is called + Then the response contains an empty workers list + And the response includes round_in_progress=true diff --git a/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature b/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature new file mode 100644 index 00000000..0a64940a --- /dev/null +++ b/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature @@ -0,0 +1,47 @@ +@discovery +Feature: Multi-tier worker discovery with fallback + The coordinator discovers GPU workers through a prioritized + chain of discovery backends. If the preferred backend is + unavailable, it falls back to the next tier automatically. 
+ + Background: + Given the OASF skill filter is "devops_mlops/model_versioning" + + Rule: Discovery uses the highest-priority available backend + + Scenario: Coordinator uses Reth indexer when available + Given the reth-erc8004-indexer is deployed in the cluster + And the indexer has synced past the latest registration + When the coordinator discovers workers + Then the query goes to the Reth indexer API + And workers with the model_versioning skill are returned + + Scenario: Coordinator falls back to BaseScan when indexer is down + Given the reth-erc8004-indexer is not deployed + And a BaseScan API key is configured + When the coordinator discovers workers + Then the query goes to the BaseScan API + And ERC-8004 NFT metadata is read for each agent + And workers with the model_versioning skill are returned + + Scenario: Coordinator falls back to 8004scan as last resort + Given the reth-erc8004-indexer is not deployed + And no BaseScan API key is configured + When the coordinator discovers workers + Then the query goes to 8004scan.io + And workers with the model_versioning skill are returned + + Scenario: All backends unavailable produces a clear error + Given no discovery backends are reachable + When the coordinator discovers workers + Then a "no discovery backend available" error is returned + And the round proceeds with zero workers + + Rule: Discovery results are cached to reduce API calls + + Scenario: Repeated queries within TTL use cached results + Given the cache TTL is 300 seconds + And a discovery query succeeded 60 seconds ago + When the coordinator discovers workers again + Then no external API call is made + And the cached results are returned diff --git a/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature b/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature new file mode 100644 index 00000000..e4bc40b4 --- /dev/null +++ 
b/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature @@ -0,0 +1,92 @@ +@opow @critical +Feature: OPOW influence calculation with anti-monopoly parity + The reward engine computes per-worker influence using a parity + formula that penalizes concentration on a single challenge. + Workers must diversify across all active challenges to maximize + their earnings. + + Background: + Given the imbalance multiplier is 3.0 + + Rule: Diversified workers earn more than concentrated workers + + Scenario: Equally diversified worker has zero penalty + Given 4 active challenges + And worker "0xAAA" has qualifier fractions: + | challenge | fraction | + | c001 | 0.25 | + | c002 | 0.25 | + | c003 | 0.25 | + | c004 | 0.25 | + When influence is calculated + Then worker "0xAAA" imbalance is 0.0 + And worker "0xAAA" penalty factor is 1.0 + + Scenario: Fully concentrated worker is severely penalized + Given 4 active challenges + And worker "0xBBB" has qualifier fractions: + | challenge | fraction | + | c001 | 1.00 | + | c002 | 0.00 | + | c003 | 0.00 | + | c004 | 0.00 | + When influence is calculated + Then worker "0xBBB" imbalance is 1.0 + And the imbalance-multiplier product is 3.0 + And worker "0xBBB" penalty factor is less than 0.05 + + Scenario: Concentrated worker earns less despite equal total output + Given 2 active challenges and a worker pool of 100 USDC + And worker "0xAAA" submitted 50 proofs to c001 and 50 to c002 + And worker "0xBBB" submitted 100 proofs to c001 and 0 to c002 + When influence is calculated and rewards are distributed + Then worker "0xAAA" earns more than worker "0xBBB" + And the ratio of earnings exceeds 5:1 + + Scenario Outline: Parity penalty scales with concentration + Given 2 active challenges + And a worker has qualifier fractions <f1> and <f2> + When influence is calculated + Then the penalty factor is approximately <penalty> + + Examples: + | f1 | f2 | penalty | + | 0.50 | 0.50 | 1.00 | + | 0.70 | 0.30 | 0.62 | + | 0.90 | 0.10 | 0.15 | + |
1.00 | 0.00 | 0.05 | + + Rule: Influence values are normalized across all workers + + Scenario: Total influence sums to 1.0 + Given 3 workers with varying qualifier fractions + When influence is calculated for all workers + Then the sum of all influence values equals 1.0 + + Scenario: Single worker in a round gets full influence + Given 1 worker who participated in all active challenges + When influence is calculated + Then that worker's influence is 1.0 + And they receive the entire worker pool + + Rule: Single-challenge rounds disable the parity penalty + + Scenario: With only one active challenge all workers get zero imbalance + Given 1 active challenge + And worker "0xAAA" has qualifier fraction 0.8 in c001 + And worker "0xBBB" has qualifier fraction 0.2 in c001 + When influence is calculated + Then worker "0xAAA" imbalance is 0.0 + And worker "0xBBB" imbalance is 0.0 + And influence is proportional to qualifier count only + + Rule: New challenges phase in gradually + + Scenario: Newly added challenge does not immediately penalize existing workers + Given 2 active challenges c001 and c002 + And challenge c003 is added with a phase-in period of 100 blocks + And worker "0xAAA" has proofs in c001 and c002 but not c003 + When influence is calculated at block 10 of the phase-in + Then worker "0xAAA" receives a blended penalty + And the c003 weight is 10% of its final weight + And the penalty is less severe than after full phase-in diff --git a/docs/issues/features/reward_pool_distribution_across_roles.feature b/docs/issues/features/reward_pool_distribution_across_roles.feature new file mode 100644 index 00000000..3db33d85 --- /dev/null +++ b/docs/issues/features/reward_pool_distribution_across_roles.feature @@ -0,0 +1,59 @@ +@rewards +Feature: Reward pool distribution across roles + The reward engine splits the pool among innovators, workers, + and operators according to configured percentages. Worker + distribution is influence-weighted. 
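The parity and gamma values tabulated in the Examples (the opow feature above, and the gamma scaling rule in this feature) can be reproduced numerically. The formulas below are inferred from the tabulated values and should be treated as assumptions, not the authoritative definitions: imbalance = (C·Σfᵢ² − 1)/(C − 1), penalty = e^(−multiplier·imbalance), gamma(n) = a − b·e^(−c·n).

```python
# Reproduces the opow parity Examples and the gamma scaling Examples,
# assuming TIG-style formulas inferred from the tabulated values.
import math

def imbalance(fractions: list[float]) -> float:
    c = len(fractions)
    if c < 2:
        return 0.0  # single-challenge rounds disable the parity penalty
    return (c * sum(f * f for f in fractions) - 1.0) / (c - 1.0)

def penalty(fractions: list[float], multiplier: float = 3.0) -> float:
    return math.exp(-multiplier * imbalance(fractions))

def gamma(n: int, a: float = 1.0, b: float = 0.5, c: float = 0.3) -> float:
    """Assumed form a - b*exp(-c*n); matches the 0.63/0.80/0.94 examples."""
    return a - b * math.exp(-c * n)

# Parity Examples: | f1 | f2 | penalty |
assert round(penalty([0.50, 0.50]), 2) == 1.00
assert round(penalty([0.70, 0.30]), 2) == 0.62
assert round(penalty([0.90, 0.10]), 2) == 0.15
assert penalty([1.00, 0.00]) < 0.05          # exp(-3) ≈ 0.0498

# Fully concentrated across 4 challenges has imbalance exactly 1.0.
assert imbalance([1.0, 0.0, 0.0, 0.0]) == 1.0

# Gamma Examples: | n | gamma |
assert round(gamma(1), 2) == 0.63
assert round(gamma(3), 2) == 0.80
assert round(gamma(7), 2) == 0.94
```

That all eight tabulated values fall out of two small closed forms is what makes the "magic numbers" in the Examples auditable.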
Innovator distribution + is adoption-weighted. + + Background: + Given the pool split is 20% innovators, 70% workers, 10% operators + And a round with 100 USDC in the reward pool + + Rule: Pool splits match configured percentages + + Scenario: Standard round distributes to all three roles + When the round completes with verified workers + Then 20 USDC is allocated to innovators + And 70 USDC is allocated to workers + And 10 USDC is allocated to operators + + Rule: Workers earn by influence + + Scenario: Workers are paid proportionally to influence + Given the worker pool is 70 USDC + And worker "0xAAA" has influence 0.6 + And worker "0xBBB" has influence 0.4 + When worker rewards are distributed + Then worker "0xAAA" earns 42 USDC + And worker "0xBBB" earns 28 USDC + + Rule: Innovators earn by adoption + + Scenario: Algorithm author earns when workers adopt their code + Given the innovator pool is 20 USDC for the neuralnet_optimizer challenge + And algorithm "fast-muon-v3" by innovator "0xINN1" has 75% adoption + And algorithm "baseline-adamw" by innovator "0xINN2" has 25% adoption + When innovator rewards are distributed + Then innovator "0xINN1" earns 15 USDC + And innovator "0xINN2" earns 5 USDC + + Scenario: Unadopted algorithm earns nothing + Given the innovator pool is 20 USDC + And algorithm "untested-v1" has 0% adoption + When innovator rewards are distributed + Then the author of "untested-v1" earns 0 USDC + And the unadopted share rolls into the next round + + Rule: Gamma scaling adjusts for challenge count + + Scenario Outline: Reward scales with number of active challenges + Given gamma parameters a=1.0, b=0.5, c=0.3 + And <n> challenges are active + When the gamma value is calculated + Then the scaling factor is approximately <gamma> + + Examples: + | n | gamma | + | 1 | 0.63 | + | 3 | 0.80 | + | 7 | 0.94 | diff --git a/docs/issues/features/round_state_continuity.feature new file mode 100644 index
00000000..ce9d1adc --- /dev/null +++ b/docs/issues/features/round_state_continuity.feature @@ -0,0 +1,68 @@ +@rounds @state +Feature: Round-over-round state continuity + The reward pool, uncaptured funds, and participant state + carry over correctly between rounds. No funds are lost + or double-counted during transitions. + + Background: + Given the autoresearch chart is deployed + And the reward pool percentage is 30% + + Rule: Uncaptured funds roll into the next round + + Scenario: Voided funds increase the next round's pool + Given round 1 had 100 USDC in the pool + And 70 USDC was captured to workers + And void() returned 30 USDC to the platform wallet + And 50 USDC of new x402 payments arrived during round 1 + When round 2 begins + Then the pool for round 2 is 15 USDC from new payments plus the 30 USDC rollover + And authorize() locks 45 USDC in escrow + + Scenario: Unadopted innovator share rolls into next round + Given round 1 had 20 USDC in the innovator pool + And algorithm "untested-v1" had 0% adoption + And 5 USDC of the innovator pool was unadopted + When round 2 begins + Then the unadopted 5 USDC is added to round 2's innovator pool + + Rule: Round transitions are atomic + + Scenario: Round boundary leaves no gap where work goes unrecorded + Given round 1 is ending + And worker "0xAAA" submits a proof at the round boundary + When the round transitions + Then the proof is attributed to round 1 if submitted before the cutoff + And attributed to round 2 if submitted after the cutoff + And the proof is never lost or double-counted + + Scenario: Authorize for new round happens after void of previous round + Given round 1 is completing + When captures and void are executed for round 1 + Then authorize() for round 2 is called only after void() confirms + And there is no period where two rounds have active escrow authorizations + + Rule: Worker state resets each round + + Scenario: Worker's influence is recalculated fresh each round + Given worker "0xAAA" had 80%
influence in round 1 + And worker "0xBBB" joins in round 2 with equal qualifier count + When round 2 influence is calculated + Then round 1 influence values have no effect + And both workers compete on round 2 qualifiers only + + Scenario: Worker who was excluded in round N can rejoin in round N+1 + Given worker "0xCCC" failed verification in round 3 + And worker "0xCCC" received no capture in round 3 + When round 4 begins + Then worker "0xCCC" is eligible to submit benchmarks + And their round 3 failure does not affect round 4 influence + + Rule: Platform wallet balance is tracked across rounds + + Scenario: Cumulative earnings are auditable from on-chain events + Given 5 rounds have completed + When the audit script reads all authorize/capture/void/reclaim events + Then the sum of all captures equals total worker + innovator + operator payouts + And the sum of all voids equals total uncaptured rollover + And the platform wallet balance matches expected remainder diff --git a/docs/issues/issue-autoresearch-helm-chart.md b/docs/issues/issue-autoresearch-helm-chart.md new file mode 100644 index 00000000..87064d7d --- /dev/null +++ b/docs/issues/issue-autoresearch-helm-chart.md @@ -0,0 +1,1288 @@ +# Autoresearch infrastructure Helm chart with verified reward distribution + +## Summary + +Extract the autoresearch components from PR #288 into a standalone Helm chart that adds a round-based reward engine, commit-reveal work verification, and escrow-based reward settlement using the x402 Commerce Payments Protocol. This chart depends on the reth-erc8004-indexer (or its BaseScan/8004scan fallback) for worker discovery and on the base stack for x402 payment settlement. + +## Motivation + +PR #288 introduced the autoresearch coordinator, worker, and publish skills as embedded agent skills. This was the right starting point for validating the flow, but the economic layer — how workers get paid fairly, how results are verified, and how bad actors are penalized — is missing. 
+ +Today's gaps: + +1. **Workers self-report val_bpb** — no independent verification of claimed results +2. **Direct 1:1 payment** — buyer pays seller, no reward pool or merit-based distribution +3. **No skin-in-the-game** — workers can submit garbage with no penalty +4. **Naive worker selection** — coordinator picks first available, not best performer +5. **No anti-monopoly** — a single well-resourced worker can capture all experiments +6. **Local provenance only** — results stored on disk, no on-chain attestation + +The autoresearch Helm chart addresses all six by adding infrastructure-level components that the skills can rely on. + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ obol-stack cluster │ +│ │ +│ ┌─────────────────┐ ┌───────────────────────────────────────┐ │ +│ │ base chart │ │ autoresearch chart │ │ +│ │ │ │ │ │ +│ │ traefik │ │ ┌─────────────┐ ┌───────────────┐ │ │ +│ │ x402-verifier │ │ │ reward │ │ challenge │ │ │ +│ │ ollama │ │ │ engine │ │ registry │ │ │ +│ │ litellm │ │ │ (per-round │ │ (configmap) │ │ │ +│ │ │ │ │ OPOW calc) │ │ │ │ │ +│ └────────┬────────┘ │ └──────┬──────┘ └───────────────┘ │ │ +│ │ │ │ │ │ +│ │ x402 │ ┌──────▼──────┐ ┌───────────────┐ │ │ +│ │ payments │ │ verifier │ │ escrow round │ │ │ +│ │ │ │ (commit- │ │ manager │ │ │ +│ │ │ │ reveal │ │ (authorize/ │ │ │ +│ │ │ │ proofs) │ │ capture/void)│ │ │ +│ │ │ └─────────────┘ └───────────────┘ │ │ +│ │ │ │ │ +│ │ └───────────────────────────────────────┘ │ +│ │ │ │ +│ ┌────────▼────────────┐ ┌───────▼───────────┐ │ +│ │ discovery │ │ GPU workers │ │ +│ │ (reth-indexer / │ │ (ServiceOffers │ │ +│ │ BaseScan / │ │ with x402 gate) │ │ +│ │ 8004scan) │ │ │ │ +│ └─────────────────────┘ └────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Component Responsibilities + +**Reward Engine** — runs per-round reward calculation: +- Reads x402 payment logs from the round period +- 
Reads worker qualifier data from the verifier +- Computes influence per worker using OPOW-style parity formula +- Instructs the escrow round manager on per-worker capture amounts +- Exposes leaderboard API (GET /leaderboard, GET /round/:id) + +**Challenge Registry** — ConfigMap defining active challenges: +- Challenge parameters (quality metric, difficulty, tracks) +- Instance generation seeds (deterministic from block hash + nonce) +- Quality thresholds and verification parameters +- Lifecycle: challenges added/retired via values.yaml updates + +**Verifier** — commit-reveal work verification: +- Workers submit Merkle root of results before knowing which will be sampled +- Verifier randomly samples N nonces for re-execution +- Workers submit Merkle proofs for sampled nonces +- Proofs verified against committed root — unverified workers receive no capture + +**Escrow Round Manager** — manages per-round USDC settlement via x402 Commerce Payments: +- At round start: calls authorize() to lock the round's reward pool in escrow +- At round end: calls capture() per worker for their earned amount +- Uncaptured funds: void() returns them to the pool for the next round +- Safety net: reclaim() recovers funds if the manager fails +- See "Escrow-Based Reward Settlement" section below for full design + +## Escrow-Based Reward Settlement + +### Why Not a Custom Escrow Smart Contract + +A bespoke `USDCEscrow.sol` for worker bond/slash would require: +- Writing, auditing, and deploying a novel Solidity contract +- Taking on security liability for custom code handling real funds +- Requiring workers to lock upfront capital (barrier to entry) +- Managing adversarial slash mechanics (workers may refuse to participate) +- Paying gas costs for deposit/withdraw/slash per worker per round + +This is the highest-risk, highest-cost path. Instead, we use infrastructure that already exists and is already audited. 
+ +### The x402 Commerce Payments Protocol + +Base (Coinbase) maintains the Commerce Payments Protocol — a set of audited smart contracts for escrow-based payments. These contracts are: + +- **Already deployed** on Base Mainnet and Base Sepolia at deterministic addresses +- **Audited 5 times**: 3x by Coinbase Protocol Security, 2x by Spearbit +- **Battle-tested**: used by Coinbase Commerce for merchant payments +- **Zero deployment cost**: we call existing contracts, not deploy new ones + +#### Deployed Contract Addresses (same on mainnet + sepolia) + +| Contract | Address | Purpose | +|----------|---------|---------| +| AuthCaptureEscrow | `0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff` | Core escrow: authorize/capture/void/reclaim/refund | +| ERC3009PaymentCollector | `0x0E3dF9510de65469C4518D7843919c0b8C7A7757` | Collects USDC via ERC-3009 receiveWithAuthorization | +| Permit2PaymentCollector | `0x992476B9Ee81d52a5BdA0622C333938D0Af0aB26` | Collects USDC via Permit2 signatures | +| PreApprovalPaymentCollector | `0x1b77ABd71FCD21fbe2398AE821Aa27D1E6B94bC6` | Pre-approved payment collection | +| OperatorRefundCollector | `0x934907bffd0901b6A21e398B9C53A4A38F02fa5d` | Handles refund flows | + +References: +- Contracts repo: https://github.com/base/commerce-payments +- x402 escrow scheme spec (WIP): https://github.com/coinbase/x402/pull/1425 +- x402 escrow scheme issue: https://github.com/coinbase/x402/issues/839 +- Reference TypeScript implementation: https://github.com/BackTrackCo/x402r-scheme (npm: @x402r/evm) + +#### AuthCaptureEscrow Lifecycle Functions + +```solidity +// Core data structure — all fields are client-signed, tamper-proof +struct PaymentInfo { + address operator; // who can capture/void/refund + address payer; // reward pool wallet + address receiver; // fixed receiver for ALL captures (the RewardDistributor) + address token; // USDC contract address + uint120 maxAmount; // total pool for this round + uint48 preApprovalExpiry; // deadline to submit authorization + uint48 
authorizationExpiry; // deadline to capture; after this payer reclaims + uint48 refundExpiry; // deadline for post-capture refunds + uint16 minFeeBps; // fee floor (basis points) + uint16 maxFeeBps; // fee ceiling (basis points) + address feeReceiver; // platform fee recipient + uint256 salt; // unique per-round nonce +} + +// Round start: lock USDC in escrow +function authorize( + PaymentInfo calldata paymentInfo, + uint120 amount, // amount to lock (up to maxAmount) + address tokenCollector, // ERC3009PaymentCollector address + bytes calldata collectorData // ERC-3009 receiveWithAuthorization signature +) external nonReentrant; + +// Round end: pay the RewardDistributor the total earned amount +function capture( + PaymentInfo calldata paymentInfo, + uint120 amount, // total pool to distribute + uint16 feeBps, // platform fee (within signed bounds) + address feeReceiver // platform fee recipient +) external nonReentrant; +// IMPORTANT: receiver is FIXED in PaymentInfo. All captures from one +// authorize() go to the SAME receiver. For multi-worker distribution, +// set receiver = RewardDistributor contract, then call distribute(). +// Can be called MULTIPLE TIMES (partial captures), sum <= maxAmount. +// Must be called BEFORE authorizationExpiry. + +// Round end: return uncaptured funds to pool +function void( + PaymentInfo calldata paymentInfo +) external nonReentrant; +// Returns ALL remaining escrowed funds (capturableAmount) to payer. +// Can be called AFTER partial captures — only returns what remains. +// Callable by operator at any time (no expiry gate on void itself). 
+ +// Safety net: payer self-recovers if operator disappears +function reclaim( + PaymentInfo calldata paymentInfo +) external nonReentrant; +// Only callable AFTER authorizationExpiry has passed +// No operator needed — payer calls directly + +// Post-capture correction: return captured funds +function refund( + PaymentInfo calldata paymentInfo, + uint120 amount, + address refundCollector, + bytes calldata collectorData +) external nonReentrant; +// Only within refundExpiry window +// amount <= refundableAmount (previously captured) +``` + +Expiry ordering enforced by contract: `preApprovalExpiry <= authorizationExpiry <= refundExpiry` + +### Inverted Trust Model — Why It's Better + +The Commerce Payments escrow protects the **payer** (the reward pool), not the service provider (the worker). This inversion is actually the superior design for our use case: + +``` +TRADITIONAL BOND MODEL (what we rejected): + Worker posts capital → Platform slashes on fraud → Worker loses money + Problems: + - Workers need upfront capital (barrier to entry) + - Slashing is adversarial (discourages participation) + - Custom contract (security liability) + - Gas per deposit/withdraw/slash + +INVERTED ESCROW MODEL (what we use): + Platform locks reward pool → Workers earn by doing verified work → Platform captures proportionally + Advantages: + - Workers need ZERO upfront capital + - "Penalty" for bad work = not getting paid (natural, non-adversarial) + - Uses 5x-audited contracts (zero security liability) + - Gas only for authorize (once/round) + capture (once/worker) + - reclaim() = platform safety net if manager crashes + - refund() = can recover funds if fraud discovered post-capture +``` + +The economic effect is equivalent: +- In the bond model: bad worker loses $X deposit +- In the escrow model: bad worker earns $0 while honest workers split their share +- Both create strong incentives for honest work, but the escrow model does it without requiring workers to risk capital + +### 
Per-Round Escrow Flow + +``` +ROUND START: + │ + │ 1. Reward engine calculates pool for this round + │ pool = sum(x402_payments_last_round) * pool_percentage + │ + │ 2. Platform wallet signs ERC-3009 receiveWithAuthorization + │ to: AuthCaptureEscrow (0xBdEA...0cff) + │ amount: pool_amount (e.g., 100 USDC) + │ validAfter: now + │ validBefore: now + round_duration + grace_period + │ + │ 3. Escrow round manager calls authorize() + │ maxAmount: pool_amount + │ authorizationExpiry: round_end + 1 hour (grace for computation) + │ refundExpiry: round_end + 24 hours (fraud discovery window) + │ operator: escrow_round_manager_address + │ + │ Funds are now LOCKED in TokenStore. Platform cannot spend them + │ on anything else. Workers can see the commitment on-chain. + │ + ▼ +DURING ROUND: + │ + │ Workers submit experiments via existing coordinator loop: + │ THINK → CLAIM → RUN → VERIFY (commit-reveal proofs) + │ + │ Verifier records qualifiers per worker per challenge. + │ No money moves during the round. + │ + ▼ +ROUND END: + │ + │ 4. Reward engine computes per-worker earnings: + │ influence[i] = OPOW_formula(qualifiers, parity_penalty) + │ reward[i] = worker_pool * influence[i] + │ + │ 5. Settlement via RewardDistributor: + │ NOTE: AuthCaptureEscrow receiver is FIXED per PaymentInfo. + │ All captures go to the SAME address. We set receiver = + │ RewardDistributor contract, then distribute to workers. + │ + │ a) capture(paymentInfo, worker_total + innovator_total + operator_total, + │ feeBps, feeReceiver) + │ → USDC moves from escrow to RewardDistributor + │ → Platform fee (e.g., 2%) to feeReceiver + │ b) RewardDistributor.distribute( + │ workers[], worker_amounts[], + │ innovators[], innovator_amounts[], + │ operator, operator_amount) + │ → ERC20 transfers from distributor to each participant + │ + │ 6. 
Return uncaptured funds: + │ void(paymentInfo) + │ → Remaining USDC (pool - captured) returns to platform + │ → void() has no expiry gate — callable any time by operator + │ → Rolls into next round's pool + │ + │ 7. If manager crashes or bugs out: + │ After authorizationExpiry, platform calls reclaim() + │ → ALL remaining escrowed funds return safely + │ → Reclaim does NOT affect already-captured amounts + │ → Already-captured funds are in RewardDistributor + │ (operator can call emergency withdraw if needed) + │ + ▼ +NEXT ROUND starts. Repeat from step 1. +``` + +### On-Chain Verification of Commitment + +Workers can independently verify that the reward pool is committed before doing work: + +``` +On-chain check (any worker, any block explorer): + 1. Read PaymentState for the current round's paymentInfo hash + 2. hasCollectedPayment == true → funds are locked + 3. capturableAmount == expected pool size → correct amount + 4. authorizationExpiry > current block → still active + +This makes the reward commitment credible and transparent without +any trust in the platform beyond the audited contract logic. +``` + +## x402-rs Implications + +### Current State + +x402-rs (v1.4.5) ships the `exact` and `upto` schemes but has **zero escrow support** today. No branch, no issue, no WIP code tracking the upstream escrow scheme. + +However, x402-rs has an excellent scheme extension system designed for exactly this kind of addition. + +### Scheme Extension Architecture + +x402-rs uses a trait-based plugin system for payment schemes: + +``` +Three core traits (from x402-rs/crates/core/): + + X402SchemeId + → identifies scheme: namespace ("eip155") + scheme name ("escrow") + + X402SchemeFacilitatorBuilder
+ → factory that creates scheme handlers from JSON config + + X402SchemeFacilitator + → verify(payload, requirements) → VerifyResult + → settle(payload, requirements) → SettleResult + → supported(requirements) → bool + +Registration: + SchemeBlueprints registry → SchemeRegistry at runtime + New schemes register via: blueprints.and_register(V2Eip155Escrow) + +Config-driven activation: + {"id": "v2-eip155-escrow", "chains": "eip155:*", "config": {...}} +``` + +Reference: `x402-rs/docs/how-to-write-a-scheme.md` provides a step-by-step guide. + +### What x402-rs Needs for Escrow + +Once the upstream x402 escrow scheme spec (PR #1425) stabilizes, x402-rs needs: + +``` +New directory: + crates/chains/x402-chain-eip155/src/v2_eip155_escrow/ + ├── mod.rs # scheme registration + ├── types.rs # PaymentInfo, PaymentState, escrow extras + ├── facilitator.rs # verify + settle (authorize/capture/void) + ├── client.rs # sign ERC-3009 for escrow + └── server.rs # generate PaymentRequirements with escrow fields + +Key implementation points: + + verify(): + - Validate ERC-3009 signature for the authorized amount + - Simulate authorize() call against AuthCaptureEscrow + - Check operator, expiries, fee bounds match requirements + - Verify payer has sufficient USDC balance + allowance + + settle(): + - Determine settlement method from requirements.extra: + "authorize" → call operator.authorize() + "capture" → call operator.capture() + "void" → call operator.void() + - Submit transaction, wait for receipt (60s timeout) + - Return tx hash + network + payer address + + supported(): + - Check chain ID matches configured chains + - Check scheme name == "escrow" (or "commerce" if renamed) + +Registration (in facilitator/src/schemes.rs): + blueprints.and_register(V2Eip155Escrow) + // ~15 lines of boilerplate, same pattern as exact/upto +``` + +### Stateful vs Stateless Facilitator + +Current x402-rs facilitator is **stateless** — each verify/settle is independent. 
The escrow scheme introduces a **session concept** (authorize → use → capture/void) that spans multiple HTTP requests. + +Two approaches discussed in upstream x402 issue #839: + +``` +"Dumb facilitator" (recommended, aligns with x402-rs): + - Facilitator remains stateless + - Session tracking happens in the escrow round manager (our Helm chart) + - Facilitator only handles individual authorize/capture/void calls + - Each call is self-contained (paymentInfo hash identifies the session) + - No facilitator-side state storage needed + +"Smart facilitator" (rejected): + - Facilitator tracks session lifecycle internally + - Requires persistent state, adds complexity + - Goes against x402 principle of minimal facilitator trust +``` + +The "dumb facilitator" approach means x402-rs needs **no architectural changes** to its core — the escrow scheme handler is just another verify/settle implementation, same as exact. The session lifecycle lives in our escrow round manager, not in the facilitator. + +### Contribution Path + +This represents a concrete contribution opportunity for the obol-stack team back to x402-rs: + +``` +Phase 1: Use reference impl directly + - The BackTrackCo/x402r-scheme npm package implements the escrow + scheme for TypeScript x402 clients + - Our escrow round manager can call Commerce Payments contracts + directly via ethers/viem without going through x402-rs facilitator + - This works TODAY, no upstream dependency + +Phase 2: Port to x402-rs (contribute upstream) + - Once PR #1425 merges and the spec stabilizes + - Implement V2Eip155Escrow scheme in x402-rs + - Estimated effort: 2-3 days for a Rust developer + - The "upto" scheme implementation (variable settlement amounts) + is the closest analog and provides the template + - File path: crates/chains/x402-chain-eip155/src/v2_eip155_escrow/ + - Submit as PR to x402-rs/x402-rs + +Phase 3: Native x402 flow + - Once x402-rs ships escrow scheme support + - Escrow round manager uses x402 HTTP flow natively: 
+ 402 response with scheme="escrow" → client signs → facilitator settles + - The entire authorize/capture/void flow goes through standard + x402 payment headers, same as current per-request payments + - Workers see escrow commitments as standard x402 PaymentRequirements +``` + +### Dependency Timeline + +``` +TODAY: PR #1425 is open, spec under review + Commerce Payments contracts are deployed and audited + Reference impl exists (BackTrackCo/x402r-scheme) + → We can build Phase 1 NOW + +WEEKS: PR #1425 merges (spec only, no SDK code) + → Spec is stable, safe to build Phase 2 + +MONTHS: x402-rs adds escrow scheme + → Phase 3, native flow + +Our Helm chart should work in ALL three phases: + values.yaml: + escrow: + mode: direct # Phase 1: call contracts directly + # mode: x402-rs # Phase 3: use x402 facilitator +``` + +## Reward Distribution — Detailed Design + +### Round Lifecycle + +``` +Round N starts + │ + ├─ Escrow round manager calls authorize() + │ └─ USDC locked in Commerce Payments escrow + │ + ├─ Workers submit experiments (x402 paid per-request as today) + │ └─ Each submission: precommit → benchmark → proof + │ + ├─ Verifier checks proofs, records qualifiers + │ + ├─ Round N ends (configurable duration, default: 1 hour) + │ + ├─ Reward engine runs: + │ │ + │ ├─ 1. Collect x402 payment totals from round + │ │ (total_pool = sum of payments * pool_percentage) + │ │ + │ ├─ 2. For each worker, compute challenge factors: + │ │ factor[c] = worker_qualifiers[c] / total_qualifiers[c] + │ │ + │ ├─ 3. Compute parity (anti-monopoly): + │ │ weighted_avg = mean(factors, weights) + │ │ variance = weighted_var(factors, weights) + │ │ imbalance = variance / (weighted_avg * (1 - weighted_avg)) + │ │ penalty = exp(-imbalance_multiplier * imbalance) + │ │ + │ ├─ 4. Compute influence: + │ │ weight[i] = weighted_avg[i] * penalty[i] + │ │ influence[i] = weight[i] / sum(weights) + │ │ + │ ├─ 5. 
Split pool: + │ │ innovator_pool = total_pool * innovator_share (e.g., 20%) + │ │ worker_pool = total_pool * worker_share (e.g., 70%) + │ │ operator_pool = total_pool * operator_share (e.g., 10%) + │ │ + │ ├─ 6. Settle via RewardDistributor: + │ │ capture(paymentInfo, total_distributable, feeBps, feeReceiver) + │ │ → single capture to RewardDistributor (receiver is fixed per PaymentInfo) + │ │ + │ └─ 7. Distribute from RewardDistributor: + │ RewardDistributor.distribute(workers[], amounts[], innovators[], ...) + │ → ERC20 transfers to each worker (influence-weighted) + │ → ERC20 transfers to each innovator (adoption-weighted, + │ only for algorithms above adoption_threshold or merged) + │ → Unadopted innovator share held in distributor for next round + │ + ├─ Escrow round manager calls void() + │ └─ Remaining USDC returns to pool wallet + │ + └─ Round N+1 starts +``` + +### Anti-Monopoly Formula + +> **Design note:** This formula is inspired by OPOW (Optimizable Proof of Work) +> research but is a **deliberate simplification** for our use case. It omits +> several mechanisms present in the full OPOW specification — specifically: +> challenge-factor weighting, capped self/delegated deposit factors, legacy +> track multipliers, and phase-in blending for new challenges. These are +> omitted because obol-stack uses USDC (not a staking token with weighted +> deposits) and starts with a small challenge set where these refinements +> add complexity without proportionate benefit. As the challenge set grows +> beyond 4+ challenges, these mechanisms should be revisited. + +Workers MUST participate across ALL active challenges to earn maximum rewards. 
Concentrating on a single challenge triggers an exponential penalty: + +``` +Given: + N challenges, worker i has qualifier fraction f[c] in each challenge c + w[c] = weight per challenge = 1/N (equal) + +Compute: + avg_i = Σ(w[c] * f_i[c]) + var_i = Σ(w[c] * (f_i[c] - avg_i)²) + imb_i = var_i / (avg_i * (1 - avg_i)) (max value = 1.0) + pen_i = e^(-k * imb_i) where k = configurable multiplier + + influence_i = normalize(avg_i * pen_i) + +Effect (with k = 3.0): + Worker spreading effort across 4 challenges equally: + f = [0.25, 0.25, 0.25, 0.25] → imbalance = 0 → k*imb = 0 → penalty = 1.0 + + Worker concentrating on 1 challenge: + f = [1.0, 0.0, 0.0, 0.0] → imbalance = 1.0 → k*imb = 3.0 → penalty ≈ 0.05 + + The concentrated worker earns ~5% of what the diversified worker earns + despite producing the same total output. + +Verified penalty values for 2 challenges (k = 3.0): + f = [0.50, 0.50] → imb = 0.00 → penalty = 1.00 + f = [0.70, 0.30] → imb = 0.16 → penalty = 0.62 + f = [0.90, 0.10] → imb = 0.64 → penalty = 0.15 + f = [1.00, 0.00] → imb = 1.00 → penalty = 0.05 + +Note: with a single active challenge, every worker's variance is +identically 0 (the lone factor equals its own weighted average), so +imbalance is 0.0 (the 0/0 case for a worker with avg = 1.0 is clamped +to 0 by epsilon). All workers get penalty = 1.0 regardless. The parity +mechanism activates only with ≥2 challenges. +``` + +### Commit-Reveal Verification + +> **Design note:** This verification flow is inspired by OPOW proof-of-work +> verification but uses stratified sampling (above/below median quality +> regions) rather than flat random sampling. Flat sampling is easier to game +> by hiding low-quality work in unsampled bundles. Stratified sampling ensures +> both high-quality and low-quality results are checked. 
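A minimal Python sketch of this stratified sampling, under stated assumptions: the function name and seed-derivation scheme are illustrative (not the actual `verifier.py` API), and the default counts mirror the `samplesAboveMedian`/`samplesBelowMedian` chart values.

```python
import hashlib
import random
import statistics

def stratified_sample(qualities: dict[int, float], rand_hash: str,
                      n_above: int = 3, n_below: int = 2) -> list[int]:
    """Pick nonces to re-verify from both sides of the median claimed quality.

    qualities maps nonce -> claimed solution quality; rand_hash is the
    verifier-issued randomness from the PRECOMMIT step, so the sample is
    deterministic and reproducible by worker and verifier alike.
    """
    median = statistics.median(qualities.values())
    above = sorted(n for n, q in qualities.items() if q >= median)
    below = sorted(n for n, q in qualities.items() if q < median)
    # Seed the RNG from the pre-issued hash: neither party can bias the draw.
    rng = random.Random(int(hashlib.sha256(rand_hash.encode()).hexdigest(), 16))
    sampled = rng.sample(above, min(n_above, len(above)))
    sampled += rng.sample(below, min(n_below, len(below)))
    return sorted(sampled)
```

Because both quality tiers are always probed, a worker cannot hide low-quality results in a tier it hopes will go unsampled, which is exactly the gaming vector the design note rules out.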
+ +``` +Step 1: PRECOMMIT + Worker → Verifier: {challenge_id, settings, num_nonces} + Verifier → Worker: {benchmark_id, rand_hash, track_id} + Worker pays: base_fee + per_nonce_fee * num_nonces (via x402 exact scheme) + +Step 2: BENCHMARK + Worker runs all nonces, builds Merkle tree over results + Worker → Verifier: {merkle_root, solution_quality[]} + Verifier computes median quality across all nonces + Verifier performs STRATIFIED sampling: + - samples_above_median nonces from the above-median quality set + - samples_below_median nonces from the below-median quality set + This ensures both high- and low-quality results are verified, + preventing workers from hiding bad results in one quality tier. + Verifier → Worker: {sampled_nonces: [above1..., below1...]} + +Step 3: PROOF + Worker → Verifier: {merkle_proofs for sampled nonces} + Each proof: {nonce, solution, runtime_hash, quality} + Verifier checks: + - proof.nonce ∈ sampled_nonces + - hash(proof) produces leaf in Merkle tree + - Merkle branch validates against committed root + - solution quality matches claimed quality (re-execute) + + If any proof fails → worker is NOT a qualifier → no capture for them + If worker times out on proofs → same result, no capture + + Note: no slashing occurs. The penalty is simply not earning. + This is enforced by the escrow — uncaptured funds void() back to pool. 
+``` + +## Helm Chart Structure + +``` +charts/autoresearch/ +├── Chart.yaml +├── values.yaml +├── templates/ +│ ├── _helpers.tpl +│ ├── reward-engine-deployment.yaml +│ ├── reward-engine-service.yaml +│ ├── verifier-deployment.yaml +│ ├── verifier-service.yaml + ├── escrow-round-manager-deployment.yaml + ├── reward-distributor-configmap.yaml # RewardDistributor contract address + ├── challenge-registry-configmap.yaml + ├── commerce-payments-configmap.yaml # AuthCaptureEscrow + collector addresses +│ ├── servicemonitor.yaml # Prometheus metrics +│ └── tests/ +│ ├── test-reward-engine.yaml +│ ├── test-verifier.yaml +│ └── test-escrow-round-manager.yaml +└── scripts/ + ├── rewards.py # OPOW influence + pool distribution + ├── opow.py # Parity formula, imbalance calculation + ├── verifier.py # Commit-reveal, Merkle proof checking + └── escrow_round_manager.py # authorize/capture/void lifecycle +``` + +### values.yaml + +```yaml +rounds: + duration: 3600 # 1 hour per round + overlap: 0 # no overlapping rounds + +rewards: + poolPercentage: 0.30 # 30% of x402 payments go to reward pool + distribution: + innovators: 0.20 # 20% to algorithm authors + workers: 0.70 # 70% to GPU workers (OPOW) + operators: 0.10 # 10% to infrastructure operators + gamma: # reward scaling by active challenges + a: 1.0 + b: 0.5 + c: 0.3 + imbalanceMultiplier: 3.0 # parity penalty strength + +verification: + samplesAboveMedian: 3 # nonces sampled from above-median quality + samplesBelowMedian: 2 # nonces sampled from below-median quality + minActiveQuality: 3.5 # minimum val_bpb to qualify + proofTimeout: 300 # seconds to submit proofs + adoptionThreshold: 0.01 # minimum adoption fraction to earn innovator rewards + +escrow: + mode: direct # direct | x402-rs (Phase 1 vs Phase 3) + chain: base-sepolia # chain for Commerce Payments contracts + # AuthCaptureEscrow — same address on mainnet + sepolia + authCaptureEscrow: "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff" + # ERC3009PaymentCollector — same 
address on mainnet + sepolia + erc3009Collector: "0x0E3dF9510de65469C4518D7843919c0b8C7A7757" + # OperatorRefundCollector — same address on mainnet + sepolia + refundCollector: "0x934907bffd0901b6A21e398B9C53A4A38F02fa5d" + # RewardDistributor — deployed per-cluster, receives all captures + # then distributes to individual workers/innovators via ERC20 transfers. + # Required because AuthCaptureEscrow receiver is fixed per PaymentInfo. + rewardDistributor: "" # set after deploying the distributor contract + # USDC contract addresses + usdcAddress: + base: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913" + base-sepolia: "0x036CbD53842c5426634e7929541eC2318f3dCF7e" + # Escrow timing + authorizationGrace: 3600 # 1 hour after round end to capture + refundWindow: 86400 # 24 hours post-capture for refunds + # Platform fee + feeBps: 200 # 2% platform fee on captures + # Operator wallet (calls capture/void) + operatorWallet: "" # set by operator, signs capture txs + +challenges: + - name: neuralnet_optimizer + qualityMetric: val_bpb + qualityDirection: lower_is_better + minActiveQuality: 3.5 + baseFee: "0.001" # USDC per benchmark precommit + perNonceFee: "0.0001" # USDC per nonce + oasfSkill: "devops_mlops/model_versioning" + tracks: + default: + noncesPerBundle: 10 + maxQualifiersPerTrack: 100 + +discovery: + preferredBackend: auto # auto | reth | basescan | 8004scan + rethIndexerUrl: "" # auto-detected from cluster if empty + basescanApiKey: "" + cacheTtlSeconds: 300 + +leaderboard: + enabled: true + port: 8080 + retentionRounds: 100 # keep last 100 rounds of history + +image: + repository: ghcr.io/obolnetwork/autoresearch + tag: latest +``` + +## Dependency Chain + +```yaml +# Chart.yaml +apiVersion: v2 +name: autoresearch +version: 0.1.0 +dependencies: + - name: reth-erc8004-indexer + version: ">=0.1.0" + repository: "file://../reth-erc8004-indexer" + condition: discovery.preferredBackend == "reth" # optional dependency +``` + +The autoresearch chart MUST work without the 
reth-indexer installed (falls back to BaseScan/8004scan). The indexer is a "nice to have" for operators who want real-time, self-hosted discovery. + +## Relationship to Existing Skills + +The embedded skills in PR #288 remain as the **agent-facing interface**. The Helm chart provides the **infrastructure** those skills rely on: + +``` +SKILL USES FROM CHART +───────────────────────────────────────────────────── +autoresearch-coordinator reward-engine API (leaderboard) + coordinate.py verifier API (commit-reveal) + discovery client (worker lookup) + +autoresearch-worker escrow round manager (round status) + worker_api.py verifier API (proof submission) + +autoresearch (publish) reward-engine API (earnings query) + publish.py (mostly unchanged) +``` + +The coordinator's loop becomes: + +``` +THINK → pick hypothesis +CLAIM → discover worker via FallbackClient +CHECK → verify round escrow is authorized (on-chain) ← NEW +RUN → submit experiment, pay via x402 exact scheme +VERIFY → commit-reveal proof cycle ← NEW +SCORE → verifier records qualifiers ← NEW +REWARD → round-end OPOW distribution via capture() ← NEW +``` + +## Test Plan + +### Unit tests + +- [ ] `opow.py`: imbalance calculation with known inputs +- [ ] `opow.py`: parity penalty with balanced vs concentrated workers +- [ ] `opow.py`: influence normalization sums to 1.0 +- [ ] `rewards.py`: pool split matches configured percentages +- [ ] `rewards.py`: adoption-weighted innovator distribution +- [ ] `rewards.py`: gamma scaling with 1, 3, 7 active challenges +- [ ] `verifier.py`: Merkle tree construction and proof verification +- [ ] `verifier.py`: random nonce sampling is deterministic from seed +- [ ] `verifier.py`: unverified workers excluded from qualifiers +- [ ] `escrow_round_manager.py`: authorize() call construction +- [ ] `escrow_round_manager.py`: capture() per-worker amount calculation +- [ ] `escrow_round_manager.py`: void() after all captures +- [ ] `escrow_round_manager.py`: reclaim() on manager 
timeout/crash + +### Integration tests + +- [ ] Full round lifecycle: authorize → precommit → benchmark → proof → capture → void +- [ ] Worker with valid proofs receives capture proportional to influence +- [ ] Worker with invalid proofs gets no capture, funds void back to pool +- [ ] Anti-monopoly: worker in 1/4 challenges earns <<< worker in 4/4 challenges +- [ ] Innovator whose algorithm is adopted earns proportional to adoption +- [ ] Leaderboard API returns correct rankings after round completion +- [ ] Round transitions correctly (no double-counting, no missed payments) +- [ ] reclaim() works after authorizationExpiry if manager fails mid-round +- [ ] refund() works within refundExpiry window for post-capture corrections +- [ ] On-chain escrow state matches expected capturableAmount at each step + +### Integration tests (Commerce Payments on Anvil fork) + +- [ ] authorize() locks correct USDC amount in TokenStore +- [ ] Multiple capture() calls to different workers succeed +- [ ] sum(captures) cannot exceed maxAmount +- [ ] capture() fails after authorizationExpiry +- [ ] void() returns remaining funds to payer +- [ ] reclaim() works only after authorizationExpiry +- [ ] refund() works within refundExpiry window +- [ ] ERC-3009 nonce prevents replay of authorization + +### BDD scenarios + +```gherkin +Scenario: Honest worker earns proportional reward + Given a round with 100 USDC authorized in escrow + And worker A has 60% influence and worker B has 40% + When the round completes and captures are executed + Then worker A receives 42 USDC (60% of 70% worker pool) + And worker B receives 28 USDC (40% of 70% worker pool) + And 30 USDC is captured for innovators (20%) and operators (10%) + And void() returns 0 USDC (fully distributed) + +Scenario: Concentrated worker is penalized + Given challenges X and Y are both active + And worker A submits 100 proofs to X and 100 to Y + And worker B submits 200 proofs to X and 0 to Y + When influence is calculated + Then 
worker A influence > worker B influence + And worker B penalty factor < 0.10 + +Scenario: Worker fails verification — no capture + Given worker C submits a benchmark with invalid Merkle proofs + When the verifier checks the proofs + Then worker C is excluded from qualifiers + And no capture() is called for worker C + And worker C's share remains in escrow + And void() returns worker C's unclaimed share to pool + +Scenario: Manager crash — funds are safe + Given a round with 100 USDC authorized in escrow + And the escrow round manager crashes mid-round + When authorizationExpiry passes + Then the platform wallet calls reclaim() + And all 100 USDC returns to the platform wallet + And no funds are permanently locked +``` + +## Migration from PR #288 + +Files that stay in PR #288 (base sell/buy flow): +- `cmd/obol/sell.go` — sell command +- `internal/inference/store.go` — provenance types +- `internal/schemas/payment.go` — x402 payment parsing +- `flows/flow-06,07,08,10` — sell/buy flow tests +- `ralph-m1.md` — sell flow validation + +Files that move to this PR: +- `internal/embed/skills/autoresearch-coordinator/` — coordinator skill +- `internal/embed/skills/autoresearch-worker/` — worker skill +- `internal/embed/skills/autoresearch/` — publish skill +- `ralph-m2.md` — autoresearch validation (becomes test plan reference) +- `Dockerfile.worker` — worker container image +- `tests/test_autoresearch_worker.py` — worker tests + +New files: +- `charts/autoresearch/` — full Helm chart as described above +- `internal/discovery/` — shared with indexer PR (or imported) + +## Open Questions + +1. **Round duration**: 1 hour seems right for autoresearch experiments (5-10 min each, gives workers time for multiple submissions). Need to validate with real GPU workloads. + +2. **Pool funding source**: Currently "30% of x402 payments." Alternative: fixed per-round emission funded by the operator (simpler but requires operator treasury). 
Or a hybrid — operator seeds the pool, x402 payments top it up. + +3. **Innovator identity**: How does an algorithm author register? Options: (a) anyone who submits a train.py that gets adopted, (b) explicit registration via ERC-8004 with algorithm metadata, (c) git-based — author identified by commit signature in provenance. + +4. **Multi-challenge readiness**: Currently only `neuralnet_optimizer` (val_bpb) exists. The parity formula needs ≥2 challenges to be meaningful. Second challenge candidates: inference latency optimization, model compression ratio, data preprocessing throughput. + +5. **x402 escrow scheme timeline**: PR #1425 is under active review but spec-only. The reference implementation exists at BackTrackCo/x402r-scheme. We should build Phase 1 (direct contract calls) now and add Phase 3 (native x402-rs flow) when the ecosystem catches up. This is a clear contribution opportunity for obol-stack back to x402-rs. + +6. **Operator wallet security**: The operator wallet can call capture/void/refund. It should be a multisig or hardware wallet, not a hot key in a ConfigMap secret. Consider integration with the existing Secure Enclave key support in obol-stack's sell command. 
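To make the parity mechanics concrete while these questions are settled, here is a minimal sketch of the imbalance and penalty math the OPOW scenarios assume. This is illustrative, not `opow.py` itself: it assumes imbalance is the squared coefficient of variation of a worker's per-challenge qualifier fractions and that the penalty is `exp(-k * imbalance)` with `k` the imbalance multiplier (3.0). That reading reproduces the scenario values (balanced across 4 challenges gives imbalance 0.0 and penalty 1.0; fully concentrated gives imbalance 3.0 and a penalty far under 0.05), but it should be confirmed against TIG's `opow.rs` before implementation.

```python
import math

# Illustrative sketch only. ASSUMPTIONS (confirm against TIG opow.rs):
#   imbalance = variance / mean^2 of per-challenge qualifier fractions
#   penalty   = exp(-k * imbalance), k = configured imbalance multiplier
IMBALANCE_MULTIPLIER = 3.0  # from the feature Background

def imbalance(fractions):
    """Squared coefficient of variation across active challenges."""
    n = len(fractions)
    mean = sum(fractions) / n
    if mean == 0:
        return 0.0
    variance = sum((f - mean) ** 2 for f in fractions) / n
    return variance / mean**2

def penalty(fractions, k=IMBALANCE_MULTIPLIER):
    """Concentration penalty in (0, 1]; 1.0 means perfectly diversified."""
    return math.exp(-k * imbalance(fractions))

def influences(workers):
    """Normalize penalized mean fractions so total influence sums to 1.0."""
    weights = {w: (sum(f) / len(f)) * penalty(f) for w, f in workers.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

balanced = [0.25, 0.25, 0.25, 0.25]
concentrated = [1.0, 0.0, 0.0, 0.0]
print(imbalance(balanced), penalty(balanced))  # 0.0 1.0
print(imbalance(concentrated))                 # 3.0
print(penalty(concentrated) < 0.05)            # True
```

Under this reading, "worker in 1/4 challenges earns far less than worker in 4/4 challenges" falls out of the normalization step: the concentrated worker's raw weight is scaled by e^-9 before influence is divided.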
+ +## References + +- Commerce Payments Protocol: https://github.com/base/commerce-payments +- x402 escrow scheme spec PR: https://github.com/coinbase/x402/pull/1425 +- x402 escrow scheme discussion: https://github.com/coinbase/x402/issues/839 +- Reference TypeScript implementation: https://github.com/BackTrackCo/x402r-scheme +- x402-rs scheme extension guide: https://github.com/x402-rs/x402-rs/blob/main/docs/how-to-write-a-scheme.md +- x402-rs facilitator scheme registration: https://github.com/x402-rs/x402-rs/tree/main/crates/facilitator/src +- AuthCaptureEscrow on BaseScan: https://basescan.org/address/0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff +- ERC-8004 Identity Registry on BaseScan: https://basescan.org/token/0x8004A169FB4a3325136EB29fA0ceB6D2e539a432 +- BaseScan ERC-8004 metadata announcement: https://x.com/etherscan/status/2037131140608434517 + +## Gherkin Feature Specifications + +The following `.feature` files define the executable BDD specifications for the autoresearch economic layer. Each feature covers one bounded area of behavior using declarative, domain-level language. Step definitions target the autoresearch Python scripts and the Commerce Payments contracts on an Anvil fork. + +### Feature: Escrow Round Lifecycle + +```gherkin +@escrow @critical +Feature: Escrow round lifecycle + The escrow round manager locks USDC in the Commerce Payments + AuthCaptureEscrow contract at the start of each round and + distributes earnings to verified workers at round end. 
+ + Background: + Given the autoresearch chart is deployed on a k3s cluster + And an Anvil fork of Base Sepolia is running + And the platform wallet holds 1000 USDC + And the AuthCaptureEscrow contract is at "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff" + And the reward pool percentage is 30% + + Rule: Funds must be locked before any work begins + + Scenario: Round starts with successful escrow authorization + Given 200 USDC of x402 payments were collected in the previous round + When a new round begins + Then the escrow round manager calls authorize() for 60 USDC + And the AuthCaptureEscrow capturableAmount equals 60 USDC + And the authorizationExpiry is set to round end plus 1 hour grace + And workers can verify the commitment on-chain + + Scenario: Round start fails when platform wallet has insufficient USDC + Given the platform wallet holds 0 USDC + When a new round begins + Then the escrow round manager logs an authorization failure + And no work is accepted for this round + And the previous round's uncaptured funds are not affected + + Rule: Workers are paid proportionally to verified influence + + Scenario: Two verified workers receive proportional captures + Given a round with 100 USDC authorized in escrow + And worker "0xAAA" has 60% influence + And worker "0xBBB" has 40% influence + And both workers passed commit-reveal verification + When the round completes + Then capture() is called for "0xAAA" with 42 USDC + And capture() is called for "0xBBB" with 28 USDC + And the platform fee receiver gets 2% of each capture + And void() is called for the remaining 30 USDC + And the remaining USDC returns to the platform wallet + + Scenario: Unverified worker receives no capture + Given a round with 100 USDC authorized in escrow + And worker "0xAAA" passed verification with 50% influence + And worker "0xCCC" failed commit-reveal verification + When the round completes + Then capture() is called for "0xAAA" with 35 USDC + And no capture() is called for "0xCCC" + And 
void() returns 65 USDC to the platform wallet + + Scenario: Round with no verified workers voids entirely + Given a round with 100 USDC authorized in escrow + And no workers submitted valid proofs + When the round completes + Then void() is called + And the full 100 USDC returns to the platform wallet + + Rule: Funds are always recoverable + + Scenario: Platform reclaims funds after manager crash + Given a round with 100 USDC authorized in escrow + And the escrow round manager process has crashed + When the authorizationExpiry passes + Then the platform wallet calls reclaim() directly + And the full 100 USDC returns to the platform wallet + And no operator signature is required + + Scenario: Operator refunds a worker after post-capture fraud discovery + Given worker "0xAAA" received a 42 USDC capture in round 5 + And fraud is discovered within the refund window + When the operator calls refund() for 42 USDC + Then 42 USDC returns to the platform wallet + And the refund is recorded in the round history +``` + +### Feature: OPOW Influence and Anti-Monopoly + +```gherkin +@opow @critical +Feature: OPOW influence calculation with anti-monopoly parity + The reward engine computes per-worker influence using a parity + formula that penalizes concentration on a single challenge. + Workers must diversify across all active challenges to maximize + their earnings. 
+ + Background: + Given the imbalance multiplier is 3.0 + + Rule: Diversified workers earn more than concentrated workers + + Scenario: Equally diversified worker has zero penalty + Given 4 active challenges + And worker "0xAAA" has qualifier fractions: + | challenge | fraction | + | c001 | 0.25 | + | c002 | 0.25 | + | c003 | 0.25 | + | c004 | 0.25 | + When influence is calculated + Then worker "0xAAA" imbalance is 0.0 + And worker "0xAAA" penalty factor is 1.0 + + Scenario: Fully concentrated worker is severely penalized + Given 4 active challenges + And worker "0xBBB" has qualifier fractions: + | challenge | fraction | + | c001 | 1.00 | + | c002 | 0.00 | + | c003 | 0.00 | + | c004 | 0.00 | + When influence is calculated + Then worker "0xBBB" imbalance is 3.0 + And worker "0xBBB" penalty factor is less than 0.05 + + Scenario: Concentrated worker earns less despite equal total output + Given 2 active challenges and a worker pool of 100 USDC + And worker "0xAAA" submitted 50 proofs to c001 and 50 to c002 + And worker "0xBBB" submitted 100 proofs to c001 and 0 to c002 + When influence is calculated and rewards are distributed + Then worker "0xAAA" earns more than worker "0xBBB" + And the ratio of earnings exceeds 5:1 + + Scenario Outline: Parity penalty scales with concentration + Given 2 active challenges + And a worker has qualifier fractions <f1> and <f2> + When influence is calculated + Then the penalty factor is approximately <penalty> + + Examples: + | f1 | f2 | penalty | + | 0.50 | 0.50 | 1.00 | + | 0.70 | 0.30 | 0.65 | + | 0.90 | 0.10 | 0.11 | + | 1.00 | 0.00 | 0.05 | + + Rule: Influence values are normalized across all workers + + Scenario: Total influence sums to 1.0 + Given 3 workers with varying qualifier fractions + When influence is calculated for all workers + Then the sum of all influence values equals 1.0 + + Scenario: Single worker in a round gets full influence + Given 1 worker who participated in all active challenges + When influence is calculated + Then that
worker's influence is 1.0 + And they receive the entire worker pool +``` + +### Feature: Commit-Reveal Work Verification + +```gherkin +@verification @critical +Feature: Commit-reveal work verification + Workers commit to results via a Merkle root before learning + which nonces will be sampled. This prevents retroactive + fabrication of results. + + Background: + Given the verifier is running + And the neuralnet_optimizer challenge is active + And the sample count is 5 nonces per benchmark + + Rule: Honest workers pass verification + + Scenario: Worker with valid proofs becomes a qualifier + Given worker "0xAAA" precommits a benchmark with 100 nonces + And the verifier assigns a random hash and track + When worker "0xAAA" submits a Merkle root over 100 results + And the verifier samples 5 nonces for verification + And worker "0xAAA" submits valid Merkle proofs for all 5 + Then worker "0xAAA" is recorded as a qualifier + And the benchmark quality scores are accepted + + Scenario: Re-execution confirms claimed quality + Given worker "0xAAA" claims val_bpb of 3.2 for nonce 42 + When the verifier re-executes nonce 42 with the same settings + Then the re-executed val_bpb matches the claimed 3.2 + And the proof is accepted + + Rule: Dishonest workers fail verification + + Scenario: Invalid Merkle proof is rejected + Given worker "0xCCC" submitted a Merkle root + And the verifier sampled nonces [7, 23, 45, 61, 89] + When worker "0xCCC" submits a proof for nonce 23 that does not match the root + Then the verification fails for worker "0xCCC" + And worker "0xCCC" is excluded from qualifiers for this round + And no escrow capture is made for worker "0xCCC" + + Scenario: Worker who inflates quality scores is caught + Given worker "0xCCC" claims val_bpb of 2.8 for nonce 42 + When the verifier re-executes nonce 42 with the same settings + And the re-executed val_bpb is 3.5 + Then the quality mismatch is detected + And the verification fails for worker "0xCCC" + + Scenario: 
Worker who times out on proof submission is excluded + Given worker "0xCCC" submitted a Merkle root + And the verifier sampled 5 nonces + When worker "0xCCC" does not submit proofs within 300 seconds + Then worker "0xCCC" is excluded from qualifiers + And the round proceeds without them + + Rule: Sampling is fair and deterministic + + Scenario: Nonce sampling is deterministic from the round seed + Given the same benchmark settings and random hash + When nonces are sampled twice + Then the same 5 nonces are selected both times + + Scenario: Worker cannot predict which nonces will be sampled + Given the random hash is derived from a future block hash + When the worker commits their Merkle root + Then the sampled nonces have not yet been determined +``` + +### Feature: Reward Pool Distribution + +```gherkin +@rewards +Feature: Reward pool distribution across roles + The reward engine splits the pool among innovators, workers, + and operators according to configured percentages. Worker + distribution is influence-weighted. Innovator distribution + is adoption-weighted. 
+ + Background: + Given the pool split is 20% innovators, 70% workers, 10% operators + And a round with 100 USDC in the reward pool + + Rule: Pool splits match configured percentages + + Scenario: Standard round distributes to all three roles + When the round completes with verified workers + Then 20 USDC is allocated to innovators + And 70 USDC is allocated to workers + And 10 USDC is allocated to operators + + Rule: Workers earn by influence + + Scenario: Workers are paid proportionally to influence + Given the worker pool is 70 USDC + And worker "0xAAA" has influence 0.6 + And worker "0xBBB" has influence 0.4 + When worker rewards are distributed + Then worker "0xAAA" earns 42 USDC + And worker "0xBBB" earns 28 USDC + + Rule: Innovators earn by adoption + + Scenario: Algorithm author earns when workers adopt their code + Given the innovator pool is 20 USDC for the neuralnet_optimizer challenge + And algorithm "fast-muon-v3" by innovator "0xINN1" has 75% adoption + And algorithm "baseline-adamw" by innovator "0xINN2" has 25% adoption + When innovator rewards are distributed + Then innovator "0xINN1" earns 15 USDC + And innovator "0xINN2" earns 5 USDC + + Scenario: Unadopted algorithm earns nothing + Given the innovator pool is 20 USDC + And algorithm "untested-v1" has 0% adoption + When innovator rewards are distributed + Then the author of "untested-v1" earns 0 USDC + And the unadopted share rolls into the next round + + Rule: Gamma scaling adjusts for challenge count + + Scenario Outline: Reward scales with number of active challenges + Given gamma parameters a=1.0, b=0.5, c=0.3 + And <n> challenges are active + When the gamma value is calculated + Then the scaling factor is approximately <gamma> + + Examples: + | n | gamma | + | 1 | 0.63 | + | 3 | 0.80 | + | 7 | 0.94 | +``` + +### Feature: Worker Discovery and Fallback + +```gherkin +@discovery +Feature: Multi-tier worker discovery with fallback + The coordinator discovers GPU workers through a prioritized + chain of
discovery backends. If the preferred backend is + unavailable, it falls back to the next tier automatically. + + Background: + Given the OASF skill filter is "devops_mlops/model_versioning" + + Rule: Discovery uses the highest-priority available backend + + Scenario: Coordinator uses Reth indexer when available + Given the reth-erc8004-indexer is deployed in the cluster + And the indexer has synced past the latest registration + When the coordinator discovers workers + Then the query goes to the Reth indexer API + And workers with the model_versioning skill are returned + + Scenario: Coordinator falls back to BaseScan when indexer is down + Given the reth-erc8004-indexer is not deployed + And a BaseScan API key is configured + When the coordinator discovers workers + Then the query goes to the BaseScan API + And ERC-8004 NFT metadata is read for each agent + And workers with the model_versioning skill are returned + + Scenario: Coordinator falls back to 8004scan as last resort + Given the reth-erc8004-indexer is not deployed + And no BaseScan API key is configured + When the coordinator discovers workers + Then the query goes to 8004scan.io + And workers with the model_versioning skill are returned + + Scenario: All backends unavailable produces a clear error + Given no discovery backends are reachable + When the coordinator discovers workers + Then a "no discovery backend available" error is returned + And the round proceeds with zero workers + + Rule: Discovery results are cached to reduce API calls + + Scenario: Repeated queries within TTL use cached results + Given the cache TTL is 300 seconds + And a discovery query succeeded 60 seconds ago + When the coordinator discovers workers again + Then no external API call is made + And the cached results are returned +``` + +### Feature: End-to-End Round + +```gherkin +@e2e @slow +Feature: End-to-end autoresearch round + A complete round from escrow authorization through worker + experiments to reward distribution and 
settlement. + + Background: + Given the autoresearch chart is deployed with default values + And an Anvil fork of Base Sepolia is running + And the platform wallet holds 500 USDC + And 2 GPU workers are registered on ERC-8004: + | address | skill | gpu | + | 0xW001 | devops_mlops/model_versioning | NVIDIA T4 | + | 0xW002 | devops_mlops/model_versioning | NVIDIA A10 | + And 1 innovator submitted algorithm "muon-opt-v2" for neuralnet_optimizer + + Scenario: Complete round with two honest workers + # Round setup + Given 100 USDC of x402 payments were collected in the previous round + When a new round begins + Then 30 USDC is authorized in escrow + + # Worker experiments + When worker "0xW001" precommits a benchmark with 50 nonces + And worker "0xW002" precommits a benchmark with 50 nonces + And both workers submit Merkle roots over their results + And the verifier samples 5 nonces from each worker + And both workers submit valid Merkle proofs + Then both workers are recorded as qualifiers + + # Reward calculation + When the round duration expires + Then the reward engine computes influence for both workers + And both workers have balanced challenge participation + And influence is split proportionally to qualifier count + + # Settlement + When captures are executed + Then worker "0xW001" receives their earned USDC via capture() + And worker "0xW002" receives their earned USDC via capture() + And innovator "muon-opt-v2" receives adoption-weighted USDC + And the operator receives 10% of the pool + And void() returns any remainder to the platform wallet + And the leaderboard API shows both workers with correct earnings + And the next round begins with a new authorization + + Scenario: Round where one worker submits fraudulent proofs + Given 100 USDC of x402 payments were collected + When a new round begins + Then 30 USDC is authorized in escrow + + When worker "0xW001" submits valid proofs for all sampled nonces + And worker "0xW002" submits a proof with a quality 
mismatch + Then worker "0xW001" is a qualifier + And worker "0xW002" is excluded + + When captures are executed + Then worker "0xW001" receives the entire worker pool share + And worker "0xW002" receives nothing + And void() returns worker "0xW002"'s unclaimed share to the platform +``` + +## Labels + +`component:autoresearch` `component:rewards` `component:x402` `priority:high` `size:XL` diff --git a/docs/issues/issue-reth-erc8004-indexer-helm-chart.md b/docs/issues/issue-reth-erc8004-indexer-helm-chart.md new file mode 100644 index 00000000..13217e69 --- /dev/null +++ b/docs/issues/issue-reth-erc8004-indexer-helm-chart.md @@ -0,0 +1,277 @@ +# Extract reth-erc8004-indexer into standalone Helm chart with discovery fallback + +## Summary + +Carve the `reth-erc8004-indexer/` component out of PR #288 into its own PR with a dedicated Helm chart, proper test coverage, and a multi-tier discovery architecture that can fall back to Etherscan/BaseScan's native ERC-8004 metadata support when a full Reth node is impractical. + +## Motivation + +The reth-erc8004-indexer currently lives as a loose directory in the repo with a Dockerfile but no Helm chart, no CI, and no tests beyond ralph-m3's manual validation checklist. The autoresearch coordinator (and any future service that needs agent discovery) depends on a working ERC-8004 query API, but today that dependency is either: + +1. **8004scan.io** — a third-party centralized API we don't control +2. **reth-erc8004-indexer** — a custom Reth binary that requires syncing an entire Base L2 node + +Neither option is great for all deployment scenarios. Meanwhile, **Etherscan/BaseScan announced native ERC-8004 metadata display** (operational status, x402 support, services) on NFT detail pages. This creates a third discovery tier that's reliable, free, and doesn't require running infrastructure. 
+ +The indexer should ship as a properly tested, independently installable Helm chart that the autoresearch chart (and others) can declare as an optional dependency. + +## Scope + +### In scope + +- [ ] Move `reth-erc8004-indexer/` and `Dockerfile.reth-erc8004-indexer` into a self-contained Helm chart at `charts/reth-erc8004-indexer/` +- [ ] Implement 3-tier discovery fallback in the coordinator's discovery client +- [ ] Integration tests for the indexer API surface +- [ ] CI pipeline for building the Reth binary image +- [ ] Document deployment scenarios (full node, lightweight, external-only) + +### Out of scope + +- Autoresearch reward engine or OPOW mechanics (separate issue) +- Changes to the autoresearch coordinator loop logic +- ERC-8004 registration/minting changes + +## Architecture: 3-Tier Discovery + +The coordinator and any other discovery consumer should attempt sources in priority order: + +``` +Priority 1: Internal Reth Indexer (self-hosted, real-time) + │ + │ OBOL_INDEXER_API_URL=http://reth-indexer:3400 + │ Latency: <100ms, block-level freshness + │ Cost: runs a full Base L2 node (~500GB disk, ongoing sync) + │ + ▼ if unavailable or not deployed +Priority 2: BaseScan / Etherscan API (hosted, reliable) + │ + │ BASESCAN_API_URL=https://api.basescan.org/api + │ BASESCAN_API_KEY= + │ Latency: <500ms, near real-time + │ Cost: free tier = 5 calls/sec, Pro = 100K calls/day + │ Coverage: ERC-8004 metadata now displayed natively + │ - agent operational status + │ - x402 support flag + │ - registered services list + │ - NFT detail page with full metadata + │ + ▼ if unavailable or no API key +Priority 3: 8004scan.io (community, best-effort) + │ + │ SCAN_API_URL=https://www.8004scan.io/api/v1/public + │ Latency: <1s, minutes behind chain + │ Cost: free, no key required + │ Risk: third-party, no SLA + │ + ▼ if all unavailable + Error: no discovery backend available +``` + +### BaseScan Integration Details + +As of March 26, 2026, BaseScan displays ERC-8004 
metadata on NFT detail pages: +- Contract: `0x8004A169FB4a3325136EB29fA0ceB6D2e539a432` (18,512 holders, 45,198 transfers) +- Each agent NFT page shows: operational status, x402 support, services, metadata +- BaseScan API can query token holders, transfer events, and read contract state + +The BaseScan adapter needs to: +1. Query NFT holders of the Identity Registry contract +2. For each token ID, read the metadata URI via `tokenURI(tokenId)` +3. Fetch the off-chain registration JSON from the metadata URI +4. Filter by OASF skill/domain taxonomy (same as 8004scan queries) +5. Cache results with configurable TTL (default: 5 minutes) + +This is more work per query than the indexer (N+1 calls vs single query), but it requires **zero infrastructure** and uses a highly reliable API. + +### Discovery Client Interface + +```go +// internal/discovery/discovery.go + +type Agent struct { + TokenID string + ChainID uint64 + Owner string + Name string + Endpoint string + Skills []string + Domains []string + Metadata map[string]interface{} + X402Support bool +} + +type DiscoveryClient interface { + ListAgents(ctx context.Context, opts ListOptions) ([]Agent, error) + SearchAgents(ctx context.Context, query string, limit int) ([]Agent, error) + GetAgent(ctx context.Context, chainID uint64, tokenID string) (*Agent, error) + Health(ctx context.Context) error +} + +type ListOptions struct { + Skill string // OASF skill filter, e.g. "devops_mlops/model_versioning" + Domain string // OASF domain filter + ChainID uint64 // filter by chain + Limit int + SortBy string // "registered_at", "name" +} +``` + +Three implementations: `RethIndexerClient`, `BaseScanClient`, `EightKScanClient`. +A `FallbackClient` wraps all three and tries in priority order. 
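The wrapping logic is simple enough to sketch. The real implementation is Go, per the interface above; this Python sketch uses illustrative names and no real API calls, and shows only the intended semantics: probe each backend in priority order, return the first success, and surface a combined error when every tier fails.

```python
class NoDiscoveryBackendError(Exception):
    """Raised when every discovery tier is unreachable."""

class FallbackClient:
    """Tries backends in priority order, e.g. Reth indexer, BaseScan, 8004scan."""

    def __init__(self, clients):
        # clients are any objects with health() and list_agents(skill),
        # ordered highest priority first
        self.clients = clients

    def list_agents(self, skill):
        errors = []
        for client in self.clients:
            try:
                client.health()  # cheap reachability / configuration check
                return client.list_agents(skill)
            except Exception as exc:
                # Record why this tier failed, then fall through to the next.
                errors.append(f"{type(client).__name__}: {exc}")
        raise NoDiscoveryBackendError(
            "no discovery backend available (" + "; ".join(errors) + ")"
        )
```

The same shape applies to `SearchAgents` and `GetAgent`. One design note: the TTL cache would most naturally wrap the `FallbackClient` rather than each backend, so a result served from a lower tier is cached identically to an indexer hit.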
+ +### Cluster Topology + +``` +┌─────────────────────────────────────────────────────────────┐ +│ obol-stack cluster (k3d / k3s) │ +│ │ +│ ┌─────────────────────────┐ ┌──────────────────────┐ │ +│ │ reth-erc8004-indexer │ │ autoresearch chart │ │ +│ │ (optional Helm chart) │ │ (depends on discovery)│ │ +│ │ │ │ │ │ +│ │ ┌─────────┐ ┌────────┐ │ │ coordinator │ │ +│ │ │ Reth │ │ SQLite │ │ │ │ │ │ +│ │ │ ExEx │→│ WAL │ │ │ ▼ │ │ +│ │ │ (Base) │ │ store │ │ │ FallbackClient │ │ +│ │ └─────────┘ └────────┘ │ │ ├→ RethIndexer? │ │ +│ │ │ │ │ ├→ BaseScan? │ │ +│ │ ┌────▼──────┐ │ │ └→ 8004scan? │ │ +│ │ │ REST API │◄────────│───│─── GET /agents?skill= │ │ +│ │ │ :3400 │ │ │ │ │ +│ │ └───────────┘ │ └──────────────────────┘ │ +│ └─────────────────────────┘ │ +│ │ +│ OR (lightweight mode) │ +│ │ +│ ┌──────────────────────┐ │ +│ │ autoresearch chart │ │ +│ │ │ ┌──────────────────────┐ │ +│ │ FallbackClient ─────│────→│ api.basescan.org │ │ +│ │ (no indexer needed) │ │ (ERC-8004 metadata │ │ +│ │ │ │ natively supported) │ │ +│ └──────────────────────┘ └──────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Helm Chart Structure + +``` +charts/reth-erc8004-indexer/ +├── Chart.yaml +├── values.yaml +├── templates/ +│ ├── _helpers.tpl +│ ├── statefulset.yaml # Reth + ExEx + API in one pod +│ ├── service.yaml # ClusterIP on port 3400 (API) + 30303 (P2P) +│ ├── pvc.yaml # Persistent volume for chain data + SQLite +│ ├── configmap.yaml # Reth config (Base chain, ExEx params) +│ ├── servicemonitor.yaml # Prometheus metrics (optional) +│ └── tests/ +│ └── test-api.yaml # Helm test: curl /health + /api/v1/public/stats +└── README.md +``` + +### values.yaml (key fields) + +```yaml +replicaCount: 1 + +image: + repository: ghcr.io/obolnetwork/reth-erc8004-indexer + tag: latest + +reth: + chain: base + dataDir: /data/reth + syncMode: full # full | archive + httpPort: 8545 + p2pPort: 30303 + +indexer: + apiPort: 3400 + dbPath: 
/data/indexer.db + identityRegistry: "0x8004A169FB4a3325136EB29fA0ceB6D2e539a432" + reputationRegistry: "0x8004BAa17C55a88189AE136b182e5fdA19dE9b63" + +persistence: + enabled: true + size: 500Gi # Base L2 chain data + storageClass: "" # use cluster default + +resources: + requests: + cpu: 2 + memory: 8Gi + limits: + cpu: 4 + memory: 16Gi +``` + +## Test Plan + +### Unit tests (Rust) + +- [ ] `storage.rs`: insert/query/update/delete agents in SQLite +- [ ] `storage.rs`: pagination, sorting, search with LIKE/FTS +- [ ] `indexer.rs`: parse `Registered`, `URIUpdated`, `MetadataSet` event logs +- [ ] `indexer.rs`: handle reorgs (rollback indexed data on chain reorg) +- [ ] `api.rs`: response shape matches 8004scan API contract + +### Integration tests (against running instance) + +- [ ] `/health` returns 200 with sync status +- [ ] `/api/v1/public/agents` returns paginated list +- [ ] `/api/v1/public/agents?protocol=OASF&search=model_versioning` filters correctly +- [ ] `/api/v1/public/agents/{chain_id}/{token_id}` returns single agent with full metadata +- [ ] `/api/v1/public/stats` returns registry statistics +- [ ] Response shapes are wire-compatible with 8004scan (coordinator works against both) + +### Discovery fallback tests + +- [ ] FallbackClient uses Reth indexer when available +- [ ] FallbackClient falls back to BaseScan when indexer is down +- [ ] FallbackClient falls back to 8004scan when BaseScan has no API key +- [ ] FallbackClient returns error when all three are unavailable +- [ ] BaseScan adapter correctly reads ERC-8004 NFT metadata via token API +- [ ] Cache TTL is respected (no redundant API calls within window) + +### Autoresearch-specific tests + +- [ ] Coordinator discovers workers with `devops_mlops/model_versioning` skill via each tier +- [ ] Coordinator reads `best_val_bpb` from worker metadata via each tier +- [ ] Coordinator probes discovered workers via x402 (402 response = alive) +- [ ] End-to-end: register worker → indexer picks up → 
coordinator discovers → probe succeeds + +## Migration from PR #288 + +Files to move into this PR: + +``` +reth-erc8004-indexer/ → charts/reth-erc8004-indexer/src/ (or keep at root with chart alongside) +Dockerfile.reth-erc8004-indexer → charts/reth-erc8004-indexer/Dockerfile +ralph-m3.md → reference for test plan, then remove +``` + +New files: + +``` +charts/reth-erc8004-indexer/ → Helm chart (as above) +internal/discovery/ → Go discovery client with fallback +internal/discovery/discovery.go → interface + FallbackClient +internal/discovery/reth.go → RethIndexerClient +internal/discovery/basescan.go → BaseScanClient +internal/discovery/eightkcan.go → EightKScanClient (8004scan) +internal/discovery/*_test.go → tests for each +``` + +## Acceptance Criteria + +1. `helm install indexer charts/reth-erc8004-indexer` deploys and syncs on a k3s cluster with Base chain +2. Coordinator discovers workers via the indexer with zero code changes to coordinate.py (SCAN_API_URL points to indexer) +3. When indexer is not installed, coordinator automatically falls back to BaseScan or 8004scan +4. All tests in the test plan pass in CI +5. Docker image builds in CI and publishes to ghcr.io/obolnetwork/reth-erc8004-indexer + +## Labels + +`component:indexer` `component:discovery` `priority:high` `size:L` From 9ca27542c59fd7a139bb678345a798534876755d Mon Sep 17 00:00:00 2001 From: bussyjd Date: Fri, 27 Mar 2026 09:59:18 +0000 Subject: [PATCH 2/5] =?UTF-8?q?docs:=20add=20Option=20A=20settlement=20flo?= =?UTF-8?q?w=20diagram=20=E2=80=94=20zero=20custom=20contracts?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Platform wallet is payer, operator, AND receiver. Escrow is used for verifiable commitment (workers see locked pool on-chain before working), not for routing. After capture() returns USDC to platform wallet, standard ERC20 transfers distribute to workers/innovators. Zero custom Solidity. Zero new deployments. Zero new audits. 
Only AuthCaptureEscrow (5x audited) + USDC.transfer(). --- .../diagrams/option-a-settlement-flow.md | 290 ++++++++++++++++++ docs/issues/issue-autoresearch-helm-chart.md | 27 ++ 2 files changed, 317 insertions(+) create mode 100644 docs/issues/diagrams/option-a-settlement-flow.md diff --git a/docs/issues/diagrams/option-a-settlement-flow.md b/docs/issues/diagrams/option-a-settlement-flow.md new file mode 100644 index 00000000..ad3060e7 --- /dev/null +++ b/docs/issues/diagrams/option-a-settlement-flow.md @@ -0,0 +1,290 @@ +# Option A: Platform Wallet Settlement — No Custom Contracts + +## The Core Idea + +The platform wallet is both the **payer** (locks funds in escrow) and the +**receiver** (gets them back via capture). Then it distributes to workers +and innovators with standard USDC transfers. The "distributor" is a script, +not a smart contract. + +``` +ZERO custom Solidity. +ZERO new deployments. +Only existing audited Commerce Payments contracts + standard ERC20 transfers. +``` + +## Full Flow Diagram + +``` + BASE L2 (on-chain) +┌──────────────────────────────────────────────────────────────────────┐ +│ │ +│ ┌─────────────────┐ ┌────────────────────────────┐ │ +│ │ USDC Contract │ │ AuthCaptureEscrow │ │ +│ │ (Base Mainnet) │ │ 0xBdEA...0cff │ │ +│ │ │ │ (5x audited, deployed) │ │ +│ │ │ │ │ │ +│ │ balanceOf(plat) │ │ ┌────────────────────────┐ │ │ +│ │ balanceOf(w1) │ │ │ TokenStore │ │ │ +│ │ balanceOf(w2) │ │ │ (holds escrowed USDC) │ │ │ +│ │ balanceOf(inn) │ │ └────────────────────────┘ │ │ +│ └────────┬────────┘ └─────────────┬──────────────┘ │ +│ │ │ │ +└───────────│─────────────────────────────────────│────────────────────┘ + │ │ + │ OBOL-STACK CLUSTER │ +┌───────────│─────────────────────────────────────│────────────────────┐ +│ │ │ │ +│ ┌────────▼─────────────────────────────────────▼──────────────┐ │ +│ │ ESCROW ROUND MANAGER │ │ +│ │ (Python script in pod) │ │ +│ │ │ │ +│ │ Holds the platform wallet private key (or Secure Enclave) │ │ +│ │ This is 
the OPERATOR and the PAYER and the RECEIVER │ │ +│ └─────────────────────────┬───────────────────────────────────┘ │ +│ │ │ +│ ┌────────────────┼────────────────┐ │ +│ │ │ │ │ +│ ┌────────▼──────┐ ┌──────▼──────┐ ┌───────▼─────┐ │ +│ │ Reward Engine │ │ Verifier │ │ Discovery │ │ +│ │ (OPOW calc) │ │ (proofs) │ │ (ERC-8004) │ │ +│ └───────────────┘ └─────────────┘ └─────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +## Step-by-Step: One Round + +``` +STEP 1: AUTHORIZE +════════════════════════════════════════════════════════════════ + + Platform wallet signs ERC-3009 receiveWithAuthorization: + from: platform_wallet (payer) + to: AuthCaptureEscrow + value: 100 USDC (this round's pool) + + PaymentInfo struct: + operator: platform_wallet ← same entity + payer: platform_wallet ← same entity + receiver: platform_wallet ← SAME ENTITY (this is the trick) + token: USDC + maxAmount: 100 USDC + authorizationExpiry: round_end + 1 hour + refundExpiry: round_end + 24 hours + + Call: AuthCaptureEscrow.authorize(paymentInfo, 100, collector, sig) + + Result: + ┌──────────────────┐ ┌──────────────────┐ + │ Platform Wallet │ ──$100──▶ TokenStore │ + │ balance: -100 │ │ (escrowed) │ + │ │ │ capturableAmt=100│ + └──────────────────┘ └──────────────────┘ + + WHY: The 100 USDC is now LOCKED. Platform can't spend it on + anything else. Workers can verify on-chain that the commitment + is real before starting work. + + +STEP 2: WORKERS DO WORK (during the round) +════════════════════════════════════════════════════════════════ + + Workers submit experiments, proofs are verified. + No money moves. This is the same flow as today. 
+ + ┌──────────┐ precommit ┌──────────┐ + │ Worker 1 │ ───────────▶ │ Verifier │ + │ (spark1) │ benchmark │ │ + │ │ ───────────▶ │ records │ + │ │ proof │ quals │ + │ │ ───────────▶ │ │ + └──────────┘ └──────────┘ + + ┌──────────┐ precommit ┌──────────┐ + │ Worker 2 │ ───────────▶ │ Verifier │ + │ (spark2) │ benchmark │ │ + │ │ ───────────▶ │ │ + │ │ proof │ │ + │ │ ───────────▶ │ │ + └──────────┘ └──────────┘ + + +STEP 3: REWARD ENGINE COMPUTES SHARES +════════════════════════════════════════════════════════════════ + + Pool: 100 USDC + Split: 70% workers, 20% innovators, 10% operator + + Worker influence (from OPOW parity formula): + Worker 1: influence = 0.6 → reward = 70 * 0.6 = 42 USDC + Worker 2: influence = 0.4 → reward = 70 * 0.4 = 28 USDC + + Innovator adoption: + "muon-v3": adoption 75% → reward = 20 * 0.75 = 15 USDC + "adamw": adoption 25% → reward = 20 * 0.25 = 5 USDC + + Operator: 10 USDC + + Total to distribute: 42 + 28 + 15 + 5 + 10 = 100 USDC + + +STEP 4: CAPTURE (single call) +════════════════════════════════════════════════════════════════ + + The escrow round manager calls capture() for the FULL distributable amount. + Since receiver = platform_wallet, the USDC comes right back to us. + + Call: AuthCaptureEscrow.capture(paymentInfo, 100, feeBps=0, feeReceiver) + + Result: + ┌──────────────────┐ ┌──────────────────┐ + │ TokenStore │ ──$100──▶ Platform Wallet │ + │ capturableAmt=0 │ │ balance: +100 │ + │ │ │ (back to us) │ + └──────────────────┘ └──────────────────┘ + + WHY capture to ourselves instead of just voiding? + Because capture creates an ON-CHAIN RECORD of settlement. + Anyone can verify the round was settled by reading the events. + void() would look like the round was cancelled. + + +STEP 5: DISTRIBUTE (standard ERC20 transfers) +════════════════════════════════════════════════════════════════ + + Now the platform wallet holds the USDC and distributes directly. + These are plain USDC.transfer() calls. No custom contract. 
+ + ┌──────────────────┐ + │ Platform Wallet │ + │ balance: 100 │ + │ │ + │ transfer(W1, 42)│──── 42 USDC ────▶ Worker 1 wallet + │ transfer(W2, 28)│──── 28 USDC ────▶ Worker 2 wallet + │ transfer(I1, 15)│──── 15 USDC ────▶ Innovator 1 wallet + │ transfer(I2, 5)│──── 5 USDC ────▶ Innovator 2 wallet + │ (keep 10) │ Operator keeps 10 + │ │ + │ balance: 10 │ + └──────────────────┘ + + Each transfer is a separate on-chain tx. + With 10 workers + 5 innovators = 15 transfers ≈ 15 * $0.001 gas ≈ $0.015 + (Base L2 gas is extremely cheap) + + +STEP 6: VOID (cleanup — usually a no-op) +════════════════════════════════════════════════════════════════ + + If we captured the full amount, void() has nothing to return. + If we captured less (e.g., some workers didn't qualify), void() + returns the remainder to the platform wallet. + + Call: AuthCaptureEscrow.void(paymentInfo) + + Result: capturableAmount (if any) returns to payer. + + +STEP 7: NEXT ROUND +════════════════════════════════════════════════════════════════ + + Repeat from Step 1 with the new round's pool. + The operator's 10 USDC stays in the platform wallet. + New x402 payments from buyers add to the next pool. +``` + +## Why This Works + +``` +QUESTION ANSWER +────────────────────────────────────────────────────────────────── +"Isn't receiver=payer circular?" Yes, intentionally. We use + the escrow for COMMITMENT + (locked, verifiable on-chain) + not for routing. + +"Why not just transfer directly Because authorize() creates + to workers without escrow?" a verifiable on-chain commitment. + Workers see the locked pool + BEFORE doing work. Without it, + workers have to trust the + platform will pay. + +"What if the manager crashes?" After authorizationExpiry, + platform wallet calls reclaim(). + Money comes back. Workers + don't get paid for that round, + but no funds are lost. + +"What if a transfer to a worker The other transfers already + fails?" succeeded (each is independent). + Retry the failed one. 
USDC + transfer failures are almost + always gas-related, not + permanent. + +"Can workers verify they'll Yes. On-chain: + get paid?" 1. Read capturableAmount = pool + 2. Read authorizationExpiry > now + 3. PaymentInfo is deterministic + from round params + 4. If locked, the math determines + their share (public formula) + +"What about front-running the capture() is called by the + distribution?" operator (platform wallet) only. + Nobody else can capture. + Workers receive standard ERC20 + transfers after capture. + +"Is this as secure as a custom MORE secure. We use 5x-audited + distributor contract?" Commerce Payments + standard + ERC20 transfers. A custom + contract is a new attack surface. +``` + +## What Each Party Sees On-Chain + +``` +WORKER'S PERSPECTIVE: + Before work: + → AuthCaptureEscrow.getPaymentState(hash) shows capturableAmount = 100 USDC + → "Pool is committed, I'll get paid if I do good work" + + After round: + → USDC.Transfer(platform_wallet → my_wallet, 42 USDC) + → "I got paid" + + Audit trail: + → Authorized event (round start, 100 USDC locked) + → Captured event (round end, 100 USDC settled) + → Transfer events (42 USDC to me, 28 to other worker, etc) + +INNOVATOR'S PERSPECTIVE: + → Same as worker but smaller amount (adoption-weighted) + +PLATFORM OPERATOR'S PERSPECTIVE: + → authorize(): 100 USDC leaves wallet to escrow + → capture(): 100 USDC returns to wallet from escrow + → transfer(): 90 USDC leaves wallet to participants + → Net: kept 10 USDC (operator share) + → All events auditable on BaseScan +``` + +## Contract Interaction Summary + +``` +CALL WHO SIGNS CONTRACT WHAT HAPPENS +───────────────────────────────────────────────────────────────────────────────────────────── +authorize(paymentInfo, amt) platform wallet AuthCaptureEscrow USDC locked +capture(paymentInfo, amt) platform wallet AuthCaptureEscrow USDC returned to platform +void(paymentInfo) platform wallet AuthCaptureEscrow remainder returned +reclaim(paymentInfo) platform 
wallet AuthCaptureEscrow safety recovery +USDC.transfer(worker, amt) platform wallet USDC ERC20 worker gets paid +USDC.transfer(innovator, amt) platform wallet USDC ERC20 innovator gets paid + +Total contracts called: 2 (AuthCaptureEscrow + USDC) +Custom contracts deployed: 0 +New audits needed: 0 +``` diff --git a/docs/issues/issue-autoresearch-helm-chart.md b/docs/issues/issue-autoresearch-helm-chart.md index 87064d7d..11b3f792 100644 --- a/docs/issues/issue-autoresearch-helm-chart.md +++ b/docs/issues/issue-autoresearch-helm-chart.md @@ -187,6 +187,33 @@ function refund( Expiry ordering enforced by contract: `preApprovalExpiry <= authorizationExpiry <= refundExpiry` +### Verified: receiver == payer Is Valid + +Option A sets `receiver = payer = platform_wallet` in PaymentInfo. This is +intentional and verified against the contract source: + +1. **AuthCaptureEscrow** (`_validatePayment()` at line 480) checks: + amount <= maxAmount, expiry ordering, fee bounds. It **never** checks + `payer != receiver`. No constraint exists. + +2. **ERC-3009** (`receiveWithAuthorization`) requires `to == msg.sender` + ("caller must be the payee" — Circle's FiatTokenV2/EIP3009.sol). But + `to` is set to the `ERC3009PaymentCollector` contract address, not to + `paymentInfo.receiver`. The collector is an intermediary: + + ``` + authorize: payer → collector (to==msg.sender ✓) → TokenStore (locked) + capture: TokenStore → receiver (== payer, standard ERC20 transfer) + distribute: payer wallet → worker1, worker2, ... (standard ERC20 transfers) + ``` + +3. **Capture to self** is just a TokenStore → platform_wallet ERC20 transfer. + No special-case logic, no revert condition. + +Source: `base/commerce-payments/src/AuthCaptureEscrow.sol` lines 236-295, +`src/collectors/ERC3009PaymentCollector.sol` lines 42-49, +`circlefin/stablecoin-evm/contracts/v2/EIP3009.sol` (receiveWithAuthorization). 
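
The validation argument above can be sanity-checked off-chain with a short sketch. This is a hypothetical Python mirror of the `_validatePayment()` checks, not contract code: field names follow the PaymentInfo struct shown earlier, types are simplified to plain ints (unix seconds, USDC base units), and the address and timestamps are illustrative placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentInfo:
    # Simplified mirror of the PaymentInfo struct fields used above.
    operator: str
    payer: str
    receiver: str
    token: str
    max_amount: int            # USDC base units (6 decimals)
    pre_approval_expiry: int   # unix seconds
    authorization_expiry: int
    refund_expiry: int


def validate_payment(info: PaymentInfo, amount: int) -> None:
    """Off-chain mirror of the _validatePayment() checks described above.

    Deliberately contains NO `payer != receiver` check — no such
    constraint exists in the contract, which is what makes Option A work.
    """
    if amount > info.max_amount:
        raise ValueError("amount exceeds maxAmount")
    if not (info.pre_approval_expiry
            <= info.authorization_expiry
            <= info.refund_expiry):
        raise ValueError("expiry ordering violated")


PLATFORM = "0xPlatformWallet"   # placeholder, not a real address
round_end = 1_700_000_000       # placeholder unix timestamp

info = PaymentInfo(
    operator=PLATFORM,
    payer=PLATFORM,
    receiver=PLATFORM,          # receiver == payer: the Option A trick
    token="USDC",
    max_amount=100_000_000,     # 100 USDC in 6-decimal base units
    pre_approval_expiry=round_end,
    authorization_expiry=round_end + 3_600,    # round_end + 1 hour
    refund_expiry=round_end + 86_400,          # round_end + 24 hours
)

validate_payment(info, 100_000_000)  # passes: receiver == payer is fine
```

The check that would reject this setup simply is not there, which is the point: the struct with `receiver == payer` satisfies every invariant the contract actually enforces.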
+ ### Inverted Trust Model — Why It's Better The Commerce Payments escrow protects the **payer** (the reward pool), not the service provider (the worker). This inversion is actually the superior design for our use case: From ab57c974bbb69886cdfdb623d8711dcd2bdab6d6 Mon Sep 17 00:00:00 2001 From: bussyjd Date: Sun, 29 Mar 2026 00:23:07 +0100 Subject: [PATCH 3/5] docs: add obol-stack technical specification bundle Install backend-service-spec-bundler skill and generate comprehensive spec bundle covering all 9 core subsystems: stack lifecycle, LLM routing, network/RPC gateway, sell-side monetization, buy-side payments, OpenClaw & skills, tunnel management, ERC-8004 identity, standalone inference. - SPEC.md: 1452-line technical specification with Mermaid diagrams - ARCHITECTURE.md: C4 diagrams, module decomposition, data flows - BEHAVIORS_AND_EXPECTATIONS.md: 28 desired, 6 undesired, 7 edge cases - CONTRIBUTING.md: 9 non-negotiable developer rules - 7 BDD feature files (126 Gherkin scenarios) - 6 ADRs (k3d, LiteLLM, x402, pre-signed buyer, Gateway API, ERC-8004) --- .../backend-service-spec-bundler/SKILL.md | 138 ++ .../backend-service-spec-bundler/reference.md | 218 +++ .../templates/ADR.md | 18 + .../templates/ARCHITECTURE.md | 200 +++ .../templates/BEHAVIORS_AND_EXPECTATIONS.md | 117 ++ .../templates/CONTRIBUTING.md | 33 + .../templates/SPEC.md | 220 +++ .../templates/feature.feature | 35 + .claude/skills/backend-service-spec-bundler | 1 + docs/specs/ARCHITECTURE.md | 966 +++++++++++ docs/specs/BEHAVIORS_AND_EXPECTATIONS.md | 667 ++++++++ docs/specs/CONTRIBUTING.md | 199 +++ docs/specs/SPEC.md | 1452 +++++++++++++++++ docs/specs/adr/0001-local-first-k3d.md | 62 + docs/specs/adr/0002-litellm-gateway.md | 62 + docs/specs/adr/0003-x402-payment-gating.md | 65 + .../adr/0004-pre-signed-erc3009-buyer.md | 61 + docs/specs/adr/0005-traefik-gateway-api.md | 62 + docs/specs/adr/0006-erc8004-identity.md | 67 + docs/specs/features/buy_payments.feature | 152 ++ 
docs/specs/features/erc8004_identity.feature | 149 ++ docs/specs/features/llm_routing.feature | 147 ++ docs/specs/features/network_rpc.feature | 149 ++ docs/specs/features/sell_monetization.feature | 203 +++ docs/specs/features/stack_lifecycle.feature | 166 ++ docs/specs/features/tunnel_exposure.feature | 190 +++ 26 files changed, 5799 insertions(+) create mode 100644 .agents/skills/backend-service-spec-bundler/SKILL.md create mode 100644 .agents/skills/backend-service-spec-bundler/reference.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/ADR.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/SPEC.md create mode 100644 .agents/skills/backend-service-spec-bundler/templates/feature.feature create mode 120000 .claude/skills/backend-service-spec-bundler create mode 100644 docs/specs/ARCHITECTURE.md create mode 100644 docs/specs/BEHAVIORS_AND_EXPECTATIONS.md create mode 100644 docs/specs/CONTRIBUTING.md create mode 100644 docs/specs/SPEC.md create mode 100644 docs/specs/adr/0001-local-first-k3d.md create mode 100644 docs/specs/adr/0002-litellm-gateway.md create mode 100644 docs/specs/adr/0003-x402-payment-gating.md create mode 100644 docs/specs/adr/0004-pre-signed-erc3009-buyer.md create mode 100644 docs/specs/adr/0005-traefik-gateway-api.md create mode 100644 docs/specs/adr/0006-erc8004-identity.md create mode 100644 docs/specs/features/buy_payments.feature create mode 100644 docs/specs/features/erc8004_identity.feature create mode 100644 docs/specs/features/llm_routing.feature create mode 100644 docs/specs/features/network_rpc.feature create mode 100644 docs/specs/features/sell_monetization.feature create mode 100644 
docs/specs/features/stack_lifecycle.feature create mode 100644 docs/specs/features/tunnel_exposure.feature diff --git a/.agents/skills/backend-service-spec-bundler/SKILL.md b/.agents/skills/backend-service-spec-bundler/SKILL.md new file mode 100644 index 00000000..99bb8c40 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/SKILL.md @@ -0,0 +1,138 @@ +--- +name: backend-service-spec-bundler +description: > + Generate a complete specification bundle for a backend service from an idea. Produces SPEC.md, + ARCHITECTURE.md, BEHAVIORS_AND_EXPECTATIONS.md, BDD feature files, ADRs, and CONTRIBUTING.md + through an interactive discovery process. Spec-only — no implementation code. +argument-hint: "[idea description]" +allowed-tools: Read, Write, Edit, Glob, Grep, Bash(ls *), Bash(mkdir *) +--- + +# Spec Bundle Generator + +You are a specification architect. Your job is to take a user's idea and produce a +**complete, implementation-ready specification bundle** through structured discovery. + +You produce specs. You do NOT write implementation code. Ever. + +**Audience**: This skill is for seasoned developers. Do not dumb things down, do not +default to the "popular" choice, and do not assume comfort zones. The user can handle +type-safe languages, binary protocols, manual memory considerations, and lean infrastructure. +Your job is to find the *right* choice, not the *familiar* one. + +The user's idea: **$ARGUMENTS** + +--- + +## Phase 1: Idea Intake & Discovery + +Before writing anything, you must deeply understand the idea. Run an interactive +discovery session with the user. Ask questions in **batches of 3-5** (not one at a time, +not 20 at once). Cover these areas across 2-4 rounds: + +### Round 1 — Core Understanding +- What problem does this solve? Who is the primary user/actor? +- What are the 3-5 core capabilities (the "must haves")? +- What does this system explicitly NOT do? (Anti-scope) +- Is there a preferred tech stack, language, or platform? 
(Do NOT suggest defaults — ask openly. If the user doesn't have a preference, explore options based on their performance, safety, and deployment needs. Consider the full spectrum: Rust, Go, Java, Kotlin, C++, Zig, etc. — not just Python/JS/TypeScript.) + +### Round 2 — Boundaries & Constraints +- What external systems does it integrate with? (APIs, databases, services) +- What are the hard constraints? (Security, compliance, performance, infrastructure) +- How will users interact with it? (CLI, API, web UI, mobile, SDK) +- What is the deployment model? (Cloud, self-hosted, embedded, serverless) +- **How complex does the architecture really need to be?** Start from the simplest viable option and justify upward. Consider: plain filesystem, SQLite, embedded stores, in-memory state, single-process — before reaching for Redis, Postgres, Kafka, or distributed anything. Ask: "Could this run as a single binary with an embedded database, or is there a concrete reason it can't?" +- **Wire format**: If the system has an API boundary or inter-service communication, explore the full range: SBE, FlatBuffers, Protocol Buffers, Cap'n Proto, MessagePack, CBOR — not just JSON. JSON is fine for config and human-readable output, but for wire protocols, ask about throughput, payload size, and schema evolution needs before defaulting to it. + +### Round 3 — Behaviors & Edge Cases +- What happens on the "happy path" for the top 3 use cases? +- What are the failure modes? (Network down, bad input, rate limits, partial failures) +- Are there any phased delivery plans? (MVP first, then iterate) +- Are there non-functional requirements? (Latency targets, throughput, uptime) + +### Round 4 (if needed) — Clarification +- Resolve any ambiguities from prior rounds +- Confirm assumptions before proceeding + +**Rules for discovery:** +- Do NOT proceed to Phase 2 until the user confirms you have enough to start. +- Summarize your understanding back to the user before moving on. 
+- If the user provides a rich initial description, adapt — skip questions they've already answered. +- It's fine to propose reasonable defaults and ask "does this sound right?" rather than asking open-ended questions for everything. +- **Simplicity bias**: Always challenge complexity. If the user says "I need Redis for caching", ask what the cache hit rate and dataset size are — maybe an in-process LRU cache or SQLite is enough. If they say "microservices", ask what the team size and deployment cadence are — maybe a modular monolith is the right call. The goal is the simplest architecture that meets the actual requirements, not the one that looks impressive on a diagram. +- **No tech stack defaults**: Never pre-fill a tech choice. If the user hasn't stated a language, do not suggest Python or JavaScript by default. Ask what matters to them (type safety, performance, ecosystem, team expertise) and let the answer drive the recommendation. Treat languages like Rust, Go, Java, Kotlin, C#, and C++ as equally valid starting points. +- **No serialization defaults**: Do not assume JSON for wire formats. Ask about payload characteristics (size, frequency, schema stability, latency sensitivity) and recommend accordingly. Binary formats (SBE, FlatBuffers, Protobuf) are first-class options, not exotic choices. + +--- + +## Phase 2: Spec Bundle Generation + +Once discovery is complete, generate the following files. Read the templates in +`${CLAUDE_SKILL_DIR}/templates/` for the exact structure of each document. + +### Output Structure + +``` +/ + SPEC.md # Exhaustive technical specification + ARCHITECTURE.md # C4 diagrams and structural overview + BEHAVIORS_AND_EXPECTATIONS.md # Behavioral contract (trigger/expected/rationale) + CONTRIBUTING.md # Developer rules (non-negotiable) + features/ # BDD Gherkin feature files + .feature + .feature + ... + docs/adr/ # Architecture Decision Records + 0001-.md + ... 
+``` + +### Generation Order + +Generate in this order (each document builds on the previous): + +1. **SPEC.md** — The authoritative technical blueprint. Everything else derives from this. +2. **ARCHITECTURE.md** — Visual/structural companion to the spec. C4 diagrams in Mermaid. +3. **BEHAVIORS_AND_EXPECTATIONS.md** — Behavioral contract. Every behavior maps to a testable scenario. +4. **BDD Feature Files** — One `.feature` file per major feature area. Every scenario traces back to B&E + SPEC sections. +5. **ADRs** — One per significant architectural decision made during discovery. +6. **CONTRIBUTING.md** — Developer rules derived from constraints and architectural decisions. + +### Cross-Reference System + +Maintain bidirectional cross-references between all documents: +- Each feature file header must reference: `# References: SPEC Section X, B&E Section Y` +- Each behavior in B&E must note which SPEC section defines the underlying system +- ARCHITECTURE.md must link to SPEC.md for full details +- ADRs must reference which SPEC sections they impact + +### Quality Requirements + +- **No vagueness**: Every section must be specific enough for an engineer (or an AI agent) to implement from without asking questions. +- **No implementation code**: Specs describe WHAT and WHY, not HOW at the code level. Pseudocode for algorithms is acceptable. Actual implementation code is not. +- **Testable behaviors**: Every behavior in B&E must be expressible as a Gherkin scenario. +- **Mermaid diagrams**: All architecture diagrams use Mermaid syntax (C4, sequence, flowchart). +- **Phased delivery**: If the user mentioned phases, tag features and scenarios with `@phase1`, `@phase2`, etc. +- **Terminology table**: SPEC.md must include a glossary of domain-specific terms. +- **Constraints table**: SPEC.md must include a system constraints table. + +--- + +## Phase 3: Review & Iteration + +After generating all files: + +1. 
Present a summary of what was generated (file list with brief descriptions). +2. Ask if the user wants to review any specific document. +3. Iterate on feedback — update specs, not code. +4. Confirm the bundle is complete before finishing. + +--- + +## Important Reminders + +- You are a spec writer, not an implementer. If the user asks you to write code, remind them this skill produces specifications only. +- Adapt the templates to the user's domain — don't blindly copy crypto/blockchain terminology from the templates into an unrelated project. +- The templates show STRUCTURE, not content to copy. The content must come from the discovery session. +- If a section from the template doesn't apply (e.g., no multi-chain support for a web app), omit it. Don't include empty sections. +- If the user's project needs sections not in the templates, add them. The templates are a starting point, not a ceiling. diff --git a/.agents/skills/backend-service-spec-bundler/reference.md b/.agents/skills/backend-service-spec-bundler/reference.md new file mode 100644 index 00000000..0c013a3d --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/reference.md @@ -0,0 +1,218 @@ +# Spec Bundle Methodology Reference + +This document describes the specification-first, behavior-driven methodology used by the +specbundle skill. Claude should internalize these principles when generating spec bundles. + +--- + +## Core Philosophy + +**Specs are the product, not the code.** A well-written spec bundle should allow any +competent engineer — or an AI coding agent — to implement the system without asking +clarifying questions. If the implementer needs to guess, the spec failed. + +**Simplicity is the default.** Every piece of infrastructure in the spec must justify its +existence. The question is never "why not use Postgres?" — it's "why not SQLite?" or "why +not the filesystem?" 
Start from the leanest possible architecture (single process, embedded +storage, in-memory state) and only add complexity when the requirements demand it. A system +that runs as a single binary with zero external dependencies is not "toy" — it's the ideal +starting point. Complexity is added, never assumed. + +**No comfort-zone defaults.** This methodology targets experienced developers who value +type safety, performance, and correctness. Do not default to JavaScript, Python, or JSON +because they're popular. Treat the full spectrum of languages (Rust, Go, Java, Kotlin, C++, +Zig, C#) and serialization formats (SBE, FlatBuffers, Protobuf, Cap'n Proto, MessagePack, +CBOR) as first-class options. The right choice comes from the project's constraints, not +from what's most common on Stack Overflow. + +## The Three-Tier Model + +Every component in the system gets three documents that form a layered specification: + +### Tier 1: Technical Specification (SPEC.md) + +The authoritative blueprint. Contains: +- System scope and anti-scope +- Terminology glossary (every domain term defined) +- System constraints table (hard limits that shape all decisions) +- Module decomposition with dependencies +- Complete API/protocol definitions +- Data model and storage architecture +- Security model +- Error handling taxonomy +- Performance targets +- Phased rollout plan +- Testing strategy + +**Key quality test**: Can someone implement from this document alone? If no, it's incomplete. + +### Tier 2: Architecture Document (ARCHITECTURE.md) + +The visual/structural companion. 
Contains: +- Design philosophy (3-5 guiding principles) +- C4 diagrams (Context, Container, Component) in Mermaid +- Module decomposition table with SPEC cross-references +- Sequence diagrams for major data flows +- Storage architecture overview +- Deployment model diagram +- Network topology +- Security architecture (trust boundaries, auth flows) + +**Key quality test**: Can someone understand the system structure in 10 minutes from diagrams alone? + +### Tier 3: Behavioral Contract (BEHAVIORS_AND_EXPECTATIONS.md) + +The testable behavioral specification. Contains: +- Desired behaviors: Trigger → Expected → Rationale +- Undesired behaviors: Trigger → Expected → Risk +- Edge cases: Scenario → Expected Handling → Rationale +- Performance expectations with targets and degradation handling +- Guardrail definitions (non-negotiable constraints) + +**Key quality test**: Can every entry be expressed as a Gherkin scenario? + +## BDD Feature Files + +One `.feature` file per major feature area. Each file: +- Uses `@bdd` tag at the top +- Includes a user story (As a / I want / So that) +- Has a header comment referencing SPEC and B&E sections +- Uses a Background block for common preconditions +- Tags scenarios with phase (`@phase1`, `@phase2`) and speed (`@fast`) +- Each scenario traces back to a specific B&E entry + +**Naming convention**: `{feature-area}.feature` (e.g., `trading.feature`, `authentication.feature`) + +## Architecture Decision Records (ADRs) + +One ADR per significant architectural decision. Each ADR: +- Is numbered sequentially: `0001-`, `0002-`, etc. +- Has three sections: Context, Decision, Consequences +- Is immutable once accepted (supersede, don't edit) +- Lives in `docs/adr/` + +**When to write an ADR**: Technology choices, protocol decisions, deployment model, database +selection, major pattern decisions, constraint trade-offs. 
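
To make the shape concrete, here is a hypothetical ADR in that format. The project, dates, and numbers are illustrative only, reusing the SQLite-vs-Postgres trade-off this methodology uses as its running example:

```
# ADR-0002: SQLite for round state

**Date**: 2026-01-15
**Status**: Accepted

## Context

Round state is written by a single process at low volume (< 10 writes/sec)
and read by one API server. There are no concurrent multi-process writers.

## Decision

Use SQLite as the embedded store. Do not introduce Postgres.

## Consequences

- **Positive**: zero-config deployment; state lives in a single file next to the binary
- **Negative**: moving to a server RDBMS later requires a data migration
- **Neutral**: backups become file copies
```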
+ +## Cross-Reference System + +All documents maintain bidirectional references: + +``` +SPEC.md Section 4 ←→ ARCHITECTURE.md Section 3 (Module Decomposition) + ←→ B&E Section 2.1 (Desired Behaviors for this subsystem) + ←→ features/pipeline.feature (BDD scenarios) + ←→ docs/adr/0003-*.md (Decision that shaped this subsystem) +``` + +## The Discovery Process + +The spec writer's job during discovery is to extract: + +1. **The problem** — What pain exists today? +2. **The actors** — Who uses this and what are their goals? +3. **The scope** — What's in and what's explicitly out? +4. **The constraints** — What's non-negotiable? (Security, compliance, performance, infra) +5. **The interfaces** — How do users and external systems interact with it? +6. **The data** — What's stored, where, and for how long? +7. **The happy paths** — Walk through the top 3-5 use cases end to end +8. **The failure modes** — What breaks and how does the system recover? +9. **The phases** — What's MVP vs. future? +10. **The decisions** — What choices were already made and why? + +Good discovery questions are specific, not generic. "What's your tech stack?" is fine. +"Tell me about your requirements" is too vague. + +### Challenging Complexity During Discovery + +The spec writer must actively push back on unnecessary complexity. Common patterns to challenge: + +| User Says | Challenge With | +|-----------|---------------| +| "We need Redis for caching" | What's the dataset size? Could an in-process LRU or SQLite handle it? | +| "Postgres for the database" | What's the data volume and query complexity? Would SQLite or embedded RocksDB suffice? | +| "Kafka for event streaming" | What's the throughput? Could an in-process queue or simple file-based log work? | +| "Microservices architecture" | What's the team size? How often do components deploy independently? Would a modular monolith be simpler? | +| "JSON REST API" | What are the payload sizes and call frequency? 
Would a binary format (SBE, FlatBuffers, Protobuf) be more appropriate? | +| "Docker + Kubernetes" | Could this ship as a single binary / fat JAR? What's the actual scaling requirement? | +| "Python/Node for the backend" | What are the latency and throughput requirements? Is GC pressure a concern? Would a compiled language be a better fit? | + +The goal is not to reject these technologies — they're all valid in the right context. The goal +is to ensure they earn their place in the spec through concrete requirements, not habit. + +## Adapting to Different Domains + +The templates are derived from a crypto/DeFi project but the methodology is domain-agnostic. +When applying to other domains: + +- Replace domain-specific sections with relevant ones (e.g., "Multi-Chain Support" becomes + "Multi-Region Support" for a distributed web app) +- Adjust the constraint table for the domain (e.g., HIPAA for healthcare, PCI for payments) +- Scale the number of feature files to the project's complexity +- Adjust phasing to the project's delivery model +- Add domain-specific guardrails (e.g., "never expose PII" for a user-facing app) + +## The Complexity Ladder + +When specifying infrastructure, start at the bottom and move up only when a concrete +requirement forces you to: + +### Storage +``` +Filesystem (flat files, append-only logs) + ↓ need indexed queries? +SQLite / DuckDB (embedded, zero-config, single-file) + ↓ need concurrent write-heavy workloads from multiple processes? +Postgres / MySQL (server-based RDBMS) + ↓ need horizontal write scaling or document flexibility? +Distributed stores (CockroachDB, Cassandra, MongoDB) +``` + +### Caching +``` +In-process cache (LRU map, Caffeine, moka) + ↓ need shared cache across multiple processes? +Redis / Memcached + ↓ need persistence + cache semantics? 
+Redis with AOF / embedded KeyDB +``` + +### Message Passing +``` +In-process channels / queues (channels, ring buffers, Disruptor) + ↓ need cross-process or cross-machine messaging with low latency? +ZeroMQ / Aeron (brokerless, low-latency — work over IPC, TCP, and UDP/multicast) + ↓ need built-in durability, replay, or managed routing? +NATS / RabbitMQ + ↓ need massive throughput + log semantics + consumer groups? +Kafka / Redpanda +``` + +### Serialization +``` +Binary schema-driven (SBE, FlatBuffers, Cap'n Proto) — zero-copy, type-safe, fast + ↓ need schema evolution with broad ecosystem support? +Protobuf / MessagePack / CBOR — compact, good tooling + ↓ need human readability for debugging/config? +JSON / YAML / TOML — readable, verbose, no type safety at the wire level +``` + +### Deployment +``` +Single binary / fat JAR (zero dependencies) + ↓ need process isolation or heterogeneous runtimes? +Containers (Docker) + ↓ need orchestration across machines? +Kubernetes / Nomad +``` + +Each step up the ladder adds operational cost, failure modes, and cognitive load. The spec +must justify every step taken. + +## Multiple Subsystems + +Large projects may need separate spec bundles for distinct subsystems. 
In this case: +- Each subsystem gets its own SPEC, ARCHITECTURE, and B&E file (prefixed with the subsystem name) +- Feature files are organized per subsystem +- A top-level README ties them together +- Cross-references work across subsystem documents diff --git a/.agents/skills/backend-service-spec-bundler/templates/ADR.md b/.agents/skills/backend-service-spec-bundler/templates/ADR.md new file mode 100644 index 00000000..e3b2e584 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/ADR.md @@ -0,0 +1,18 @@ +# ADR-{NNNN}: {Decision Title} + +**Date**: {date} +**Status**: Accepted + +## Context + + + +## Decision + + + +## Consequences + +- **Positive**: {benefits of this decision} +- **Negative**: {costs, trade-offs, or risks accepted} +- **Neutral**: {side effects that are neither good nor bad} diff --git a/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md b/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md new file mode 100644 index 00000000..aa9403a8 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md @@ -0,0 +1,200 @@ +# {Project Name} Architecture + +**Version**: 1.0.0-draft +**Status**: Draft +**Last Updated**: {date} + +This document provides a visual and structural overview of the {project name} system. For the full technical specification, see [SPEC.md](SPEC.md). + +--- + +## Table of Contents + +1. [System Overview](#1-system-overview) +2. [Component Diagrams](#2-component-diagrams) +3. [Module Decomposition](#3-module-decomposition) +4. [Data Flow Diagrams](#4-data-flow-diagrams) +5. [Storage Architecture](#5-storage-architecture) +6. [Deployment Model](#6-deployment-model) +7. [Network Topology](#7-network-topology) +8. [Security Architecture](#8-security-architecture) + +--- + +## 1. System Overview + +### Design Philosophy + + + +{Project name} is built around these principles: + +1. **{Principle 1}**: {explanation} +2. **{Principle 2}**: {explanation} +3. 
**{Principle 3}**: {explanation} + +### System Constraints + +| Constraint | Impact on Architecture | +|-----------|----------------------| +| {constraint} | {how it shapes the design} | + +--- + +## 2. Component Diagrams + +### 2.1 C4 Context Diagram + +Shows {project name} in relation to all external systems. + +```mermaid +C4Context + title {Project Name} — System Context + + Person(user, "{User Role}", "{User description}") + + System(sys, "{Project Name}", "{One-line description}") + + System_Ext(ext1, "{External System 1}", "{Description}") + System_Ext(ext2, "{External System 2}", "{Description}") + + Rel(user, sys, "{interaction}") + Rel(sys, ext1, "{protocol}", "{what flows}") + Rel(sys, ext2, "{protocol}", "{what flows}") +``` + +### 2.2 C4 Container Diagram + +Zooms into {project name} to show its internal containers. + +```mermaid +C4Container + title {Project Name} — Container Diagram + + Container_Boundary(boundary, "{Project Name}") { + Container(comp1, "{Component 1}", "{Tech}", "{Purpose}") + Container(comp2, "{Component 2}", "{Tech}", "{Purpose}") + ContainerDb(db, "{Database}", "{Type}", "{What it stores}") + } + + System_Ext(ext1, "{External System}", "{Description}") + + Rel(comp1, comp2, "{interaction}") + Rel(comp2, db, "{protocol}") + Rel(comp1, ext1, "{protocol}") +``` + +### 2.3 C4 Component Diagram (optional — for complex subsystems) + +```mermaid +C4Component + title {Subsystem Name} — Component Diagram + + Component(c1, "{Component}", "{Tech}", "{Purpose}") + Component(c2, "{Component}", "{Tech}", "{Purpose}") + + Rel(c1, c2, "{interaction}") +``` + +--- + +## 3. Module Decomposition + +| Module | Purpose | Key Dependencies | SPEC Reference | +|--------|---------|-----------------|----------------| +| {module} | {purpose} | {deps} | Section {N} | + +--- + +## 4. 
Data Flow Diagrams + +### 4.1 {Primary Flow} (e.g., "User Request Lifecycle") + +```mermaid +sequenceDiagram + participant U as User + participant A as {Component A} + participant B as {Component B} + participant E as {External System} + + U->>A: {action} + A->>B: {internal call} + B->>E: {external call} + E-->>B: {response} + B-->>A: {result} + A-->>U: {response} +``` + +### 4.2 {Secondary Flow} + + + +--- + +## 5. Storage Architecture + +### 5.1 Overview + + + +### 5.2 Schema Summary + + + +| Store | Entity | Key Fields | Purpose | +|-------|--------|-----------|---------| +| {store} | {entity} | {fields} | {purpose} | + +--- + +## 6. Deployment Model + +### 6.1 Deployment Diagram + +```mermaid +graph TD + subgraph "{Environment}" + A["{Component}"] --> B["{Component}"] + B --> C["{Store}"] + end +``` + +### 6.2 Infrastructure Requirements + +| Resource | Requirement | Notes | +|----------|-------------|-------| +| {resource} | {spec} | {notes} | + +--- + +## 7. Network Topology + + + +--- + +## 8. 
Security Architecture + +### 8.1 Trust Boundaries + + + +### 8.2 Authentication Flow + +```mermaid +sequenceDiagram + participant C as Client + participant S as Server + participant A as Auth Provider + + C->>S: Request + credentials + S->>A: Validate + A-->>S: Token/result + S-->>C: Authenticated response +``` + +### 8.3 Data Encryption + +| Data | At Rest | In Transit | +|------|---------|-----------| +| {data type} | {method} | {method} | diff --git a/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md b/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md new file mode 100644 index 00000000..268a12e1 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md @@ -0,0 +1,117 @@ +# {Project Name} — Behaviors and Expectations + +**Version**: 1.0.0-draft +**Status**: Draft +**Last Updated**: {date} + +This document defines the behavioral contract for {project name}. Every behavior described here maps to one or more testable scenarios in the [BDD feature files](features/). + +--- + +## Table of Contents + +1. [Introduction](#1-introduction) +2. [Desired Behaviors](#2-desired-behaviors) +3. [Undesired Behaviors](#3-undesired-behaviors) +4. [Edge Cases](#4-edge-cases) +5. [Performance Expectations](#5-performance-expectations) +6. [Guardrail Definitions](#6-guardrail-definitions) + +--- + +## 1. Introduction + +### 1.1 Purpose + +This document is the behavioral specification for {project name}. It defines what the system should do, what it must not do, how it handles edge cases, and what performance it must achieve. 
+ +It serves as: +- A contract between the product and engineering teams +- The source of truth for BDD feature file scenarios +- A test oracle for integration and adversarial testing + +### 1.2 How to Read This Document + +**Desired behaviors** (Section 2) follow this format: +- **Trigger**: What user action or system state initiates the behavior +- **Expected**: What the system should do +- **Rationale**: Why this behavior matters + +**Undesired behaviors** (Section 3) add: +- **Risk**: What goes wrong if this behavior occurs + +**Edge cases** (Section 4) describe unusual or boundary scenarios with expected handling. + +**Cross-references**: Section numbers reference [SPEC.md](SPEC.md) where the underlying system is defined. + +--- + +## 2. Desired Behaviors + + + +### 2.1 {Feature Area 1} + +#### B-2.1.1: {Behavior Name} + +**Trigger**: {What initiates this behavior} +**Expected**: {What the system does in response} +**Rationale**: {Why this matters} + +#### B-2.1.2: {Behavior Name} + +**Trigger**: ... +**Expected**: ... +**Rationale**: ... + +### 2.2 {Feature Area 2} + + + +--- + +## 3. Undesired Behaviors + + + +### 3.1 {Category} + +#### U-3.1.1: {Undesired Behavior Name} + +**Trigger**: {What could cause this} +**Expected**: {What should happen instead} +**Risk**: {What goes wrong if this occurs} + +--- + +## 4. Edge Cases + + + +### 4.1 {Category} + +#### E-4.1.1: {Edge Case Name} + +**Scenario**: {Description of the unusual situation} +**Expected Handling**: {What the system does} +**Rationale**: {Why this handling was chosen} + +--- + +## 5. Performance Expectations + +| Behavior | Target | Measurement | Degradation Handling | +|----------|--------|-------------|---------------------| +| {behavior} | {target} | {how measured} | {what happens if missed} | + +--- + +## 6. 
Guardrail Definitions + + + +### 6.1 {Guardrail Category} + +| Guardrail | Rule | Enforcement | Violation Response | +|-----------|------|-------------|-------------------| +| {name} | {rule} | {how enforced} | {what happens} | diff --git a/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md b/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md new file mode 100644 index 00000000..99744680 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md @@ -0,0 +1,33 @@ +# {Project Name} — Developer Rules + +These rules are non-negotiable. Every contributor must follow them. + + diff --git a/.agents/skills/backend-service-spec-bundler/templates/SPEC.md b/.agents/skills/backend-service-spec-bundler/templates/SPEC.md new file mode 100644 index 00000000..9a472976 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/SPEC.md @@ -0,0 +1,220 @@ +# {Project Name} Technical Specification + +**Version**: 1.0.0-draft +**Status**: Draft +**Last Updated**: {date} + +--- + +## Table of Contents + + + +1. [Introduction](#1-introduction) +2. [System Architecture](#2-system-architecture) +3. [Core Subsystems](#3-core-subsystems) +4. [API / Protocol Definition](#4-api--protocol-definition) +5. [Data Model](#5-data-model) +6. [Integration Points](#6-integration-points) +7. [Security Model](#7-security-model) +8. [Error Handling](#8-error-handling) +9. [Performance](#9-performance) +10. [Phased Rollout](#10-phased-rollout) +11. [Testing Strategy](#11-testing-strategy) + +--- + +## 1. Introduction + +### 1.1 Purpose + + + +This document is the authoritative technical specification for {project name}. It defines every subsystem, protocol, interface, and behavioral contract that the implementation must satisfy. + +### 1.2 Scope + + + +The system: +- {capability 1} +- {capability 2} +- ... + +The system does **not**: +- {anti-scope 1} +- {anti-scope 2} +- ... 
+ +### 1.3 Terminology & Glossary + + + +| Term | Definition | +|------|-----------| +| **{Term}** | {Definition} | + +### 1.4 System Constraints + + + +| Constraint | Detail | +|-----------|--------| +| **{Constraint}** | {Detail and impact} | + +--- + +## 2. System Architecture + +### 2.1 High-Level Component Diagram + + + +### 2.2 Module Decomposition + +| Module | Purpose | Key Dependencies | +|--------|---------|-----------------| +| {module} | {purpose} | {deps} | + +### 2.3 Request/Response Lifecycle + + + +--- + +## 3. Core Subsystems + + + +### 3.1 {Subsystem A} + +#### 3.1.1 Purpose +#### 3.1.2 Inputs & Outputs +#### 3.1.3 Logic +#### 3.1.4 Configuration +#### 3.1.5 Error States + +### 3.2 {Subsystem B} + + + +--- + +## 4. API / Protocol Definition + + + +### 4.1 {Interface Type} (e.g., REST API, gRPC, CLI) + +| Endpoint / Command | Method | Description | Request | Response | +|-------------------|--------|-------------|---------|----------| +| {endpoint} | {method} | {desc} | {schema} | {schema} | + +### 4.2 Authentication & Authorization + + + +### 4.3 Rate Limiting & Quotas + +--- + +## 5. Data Model + +### 5.1 Storage Architecture + + + +### 5.2 Schema Definitions + + + +### 5.3 Data Lifecycle + + + +--- + +## 6. Integration Points + + + +| System | Protocol | Purpose | Failure Mode | +|--------|----------|---------|-------------| +| {system} | {protocol} | {purpose} | {what happens when it's down} | + +--- + +## 7. Security Model + +### 7.1 Threat Model + + + +### 7.2 Authentication + +### 7.3 Input Validation & Sanitization + +### 7.4 Data Protection + +--- + +## 8. Error Handling + +### 8.1 Error Categories + +| Category | Example | Handling | +|----------|---------|---------| +| {category} | {example} | {how it's handled} | + +### 8.2 Error Response Format + +### 8.3 Retry & Recovery + +--- + +## 9. 
Performance + +### 9.1 Targets + +| Metric | Target | Measurement | +|--------|--------|-------------| +| {metric} | {target} | {how measured} | + +### 9.2 Bottlenecks & Mitigations + +--- + +## 10. Phased Rollout + + + +### Phase 1: {name} +- Scope: {what's included} +- Success criteria: {how to know it's done} + +### Phase 2: {name} +- Scope: ... +- Depends on: Phase 1 completion + +--- + +## 11. Testing Strategy + +### 11.1 Test Levels + +| Level | Tool | What It Covers | +|-------|------|---------------| +| Unit | {framework} | Individual functions/methods | +| Integration | {framework} | Subsystem interactions | +| BDD | Cucumber/Gherkin | Behavioral scenarios from B&E doc | +| Performance | {tool} | Latency, throughput targets | + +### 11.2 Test Data Strategy + +### 11.3 CI/CD Integration diff --git a/.agents/skills/backend-service-spec-bundler/templates/feature.feature b/.agents/skills/backend-service-spec-bundler/templates/feature.feature new file mode 100644 index 00000000..d020f5a0 --- /dev/null +++ b/.agents/skills/backend-service-spec-bundler/templates/feature.feature @@ -0,0 +1,35 @@ +@bdd +Feature: {Feature Name} + As a {actor/role} + I want {capability} + So that {benefit} + + # References: SPEC Section {N} ({Section Name}), B&E Section {M} ({Section Name}) + + Background: + Given {common precondition for all scenarios} + + @phase1 @fast + Scenario: {Happy path scenario name} + Given {context/setup} + When {action the user takes} + Then {expected outcome} + And {additional assertion} + + @phase1 @fast + Scenario: {Validation/error scenario name} + Given {context/setup} + When {action with invalid input or edge case} + Then {expected error handling} + And {system remains in valid state} + + @phase1 + Scenario Outline: {Parameterized scenario name} + Given {context with <parameter>} + When {action with <input>} + Then {expected <outcome>} + + Examples: + | parameter | input | outcome | + | {param1} | {input1} | {outcome1} | + | {param2} | {input2} | {outcome2} | diff --git 
a/.claude/skills/backend-service-spec-bundler b/.claude/skills/backend-service-spec-bundler new file mode 120000 index 00000000..cb158e36 --- /dev/null +++ b/.claude/skills/backend-service-spec-bundler @@ -0,0 +1 @@ +../../.agents/skills/backend-service-spec-bundler \ No newline at end of file diff --git a/docs/specs/ARCHITECTURE.md b/docs/specs/ARCHITECTURE.md new file mode 100644 index 00000000..a6405da0 --- /dev/null +++ b/docs/specs/ARCHITECTURE.md @@ -0,0 +1,966 @@ +# Obol Stack -- Architecture Document + +> **Version:** 1.0.0 +> **Date:** 2026-03-27 +> **Companion to:** [SPEC.md](./SPEC.md) +> **Audience:** Seasoned developers, agentic workflows, and system integrators. + +--- + +## Table of Contents + +1. [Design Philosophy](#1-design-philosophy) +2. [C4 Diagrams](#2-c4-diagrams) +3. [Module Decomposition](#3-module-decomposition) +4. [Data Flow Diagrams](#4-data-flow-diagrams) +5. [Storage Architecture](#5-storage-architecture) +6. [Deployment Model](#6-deployment-model) +7. [Network Topology](#7-network-topology) +8. [Security Architecture](#8-security-architecture) + +--- + +## 1. Design Philosophy + +Five guiding principles govern every architectural decision in Obol Stack. They are listed in order of precedence -- when principles conflict, the higher-numbered principle yields to the lower. + +### 1.1 Local-First Sovereignty + +The operator's machine is the source of truth. All infrastructure runs inside a local k3d/k3s cluster, all state lives on the local filesystem under XDG-compliant paths, and no cloud account is required to start. Public exposure (Cloudflare tunnels) is opt-in and layered on top, never a prerequisite. This ensures the operator retains full custody of keys, models, and data at all times. + +*SPEC cross-ref: Section 1.3 (System Constraints), Section 2.3 (Configuration Hierarchy).* + +### 1.2 Configuration-Driven Infrastructure + +Infrastructure is declared, not scripted. 
Two-stage templating (CLI flags to Go templates to Helmfile to Kubernetes manifests) ensures that every deployed resource traces back to a versioned configuration file. Embedded assets (`internal/embed/`) ship default configurations; operators override via flags or values files. Helmfile is the single deployment orchestrator -- there are no imperative `kubectl apply` calls in the steady-state path. + +*SPEC cross-ref: Section 3.3.3 (Two-Stage Templating), Section 2.1 (High-Level Overview).* + +### 1.3 Payment-Gated by Default + +Every publicly exposed service is protected by x402 micropayments unless explicitly exempted. The ForwardAuth pattern means Traefik itself enforces payment before traffic ever reaches the upstream. This is not an afterthought bolt-on -- the payment gate is a first-class infrastructure primitive deployed alongside the service via the ServiceOffer reconciliation loop. + +*SPEC cross-ref: Section 3.4 (Monetize -- Sell Side), Section 4.1 (x402 Payment Protocol).* + +### 1.4 Bounded Trust, Bounded Spending + +The system minimizes trust surfaces at every layer. The buy-side sidecar has zero signer access; it can only spend pre-signed vouchers, bounding maximum loss to N * price. The sell-side verifier delegates to an external facilitator for settlement, never holding funds. Wallet private keys live in encrypted keystores or hardware enclaves, accessed only through a remote-signer REST API. RBAC scopes the agent to exactly the Kubernetes verbs it needs. + +*SPEC cross-ref: Section 7.2 (Payment Security), Section 7.3 (Wallet Security), Section 7.5 (RBAC).* + +### 1.5 Progressive Disclosure + +A single `obol stack up` gives operators a working cluster with auto-configured LLM routing, an AI agent, and local blockchain access. Advanced features -- selling inference, buying remote models, on-chain registration, Secure Enclave keys -- are activated incrementally through explicit CLI commands. 
Failures in optional subsystems (Ollama down, no cloud API key, tunnel unavailable) degrade gracefully with warnings, never blocking the core startup path. + +*SPEC cross-ref: Section 3.1.3 (Startup Sequence), Section 8.2 (Graceful Degradation).* + +--- + +## 2. C4 Diagrams + +### 2.1 Context Diagram (Level 1) + +The system boundary is the operator's machine. External systems interact via well-defined protocols. + +```mermaid +C4Context + title Obol Stack -- System Context + + Person(operator, "Operator", "Manages cluster via obol CLI") + Person(buyer, "Remote Buyer", "Purchases inference via x402") + + System(obol, "Obol Stack", "Local k3d/k3s cluster with AI agent,
payment-gated inference, blockchain networks") + + System_Ext(cloudflare, "Cloudflare", "Tunnel service for public exposure") + System_Ext(facilitator, "x402 Facilitator", "Payment verification and settlement
(facilitator.x402.rs)") + System_Ext(base, "Base L2", "ERC-8004 identity registry,
USDC settlement (Base Sepolia / Mainnet)") + System_Ext(ollama, "Ollama", "Local LLM inference engine
(host process)") + System_Ext(chainlist, "ChainList API", "Public RPC endpoint discovery") + System_Ext(cloud_llm, "Cloud LLM Providers", "Anthropic, OpenAI APIs") + + Rel(operator, obol, "Manages via CLI") + Rel(buyer, obol, "x402 payments over HTTPS") + Rel(obol, cloudflare, "HTTPS/QUIC tunnel") + Rel(obol, facilitator, "HTTPS POST /verify") + Rel(obol, base, "JSON-RPC (contract calls)") + Rel(obol, ollama, "HTTP /api/tags, /v1/*") + Rel(obol, chainlist, "HTTPS GET") + Rel(obol, cloud_llm, "HTTPS /v1/*") +``` + +### 2.2 Container Diagram (Level 2) + +Inside the k3d cluster, containers are organized by namespace. Each namespace represents a deployment unit with distinct responsibilities. + +```mermaid +C4Container + title Obol Stack -- Container Diagram (k3d Cluster) + + Person(operator, "Operator") + Person(buyer, "Remote Buyer") + + System_Boundary(cluster, "k3d / k3s Cluster") { + + Container(traefik, "Traefik Gateway", "Gateway API", "Ingress controller.
Routes local and public traffic.
ForwardAuth to x402-verifier.") + Container(cloudflared, "cloudflared", "Cloudflare Tunnel", "Exposes public routes
to the internet.") + Container(storefront, "Storefront", "busybox httpd", "Static landing page
at tunnel hostname root.") + + Container(litellm, "LiteLLM", "Python, port 4000", "OpenAI-compatible proxy.
Routes to Ollama, cloud,
or paid/* via sidecar.") + Container(x402_buyer, "x402-buyer", "Go sidecar, port 8402", "Buy-side payment attachment.
Pre-signed ERC-3009 auths.
Runs in LiteLLM Pod.") + + Container(x402_verifier, "x402-verifier", "Go, port 8080", "ForwardAuth middleware.
Route matching, 402 responses,
facilitator delegation.") + + Container(agent, "OpenClaw Agent", "Python", "AI agent singleton.
Skills via PVC injection.
monetize.py reconciler.") + Container(remote_signer, "Remote Signer", "REST API, port 9000", "Keystore-backed signing.
In-namespace access only.") + + Container(erpc, "eRPC", "Go, port 4000", "Blockchain RPC gateway.
Multiplexes upstreams,
caches eth_call.") + Container(frontend, "Frontend", "React, nginx", "Dashboard UI.
Local-only (obol.stack).") + Container(prometheus, "Prometheus", "Monitoring", "Metrics collection.
ServiceMonitor + PodMonitor.") + } + + System_Ext(ollama, "Ollama (Host)") + System_Ext(facilitator, "x402 Facilitator") + System_Ext(internet, "Public Internet") + + Rel(operator, traefik, "HTTP :80 / :443") + Rel(internet, cloudflared, "HTTPS/QUIC") + Rel(cloudflared, traefik, "HTTP") + Rel(buyer, cloudflared, "HTTPS") + + Rel(traefik, x402_verifier, "ForwardAuth POST /verify") + Rel(traefik, litellm, "/services/* (after 200)") + Rel(traefik, frontend, "/ (obol.stack)") + Rel(traefik, erpc, "/rpc (obol.stack)") + Rel(traefik, storefront, "/ (tunnel hostname)") + + Rel(litellm, ollama, "ollama/* models") + Rel(litellm, x402_buyer, "paid/* models :8402") + Rel(x402_buyer, internet, "x402 payment + request") + Rel(x402_verifier, facilitator, "POST /verify") + + Rel(agent, litellm, "Inference requests") + Rel(agent, remote_signer, "Sign transactions :9000") +``` + +### 2.3 Component Diagram -- Monetize Subsystem (Level 3) + +The monetize subsystem is the most architecturally complex part of Obol Stack. It spans the CLI, a Kubernetes CRD, a Python reconciler, Traefik middleware, and the x402-verifier. + +```mermaid +C4Component + title Monetize Subsystem -- Component Diagram + + Container_Boundary(cli_boundary, "obol CLI") { + Component(sell_cmd, "sell.go", "urfave/cli", "Parses flags, validates input,
creates ServiceOffer CR,
triggers tunnel activation.") + Component(schemas_pkg, "schemas/", "Go", "ServiceOffer struct definitions,
payment validation,
price approximation.") + } + + Container_Boundary(agent_boundary, "OpenClaw Agent Pod") { + Component(reconciler, "monetize.py", "Python", "6-stage reconciliation loop.
Watches ServiceOffer CRs.
Creates child resources with
ownerReferences for GC.") + } + + Container_Boundary(k8s_boundary, "Kubernetes API") { + Component(serviceoffer_crd, "ServiceOffer CRD", "obol.org/v1alpha1", "Declarative sell-side API.
Spec: type, model, upstream,
payment, path, registration.") + Component(middleware, "Traefik Middleware", "traefik.io", "ForwardAuth middleware
pointing to x402-verifier.") + Component(httproute, "HTTPRoute", "gateway.networking.k8s.io", "Public route:
/services/<name>/*
No hostname restriction.") + Component(pricing_cm, "x402-pricing ConfigMap", "x402 ns", "Route rules: pattern,
price, wallet, chain.") + Component(registration, "Registration Resources", "traefik ns", "ConfigMap + httpd + HTTPRoute
for /.well-known/ and /skill.md") + } + + Container_Boundary(verifier_boundary, "x402-verifier") { + Component(verifier_core, "Verifier", "Go", "ForwardAuth handler.
Route matching, 402 generation,
facilitator delegation.") + Component(watcher, "WatchConfig", "Go", "Polls pricing YAML every 5s.
Atomic config swap on change.") + Component(matcher, "Matcher", "Go", "First-match route resolution.
Exact, prefix, glob patterns.") + } + + Rel(sell_cmd, serviceoffer_crd, "kubectl apply") + Rel(sell_cmd, schemas_pkg, "Validate + build CR") + Rel(reconciler, serviceoffer_crd, "Watch (10s loop)") + Rel(reconciler, middleware, "Stage 3: Create") + Rel(reconciler, pricing_cm, "Stage 3: Patch routes[]") + Rel(reconciler, httproute, "Stage 4: Create") + Rel(reconciler, registration, "Stage 5: Create") + Rel(watcher, pricing_cm, "Poll mtime (5s)") + Rel(watcher, verifier_core, "Atomic Reload()") + Rel(verifier_core, matcher, "Match URI to route") +``` + +--- + +## 3. Module Decomposition + +Every Go package, its purpose, key dependencies, and SPEC cross-references. + +### 3.1 CLI Layer + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `cmd/obol` | CLI entry point and command definitions | `main.go`, `sell.go`, `network.go`, `openclaw.go`, `model.go`, `bootstrap.go`, `update.go` | `urfave/cli/v3`, all `internal/` packages | 4.2 | + +### 3.2 Core Infrastructure + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/config` | XDG-compliant configuration resolution | `config.go` | (stdlib only) | 2.3 | +| `internal/stack` | Cluster lifecycle (init, up, down, purge) | `stack.go`, `backend.go`, `backend_k3d.go`, `backend_k3s.go` | `config`, `embed`, `model`, `openclaw`, `tunnel`, `agent`, `dns` | 3.1 | +| `internal/embed` | Embedded assets (infrastructure, networks, skills) | `embed.go` | `embed` (stdlib) | 2.1, 3.6.2 | +| `internal/kubectl` | Kubernetes API wrapper with auto-KUBECONFIG | `kubectl.go` | `config` | 2.3 | +| `internal/ui` | Terminal UI (spinners, prompts, branded output) | `ui.go`, `spinner.go`, `prompt.go`, `brand.go`, `errors.go`, `exec.go`, `output.go`, `suggest.go` | (stdlib, terminal libs) | 8.1 | +| `internal/version` | Build version information | `version.go` | (stdlib only) | -- | +| 
`internal/update` | Self-update and dependency management | `update.go`, `github.go`, `charts.go`, `hint.go` | `version` | -- | +| `internal/dns` | Local DNS resolver for `obol.stack` hostname | `resolver.go` | (stdlib only) | 3.7.5 | + +### 3.3 LLM and Inference + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/model` | LiteLLM gateway configuration and provider management | `model.go` | `config`, `kubectl` | 3.2 | +| `internal/inference` | Standalone x402 inference gateway (bare metal / VM) | `gateway.go`, `container.go`, `store.go`, `client.go`, `enclave_middleware.go` | `enclave`, `tee`, `x402` | 3.9 | +| `internal/enclave` | Apple Secure Enclave key management (P-256, ECIES) | `enclave.go`, `enclave_darwin.go`, `enclave_stub.go`, `ecies.go`, `sysctl_darwin.go` | (CGo / Security.framework on macOS) | 3.9.6 | +| `internal/tee` | TEE attestation (TDX, SNP, Nitro) and key management | `tee.go`, `key.go`, `coco.go`, `verify.go`, `attest_*.go` | (platform-specific) | 3.9.6, 7.4 | + +### 3.4 Monetize (Sell Side) + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/x402` | x402-verifier: ForwardAuth, route matching, config hot-reload | `verifier.go`, `config.go`, `matcher.go`, `watcher.go`, `setup.go`, `validate.go`, `metrics.go` | `x402-go` (library) | 3.4.4, 3.4.5 | +| `internal/schemas` | ServiceOffer CRD types, payment validation, pricing math | `serviceoffer.go`, `payment.go`, `registration.go` | (stdlib only) | 3.4.3, 5.3 | +| `internal/embed/skills/monetize/` | Python reconciler (`monetize.py`) for 6-stage ServiceOffer reconciliation | `monetize.py`, `SKILL.md` | Kubernetes Python client | 3.4.2 | + +### 3.5 Monetize (Buy Side) + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/x402/buyer` | 
x402-buyer sidecar: reverse proxy, pre-signed auth pool, state tracking | `proxy.go`, `signer.go`, `config.go`, `state.go`, `metrics.go` | `x402-go` | 3.5 | +| `cmd/x402-buyer` | Sidecar binary entry point | `main.go` | `internal/x402/buyer` | 3.5 | +| `internal/embed/skills/buy-inference/` | Agent skill for discovery and purchasing remote inference | `SKILL.md`, `scripts/buy.py` | Python, Kubernetes client | 3.5.2 | + +### 3.6 Identity and Blockchain + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/erc8004` | ERC-8004 Identity Registry client (register, metadata, URI) | `client.go`, `types.go`, `abi.go` | `go-ethereum` | 3.8 | +| `internal/network` | Blockchain RPC gateway management (eRPC, ChainList, local nodes) | `network.go`, `erpc.go`, `rpc.go`, `chainlist.go`, `resolve.go`, `parser.go` | `config`, `kubectl`, `embed` | 3.3 | + +### 3.7 Agent and Tunnel + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/openclaw` | OpenClaw agent deployment, wallet generation, version management | `openclaw.go`, `wallet.go`, `resolve.go` | `config`, `embed`, `kubectl` | 3.6 | +| `internal/agent` | Agent RBAC patching and singleton management | `agent.go` | `kubectl` | 7.5 | +| `internal/tunnel` | Cloudflare tunnel lifecycle (quick/dns modes, storefront, URL propagation) | `tunnel.go`, `state.go`, `provision.go`, `cloudflare.go`, `login.go`, `agent.go`, `stackid.go` | `config`, `kubectl` | 3.7 | + +### 3.8 Applications + +| Module | Purpose | Key Files | Dependencies | SPEC Section | +|--------|---------|-----------|-------------|-------------| +| `internal/app` | Helm chart application management (install, sync, list, delete) | `app.go`, `chart.go`, `artifacthub.go`, `metadata.go`, `resolve.go` | `config`, `kubectl`, `embed` | 4.2 | + +--- + +## 4. 
Data Flow Diagrams + +### 4.1 Stack Initialization and Startup + +This diagram traces the full lifecycle from `obol stack init` through `obol stack up` to a running cluster with all services operational. + +```mermaid +sequenceDiagram + participant Op as Operator + participant CLI as obol CLI + participant Cfg as internal/config + participant Emb as internal/embed + participant Back as Backend (k3d/k3s) + participant HF as Helmfile + participant LLM as autoConfigureLLM + participant OC as OpenClaw Setup + participant Ag as agent.Init + participant Tun as Tunnel + + Note over Op,Tun: Phase 1: Initialization (obol stack init) + + Op->>CLI: obol stack init [--backend k3d] + CLI->>Cfg: Resolve paths (XDG / env / dev mode) + CLI->>CLI: Generate petname cluster ID + CLI->>CLI: Persist .stack-id, .stack-backend + CLI->>Back: Init(cfg, stackID) + Note over Back: Generate k3d.yaml / k3s config
Resolve Ollama host for backend + CLI->>Emb: Copy infrastructure defaults to $CONFIG_DIR/defaults/ + Note over Emb: Template substitution:
{{OLLAMA_HOST}}, {{OLLAMA_HOST_IP}}, {{CLUSTER_ID}} + + Note over Op,Tun: Phase 2: Cluster Startup (obol stack up) + + Op->>CLI: obol stack up + CLI->>Back: Up(cfg, stackID) + Back-->>CLI: kubeconfig bytes + CLI->>CLI: Write $CONFIG_DIR/kubeconfig.yaml + + Note over Op,Tun: Phase 3: Infrastructure Deployment + + CLI->>HF: syncDefaults() -- helmfile sync + Note over HF: Deploys in order:
1. Traefik (GatewayClass + Gateway)
2. eRPC
3. LiteLLM + x402-buyer sidecar
4. x402-verifier
5. Monitoring (Prometheus)
6. Frontend
7. cloudflared
8. ServiceOffer CRD + RBAC + + Note over Op,Tun: Phase 4: Auto-Configuration + + CLI->>LLM: autoConfigureLLM() + LLM->>LLM: Query Ollama /api/tags (host) + LLM->>LLM: Detect cloud API keys (env vars) + LLM->>LLM: Read ~/.openclaw/openclaw.json (agent model) + LLM->>LLM: Patch litellm-config ConfigMap + LLM->>LLM: Patch litellm-secrets Secret + LLM->>LLM: Single LiteLLM restart + + CLI->>OC: SetupDefault() + Note over OC: Deploy singleton agent
Inject skills PVC
($DATA_DIR/openclaw-/openclaw-data/) + + CLI->>Ag: agent.Init() -- patchMonetizeBinding() + Note over Ag: Ensure ClusterRoleBinding subjects
include openclaw SA + + Note over Op,Tun: Phase 5: Tunnel Activation + + CLI->>Tun: Check tunnel state ($CONFIG_DIR/tunnel/cloudflared.json) + alt DNS tunnel provisioned (persistent hostname) + CLI->>Tun: EnsureRunning() + Tun->>Tun: Propagate URL to agent, frontend, storefront + else Quick tunnel (default) + Note over Tun: Dormant -- activates on first obol sell + end + + CLI-->>Op: Stack ready +``` + +*SPEC cross-ref: Section 3.1.2 (Operations), Section 3.1.3 (Startup Sequence), Section 3.2.4 (Logic).* + +### 4.2 Sell-Side: ServiceOffer Creation to Public Route + +This traces the complete path from an operator running `obol sell http` through the 6-stage reconciliation loop to a publicly accessible, payment-gated route. + +```mermaid +sequenceDiagram + participant Op as Operator + participant CLI as obol sell http + participant Val as schemas/ (validation) + participant K8s as Kubernetes API + participant Tun as Tunnel + participant Rec as monetize.py (Reconciler) + participant Ver as x402-verifier + participant TF as Traefik + participant Reg as ERC-8004 Registry + + Op->>CLI: obol sell http myapi --wallet 0x... --chain base-sepolia --price 0.001 --upstream svc --port 8080 --namespace ns + + CLI->>Val: Validate chain, price, wallet, upstream + Val-->>CLI: ServiceOffer struct + + CLI->>K8s: Create ServiceOffer CR (openclaw-obol-agent ns) + CLI->>Tun: EnsureTunnelForSell() + Note over Tun: Start quick tunnel if dormant
or verify DNS tunnel running + + Note over Rec: Reconciliation loop runs every 10 seconds + + rect rgb(240, 248, 255) + Note over Rec: Stage 1: ModelReady + Rec->>K8s: Read ServiceOffer spec.model + Rec->>Rec: Validate model exists (inference type)
or skip (HTTP type) + Rec->>K8s: Update condition ModelReady=True + end + + rect rgb(240, 255, 240) + Note over Rec: Stage 2: UpstreamHealthy + Rec->>K8s: GET upstream.service:port/healthPath + Rec->>K8s: Update condition UpstreamHealthy=True + end + + rect rgb(255, 248, 240) + Note over Rec: Stage 3: PaymentGateReady + Rec->>K8s: Create Traefik Middleware (ForwardAuth -> x402-verifier) + Rec->>K8s: Patch x402-pricing ConfigMap (add route rule) + Note over Ver: WatchConfig detects mtime change (5s poll) + Ver->>Ver: Atomic Reload() with new routes + Rec->>K8s: Update condition PaymentGateReady=True + end + + rect rgb(248, 240, 255) + Note over Rec: Stage 4: RoutePublished + Rec->>K8s: Create HTTPRoute /services/myapi/* (no hostname restriction) + Note over TF: Route live: /services/myapi/* -> ForwardAuth -> upstream + Rec->>K8s: Update condition RoutePublished=True + end + + rect rgb(255, 255, 240) + Note over Rec: Stage 5: Registered + Rec->>K8s: Create registration ConfigMap (agent-registration.json) + Rec->>K8s: Create httpd Deployment + Service + Rec->>K8s: Create HTTPRoute for /.well-known/ and /skill.md + Rec->>Reg: Mint ERC-8004 NFT (via remote-signer) + Rec->>K8s: Update condition Registered=True, status.agentId + end + + rect rgb(240, 255, 255) + Note over Rec: Stage 6: Ready + Rec->>K8s: Set status.endpoint = tunnel_url/services/myapi + Rec->>K8s: Update condition Ready=True + end + + Op->>Op: obol sell status myapi -> Ready +``` + +*SPEC cross-ref: Section 3.4.2 (Sell-Side Flow), Section 3.4.3 (ServiceOffer CRD), Section 3.4.4 (x402-verifier).* + +### 4.3 Buy-Side: Discovery to Paid Inference + +This traces the agent's journey from discovering a remote seller to making paid inference requests through the local LiteLLM gateway. 
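Central to this flow is the sidecar's bounded-spend guarantee: it holds no signer access and can only spend authorizations that were pre-signed up front, so the maximum loss is bounded by N * price. A minimal Go sketch of such a mutex-guarded authorization pool -- type and method names (`PreSignedAuth`, `AuthPool`, `Pop`) are illustrative assumptions, not the actual `internal/x402/buyer` API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// PreSignedAuth is an illustrative stand-in for an ERC-3009
// TransferWithAuthorization signed offline by the remote signer.
type PreSignedAuth struct {
	Nonce     string // 32-byte random nonce, hex-encoded
	Signature string // EIP-712 signature over the transfer authorization
	Consumed  bool   // set once the auth has been attached to a request
}

// AuthPool holds a finite set of pre-signed authorizations.
// Because only these can ever be spent, total exposure is
// bounded by len(auths) * price, no matter what the upstream does.
type AuthPool struct {
	mu    sync.Mutex
	auths []*PreSignedAuth
}

// Pop returns the next unconsumed authorization and marks it consumed.
// It fails once the pool is exhausted rather than signing anything new.
func (p *AuthPool) Pop() (*PreSignedAuth, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, a := range p.auths {
		if !a.Consumed {
			a.Consumed = true
			return a, nil
		}
	}
	return nil, errors.New("auth pool exhausted: re-run buy to pre-sign more")
}

func main() {
	pool := &AuthPool{auths: []*PreSignedAuth{
		{Nonce: "0x01", Signature: "sig1"},
		{Nonce: "0x02", Signature: "sig2"},
	}}
	// Third call fails: the pool cannot spend beyond what was pre-signed.
	for i := 0; i < 3; i++ {
		if a, err := pool.Pop(); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println("spending auth with nonce", a.Nonce)
		}
	}
}
```

In the real sidecar the equivalent of `Pop` runs inside the 402-retry path shown in the diagram below, and the consumed nonce is reported back so the pool state survives reloads.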
+ +```mermaid +sequenceDiagram + participant Agent as OpenClaw Agent + participant Buy as buy.py + participant Seller as Remote Seller + participant K8s as Kubernetes API + participant LiteLLM as LiteLLM :4000 + participant Sidecar as x402-buyer :8402 + + Note over Agent,Sidecar: Phase 1: Discovery + + Agent->>Buy: buy.py probe + Buy->>Seller: GET /services//v1/models + Seller-->>Buy: 402 PaymentRequired + PaymentRequirements JSON + Note over Buy: Extract: price, wallet, chain, asset,
available models + + Buy-->>Agent: Probe results (price, models) + + Note over Agent,Sidecar: Phase 2: Purchase (Pre-sign Authorizations) + + Agent->>Buy: buy.py buy --seller --model --count N + Buy->>Buy: Generate N random nonces (32 bytes each) + Buy->>Buy: Sign N ERC-3009 TransferWithAuthorization
(via remote-signer at :9000) + Note over Buy: Each auth: {from, to, value, validAfter,
validBefore, nonce, signature} + + Buy->>K8s: Create/Patch x402-buyer-config ConfigMap
(upstream URL, model, chain, price) + Buy->>K8s: Create/Patch x402-buyer-auths ConfigMap
(array of pre-signed auths) + + Note over Sidecar: Config watcher detects change
Mutex-guarded Reload() rebuilds handlers + + Note over Agent,Sidecar: Phase 3: Paid Inference + + Agent->>LiteLLM: POST /v1/chat/completions
model: "paid/" + LiteLLM->>LiteLLM: Route paid/* -> openai/* -> :8402/v1 + LiteLLM->>Sidecar: POST /v1/chat/completions
model: "" + + Sidecar->>Sidecar: Resolve model -> upstream handler + Sidecar->>Seller: POST /services//v1/chat/completions + Seller-->>Sidecar: 402 PaymentRequired + + Sidecar->>Sidecar: Pop pre-signed auth from pool (mutex) + Sidecar->>Sidecar: Build X-PAYMENT header (base64 PaymentPayload) + Sidecar->>Seller: Retry with X-PAYMENT header + Seller-->>Sidecar: 200 OK + inference response + + Sidecar->>Sidecar: Mark nonce consumed (onConsume callback) + Sidecar-->>LiteLLM: 200 OK + LiteLLM-->>Agent: Chat completion response +``` + +*SPEC cross-ref: Section 3.5.2 (Buy-Side Flow), Section 3.5.3 (Architecture), Section 3.5.5 (Model Resolution).* + +### 4.4 Payment Flow: x402 Request Lifecycle + +This is the canonical request-level flow for a client paying for access to a service. It shows the interplay between Traefik, the verifier, the facilitator, and the upstream. + +```mermaid +sequenceDiagram + participant Client as Client / Buyer + participant TF as Traefik Gateway + participant Ver as x402-verifier + participant Match as Route Matcher + participant Fac as x402 Facilitator + participant Chain as Base L2 (USDC) + participant Up as Upstream Service + + Client->>TF: GET /services/myapi/data + + TF->>Ver: POST /verify
    Note over TF,Ver: X-Forwarded-Uri: /services/myapi/data<br/>X-Forwarded-Method: GET

    Ver->>Match: Match("/services/myapi/data")
    Match-->>Ver: RouteRule{price: "1000", wallet: "0x...", chain: "base-sepolia"}

    alt No X-PAYMENT header
        Ver-->>TF: 402 Payment Required
        Note over Ver: Response body:<br/>{x402Version: 1, accepts: [{<br/>scheme: "exact",<br/>network: "eip155:84532",<br/>maxAmountRequired: "1000",<br/>payTo: "0x...",<br/>asset: "0x036C..." (USDC)<br/>}]}
        TF-->>Client: 402 + PaymentRequirements

        Note over Client: Sign ERC-3009<br/>TransferWithAuthorization<br/>(EIP-712 typed data)

        Client->>TF: GET /services/myapi/data<br/>X-PAYMENT: base64(PaymentPayload)
        TF->>Ver: POST /verify (with X-PAYMENT)
    end

    Ver->>Ver: Decode X-PAYMENT (base64 -> JSON)
    Ver->>Fac: POST /verify<br/>{payload, paymentRequirements}

    alt Facilitator: verify only mode
        Fac->>Fac: Validate signature (EIP-712)
        Fac->>Fac: Check authorization fields
        Fac-->>Ver: {valid: true, settled: false}
    else Facilitator: verify + settle
        Fac->>Chain: Submit TransferWithAuthorization tx
        Chain-->>Fac: Tx confirmed
        Fac-->>Ver: {valid: true, settled: true, txHash: "0x..."}
    end

    Ver-->>TF: 200 OK
    Note over Ver: Sets Authorization header<br/>if route has upstreamAuth

    TF->>Up: GET /data (+ Authorization header)
    Up-->>TF: 200 OK + response body
    TF-->>Client: 200 OK + response
```

*SPEC cross-ref: Section 4.1.1 (Request Flow), Section 4.1.2 (PaymentRequired Response), Section 4.1.3 (PaymentPayload).*

---

## 5. Storage Architecture

### 5.1 Filesystem State

All persistent state lives under three XDG-compliant directory trees. In development mode (`OBOL_DEVELOPMENT=true`), these collapse into `.workspace/`.

```
$OBOL_CONFIG_DIR/                     # ~/.config/obol or .workspace/config
├── .stack-id                         # Cluster petname (e.g., "fluffy-penguin")
├── .stack-backend                    # "k3d" or "k3s"
├── kubeconfig.yaml                   # Kubernetes API access
├── tunnel/
│   └── cloudflared.json              # Tunnel state (mode, hostname, IDs)
├── defaults/                         # Embedded infrastructure (templated)
│   ├── helmfile.yaml
│   ├── base/templates/*.yaml
│   ├── cloudflared/
│   └── values/
└── networks/<network>/<petname>/     # Per-network deployment configs
    ├── helmfile.yaml
    └── values.yaml

$OBOL_DATA_DIR/                       # ~/.local/share/obol or .workspace/data
├── openclaw-<stack-id>/
│   ├── openclaw-data/
│   │   └── .openclaw/skills/         # 23 embedded skills (host-path PVC)
│   └── keystore/                     # Web3 V3 encrypted keystores
└── local-path-provisioner/           # k3s PVC backing store (root-owned)

$OBOL_BIN_DIR/                        # ~/.local/bin or .workspace/bin
└── obol                              # CLI binary
```

*SPEC cross-ref: Section 2.3 (Configuration Hierarchy), Section 5.1 (Configuration Files).*

### 5.2 Kubernetes ConfigMaps

ConfigMaps are the primary in-cluster configuration mechanism. They serve as the control plane for runtime behavior changes without Pod restarts (where hot-reload is supported).
| ConfigMap | Namespace | Key(s) | Purpose | Hot-Reload |
|-----------|-----------|--------|---------|------------|
| `litellm-config` | `llm` | `config.yaml` | LiteLLM model_list, routing rules | No (restart required) |
| `x402-pricing` | `x402` | `pricing.yaml` | Verifier route rules, wallet, chain, facilitator URL | Yes (5s poll) |
| `x402-buyer-config` | `llm` | `config.json` | Buyer upstream definitions (URL, model, chain, price) | Yes (mutex reload) |
| `x402-buyer-auths` | `llm` | `auths.json` | Pre-signed ERC-3009 authorization pools | Yes (mutex reload) |
| `erpc-config` | `erpc` | `erpc.yaml` | RPC projects, networks, upstreams | No (restart required) |
| `obol-stack-config` | `obol-frontend` | `config.json` | Frontend dashboard configuration (tunnel URL) | Yes (volume mount) |
| `tunnel-storefront` | `traefik` | `index.html`, `mime.types` | Static HTML landing page content | Yes (volume mount) |

### 5.3 Kubernetes Secrets

| Secret | Namespace | Key(s) | Purpose |
|--------|-----------|--------|---------|
| `litellm-secrets` | `llm` | `LITELLM_MASTER_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` | LiteLLM authentication credentials |
| `x402-secrets` | `x402` | (verifier credentials) | Verifier operational secrets |
| `openclaw-wallet` | `openclaw-obol-agent` | Keystore JSON | Agent wallet encrypted private key |

### 5.4 Persistent Volume Claims

| PVC | Namespace | Backing | Purpose | Ownership |
|-----|-----------|---------|---------|-----------|
| Skills PVC | `openclaw-obol-agent` | Host-path (`$DATA_DIR/openclaw-<stack-id>/openclaw-data/`) | Skill injection into agent container | Root-owned (k3s provisioner) |
| Local-path PVCs | Various | Host-path (`$DATA_DIR/local-path-provisioner/`) | Blockchain node data, application state | Root-owned (`purge -f` to remove) |

### 5.5 Wallet Keystores

Wallet state spans both the filesystem and Kubernetes:

```
Filesystem:
  $DATA_DIR/openclaw-<stack-id>/keystore/UTC--<created-at>--<address>.json

Kubernetes:
  Secret/openclaw-wallet (openclaw-obol-agent ns)
  └── keystore.json (same content, accessible to remote-signer Pod)

Remote Signer:
  Deployment/remote-signer (openclaw-obol-agent ns, port 9000)
  ├── Loads keystore from mounted Secret
  └── REST API: POST /sign, GET /address
```

*SPEC cross-ref: Section 5.4 (Wallet), Section 7.3 (Wallet Security), Section 3.6.3 (Wallet Generation).*

---

## 6. Deployment Model

### 6.1 k3d Cluster Topology

```
┌─────────────────────────────────────────────────────────────┐
│ Host Machine                                                │
│                                                             │
│ ┌──────────┐  ┌──────────┐  ┌──────────────────────────┐    │
│ │ Ollama   │  │ obol CLI │  │ Docker Desktop / Engine  │    │
│ │ :11434   │  │          │  │                          │    │
│ └──────────┘  └──────────┘  │ ┌──────────────────────┐ │    │
│                             │ │ k3d Cluster          │ │    │
│                             │ │ (k3s v1.35.1-k3s1)   │ │    │
│                             │ │                      │ │    │
│                             │ │ 1 server node        │ │    │
│                             │ │ Port mappings:       │ │    │
│                             │ │  80:80    (HTTP)     │ │    │
│                             │ │  8080:80  (HTTP alt) │ │    │
│                             │ │  443:443  (HTTPS)    │ │    │
│                             │ │  8443:443 (HTTPS alt)│ │    │
│                             │ │                      │ │    │
│                             │ └──────────────────────┘ │    │
│                             └──────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

**Backend variants:**

| Property | k3d (default) | k3s (bare metal) |
|----------|---------------|------------------|
| Runtime | Docker container | Direct k3s binary |
| Ollama access | `host.docker.internal` (macOS) / `host.k3d.internal` (Linux) | `127.0.0.1` (loopback) |
| Port binding | Docker port mapping | Direct host binding |
| Data isolation | Docker volumes + host-path mounts | Direct filesystem |
| Backend switch | Destroys old cluster automatically | Destroys old cluster automatically |

*SPEC cross-ref: Section 2.4 (Backend Abstraction), Section 3.1.4 (Ollama Host Resolution).*

### 6.2 Namespace Layout

Each namespace represents a failure domain and RBAC boundary. Resources within a namespace share a security context.
```mermaid
graph TB
    subgraph "traefik"
        GC[GatewayClass: traefik]
        GW[Gateway: traefik-gateway<br/>:80, :443]
        CFD[Deployment: cloudflared]
        SF[Deployment: tunnel-storefront]
        SF_SVC[Service: tunnel-storefront :8080]
        SF_CM[ConfigMap: tunnel-storefront]
        SF_HR[HTTPRoute: tunnel-storefront]
    end

    subgraph "llm"
        LLMD[Deployment: litellm<br/>Containers: litellm :4000, x402-buyer :8402]
        LLC[ConfigMap: litellm-config]
        LLS[Secret: litellm-secrets]
        BC[ConfigMap: x402-buyer-config]
        BA[ConfigMap: x402-buyer-auths]
    end

    subgraph "x402"
        VD[Deployment: x402-verifier :8080]
        VP[ConfigMap: x402-pricing]
        VS[Secret: x402-secrets]
        VSM[ServiceMonitor: x402-verifier]
    end

    subgraph "openclaw-obol-agent"
        OC[Deployment: openclaw]
        RS[Deployment: remote-signer :9000]
        WS[Secret: openclaw-wallet]
    end

    subgraph "erpc"
        ED[Deployment: erpc :4000]
        EC[ConfigMap: erpc-config]
    end

    subgraph "obol-frontend"
        FD[Deployment: frontend]
        FC[ConfigMap: obol-stack-config]
    end

    subgraph "monitoring"
        PD[Prometheus Stack]
    end

    subgraph "network-petname (dynamic)"
        EX[Execution Layer :8545]
        CL[Consensus Layer]
    end

    subgraph "cluster-scoped"
        CRD[CRD: ServiceOffer obol.org]
        CR[ClusterRole: openclaw-monetize]
        CRB[ClusterRoleBinding: openclaw-monetize-binding]
    end
```

*SPEC cross-ref: Section 5.2 (Kubernetes Resources by Namespace).*

---

## 7. Network Topology

### 7.1 Traefik Gateway API Routing

Traefik operates as the single ingress point using the Kubernetes Gateway API (not legacy Ingress). All traffic classification happens at the Gateway level based on hostname and path.

```mermaid
flowchart TB
    subgraph "External Traffic"
        Internet((Public Internet))
        Local((Local Machine<br/>obol.stack))
    end

    Internet --> CF[cloudflared<br/>Tunnel Pod]
    CF --> GW

    Local --> GW[Traefik Gateway<br/>:80 / :443]

    GW --> ClassifyHostname{Hostname?}

    ClassifyHostname -->|"obol.stack"| LocalRoutes
    ClassifyHostname -->|"* (any / tunnel)"| PublicRoutes

    subgraph LocalRoutes["Local-Only Routes (hostnames: obol.stack)"]
        direction TB
        LR1["/ -> Frontend"]
        LR2["/rpc -> eRPC"]
    end

    subgraph PublicRoutes["Public Routes (no hostname restriction)"]
        direction TB
        PR1["/services/* -> ForwardAuth -> Upstream"]
        PR2["/.well-known/* -> ERC-8004 httpd"]
        PR3["/skill.md -> Service Catalog"]
        PR4["/ (tunnel host) -> Storefront"]
    end

    PR1 --> FA{x402 ForwardAuth}
    FA -->|"No X-PAYMENT"| R402[402 Payment Required]
    FA -->|"Valid X-PAYMENT"| R200[200 -> Upstream Service]
    FA -->|"No route match"| PASS[200 -> Pass Through]
```

### 7.2 Route Classification Rules

| Route | Hostname Restriction | Protection | Target | HTTPRoute Location |
|-------|---------------------|------------|--------|--------------------|
| `/` | `obol.stack` | None (local network) | Frontend | `obol-frontend` ns |
| `/rpc` | `obol.stack` | None (local network) | eRPC | `erpc` ns |
| `/services/<name>/*` | None (public) | x402 ForwardAuth | Upstream service | `openclaw-obol-agent` ns |
| `/.well-known/agent-registration.json` | None (public) | None (read-only) | ERC-8004 httpd | `traefik` ns |
| `/skill.md` | None (public) | None (read-only) | Service catalog httpd | `traefik` ns |
| `/` (tunnel hostname) | None (public) | None (static HTML) | Storefront httpd | `traefik` ns |

**Security invariant:** Internal services (frontend, eRPC, LiteLLM admin, Prometheus) MUST have `hostnames: ["obol.stack"]` to prevent tunnel exposure. See Section 8.2 for trust boundary details.

### 7.3 Internal Service Communication

All internal traffic uses Kubernetes ClusterIP services with DNS resolution (`<service>.<namespace>.svc.cluster.local`).
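That naming convention can be captured in a one-line helper (illustrative sketch, not code from the repo):

```go
package main

import "fmt"

// clusterFQDN builds the in-cluster DNS name for a Service, following the
// standard <service>.<namespace>.svc.cluster.local convention.
func clusterFQDN(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
}

func main() {
	// e.g. the LiteLLM Service in the "llm" namespace
	fmt.Println(clusterFQDN("litellm", "llm")) // litellm.llm.svc.cluster.local
}
```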
```
┌────────────────────────────────────────────────────────────────────┐
│ Cluster-Internal Traffic (ClusterIP, no external exposure)         │
│                                                                    │
│ LiteLLM :4000                                                      │
│   ├── ollama/*    ──> Ollama Service :11434 ──> host Ollama        │
│   ├── paid/*      ──> x402-buyer :8402 (localhost, same Pod)       │
│   ├── anthropic/* ──> api.anthropic.com (egress)                   │
│   └── openai/*    ──> api.openai.com (egress)                      │
│                                                                    │
│ x402-buyer :8402                                                   │
│   └── upstream ──> Remote seller (egress, x402 payment attached)   │
│                                                                    │
│ x402-verifier :8080                                                │
│   └── facilitator ──> facilitator.x402.rs (egress, HTTPS)          │
│                                                                    │
│ OpenClaw Agent                                                     │
│   ├── LiteLLM ──> litellm.llm.svc:4000                             │
│   └── Remote Signer ──> remote-signer:9000 (same namespace)        │
│                                                                    │
│ eRPC :4000                                                         │
│   ├── Local nodes ──> <network>-execution.<ns>.svc:8545            │
│   └── Remote RPCs ──> ChainList endpoints (egress)                 │
│                                                                    │
│ Prometheus                                                         │
│   ├── x402-verifier ──> ServiceMonitor scrape /metrics             │
│   └── x402-buyer    ──> PodMonitor scrape /metrics                 │
└────────────────────────────────────────────────────────────────────┘
```

*SPEC cross-ref: Section 2.2 (Routing Architecture), Section 6.2 (Internal Service Communication), Section 7.1 (Tunnel Exposure).*

---

## 8. Security Architecture

### 8.1 Trust Boundaries

The system has four trust boundaries, each with distinct threat models and protection mechanisms.

```mermaid
graph TB
    subgraph TB1["Trust Boundary 1: Host Machine"]
        CLI[obol CLI]
        Ollama[Ollama]
        Docker[Docker]
        Keystore["Wallet Keystores<br/>(encrypted, filesystem)"]
        SE["Secure Enclave<br/>(hardware, macOS)"]
    end

    subgraph TB2["Trust Boundary 2: k3d Cluster"]
        subgraph TB2a["TB 2a: Local-Only Zone"]
            FE[Frontend]
            ERPC[eRPC]
            Prom[Prometheus]
            LLMAdmin[LiteLLM Admin]
        end

        subgraph TB2b["TB 2b: Payment-Gated Zone"]
            Verifier[x402-verifier]
            Services["/services/* upstreams"]
        end

        subgraph TB2c["TB 2c: Agent Zone (RBAC-scoped)"]
            Agent[OpenClaw Agent]
            Signer[Remote Signer]
            Wallet[Wallet Secret]
        end
    end

    subgraph TB3["Trust Boundary 3: Tunnel (Public Internet)"]
        CF[cloudflared]
        Buyers[Remote Buyers]
    end

    subgraph TB4["Trust Boundary 4: External Services"]
        Fac[x402 Facilitator]
        Chain[Base L2]
        CloudLLM[Cloud LLM APIs]
    end

    TB3 -->|"Only: /services/*, /.well-known/,<br/>/skill.md, / (storefront)"| TB2b
    TB2b -->|"ForwardAuth"| Verifier
    TB2c -->|"RBAC: openclaw-monetize ClusterRole"| TB2b
    TB2c -->|"Port 9000, in-namespace only"| Signer
    Verifier -->|"HTTPS POST"| Fac
    Agent -->|"JSON-RPC"| Chain
```

### 8.2 Authentication and Authorization Flows

| Flow | Mechanism | Protection |
|------|-----------|------------|
| **Public -> Service** | x402 payment (EIP-712 signed ERC-3009) | Facilitator verifies signature + settles on-chain. No payment = 402 rejection. |
| **Local -> Frontend/eRPC** | Hostname restriction (`obol.stack`) | Only reachable from local machine via hosts file or DNS resolver. Tunnel traffic cannot match. |
| **Agent -> Kubernetes API** | ServiceAccount `openclaw` + RBAC | `openclaw-monetize` ClusterRole: CRUD on ServiceOffers, Middlewares, HTTPRoutes, ConfigMaps, Services, Deployments. Read-only on Pods, Endpoints, logs. |
| **Agent -> Signing** | Remote-signer REST API (port 9000) | In-namespace only (no Service exposed outside namespace). Keystore decryption at signer startup. |
| **Buyer -> Remote Seller** | Pre-signed ERC-3009 auths via X-PAYMENT header | Zero signer access. Finite auth pool. Max loss = N * price per auth. |
| **CLI -> Cluster** | kubeconfig (auto-generated, file-permission protected) | `0600` permissions on kubeconfig. Port drift handled by regeneration. |

### 8.3 Wallet Isolation

Three distinct wallet isolation models serve different security requirements:

```
1. Software Wallet (Default)
   ┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
   │ Keystore File   │────>│ Remote Signer    │────>│ Agent Pod   │
   │ (scrypt + AES)  │     │ :9000            │     │ (REST only) │
   │ $DATA_DIR/...   │     │ In-namespace     │     │             │
   └─────────────────┘     └──────────────────┘     └─────────────┘
   Key at rest: encrypted. Key in use: signer memory only.

2. Secure Enclave (macOS, standalone gateway)
   ┌─────────────────┐     ┌──────────────────┐
   │ Apple SEP       │────>│ Inference Gateway│
   │ (P-256, never   │     │ (ECIES decrypt,  │
   │  exported)      │     │  ECDSA sign)     │
   └─────────────────┘     └──────────────────┘
   Key never leaves hardware. SIP enforced.

3. TEE (Linux, confidential computing)
   ┌─────────────────┐     ┌──────────────────┐
   │ TEE Enclave     │────>│ Inference Gateway│
   │ (TDX/SNP/Nitro) │     │ (attestation +   │
   │ Key in-enclave  │     │  ECIES decrypt)  │
   └─────────────────┘     └──────────────────┘
   Key bound to attestation. Hardware-signed quote.
```

*SPEC cross-ref: Section 7.3 (Wallet Security), Section 7.4 (Enclave / TEE Security).*

### 8.4 Pre-Signed Authorization Pool (Buy Side)

The buy-side security model eliminates private key exposure from the sidecar entirely:

```
┌──────────────────────────────────────────────────────┐
│ buy.py (Agent context, has signer access)            │
│                                                      │
│ 1. Generate N random 32-byte nonces                  │
│ 2. For each nonce, sign ERC-3009 via remote-signer   │
│ 3. Write signed auths to ConfigMap                   │
│                                                      │
│ Output: N pre-signed TransferWithAuthorization       │
│ Each authorizes exactly $price USDC transfer         │
└───────────────────────┬──────────────────────────────┘
                        │ ConfigMap (x402-buyer-auths)
                        v
┌──────────────────────────────────────────────────────┐
│ x402-buyer sidecar (NO signer access)                │
│                                                      │
│ - Pops one auth per 402 response (mutex-guarded)     │
│ - Marks nonce consumed (StateStore, crash-safe)      │
│ - Cannot generate new auths                          │
│ - Cannot modify auth values                          │
│ - Max spend = N * price (bounded by pool size)       │
│ - Pool exhausted -> 404 (agent must pre-sign more)   │
└──────────────────────────────────────────────────────┘
```

*SPEC cross-ref: Section 3.5.3 (Architecture), Section 7.2 (Payment Security).*

### 8.5 RBAC Model

The `openclaw-monetize` ClusterRole is the sole RBAC grant for the agent.
It follows the principle of least privilege across API groups:

| API Group | Resources | Verbs | Rationale |
|-----------|-----------|-------|-----------|
| `obol.org` | `serviceoffers`, `serviceoffers/status` | get, list, watch, create, update, patch, delete | Full lifecycle of sell-side CRDs |
| `traefik.io` | `middlewares` | get, list, create, update, patch, delete | ForwardAuth middleware for x402 gating |
| `gateway.networking.k8s.io` | `httproutes` | get, list, create, update, patch, delete | Public route publication |
| (core) | `configmaps`, `services`, `deployments` | get, list, create, update, patch, delete | Pricing ConfigMap, registration httpd, storefront |
| (core) | `pods`, `endpoints`, `pods/log` | get, list | Health checks, debugging (read-only) |

**Binding:** ClusterRoleBinding `openclaw-monetize-binding` binds to ServiceAccount `openclaw` in namespace `openclaw-obol-agent`. The `patchMonetizeBinding()` function in `internal/agent/agent.go` ensures the subjects array is populated, guarding against race conditions during initial cluster setup.
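That guard can be sketched as an idempotent ensure-subject operation; the types below are simplified stand-ins for the RBAC objects, not the client-go types the real `patchMonetizeBinding()` works with:

```go
package main

import "fmt"

// Subject and Binding mirror only the fields relevant here; they are
// illustrative stand-ins, not Kubernetes client-go types.
type Subject struct {
	Kind      string
	Name      string
	Namespace string
}

type Binding struct {
	Subjects []Subject
}

// ensureSubject appends the subject only if it is not already present,
// so repeated reconciliation passes are safe (idempotent) even when the
// binding was created before the ServiceAccount existed.
func ensureSubject(b *Binding, s Subject) bool {
	for _, existing := range b.Subjects {
		if existing == s {
			return false // already bound, nothing to do
		}
	}
	b.Subjects = append(b.Subjects, s)
	return true
}

func main() {
	b := &Binding{}
	sa := Subject{Kind: "ServiceAccount", Name: "openclaw", Namespace: "openclaw-obol-agent"}
	fmt.Println(ensureSubject(b, sa)) // first call adds the subject
	fmt.Println(ensureSubject(b, sa)) // second call is a no-op
}
```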
*SPEC cross-ref: Section 7.5 (RBAC), Section 5.2 (Kubernetes Resources).*

### 8.6 Threat Model Summary

| Threat | Mitigation | Residual Risk |
|--------|-----------|---------------|
| Tunnel exposes internal services | `hostnames: ["obol.stack"]` restriction on all local-only HTTPRoutes | Misconfiguration (test: never create public routes for internal services) |
| Replay attack on x402 payments | Random 32-byte nonces, `validBefore`/`validAfter` windows, facilitator deduplication | Facilitator availability |
| Buyer overspending | Pre-signed auth pool with finite size, nonce consumption tracking | Pool size set at purchase time |
| Wallet key extraction | Encrypted keystore (scrypt), remote-signer pattern, Secure Enclave (non-exportable) | Software wallet in memory during signing |
| Reconciler privilege escalation | ClusterRole scoped to specific API groups and verbs | Agent code compromise could create arbitrary routes |
| Supply chain (container images) | Pinned image tags (LiteLLM, k3s, OpenClaw), version consistency tests | Upstream image compromise before pin update |
| ConfigMap propagation delay | 60-120s k3d file watcher; 5s verifier poll | Brief window where stale config serves requests |

*SPEC cross-ref: Section 7.1 (Tunnel Exposure), Section 7.2 (Payment Security), Section 9.4 (Known Latencies).*

diff --git a/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md b/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md
new file mode 100644
index 00000000..37796692
--- /dev/null
+++ b/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md
@@ -0,0 +1,667 @@

# Obol Stack -- Behaviors and Expectations

**Version**: 1.0.0
**Status**: Living document
**Last Updated**: 2026-03-27

This document defines the behavioral contract for Obol Stack. Every behavior described here maps to one or more testable scenarios expressible as Gherkin. Cross-references point to [SPEC.md](SPEC.md) where the underlying system is defined.
Existing BDD feature files live in [features/](features/) and `internal/x402/features/`.

---

## Table of Contents

1. [Introduction](#1-introduction)
2. [Desired Behaviors](#2-desired-behaviors)
3. [Undesired Behaviors](#3-undesired-behaviors)
4. [Edge Cases](#4-edge-cases)
5. [Performance Expectations](#5-performance-expectations)
6. [Guardrail Definitions](#6-guardrail-definitions)

---

## 1. Introduction

### 1.1 Purpose

This document is the behavioral specification for Obol Stack. It defines what the system should do, what it must not do, how it handles edge cases, and what performance it must achieve.

It serves as:
- A contract between the product and engineering teams
- The source of truth for BDD feature file scenarios
- A test oracle for integration and adversarial testing
- A guardrail reference that CI and code review can enforce

### 1.2 How to Read This Document

**Desired behaviors** (Section 2) follow this format:
- **Trigger**: What user action or system state initiates the behavior
- **Expected**: What the system should do
- **Rationale**: Why this behavior matters

**Undesired behaviors** (Section 3) add:
- **Risk**: What goes wrong if this behavior occurs

**Edge cases** (Section 4) describe unusual or boundary scenarios with expected handling.

**Cross-references** use the notation `SPEC SS X.Y` to reference sections in [SPEC.md](SPEC.md). For example, `SPEC SS 3.1` refers to Section 3.1 (Stack Lifecycle).

Every behavior in this document MUST be expressible as a Gherkin `Given / When / Then` scenario. Inline Gherkin examples are included for critical behaviors.

### 1.3 Relationship to SPEC.md

| This Document | SPEC.md |
|---------------|---------|
| Describes *what should happen* | Describes *how things are built* |
| Trigger / Expected / Rationale | Architecture, data model, APIs |
| Test-oriented (Gherkin-compatible) | Implementation-oriented |
| Guardrails are non-negotiable | Constraints are structural |

---

## 2. Desired Behaviors

### 2.1 Stack Lifecycle

> SPEC SS 3.1 -- Stack Lifecycle

#### B-2.1.1: Stack initialization generates a unique cluster identity

**Trigger**: Operator runs `obol stack init`.
**Expected**: The CLI generates a petname-based stack ID, resolves absolute paths for ConfigDir/DataDir/BinDir, writes the backend config file (`.stack-backend`), and copies embedded infrastructure defaults with template substitution (`{{OLLAMA_HOST}}`, `{{OLLAMA_HOST_IP}}`, `{{CLUSTER_ID}}`). The stack ID is persisted at `$OBOL_CONFIG_DIR/.stack-id`.
**Rationale**: Unique naming prevents namespace collisions between clusters. Absolute paths are required because Docker volume mounts reject relative paths.

```gherkin
Scenario: Stack init creates unique identity
  Given no stack has been initialized
  When the operator runs "obol stack init"
  Then a ".stack-id" file exists in the config directory
  And the stack ID is a two-word petname
  And a ".stack-backend" file exists with value "k3d"
  And the defaults directory contains rendered infrastructure templates
```

#### B-2.1.2: Stack init with --force preserves existing stack ID

**Trigger**: Operator runs `obol stack init --force` on an already-initialized stack.
**Expected**: The cluster config is regenerated, but the existing stack ID is preserved. The previous backend's cluster is destroyed before initializing the new one.
**Rationale**: Preserving the stack ID maintains continuity for data directories and PVC paths. Destroying the old backend prevents orphaned Docker containers.
#### B-2.1.3: Stack up deploys full infrastructure and auto-configures LLM

**Trigger**: Operator runs `obol stack up` after `obol stack init`.
**Expected**: The CLI creates the k3d/k3s cluster, exports the kubeconfig, runs `syncDefaults()` (helmfile sync for all infrastructure), auto-configures LiteLLM with detected Ollama models and cloud provider API keys (single restart), deploys the OpenClaw agent singleton, patches agent RBAC, and starts the DNS tunnel if provisioned. The stack is fully operational after completion.
**Rationale**: One command must bring the entire stack from zero to operational. Auto-configuration eliminates manual `obol model setup` for the common case.

```gherkin
Scenario: Stack up brings cluster to operational state
  Given the stack has been initialized
  When the operator runs "obol stack up"
  Then the k3d cluster is running
  And a kubeconfig file exists in the config directory
  And the Traefik gateway is accepting connections on port 80
  And LiteLLM is running in the "llm" namespace
  And the x402-verifier is running in the "x402" namespace
  And the OpenClaw agent is running in the "openclaw-obol-agent" namespace
```

#### B-2.1.4: Stack down preserves config and data

**Trigger**: Operator runs `obol stack down`.
**Expected**: The cluster is stopped. The config directory, data directory, and all PVCs are preserved. The DNS resolver is stopped.
**Rationale**: Operators expect to stop and restart without losing state. PVC data (wallets, skills, blockchain data) is valuable.

#### B-2.1.5: Stack purge removes cluster and config

**Trigger**: Operator runs `obol stack purge`.
**Expected**: The cluster is destroyed and the config directory is removed. Root-owned PVC data in the data directory is NOT removed unless `--force` is passed (which invokes sudo).
**Rationale**: Root-owned local-path-provisioner directories cannot be removed by regular users. The `--force` flag makes the destructive scope explicit.
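The down/purge scopes above reduce to a small decision about what is in scope for removal; a hedged sketch (function name and path strings are illustrative, not the CLI's implementation):

```go
package main

import "fmt"

// purgeTargets lists what "obol stack purge" removes. Root-owned PVC data
// only enters the scope with --force (which escalates via sudo).
// Illustrative sketch of the behavioral contract in B-2.1.5.
func purgeTargets(force bool) []string {
	targets := []string{"cluster", "$OBOL_CONFIG_DIR"}
	if force {
		targets = append(targets, "$OBOL_DATA_DIR/local-path-provisioner")
	}
	return targets
}

func main() {
	fmt.Println(purgeTargets(false)) // data dir survives a plain purge
	fmt.Println(purgeTargets(true))  // --force also removes root-owned PVC data
}
```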
#### B-2.1.6: Helmfile sync failure triggers automatic cleanup

**Trigger**: `helmfile sync` fails during `obol stack up`.
**Expected**: The cluster is automatically stopped via `Down()`. The operator receives a clear error message and can fix the issue before retrying.
**Rationale**: A partially deployed cluster is worse than no cluster. Auto-cleanup prevents orphaned resources.

---

### 2.2 LLM Routing

> SPEC SS 3.2 -- LLM Routing

#### B-2.2.1: Auto-configuration detects Ollama models

**Trigger**: `obol stack up` runs `autoConfigureLLM()` and Ollama is running on the host.
**Expected**: The CLI queries `http://localhost:11434/api/tags`, discovers available models, adds them to the `litellm-config` ConfigMap as `ollama/<model>` entries pointing at `http://ollama.llm.svc:11434`, and restarts LiteLLM exactly once.
**Rationale**: Agent chat must work immediately after stack up without manual model configuration.

```gherkin
Scenario: LLM auto-configuration detects Ollama models
  Given Ollama is running with models "qwen3.5:9b" and "llama3.2:3b"
  When the operator runs "obol stack up"
  Then the litellm-config ConfigMap contains an entry for "qwen3.5:9b"
  And the litellm-config ConfigMap contains an entry for "llama3.2:3b"
  And LiteLLM was restarted exactly once
```

#### B-2.2.2: Auto-configuration detects cloud provider API keys

**Trigger**: `obol stack up` runs `autoConfigureLLM()` and `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` is set in the environment.
**Expected**: The detected provider is added to `litellm-config` as a wildcard entry (e.g., `anthropic/*`) and the API key is stored in `litellm-secrets`. The LiteLLM restart is batched with the Ollama model configuration (single restart).
**Rationale**: Cloud providers should be available to the agent without manual setup when keys are present.

#### B-2.2.3: Paid inference routes through x402-buyer sidecar

**Trigger**: A request for model `paid/<model>` arrives at LiteLLM.
**Expected**: LiteLLM matches the `paid/*` catch-all entry and proxies to `http://127.0.0.1:8402/v1`. The x402-buyer sidecar handles payment attachment and upstream routing.
**Rationale**: The static `paid/*` route means no LiteLLM fork or dynamic config is needed for buy-side payments. The sidecar pattern keeps payment logic isolated.

#### B-2.2.4: Manual model setup validates before adding

**Trigger**: Operator runs `obol model setup custom --name foo --endpoint http://example.com --model bar`.
**Expected**: The CLI validates that the endpoint is reachable and the model exists before adding it to the LiteLLM config.
**Rationale**: Prevents broken routes in LiteLLM that would cause silent inference failures.

---

### 2.3 Network Management

> SPEC SS 3.3 -- Network / RPC Gateway

#### B-2.3.1: Adding a chain by ID fetches public RPCs

**Trigger**: Operator runs `obol network add <chain-id>`.
**Expected**: The CLI queries the ChainList API for public RPC endpoints for the given chain ID, adds them as upstreams in the `erpc-config` ConfigMap, registers the network, and restarts eRPC. Write methods (`eth_sendRawTransaction`) are blocked by default.
**Rationale**: ChainList provides curated public endpoints. Blocking writes by default prevents accidental mainnet transactions.

```gherkin
Scenario: Adding a chain blocks write methods by default
  Given the stack is running
  When the operator runs "obol network add 1"
  Then the erpc-config contains upstreams for chain ID 1
  And write methods are blocked on all upstreams for chain ID 1
```

#### B-2.3.2: Adding a chain with --allow-writes enables write methods

**Trigger**: Operator runs `obol network add <chain-id> --allow-writes`.
**Expected**: Write methods are allowed on the configured upstreams for that chain.
**Rationale**: Some use cases (transaction submission, contract deployment) require write access. The flag makes this an explicit opt-in.
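The default-deny contract of B-2.3.1/B-2.3.2 reduces to a method filter applied per upstream; a sketch with an abbreviated write-method set (this is not eRPC's actual configuration model):

```go
package main

import "fmt"

// writeMethods are JSON-RPC calls that mutate chain state; the set here is
// abbreviated for illustration.
var writeMethods = map[string]bool{
	"eth_sendRawTransaction": true,
	"eth_sendTransaction":    true,
}

// allowed reports whether a method may be forwarded to an upstream, given
// the chain's allow-writes setting (false unless --allow-writes was passed).
func allowed(method string, allowWrites bool) bool {
	if writeMethods[method] {
		return allowWrites
	}
	return true // reads always pass
}

func main() {
	fmt.Println(allowed("eth_call", false))               // reads always allowed
	fmt.Println(allowed("eth_sendRawTransaction", false)) // blocked by default
	fmt.Println(allowed("eth_sendRawTransaction", true))  // opt-in via --allow-writes
}
```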
#### B-2.3.3: Local Ethereum nodes register as priority upstreams with writes blocked

**Trigger**: `obol network install ethereum` deploys a local execution layer node.
**Expected**: The local node is registered in eRPC as a priority upstream via `RegisterERPCUpstream()`, but write methods are blocked on the local upstream. Write requests are routed to remote upstreams instead.
**Rationale**: Local nodes provide low-latency reads. Writes to a local-only node would not propagate to the real network.

#### B-2.3.4: Removing a chain cleans up all upstreams

**Trigger**: Operator runs `obol network remove <chain-id>`.
**Expected**: All upstreams for that chain ID are removed from `erpc-config`. The network entry is also removed. eRPC is restarted.
**Rationale**: Clean removal prevents stale routing entries.

---

### 2.4 Sell-Side Monetization

> SPEC SS 3.4 -- Monetize: Sell Side

#### B-2.4.1: Selling an HTTP service creates a ServiceOffer CR

**Trigger**: Operator runs `obol sell http <name> --wallet 0x... --chain base-sepolia --price 0.001 --upstream <service> --port <port> --namespace <namespace>`.
**Expected**: A `ServiceOffer` CR is created in the `openclaw-obol-agent` namespace with the specified payment terms, upstream reference, and path (`/services/<name>`). The CLI also calls `EnsureTunnelForSell()` to activate the tunnel if dormant.
**Rationale**: The ServiceOffer CRD is the declarative API. The tunnel must be active for public access.
+ +```gherkin +Scenario: Operator sells HTTP service via CLI + Given the stack is running + When the operator runs "obol sell http myapi --wallet 0xABC --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm" + Then a ServiceOffer "myapi" exists in namespace "openclaw-obol-agent" + And the ServiceOffer has payment.payTo "0xABC" + And the ServiceOffer has payment.network "base-sepolia" + And the ServiceOffer has path "/services/myapi" +``` + +#### B-2.4.2: Agent reconciles ServiceOffer through 6 stages + +**Trigger**: A `ServiceOffer` CR exists and the `monetize.py` reconciler is running. +**Expected**: The reconciler watches for ServiceOffer CRs and progresses them through 6 stages (every 10 seconds): + +1. **ModelReady** -- Model availability verified (inference type) or skipped (HTTP type). +2. **UpstreamHealthy** -- Health check passes against `upstream.healthPath`. +3. **PaymentGateReady** -- Traefik `Middleware` (ForwardAuth) and pricing route in `x402-pricing` ConfigMap are created. +4. **RoutePublished** -- `HTTPRoute` created routing `/services/<name>/*` through the ForwardAuth middleware to the upstream. +5. **Registered** -- ERC-8004 on-chain registration and `/.well-known/agent-registration.json` published. +6. **Ready** -- All conditions met, endpoint URL set in status. + +All created resources have `ownerReferences` pointing to the ServiceOffer for automatic garbage collection. +**Rationale**: The 6-stage reconciliation provides observability into the sell-side pipeline. OwnerReferences ensure clean deletion.
+ +```gherkin +Scenario: Agent reconciles ServiceOffer to Ready + Given a ServiceOffer "myapi" exists + When the agent reconciles the ServiceOffer + Then the ServiceOffer status condition "ModelReady" is "True" + And the ServiceOffer status condition "UpstreamHealthy" is "True" + And the ServiceOffer status condition "PaymentGateReady" is "True" + And the ServiceOffer status condition "RoutePublished" is "True" + And the ServiceOffer status condition "Registered" is "True" + And the ServiceOffer status condition "Ready" is "True" + And a Middleware "x402-myapi" exists in the offer namespace + And an HTTPRoute "so-myapi" exists in the offer namespace +``` + +#### B-2.4.3: x402-verifier returns 402 for unpaid requests to priced routes + +**Trigger**: An HTTP request arrives at a path matching a pricing route in `x402-pricing` ConfigMap, without an `X-PAYMENT` header. +**Expected**: Traefik forwards the request to the x402-verifier via ForwardAuth. The verifier matches the `X-Forwarded-Uri` against configured routes (first match wins), finds a price, and returns HTTP 402 with a `PaymentRequirements` JSON body containing `x402Version`, `accepts` array (scheme, network, maxAmountRequired, resource, asset, payTo, maxTimeoutSeconds). +**Rationale**: The 402 response tells clients exactly how to pay. This is the core x402 protocol handshake. + +```gherkin +Scenario: Unpaid request returns 402 with pricing + Given a priced route exists at "/services/myapi/*" with price "1000" + When a client sends a POST to "/services/myapi/v1/chat/completions" without X-PAYMENT + Then the response status is 402 + And the response body contains "x402Version" with value 1 + And the response body contains an "accepts" array with the route price +``` + +#### B-2.4.4: x402-verifier passes paid requests after facilitator verification + +**Trigger**: An HTTP request with a valid `X-PAYMENT` header arrives at a priced route. 
+**Expected**: The verifier extracts the payment payload, delegates to the x402 facilitator for verification, and upon success returns 200 OK to Traefik (which then forwards to the upstream). The verifier optionally sets an `Authorization` header for upstream auth. +**Rationale**: Payment verification is delegated to the facilitator to avoid on-chain logic in the hot path. + +#### B-2.4.5: x402-verifier passes unpriced routes without payment + +**Trigger**: An HTTP request arrives at a path that does NOT match any pricing route. +**Expected**: The verifier returns 200 OK immediately (free route). +**Rationale**: Not all routes behind the ForwardAuth middleware require payment. Discovery endpoints and health checks should be freely accessible. + +#### B-2.4.6: Pricing config hot-reloads without restart + +**Trigger**: The `x402-pricing` ConfigMap is updated (e.g., new route added by reconciler). +**Expected**: `WatchConfig()` detects the file modification within 5 seconds (polling interval), parses the new config, and atomically swaps it via `Verifier.Reload()`. In-flight requests are not affected. +**Rationale**: Adding or removing services should not require verifier downtime. Atomic pointer swap ensures lock-free reads on the hot path. + +#### B-2.4.7: Per-million-token pricing approximated as per-request in Phase 1 + +**Trigger**: Operator sets `--per-mtok 1.00` on `obol sell http`. +**Expected**: The effective per-request price is calculated as `perMTok / 1000` (using `ApproxTokensPerRequest = 1000`). Both `perMTok` and the computed `perRequest` (as `price`) are stored in the pricing route. +**Rationale**: Phase 1 does not have exact token metering. A fixed approximation of 1000 tokens per request provides a reasonable baseline. + +#### B-2.4.8: Deleting a ServiceOffer cleans up all owned resources + +**Trigger**: Operator runs `obol sell delete <name>`. +**Expected**: The ServiceOffer CR is deleted.
All resources with ownerReferences (Middleware, HTTPRoute, pricing route ConfigMap entry, registration resources) are garbage-collected by Kubernetes. +**Rationale**: Clean deletion prevents orphaned routes and stale pricing entries. + +--- + +### 2.5 Buy-Side Payments + +> SPEC §3.5 -- Monetize: Buy Side + +#### B-2.5.1: Buyer probe discovers pricing from 402 response + +**Trigger**: Agent runs `buy.py probe` against a remote seller endpoint. +**Expected**: The probe sends an unpaid request, receives a 402 response, and extracts pricing information (payTo, price, network, asset) from the `PaymentRequirements` body. +**Rationale**: Discovery-driven purchasing lets agents find and pay for services without hardcoded pricing. + +```gherkin +Scenario: Buyer discovers pricing via probe + Given a remote seller is serving at "https://seller.example.com/services/qwen" + When the agent runs "buy.py probe" against the seller endpoint + Then the probe returns 402 with pricing info + And the pricing contains payTo, price, and network +``` + +#### B-2.5.2: Buyer pre-signs ERC-3009 authorizations into ConfigMaps + +**Trigger**: Agent runs `buy.py buy` with a seller endpoint and count. +**Expected**: The agent pre-signs N `TransferWithAuthorization` (ERC-3009) vouchers using the wallet private key, stores them in the `x402-buyer-auths` ConfigMap, and configures the upstream in `x402-buyer-config` ConfigMap. The sidecar hot-reloads the new config. +**Rationale**: Pre-signing moves the expensive signing operation out of the hot path. Bounded pool size limits maximum financial exposure. + +#### B-2.5.3: Paid inference through sidecar attaches payment on 402 + +**Trigger**: LiteLLM proxies a `paid/<model>` request to the x402-buyer sidecar, which forwards to the remote seller and receives a 402. +**Expected**: The sidecar intercepts the 402, pops one pre-signed authorization from the pool, constructs an `X-PAYMENT` header, and retries the request.
The seller verifies payment and returns the inference result. The sidecar returns the result to LiteLLM, which returns it to the agent. +**Rationale**: Transparent payment attachment means the agent sees standard OpenAI API responses. The sidecar handles the full x402 handshake internally. + +```gherkin +Scenario: Paid inference through sidecar + Given the buyer has 5 pre-signed authorizations for "seller-qwen" + When the agent requests model "paid/qwen3.5:9b" + Then LiteLLM proxies to the x402-buyer sidecar + And the sidecar sends an unpaid request to the seller + And the seller returns 402 + And the sidecar pops one authorization and retries with X-PAYMENT + And the seller returns 200 with inference content + And the agent receives the inference result + And the buyer has 4 remaining authorizations +``` + +#### B-2.5.4: Model resolution strips prefixes correctly + +**Trigger**: A request for `paid/openai/qwen3.5:9b` arrives at the x402-buyer sidecar. +**Expected**: The sidecar strips `paid/` and `openai/` prefixes to resolve `qwen3.5:9b`, looks up the model in `modelRoutes`, and dispatches to the correct upstream handler. +**Rationale**: LiteLLM adds `openai/` when routing through the `paid/*` catch-all. The sidecar must handle both prefixed and bare model names. + +#### B-2.5.5: Sidecar exposes status, health, and metrics endpoints + +**Trigger**: Monitoring or liveness probes query the sidecar. +**Expected**: `/status` returns JSON with remaining/spent auth counts per upstream. `/healthz` returns 200 for liveness. `/metrics` returns Prometheus-format metrics scraped via `PodMonitor`. +**Rationale**: Observability into auth pool state is critical for operational awareness. Prometheus integration enables alerting on low pool levels. + +--- + +### 2.6 Tunnel Management + +> SPEC §3.7 -- Tunnel Management + +#### B-2.6.1: Quick tunnel activates on first sell + +**Trigger**: Operator runs `obol sell http` and no DNS tunnel is provisioned.
+**Expected**: The quick tunnel mode activates, starting the cloudflared pod. A random `*.trycloudflare.com` URL is assigned. The URL is propagated to the OpenClaw agent (`AGENT_BASE_URL`), the frontend ConfigMap, and the storefront. The tunnel URL is ephemeral and changes on restart. +**Rationale**: Quick mode provides zero-configuration public access. Activation on first sell means the tunnel is dormant until needed, reducing attack surface. + +```gherkin +Scenario: Quick tunnel activates on first sell + Given the stack is running with no DNS tunnel provisioned + And the tunnel is dormant + When the operator runs "obol sell http myapi ..." + Then the cloudflared pod is running in the "traefik" namespace + And a quick tunnel URL is assigned + And AGENT_BASE_URL is set on the OpenClaw Deployment +``` + +#### B-2.6.2: DNS tunnel persists across restarts + +**Trigger**: Operator runs `obol tunnel login --hostname stack.example.com`, provisions the tunnel, and later restarts the stack. +**Expected**: After `obol stack up`, the DNS tunnel automatically starts with the same stable hostname. Tunnel state (mode, hostname, accountID, zoneID, tunnelID) is persisted at `$OBOL_CONFIG_DIR/tunnel/cloudflared.json`. +**Rationale**: Stable hostnames are required for on-chain ERC-8004 registration and consistent discovery URLs. + +#### B-2.6.3: Tunnel URL propagation updates all consumers + +**Trigger**: A tunnel becomes active (either quick or DNS). +**Expected**: The tunnel URL is propagated to 4 consumers: +1. OpenClaw agent `AGENT_BASE_URL` environment variable. +2. Frontend `obol-stack-config` ConfigMap in `obol-frontend` namespace. +3. Agent overlay Helmfile state values. +4. Storefront landing page content. + +**Rationale**: Multiple components need the tunnel URL. Centralized propagation prevents URL drift. + +#### B-2.6.4: Storefront deploys at tunnel hostname root + +**Trigger**: Tunnel becomes active. 
+**Expected**: `CreateStorefront()` deploys 4 resources in the `traefik` namespace: a ConfigMap with HTML content, a busybox httpd Deployment (5m CPU, 8Mi RAM), a ClusterIP Service on port 8080, and an HTTPRoute for the tunnel hostname root (`/`). +**Rationale**: The storefront provides a human-readable landing page for visitors who navigate to the tunnel URL directly. + +--- + +### 2.7 ERC-8004 Identity + +> SPEC §3.8 -- ERC-8004 Identity + +#### B-2.7.1: On-chain registration mints agent NFT + +**Trigger**: The reconciler reaches stage 5 (Registered) for a ServiceOffer with `registration.enabled: true`, or the operator runs `obol sell register`. +**Expected**: The ERC-8004 client calls `Register(ctx, key, agentURI)` on the Identity Registry contract (Base Sepolia: `0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D`), minting an NFT. The returned `agentId` (token ID) and `registrationTxHash` are stored in the ServiceOffer status. +**Rationale**: On-chain identity enables decentralized agent discovery and reputation. + +```gherkin +Scenario: Agent registers on-chain via ERC-8004 + Given the ServiceOffer "myapi" is at stage "RoutePublished" + And the wallet has sufficient Base Sepolia ETH for gas + When the reconciler processes stage 5 (Registered) + Then an agent NFT is minted on the Identity Registry + And the ServiceOffer status has a non-empty "agentId" + And the ServiceOffer status has a non-empty "registrationTxHash" +``` + +#### B-2.7.2: Registration document served at well-known endpoint + +**Trigger**: A client fetches `/.well-known/agent-registration.json` via the tunnel. +**Expected**: The endpoint returns an `AgentRegistration` JSON document containing the agent name, description, services, x402Support (true), registrations (agentId + registry address), and supportedTrust array. +**Rationale**: The well-known endpoint is the standard ERC-8004 discovery mechanism.
+ +#### B-2.7.3: Metadata update via SetAgentURI + +**Trigger**: Agent metadata changes (e.g., new service added, description updated). +**Expected**: The ERC-8004 client calls `SetAgentURI(ctx, key, agentId, newURI)` to update the on-chain metadata pointer. +**Rationale**: On-chain metadata must stay current with the agent's actual capabilities. + +--- + +### 2.8 Security + +> SPEC §7 -- Security Model + +#### B-2.8.1: Local-only routes restricted by hostname + +**Trigger**: Any HTTPRoute for an internal service (frontend, eRPC, monitoring). +**Expected**: The HTTPRoute has `hostnames: ["obol.stack"]`, ensuring the route only matches requests with `Host: obol.stack`. Requests arriving via the tunnel (with the tunnel hostname) do not match. +**Rationale**: Internal services contain sensitive data (blockchain RPCs, inference admin, Prometheus metrics) and must not be reachable from the public internet. + +```gherkin +Scenario: Frontend is not accessible via tunnel + Given the tunnel is active with hostname "stack.example.com" + When a client sends GET "/" with Host "stack.example.com" + Then the response is the storefront landing page + And the response is NOT the frontend dashboard + +Scenario: Frontend is accessible locally + When a client sends GET "/" with Host "obol.stack" + Then the response is the frontend dashboard +``` + +#### B-2.8.2: RBAC binding patched by agent init + +**Trigger**: `obol agent init` runs during `obol stack up`. +**Expected**: `patchMonetizeBinding()` ensures the `openclaw-monetize-binding` ClusterRoleBinding has the `openclaw` ServiceAccount in `openclaw-obol-agent` namespace as a subject. This grants the agent CRUD access to ServiceOffers, Middlewares, HTTPRoutes, ConfigMaps, Services, and Deployments. +**Rationale**: The agent needs these permissions for the 6-stage reconciliation. Patching at init time handles the race condition where the binding may exist with empty subjects. + +--- + +## 3. 
Undesired Behaviors + +### 3.1 Security Violations + +#### U-3.1.1: Internal services exposed via tunnel (CRITICAL) + +**Trigger**: An HTTPRoute for the frontend, eRPC, LiteLLM admin, or monitoring is created without `hostnames: ["obol.stack"]` restriction. +**Expected**: This MUST NOT happen. All internal service HTTPRoutes MUST include the hostname restriction. Code review and CI must reject any change that removes hostname restrictions from internal routes. +**Risk**: Exposing the frontend exposes cluster management. Exposing eRPC exposes blockchain RPCs (potentially including write-enabled chains). Exposing LiteLLM admin exposes inference configuration and API keys. Exposing Prometheus exposes internal metrics and potentially secrets. This is the highest-severity security violation in the system. + +```gherkin +Scenario: Internal HTTPRoutes must have hostname restrictions + Given the stack is running + When I inspect the HTTPRoute for the frontend + Then it has hostnames containing "obol.stack" + When I inspect the HTTPRoute for eRPC + Then it has hostnames containing "obol.stack" +``` + +### 3.2 LLM Configuration + +#### U-3.2.1: Model without tool support assigned to agent + +**Trigger**: A model that does not support function/tool calling is configured as the agent's primary model. +**Expected**: The system should warn the operator that the model lacks tool support, as the OpenClaw agent relies on tool calling for skill execution and infrastructure management. +**Risk**: The agent silently fails to use skills, producing degraded responses with no indication of the root cause. + +#### U-3.2.2: drop_params silently strips tool definitions + +**Trigger**: LiteLLM's `drop_params` setting is enabled (default in some configs) and the model does not natively support tool parameters. +**Expected**: The system should NOT silently strip tool definitions. If a model does not support tools, the error should surface rather than being hidden. 
+**Risk**: Tool calls appear to succeed but the model never sees the tool definitions, leading to non-functional agent behavior that is extremely difficult to diagnose. + +### 3.3 Infrastructure Drift + +#### U-3.3.1: Kubeconfig port drift after restart + +**Trigger**: The k3d cluster is restarted and the API server is assigned a different port. +**Expected**: The kubeconfig should be refreshed during `obol stack up`. If the port drifts and the kubeconfig is stale, all kubectl operations fail. +**Risk**: All CLI commands that interact with the cluster fail with connection errors. The fix is `k3d kubeconfig write -o $CONFIG_DIR/kubeconfig.yaml --overwrite`, but the operator may not know this. + +#### U-3.3.2: RBAC binding empty subjects race condition + +**Trigger**: `obol agent init` runs before k3s has fully applied the `openclaw-monetize-binding` ClusterRoleBinding manifest. +**Expected**: The `patchMonetizeBinding()` function should handle this by creating or patching the binding. If the race occurs and is not handled, the agent has no permissions. +**Risk**: The 6-stage reconciliation silently fails on all stages that require Kubernetes API access (stages 3-6). The ServiceOffer status shows unhelpful error messages. + +### 3.4 Caching and Staleness + +#### U-3.4.1: eRPC cache staleness for balance queries + +**Trigger**: A paid request settles on-chain, and `buy.py balance` is called within 10 seconds. +**Expected**: The balance query may return a stale value because eRPC caches `eth_call` results for 10 seconds (unfinalized block TTL). +**Risk**: The agent or operator sees an incorrect USDC balance. This is cosmetic (the on-chain state is correct) but confusing. Operators should be aware of the ~10-second lag. + +--- + +## 4. Edge Cases + +### 4.1 Infrastructure Dependencies + +#### E-4.1.1: No Ollama running during stack up + +**Scenario**: The operator runs `obol stack up` but Ollama is not installed or not running on the host. 
+**Expected Handling**: `autoConfigureLLM()` fails to reach `http://localhost:11434/api/tags`. The failure is non-fatal: a warning is printed, and LiteLLM starts without local model entries. The operator can install/start Ollama later and run `obol model setup` manually. +**Rationale**: Ollama is not a hard dependency. Cloud-only configurations are valid. The stack must be operational even without local inference. + +```gherkin +Scenario: Stack up without Ollama + Given Ollama is not running + When the operator runs "obol stack up" + Then a warning is printed about Ollama not being available + And LiteLLM is running with no Ollama model entries + And the stack is otherwise fully operational +``` + +#### E-4.1.2: No cloud provider API keys available + +**Scenario**: Neither `ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, nor `OPENAI_API_KEY` is set in the environment. +**Expected Handling**: `autoConfigureLLM()` prints a warning for each missing provider. LiteLLM starts with only Ollama models (if available) or an empty model list. The operator can add keys later via `obol model setup`. +**Rationale**: Cloud API keys are not required. Local-only inference with Ollama is a valid configuration. + +### 4.2 Blockchain Operations + +#### E-4.2.1: Wallet lacks Base Sepolia ETH for registration + +**Scenario**: The reconciler reaches stage 5 (Registered) but the agent wallet has insufficient ETH to pay gas for the ERC-8004 registration transaction. +**Expected Handling**: The `Register()` call fails with a transaction error. The reconciler logs the error, sets the `Registered` condition to `False` with a message indicating insufficient gas, and retries on the next loop (10 seconds). The ServiceOffer remains at stage 4 (RoutePublished) -- the service is functional but not registered on-chain. +**Rationale**: Gas availability is an external dependency. The service should still work (stages 1-4 are complete) even if on-chain registration is pending. 
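The retry semantics of E-4.2.1 reduce to "record a False condition and let the 10-second loop try again". A sketch of that logic, written in Go for illustration (the actual reconciler is `monetize.py`, and every identifier below is an assumption):

```go
package main

import (
	"errors"
	"fmt"
)

// registerOnChain stands in for the ERC-8004 Register() call; it fails
// for as long as the wallet cannot pay gas.
func registerOnChain(walletFunded bool) error {
	if !walletFunded {
		return errors.New("insufficient funds for gas")
	}
	return nil
}

// reconcileStage5 returns the Registered condition for one reconcile pass.
// Failure is recorded, not fatal: stages 1-4 stay complete and the next
// 10-second loop retries.
func reconcileStage5(walletFunded bool) (status, message string) {
	if err := registerOnChain(walletFunded); err != nil {
		return "False", err.Error()
	}
	return "True", ""
}

func main() {
	status, msg := reconcileStage5(false)
	fmt.Println("Registered:", status, "(", msg, ")") // retried on the next loop
	status, _ = reconcileStage5(true)
	fmt.Println("Registered:", status)
}
```

The key property is that failure is recorded in status rather than raised: stages 1-4 stay complete, so the service keeps serving while registration is retried.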
+ +#### E-4.2.2: All discovery backends unavailable + +**Scenario**: The x402 facilitator, ChainList API, and blockchain RPC endpoints are all unreachable. +**Expected Handling**: Each subsystem degrades independently: +- Facilitator unavailable: x402-verifier cannot verify payments, returns 500. Existing free routes still work. +- ChainList unavailable: `obol network add <chain-id>` fails with an error. Custom endpoints (`--endpoint`) still work. +- RPC unavailable: eRPC returns errors for blockchain queries. Local nodes (if deployed) still serve reads. + +**Rationale**: External service failures should not cascade. Each failure is isolated to its subsystem. + +### 4.3 Timing and Propagation + +#### E-4.3.1: ConfigMap propagation delay (60-120 seconds) + +**Scenario**: The reconciler updates the `x402-pricing` ConfigMap with a new route, but the x402-verifier does not see the change for up to 120 seconds. +**Expected Handling**: The k3d file watcher takes 60-120 seconds to propagate ConfigMap changes to mounted volumes. The verifier's `WatchConfig()` polls every 5 seconds for file modification time changes. Net worst-case delay: ~125 seconds from ConfigMap update to verifier reload. During this window, requests to the new route will pass through unpriced (free). +**Rationale**: This is a known k3d limitation. The window is short and the failure mode is permissive (free access, not blocked access). For immediate effect, the operator can force a pod restart. + +```gherkin +Scenario: Pricing route available after propagation delay + Given the x402-verifier is running + When a new pricing route is added to the x402-pricing ConfigMap + Then within 125 seconds the verifier serves the new pricing route + And requests to the route return 402 until payment is provided +``` + +#### E-4.3.2: ExternalName services with Traefik Gateway API + +**Scenario**: An operator creates an ExternalName Service expecting it to work as a Traefik upstream via Gateway API HTTPRoutes.
+**Expected Handling**: ExternalName services do NOT work with Traefik Gateway API. The operator must use a ClusterIP Service with manually managed Endpoints instead. +**Rationale**: This is a known Traefik limitation. The `obol sell http` command creates ClusterIP Services to avoid this issue. + +### 4.4 Payment Edge Cases + +#### E-4.4.1: Pre-signed auth pool exhausted + +**Scenario**: All pre-signed ERC-3009 authorizations for a given upstream have been consumed. +**Expected Handling**: The `PreSignedSigner.Sign()` call returns an error. The x402-buyer sidecar returns a 404 for the model, indicating no purchased upstream is available. The agent must run `buy.py` to pre-sign additional authorizations. +**Rationale**: Bounded pool size is a security feature (maximum loss = N * price). Exhaustion is an expected operational event, not an error. + +```gherkin +Scenario: Auth pool exhaustion returns 404 + Given the buyer has 1 remaining authorization for "seller-qwen" + When the agent makes a paid request that consumes the last authorization + Then the request succeeds + When the agent makes another paid request + Then the sidecar returns 404 for the model + And the /status endpoint shows 0 remaining for "seller-qwen" +``` + +#### E-4.4.2: Quick tunnel URL changes after restart + +**Scenario**: The cluster is restarted (`obol stack down` then `obol stack up`) while in quick tunnel mode. +**Expected Handling**: The quick tunnel gets a new random `*.trycloudflare.com` URL. All URL consumers are re-propagated with the new URL. Any previous ERC-8004 registration with the old URL becomes stale. +**Rationale**: Quick mode is explicitly ephemeral. Operators needing stable URLs should use DNS mode (`obol tunnel login --hostname`). + +--- + +## 5. 
Performance Expectations + +| Behavior | Target | Measurement | Degradation Handling | +|----------|--------|-------------|---------------------| +| x402 ForwardAuth verify (no payment) | < 5ms | Time from ForwardAuth request to 402 response (local) | Lock-free `atomic.Pointer` config reads; pre-resolved chain map | +| x402 ForwardAuth verify (with payment) | < 600ms | Includes facilitator round-trip (100-500ms network) | Facilitator timeout; returns 500 on timeout | +| x402-buyer auth pop | < 1ms | Single mutex lock + O(1) pool pop per `Sign()` call | Mutex contention only under extreme concurrency | +| Route matching (verifier) | < 1ms | First-match short-circuit; no regex per request | Degenerate case: many routes, all glob patterns | +| Buyer model routing | < 1ms | `sync.RWMutex` concurrent reads; rebuild only on `Reload()` | Write lock held briefly during config reload | +| Pricing config hot-reload | < 10s | Poll interval (5s) + parse + atomic swap | Worst case: 5s poll + parse time; old config serves during swap | +| ConfigMap propagation (k3d) | 60-120s | k3d file watcher interval | Force pod restart for immediate effect | +| Quick tunnel URL availability | 10-20s | Time from pod start to URL assignment | Cloudflare registration latency; retry on failure | +| Helmfile sync (initial) | 2-5 min | Full infrastructure deployment | Progress reported via Helmfile output | +| LiteLLM restart | 10-30s | Pod termination + startup | In-flight requests may fail during restart window | +| `obol stack up` (cold start) | 3-7 min | Cluster creation + helmfile sync + auto-config | Depends on Docker image cache state | + +--- + +## 6. 
Guardrail Definitions + +### 6.1 Network Security + +| Guardrail | Rule | Enforcement | Violation Response | +|-----------|------|-------------|-------------------| +| Hostname restrictions on internal HTTPRoutes | Frontend, eRPC, monitoring, and LiteLLM admin HTTPRoutes MUST have `hostnames: ["obol.stack"]` | Code review; embedded template validation; `embed_crd_test.go` | Block PR merge; revert if deployed | +| Public routes limited to safe endpoints | Only `/services/*`, `/.well-known/*`, `/skill.md`, and `/` (storefront) may lack hostname restrictions | Template review; BDD integration tests | Block PR merge | +| Facilitator URL must use HTTPS | `ValidateFacilitatorURL()` rejects non-HTTPS facilitator URLs (loopback exempted for testing) | Runtime validation in CLI | CLI returns error; operation aborted | + +### 6.2 Payment Security + +| Guardrail | Rule | Enforcement | Violation Response | +|-----------|------|-------------|-------------------| +| x402 payment verification before resource access | Every request to `/services/*` MUST pass through x402 ForwardAuth | Traefik Middleware with ForwardAuth; reconciler creates Middleware in stage 3 | Middleware missing = route not published (stage 4 blocks) | +| Bounded spending on buyer sidecar | Maximum financial exposure = N * price, where N = pre-signed auth count | Finite pool in `PreSignedSigner`; no signer access in sidecar | Pool exhaustion returns 404; no additional spending possible | +| Replay protection | Every ERC-3009 authorization uses a unique random 32-byte nonce | `StateStore` tracks consumed nonces; `crypto/rand` for generation | Double-spend attempt rejected by contract | +| Zero signer access in buyer sidecar | The x402-buyer sidecar MUST NOT have access to any private key | Architecture: sidecar receives only pre-signed vouchers via ConfigMap | No private key mounted, injected, or accessible | + +### 6.3 Configuration Integrity + +| Guardrail | Rule | Enforcement | Violation Response | 
+|-----------|------|-------------|-------------------| +| KUBECONFIG auto-set for all K8s tools | `obol kubectl`, `obol helm`, `obol helmfile`, `obol k9s` MUST set `KUBECONFIG=$OBOL_CONFIG_DIR/kubeconfig.yaml` | CLI passthrough implementation sets env before exec | Without this, tools use default kubeconfig and target wrong cluster | +| OpenClaw version pinning consistency | Version in `OPENCLAW_VERSION` file, `openclawImageTag` Go const, and `obolup.sh` MUST agree | `TestOpenClawVersionConsistency` unit test | Test failure blocks CI | +| Two-stage templating separation | Stage 1 (CLI flags to Go templates) and Stage 2 (Helmfile to K8s) MUST NOT be mixed | Code review; template structure in `internal/embed/networks/` | Mixing stages causes unpredictable template rendering | +| Absolute paths in Docker volume mounts | All paths passed to Docker/k3d MUST be absolute | Resolved at `obol stack init` time; `config.Config` stores absolute paths | Relative paths cause Docker mount failures | + +### 6.4 Data Safety + +| Guardrail | Rule | Enforcement | Violation Response | +|-----------|------|-------------|-------------------| +| Wallet backup before purge | `PromptBackupBeforePurge()` MUST run before `obol stack purge` when wallets exist | CLI implementation checks for keystore files | Operator prompted to backup; can force with flag | +| Config hot-reload preserves previous on error | If parsing a new config file fails, the verifier/buyer MUST keep the previous valid config | Error handling in `WatchConfig()` and `Reload()` | Log error; continue serving with old config | +| OwnerReferences on reconciler-created resources | All Kubernetes resources created by `monetize.py` MUST have ownerReferences pointing to the ServiceOffer | Reconciler implementation sets ownerReferences on every create | Missing ownerReferences cause resource leaks on ServiceOffer deletion | +| Backend switching destroys old cluster | Changing from k3d to k3s (or vice versa) via `obol stack init 
--backend` MUST destroy the old backend first | `Init()` checks `.stack-backend` and calls `Destroy()` on mismatch | Orphaned Docker containers or k3s processes | diff --git a/docs/specs/CONTRIBUTING.md b/docs/specs/CONTRIBUTING.md new file mode 100644 index 00000000..20750187 --- /dev/null +++ b/docs/specs/CONTRIBUTING.md @@ -0,0 +1,199 @@ +# Developer Rules — Non-negotiable + +> References: SPEC Sections 1–9, ARCHITECTURE Section 1 (Design Philosophy) + +These rules derive from architectural decisions and hard-won operational experience. +Violating them creates silent failures, security holes, or infrastructure drift. + +--- + +## 1. Never Expose Internal Services via Tunnel + +Every HTTPRoute for frontend, eRPC, LiteLLM, or monitoring **must** carry `hostnames: ["obol.stack"]`. +Removing this restriction exposes admin UIs and RPC endpoints to the public internet through the Cloudflare tunnel. + +**Do this:** +```yaml +hostnames: + - "obol.stack" +``` + +**Not this:** +```yaml +# hostnames: [] ← CRITICAL: makes the route reachable via tunnel +``` + +*Why:* The tunnel exposes all routes without hostname restrictions. Internal services have no authentication layer beyond network isolation. (SPEC §7.1, ADR-0005) + +--- + +## 2. Two-Stage Templating Is Sacred + +Stage 1 (CLI flags → Go templates → `values.yaml`) and Stage 2 (Helmfile → K8s manifests) must stay separate. Never leak Helmfile template syntax into Stage 1 or vice versa. + +**Do this:** +```go +// Stage 1: Go template produces values.yaml +tmpl.Execute(out, map[string]string{"ChainID": "8453"}) +// Stage 2: helmfile sync --state-values-file values.yaml +``` + +**Not this:** +```go +// Mixing stages: the Stage 1 template body contains Helm syntax +tmpl, _ := template.New("v").Parse("chainID: {{ .Values.chainID }}") +tmpl.Execute(out, nil) // breaks: Go evaluates what Stage 2 should render +``` + +*Why:* Mixed stages produce undebuggable template errors. The separation enables `values.yaml` to be inspected as plain YAML between stages. (SPEC §3.3) + +--- + +## 3. 
Absolute Paths for Docker Volume Mounts + +All paths passed to k3d/Docker must be absolute. Relative paths resolve differently inside containers vs. host, causing silent mount failures. + +**Do this:** +```go +absPath, _ := filepath.Abs(cfg.DataDir) +// Use absPath in k3d volume mount +``` + +**Not this:** +```go +// Relative path: works on host, empty inside container +mount := ".workspace/data:/data" +``` + +*Why:* Resolved at `obol stack init` and stored in config. k3d volume mounts require host-absolute paths. (SPEC §1.3) + +--- + +## 4. Bound Spending on Buy-Side — Never Hot-Wallet the Sidecar + +The x402-buyer sidecar reads pre-signed ERC-3009 vouchers from a ConfigMap. It never holds signing keys. Maximum loss = N × price where N is the voucher pool size. + +**Do this:** +```go +// Sidecar pops one pre-signed auth per request +auth := pool.Pop(upstream) +``` + +**Not this:** +```go +// Signing in the sidecar: unbounded spending if compromised +sig, _ := wallet.Sign(transferAuth) +``` + +*Why:* A compromised sidecar with signing keys could drain the wallet. Pre-signed vouchers bound the blast radius by design. (SPEC §3.5, ADR-0004) + +--- + +## 5. KUBECONFIG Must Auto-Set for All K8s Tools + +Every command that touches Kubernetes (`kubectl`, `helm`, `helmfile`, `k9s`, and internal functions) must set `KUBECONFIG=$OBOL_CONFIG_DIR/kubeconfig.yaml`. Never rely on the user's default kubeconfig. + +**Do this:** +```go +cmd.Env = append(os.Environ(), "KUBECONFIG="+cfg.KubeconfigPath()) +``` + +**Not this:** +```go +// Omitting KUBECONFIG: hits user's default cluster, not obol's +cmd := exec.Command("kubectl", "apply", "-f", manifest) +``` + +*Why:* Users may have multiple clusters. Omitting KUBECONFIG operates on the wrong cluster, potentially destroying production workloads. (SPEC §1.3, §3.1) + +--- + +## 6. 
Version Pins Must Agree Across Three Locations + +OpenClaw version is pinned in `internal/openclaw/OPENCLAW_VERSION` (source of truth), `openclawImageTag` constant in `openclaw.go`, and `OPENCLAW_VERSION` in `obolup.sh`. All three must match. `TestOpenClawVersionConsistency` enforces this. + +**Do this:** +``` +# Update all three when bumping: +internal/openclaw/OPENCLAW_VERSION ← Renovate watches this +internal/openclaw/openclaw.go ← openclawImageTag const +obolup.sh ← OPENCLAW_VERSION variable +``` + +**Not this:** +``` +# Updating only one: CI passes, runtime pulls wrong image +echo "0.1.8" > internal/openclaw/OPENCLAW_VERSION +# Forgot openclaw.go and obolup.sh → version drift +``` + +*Why:* Mismatched versions cause the binary to deploy a different image than obolup.sh installs, producing silent behavioral differences. (SPEC §3.6) + +--- + +## 7. ServiceOffer Cleanup via OwnerReferences + +When the reconciler creates Kubernetes resources (Middleware, HTTPRoute, ConfigMap, Service, Deployment) for a ServiceOffer, every resource must carry an `ownerReference` back to the ServiceOffer CR. This enables automatic garbage collection on delete. + +**Do this:** +```python +owner_ref = { + "apiVersion": "obol.org/v1alpha1", + "kind": "ServiceOffer", + "name": offer["metadata"]["name"], + "uid": offer["metadata"]["uid"], +} +``` + +**Not this:** +```python +# Orphaned resources: deleting the ServiceOffer leaves routing artifacts +kubectl.create(middleware) # no ownerReference +``` + +*Why:* Without owner references, `obol sell delete` leaves orphaned Middleware and HTTPRoutes that continue routing traffic to dead upstreams. (SPEC §3.4) + +--- + +## 8. Conventional Commits, Scoped PRs + +Use conventional commit prefixes (`feat:`, `fix:`, `test:`, `docs:`, `chore:`). Keep PRs scoped — separate formatting changes from logic changes. Never mix refactoring with feature work in the same PR. 
+ +**Do this:** +``` +feat: add per-mtok pricing to sell http command +fix: restore tunnel open/close dropped during cherry-pick +``` + +**Not this:** +``` +update sell command and fix formatting and add tests +``` + +*Why:* Scoped commits enable clean reverts, meaningful changelogs, and reviewable diffs. Mixed PRs are unreviewable and un-revertable. + +--- + +## 9. Integration Tests Skip Gracefully + +Integration tests use `//go:build integration` and must skip (not fail) when prerequisites are missing (no cluster, no Ollama, no API keys). Unit tests must never require a running cluster. + +**Do this:** +```go +//go:build integration + +func TestIntegration_SellFlow(t *testing.T) { + if os.Getenv("OBOL_DEVELOPMENT") == "" { + t.Skip("requires OBOL_DEVELOPMENT=true and running cluster") + } +} +``` + +**Not this:** +```go +// No build tag, fails in CI without cluster +func TestSellFlow(t *testing.T) { + // Calls kubectl internally → fails everywhere +} +``` + +*Why:* CI runs `go test ./...` without a cluster. Failing tests block unrelated PRs. (SPEC §10) diff --git a/docs/specs/SPEC.md b/docs/specs/SPEC.md new file mode 100644 index 00000000..1d9ba147 --- /dev/null +++ b/docs/specs/SPEC.md @@ -0,0 +1,1452 @@ +# Obol Stack -- Technical Specification + +> **Version:** 1.0.0 +> **Date:** 2026-03-27 +> **Status:** Living document reflecting the current codebase on the `main` branch. + +--- + +## Table of Contents + +1. [Introduction](#1-introduction) +2. [System Architecture](#2-system-architecture) +3. [Core Subsystems](#3-core-subsystems) +4. [API and Protocol Definition](#4-api-and-protocol-definition) +5. [Data Model](#5-data-model) +6. [Integration Points](#6-integration-points) +7. [Security Model](#7-security-model) +8. [Error Handling](#8-error-handling) +9. [Performance](#9-performance) +10. [Testing Strategy](#10-testing-strategy) + +--- + +## 1. 
Introduction + +### 1.1 Purpose + +Obol Stack is a framework for AI agents to run decentralized infrastructure locally. It deploys a k3d/k3s Kubernetes cluster containing an OpenClaw AI agent, blockchain networks, payment-gated inference via the x402 protocol, and Cloudflare tunnels for public exposure. All management is done through the `obol` CLI binary, built with Go and `github.com/urfave/cli/v3`. + +### 1.2 Terminology + +| Term | Definition | +|------|-----------| +| **x402** | HTTP 402 Payment Required protocol for micropayments. Clients attach EIP-712 signed `PaymentPayload` headers; servers verify via a facilitator service. | +| **ERC-8004** | Ethereum standard for on-chain agent identity. Defines an `IdentityRegistryUpgradeable` (ERC-721) with metadata storage. | +| **ServiceOffer** | Custom Kubernetes resource (`obol.org`) declaring a sell-side service with pricing, upstream, and registration metadata. | +| **ForwardAuth** | Traefik middleware pattern where every request is first forwarded to an auth service (`x402-verifier`) before reaching the upstream. | +| **ERC-3009** | `TransferWithAuthorization` -- gasless USDC transfers via pre-signed EIP-712 authorizations. | +| **Facilitator** | Third-party x402 service that verifies payment signatures and optionally settles on-chain. Default: `https://facilitator.x402.rs`. | +| **LiteLLM** | OpenAI-compatible proxy that routes inference requests to Ollama, Anthropic, OpenAI, or paid remote sellers. | +| **eRPC** | Blockchain RPC gateway that multiplexes and caches requests across multiple upstream RPC providers. | +| **OpenClaw** | AI agent runtime deployed as a singleton Kubernetes Deployment with skills injected via host-path PVC. | +| **Petname** | Two-word deterministic identifier (e.g., `fluffy-penguin`) generated via `dustinkirkland/golang-petname` for unique cluster/deployment naming. | +| **CAIP-2** | Chain Agnostic Improvement Proposal for network identifiers (e.g., `eip155:84532` for Base Sepolia). 
| +| **Storefront** | Static HTML landing page served at the tunnel hostname root via busybox httpd. | +| **Sidecar** | The `x402-buyer` container running alongside LiteLLM in the same Pod, handling buy-side payment attachment. | + +### 1.3 System Constraints + +| Constraint | Detail | +|-----------|--------| +| **Absolute paths** | Docker volume mounts require absolute paths, resolved at `obol stack init` time. | +| **Two-stage templating** | Stage 1: CLI flags populate Go templates in `values.yaml.gotmpl`. Stage 2: Helmfile renders final Kubernetes manifests. Stages must not be mixed. | +| **Unique namespaces** | Every deployment (network, app) gets a unique namespace: `-`. | +| **OBOL_DEVELOPMENT=true** | Required for `obol stack up` to auto-build and import local Docker images (x402-verifier, x402-buyer). | +| **Root-owned PVCs** | k3s local-path-provisioner creates root-owned directories. `obol stack purge -f` (sudo) required to remove. | +| **Single cluster** | One k3d/k3s cluster per config directory. Multiple stacks require separate `OBOL_CONFIG_DIR` values. | +| **OpenClaw version pinning** | Version must agree in 3 places: `OPENCLAW_VERSION` file, `openclawImageTag` Go const, `obolup.sh` shell const. `TestOpenClawVersionConsistency` enforces this. | +| **ConfigMap propagation delay** | k3d file watcher takes 60-120 seconds to pick up manifest changes. | + +### 1.4 Dependencies + +| Dependency | Minimum Version | Purpose | +|-----------|----------------|---------| +| Docker | 20.10.0 | Container runtime for k3d backend | +| Go | 1.25 | Build toolchain | +| kubectl | 1.35.0 | Kubernetes API client | +| Helm | 3.19.4 | Chart management | +| k3d | 5.8.3 | k3d cluster management (default backend) | +| Helmfile | 1.2.3 | Declarative Helm chart orchestration | +| k9s | 0.50.18 | Cluster TUI (optional) | +| k3s | v1.35.1-k3s1 | Kubernetes distribution (via `rancher/k3s` image or binary) | + +--- + +## 2. 
System Architecture + +### 2.1 High-Level Overview + +The system is composed of two parts: `obolup.sh` (bootstrap installer with pinned dependency versions) and the `obol` CLI (Go binary managing all lifecycle operations). + +```mermaid +graph TD + subgraph "Host Machine" + CLI["obol CLI (Go binary)"] + Ollama["Ollama (host)"] + Docker["Docker / Podman"] + end + + subgraph "k3d / k3s Cluster" + subgraph "traefik ns" + GW["Traefik Gateway
(Gateway API)"] + CF["cloudflared"] + SF["Storefront httpd"] + end + + subgraph "llm ns" + LiteLLM["LiteLLM :4000"] + Buyer["x402-buyer :8402
(sidecar)"] + end + + subgraph "x402 ns" + Verifier["x402-verifier
(ForwardAuth)"] + end + + subgraph "openclaw-obol-agent ns" + Agent["OpenClaw Agent"] + RS["Remote Signer :9000"] + end + + subgraph "erpc ns" + ERPC["eRPC Gateway"] + end + + subgraph "obol-frontend ns" + FE["Frontend"] + end + + subgraph "monitoring ns" + Prom["Prometheus"] + end + + subgraph "network-petname ns" + EL["Execution Layer"] + CL["Consensus Layer"] + end + end + + Internet["Public Internet"] + + CLI --> Docker + CLI --> GW + Ollama -.->|host.docker.internal| LiteLLM + Internet --> CF --> GW + GW -->|/services/*| Verifier -->|200 OK| GW --> LiteLLM + GW -->|obol.stack| FE + GW -->|obol.stack/rpc| ERPC + LiteLLM --> Buyer -->|x402 payment| Internet + Agent --> RS + ERPC --> EL +``` + +### 2.2 Routing Architecture + +Traefik serves as the cluster ingress using the Kubernetes Gateway API. A single `GatewayClass` (`traefik`) and `Gateway` (`traefik-gateway`) in the `traefik` namespace handle all HTTP/HTTPS traffic. + +```mermaid +flowchart LR + subgraph "Request Classification" + direction TB + R1["Local-only
hostnames: obol.stack"] + R2["Public
no hostname restriction"] + end + + R1 -->|"/"| FE["Frontend"] + R1 -->|"/rpc"| ERPC["eRPC"] + + R2 -->|"/services/name/*"| FA["x402 ForwardAuth"] --> US["Upstream Service"] + R2 -->|"/.well-known/*"| WK["ERC-8004 httpd"] + R2 -->|"/skill.md"| SK["Service Catalog"] + R2 -->|"/ (tunnel host)"| SF["Storefront"] +``` + +**Routing rules:** + +- **Local-only routes** are restricted to `hostnames: ["obol.stack"]`. This ensures the frontend, eRPC, LiteLLM admin, and monitoring are never reachable via the Cloudflare tunnel. +- **Public routes** have no hostname restriction and are intentionally exposed via the tunnel. The `/services/*` path is protected by x402 ForwardAuth. Discovery endpoints (`/.well-known/`, `/skill.md`) and the storefront landing page are unauthenticated. + +### 2.3 Configuration Hierarchy + +``` +Config{ConfigDir, BinDir, DataDir, StateDir} + +Precedence (each directory type): + 1. Explicit env var (OBOL_CONFIG_DIR, OBOL_BIN_DIR, OBOL_DATA_DIR, OBOL_STATE_DIR) + 2. XDG standard (XDG_CONFIG_HOME/obol, ~/.local/bin, XDG_DATA_HOME/obol, XDG_STATE_HOME/obol) + 3. OBOL_DEVELOPMENT=true -> .workspace/{config,bin,data,state} +``` + +**Source:** `internal/config/config.go` + +### 2.4 Backend Abstraction + +The `Backend` interface (`internal/stack/backend.go`) abstracts the Kubernetes runtime: + +| Method | Description | +|--------|-----------| +| `Init(cfg, ui, stackID)` | Generate backend-specific cluster configuration | +| `Up(cfg, ui, stackID)` | Create/start cluster, return kubeconfig bytes | +| `Down(cfg, ui, stackID)` | Stop cluster without destroying config/data | +| `Destroy(cfg, ui, stackID)` | Remove cluster entirely | +| `DataDir(cfg)` | Return storage path for local-path-provisioner | +| `Prerequisites(cfg)` | Check required software/permissions | +| `IsRunning(cfg, stackID)` | Check if cluster is currently running | + +**Implementations:** + +- `K3dBackend` (default): Docker-based via k3d. Ports: 80:80, 8080:80, 443:443, 8443:443. 
+- `K3sBackend`: Bare-metal k3s binary. Ollama host is `127.0.0.1` (no Docker networking). + +Backend choice is persisted in `.stack-backend` file. Switching backends triggers automatic destruction of the old cluster to prevent orphaned resources. + +--- + +## 3. Core Subsystems + +### 3.1 Stack Lifecycle + +**Source:** `internal/stack/stack.go`, `internal/stack/backend.go`, `internal/stack/backend_k3d.go`, `internal/stack/backend_k3s.go` + +#### 3.1.1 Purpose + +Manage the full lifecycle of the local Kubernetes cluster: initialization, startup (with infrastructure deployment), shutdown, and purge. + +#### 3.1.2 Operations + +| Command | Function | Behavior | +|---------|----------|---------| +| `obol stack init` | `Init()` | Generate cluster ID (petname), resolve absolute paths, write backend config, copy embedded infrastructure defaults, resolve Ollama host for backend | +| `obol stack up` | `Up()` | Create cluster, export kubeconfig, `syncDefaults()` (helmfile sync), auto-configure LiteLLM, deploy OpenClaw, apply agent RBAC, start DNS tunnel if persistent | +| `obol stack down` | `Down()` | Stop cluster (preserves config + data), stop DNS resolver | +| `obol stack purge` | `Purge()` | Destroy cluster, remove config dir; `--force` also removes root-owned data dir via sudo | + +#### 3.1.3 Startup Sequence + +```mermaid +sequenceDiagram + participant User + participant CLI as obol CLI + participant Backend as K3d/K3s Backend + participant Helmfile + participant LiteLLM + participant OpenClaw + participant Tunnel + + User->>CLI: obol stack up + CLI->>Backend: Up(cfg, stackID) + Backend-->>CLI: kubeconfig bytes + CLI->>CLI: Write kubeconfig + CLI->>Helmfile: syncDefaults (helmfile sync) + Note over Helmfile: Deploy infrastructure
(Traefik, eRPC, x402, LiteLLM, etc.) + Helmfile-->>CLI: Infrastructure deployed + CLI->>LiteLLM: autoConfigureLLM() + Note over LiteLLM: Detect Ollama models
Detect cloud provider API keys
Patch ConfigMap + Secret
Single restart + CLI->>OpenClaw: SetupDefault() + Note over OpenClaw: Deploy singleton agent
Inject skills via PVC + CLI->>CLI: agent.Init() (RBAC patching) + CLI->>Tunnel: Check tunnel state + alt DNS tunnel provisioned + CLI->>Tunnel: EnsureRunning() + else Quick tunnel + Note over Tunnel: Dormant until first sell + end + CLI-->>User: Stack started +``` + +#### 3.1.4 Ollama Host Resolution + +The Ollama host varies by backend and OS: + +| Backend | OS | Ollama Host | IP Resolution | +|---------|----|-------------|---------------| +| k3d | macOS | `host.docker.internal` | Docker Desktop gateway `192.168.65.254` | +| k3d | Linux | `host.k3d.internal` | `docker0` bridge IP | +| k3s | any | `127.0.0.1` | Loopback (k3s runs on host) | + +#### 3.1.5 Configuration + +- **Stack ID:** Persisted in `$OBOL_CONFIG_DIR/.stack-id`. Preserved across `--force` reinit. +- **Backend choice:** Persisted in `$OBOL_CONFIG_DIR/.stack-backend`. +- **Embedded defaults:** Copied to `$OBOL_CONFIG_DIR/defaults/` with template substitution (`{{OLLAMA_HOST}}`, `{{OLLAMA_HOST_IP}}`, `{{CLUSTER_ID}}`). + +#### 3.1.6 Error States + +| Error | Cause | Recovery | +|-------|-------|---------| +| `stack ID not found` | `Init()` not called | Run `obol stack init` | +| `port(s) already in use` | Conflicting service on 80/443/8080/8443 | Stop conflicting service | +| `helmfile sync failed` | Infrastructure deployment error | Cluster auto-stopped via `Down()`, fix and retry | +| `prerequisites check failed` | Missing Docker/k3s binary | Install prerequisites | + +--- + +### 3.2 LLM Routing + +**Source:** `internal/model/model.go` + +#### 3.2.1 Purpose + +Configure and manage the LiteLLM gateway (port 4000) as the central OpenAI-compatible inference proxy, routing requests to local Ollama, cloud providers, or paid remote sellers. 
+ +#### 3.2.2 Inputs / Outputs + +| Input | Source | Description | +|-------|--------|-------------| +| Ollama models | Host Ollama API (`/api/tags`) | Auto-detected during `obol stack up` | +| Cloud API keys | Environment variables | `ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `OPENAI_API_KEY` | +| OpenClaw config | `~/.openclaw/openclaw.json` | Agent model preference for cloud provider detection | + +| Output | Target | Description | +|--------|--------|-------------| +| `litellm-config` ConfigMap | `llm` namespace | YAML `config.yaml` with `model_list` entries | +| `litellm-secrets` Secret | `llm` namespace | Master key + provider API keys | +| LiteLLM Deployment restart | `llm` namespace | Triggered after config patches | + +#### 3.2.3 Provider Configuration + +Known providers are defined statically: + +| Provider | EnvVar | Alt EnvVars | Notes | +|----------|--------|-------------|-------| +| `anthropic` | `ANTHROPIC_API_KEY` | `CLAUDE_CODE_OAUTH_TOKEN` | Claude models | +| `openai` | `OPENAI_API_KEY` | -- | GPT models | +| `ollama` | -- | -- | Local, no API key | + +#### 3.2.4 Logic + +1. **Auto-configuration** (`autoConfigureLLM`): Detects Ollama models and cloud provider API keys. Patches all providers first, then performs a single LiteLLM restart. +2. **Manual configuration** (`ConfigureLiteLLM`): `obol model setup --provider `. Patches Secret + ConfigMap + restarts. +3. **Paid inference routing**: Static `paid/*` model alias routes through the `x402-buyer` sidecar at `http://127.0.0.1:8402`. The LiteLLM config contains a permanent catch-all entry; the sidecar handles payment attachment. 
+ +#### 3.2.5 LiteLLM Config Structure + +```yaml +model_list: + - model_name: "qwen3.5:9b" # Ollama model + litellm_params: + model: "ollama/qwen3.5:9b" + api_base: "http://ollama.llm.svc:11434" + - model_name: "anthropic/*" # Cloud wildcard + litellm_params: + model: "anthropic/*" + api_key: "os.environ/ANTHROPIC_API_KEY" + - model_name: "paid/*" # Buy-side sidecar + litellm_params: + model: "openai/*" + api_base: "http://127.0.0.1:8402/v1" +``` + +#### 3.2.6 Error States + +| Error | Cause | Recovery | +|-------|-------|---------| +| `cluster not running` | Kubeconfig missing | Run `obol stack up` | +| `no models to configure` | Empty model list for provider | Ensure Ollama has models or provide API key | +| Auto-configure failures | Non-fatal | User can run `obol model setup` manually | + +--- + +### 3.3 Network / RPC Gateway + +**Source:** `internal/network/erpc.go`, `internal/network/rpc.go`, `internal/network/network.go`, `internal/network/resolve.go` + +#### 3.3.1 Purpose + +Manage blockchain RPC routing through the eRPC gateway. Add/remove chains with public or custom RPC endpoints, deploy local Ethereum nodes, and register them as priority upstreams. + +#### 3.3.2 eRPC ConfigMap Structure + +The eRPC configuration is stored in the `erpc-config` ConfigMap in the `erpc` namespace under the key `erpc.yaml`. It defines projects with networks and upstreams: + +```yaml +projects: + - id: main + networks: + - architecture: evm + evm: + chainId: 1 + upstreams: + - id: local-ethereum-fluffy-penguin + endpoint: http://ethereum-execution.ethereum-fluffy-penguin.svc.cluster.local:8545 + evm: + chainId: 1 + - id: chainlist-ethereum-1 + endpoint: https://eth.llamarpc.com + evm: + chainId: 1 +``` + +#### 3.3.3 Two-Stage Templating + +Network deployments use two-stage templating: + +1. **Stage 1 (CLI flags -> Go templates):** `values.yaml.gotmpl` files in `internal/embed/networks/` use `@enum`, `@default`, `@description` annotations. 
CLI flags populate these templates to produce `values.yaml`. +2. **Stage 2 (Helmfile -> K8s):** `helmfile sync --state-values-file values.yaml --state-values-set id=` renders final Kubernetes manifests. + +#### 3.3.4 Write Method Blocking + +By default, eRPC blocks write methods (`eth_sendRawTransaction`) on all upstreams. The `--allow-writes` flag on `obol network add` enables write methods for a specific chain. Local Ethereum nodes registered via `RegisterERPCUpstream()` always have writes blocked -- write requests are routed to remote upstreams instead. + +#### 3.3.5 Operations + +| Command | Function | Description | +|---------|----------|-------------| +| `obol network add` | `AddPublicRPCs()` / `AddCustomRPC()` | Add chain by ID (ChainList) or custom endpoint | +| `obol network remove` | `RemoveRPC()` | Remove chain from eRPC | +| `obol network list` | `ListRPCNetworks()` | Show configured chains and upstreams | +| `obol network install` | `Install()` | Deploy local Ethereum node (two-stage template) | +| `obol network sync` | `Sync()` | Re-sync helmfile for a deployed network | +| `obol network status` | `Status()` | Show deployment status | + +--- + +### 3.4 Monetize -- Sell Side + +**Source:** `cmd/obol/sell.go`, `internal/x402/`, `internal/schemas/`, `internal/embed/skills/monetize/` + +#### 3.4.1 Purpose + +Enable operators to sell access to cluster services (inference, HTTP endpoints) via x402 micropayments. The sell side creates ServiceOffer CRDs, reconciles them through 6 stages, and publishes payment-gated routes via Traefik. + +#### 3.4.2 Sell-Side Flow + +```mermaid +sequenceDiagram + participant Operator + participant CLI as obol sell http + participant K8s as Kubernetes API + participant Reconciler as monetize.py + participant Verifier as x402-verifier + participant Traefik + + Operator->>CLI: obol sell http myapi --wallet 0x... 
--price 0.001 + CLI->>K8s: Create ServiceOffer CR + CLI->>CLI: EnsureTunnelForSell() + + loop Reconciliation (every 10s) + Reconciler->>K8s: Watch ServiceOffer CRs + Reconciler->>Reconciler: Stage 1: ModelReady + Reconciler->>Reconciler: Stage 2: UpstreamHealthy + Reconciler->>Verifier: Stage 3: PaymentGateReady
(create Middleware + pricing route) + Reconciler->>Traefik: Stage 4: RoutePublished
(create HTTPRoute) + Reconciler->>K8s: Stage 5: Registered
(ERC-8004 on-chain) + Reconciler->>K8s: Stage 6: Ready + end + + Note over Traefik: /services/myapi/* -> ForwardAuth -> upstream +``` + +#### 3.4.3 ServiceOffer CRD + +The `ServiceOffer` CRD (`obol.org`) is the declarative API for sell-side services: + +**Spec fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `type` | `WorkloadType` | `inference` or `fine-tuning` | +| `model` | `ModelSpec` | `{name, runtime}` -- LLM model metadata | +| `upstream` | `UpstreamSpec` | `{service, namespace, port, healthPath}` -- target K8s Service | +| `payment` | `PaymentTerms` | `{scheme, network, payTo, maxTimeoutSeconds, price}` | +| `path` | `string` | URL path prefix (default: `/services/`) | +| `registration` | `RegistrationSpec` | ERC-8004 metadata `{enabled, name, description, image, services, supportedTrust}` | + +**Status fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `conditions[]` | `Condition` | 6 condition types tracking reconciliation progress | +| `endpoint` | `string` | Published URL | +| `agentId` | `string` | ERC-8004 token ID | +| `registrationTxHash` | `string` | On-chain registration transaction hash | + +#### 3.4.4 x402-verifier (ForwardAuth) + +**Source:** `internal/x402/verifier.go`, `internal/x402/config.go`, `internal/x402/matcher.go`, `internal/x402/watcher.go` + +The x402-verifier runs in the `x402` namespace as a Deployment. Traefik sends every request matching a ForwardAuth Middleware to `POST /verify`. The verifier: + +1. Reads `X-Forwarded-Uri` from the request headers. +2. Matches against `PricingConfig.Routes[]` (first match wins). +3. No match -> `200 OK` (free route). +4. Match + no `X-PAYMENT` header -> `402 Payment Required` with `PaymentRequirements` body. +5. Match + `X-PAYMENT` header -> delegates to `x402-go` middleware for verification/settlement. +6. Verified -> `200 OK` (optionally sets `Authorization` header for upstream auth). 
+ +**Configuration hot-reload:** `WatchConfig()` polls the pricing YAML file every 5 seconds for modification time changes, then atomically swaps the `PricingConfig` via `Verifier.Reload()`. This handles ConfigMap volume mount updates (kubelet symlink swaps) without fsnotify. + +**Route matching** (`internal/x402/matcher.go`): + +| Pattern Type | Example | Behavior | +|-------------|---------|---------| +| Exact | `/health` | Matches only `/health` | +| Prefix | `/rpc/*` | Matches `/rpc/`, `/rpc/a/b/c` | +| Glob | `/inference-*/v1/*` | Segment-level wildcards via `path.Match` | + +#### 3.4.5 Pricing + +```go +// PricingConfig (YAML: x402-pricing ConfigMap) +type PricingConfig struct { + Wallet string // USDC recipient address + Chain string // e.g., "base-sepolia" + FacilitatorURL string // default: "https://facilitator.x402.rs" + VerifyOnly bool // skip settlement (testing) + Routes []RouteRule // first-match pricing rules +} + +type RouteRule struct { + Pattern string // URL path pattern + Price string // USDC per request + PayTo string // per-route wallet override + Network string // per-route chain override + UpstreamAuth string // Authorization header for upstream + PriceModel string // metadata: "perRequest", "perMTok" + PerMTok string // original per-million-token price + ApproxTokensPerRequest int // fixed estimate (default: 1000) + OfferNamespace string // originating ServiceOffer + OfferName string // originating ServiceOffer +} +``` + +**Phase 1 pricing approximation:** When `perMTok` is set, the effective per-request price is `perMTok / 1000` (using `ApproxTokensPerRequest = 1000`). Exact token metering is planned for phase 2. 
+ +#### 3.4.6 Supported Chains + +| Chain | Name | CAIP-2 | +|-------|------|--------| +| Base Mainnet | `base` | `eip155:8453` | +| Base Sepolia | `base-sepolia` | `eip155:84532` | +| Polygon Mainnet | `polygon` | `eip155:137` | +| Polygon Amoy | `polygon-amoy` | `eip155:80002` | +| Avalanche Mainnet | `avalanche` | `eip155:43114` | +| Avalanche Fuji | `avalanche-fuji` | `eip155:43113` | + +#### 3.4.7 CLI Commands + +| Command | Description | +|---------|-------------| +| `obol sell http ` | Sell access to an HTTP service with x402 gating | +| `obol sell inference ` | Sell inference via standalone gateway (bare metal) | +| `obol sell list` | List active ServiceOffers | +| `obol sell status ` | Show reconciliation status for a ServiceOffer | +| `obol sell stop ` | Scale down a sold service | +| `obol sell delete ` | Delete ServiceOffer and all owned resources | +| `obol sell pricing` | Configure global wallet and chain | +| `obol sell register` | Trigger ERC-8004 on-chain registration | + +#### 3.4.8 Error States + +| Error | Cause | Recovery | +|-------|-------|---------| +| `unsupported chain` | Invalid chain name in `--chain` | Use one of: base, base-sepolia, polygon, polygon-amoy, avalanche, avalanche-fuji | +| `facilitator URL must use HTTPS` | Non-HTTPS facilitator (not localhost) | Use HTTPS URL or loopback for testing | +| Reconciler stuck at stage | Upstream unhealthy, wallet missing, tunnel down | Check `obol sell status ` for condition messages | + +--- + +### 3.5 Monetize -- Buy Side + +**Source:** `internal/x402/buyer/config.go`, `internal/x402/buyer/signer.go`, `internal/x402/buyer/proxy.go`, `internal/x402/buyer/state.go` + +#### 3.5.1 Purpose + +Enable agents to purchase inference from remote x402-gated sellers using pre-signed ERC-3009 `TransferWithAuthorization` vouchers. The `x402-buyer` sidecar runs as a second container in the `litellm` Deployment. 
+ +#### 3.5.2 Buy-Side Flow + +```mermaid +sequenceDiagram + participant Agent as OpenClaw Agent + participant LiteLLM + participant Buyer as x402-buyer sidecar + participant Seller as Remote Seller + + Agent->>LiteLLM: POST /v1/chat/completions
model: "paid/qwen3.5:9b" + LiteLLM->>Buyer: Proxy to :8402/v1/chat/completions
model: "qwen3.5:9b" + Buyer->>Seller: POST /services/qwen/v1/chat/completions + Seller-->>Buyer: 402 PaymentRequired + Note over Buyer: Pop pre-signed auth
from pool + Buyer->>Seller: Retry with X-PAYMENT header + Seller-->>Buyer: 200 OK + inference response + Buyer-->>LiteLLM: 200 OK + LiteLLM-->>Agent: Chat completion +``` + +#### 3.5.3 Architecture + +The sidecar has zero signer access. Spending is bounded by design: maximum loss = N * price, where N is the number of pre-signed authorizations in the pool. + +**Components:** + +| Component | Role | +|-----------|------| +| `Proxy` | OpenAI-compatible reverse proxy with model-based routing | +| `PreSignedSigner` | Implements `x402.Signer` by popping from a finite auth pool | +| `StateStore` | Tracks consumed nonces to prevent double-spend across restarts | +| `X402Transport` | HTTP transport that intercepts 402 responses and attaches payments | + +#### 3.5.4 Configuration + +```json +// x402-buyer-config ConfigMap +{ + "upstreams": { + "seller-qwen": { + "url": "https://seller.example.com/services/qwen", + "remoteModel": "qwen3.5:9b", + "network": "base-sepolia", + "payTo": "0x...", + "asset": "0x...", + "price": "1000" + } + } +} +``` + +```json +// x402-buyer-auths ConfigMap (pre-signed ERC-3009 authorizations) +{ + "seller-qwen": [ + { + "signature": "0x...", + "from": "0x...", + "to": "0x...", + "value": "1000", + "validAfter": "0", + "validBefore": "115792089237316195423570985008687907853269984665640564039457584007913129639935", + "nonce": "0x..." 
+ } + ] +} +``` + +#### 3.5.5 Model Resolution + +The proxy strips `paid/` and `openai/` prefixes from the requested model name to resolve the upstream: + +``` +"paid/openai/qwen3.5:9b" -> "qwen3.5:9b" -> lookup in modelRoutes -> upstream handler +``` + +#### 3.5.6 Endpoints + +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/v1/chat/completions` | POST | OpenAI chat completions (model-routed) | +| `/chat/completions` | POST | OpenAI chat completions (no `/v1` prefix) | +| `/v1/responses` | POST | OpenAI responses API | +| `/responses` | POST | OpenAI responses API (no `/v1` prefix) | +| `/upstream//...` | ANY | Direct upstream access (compatibility) | +| `/status` | GET | JSON with remaining/spent auths per upstream | +| `/healthz` | GET | Liveness probe | +| `/metrics` | GET | Prometheus metrics | + +#### 3.5.7 Error States + +| Error | Cause | Recovery | +|-------|-------|---------| +| `pre-signed auth pool exhausted` | All vouchers consumed | Agent runs `buy.py` to pre-sign more | +| `no purchased upstream mapped` | Model not in buyer config | Agent runs `buy.py probe` + `buy.py buy` | +| Payment failure from seller | Invalid/expired auth, insufficient balance | Check auth validity, top up USDC | + +--- + +### 3.6 OpenClaw and Skills + +**Source:** `internal/openclaw/openclaw.go`, `internal/openclaw/wallet.go`, `internal/openclaw/resolve.go`, `internal/embed/` + +#### 3.6.1 Purpose + +Deploy and manage the OpenClaw AI agent as a singleton Kubernetes Deployment, inject skills via host-path PVC, and manage agent wallets for on-chain operations. + +#### 3.6.2 Agent Deployment + +The agent is deployed as a singleton Deployment named `openclaw` in the `openclaw-obol-agent` namespace. Skills are delivered via host-path PVC injection to `$DATA_DIR/openclaw-/openclaw-data/.openclaw/skills/`. 
+ +**25 embedded skills** in `internal/embed/skills/`: + +| Category | Skills | +|----------|--------| +| Infrastructure | ethereum-networks, ethereum-local-wallet, obol-stack, distributed-validators, monetize, discovery, buy-inference, maintain-inference | +| Ethereum Dev | addresses, building-blocks, concepts, gas, indexing, l2s, orchestration, security, standards, ship, testing, tools, wallets | +| Frontend | frontend-playbook, frontend-ux, qa, why | + +#### 3.6.3 Wallet Generation + +`GenerateWallet()` in `internal/openclaw/wallet.go`: + +1. Generate secp256k1 private key. +2. Derive Ethereum address (Keccak-256 of uncompressed public key, last 20 bytes). +3. Encrypt private key using Web3 V3 keystore format (scrypt KDF, AES-128-CTR cipher). +4. Write keystore JSON to `$DATA_DIR/openclaw-/keystore/`. +5. Deploy remote-signer REST API at port 9000 in the same namespace. + +#### 3.6.4 Cloud Provider Detection + +During `obol stack up`, `autoDetectCloudProvider()`: + +1. Reads `~/.openclaw/openclaw.json` for agent model preference. +2. Extracts provider from model name (e.g., `anthropic/claude-sonnet-4-6` -> `anthropic`). +3. Resolves API key: primary env var -> alt env vars -> `.env` file (dev mode). +4. Patches LiteLLM with the provider + key. + +#### 3.6.5 Version Pinning + +Three locations must agree: + +| Location | File | Format | +|----------|------|--------| +| Source of truth | `internal/openclaw/OPENCLAW_VERSION` | Plain text | +| Go constant | `internal/openclaw/openclaw.go` | `openclawImageTag` const | +| Shell constant | `obolup.sh` | `OPENCLAW_VERSION` variable | + +`TestOpenClawVersionConsistency` in `internal/openclaw/version_test.go` enforces consistency.
+ +--- + +### 3.7 Tunnel Management + +**Source:** `internal/tunnel/tunnel.go`, `internal/tunnel/state.go`, `internal/tunnel/provision.go`, `internal/tunnel/cloudflare.go`, `internal/tunnel/agent.go` + +#### 3.7.1 Purpose + +Manage Cloudflare tunnels that expose the cluster to the public internet, enabling remote access to x402-gated services and agent discovery endpoints. + +#### 3.7.2 Tunnel Modes + +| Mode | Activation | URL | Persistence | +|------|-----------|-----|-------------| +| `quick` | Dormant by default; activates on first `obol sell` | Random `*.trycloudflare.com` | Ephemeral (changes on restart) | +| `dns` | `obol tunnel login --hostname stack.example.com` | Stable user-controlled hostname | Persistent across restarts | + +#### 3.7.3 State + +Tunnel state is persisted at `$OBOL_CONFIG_DIR/tunnel/cloudflared.json`: + +```go +type tunnelState struct { + Mode string // "quick" or "dns" + Hostname string // e.g., "stack.example.com" + AccountID string // Cloudflare account ID + ZoneID string // Cloudflare zone ID + TunnelID string // Cloudflare tunnel ID + TunnelName string // Tunnel name + UpdatedAt time.Time // Last state update +} +``` + +#### 3.7.4 Lifecycle + +```mermaid +stateDiagram-v2 + [*] --> Dormant: obol stack up (quick mode) + Dormant --> Active: obol sell http / obol tunnel restart + Active --> Dormant: obol tunnel stop (scale to 0) + [*] --> Active: obol stack up (dns mode, auto-start) + Active --> [*]: obol stack down / purge + + state Active { + [*] --> Running + Running --> Restarting: obol tunnel restart + Restarting --> Running + } +``` + +#### 3.7.5 URL Propagation + +When a tunnel becomes active, the URL is propagated to multiple consumers: + +1. **obol-agent env:** `AGENT_BASE_URL` on the OpenClaw Deployment (for `monetize.py` registration JSON). +2. **Frontend ConfigMap:** `obol-stack-config` in `obol-frontend` namespace (dashboard URL). +3. **Agent overlay:** Helmfile state values for consistency across syncs. +4. 
**Storefront:** Busybox httpd landing page at the tunnel hostname root. + +#### 3.7.6 Storefront Resources + +`CreateStorefront()` deploys 4 Kubernetes resources in the `traefik` namespace: + +- `ConfigMap/tunnel-storefront`: HTML content + mime types +- `Deployment/tunnel-storefront`: busybox httpd serving the ConfigMap (5m CPU, 8Mi RAM) +- `Service/tunnel-storefront`: ClusterIP on port 8080 +- `HTTPRoute/tunnel-storefront`: Routes tunnel hostname root to the storefront + +--- + +### 3.8 ERC-8004 Identity + +**Source:** `internal/erc8004/client.go`, `internal/erc8004/types.go`, `internal/erc8004/abi.go` + +#### 3.8.1 Purpose + +Register AI agents on-chain using the ERC-8004 Identity Registry, enabling decentralized agent discovery and identity verification. + +#### 3.8.2 Contract + +| Property | Value | +|----------|-------| +| Standard | ERC-721 (IdentityRegistryUpgradeable) | +| Base Sepolia | `0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D` | +| Base Mainnet | `0x8004A169...` (abbreviated) | + +#### 3.8.3 Client Operations + +| Method | Description | +|--------|-------------| +| `Register(ctx, key, agentURI)` | Mint new agent NFT, returns `agentId` (token ID) | +| `SetAgentURI(ctx, key, agentId, uri)` | Update the agent's metadata URI | +| `SetMetadata(ctx, key, agentId, entries)` | Set on-chain metadata key-value pairs | +| `GetMetadata(ctx, agentId, key)` | Read on-chain metadata | +| `TokenURI(ctx, agentId)` | Read the agent's metadata URI | + +#### 3.8.4 Agent Registration Document + +Served at `/.well-known/agent-registration.json`: + +```go +type AgentRegistration struct { + Type string // "https://eips.ethereum.org/EIPS/eip-8004#registration-v1" + Name string // Agent name + Description string // Human-readable description + Image string // Agent icon URL + Services []ServiceDef // Endpoints (web, A2A, MCP, OASF) + X402Support bool // Always true for Obol Stack agents + Active bool // Service availability + Registrations []OnChainReg // On-chain records 
[{agentId, agentRegistry}] + SupportedTrust []string // ["reputation", "crypto-economic", "tee-attestation"] +} +``` + +#### 3.8.5 Error States + +| Error | Cause | Recovery | +|-------|-------|---------| +| `erc8004: dial` | RPC endpoint unreachable | Check network connectivity, verify RPC URL | +| `erc8004: register tx` | Transaction submission failed | Check wallet balance (gas), verify contract address | +| `erc8004: wait mined` | Transaction not mined | Retry, check network congestion | + +--- + +### 3.9 Standalone Inference Gateway + +**Source:** `internal/inference/gateway.go`, `internal/inference/container.go`, `internal/inference/store.go`, `internal/inference/client.go` + +#### 3.9.1 Purpose + +Provide a standalone, bare-metal OpenAI-compatible HTTP gateway with x402 payment gating and optional hardware-backed encryption (Secure Enclave or TEE). + +#### 3.9.2 Configuration + +```go +type GatewayConfig struct { + ListenAddr string // default ":8402" + UpstreamURL string // e.g., "http://localhost:11434" + WalletAddress string // USDC recipient + PricePerRequest string // default "0.001" + Chain x402.ChainConfig // default BaseSepolia + FacilitatorURL string // default "https://facilitator.x402.rs" + VerifyOnly bool // skip settlement + EnclaveTag string // macOS Secure Enclave key tag + VMMode bool // Apple Containerization VM + VMImage string // default "ollama/ollama:latest" + VMCPUs int // default 4 + VMMemoryMB int // default 8192 + VMHostPort int // default 11435 + VMBinary string // default "container" + TEEType string // "tdx", "snp", "nitro", "stub" + ModelHash string // SHA-256 of served model + NoPaymentGate bool // disable x402 (cluster mode) +} +``` + +#### 3.9.3 Middleware Stack + +The gateway composes middleware layers from innermost to outermost: + +``` +Client -> x402 Payment Gate -> Enclave/TEE Decrypt -> Reverse Proxy -> Upstream (Ollama) +``` + +| Layer | Condition | Behavior | +|-------|-----------|---------| +| x402 Payment Gate | 
`!NoPaymentGate` | Returns 402 for unpaid requests | +| Enclave Middleware | `EnclaveTag != ""` or `TEEType != ""` | Decrypts `application/x-obol-encrypted` bodies via SE/TEE key | +| Reverse Proxy | Always | Forwards to upstream inference service | + +#### 3.9.4 Endpoints + +| Endpoint | Auth | Description | +|----------|------|-------------| +| `GET /health` | None | Liveness probe | +| `GET /v1/enclave/pubkey` | None | SE/TEE public key (enclave mode only) | +| `GET /v1/attestation` | None | TEE attestation report (TEE mode only) | +| `POST /v1/chat/completions` | x402 | Chat completions (payment-gated) | +| `POST /v1/completions` | x402 | Text completions (payment-gated) | +| `POST /v1/embeddings` | x402 | Embeddings (payment-gated) | +| `GET /v1/models` | x402 | Model list (payment-gated) | +| `* /` | None | Passthrough to upstream | + +#### 3.9.5 VM Mode + +When `--vm` is set, the gateway: + +1. Starts an OCI container via Apple Containerization (`container` CLI). +2. Maps the container's Ollama port 11434 to `VMHostPort` on the host. +3. Overrides `UpstreamURL` with `http://localhost:<VMHostPort>`. +4. On `Stop()`, gracefully shuts down the container with a 30-second timeout.
+ +#### 3.9.6 Encryption Scheme (Enclave / TEE) + +**Source:** `internal/enclave/enclave.go` + +The `Key` interface provides hardware-backed P-256 key management: + +| Method | Description | +|--------|-------------| +| `PublicKeyBytes()` | Uncompressed 65-byte SEC1 public key | +| `Sign(digest)` | ECDSA signature via SE/TEE private key | +| `ECDH(peerPubKey)` | Diffie-Hellman shared secret | +| `Decrypt(ciphertext)` | Full ECIES decryption | +| `Persistent()` | Whether key survives process restart | + +**Wire format:** + +``` +[1 byte] version (0x01) +[65 bytes] uncompressed ephemeral public key +[12 bytes] AES-GCM nonce +[n bytes] ciphertext +[16 bytes] AES-GCM authentication tag +``` + +**Implementations:** + +| Platform | Backend | Source | +|----------|---------|-------| +| macOS (CGO) | Apple Secure Enclave (Security.framework) | `enclave_darwin.go` | +| Linux TEE | TDX, SNP, Nitro, or stub | `internal/tee/` | +| Other | `ErrNotSupported` | `enclave_stub.go` | + +--- + +## 4. API and Protocol Definition + +### 4.1 x402 Payment Protocol + +#### 4.1.1 Request Flow + +```mermaid +sequenceDiagram + participant Client + participant Traefik + participant Verifier as x402-verifier + participant Facilitator + participant Upstream + + Client->>Traefik: GET /services/myapi/data + Traefik->>Verifier: ForwardAuth (X-Forwarded-Uri: /services/myapi/data) + Verifier->>Verifier: Match route -> price $0.001 + Verifier-->>Traefik: 402 PaymentRequired + Traefik-->>Client: 402 + PaymentRequirements JSON + + Note over Client: Sign ERC-3009 TransferWithAuthorization + + Client->>Traefik: GET /services/myapi/data
X-PAYMENT: base64(PaymentPayload) + Traefik->>Verifier: ForwardAuth + X-PAYMENT + Verifier->>Facilitator: POST /verify (PaymentPayload) + Facilitator-->>Verifier: {valid: true} + Verifier-->>Traefik: 200 OK + Authorization header + Traefik->>Upstream: GET /data (Authorization: Bearer sk-...) + Upstream-->>Client: 200 OK + response +``` + +#### 4.1.2 PaymentRequired Response (402) + +```json +{ + "x402Version": 1, + "accepts": [ + { + "scheme": "exact", + "network": "eip155:84532", + "maxAmountRequired": "1000", + "resource": "https://seller.example.com/services/myapi/data", + "asset": "0x036CbD53842c5426634e7929541eC2318f3dCF7e", + "payTo": "0x...", + "maxTimeoutSeconds": 300 + } + ] +} +``` + +#### 4.1.3 PaymentPayload (X-PAYMENT header) + +```json +{ + "x402Version": 1, + "scheme": "exact", + "network": "eip155:84532", + "payload": { + "signature": "0x...", + "authorization": { + "from": "0x...", + "to": "0x...", + "value": "1000", + "validAfter": "0", + "validBefore": "115792089237316195423570985008687907853269984665640564039457584007913129639935", + "nonce": "0x..." 
+ } + } +} +``` + +### 4.2 CLI Command Tree + +``` +obol +├── stack +│ ├── init [--force] [--backend k3d|k3s] +│ ├── up +│ ├── down +│ └── purge [--force] +├── agent +│ └── init +├── network +│ ├── list +│ ├── install [--id ] [flags] +│ ├── add [--allow-writes] [--endpoint ] +│ ├── remove +│ ├── status +│ ├── sync +│ └── delete +├── sell +│ ├── inference --model [--price|--per-mtok] [--vm] +│ ├── http --wallet --chain [--price|--per-request|--per-mtok] +│ │ --upstream --port --namespace [--health-path ] +│ ├── list +│ ├── status +│ ├── stop +│ ├── delete +│ ├── pricing --wallet --chain +│ └── register --name --private-key-file +├── openclaw +│ ├── onboard +│ ├── setup +│ ├── sync +│ ├── list +│ ├── delete +│ ├── dashboard +│ ├── cli +│ ├── token +│ └── skills +├── model +│ ├── setup [--provider ] [custom --name --endpoint --model] +│ └── status +├── app +│ ├── install [--id ] +│ ├── sync +│ ├── list +│ └── delete +├── tunnel +│ ├── status +│ ├── login [--hostname ] +│ ├── provision +│ ├── restart +│ └── logs [--follow] +├── kubectl (passthrough, auto KUBECONFIG) +├── helm (passthrough, auto KUBECONFIG) +├── helmfile (passthrough, auto KUBECONFIG) +├── k9s (passthrough, auto KUBECONFIG) +├── update +├── upgrade +└── version +``` + +--- + +## 5. 
Data Model + +### 5.1 Configuration Files + +| File | Location | Format | Purpose | +|------|----------|--------|---------| +| `.stack-id` | `$CONFIG_DIR/` | Plain text | Cluster petname identifier | +| `.stack-backend` | `$CONFIG_DIR/` | Plain text | `k3d` or `k3s` | +| `kubeconfig.yaml` | `$CONFIG_DIR/` | YAML | Kubernetes API access | +| `cloudflared.json` | `$CONFIG_DIR/tunnel/` | JSON | Tunnel state (mode, hostname, IDs) | +| `defaults/` | `$CONFIG_DIR/` | Helmfile + YAML | Infrastructure deployment manifests | +| `networks///` | `$CONFIG_DIR/` | Helmfile + YAML | Per-network deployment configs | + +### 5.2 Kubernetes Resources (by namespace) + +| Namespace | Resources | +|-----------|-----------| +| `traefik` | GatewayClass, Gateway, cloudflared Deployment, tunnel-storefront (Deployment, Service, ConfigMap, HTTPRoute) | +| `llm` | LiteLLM Deployment (+ x402-buyer sidecar), `litellm-config` ConfigMap, `litellm-secrets` Secret | +| `x402` | x402-verifier Deployment, `x402-pricing` ConfigMap, `x402-secrets` Secret, ServiceMonitor | +| `openclaw-obol-agent` | OpenClaw Deployment, remote-signer Deployment, wallet Secret, RBAC (ClusterRole, ClusterRoleBinding) | +| `erpc` | eRPC Deployment, `erpc-config` ConfigMap | +| `obol-frontend` | Frontend Deployment, `obol-stack-config` ConfigMap | +| `monitoring` | Prometheus stack | +| `-` | Execution layer, consensus layer, per-network resources | +| (cluster-scoped) | ServiceOffer CRD (`obol.org`), `openclaw-monetize` ClusterRole | + +### 5.3 ServiceOffer CRD Schema + +```yaml +apiVersion: obol.org/v1alpha1 +kind: ServiceOffer +metadata: + name: my-inference + namespace: openclaw-obol-agent +spec: + type: inference # WorkloadType: inference | fine-tuning + model: + name: qwen3.5:9b + runtime: ollama + upstream: + service: litellm + namespace: llm + port: 4000 + healthPath: /health/readiness + payment: + scheme: exact # x402 payment scheme + network: base-sepolia # Human-friendly chain name + payTo: "0x..." 
# USDC recipient + maxTimeoutSeconds: 300 + price: + perRequest: "0.001" # USDC per request + perMTok: "1.00" # USDC per million tokens (phase 1: /1000) + perHour: "5.00" # USDC per compute-hour (fine-tuning) + path: /services/my-inference # URL path prefix + registration: + enabled: true + name: "My Inference Agent" + description: "Sells qwen3.5:9b inference" + image: "https://example.com/icon.png" + services: + - name: web + endpoint: "" # Auto-filled from tunnel URL + supportedTrust: + - reputation +status: + conditions: + - type: ModelReady + status: "True" + - type: UpstreamHealthy + status: "True" + - type: PaymentGateReady + status: "True" + - type: RoutePublished + status: "True" + - type: Registered + status: "True" + - type: Ready + status: "True" + endpoint: "https://stack.example.com/services/my-inference" + agentId: "42" + registrationTxHash: "0x..." +``` + +### 5.4 Wallet (Web3 V3 Keystore) + +```json +{ + "address": "aabbccdd...", + "crypto": { + "cipher": "aes-128-ctr", + "ciphertext": "...", + "cipherparams": { "iv": "..." }, + "kdf": "scrypt", + "kdfparams": { "dklen": 32, "n": 262144, "r": 8, "p": 1, "salt": "..." }, + "mac": "..." + }, + "id": "uuid", + "version": 3 +} +``` + +--- + +## 6. 
Integration Points + +### 6.1 External Services + +| Service | Protocol | Purpose | Configuration | +|---------|----------|---------|---------------| +| Cloudflare Tunnel | HTTPS/QUIC | Public internet exposure | `obol tunnel login` / auto-provisioned | +| x402 Facilitator | HTTPS POST | Payment verification + settlement | `facilitatorURL` (default: `https://facilitator.x402.rs`) | +| ChainList API | HTTPS GET | Public RPC endpoint discovery | Used by `obol network add ` | +| Ollama API | HTTP | Local LLM inference | `http://localhost:11434` (host) | +| Anthropic API | HTTPS | Cloud LLM inference | `ANTHROPIC_API_KEY` env var | +| OpenAI API | HTTPS | Cloud LLM inference | `OPENAI_API_KEY` env var | +| Base Sepolia RPC | HTTPS | ERC-8004 registration + ERC-3009 settlement | Via eRPC or direct endpoint | + +### 6.2 Internal Service Communication + +```mermaid +graph LR + subgraph "Ingress" + T[Traefik :80/:443] + end + + subgraph "Auth" + V[x402-verifier :8080] + end + + subgraph "Compute" + L[LiteLLM :4000] + B[x402-buyer :8402] + O[Ollama :11434] + end + + subgraph "Data" + E[eRPC :4000] + EL[Execution Layer :8545] + end + + subgraph "Agent" + A[OpenClaw] + RS[Remote Signer :9000] + end + + T -->|ForwardAuth| V + T -->|upstream| L + T -->|local| E + L -->|ollama/*| O + L -->|paid/*| B + B -->|x402| Internet((Internet)) + E --> EL + A --> RS + A --> L +``` + +--- + +## 7. Security Model + +### 7.1 Tunnel Exposure + +The Cloudflare tunnel is the primary attack surface. The security model ensures only intentionally public endpoints are reachable via the tunnel. 
+ +| Route | Exposure | Protection | +|-------|----------|-----------| +| `/services/*` | Public via tunnel | x402 payment gate (ForwardAuth) | +| `/.well-known/agent-registration.json` | Public via tunnel | Read-only, no sensitive data | +| `/skill.md` | Public via tunnel | Read-only service catalog | +| `/` (tunnel hostname) | Public via tunnel | Static HTML storefront | +| `/` (obol.stack) | Local only | `hostnames: ["obol.stack"]` restriction | +| `/rpc` | Local only | `hostnames: ["obol.stack"]` restriction | +| LiteLLM admin | Local only | Not exposed via any HTTPRoute | +| Prometheus | Local only | `hostnames: ["obol.stack"]` restriction | + +**Invariants (NEVER violate):** + +- Frontend and eRPC HTTPRoutes MUST have `hostnames: ["obol.stack"]`. +- Internal services MUST NOT have HTTPRoutes without hostname restrictions. +- The frontend, RPC gateway, monitoring, and LiteLLM admin MUST NOT be reachable via the tunnel. + +### 7.2 Payment Security + +| Property | Mechanism | +|----------|-----------| +| Payment integrity | EIP-712 typed signatures verified by facilitator | +| Replay protection | Random 32-byte nonces in ERC-3009 authorizations | +| Bounded spending (buyer) | Finite pool of pre-signed auths; max loss = N * price | +| Zero signer access (buyer) | Sidecar has no private key; only pre-signed vouchers | +| Facilitator HTTPS | `ValidateFacilitatorURL()` enforces HTTPS (loopback exempted) | +| Settlement verification | Facilitator verifies on-chain before confirming | + +### 7.3 Wallet Security + +| Property | Mechanism | +|----------|-----------| +| Key generation | secp256k1 via `crypto/rand` | +| Key storage | Web3 V3 keystore (scrypt KDF + AES-128-CTR) | +| Key access | Remote-signer REST API (port 9000, in-namespace only) | +| Enclave keys | Apple Secure Enclave (P-256, private key never leaves hardware) | +| TEE keys | Generated inside TEE (TDX/SNP/Nitro), bound to attestation | +| Wallet backup | `PromptBackupBeforePurge()` before destructive 
operations | + +### 7.4 Enclave / TEE Security + +| Property | macOS Secure Enclave | Linux TEE | +|----------|---------------------|-----------| +| Key generation | In-hardware (SEP) | In-enclave | +| Private key access | Never exported | Never exported | +| SIP requirement | `CheckSIP()` enforced | N/A | +| Attestation | N/A | Hardware-signed quote binding pubkey + model hash | +| Persistence | Keychain (persistent or ephemeral) | Per-enclave instance | + +### 7.5 RBAC + +The `openclaw-monetize` ClusterRole grants the OpenClaw agent CRUD access to: + +- ServiceOffers (`obol.org`) +- Middlewares (`traefik.io`) +- HTTPRoutes (`gateway.networking.k8s.io`) +- ConfigMaps, Services, Deployments (core) +- Read-only: Pods, Endpoints, logs + +Bound to ServiceAccount `openclaw` in `openclaw-obol-agent` namespace via ClusterRoleBinding. Patched by `obol agent init` via `patchMonetizeBinding()`. + +--- + +## 8. Error Handling + +### 8.1 Error Handling Strategy + +The codebase uses a layered error handling approach: + +| Layer | Strategy | +|-------|---------| +| CLI commands | Return `error` to `urfave/cli` which prints and exits non-zero | +| Non-fatal operations | Log warning via `u.Warnf()`, continue execution | +| Infrastructure deployment | Fatal: auto-cleanup via `Down()` on helmfile sync failure | +| Config hot-reload | Log error, keep previous config (verifier, buyer) | +| Network operations | `kubectl.EnsureCluster()` guard at entry points | + +### 8.2 Graceful Degradation + +| Component | Failure Mode | Behavior | +|-----------|-------------|---------| +| Ollama not running | Auto-configure skipped | LiteLLM starts without local models; user can add later | +| Cloud API key missing | Warning printed | Provider not configured; manual `obol model setup` possible | +| OpenClaw setup fails | Warning printed | User can run `obol openclaw onboard` manually | +| Tunnel not available | Warning printed | Services work locally; sell commands will start tunnel on demand | +| 
DNS resolver fails | Warning printed | `obol.stack` hostname resolution may not work; IP access still works | +| Pre-signed auths exhausted | 404 for model | Agent must pre-sign more via `buy.py` | + +### 8.3 Atomic Operations + +| Operation | Atomicity Mechanism | +|-----------|-------------------| +| Config reload (verifier) | `atomic.Pointer` swap | +| Config reload (buyer) | Mutex-guarded `Reload()` rebuilds all handlers | +| Auth consumption | Mutex-guarded pop from pool with `onConsume` callback | +| Tunnel state | File write with `0600` permissions | +| Backend switching | Destroy old backend before initializing new one | + +--- + +## 9. Performance + +### 9.1 Resource Allocation + +| Component | CPU Request | Memory Request | CPU Limit | Memory Limit | +|-----------|-----------|---------------|----------|-------------| +| Storefront httpd | 5m | 8Mi | 20m | 16Mi | +| x402-verifier | (cluster default) | (cluster default) | -- | -- | +| LiteLLM | (cluster default) | (cluster default) | -- | -- | +| x402-buyer sidecar | (cluster default) | (cluster default) | -- | -- | +| OpenClaw agent | (cluster default) | (cluster default) | -- | -- | + +### 9.2 Caching + +| Cache | TTL | Purpose | +|-------|-----|---------| +| eRPC `eth_call` | 10s (unfinalized) | Avoid redundant RPC calls | +| x402-verifier chain resolution | Permanent (per-load) | Pre-resolve all chain configs during `load()` | +| LiteLLM model routing | Permanent (until restart) | Static model_list in ConfigMap | + +### 9.3 Hot Paths + +| Path | Optimization | +|------|-------------| +| x402 ForwardAuth verify | `atomic.Pointer` for lock-free config reads; pre-resolved chain map | +| x402-buyer auth pop | Single mutex lock per Sign() call; O(1) pool pop | +| Route matching | First-match short-circuit; no regex compilation per request | +| Buyer model routing | `sync.RWMutex` for concurrent reads; rebuild only on Reload() | + +### 9.4 Known Latencies + +| Operation | Typical Latency | Notes | 
+|-----------|----------------|-------| +| ConfigMap propagation | 60-120s | k3d file watcher interval | +| Quick tunnel URL | 10-20s | Cloudflare registration after pod start | +| x402 facilitator verify | 100-500ms | Network round-trip to facilitator | +| Helmfile sync (initial) | 2-5min | Full infrastructure deployment | +| LiteLLM restart | 10-30s | Pod termination + startup | + +--- + +## 10. Testing Strategy + +### 10.1 Test Organization + +| Category | Build Tag | Location | Prerequisites | +|----------|----------|----------|---------------| +| Unit tests | (none) | `*_test.go` alongside source | `go test ./...` | +| Integration tests | `integration` | `internal/openclaw/integration_test.go` | Running cluster + Ollama + `OBOL_DEVELOPMENT=true` | +| BDD tests | `integration` | `internal/x402/bdd_integration_test.go` | Running cluster | + +### 10.2 Unit Test Coverage + +| Package | Key Test Files | Coverage Focus | +|---------|---------------|----------------| +| `internal/x402` | `config_test.go`, `verifier_test.go`, `matcher_test.go`, `validate_test.go`, `watcher_test.go` | Pricing config parsing, route matching, ForwardAuth responses, HTTPS validation | +| `internal/x402/buyer` | `signer_test.go`, `proxy_test.go` | Auth pool exhaustion, model resolution, payment attachment | +| `internal/erc8004` | `abi_test.go`, `client_test.go`, `types_test.go` | ABI encoding, registration document schema | +| `internal/schemas` | `serviceoffer_test.go`, `payment_test.go` | CRD field validation, price approximation | +| `internal/network` | `erpc_test.go`, `chainlist_test.go`, `resolve_test.go` | ConfigMap patching, chain resolution | +| `internal/model` | `model_test.go` | Provider detection, model entry building | +| `internal/stack` | `stack_test.go`, `backend_test.go`, `backend_k3s_test.go` | Backend abstraction, port checking | +| `internal/openclaw` | `wallet_test.go`, `wallet_backup_test.go`, `overlay_test.go`, `version_test.go`, `resolve_test.go` | Keystore 
generation, version consistency, instance resolution | +| `internal/inference` | `gateway_test.go`, `store_test.go`, `client_test.go`, `enclave_middleware_test.go` | Gateway handler, deployment persistence, encryption middleware | +| `internal/tunnel` | `tunnel_test.go` | URL parsing, state management | +| `internal/embed` | `embed_crd_test.go` | CRD + RBAC validation of embedded manifests | +| `cmd/obol` | `sell_test.go` | CLI flag parsing and validation | + +### 10.3 Integration Tests + +Integration tests use `//go:build integration` and require: + +```bash +export OBOL_DEVELOPMENT=true +export OBOL_CONFIG_DIR=$(pwd)/.workspace/config +export OBOL_BIN_DIR=$(pwd)/.workspace/bin +export OBOL_DATA_DIR=$(pwd)/.workspace/data +go build -o .workspace/bin/obol ./cmd/obol +go test -tags integration -v -timeout 15m ./internal/openclaw/ +``` + +**Key integration test:** `TestIntegration_Tunnel_SellDiscoverBuySidecar_QuotaAndBalance` validates the full paid-inference commerce loop (requires `qwen3.5:9b` model): + +1. Sell inference via `obol sell http` +2. Discover service via tunnel +3. Buy inference using pre-signed auths +4. Verify quota consumption and balance + +### 10.4 BDD Tests + +Gherkin-style BDD tests in `internal/x402/features/` exercise the x402 payment flow end-to-end using `godog`: + +- Payment verification happy path +- Payment rejection (insufficient funds, wrong chain) +- Route matching edge cases +- Config hot-reload during operation + +### 10.5 Version Consistency Tests + +`TestOpenClawVersionConsistency` in `internal/openclaw/version_test.go` reads all 3 version-pinning locations and fails if they disagree. This prevents version drift between the Go binary and the shell installer. + +### 10.6 Running Tests + +```bash +# All unit tests +go test ./... 
+ +# Single test +go test -v -run 'TestMatchRoute' ./internal/x402/ + +# Integration tests (requires running cluster) +go test -tags integration -v -timeout 15m ./internal/openclaw/ + +# Full commerce loop (requires qwen3.5:9b) +go test -tags integration -v -run TestIntegration_Tunnel_SellDiscoverBuySidecar_QuotaAndBalance \ + -timeout 30m ./internal/openclaw/ + +# Check compilation only +go build ./... +``` diff --git a/docs/specs/adr/0001-local-first-k3d.md b/docs/specs/adr/0001-local-first-k3d.md new file mode 100644 index 00000000..99c10872 --- /dev/null +++ b/docs/specs/adr/0001-local-first-k3d.md @@ -0,0 +1,62 @@ +# ADR-0001: Local-First Kubernetes via k3d + +**Status:** Accepted +**Date:** 2026-03-27 + +## Context + +Obol Stack needs a reproducible local Kubernetes cluster that supports: + +- Port forwarding from the host (80, 443, 8080, 8443) for Traefik ingress. +- Docker image import for locally built images (x402-verifier, x402-buyer) during development. +- Fast startup times (under 60 seconds) for developer iteration. +- Consistent behavior across macOS and Linux. +- Access to host services (Ollama) from within the cluster. + +The main alternatives considered were: + +| Option | Pros | Cons | +|--------|------|------| +| **k3d** | Docker-based, fast startup, native image import, multi-platform, port mapping via k3d config | Requires Docker, k3s-only | +| **minikube** | Multi-driver (Docker, HyperKit, VirtualBox), wide adoption | Slower startup, heavier resource usage, image import via registry or `minikube image load` | +| **kind** | Docker-based, widely used for CI | No native port mapping (requires manual extraPortMappings), no built-in Ollama host routing | +| **bare-metal k3s** | No Docker dependency, direct host access | Requires root or systemd, harder to isolate, no image import | + +## Decision + +Use **k3d** as the default Kubernetes backend for local development and operation. 
+ +Additionally, implement a `Backend` interface (`internal/stack/backend.go`) to abstract the runtime, allowing a secondary `K3sBackend` for bare-metal deployments where Docker is unavailable (e.g., production edge nodes). + +## Rationale + +1. **Port forwarding**: k3d natively maps host ports to the k3s server in its YAML config, avoiding manual iptables or NodePort workarounds. +2. **Image import**: `k3d image import` loads locally built Docker images directly into the cluster, critical for `OBOL_DEVELOPMENT=true` builds of x402-verifier and x402-buyer. +3. **Fast startup**: k3d cluster creation completes in 10-30 seconds, compared to 60-120 seconds for minikube. +4. **Host access**: k3d provides `host.docker.internal` (macOS) and `host.k3d.internal` (Linux) for Ollama connectivity. +5. **k3s compatibility**: k3d wraps k3s, so manifests placed in `/var/lib/rancher/k3s/server/manifests/` auto-apply on startup -- used for infrastructure deployment. + +## Consequences + +### Positive + +- Reproducible single-cluster setup with a declarative k3d YAML config. +- `obol stack up` reliably creates, configures, and tears down clusters. +- Development workflow is fast: build image, import, restart pod. +- The `Backend` interface means k3s bare-metal is also supported without code duplication. + +### Negative + +- **Docker dependency**: Operators must have Docker or Podman running. This excludes minimal environments without containerization. +- **Single cluster**: One k3d cluster per config directory. Multiple stacks require separate `OBOL_CONFIG_DIR` values. +- **Port conflicts**: k3d binds host ports 80/443/8080/8443 directly; other services using these ports cause startup failure. +- **Kubeconfig port drift**: The k3d API server port can change between cluster restarts, requiring `k3d kubeconfig write` to refresh. +- **ConfigMap propagation delay**: k3d's file watcher introduces 60-120 second delays for manifest changes placed in the k3s manifests directory. 
+- **Ollama host resolution varies**: `host.docker.internal` on macOS, `host.k3d.internal` on Linux, `127.0.0.1` for k3s -- resolved at `obol stack init` time. + +## SPEC References + +- Section 2.4 -- Backend Abstraction +- Section 3.1 -- Stack Lifecycle +- Section 1.3 -- System Constraints (absolute paths, single cluster) +- Section 3.1.4 -- Ollama Host Resolution diff --git a/docs/specs/adr/0002-litellm-gateway.md b/docs/specs/adr/0002-litellm-gateway.md new file mode 100644 index 00000000..bc6475b2 --- /dev/null +++ b/docs/specs/adr/0002-litellm-gateway.md @@ -0,0 +1,62 @@ +# ADR-0002: LiteLLM as Unified LLM Gateway + +**Status:** Accepted +**Date:** 2026-03-27 + +## Context + +The OpenClaw agent and cluster services need to access LLM inference from multiple providers: + +- **Ollama** (local, no API key) for on-device models like qwen3.5:9b. +- **Anthropic** (cloud, API key) for Claude models. +- **OpenAI** (cloud, API key) for GPT models. +- **Paid remote sellers** (x402-gated) for purchased inference from other agents. + +The application layer (OpenClaw, LiteLLM overlays, downstream apps) should not need to know which provider serves a given model. A single OpenAI-compatible endpoint simplifies routing, auth, and configuration. 
+ +Alternatives considered: + +| Option | Pros | Cons | +|--------|------|------| +| **LiteLLM** | OpenAI-compatible proxy, multi-provider, ConfigMap-driven, wildcard routing | `drop_params` behavior can silently discard unsupported fields, restart required for config changes | +| **Direct provider SDKs** | No proxy overhead, full parameter control | Each consumer must handle auth + routing per provider, no unified API | +| **vLLM / llm-d** | High-performance serving, GPU scheduling | Different abstraction layer (model serving, not routing); evaluated and rejected for this role | +| **Custom proxy** | Full control | Maintenance burden, reimplements LiteLLM's model routing | + +## Decision + +Use **LiteLLM** (deployed as a Kubernetes Deployment in the `llm` namespace on port 4000) as the unified LLM gateway for all inference routing. + +## Rationale + +1. **Single API surface**: All consumers (OpenClaw agent, apps, tests) use `http://litellm.llm.svc:4000/v1` with standard OpenAI client libraries. +2. **Multi-provider routing**: LiteLLM's `model_list` supports exact names (Ollama models), wildcards (`anthropic/*`, `openai/*`), and catch-alls (`paid/*`). +3. **ConfigMap-driven**: The `litellm-config` ConfigMap and `litellm-secrets` Secret are patched by Go code (`internal/model/model.go`) without forking LiteLLM. +4. **Auto-configuration**: During `obol stack up`, `autoConfigureLLM()` detects Ollama models and cloud API keys, patches config + secret, and performs a single restart. +5. **Paid inference integration**: The static `paid/*` route forwards to the `x402-buyer` sidecar at `http://127.0.0.1:8402/v1`, keeping the LiteLLM image unmodified. +6. **Per-instance overlay**: `buildLiteLLMRoutedOverlay()` reuses the "ollama" provider slot pointing at `litellm.llm.svc:4000/v1`, enabling app-level model aliasing without additional infrastructure. + +## Consequences + +### Positive + +- Unified endpoint for all LLM access -- no provider-specific client code needed. 
+- Adding a new provider is a ConfigMap patch + Secret update + restart.
+- Paid inference works through vanilla LiteLLM with a static route to the buyer sidecar.
+- `dangerouslyDisableDeviceAuth` is enabled for Traefik-proxied access, avoiding an auth double-gate.
+
+### Negative
+
+- **`drop_params` risk**: LiteLLM silently drops parameters not supported by the target provider. This can cause subtle behavior differences between providers for the same model name.
+- **Restart required**: Config changes require a Deployment restart (10-30 second latency). There is no live-reload mechanism.
+- **Single point of failure**: All inference routes through one LiteLLM pod. Pod failure means no inference until restart.
+- **ConfigMap complexity**: The `litellm-config` ConfigMap grows with every provider and model. Patching logic in `internal/model/model.go` must handle merges carefully.
+- **Version coupling**: The LiteLLM image is pinned (v1.82.3 as of writing, for supply-chain security) and must be updated when new provider features are needed.
+
+## SPEC References
+
+- Section 3.2 -- LLM Routing
+- Section 3.2.4 -- Logic (autoConfigureLLM, paid inference routing)
+- Section 3.2.5 -- LiteLLM Config Structure
+- Section 3.5 -- Monetize Buy Side (paid/* route)
+- Section 3.6.4 -- Cloud Provider Detection
diff --git a/docs/specs/adr/0003-x402-payment-gating.md b/docs/specs/adr/0003-x402-payment-gating.md
new file mode 100644
index 00000000..f67e414c
--- /dev/null
+++ b/docs/specs/adr/0003-x402-payment-gating.md
@@ -0,0 +1,65 @@
+# ADR-0003: x402 Payment Gating for Services
+
+**Status:** Accepted
+**Date:** 2026-03-27
+
+## Context
+
+Obol Stack needs a mechanism for operators to monetize cluster services (inference, HTTP endpoints). The payment system must be:
+
+- **Permissionless**: No API key registration, no account creation, no subscription management.
+- **Per-request**: Each request is independently priced and paid for.
+- **Gasless for buyers**: Buyers should not need to pay blockchain gas for every inference request. +- **Machine-to-machine**: AI agents must be able to pay autonomously without human interaction. +- **Composable**: Payment gating should work with any HTTP service behind Traefik, not just inference. + +Alternatives considered: + +| Option | Pros | Cons | +|--------|------|------| +| **x402 (HTTP 402)** | Permissionless, per-request, gasless via ERC-3009, standard HTTP, agent-native | Facilitator dependency, USDC-only, limited chain support | +| **API keys** | Simple, widely understood | Requires user registration, key management, not agent-native | +| **Stripe/subscriptions** | Established, fiat currency | Requires merchant account, not permissionless, not agent-to-agent | +| **Lightning Network** | Per-request micropayments, mature | Bitcoin-only, requires channel management, different user base | +| **State channels** | Low latency, off-chain | Complex setup, requires both parties online, custom protocol | + +## Decision + +Use the **x402 protocol** (HTTP 402 Payment Required) with **ERC-3009** (TransferWithAuthorization) for gasless USDC micropayments, implemented via **Traefik ForwardAuth** middleware. + +## Rationale + +1. **HTTP-native**: x402 uses standard HTTP 402 status codes. Any HTTP client can discover pricing by making an unauthenticated request. Payment is attached as an `X-PAYMENT` header. +2. **Gasless for buyers**: ERC-3009 `TransferWithAuthorization` allows pre-signed USDC transfers. The buyer signs once; the facilitator settles on-chain. No gas from the buyer. +3. **Traefik ForwardAuth**: The x402-verifier runs as a ForwardAuth middleware. Every request matching a Middleware is sent to `POST /verify`. This cleanly separates payment from business logic -- the upstream service never sees payment details. +4. **Facilitator delegation**: Payment verification and settlement are delegated to a trusted facilitator (`https://facilitator.x402.rs`). 
This simplifies the verifier to a stateless proxy. +5. **Multi-chain support**: The system supports Base, Polygon, and Avalanche (mainnet + testnet). Chain configuration is per-route. +6. **Agent-native**: AI agents can programmatically discover pricing (402 response), sign payments (ERC-3009), and consume services without human intervention. + +## Consequences + +### Positive + +- Any HTTP service can be monetized by adding a ServiceOffer CR -- no code changes to the upstream. +- Agents discover pricing automatically via the 402 response. +- USDC stablecoin avoids cryptocurrency price volatility. +- The ForwardAuth pattern means payment logic is fully decoupled from service logic. +- Route-level pricing: different paths can have different prices, wallets, and chains. + +### Negative + +- **Facilitator dependency**: Payment verification requires the facilitator to be reachable. If `facilitator.x402.rs` is down, all paid requests fail. No offline fallback exists. +- **USDC-only**: Only USDC is supported as the payment asset. Other stablecoins or tokens require facilitator support. +- **Limited chain support**: Only 6 chains (3 mainnets + 3 testnets) are supported. Adding new chains requires code changes to the chain resolution logic. +- **Phase 1 pricing approximation**: `perMTok` pricing is approximated as `perMTok / 1000` (fixed 1000 tokens per request). Exact token metering is deferred to phase 2. +- **HTTPS requirement**: The facilitator URL must use HTTPS (loopback exempted for testing). This prevents local-only facilitator setups without TLS. +- **Settlement latency**: Facilitator verification adds 100-500ms per request. This is acceptable for inference but may be too slow for high-frequency API calls. 
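From the buyer's side, the request flow in the rationale reduces to two steps: parse the 402 response for PaymentRequirements, then retry with an `X-PAYMENT` header. The sketch below is a hedged Python illustration; the field names follow the PaymentRequirements shape used in the feature specs, but the `accepts` array and the base64-JSON envelope are assumptions of this sketch, not a confirmed wire format, and the addresses are placeholders.

```python
import base64
import json

def parse_requirements(status: int, body: bytes):
    """Extract PaymentRequirements from a 402 response, or None if the
    endpoint is free (mirrors the probe behavior in the feature specs)."""
    if status != 402:
        return None
    payload = json.loads(body)
    # Field names as used in the feature specs: scheme, network,
    # maxAmountRequired, payTo, asset.
    return payload["accepts"][0]

def payment_header(voucher: dict) -> dict:
    """Attach a pre-signed ERC-3009 voucher as the X-PAYMENT header.
    The base64-JSON envelope here is illustrative, not the exact format."""
    envelope = base64.b64encode(json.dumps(voucher).encode()).decode()
    return {"X-PAYMENT": envelope}

# A free endpoint (HTTP 200) yields no requirements.
assert parse_requirements(200, b"{}") is None

# A 402 response advertises pricing the agent can act on autonomously.
body = json.dumps({"accepts": [{
    "scheme": "exact",
    "network": "eip155:84532",
    "maxAmountRequired": "1000",
    "payTo": "0xSELLER",
    "asset": "0xUSDC",
}]}).encode()
req = parse_requirements(402, body)
print(req["scheme"], req["maxAmountRequired"])  # exact 1000
```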
+ +## SPEC References + +- Section 3.4 -- Monetize Sell Side +- Section 4.1 -- x402 Payment Protocol +- Section 3.4.4 -- x402-verifier (ForwardAuth) +- Section 3.4.5 -- Pricing +- Section 3.4.6 -- Supported Chains +- Section 7.2 -- Payment Security diff --git a/docs/specs/adr/0004-pre-signed-erc3009-buyer.md b/docs/specs/adr/0004-pre-signed-erc3009-buyer.md new file mode 100644 index 00000000..e173f450 --- /dev/null +++ b/docs/specs/adr/0004-pre-signed-erc3009-buyer.md @@ -0,0 +1,61 @@ +# ADR-0004: Pre-Signed ERC-3009 Voucher Pool for Buy-Side Payments + +**Status:** Accepted +**Date:** 2026-03-27 + +## Context + +The OpenClaw agent needs to purchase inference from remote x402-gated sellers. The buy-side payment mechanism must satisfy: + +- **No hot wallet in the sidecar**: The x402-buyer sidecar must never have access to a private key. A compromised sidecar should not drain the wallet. +- **Bounded spending**: The maximum possible loss must be known and capped at deployment time. +- **Low latency**: Payment attachment must not add significant overhead to each inference request. +- **Restart resilience**: Consumed vouchers must not be reused after a sidecar restart. 
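Concretely, a single pre-signed voucher carries the fields below (the exact JSON layout is illustrative and the values are placeholders; the field set matches the voucher table in the buy-side feature spec):

```json
{
  "signature": "0xSIG",
  "from": "0xBUYER",
  "to": "0xSELLER",
  "value": "1000",
  "validAfter": "0",
  "validBefore": "115792089237316195423570985008687907853269984665640564039457584007913129639935",
  "nonce": "0xRANDOM32BYTES"
}
```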
+ +Alternatives considered: + +| Option | Pros | Cons | +|--------|------|------| +| **Pre-signed ERC-3009 vouchers** | Zero signer in sidecar, bounded loss (N * price), O(1) per request | Finite pool requires replenishment, storage in ConfigMap | +| **Hot wallet in sidecar** | Sign on demand, no pool management | Compromised sidecar = drained wallet, unbounded loss | +| **Allowance (ERC-20 approve)** | Standard pattern, no pre-signing | Unbounded spending once approved, requires revocation | +| **Permit (ERC-2612)** | Gasless approval | Still requires a signer for each permit, not supported by all tokens | +| **Payment channel** | Amortized gas, high throughput | Complex setup, requires both parties online, custom protocol | + +## Decision + +Pre-sign a **bounded batch of ERC-3009 `TransferWithAuthorization`** vouchers using the agent wallet (via `buy.py`), store them in Kubernetes ConfigMaps, and have the `x402-buyer` sidecar pop one voucher per paid request. + +## Rationale + +1. **Zero signer access**: The sidecar only reads from ConfigMaps. It has no private key, no signing capability, no wallet access. The `PreSignedSigner` implements the `x402.Signer` interface by popping from a finite pool. +2. **Bounded loss**: If the sidecar is compromised or misbehaves, the maximum loss is exactly `N * price` where N is the number of pre-signed vouchers. This is decided at `buy.py buy --count N` time. +3. **O(1) per request**: Popping a voucher is a mutex-guarded array pop. No cryptographic operations at request time. No network calls for signing. +4. **Restart resilience**: The `StateStore` persists consumed nonces. On restart, the sidecar reloads the state and skips already-consumed vouchers. +5. **ConfigMap-native**: Vouchers and upstream config are standard Kubernetes ConfigMaps, managed by `buy.py` (the agent's buy skill). No custom storage backend. +6. **Separation of concerns**: The agent (`buy.py`) handles discovery, negotiation, and pre-signing. 
The sidecar handles only payment attachment and forwarding. LiteLLM routes via the static `paid/*` entry. + +## Consequences + +### Positive + +- Security posture is strong: compromised sidecar has no signing capability and bounded financial exposure. +- The sidecar is stateless except for the consumed-nonce tracker. Scaling or replacing it is trivial. +- `buy.py` can pre-sign vouchers for multiple sellers, each with different prices and chains. +- The LiteLLM configuration is static (`paid/* -> :8402`); no dynamic reconfiguration needed per seller. + +### Negative + +- **Pool exhaustion**: When all vouchers are consumed, the sidecar returns `pre-signed auth pool exhausted`. The agent must run `buy.py` again to replenish. There is no automatic replenishment. +- **ConfigMap size limits**: Kubernetes ConfigMaps have a ~1MB limit. Each voucher is ~500 bytes of JSON, so the practical limit is ~2000 vouchers per ConfigMap. Large pools may need sharding. +- **No partial spending**: Each voucher is for a fixed amount. If the seller's price changes, existing vouchers may become invalid (underpayment) or wasteful (overpayment). +- **Nonce tracking persistence**: The `StateStore` must survive restarts. If the state file is lost, there is a risk of attempting to reuse consumed nonces (which will fail on-chain, but wastes a request). +- **Double-spend prevention is on-chain**: The ERC-3009 contract itself prevents double-spend. If two sidecars share the same pool, only the first submission of each nonce succeeds. 
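The pool mechanics described above reduce to a mutex-guarded pop with consumed-nonce tracking. This is a hedged Python sketch of the idea only: the real x402-buyer sidecar is not implemented this way, and its `StateStore` persistence layout is not shown here.

```python
import threading

class VoucherPool:
    """Mutex-guarded voucher pool with consumed-nonce tracking.

    Illustrative sketch: the real sidecar persists consumed nonces
    via its StateStore so restarts never reuse a spent voucher.
    """

    def __init__(self, vouchers, consumed_nonces=()):
        self._lock = threading.Lock()
        self._consumed = set(consumed_nonces)
        # On (re)start, skip vouchers whose nonces were already consumed.
        self._pool = [v for v in vouchers if v["nonce"] not in self._consumed]

    def pop(self):
        with self._lock:
            if not self._pool:
                raise RuntimeError("pre-signed auth pool exhausted")
            voucher = self._pool.pop()
            self._consumed.add(voucher["nonce"])  # persisted by the StateStore
            return voucher

    def remaining(self):
        with self._lock:
            return len(self._pool)

# Restart scenario from the feature spec: 10 signed, 3 already consumed.
vouchers = [{"nonce": f"0x{i:02x}", "value": "1000"} for i in range(10)]
pool = VoucherPool(vouchers, consumed_nonces=["0x00", "0x01", "0x02"])
print(pool.remaining())  # 7
```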
+ +## SPEC References + +- Section 3.5 -- Monetize Buy Side +- Section 3.5.3 -- Architecture (zero signer access, bounded spending) +- Section 3.5.4 -- Configuration (ConfigMap structure) +- Section 3.5.7 -- Error States (pool exhaustion) +- Section 7.2 -- Payment Security (bounded spending, replay protection) diff --git a/docs/specs/adr/0005-traefik-gateway-api.md b/docs/specs/adr/0005-traefik-gateway-api.md new file mode 100644 index 00000000..1e6f2b69 --- /dev/null +++ b/docs/specs/adr/0005-traefik-gateway-api.md @@ -0,0 +1,62 @@ +# ADR-0005: Traefik with Kubernetes Gateway API + +**Status:** Accepted +**Date:** 2026-03-27 + +## Context + +Obol Stack requires an ingress layer that supports: + +- **Per-route middleware**: x402 ForwardAuth must apply only to `/services/*` routes, not to all traffic. +- **Hostname-based access control**: Internal services (frontend, eRPC, monitoring) must be restricted to `obol.stack` hostname, while public routes (x402-gated services, discovery endpoints) must be accessible via the Cloudflare tunnel hostname. +- **Dynamic route creation**: The monetize reconciler creates HTTPRoutes programmatically when ServiceOffers reach the RoutePublished stage. +- **Standard CRDs**: Routes should be managed as Kubernetes resources with ownerReferences for automatic garbage collection. 
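In Gateway API terms, these requirements look roughly like the two resources below. This is a hedged sketch: the resource names and the verifier Service address are illustrative, and the mechanism that attaches the Middleware to a specific paid route is elided.

```yaml
# Traefik-specific ForwardAuth middleware pointing at the x402-verifier.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: x402-verify
  namespace: traefik
spec:
  forwardAuth:
    address: http://x402-verifier.traefik.svc/verify
---
# Internal route: hostname-restricted so tunnel traffic can never match it.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: frontend
  namespace: traefik
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: traefik
  hostnames: ["obol.stack"]
  rules:
    - matches:
        - path: { type: PathPrefix, value: / }
      backendRefs:
        - name: frontend
          port: 80
```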
+ +Alternatives considered: + +| Option | Pros | Cons | +|--------|------|------| +| **Traefik + Gateway API** | Per-route middleware via Middleware CRD, hostname filtering on HTTPRoute, standard K8s Gateway API CRDs, built into k3s | Traefik-specific Middleware CRD (`traefik.io`), newer API surface | +| **Traefik + Ingress** | Simple, widely supported | No per-route middleware (annotations are per-Ingress), hostname restrictions are less granular | +| **Nginx Ingress** | Mature, widely deployed | No native ForwardAuth per route (requires custom annotations), no Gateway API support in standard controller | +| **Istio service mesh** | Full mTLS, advanced routing | Heavy resource footprint, complex for a local-first stack, overkill for HTTP routing | +| **Envoy Gateway** | Gateway API native | Less mature, no built-in ForwardAuth equivalent, additional deployment | + +## Decision + +Use **Traefik** as the cluster ingress controller with the **Kubernetes Gateway API** (GatewayClass, Gateway, HTTPRoute) for routing, combined with Traefik-specific **Middleware** CRDs (`traefik.io`) for ForwardAuth. + +## Rationale + +1. **Built into k3s**: Traefik is the default ingress controller in k3s/k3d. No additional installation or configuration needed. +2. **Gateway API HTTPRoute**: The `HTTPRoute` CRD supports `hostnames` filtering natively. Setting `hostnames: ["obol.stack"]` on internal routes ensures they are never matched by tunnel traffic (which arrives with the tunnel hostname). +3. **ForwardAuth Middleware**: Traefik's `Middleware` CRD (`traefik.io/v1alpha1`) supports `forwardAuth` configuration. The x402-verifier is referenced as a ForwardAuth target on per-route HTTPRoutes, so only `/services/*` traffic is payment-gated. +4. **OwnerReferences**: HTTPRoutes and Middlewares created by the monetize reconciler set ownerReferences to the ServiceOffer CR. Deleting a ServiceOffer cascades deletion to all routing resources. +5. 
**Single Gateway**: One `Gateway` resource (`traefik-gateway` in `traefik` namespace) handles all HTTP/HTTPS traffic. Routes reference it via `parentRefs`. +6. **Security by default**: The hostname restriction pattern makes it structurally impossible to accidentally expose internal services via the tunnel. Adding a new internal service requires explicitly setting `hostnames: ["obol.stack"]`. + +## Consequences + +### Positive + +- Clean separation between local-only routes (hostname-restricted) and public routes (no hostname restriction). +- The reconciler creates standard Kubernetes resources (HTTPRoute, Middleware) that are visible via `kubectl` and benefit from RBAC. +- ForwardAuth is applied per-route, not globally. Free routes (health, discovery) bypass the verifier entirely. +- Automatic garbage collection via ownerReferences prevents orphaned routes when ServiceOffers are deleted. +- The routing architecture is auditable: `kubectl get httproutes -A` shows all routes with their hostname restrictions. + +### Negative + +- **Traefik-specific Middleware**: The `Middleware` CRD is not part of the standard Gateway API. This couples the stack to Traefik. Migrating to another Gateway API controller would require replacing ForwardAuth with a different mechanism. +- **ExternalName incompatibility**: Traefik's Gateway API implementation does not support `ExternalName` Services. All upstreams must use `ClusterIP` + `Endpoints`, which required workarounds for cross-namespace routing. +- **GatewayClass singleton**: Only one `GatewayClass` (`traefik`) exists. Multi-tenant scenarios with different ingress controllers are not supported. +- **No mTLS**: Traefik in this configuration does not provide mutual TLS between services. Inter-service communication within the cluster is unencrypted (acceptable for a local-first stack). +- **Hostname discipline required**: Developers must remember to add `hostnames: ["obol.stack"]` to every internal HTTPRoute. 
The SPEC and CLAUDE.md document this as a security invariant, and code review must enforce it. + +## SPEC References + +- Section 2.2 -- Routing Architecture +- Section 3.4.4 -- x402-verifier (ForwardAuth) +- Section 7.1 -- Tunnel Exposure (security model, hostname restrictions) +- Section 5.2 -- Kubernetes Resources (traefik namespace) +- Section 7.5 -- RBAC (Middleware CRD access) diff --git a/docs/specs/adr/0006-erc8004-identity.md b/docs/specs/adr/0006-erc8004-identity.md new file mode 100644 index 00000000..cdee0afa --- /dev/null +++ b/docs/specs/adr/0006-erc8004-identity.md @@ -0,0 +1,67 @@ +# ADR-0006: ERC-8004 NFT-Based Identity Registry + +**Status:** Accepted +**Date:** 2026-03-27 + +## Context + +AI agents deployed via Obol Stack need a decentralized identity mechanism that supports: + +- **On-chain discoverability**: Other agents and users should be able to find and verify an agent's identity using public blockchain data. +- **Metadata storage**: The identity should carry structured metadata (name, description, services, trust mechanisms) that is machine-readable. +- **Ownership and control**: The agent operator must control the identity, with the ability to update metadata and transfer ownership. +- **Integration with x402**: The identity should declare x402 payment support so buyers know the agent accepts micropayments. +- **Graceful degradation**: Registration should work even without on-chain funds, falling back to off-chain-only discovery. 
+ +Alternatives considered: + +| Option | Pros | Cons | +|--------|------|------| +| **ERC-8004 Identity Registry** | NFT-based (ERC-721), on-chain metadata, purpose-built for agents, `.well-known` discovery | Base Sepolia deployment, NFT mint cost (gas), newer standard | +| **ENS (Ethereum Name Service)** | Established, human-readable names | Ethereum mainnet gas costs, annual renewal, no structured agent metadata | +| **DID (Decentralized Identifiers)** | W3C standard, multi-chain | No single registry, resolution complexity, no native NFT ownership | +| **Custom registry contract** | Full control over schema | Maintenance burden, no ecosystem adoption, reinvents the wheel | +| **DNS TXT records** | Simple, widely supported | Centralized, no ownership proof, no structured metadata | + +## Decision + +Use **ERC-8004** (`IdentityRegistryUpgradeable`, an ERC-721 contract) on **Base Sepolia** (testnet) and **Base Mainnet** for on-chain agent identity registration, combined with a `.well-known/agent-registration.json` endpoint for HTTP-based discovery. + +## Rationale + +1. **Purpose-built for agents**: ERC-8004 defines a standard schema for agent identity with metadata, services, trust mechanisms, and x402 support declaration. It is designed for the agent economy, not adapted from another use case. +2. **NFT ownership**: Each agent gets an ERC-721 token. The token holder controls the identity. This integrates naturally with wallet-based operations (the same wallet that receives x402 payments owns the identity). +3. **On-chain + off-chain**: The on-chain registration stores the `agentURI` pointing to `/.well-known/agent-registration.json`. The JSON document contains the full metadata. This hybrid approach keeps gas costs low while providing rich metadata. +4. **Base L2**: Deploying on Base (an Ethereum L2) keeps gas costs low compared to Ethereum mainnet. Base Sepolia is used for testnet development. +5. 
**Graceful degradation**: If the wallet lacks ETH for gas, the system falls back to `OffChainOnly` mode. The `.well-known` endpoint is still served and the agent is discoverable via HTTP, but no on-chain record exists. When funded, the agent can upgrade to full on-chain registration. +6. **`.well-known` convention**: The `/.well-known/agent-registration.json` endpoint follows established web conventions (RFC 8615). Any HTTP client can discover the agent's capabilities without blockchain access. + +## Consequences + +### Positive + +- Agents are discoverable both on-chain (via ERC-8004 registry queries) and off-chain (via HTTP `.well-known`). +- The identity is controlled by the operator's wallet -- no centralized authority can revoke it. +- The `AgentRegistration` JSON schema includes `x402Support: true`, enabling automated buyer discovery. +- The `services` array supports multiple endpoint types (web, A2A, MCP, OASF), making the identity extensible. +- The `supportedTrust` array declares trust mechanisms (reputation, crypto-economic, tee-attestation), enabling trust-aware agent interactions. +- OffChainOnly degradation means the monetize flow is never blocked by lack of gas funds. + +### Negative + +- **NFT mint cost**: Registering an agent requires ETH for gas on Base Sepolia/Mainnet. While cheap on L2, it is not free. +- **Base chain dependency**: The identity is tied to the Base network. Agents on other chains would need bridge or multi-chain registration (not currently supported). +- **Contract upgrade risk**: The registry uses `IdentityRegistryUpgradeable`. A malicious or buggy upgrade could affect all registered agents. +- **Newer standard**: ERC-8004 has less ecosystem adoption than ENS or DIDs. Tooling and indexer support is limited. +- **Registration latency**: Minting an NFT requires waiting for transaction confirmation (10-30 seconds on Base). The reconciler handles this asynchronously. 
+- **Metadata not on-chain by default**: The bulk of the metadata lives at the `.well-known` HTTP endpoint, not on-chain. If the agent goes offline, only the `agentURI` remains on-chain, and the full metadata becomes unavailable. + +## SPEC References + +- Section 3.8 -- ERC-8004 Identity +- Section 3.8.2 -- Contract (addresses) +- Section 3.8.3 -- Client Operations (Register, SetMetadata, GetMetadata) +- Section 3.8.4 -- Agent Registration Document (JSON schema) +- Section 3.8.5 -- Error States +- Section 3.4.2 -- Sell-Side Flow (Stage 5: Registered) +- Section 7.1 -- Tunnel Exposure (/.well-known public route) diff --git a/docs/specs/features/buy_payments.feature b/docs/specs/features/buy_payments.feature new file mode 100644 index 00000000..29b552be --- /dev/null +++ b/docs/specs/features/buy_payments.feature @@ -0,0 +1,152 @@ +# References: +# SPEC.md Section 3.5 — Monetize Buy Side +# SPEC.md Section 4.1 — x402 Payment Protocol +# SPEC.md Section 7.2 — Payment Security + +Feature: Buy-Side Payments + As an AI agent + I want to purchase inference from remote x402-gated sellers using pre-signed vouchers + So that I can access paid models without exposing a hot wallet + + Background: + Given the cluster is running + And the x402-buyer sidecar is running in the "litellm" Deployment + And a remote seller is available at "https://seller.example.com/services/qwen" + And the seller prices inference at "0.001" USDC per request on "base-sepolia" + + # ------------------------------------------------------------------- + # Probe and discovery + # ------------------------------------------------------------------- + + Scenario: Probe discovers seller pricing via 402 response + When the agent runs "buy.py probe https://seller.example.com/services/qwen" + Then the probe sends a request to the seller endpoint + And the seller responds with HTTP 402 and PaymentRequirements + And the probe extracts: + | field | value | + | scheme | exact | + | network | eip155:84532 | + | 
maxAmountRequired | 1000 | + | payTo | 0xSELLER | + | asset | 0x036CbD... | + And the agent receives the pricing information + + Scenario: Probe handles non-402 seller response + Given the seller endpoint responds with HTTP 200 (no payment required) + When the agent runs "buy.py probe https://seller.example.com/services/free" + Then the probe reports the endpoint does not require payment + + # ------------------------------------------------------------------- + # Pre-signed ERC-3009 vouchers + # ------------------------------------------------------------------- + + Scenario: Pre-signed ERC-3009 vouchers stored in ConfigMap + When the agent runs "buy.py buy --count 10 --seller https://seller.example.com/services/qwen" + Then 10 ERC-3009 TransferWithAuthorization vouchers are pre-signed + And the vouchers are stored in the "x402-buyer-auths" ConfigMap in "llm" namespace + And each voucher contains: + | field | description | + | signature | EIP-712 typed signature | + | from | buyer wallet address | + | to | seller payTo address | + | value | price per request in base units | + | validAfter | 0 (immediately valid) | + | validBefore | max uint256 (no expiry) | + | nonce | unique random 32-byte nonce | + And the "x402-buyer-config" ConfigMap contains the upstream mapping for the seller + + Scenario: Buyer config maps model to upstream + Given vouchers have been pre-signed for seller "seller-qwen" + When the "x402-buyer-config" ConfigMap is inspected + Then it contains an upstream entry: + | field | value | + | url | https://seller.example.com/services/qwen | + | remoteModel | qwen3.5:9b | + | network | base-sepolia | + | payTo | 0xSELLER | + + # ------------------------------------------------------------------- + # Paid request flow + # ------------------------------------------------------------------- + + Scenario: Paid request consumes one voucher and forwards to seller + Given the buyer has 5 pre-signed vouchers for upstream "seller-qwen" + When the agent sends a 
chat completion request for model "paid/qwen3.5:9b" + Then LiteLLM routes the request to the x402-buyer sidecar at ":8402" + And the sidecar strips the "paid/" prefix to resolve model "qwen3.5:9b" + And the sidecar forwards the request to the seller + And the seller responds with HTTP 402 + And the sidecar pops one voucher from the pool + And the sidecar retries the request with the X-PAYMENT header + And the seller responds with HTTP 200 and the inference result + And the remaining voucher count is 4 + + Scenario: Paid request with openai prefix is resolved correctly + Given the buyer has vouchers for upstream "seller-qwen" + When a request arrives for model "paid/openai/qwen3.5:9b" + Then the sidecar strips both "paid/" and "openai/" prefixes + And resolves to model "qwen3.5:9b" + And routes to the correct upstream + + Scenario: Voucher consumption is atomic + Given the buyer has 1 pre-signed voucher for upstream "seller-qwen" + When two concurrent requests arrive for model "paid/qwen3.5:9b" + Then exactly one request consumes the voucher + And the other request receives an error indicating pool exhaustion + + # ------------------------------------------------------------------- + # Voucher pool exhaustion + # ------------------------------------------------------------------- + + Scenario: Voucher pool exhaustion returns error + Given the buyer has 0 pre-signed vouchers for upstream "seller-qwen" + When a request arrives for model "paid/qwen3.5:9b" + Then the sidecar returns an error: "pre-signed auth pool exhausted" + And no request is forwarded to the seller + + Scenario: No purchased upstream mapped returns error + Given no buyer config exists for model "paid/unknown-model" + When a request arrives for model "paid/unknown-model" + Then the sidecar returns an error: "no purchased upstream mapped" + + # ------------------------------------------------------------------- + # Sidecar status and observability + # 
------------------------------------------------------------------- + + Scenario: Sidecar /status endpoint reports remaining vouchers + Given the buyer started with 10 vouchers for "seller-qwen" + And 3 vouchers have been consumed + When I send a GET request to the sidecar at "/status" + Then the response is JSON with: + | upstream | remaining | spent | + | seller-qwen | 7 | 3 | + + Scenario: Sidecar /healthz returns liveness status + When I send a GET request to the sidecar at "/healthz" + Then the response is HTTP 200 + + Scenario: Sidecar /metrics exposes Prometheus metrics + When I send a GET request to the sidecar at "/metrics" + Then the response contains Prometheus-format metrics + And a PodMonitor in the "llm" namespace scrapes the sidecar + + # ------------------------------------------------------------------- + # State persistence across restarts + # ------------------------------------------------------------------- + + Scenario: Consumed nonces survive sidecar restart + Given the buyer has consumed 3 vouchers with specific nonces + When the x402-buyer sidecar is restarted + Then the StateStore reloads consumed nonces + And the previously consumed vouchers are not reused + And the remaining voucher count reflects prior consumption + + # ------------------------------------------------------------------- + # Security properties + # ------------------------------------------------------------------- + + Scenario: Sidecar has zero signer access + Given the x402-buyer sidecar is running + Then the sidecar container has no private key mounted + And the sidecar can only use pre-signed authorizations from ConfigMaps + And maximum loss is bounded to N * price where N is the voucher count diff --git a/docs/specs/features/erc8004_identity.feature b/docs/specs/features/erc8004_identity.feature new file mode 100644 index 00000000..65015f78 --- /dev/null +++ b/docs/specs/features/erc8004_identity.feature @@ -0,0 +1,149 @@ +# References: +# SPEC.md Section 3.8 — 
ERC-8004 Identity +# SPEC.md Section 3.4.2 — Sell-Side Flow (Stage 5: Registered) +# SPEC.md Section 7.1 — Tunnel Exposure (/.well-known) +# SPEC.md Section 5.3 — ServiceOffer CRD Schema (registration spec) + +Feature: ERC-8004 Identity + As an AI agent operator + I want to register my agent on-chain using the ERC-8004 Identity Registry + So that other agents and users can discover and verify my agent's identity + + Background: + Given the cluster is running + And a wallet is available with a private key + And the Base Sepolia RPC endpoint is reachable + + # ------------------------------------------------------------------- + # Agent registration on Base Sepolia + # ------------------------------------------------------------------- + + Scenario: Agent registers on Base Sepolia Identity Registry + Given the wallet has sufficient ETH for gas on Base Sepolia + When I run "obol sell register --name my-agent --private-key-file /path/to/keyfile" + Then a Register transaction is submitted to the Identity Registry at "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D" + And the transaction mints an ERC-721 NFT for the agent + And the returned agentId is the minted token ID + And the agent URI is set to the tunnel URL "/.well-known/agent-registration.json" + + Scenario: Registration during sell-side reconciliation (Stage 5) + Given a ServiceOffer CR "myapi" has reached stage 4 (RoutePublished) + And registration is enabled with name "My Inference Agent" + And the tunnel URL is "https://stack.example.com" + When the reconciler evaluates stage 5 + Then the agent is registered on Base Sepolia + And the ServiceOffer status is updated with: + | field | value | + | agentId | | + | registrationTxHash | | + And the condition "Registered" is set to "True" + + Scenario: Registration submits correct agent metadata + When the agent is registered with: + | field | value | + | name | My Inference Agent | + | description | Sells qwen3.5:9b inference | + | image | https://example.com/icon.png | 
+ Then the registration transaction includes the metadata + And the agent URI points to the /.well-known endpoint + + # ------------------------------------------------------------------- + # Registration JSON at /.well-known + # ------------------------------------------------------------------- + + Scenario: Registration JSON served at /.well-known endpoint + Given an agent has been registered with agentId "42" + And the tunnel is active at "https://stack.example.com" + When a GET request is made to "https://stack.example.com/.well-known/agent-registration.json" + Then the response is HTTP 200 with Content-Type "application/json" + And the JSON body conforms to the AgentRegistration schema: + | field | value | + | type | https://eips.ethereum.org/EIPS/eip-8004#registration-v1 | + | name | My Inference Agent | + | x402Support | true | + | active | true | + And the "registrations" array contains: + | agentId | agentRegistry | + | 42 | 0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D | + And the "services" array contains at least one service endpoint + + Scenario: Registration JSON includes supported trust mechanisms + Given the ServiceOffer has supportedTrust ["reputation"] + When the registration JSON is served + Then the "supportedTrust" array contains "reputation" + + Scenario: Registration JSON httpd Deployment is minimal + Given the registration JSON has been published + When I inspect the httpd Deployment in "traefik" namespace + Then it uses a busybox image serving the ConfigMap content + And an HTTPRoute routes "/.well-known/agent-registration.json" to the httpd Service + + # ------------------------------------------------------------------- + # Metadata update + # ------------------------------------------------------------------- + + Scenario: Metadata update via SetMetadata + Given an agent is registered with agentId "42" + When SetMetadata is called with: + | key | value | + | description | Updated inference service | + | version | 2.0 | + Then a SetMetadata 
transaction is submitted to the registry + And the on-chain metadata is updated for agentId "42" + + Scenario: Agent URI update via SetAgentURI + Given an agent is registered with agentId "42" + And the tunnel hostname changes to "new-stack.example.com" + When SetAgentURI is called with the new URI + Then the on-chain agent URI is updated to "https://new-stack.example.com/.well-known/agent-registration.json" + + Scenario: Read metadata from registry + Given an agent is registered with agentId "42" and metadata key "description" = "Inference service" + When GetMetadata is called for agentId "42" and key "description" + Then the returned value is "Inference service" + + Scenario: Read token URI from registry + Given an agent is registered with agentId "42" + When TokenURI is called for agentId "42" + Then the returned URI is the agent's metadata endpoint + + # ------------------------------------------------------------------- + # Degraded mode without ETH + # ------------------------------------------------------------------- + + Scenario: Registration degrades to OffChainOnly without ETH + Given the wallet has zero ETH on Base Sepolia + When the reconciler evaluates registration (stage 5) + Then no on-chain Register transaction is submitted + And the /.well-known/agent-registration.json is still created and served + But the "registrations" array in the JSON is empty + And the condition "Registered" is set to "True" with reason "OffChainOnly" + + Scenario: OffChainOnly agent upgrades to on-chain after funding + Given the agent was registered in OffChainOnly mode + And the wallet has been funded with ETH + When the reconciler re-evaluates registration + Then the on-chain Register transaction is submitted + And the "registrations" array is populated with the agentId + And the condition "Registered" reason is updated to "OnChain" + + # ------------------------------------------------------------------- + # Error states + # 
------------------------------------------------------------------- + + Scenario: Registration fails when RPC is unreachable + Given the Base Sepolia RPC endpoint is unreachable + When the reconciler evaluates registration + Then the error "erc8004: dial" is recorded + And the condition "Registered" is set to "False" + And the reconciler retries on the next loop + + Scenario: Registration fails when transaction is not mined + Given the RPC endpoint is reachable but the network is congested + When the Register transaction is submitted + Then the error "erc8004: wait mined" may occur + And the reconciler retries on the next loop + + Scenario: Registration uses correct contract address per chain + When registering on Base Sepolia + Then the contract address "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D" is used diff --git a/docs/specs/features/llm_routing.feature b/docs/specs/features/llm_routing.feature new file mode 100644 index 00000000..b71c9218 --- /dev/null +++ b/docs/specs/features/llm_routing.feature @@ -0,0 +1,147 @@ +# References: +# SPEC.md Section 3.2 — LLM Routing +# SPEC.md Section 3.6.4 — Cloud Provider Detection +# SPEC.md Section 3.5 — Monetize Buy Side (paid inference routing) + +Feature: LLM Routing + As an operator + I want the LiteLLM gateway to auto-discover and route to all available LLM providers + So that the OpenClaw agent can use local, cloud, and paid remote models through a single endpoint + + Background: + Given the cluster is running + And the LiteLLM Deployment exists in the "llm" namespace + + # ------------------------------------------------------------------- + # Auto-detection of Ollama models + # ------------------------------------------------------------------- + + Scenario: Auto-detect Ollama models during stack up + Given Ollama is running on the host with models: + | model | + | qwen3.5:9b | + | llama3.2:3b | + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains entries for "qwen3.5:9b" and 
"llama3.2:3b" + And each Ollama model entry has provider "ollama" and api_base pointing to the Ollama service + And the LiteLLM Deployment is restarted exactly once + + Scenario: Auto-configure skips Ollama when not running + Given Ollama is not running on the host + When "obol stack up" runs autoConfigureLLM + Then no Ollama model entries are added to "litellm-config" + And a warning is logged: "Ollama not available" + And the stack up continues without failure + + Scenario: Auto-configure updates models on subsequent stack up + Given the cluster was previously started with Ollama model "qwen3.5:9b" + And Ollama now has models "qwen3.5:9b" and "deepseek-r1:7b" + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains entries for both models + And the LiteLLM Deployment is restarted + + # ------------------------------------------------------------------- + # Cloud provider detection from environment variables + # ------------------------------------------------------------------- + + Scenario: Detect Anthropic provider from ANTHROPIC_API_KEY + Given the environment variable "ANTHROPIC_API_KEY" is set + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains a wildcard entry "anthropic/*" + And the "litellm-secrets" Secret contains the Anthropic API key + + Scenario: Detect Anthropic provider from CLAUDE_CODE_OAUTH_TOKEN + Given the environment variable "CLAUDE_CODE_OAUTH_TOKEN" is set + And "ANTHROPIC_API_KEY" is not set + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains a wildcard entry "anthropic/*" + And the "litellm-secrets" Secret contains the OAuth token as the Anthropic key + + Scenario: Detect OpenAI provider from OPENAI_API_KEY + Given the environment variable "OPENAI_API_KEY" is set + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains a wildcard entry "openai/*" + And the "litellm-secrets" Secret contains the
OpenAI API key + + Scenario: Detect cloud provider from OpenClaw agent model preference + Given the file "~/.openclaw/openclaw.json" specifies agent model "anthropic/claude-sonnet-4-6" + And the environment variable "ANTHROPIC_API_KEY" is set + When "obol stack up" runs autoConfigureLLM + Then the Anthropic provider is auto-configured + + Scenario: No cloud provider configured when API keys are absent + Given no cloud provider API keys are set in the environment + When "obol stack up" runs autoConfigureLLM + Then no cloud provider entries are added to "litellm-config" + And a warning is logged for each missing provider + And the stack up continues without failure + + # ------------------------------------------------------------------- + # Manual model setup + # ------------------------------------------------------------------- + + Scenario: Manual provider setup via obol model setup + Given the cluster is running + When I run "obol model setup --provider anthropic" + And I provide the API key + Then the "litellm-config" ConfigMap is patched with the Anthropic wildcard entry + And the "litellm-secrets" Secret is updated + And the LiteLLM Deployment is restarted + + # ------------------------------------------------------------------- + # Custom endpoint validation + # ------------------------------------------------------------------- + + Scenario: Custom endpoint passes reachability test + Given a custom inference endpoint is running at "http://myhost:8080/v1" + When I run "obol model setup custom --name my-model --endpoint http://myhost:8080/v1 --model gpt-4" + Then the endpoint reachability test passes + And the "litellm-config" ConfigMap contains the custom model entry + And the LiteLLM Deployment is restarted + + Scenario: Custom endpoint fails reachability test + Given no service is running at "http://unreachable:8080/v1" + When I run "obol model setup custom --name my-model --endpoint http://unreachable:8080/v1 --model gpt-4" + Then the command fails with a 
reachability error + And the "litellm-config" ConfigMap is not modified + + # ------------------------------------------------------------------- + # Model ranking + # ------------------------------------------------------------------- + + Scenario: Cloud providers are preferred over local Ollama + Given Ollama is running with model "qwen3.5:9b" + And the environment variable "ANTHROPIC_API_KEY" is set + When "obol stack up" runs autoConfigureLLM + Then the "litellm-config" ConfigMap contains entries for both providers + And cloud provider entries appear before Ollama entries in the model list + + # ------------------------------------------------------------------- + # Paid inference routing through buyer sidecar + # ------------------------------------------------------------------- + + Scenario: Paid inference routes through buyer sidecar + Given the "litellm-config" ConfigMap contains the permanent "paid/*" entry + When a request arrives for model "paid/qwen3.5:9b" + Then LiteLLM routes the request to "http://127.0.0.1:8402/v1" + And the x402-buyer sidecar handles payment attachment + + Scenario: LiteLLM config contains permanent paid catch-all + When the "litellm-config" ConfigMap is loaded + Then it contains a model entry with name "paid/*" + And the entry has provider "openai" and api_base "http://127.0.0.1:8402/v1" + + # ------------------------------------------------------------------- + # Error states + # ------------------------------------------------------------------- + + Scenario: Model setup fails when cluster is not running + Given the cluster is not running + When I run "obol model setup --provider anthropic" + Then the command fails with "cluster not running" + + Scenario: Model setup fails with empty model list + Given the cluster is running + And Ollama has no models loaded + When I run "obol model setup --provider ollama" + Then the command fails with "no models to configure" diff --git a/docs/specs/features/network_rpc.feature 
b/docs/specs/features/network_rpc.feature new file mode 100644 index 00000000..77d9047d --- /dev/null +++ b/docs/specs/features/network_rpc.feature @@ -0,0 +1,149 @@ +# References: +# SPEC.md Section 3.3 — Network / RPC Gateway +# SPEC.md Section 3.3.3 — Two-Stage Templating +# SPEC.md Section 3.3.4 — Write Method Blocking +# SPEC.md Section 6.1 — External Services (ChainList API) + +Feature: Network RPC Gateway + As an operator + I want to manage blockchain RPC routing through the eRPC gateway + So that my cluster can interact with multiple blockchain networks reliably + + Background: + Given the cluster is running + And the eRPC Deployment exists in the "erpc" namespace + And the "erpc-config" ConfigMap exists in the "erpc" namespace + + # ------------------------------------------------------------------- + # Add public RPCs from ChainList + # ------------------------------------------------------------------- + + Scenario: Add public RPCs from ChainList by chain ID + When I run "obol network add 1" + Then the eRPC config is patched with chain ID 1 (Ethereum Mainnet) + And public RPC endpoints from ChainList are added as upstreams + And each upstream has an ID prefixed with "chainlist-" + And a network entry with "evm.chainId: 1" is added to the project + + Scenario: Add multiple chains + When I run "obol network add 1" + And I run "obol network add 137" + Then the eRPC config contains upstreams for both chain ID 1 and chain ID 137 + And network entries exist for both chains + + Scenario: Adding the same chain twice is idempotent + Given chain ID 1 is already configured in eRPC + When I run "obol network add 1" + Then the eRPC config is not duplicated for chain ID 1 + And existing upstreams are preserved + + # ------------------------------------------------------------------- + # Add custom RPC endpoint + # ------------------------------------------------------------------- + + Scenario: Add custom RPC endpoint for a chain + When I run "obol network add 1 
--endpoint https://my-node.example.com/rpc" + Then the eRPC config contains a custom upstream with the provided endpoint + And the upstream is associated with chain ID 1 + + Scenario: Custom endpoint is validated before adding + When I run "obol network add 1 --endpoint https://unreachable.example.com/rpc" + Then the endpoint reachability is checked + And if the endpoint is unreachable, a warning is displayed + + # ------------------------------------------------------------------- + # Write method blocking + # ------------------------------------------------------------------- + + Scenario: Write methods are blocked by default + Given chain ID 1 is configured in eRPC without --allow-writes + When an eth_sendRawTransaction request arrives at eRPC for chain 1 + Then eRPC blocks the request + And returns an error indicating write methods are not allowed + + Scenario: Write methods are allowed with --allow-writes flag + When I run "obol network add 1 --allow-writes" + Then the eRPC config for chain ID 1 allows eth_sendRawTransaction + And write requests are forwarded to the upstream + + Scenario: Local Ethereum nodes always have writes blocked + Given a local Ethereum node is deployed as "ethereum-fluffy-penguin" + And the node is registered as a priority upstream in eRPC + When an eth_sendRawTransaction request arrives for the local node's chain + Then the write method is blocked on the local upstream + And the write request is routed to remote upstreams instead + + # ------------------------------------------------------------------- + # Remove chain RPCs + # ------------------------------------------------------------------- + + Scenario: Remove chain RPCs from eRPC + Given chain ID 1 is configured in eRPC with multiple upstreams + When I run "obol network remove 1" + Then all upstreams for chain ID 1 are removed from the eRPC config + And the network entry for chain ID 1 is removed + + Scenario: Remove non-existent chain is a no-op + Given chain ID 999 is not 
configured in eRPC + When I run "obol network remove 999" + Then the command completes without error + And the eRPC config is unchanged + + # ------------------------------------------------------------------- + # eRPC status and listing + # ------------------------------------------------------------------- + + Scenario: eRPC status shows upstream counts + Given the eRPC config has: + | chain | upstream_count | + | 1 | 3 | + | 137 | 2 | + When I run "obol network list" + Then the output lists configured chains with their upstream counts: + | chain_id | name | upstreams | + | 1 | Ethereum Mainnet | 3 | + | 137 | Polygon Mainnet | 2 | + + # ------------------------------------------------------------------- + # Local Ethereum node deployment + # ------------------------------------------------------------------- + + Scenario: Install local Ethereum node with two-stage templating + When I run "obol network install ethereum --id fluffy-penguin" + Then Stage 1 renders values.yaml from values.yaml.gotmpl with CLI flags + And Stage 2 runs "helmfile sync" with the rendered values and id "fluffy-penguin" + And the node is deployed in namespace "ethereum-fluffy-penguin" + And the node is registered as a priority upstream in eRPC + + Scenario: Install local Ethereum node with auto-generated petname + When I run "obol network install ethereum" + Then a petname ID is auto-generated + And the node is deployed in namespace "ethereum-" + + Scenario: Local node registered as priority upstream in eRPC + Given a local Ethereum node "ethereum-fluffy-penguin" is deployed + Then the eRPC config contains an upstream with: + | field | value | + | id | local-ethereum-fluffy-penguin | + | endpoint | http://ethereum-execution.ethereum-fluffy-penguin.svc.cluster.local:8545 | + + # ------------------------------------------------------------------- + # Network sync and status + # ------------------------------------------------------------------- + + Scenario: Network sync re-runs helmfile 
for deployed network + Given a local Ethereum node is deployed with id "fluffy-penguin" + When I run "obol network sync ethereum fluffy-penguin" + Then helmfile sync is re-run for the "ethereum-fluffy-penguin" deployment + + Scenario: Network status shows deployment health + Given a local Ethereum node is deployed with id "fluffy-penguin" + When I run "obol network status ethereum fluffy-penguin" + Then the output shows the deployment status of the Ethereum node + And includes pod readiness and sync status + + Scenario: Network delete removes deployment and eRPC upstream + Given a local Ethereum node is deployed with id "fluffy-penguin" + When I run "obol network delete ethereum fluffy-penguin" + Then the namespace "ethereum-fluffy-penguin" is deleted + And the local upstream for "fluffy-penguin" is removed from the eRPC config diff --git a/docs/specs/features/sell_monetization.feature b/docs/specs/features/sell_monetization.feature new file mode 100644 index 00000000..adb01647 --- /dev/null +++ b/docs/specs/features/sell_monetization.feature @@ -0,0 +1,203 @@ +# References: +# SPEC.md Section 3.4 — Monetize Sell Side +# SPEC.md Section 4.1 — x402 Payment Protocol +# SPEC.md Section 5.3 — ServiceOffer CRD Schema +# SPEC.md Section 7.1 — Tunnel Exposure +# SPEC.md Section 3.4.4 — x402-verifier (ForwardAuth) + +Feature: Sell-Side Monetization + As an operator + I want to sell access to cluster services via x402 micropayments + So that I can earn USDC for every inference request served + + Background: + Given the cluster is running + And a wallet is configured with address "0xSELLER" + And the chain is set to "base-sepolia" + + # ------------------------------------------------------------------- + # ServiceOffer creation + # ------------------------------------------------------------------- + + Scenario: obol sell http creates a ServiceOffer CR + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 
--namespace llm" + Then a ServiceOffer CR named "myapi" is created in "openclaw-obol-agent" namespace + And the ServiceOffer spec contains: + | field | value | + | payment.scheme | exact | + | payment.network | base-sepolia | + | payment.payTo | 0xSELLER | + | payment.price.perRequest | 0.001 | + | upstream.service | litellm | + | upstream.port | 4000 | + | upstream.namespace | llm | + | path | /services/myapi | + + Scenario: obol sell http with per-mtok pricing + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --per-mtok 1.00 --upstream litellm --port 4000 --namespace llm" + Then a ServiceOffer CR named "myapi" is created + And the ServiceOffer spec contains: + | field | value | + | payment.price.perMTok | 1.00 | + + Scenario: obol sell http with health path + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm --health-path /health/readiness" + Then the ServiceOffer spec has upstream.healthPath "/health/readiness" + + Scenario: obol sell http activates tunnel on first sell + Given no tunnel is currently active + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm" + Then EnsureTunnelForSell() is called + And the quick-mode tunnel is activated + + Scenario: obol sell http rejects unsupported chain + When I run "obol sell http myapi --wallet 0xSELLER --chain ethereum-mainnet --price 0.001 --upstream litellm --port 4000 --namespace llm" + Then the command fails with "unsupported chain" + + Scenario: obol sell http rejects non-HTTPS facilitator + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm --facilitator http://example.com" + Then the command fails with "facilitator URL must use HTTPS" + + # ------------------------------------------------------------------- + # 6-stage reconciliation + # 
------------------------------------------------------------------- + + Scenario: Stage 1 — ModelReady + Given a ServiceOffer CR "myapi" exists with type "inference" and model "qwen3.5:9b" + When the reconciler evaluates stage 1 + Then the condition "ModelReady" is set to "True" if the model is available in LiteLLM + And the condition "ModelReady" is set to "False" with a message if the model is not available + + Scenario: Stage 2 — UpstreamHealthy + Given the ServiceOffer CR "myapi" has condition "ModelReady" = "True" + And the upstream service "litellm" in namespace "llm" is healthy at "/health/readiness" + When the reconciler evaluates stage 2 + Then the condition "UpstreamHealthy" is set to "True" + + Scenario: Stage 2 — UpstreamHealthy fails on unhealthy upstream + Given the ServiceOffer CR "myapi" has condition "ModelReady" = "True" + And the upstream service "litellm" in namespace "llm" returns 503 at "/health/readiness" + When the reconciler evaluates stage 2 + Then the condition "UpstreamHealthy" is set to "False" with a message indicating the health check failed + + Scenario: Stage 3 — PaymentGateReady + Given the ServiceOffer CR "myapi" has condition "UpstreamHealthy" = "True" + When the reconciler evaluates stage 3 + Then a Traefik Middleware resource of type ForwardAuth is created + And the "x402-pricing" ConfigMap is updated with a route entry for "/services/myapi/*" + And the route entry contains the price, wallet, and chain from the ServiceOffer + And the condition "PaymentGateReady" is set to "True" + + Scenario: Stage 4 — RoutePublished + Given the ServiceOffer CR "myapi" has condition "PaymentGateReady" = "True" + When the reconciler evaluates stage 4 + Then an HTTPRoute resource is created for path "/services/myapi" + And the HTTPRoute references the ForwardAuth Middleware + And traffic matching "/services/myapi/*" is routed to the upstream service + And the condition "RoutePublished" is set to "True" + + Scenario: Stage 5 — Registered (ERC-8004 
on-chain) + Given the ServiceOffer CR "myapi" has condition "RoutePublished" = "True" + And registration is enabled in the ServiceOffer spec + And the tunnel URL is available + When the reconciler evaluates stage 5 + Then an ERC-8004 registration is submitted on Base Sepolia + And the status field "agentId" is set to the minted token ID + And the status field "registrationTxHash" is set + And a ConfigMap with agent-registration.json is created + And an httpd Deployment serves /.well-known/agent-registration.json + And the condition "Registered" is set to "True" + + Scenario: Stage 5 — Registration degrades without ETH + Given the ServiceOffer CR "myapi" has condition "RoutePublished" = "True" + And the wallet has zero ETH for gas + When the reconciler evaluates stage 5 + Then the registration degrades to OffChainOnly mode + And the /.well-known/agent-registration.json is still served + But no on-chain transaction is submitted + + Scenario: Stage 6 — Ready + Given all 5 prior conditions are "True" + When the reconciler evaluates stage 6 + Then the condition "Ready" is set to "True" + And the status field "endpoint" is set to the full public URL + + Scenario: Reconciled resources have ownerReferences for auto-GC + Given a ServiceOffer CR "myapi" has reached "Ready" state + When I inspect the Middleware, HTTPRoute, ConfigMap, and httpd Deployment + Then each resource has an ownerReference pointing to the ServiceOffer CR + And deleting the ServiceOffer cascades deletion to all owned resources + + # ------------------------------------------------------------------- + # x402-verifier behavior + # ------------------------------------------------------------------- + + Scenario: x402-verifier responds 402 with pricing for unauthenticated requests + Given the route "/services/myapi/*" is configured with price "0.001" USDC + When a request arrives at "/services/myapi/data" without an X-PAYMENT header + Then the x402-verifier responds with HTTP 402 + And the response body 
contains PaymentRequirements JSON with: + | field | value | + | x402Version | 1 | + | accepts[0].scheme | exact | + | accepts[0].network | eip155:84532 | + | accepts[0].maxAmountRequired | 1000 | + + Scenario: x402-verifier passes through requests with valid payment + Given the route "/services/myapi/*" is configured with price "0.001" USDC + When a request arrives at "/services/myapi/data" with a valid X-PAYMENT header + Then the x402-verifier delegates verification to the facilitator + And the facilitator confirms the payment is valid + And the x402-verifier responds with HTTP 200 + And the upstream receives the request with an Authorization header + + Scenario: x402-verifier passes through unmatched routes for free + Given the route "/services/myapi/*" is configured in pricing + When a request arrives at "/health" which matches no pricing route + Then the x402-verifier responds with HTTP 200 + And the request proceeds to the upstream without payment + + Scenario: x402-verifier hot-reloads pricing config + Given the x402-verifier is running with route "/services/old/*" + When the "x402-pricing" ConfigMap is updated to replace route "/services/old/*" with "/services/new/*" + And 5 seconds elapse for the config watcher poll + Then the verifier accepts the new route "/services/new/*" + And the old route "/services/old/*" is no longer active + + # ------------------------------------------------------------------- + # Pricing models + # ------------------------------------------------------------------- + + Scenario: perRequest pricing is used directly + Given a ServiceOffer with price.perRequest = "0.001" + When the reconciler creates the pricing route + Then the route price is "1000" (0.001 USDC in base units) + And the route priceModel is "perRequest" + + Scenario: perMTok pricing is converted at 1000 tokens per request + Given a ServiceOffer with price.perMTok = "1.00" + When the reconciler creates the pricing route + Then the effective perRequest price is perMTok / 1000 + And the route
contains both perMTok and the approximated perRequest + And approxTokensPerRequest is set to 1000 + + # ------------------------------------------------------------------- + # CLI management commands + # ------------------------------------------------------------------- + + Scenario: obol sell list shows active ServiceOffers + Given ServiceOffers "myapi" and "myinference" exist + When I run "obol sell list" + Then the output lists both ServiceOffers with their status + + Scenario: obol sell status shows reconciliation progress + Given a ServiceOffer "myapi" is stuck at stage 2 (UpstreamHealthy = False) + When I run "obol sell status myapi" + Then the output shows each condition with its status + And the "UpstreamHealthy" condition shows the failure message + + Scenario: obol sell delete removes ServiceOffer and owned resources + Given a ServiceOffer "myapi" exists at "Ready" state + When I run "obol sell delete myapi" + Then the ServiceOffer CR is deleted + And all owned resources (Middleware, HTTPRoute, ConfigMaps) are garbage collected diff --git a/docs/specs/features/stack_lifecycle.feature b/docs/specs/features/stack_lifecycle.feature new file mode 100644 index 00000000..275ab2b7 --- /dev/null +++ b/docs/specs/features/stack_lifecycle.feature @@ -0,0 +1,166 @@ +# References: +# SPEC.md Section 3.1 — Stack Lifecycle +# SPEC.md Section 2.4 — Backend Abstraction +# SPEC.md Section 5.1 — Configuration Files +# SPEC.md Section 1.3 — System Constraints + +Feature: Stack Lifecycle + As an operator + I want to manage the full lifecycle of my local Kubernetes cluster + So that I can run decentralized AI infrastructure reproducibly + + Background: + Given Docker is running + And the obol CLI is installed + + # ------------------------------------------------------------------- + # obol stack init + # ------------------------------------------------------------------- + + Scenario: Stack init generates cluster ID and writes config + When I run "obol stack init" + Then a 
petname cluster ID is generated + And the file "$OBOL_CONFIG_DIR/.stack-id" contains the cluster ID + And the file "$OBOL_CONFIG_DIR/.stack-backend" contains "k3d" + And embedded infrastructure defaults are copied to "$OBOL_CONFIG_DIR/defaults/" + And template variables "OLLAMA_HOST", "OLLAMA_HOST_IP", "CLUSTER_ID" are substituted in defaults + + Scenario: Stack init resolves absolute paths for Docker volume mounts + When I run "obol stack init" + Then all paths in the generated k3d config are absolute + And no relative paths appear in volume mount declarations + + Scenario: Stack init preserves existing cluster ID on force reinit + Given I have previously run "obol stack init" + And the cluster ID is "fluffy-penguin" + When I run "obol stack init --force" + Then the cluster ID remains "fluffy-penguin" + And the backend config is regenerated + + Scenario: Stack init with k3s backend + When I run "obol stack init --backend k3s" + Then the file "$OBOL_CONFIG_DIR/.stack-backend" contains "k3s" + And the Ollama host is resolved as "127.0.0.1" + + Scenario: Stack init fails without Docker when using k3d backend + Given Docker is not running + When I run "obol stack init --backend k3d" + Then the command fails with "prerequisites check failed" + + # ------------------------------------------------------------------- + # obol stack up + # ------------------------------------------------------------------- + + Scenario: Stack up creates k3d cluster and deploys infrastructure + Given I have run "obol stack init" + When I run "obol stack up" + Then a k3d cluster is created with the persisted stack ID + And kubeconfig is written to "$OBOL_CONFIG_DIR/kubeconfig.yaml" + And helmfile sync deploys infrastructure to the cluster + And the following namespaces exist: + | namespace | + | traefik | + | llm | + | x402 | + | openclaw-obol-agent | + | erpc | + | obol-frontend | + | monitoring | + + Scenario: Stack up auto-configures LiteLLM with Ollama models + Given I have run "obol 
stack init" + And Ollama is running on the host with model "qwen3.5:9b" + When I run "obol stack up" + Then the "litellm-config" ConfigMap in "llm" namespace contains model "qwen3.5:9b" + And the LiteLLM Deployment is restarted once + + Scenario: Stack up deploys OpenClaw agent with skills + Given I have run "obol stack init" + When I run "obol stack up" + Then the OpenClaw Deployment exists in "openclaw-obol-agent" namespace + And skills are injected via host-path PVC + And the "openclaw-monetize" ClusterRoleBinding is patched with the openclaw ServiceAccount + + Scenario: Stack up auto-starts DNS tunnel when provisioned + Given I have run "obol stack init" + And a DNS tunnel is provisioned with hostname "stack.example.com" + When I run "obol stack up" + Then the Cloudflare tunnel is started + And the tunnel URL is propagated to "AGENT_BASE_URL" on the OpenClaw Deployment + + Scenario: Stack up keeps quick tunnel dormant until first sell + Given I have run "obol stack init" + And no DNS tunnel is provisioned + When I run "obol stack up" + Then the Cloudflare tunnel is not started + And the cloudflared Deployment has zero replicas + + Scenario: Stack up is idempotent + Given I have run "obol stack init" and "obol stack up" + And the cluster is running + When I run "obol stack up" again + Then the cluster remains in a healthy state + And no duplicate resources are created + And all existing services remain accessible + + Scenario: Stack up cleans up on helmfile sync failure + Given I have run "obol stack init" + And helmfile sync will fail due to a malformed template + When I run "obol stack up" + Then the command fails + And the cluster is automatically stopped via Down() + + Scenario: Stack up binds expected ports + Given I have run "obol stack init" + And ports 80, 8080, 443, and 8443 are available + When I run "obol stack up" + Then the k3d cluster binds host ports 80, 8080, 443, and 8443 + + Scenario: Stack up fails when ports are occupied + Given I have run 
"obol stack init" + And port 80 is already in use by another service + When I run "obol stack up" + Then the command fails with "port(s) already in use" + + # ------------------------------------------------------------------- + # obol stack down + # ------------------------------------------------------------------- + + Scenario: Stack down deletes cluster but preserves config + Given the cluster is running + When I run "obol stack down" + Then the k3d cluster is deleted + And the file "$OBOL_CONFIG_DIR/.stack-id" still exists + And the file "$OBOL_CONFIG_DIR/kubeconfig.yaml" still exists + And the directory "$OBOL_CONFIG_DIR/defaults/" still exists + + Scenario: Stack down stops the DNS resolver + Given the cluster is running + And the DNS resolver for "obol.stack" is active + When I run "obol stack down" + Then the DNS resolver is stopped + + # ------------------------------------------------------------------- + # obol stack purge + # ------------------------------------------------------------------- + + Scenario: Stack purge removes config directory + Given the cluster is running + When I run "obol stack purge" + Then the k3d cluster is destroyed + And the directory "$OBOL_CONFIG_DIR" is removed + But the directory "$OBOL_DATA_DIR" still exists + + Scenario: Stack purge with force removes root-owned PVCs + Given the cluster is running + And root-owned PVC data exists in "$OBOL_DATA_DIR" + When I run "obol stack purge --force" + Then the k3d cluster is destroyed + And the directory "$OBOL_CONFIG_DIR" is removed + And the directory "$OBOL_DATA_DIR" is removed via sudo + + Scenario: Stack purge prompts for wallet backup + Given the cluster is running + And a wallet exists at "$OBOL_DATA_DIR/openclaw-/keystore/" + When I run "obol stack purge" + Then the user is prompted to back up the wallet before proceeding diff --git a/docs/specs/features/tunnel_exposure.feature b/docs/specs/features/tunnel_exposure.feature new file mode 100644 index 00000000..e1ca7356 --- 
/dev/null +++ b/docs/specs/features/tunnel_exposure.feature @@ -0,0 +1,190 @@ +# References: +# SPEC.md Section 3.7 — Tunnel Management +# SPEC.md Section 7.1 — Tunnel Exposure (Security Model) +# SPEC.md Section 2.2 — Routing Architecture +# SPEC.md Section 3.7.6 — Storefront Resources + +Feature: Tunnel Exposure + As an operator + I want the Cloudflare tunnel to expose only payment-gated and discovery endpoints + So that internal services remain protected while public services are accessible + + Background: + Given the cluster is running + And the Traefik Gateway is deployed in the "traefik" namespace + + # ------------------------------------------------------------------- + # Quick mode tunnel activation + # ------------------------------------------------------------------- + + Scenario: Quick mode tunnel activates on first sell command + Given no tunnel is currently active + And the tunnel mode is "quick" + When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm" + Then the quick tunnel is activated + And the tunnel URL is a random "*.trycloudflare.com" hostname + And the cloudflared Deployment is scaled to 1 replica + + Scenario: Quick mode tunnel stays dormant during stack up + Given the tunnel mode is "quick" + When I run "obol stack up" + Then the cloudflared Deployment has zero replicas + And no tunnel URL is assigned + + Scenario: Quick mode tunnel URL changes on restart + Given the quick tunnel is active with URL "https://abc123.trycloudflare.com" + When I run "obol tunnel restart" + Then the tunnel URL changes to a new "*.trycloudflare.com" hostname + And the new URL is propagated to all consumers + + # ------------------------------------------------------------------- + # DNS mode tunnel + # ------------------------------------------------------------------- + + Scenario: DNS mode tunnel with stable hostname + Given I have run "obol tunnel login --hostname 
stack.example.com" + When I run "obol stack up" + Then the tunnel is automatically started + And the tunnel URL is "https://stack.example.com" + And the URL persists across restarts + + Scenario: DNS tunnel state is persisted + Given a DNS tunnel is provisioned with hostname "stack.example.com" + When I inspect "$OBOL_CONFIG_DIR/tunnel/cloudflared.json" + Then the state contains: + | field | value | + | mode | dns | + | hostname | stack.example.com | + + # ------------------------------------------------------------------- + # URL propagation + # ------------------------------------------------------------------- + + Scenario: Tunnel URL propagated to agent AGENT_BASE_URL + Given the tunnel is active with URL "https://stack.example.com" + When the tunnel URL is propagated + Then the OpenClaw Deployment in "openclaw-obol-agent" namespace has env var "AGENT_BASE_URL" set to "https://stack.example.com" + + Scenario: Tunnel URL propagated to frontend ConfigMap + Given the tunnel is active with URL "https://stack.example.com" + When the tunnel URL is propagated + Then the "obol-stack-config" ConfigMap in "obol-frontend" namespace contains the tunnel URL + + Scenario: Tunnel URL propagated to storefront + Given the tunnel is active with URL "https://stack.example.com" + When the tunnel URL is propagated + Then the storefront resources are created in the "traefik" namespace + + # ------------------------------------------------------------------- + # Internal services NOT accessible via tunnel + # ------------------------------------------------------------------- + + Scenario: Frontend is not accessible via tunnel hostname + Given the tunnel is active + When a request arrives via the tunnel hostname for path "/" + Then the request is routed to the storefront landing page + And the frontend application is NOT served + Because the frontend HTTPRoute has hostnames restricted to "obol.stack" + + Scenario: eRPC is not accessible via tunnel hostname + Given the tunnel is 
active + When a request arrives via the tunnel hostname for path "/rpc" + Then the request does NOT reach the eRPC gateway + Because the eRPC HTTPRoute has hostnames restricted to "obol.stack" + + Scenario: LiteLLM admin is not exposed via any route + Given the tunnel is active + When a request arrives via the tunnel hostname for any path + Then LiteLLM admin endpoints are never reachable + Because no HTTPRoute exists for LiteLLM without hostname restrictions + + Scenario: Prometheus monitoring is not accessible via tunnel + Given the tunnel is active + When a request arrives via the tunnel hostname for monitoring paths + Then the monitoring endpoints are NOT reachable + Because monitoring HTTPRoutes have hostnames restricted to "obol.stack" + + Scenario: Internal services remain accessible locally via obol.stack + Given the tunnel is active + When a request arrives with Host header "obol.stack" for path "/" + Then the frontend application is served + And when a request arrives with Host header "obol.stack" for path "/rpc" + Then the eRPC gateway handles the request + + # ------------------------------------------------------------------- + # /services/* accessible and x402-gated via tunnel + # ------------------------------------------------------------------- + + Scenario: Public service route is accessible via tunnel with payment + Given the tunnel is active + And a ServiceOffer "myapi" is in "Ready" state + When a request arrives via the tunnel hostname for path "/services/myapi/data" with valid payment + Then the x402-verifier validates the payment + And the request is forwarded to the upstream service + And the upstream responds successfully + + Scenario: Public service route returns 402 without payment via tunnel + Given the tunnel is active + And a ServiceOffer "myapi" is in "Ready" state + When a request arrives via the tunnel hostname for path "/services/myapi/data" without payment + Then the x402-verifier returns HTTP 402 with PaymentRequirements + + # 
------------------------------------------------------------------- + # Discovery endpoints via tunnel + # ------------------------------------------------------------------- + + Scenario: Agent registration JSON accessible via tunnel + Given the tunnel is active + And an ERC-8004 registration has been published + When a request arrives via the tunnel hostname for "/.well-known/agent-registration.json" + Then the response contains the AgentRegistration JSON + And the JSON includes: + | field | expected | + | type | string | + | name | string | + | x402Support | true | + | active | true | + | services | array | + | registrations | array | + + Scenario: Skill catalog accessible via tunnel + Given the tunnel is active + And a /skill.md route is published + When a request arrives via the tunnel hostname for "/skill.md" + Then the response contains the machine-readable service catalog + + # ------------------------------------------------------------------- + # Storefront landing page + # ------------------------------------------------------------------- + + Scenario: Storefront landing page served at tunnel root + Given the tunnel is active with URL "https://stack.example.com" + When a request arrives at "https://stack.example.com/" + Then the storefront static HTML page is served + And the storefront is served by the busybox httpd in the "traefik" namespace + + Scenario: Storefront resources are created correctly + Given the tunnel is active + When the storefront is deployed + Then the following resources exist in the "traefik" namespace: + | kind | name | + | ConfigMap | tunnel-storefront | + | Deployment | tunnel-storefront | + | Service | tunnel-storefront | + | HTTPRoute | tunnel-storefront | + And the Deployment uses busybox httpd with 5m CPU and 8Mi RAM requests + + # ------------------------------------------------------------------- + # Tunnel management + # ------------------------------------------------------------------- + + Scenario: obol tunnel status shows 
tunnel state + Given the quick tunnel is active with URL "https://abc123.trycloudflare.com" + When I run "obol tunnel status" + Then the output shows the tunnel mode as "quick" + And the output shows the current tunnel URL + + Scenario: obol tunnel logs shows cloudflared output + Given the tunnel is active + When I run "obol tunnel logs" + Then the output streams logs from the cloudflared pod From 2d9176f374cc04af1dedafcb522cdd1baaf9c417 Mon Sep 17 00:00:00 2001 From: bussyjd Date: Sun, 29 Mar 2026 15:47:03 +0200 Subject: [PATCH 4/5] Rewrite spec bundle around PR288 baseline --- .codex/hooks.json | 37 + .codex/hooks/stop_spec_sync.py | 112 +++ .codex/hooks/workspace_context.py | 53 ++ ARCHITECTURE.md | 346 +++++++ BEHAVIORS_AND_EXPECTATIONS.md | 308 ++++++ CONTRIBUTING.md | 207 ++-- SPEC.md | 709 ++++++++++++++ docs/adr/0001-local-first-stack-runtime.md | 20 + docs/adr/0002-central-litellm-gateway.md | 20 + docs/adr/0003-distinct-network-domains.md | 20 + .../0004-openclaw-elevated-agent-runtime.md | 20 + .../0005-serviceoffer-skill-reconcile-loop.md | 20 + ...006-static-paid-namespace-buyer-sidecar.md | 20 + ...surfaces-with-optional-public-discovery.md | 20 + ...onical-root-spec-bundle-and-codex-hooks.md | 20 + ...2-exact-metering-after-pre-request-gate.md | 20 + .../0010-phase-2-agent-ready-cli-surfaces.md | 20 + docs/getting-started.md | 2 + docs/guides/monetize-inference.md | 8 +- docs/guides/monetize_sell_side_testing_log.md | 399 -------- docs/guides/monetize_test_coverage_report.md | 666 ------------- docs/monetisation-architecture-proposal.md | 480 ---------- docs/plans/buy-side-testing.md | 214 ----- docs/plans/cli-agent-readiness.md | 307 ------ docs/plans/multi-network-sell.md | 387 -------- docs/plans/per-token-metering.md | 164 ---- docs/x402-test-plan.md | 330 ------- features/application_management.feature | 36 + features/buy_side_payments.feature | 35 + features/frontend_and_monitoring.feature | 36 + features/llm_routing.feature | 37 + 
features/network_management.feature | 35 + features/openclaw_runtime.feature | 36 + features/sell_side_monetization.feature | 44 + features/stack_lifecycle.feature | 36 + features/tunnel_and_discovery.feature | 36 + plans/agent-services.md | 567 ----------- plans/litellmrouting.md | 123 --- plans/monetise.md | 480 ---------- plans/skills-host-path-injection-v3.md | 120 --- plans/skills-system-redesign-v2.md | 253 ----- plans/skills-system-redesign.md | 895 ------------------ plans/terminal-ux-improvement.md | 135 --- 43 files changed, 2247 insertions(+), 5586 deletions(-) create mode 100644 .codex/hooks.json create mode 100644 .codex/hooks/stop_spec_sync.py create mode 100644 .codex/hooks/workspace_context.py create mode 100644 ARCHITECTURE.md create mode 100644 BEHAVIORS_AND_EXPECTATIONS.md create mode 100644 SPEC.md create mode 100644 docs/adr/0001-local-first-stack-runtime.md create mode 100644 docs/adr/0002-central-litellm-gateway.md create mode 100644 docs/adr/0003-distinct-network-domains.md create mode 100644 docs/adr/0004-openclaw-elevated-agent-runtime.md create mode 100644 docs/adr/0005-serviceoffer-skill-reconcile-loop.md create mode 100644 docs/adr/0006-static-paid-namespace-buyer-sidecar.md create mode 100644 docs/adr/0007-local-only-operator-surfaces-with-optional-public-discovery.md create mode 100644 docs/adr/0008-canonical-root-spec-bundle-and-codex-hooks.md create mode 100644 docs/adr/0009-phase-2-exact-metering-after-pre-request-gate.md create mode 100644 docs/adr/0010-phase-2-agent-ready-cli-surfaces.md delete mode 100644 docs/guides/monetize_sell_side_testing_log.md delete mode 100644 docs/guides/monetize_test_coverage_report.md delete mode 100644 docs/monetisation-architecture-proposal.md delete mode 100644 docs/plans/buy-side-testing.md delete mode 100644 docs/plans/cli-agent-readiness.md delete mode 100644 docs/plans/multi-network-sell.md delete mode 100644 docs/plans/per-token-metering.md delete mode 100644 docs/x402-test-plan.md create 
mode 100644 features/application_management.feature create mode 100644 features/buy_side_payments.feature create mode 100644 features/frontend_and_monitoring.feature create mode 100644 features/llm_routing.feature create mode 100644 features/network_management.feature create mode 100644 features/openclaw_runtime.feature create mode 100644 features/sell_side_monetization.feature create mode 100644 features/stack_lifecycle.feature create mode 100644 features/tunnel_and_discovery.feature delete mode 100644 plans/agent-services.md delete mode 100644 plans/litellmrouting.md delete mode 100644 plans/monetise.md delete mode 100644 plans/skills-host-path-injection-v3.md delete mode 100644 plans/skills-system-redesign-v2.md delete mode 100644 plans/skills-system-redesign.md delete mode 100644 plans/terminal-ux-improvement.md diff --git a/.codex/hooks.json b/.codex/hooks.json new file mode 100644 index 00000000..73c79a35 --- /dev/null +++ b/.codex/hooks.json @@ -0,0 +1,37 @@ +{ + "hooks": { + "SessionStart": [ + { + "matcher": "startup|resume", + "hooks": [ + { + "type": "command", + "command": "/usr/bin/python3 \"$(git rev-parse --show-toplevel)/.codex/hooks/workspace_context.py\"", + "statusMessage": "Loading Obol Stack bundle context" + } + ] + } + ], + "UserPromptSubmit": [ + { + "hooks": [ + { + "type": "command", + "command": "/usr/bin/python3 \"$(git rev-parse --show-toplevel)/.codex/hooks/workspace_context.py\"" + } + ] + } + ], + "Stop": [ + { + "hooks": [ + { + "type": "command", + "command": "/usr/bin/python3 \"$(git rev-parse --show-toplevel)/.codex/hooks/stop_spec_sync.py\"", + "timeout": 30 + } + ] + } + ] + } +} diff --git a/.codex/hooks/stop_spec_sync.py b/.codex/hooks/stop_spec_sync.py new file mode 100644 index 00000000..a3f0ea59 --- /dev/null +++ b/.codex/hooks/stop_spec_sync.py @@ -0,0 +1,112 @@ +#!/usr/bin/env python3 + +import json +import os +import subprocess +import sys +from typing import Iterable + + +CANONICAL_PREFIXES = ( + "SPEC.md", + 
"ARCHITECTURE.md", + "BEHAVIORS_AND_EXPECTATIONS.md", + "CONTRIBUTING.md", + "features/", + "docs/adr/", +) + +SPEC_IMPACT_PREFIXES = ( + "cmd/obol/", + "internal/stack/", + "internal/model/", + "internal/network/", + "internal/openclaw/", + "internal/agent/", + "internal/x402/", + "internal/tunnel/", + "internal/erc8004/", + "internal/inference/", + "internal/embed/infrastructure/", + "internal/embed/skills/", + "internal/app/", + "internal/schemas/", +) + + +def git_root(cwd: str) -> str: + result = subprocess.run( + ["git", "rev-parse", "--show-toplevel"], + cwd=cwd, + check=True, + capture_output=True, + text=True, + ) + return result.stdout.strip() + + +def git_lines(root: str, args: list[str]) -> list[str]: + result = subprocess.run( + ["git", *args], + cwd=root, + check=True, + capture_output=True, + text=True, + ) + return [line.strip() for line in result.stdout.splitlines() if line.strip()] + + +def matches(path: str, prefixes: Iterable[str]) -> bool: + return any(path == prefix or path.startswith(prefix) for prefix in prefixes) + + +def main() -> int: + payload = json.load(sys.stdin) + cwd = payload.get("cwd") or os.getcwd() + + try: + root = git_root(cwd) + except Exception: + json.dump({"continue": True}, sys.stdout) + return 0 + + changed = set() + for args in ( + ["diff", "--name-only"], + ["diff", "--name-only", "--cached"], + ["ls-files", "--others", "--exclude-standard"], + ): + try: + changed.update(git_lines(root, args)) + except subprocess.CalledProcessError: + pass + + impacting = sorted(path for path in changed if matches(path, SPEC_IMPACT_PREFIXES)) + canonical = sorted(path for path in changed if matches(path, CANONICAL_PREFIXES)) + + if not impacting or canonical: + json.dump({"continue": True}, sys.stdout) + return 0 + + preview = ", ".join(impacting[:4]) + if len(impacting) > 4: + preview = f"{preview}, +{len(impacting) - 4} more" + + reason = ( + "Spec-impacting changes were detected in " + f"{preview}. 
Update the canonical root bundle " + "(SPEC.md, ARCHITECTURE.md, BEHAVIORS_AND_EXPECTATIONS.md, " + "CONTRIBUTING.md, features/, or docs/adr/) before ending the turn, " + "or explicitly explain why no spec change is required." + ) + + if payload.get("stop_hook_active"): + json.dump({"continue": False, "systemMessage": reason}, sys.stdout) + return 0 + + json.dump({"decision": "block", "reason": reason}, sys.stdout) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/.codex/hooks/workspace_context.py b/.codex/hooks/workspace_context.py new file mode 100644 index 00000000..d3ff2a75 --- /dev/null +++ b/.codex/hooks/workspace_context.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python3 + +import json +import os +import subprocess +import sys + + +def git_root(cwd: str) -> str: + try: + result = subprocess.run( + ["git", "rev-parse", "--show-toplevel"], + cwd=cwd, + check=True, + capture_output=True, + text=True, + ) + return result.stdout.strip() + except Exception: + return cwd + + +def main() -> int: + payload = json.load(sys.stdin) + cwd = payload.get("cwd") or os.getcwd() + root = git_root(cwd) + event_name = payload.get("hook_event_name") or "SessionStart" + + context = "\n".join( + [ + f"Repository conventions for {os.path.basename(root)}:", + "- PR288 is the behavioral baseline for the canonical bundle.", + "- The canonical bundle lives at repo root: SPEC.md, ARCHITECTURE.md, BEHAVIORS_AND_EXPECTATIONS.md, CONTRIBUTING.md, features/, docs/adr/.", + "- Actor priority is local operator, then agent developer, then remote buyer.", + "- Spec-impacting code changes must update the root bundle in the same turn.", + "- Future work belongs in explicit phase sections and ADR follow-ups, not ad hoc plan files.", + ] + ) + + json.dump( + { + "hookSpecificOutput": { + "hookEventName": event_name, + "additionalContext": context, + } + }, + sys.stdout, + ) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/ARCHITECTURE.md 
b/ARCHITECTURE.md new file mode 100644 index 00000000..af196b4f --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,346 @@ +# Obol Stack Architecture + +**Version**: 1.0.0-pr288 +**Status**: Living document +**Last Updated**: 2026-03-29 + +This document is the structural companion to [SPEC.md](SPEC.md). It focuses on component boundaries, data flow, deployment topology, and trust boundaries for the PR `#288` baseline. + +--- + +## Table of Contents + +1. [Design Philosophy](#1-design-philosophy) +2. [Component Diagrams](#2-component-diagrams) +3. [Module Decomposition](#3-module-decomposition) +4. [Data Flow Diagrams](#4-data-flow-diagrams) +5. [Storage Architecture](#5-storage-architecture) +6. [Deployment Model](#6-deployment-model) +7. [Network Topology](#7-network-topology) +8. [Security Architecture](#8-security-architecture) + +--- + +## 1. Design Philosophy + +Obol Stack is built around these principles: + +1. **Local-first sovereignty**: the operator machine remains the source of truth for cluster, wallet, and skill state. +2. **Single operator entry point**: the `obol` CLI is the primary control surface for lifecycle, routing, applications, and monetization. +3. **Centralized protocol translation**: LiteLLM centralizes model routing, Traefik centralizes HTTP routing, and eRPC centralizes chain access. +4. **Bounded trust**: payment execution, signing, and public routing are split into separate components with different privileges. +5. **Phased extensibility**: experimental or not-yet-fully-integrated surfaces are explicit phase follow-ups rather than hidden assumptions. + +System constraints are defined in [SPEC.md](SPEC.md#15-system-constraints). + +--- + +## 2. 
Component Diagrams + +### 2.1 C4 Context Diagram + +```mermaid +C4Context + title Obol Stack - System Context + + Person(operator, "Local Operator", "Starts the stack, manages services, inspects health") + Person(agent_dev, "Agent Developer", "Deploys and tunes OpenClaw instances and skills") + Person(remote_buyer, "Remote Buyer", "Pays for public services or remote models") + + System(obol, "Obol Stack", "Local-first agent and infrastructure platform") + + System_Ext(ollama, "Ollama", "Local host model runtime") + System_Ext(cloud_llm, "Cloud LLM APIs", "Anthropic and OpenAI providers") + System_Ext(chainlist, "ChainList", "Public RPC discovery") + System_Ext(facilitator, "x402 Facilitator", "Payment verification and settlement") + System_Ext(cloudflare, "Cloudflare", "Tunnel control plane and edge") + System_Ext(chains, "EVM Chains", "Payment and registration settlement") + System_Ext(charts, "ArtifactHub / OCI / Helm Repos", "Managed application sources") + + Rel(operator, obol, "CLI + browser") + Rel(agent_dev, obol, "CLI + embedded skills") + Rel(remote_buyer, obol, "HTTPS paid requests") + Rel(obol, ollama, "HTTP") + Rel(obol, cloud_llm, "HTTPS") + Rel(obol, chainlist, "HTTPS") + Rel(obol, facilitator, "HTTPS") + Rel(obol, cloudflare, "Browser auth, API, tunnel traffic") + Rel(obol, chains, "JSON-RPC via eRPC") + Rel(obol, charts, "HTTPS / OCI pull") +``` + +### 2.2 C4 Container Diagram + +```mermaid +C4Container + title Obol Stack - Container Diagram + + Person(operator, "Local Operator") + Person(remote_buyer, "Remote Buyer") + + System_Boundary(host, "Operator Machine") { + Container(cli, "obol CLI", "Go", "Lifecycle, routing, apps, monetization") + Container_Boundary(cluster, "Local k3d/k3s Cluster") { + Container(traefik, "Traefik", "Gateway API", "Ingress and route dispatch") + Container(cloudflared, "cloudflared", "Cloudflare Tunnel", "Public ingress bridge") + Container(litellm, "LiteLLM", "Python", "OpenAI-compatible model gateway") + 
Container(buyer, "x402-buyer", "Go sidecar", "Attaches pre-signed payments") + Container(erpc, "eRPC", "Go", "Blockchain RPC gateway") + Container(verifier, "x402-verifier", "Go", "ForwardAuth payment checks") + Container(agent, "OpenClaw", "OpenClaw runtime", "Agent instances and skills") + Container(frontend, "Frontend", "React app", "Operator dashboard") + ContainerDb(prom, "Prometheus", "Monitoring stack", "Metrics and scrape targets") + } + } + + System_Ext(ollama, "Ollama") + System_Ext(facilitator, "x402 Facilitator") + System_Ext(chain, "EVM Chain") + + Rel(operator, cli, "Runs commands") + Rel(operator, traefik, "Uses obol.stack") + Rel(remote_buyer, cloudflared, "HTTPS") + Rel(cloudflared, traefik, "HTTP") + Rel(traefik, frontend, "Route /") + Rel(traefik, erpc, "Route /rpc") + Rel(traefik, verifier, "ForwardAuth /verify") + Rel(traefik, litellm, "Route public services after auth") + Rel(litellm, buyer, "paid/* route") + Rel(litellm, ollama, "ollama/* models") + Rel(verifier, facilitator, "Verify payment") + Rel(erpc, chain, "JSON-RPC") + Rel(agent, erpc, "Chain queries") + Rel(agent, verifier, "Pricing config and registration side effects") + Rel(prom, buyer, "Scrapes /metrics") +``` + +### 2.3 Component Diagram: Sell-Side Control Loop + +```mermaid +C4Component + title Sell-Side Control Loop + + Component(cli, "sell commands", "Go", "Creates ServiceOffers and local gateways") + Component(offer, "ServiceOffer CRD", "Kubernetes API", "Declarative sell contract") + Component(mon, "monetize.py", "Python skill", "Skill-driven reconcile loop") + Component(ver, "x402-verifier", "Go", "Payment gate") + Component(route, "HTTPRoute / Middleware", "Gateway API + Traefik", "Traffic publication") + Component(reg, "registration publisher", "Python skill", "Generates well-known document and optional on-chain registration") + + Rel(cli, offer, "Create / update") + Rel(offer, mon, "Read / status patch") + Rel(mon, ver, "Update pricing inputs") + Rel(mon, route, "Create 
route + middleware") + Rel(mon, reg, "Publish registration") +``` + +--- + +## 3. Module Decomposition + +| Module | Responsibility | SPEC Reference | +|--------|----------------|----------------| +| `internal/stack` | Backend lifecycle and default infrastructure deployment | Section 3.1 | +| `internal/model` | Central LiteLLM routing and provider patching | Section 3.2 | +| `internal/network` | eRPC, local network deployments, public RPC management | Section 3.3 | +| `internal/openclaw` | OpenClaw overlays, tokens, skills, wallets, dashboards | Section 3.4 | +| `internal/agent` | Elevation of the default agent with monetization powers | Section 3.4 | +| `cmd/obol/sell.go` + `internal/x402` | Sell-side operator and verifier paths | Section 3.5 | +| `internal/x402/buyer` | Buy-side sidecar runtime | Section 3.6 | +| `internal/tunnel` | Quick and DNS tunnel lifecycle | Section 3.7 | +| `internal/app` | Managed Helm-chart workloads | Section 3.8 | + +--- + +## 4. Data Flow Diagrams + +### 4.1 Stack Startup + +```mermaid +sequenceDiagram + participant O as Operator + participant CLI as obol CLI + participant B as Backend + participant H as Helmfile + participant L as LiteLLM + participant OC as OpenClaw + participant A as agent.Init + participant T as Tunnel + + O->>CLI: obol stack up + CLI->>B: Up() + B-->>CLI: kubeconfig + CLI->>H: sync defaults + H-->>CLI: baseline infrastructure ready + CLI->>L: autoConfigureLLM() + CLI->>OC: SetupDefault() + CLI->>A: patch RBAC + HEARTBEAT.md + CLI->>T: start only if persistent DNS tunnel exists + CLI-->>O: obol.stack ready +``` + +### 4.2 Sell-Side Publication + +```mermaid +sequenceDiagram + participant O as Operator + participant CLI as sell command + participant K as Kubernetes API + participant M as monetize.py + participant V as x402-verifier + participant G as Traefik / Gateway API + participant R as Registry publisher + + O->>CLI: obol sell http ... 
+ CLI->>K: create ServiceOffer + M->>K: read ServiceOffer + M->>K: patch status ModelReady / UpstreamHealthy + M->>V: publish pricing route + M->>G: create Middleware + HTTPRoute + M->>R: publish agent-registration.json + M->>K: patch Ready +``` + +### 4.3 Buy-Side Request + +```mermaid +sequenceDiagram + participant A as Agent + participant S as Remote Signer + participant C as ConfigMaps + participant L as LiteLLM + participant B as x402-buyer + participant Seller as Remote Seller + participant F as Facilitator + + A->>Seller: probe without payment + Seller-->>A: 402 pricing + A->>S: pre-sign N auths + A->>C: store upstream config + auth pool + A->>L: request model paid/ + L->>B: forward request + B->>Seller: request without payment + Seller-->>B: 402 + B->>Seller: retry with X-PAYMENT + Seller->>F: verify payment + F-->>Seller: verification result + Seller-->>B: 200 response + B-->>L: inference result +``` + +--- + +## 5. Storage Architecture + +### 5.1 Overview + +State is intentionally split between: +- local XDG filesystem state managed by the CLI +- Kubernetes resources in the local cluster +- external chain state and facilitator state that the stack references but does not own + +### 5.2 Schema Summary + +| Store | Entity | Key Fields | Purpose | +|-------|--------|-----------|---------| +| Local config dir | stack metadata | `.stack-id`, `.stack-backend`, `kubeconfig.yaml` | Stack identity and runtime targeting | +| Local config dir | deployment config | `applications//`, `networks//` | Declarative deployment inputs | +| Kubernetes API | `ServiceOffer` | `spec.upstream`, `spec.payment`, `status.conditions` | Sell-side contract and reconcile status | +| Kubernetes ConfigMaps | routing and pricing state | LiteLLM, eRPC, x402, buyer config | Dynamic runtime routing | +| Kubernetes Secrets | provider creds and tunnel token | API keys, tunnel token | Sensitive runtime inputs | + +--- + +## 6. 
Deployment Model + +### 6.1 Deployment Diagram + +```mermaid +graph TD + subgraph "Operator Host" + CLI["obol CLI"] + XDG["XDG config/data/state"] + OLLAMA["Ollama (optional)"] + end + + subgraph "Local k3d / k3s Cluster" + TRAEFIK["Traefik + Gateway"] + CLOUDFLARED["cloudflared"] + LLM["LiteLLM + x402-buyer"] + ERPC["eRPC"] + X402["x402-verifier"] + OCA["OpenClaw / obol-agent"] + FE["Frontend"] + MON["Monitoring"] + end + + CLI --> XDG + CLI --> TRAEFIK + CLI --> OCA + CLI --> ERPC + LLM --> OLLAMA + CLOUDFLARED --> TRAEFIK + TRAEFIK --> FE + TRAEFIK --> ERPC + TRAEFIK --> X402 + TRAEFIK --> LLM +``` + +### 6.2 Infrastructure Requirements + +| Resource | Requirement | Notes | +|----------|-------------|-------| +| Local runtime | Docker for `k3d` or direct host support for `k3s` | Backend-specific prerequisites | +| Filesystem | Writable XDG config/data/state dirs | Required for persistent stack state | +| Network | Local loopback plus outbound HTTPS | Needed for providers, ChainList, facilitator, Cloudflare | +| Optional Cloudflare account | Required only for persistent DNS tunnel | Quick tunnel path can remain local-first | + +--- + +## 7. Network Topology + +- `obol.stack` is the local operator hostname. +- Frontend and eRPC are intentionally bound behind `hostnames: ["obol.stack"]`. +- Public service routes flow through Cloudflare tunnel to Traefik, then through x402 ForwardAuth before reaching an upstream. +- Buyer-side `paid/*` traffic stays inside the cluster until the sidecar contacts a remote seller. +- Registration JSON is intentionally public and bypasses ForwardAuth. + +--- + +## 8. 
Security Architecture + +### 8.1 Trust Boundaries + +Trust boundaries exist between: +- operator host and local cluster +- local-only routes and public tunnel routes +- remote signer and buyer sidecar +- x402 verification and upstream service execution +- local filesystem state and external chain/facilitator systems + +### 8.2 Authentication Flow + +```mermaid +sequenceDiagram + participant Buyer as Remote Buyer + participant Traefik as Traefik + participant Verifier as x402-verifier + participant Fac as Facilitator + participant Upstream as Service + + Buyer->>Traefik: HTTPS request + Traefik->>Verifier: ForwardAuth /verify + Verifier->>Fac: validate X-PAYMENT + Fac-->>Verifier: result + Verifier-->>Traefik: 200 or 402 + Traefik->>Upstream: only after 200 +``` + +### 8.3 Data Encryption + +| Data | At Rest | In Transit | +|------|---------|-----------| +| Provider API keys | Kubernetes Secret | HTTPS to provider APIs | +| Wallet and backup material | Local data dir, optional encrypted backup | Local filesystem or remote signer API | +| Tunnel traffic | Cloudflare-managed | HTTPS / QUIC | +| Payment proofs | Not persisted by the sidecar beyond auth pool state | HTTPS to seller / facilitator | diff --git a/BEHAVIORS_AND_EXPECTATIONS.md b/BEHAVIORS_AND_EXPECTATIONS.md new file mode 100644 index 00000000..fa75e04f --- /dev/null +++ b/BEHAVIORS_AND_EXPECTATIONS.md @@ -0,0 +1,308 @@ +# Obol Stack - Behaviors and Expectations + +**Version**: 1.0.0-pr288 +**Status**: Living document +**Last Updated**: 2026-03-29 + +This document defines the behavioral contract for Obol Stack on the PR `#288` baseline. Every behavior here maps to current or planned BDD scenarios in [features/](features/). + +--- + +## Table of Contents + +1. [Introduction](#1-introduction) +2. [Desired Behaviors](#2-desired-behaviors) +3. [Undesired Behaviors](#3-undesired-behaviors) +4. [Edge Cases](#4-edge-cases) +5. [Performance Expectations](#5-performance-expectations) +6. 
[Guardrail Definitions](#6-guardrail-definitions) + +--- + +## 1. Introduction + +### 1.1 Purpose + +This is the behavioral contract for Obol Stack. It defines what the current branch should do, what it must not do, and how it should degrade when optional dependencies are absent. + +### 1.2 Reading Guide + +Behavior entries use: +- **Trigger**: what starts the behavior +- **Expected**: what the system should do +- **Rationale**: why the behavior matters + +Cross-references use `SPEC SS X.Y`, pointing to [SPEC.md](SPEC.md). + +### 1.3 Behavioral Priorities + +The behavior model is ordered by actor priority: +1. local operator +2. agent developer +3. remote buyer + +When tradeoffs conflict, operator safety and recoverability win. + +--- + +## 2. Desired Behaviors + +### 2.1 Stack Lifecycle + +> SPEC SS 3.1 + +#### B-2.1.1: Stack initialization persists a stable cluster identity + +**Trigger**: The operator runs `obol stack init`. +**Expected**: The CLI writes a stack ID, backend selection, and rendered defaults into the config directory. If `--force` is used against an existing stack, the stack ID is preserved unless the operator explicitly purges the stack. +**Rationale**: Persistent identity keeps local state, directory naming, and LiteLLM master-key derivation stable. + +#### B-2.1.2: Stack startup deploys defaults before optional public exposure + +**Trigger**: The operator runs `obol stack up`. +**Expected**: The cluster starts, baseline infrastructure is deployed through Helmfile, LiteLLM is auto-configured when possible, the default OpenClaw instance is prepared, and the tunnel remains dormant unless a persistent DNS tunnel was previously provisioned. +**Rationale**: Local operation is the primary mode. Public exposure must not be a prerequisite for core startup. + +#### B-2.1.3: Purge preserves data unless the operator explicitly requests destruction + +**Trigger**: The operator runs `obol stack purge`. 
+**Expected**: Config is removed and the cluster is destroyed, but persistent data survives unless `--force` is used. +**Rationale**: Wallets and agent state are valuable and must not be destroyed by the ordinary cleanup path. + +### 2.2 LLM Routing + +> SPEC SS 3.2 + +#### B-2.2.1: LiteLLM acts as the central operator-facing model gateway + +**Trigger**: An OpenClaw instance or operator-configured route needs model access. +**Expected**: Requests are routed through LiteLLM rather than per-instance ad hoc provider wiring. +**Rationale**: Central routing reduces duplication, keeps provider config consistent, and enables the static buy-side `paid/*` namespace. + +#### B-2.2.2: Model auto-configuration is best-effort, not mandatory + +**Trigger**: `stack up` runs on a host with or without Ollama or cloud credentials. +**Expected**: When models or credentials are discoverable they are applied automatically; otherwise the stack still starts and the operator can configure providers later. +**Rationale**: Startup should remain recoverable even when optional provider dependencies are absent. + +#### B-2.2.3: Custom OpenAI-compatible endpoints are validated before they are added + +**Trigger**: The operator runs `obol model setup custom ...`. +**Expected**: The endpoint is validated before it becomes part of the LiteLLM route set. +**Rationale**: Broken model entries create confusing downstream failures for operators and agents. + +### 2.3 Network Management + +> SPEC SS 3.3 + +#### B-2.3.1: Local installable networks and remote RPC aliases remain distinct + +**Trigger**: The operator uses `obol network install`, `list`, `add`, or `remove`. +**Expected**: Local deployable networks come only from embedded network bundles, while remote RPC aliases are resolved from the ChainList alias map and public RPC discovery flow. +**Rationale**: Treating these as separate prevents invalid support claims and operator confusion. 
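As a non-normative illustration of B-2.3.1, the two chain domains can be kept distinct in code rather than merged into one "is supported?" check. The sets below are illustrative samples taken from this document, and `domainFor` is a hypothetical helper, not shipped code:

```go
package main

import "fmt"

// Illustrative only: local installable networks come from embedded bundles,
// remote RPC aliases from the ChainList alias map. The two sets stay separate.
var localNetworks = map[string]bool{"ethereum": true, "aztec": true}
var remoteAliases = map[string]bool{"base": true, "mainnet": true, "polygon": true}

// domainFor answers which domain a name belongs to, preserving the distinction
// instead of collapsing both sources into a single "supported networks" set.
func domainFor(name string) string {
	switch {
	case localNetworks[name] && remoteAliases[name]:
		return "both (distinct flows: install vs. add)"
	case localNetworks[name]:
		return "local installable network"
	case remoteAliases[name]:
		return "remote RPC alias"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(domainFor("aztec"))
	fmt.Println(domainFor("base"))
}
```

A name that resolves to neither domain is reported as unknown rather than being guessed into one of them, which is the behavior B-2.3.1 is protecting.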
+ +#### B-2.3.2: Public RPC writes are blocked by default + +**Trigger**: The operator adds a remote chain without `--allow-writes`. +**Expected**: eRPC write methods remain blocked on that chain. +**Rationale**: Read-only defaults reduce the chance of accidental live transactions. + +#### B-2.3.3: Network status reflects current command semantics, not idealized per-deployment views + +**Trigger**: The operator runs `obol network status`. +**Expected**: The command reports current eRPC gateway health and upstream counts; it does not pretend to be a per-deployment local-node dashboard unless the implementation adds that contract. +**Rationale**: The spec must match the current CLI surface exactly. + +### 2.4 OpenClaw Runtime + +> SPEC SS 3.4 + +#### B-2.4.1: The default OpenClaw instance is the canonical elevated agent runtime + +**Trigger**: `stack up` completes on a branch where the default agent can be configured. +**Expected**: The `obol-agent` instance is created or re-synced and then elevated via RBAC patching and heartbeat injection. +**Rationale**: Monetization and cluster-aware agent behavior rely on a single canonical elevated runtime. + +#### B-2.4.2: Additional OpenClaw instances remain operator-managed deployments + +**Trigger**: The operator uses `obol openclaw onboard`, `sync`, `delete`, `dashboard`, `token`, `skills`, or wallet flows. +**Expected**: Instance selection, deployment directories, dashboard URLs, skills, and tokens are all managed through the CLI and persisted under managed config directories. +**Rationale**: OpenClaw instances are part of the platform control surface, not transient ad hoc workloads. + +### 2.5 Sell-Side Monetization + +> SPEC SS 3.5 + +#### B-2.5.1: Sell-side resources are created in the namespace the operator chose + +**Trigger**: The operator runs `obol sell http --namespace ...`. +**Expected**: The resulting `ServiceOffer` is created in `<namespace>` and references the chosen upstream namespace explicitly.
+**Rationale**: Namespace is an operator intent field and cannot be silently rewritten by the implementation or docs. + +#### B-2.5.2: Reconciliation advances through six explicit stages + +**Trigger**: A `ServiceOffer` is created or updated. +**Expected**: The offer advances through `ModelReady`, `UpstreamHealthy`, `PaymentGateReady`, `RoutePublished`, `Registered`, and `Ready`, with status updates visible to operators. +**Rationale**: Operators need a clear progress model for debugging sell-side failures. + +#### B-2.5.3: Registration failure degrades gracefully when possible + +**Trigger**: Registration is enabled but signer, gas, or RPC prerequisites are missing. +**Expected**: The service can remain payment-gated and publicly described with `Registered=True` and an `OffChainOnly` reason when that degraded path applies. +**Rationale**: Public discovery should not be all-or-nothing when the on-chain mint path is temporarily unavailable. + +#### B-2.5.4: Probe verifies the payment gate without consuming buyer budget + +**Trigger**: The operator runs `obol sell probe -n <namespace>`. +**Expected**: The command sends an unauthenticated request, expects a `402` pricing response, and confirms that the route is live and payment-gated. +**Rationale**: Operators need a cheap verification path before involving a real buyer flow. + +### 2.6 Buy-Side Payments + +> SPEC SS 3.6 + +#### B-2.6.1: Paid model routing uses a static public namespace + +**Trigger**: An agent requests `paid/<model>` through LiteLLM. +**Expected**: LiteLLM resolves the request to the buyer sidecar without requiring dynamic model-list rewrites for every purchased upstream. +**Rationale**: Static public naming keeps the buy-side integration simple and operationally stable. + +#### B-2.6.2: Buyer runtime spending is bounded by the pre-signed auth pool + +**Trigger**: The buyer sidecar serves paid requests. +**Expected**: It consumes only pre-signed authorizations and cannot mint new spend authority at runtime.
+**Rationale**: This is the key safety property of the buy-side design. + +#### B-2.6.3: Unmapped paid models fail explicitly + +**Trigger**: A request arrives for `paid/<model>` that does not map to a purchased upstream. +**Expected**: The request fails with a clear not-found style response rather than silently drifting to another provider. +**Rationale**: Silent fallback would break spending and trust assumptions. + +### 2.7 Tunnel, Discovery, Frontend, and Monitoring + +> SPEC SS 3.7 + +#### B-2.7.1: Quick tunnel activation is demand-driven + +**Trigger**: The stack starts without a pre-provisioned DNS tunnel. +**Expected**: Cloudflared remains dormant until a sell flow requires public exposure or the operator starts it manually. +**Rationale**: The operator should not pay the complexity cost of public exposure before it is needed. + +#### B-2.7.2: Public discovery metadata reflects the current tunnel URL + +**Trigger**: A tunnel URL becomes available or changes. +**Expected**: The stack updates `AGENT_BASE_URL` and syncs frontend-readable configuration so generated registration documents point at the current public origin. +**Rationale**: Discovery documents must describe reachable public endpoints. + +#### B-2.7.3: Frontend remains local-only unless the architecture changes deliberately + +**Trigger**: The operator accesses the dashboard. +**Expected**: The frontend is served under `obol.stack` and is not exposed by the public tunnel path. +**Rationale**: The frontend is an operator control surface, not a public buyer surface. + +### 2.8 Managed Applications and Supporting Operations + +> SPEC SS 3.8 + +#### B-2.8.1: Managed applications behave like named, persistent deployments + +**Trigger**: The operator runs `obol app install`, `sync`, `list`, or `delete`. +**Expected**: Chart references are resolved, persisted under managed config paths, and deployed or removed through explicit CLI flows.
+**Rationale**: Application management should match the rest of the stack’s declarative local-state model. + +--- + +## 3. Undesired Behaviors + +### 3.1 Exposure and Safety + +#### U-3.1.1: Local-only operator routes are reachable through the public tunnel + +**Trigger**: Route configuration removes or bypasses `obol.stack` hostname restrictions for frontend, eRPC, or monitoring. +**Expected**: The change is rejected or treated as a critical regression. +**Risk**: Public exposure of operator-only surfaces weakens the main trust boundary of the stack. + +#### U-3.1.2: Remote RPC write capability is enabled by default + +**Trigger**: A newly added public RPC upstream forwards write methods without explicit opt-in. +**Expected**: Write methods remain blocked unless the operator used `--allow-writes`. +**Risk**: Unintended live-chain transactions become possible through a read-mostly operator flow. + +#### U-3.1.3: Buyer runtime receives live signing authority + +**Trigger**: Runtime changes allow the sidecar to contact the remote signer or mint new spend approvals. +**Expected**: The runtime remains restricted to pre-signed authorizations only. +**Risk**: The bounded-spend trust model collapses. + +### 3.2 Contract Drift + +#### U-3.2.1: Documentation claims operator support that the CLI does not ship + +**Trigger**: Specs or guides describe commands, flags, or supported chains that do not exist in the branch. +**Expected**: The canonical bundle is corrected to the current code surface, with future work moved into phased sections. +**Risk**: Operators make invalid assumptions and the spec stops being implementation-ready. + +--- + +## 4. Edge Cases + +### 4.1 Startup and Operator Recovery + +#### E-4.1.1: No local model provider is immediately available + +**Scenario**: The stack starts without Ollama models and without imported cloud credentials. 
+**Expected Handling**: Core infrastructure still starts; OpenClaw setup may be skipped or remain partially configured until the operator runs explicit provider setup. +**Rationale**: Provider absence should not destroy the local operator path. + +#### E-4.1.2: Helmfile sync fails during startup + +**Scenario**: Default infrastructure deployment fails mid-startup. +**Expected Handling**: The stack automatically runs a cleanup-oriented shutdown path. +**Rationale**: A half-started cluster is more dangerous than a failed startup. + +### 4.2 Payments and Registration + +#### E-4.2.1: Registration wallet has no gas + +**Scenario**: A service is ready for publication but the registration wallet cannot submit an on-chain transaction. +**Expected Handling**: The service degrades to `OffChainOnly` rather than disappearing entirely. +**Rationale**: Discovery metadata is still valuable even when chain settlement is temporarily blocked. + +#### E-4.2.2: Buyer auth pool is exhausted + +**Scenario**: A purchased upstream has no remaining signed authorizations. +**Expected Handling**: Requests fail explicitly until the operator or agent refills the pool. +**Rationale**: Silent fallback would break billing and hide a capacity problem. + +### 4.3 Selection Ambiguity + +#### E-4.3.1: Multiple deployments of the same type exist + +**Scenario**: The operator has multiple OpenClaw instances, app deployments, or network deployments. +**Expected Handling**: Commands auto-select only when there is exactly one unambiguous target; otherwise they require the operator to specify the target. +**Rationale**: Ambiguous automation is more dangerous than an extra required argument. + +--- + +## 5. 
Performance Expectations + +| Behavior | Target | Measurement | Degradation Handling | +|----------|--------|-------------|---------------------| +| ChainList discovery | bounded by 15s timeout | `internal/network/chainlist.go` timeout | operator retries with custom endpoint | +| Tunnel startup | bounded by 30s rollout wait | `tunnel.EnsureRunning()` rollout status | local path remains available | +| LiteLLM restart | bounded by 90s rollout wait | `model.RestartLiteLLM()` rollout status | operator reruns provider setup or inspects deployment | +| Buyer metrics visibility | 30s scrape interval | PodMonitor config | stale metrics do not block inference | + +--- + +## 6. Guardrail Definitions + +### 6.1 Non-Negotiable Guardrails + +| Guardrail | Rule | Enforcement | Violation Response | +|-----------|------|-------------|-------------------| +| Local-only surfaces | Frontend, eRPC, and monitoring stay behind `obol.stack` hostname restrictions | route templates, review, spec bundle | treat as critical regression | +| Static paid namespace | Buy-side public names remain `paid/` | LiteLLM config model, buyer sidecar routing | reject drifting implementations | +| Namespace fidelity | `sell http --namespace ` creates the `ServiceOffer` in `` | CLI manifest generation | treat mismatched docs or code as bug | +| Phase discipline | future behavior must live in phased sections or ADR follow-ups | canonical root-level bundle and hooks | block or fix before merge | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index fad91f69..4b2458ea 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,90 +1,171 @@ -# Contributing to Blockchain Helm Charts +# Contributing to Obol Stack -Thank you for considering contributing to this project! This document provides guidelines to help you contribute effectively. +This document defines the non-negotiable contribution rules for the consolidated Obol Stack codebase and spec bundle. -## Getting Started +--- -### Prerequisites +## 1. 
Canonical Documents -- Kubernetes knowledge -- Helm chart development experience -- Understanding of the specific blockchain client you're creating/modifying a chart for +The canonical specification bundle lives at repo root: -### Development Environment +- `SPEC.md` +- `ARCHITECTURE.md` +- `BEHAVIORS_AND_EXPECTATIONS.md` +- `CONTRIBUTING.md` +- `features/` +- `docs/adr/` -1. Install [Helm](https://helm.sh/docs/intro/install/) -2. Install [kubectl](https://kubernetes.io/docs/tasks/tools/) -3. Set up a Kubernetes environment (minikube, kind, or a cloud provider) +Supporting material in `docs/guides/` can remain useful, but it is **not** authoritative once the root-level bundle covers the same topic. +Planning or architecture notes must be folded into `SPEC.md` phase sections or `docs/adr/` instead of living as parallel sources of truth under `docs/plans/` or `plans/`. -## Chart Development Guidelines +If code and docs disagree: +- code is the temporary source of truth +- the root-level bundle must be updated in the same change or immediately after -### Chart Structure +--- -Each chart should follow this structure: -``` -charts// -├── Chart.yaml -├── values.yaml -├── templates/ -│ ├── deployment.yaml -│ ├── service.yaml -│ ├── configmap.yaml -│ ├── secret.yaml (if needed) -│ ├── pvc.yaml (if needed) -│ └── NOTES.txt -├── OWNERS (maintainers list) -└── README.md (chart-specific documentation) -``` +## 2. Actor Priority + +When making product or UX tradeoffs, preserve this order: + +1. Local operator +2. Agent developer +3. Remote buyer + +This affects: +- defaults +- failure handling +- public exposure rules +- CLI ergonomics +- phased rollout decisions + +--- + +## 3. 
Documentation Update Rules + +Any change touching these areas is spec-impacting and must update the canonical bundle when it changes behavior: -### Requirements +- `cmd/obol/` +- `internal/stack/` +- `internal/model/` +- `internal/network/` +- `internal/openclaw/` +- `internal/agent/` +- `internal/x402/` +- `internal/x402/buyer/` +- `internal/tunnel/` +- `internal/erc8004/` +- `internal/inference/` +- `internal/embed/infrastructure/` +- `internal/embed/skills/` +- `internal/app/` +- `internal/schemas/` -- Charts must be compatible with Helm 3 -- Include comprehensive documentation -- Provide sensible defaults in values.yaml -- Include proper Kubernetes resource requests and limits -- Follow security best practices +Rules: +- describe only behavior that is actually implemented on the branch +- move future work into `Phase 2+` sections and ADR follow-ups +- do not silently broaden support claims +- do not collapse different chain domains into one “supported networks” statement -### Values.yaml +Current chain domains that must stay distinct: +- installable local networks +- eRPC remote RPC aliases +- sell-side payment chains +- ERC-8004 registration networks -- Group related values logically -- Add comments explaining the purpose of values -- Include sensible defaults that work out-of-the-box -- Provide examples for custom configurations +--- -## Pull Request Process +## 4. Feature and ADR Discipline -1. Fork the repository -2. Create a new branch for your changes -3. Make your changes following the chart development guidelines -4. Test your charts thoroughly -5. Submit a pull request -6. Address review comments +Feature files: +- live under `features/` +- start with `@bdd` +- reference both `SPEC.md` and `BEHAVIORS_AND_EXPECTATIONS.md` +- use `@phase1`, `@phase2`, etc. 
when phases matter -### Pull Request Checklist +ADRs: +- live under `docs/adr/` +- record durable architectural decisions, not transient implementation chatter +- must note the affected `SPEC.md` sections -- [ ] Chart version updated according to semantic versioning -- [ ] Chart README.md updated with any new values or changes -- [ ] Chart has been tested and verified to work -- [ ] `helm lint` passes without warnings -- [ ] `helm template` generates valid Kubernetes resources +--- -## Testing Your Chart +## 5. Development Expectations + +Baseline validation before sending a substantial code change: + +```bash +go build ./... +go test ./... +``` + +When the change touches the monetization path, strongly prefer validating one or more of: ```bash -# Lint the chart -helm lint charts/your-chart +./flows/flow-06-sell-setup.sh +./flows/flow-07-sell-verify.sh +./flows/flow-08-buy.sh +./flows/flow-10-anvil-facilitator.sh +``` -# Render the templates -helm template charts/your-chart +When the change touches embedded skills or sell-side metadata, also consider: -# Install the chart in a test environment -helm install test-release charts/your-chart --dry-run +```bash +python3 tests/skills_smoke_test.py +python3 tests/test_sell_registration_metadata.py +python3 tests/test_autoresearch_worker.py ``` -## Code of Conduct +--- + +## 6. Security and Exposure Guardrails + +Never merge a change that: +- exposes frontend, eRPC, monitoring, or similar operator surfaces to the public tunnel +- gives the buyer sidecar live signer access +- changes sell-side chain support claims without updating both CLI behavior and docs +- enables write-capable public RPC forwarding by default +- removes the `OffChainOnly` degradation path without a replacement operator-safe fallback + +--- + +## 7. Hook-Based Drift Detection + +Repo-local Codex hooks should be treated as guardrails, not as a substitute for human judgment. 
+ +Intended behavior: +- session-start hooks remind Codex that the root-level bundle is canonical +- stop hooks block or warn when spec-impacting code changed but the canonical bundle did not + +To enable Codex hooks locally: + +```toml +# ~/.codex/config.toml +[features] +codex_hooks = true +``` + +The repository hook entrypoint is: + +- `.codex/hooks.json` + +Hook scripts belong under: + +- `.codex/hooks/` + +This repository currently ships: + +- `.codex/hooks/workspace_context.py` +- `.codex/hooks/stop_spec_sync.py` + +If hooks and code ever disagree, fix the hooks or the bundle. Do not paper over the mismatch. -Please respect other contributors and maintain a positive environment for everyone. +--- -## Thank You +## 8. Pull Request Checklist -Your contributions help make this project better for everyone! +- [ ] Behavior changes are reflected in the root-level canonical bundle +- [ ] Future work is isolated into phases or ADR follow-ups +- [ ] Operator-facing claims match the actual CLI and runtime surface +- [ ] Security exposure boundaries were preserved +- [ ] Tests or flow validations were run, or the omission is explicitly stated diff --git a/SPEC.md b/SPEC.md new file mode 100644 index 00000000..fb875d89 --- /dev/null +++ b/SPEC.md @@ -0,0 +1,709 @@ +# Obol Stack Technical Specification + +**Version**: 1.0.0-pr288 +**Status**: Living document +**Last Updated**: 2026-03-29 + +This document is the authoritative technical specification for Obol Stack on the PR `#288` integration baseline. It describes the system that is actually implemented on this branch, with future work isolated into explicit phased rollout items and ADR follow-ups. + +Primary actor priority: +- Local operator +- Agent developer +- Remote buyer + +--- + +## Table of Contents + +1. [Introduction](#1-introduction) +2. [System Architecture](#2-system-architecture) +3. [Core Subsystems](#3-core-subsystems) +4. [API / Protocol Definition](#4-api--protocol-definition) +5. 
[Data Model](#5-data-model) +6. [Integration Points](#6-integration-points) +7. [Security Model](#7-security-model) +8. [Error Handling](#8-error-handling) +9. [Performance and Operations](#9-performance-and-operations) +10. [Phased Rollout](#10-phased-rollout) +11. [Testing Strategy](#11-testing-strategy) + +--- + +## 1. Introduction + +### 1.1 Purpose + +Obol Stack is a local-first Kubernetes platform for running AI agent infrastructure, blockchain connectivity, payment-gated services, and public discovery from a single operator-controlled machine. This specification defines the expected structure and behavior of the stack as shipped on the PR `#288` branch. + +### 1.2 Scope + +The system: +- Initializes and manages a local `k3d` or `k3s` cluster from an XDG-compliant CLI. +- Deploys default infrastructure: Traefik, eRPC, LiteLLM, Cloudflare tunnel connector, monitoring, frontend, and OpenClaw. +- Lets the operator configure local and cloud model providers through a central LiteLLM gateway. +- Lets the operator install local blockchain nodes and add remote RPC upstreams to eRPC. +- Runs OpenClaw instances with embedded skills, wallet management, and an elevated default `obol-agent`. +- Sells local services through x402 payment gates and optional ERC-8004 registration. +- Buys remote x402-gated inference through a bounded-risk sidecar pattern. +- Exposes local-only and public routes with different trust boundaries. +- Installs arbitrary Helm charts as managed applications. + +The system does **not**: +- Operate as a hosted multi-tenant SaaS control plane. +- Assume public exposure is required for the core local operator path. +- Guarantee exact token metering for every pricing model in the current phase. +- Treat every chain known to eRPC or `internal/x402` as an operator-supported sell-side CLI chain. +- Replace direct Kubernetes administration for users who want bespoke cluster changes outside Obol-managed paths. 
+ +### 1.3 Personas + +| Persona | Goal | Primary Interfaces | +|--------|------|--------------------| +| Local operator | Bring up the stack, manage infra, expose services, inspect health | `obol` CLI, `http://obol.stack`, tunnel URL | +| Agent developer | Deploy and tune OpenClaw instances, skills, wallets, model routes | `obol openclaw`, `obol model`, embedded skills | +| Remote buyer | Discover and pay for a service or remote model | x402-gated HTTP endpoints, `paid/<model>` through LiteLLM | + +### 1.4 Terminology and Glossary + +| Term | Definition | +|------|-----------| +| **Stack ID** | Petname-based identifier persisted in `$OBOL_CONFIG_DIR/.stack-id`; used for cluster identity and LiteLLM master key derivation. | +| **Backend** | Local cluster runtime: `k3d` (Docker-based) or `k3s` (bare-metal). | +| **ServiceOffer** | Namespaced CRD (`obol.org/v1alpha1`) describing a sell-side service, payment terms, route path, provenance, and registration metadata. | +| **eRPC** | In-cluster blockchain RPC gateway that multiplexes local node and public RPC upstreams. | +| **LiteLLM** | Central OpenAI-compatible model gateway in the `llm` namespace. | +| **OpenClaw instance** | A deployed AI agent runtime managed through `obol openclaw ...`. | +| **obol-agent** | The canonical default OpenClaw instance with elevated RBAC and a heartbeat-based reconciliation loop. | +| **x402-verifier** | ForwardAuth service that matches routes, emits `402 Payment Required`, and delegates verification to a facilitator. | +| **x402-buyer** | Sidecar in the LiteLLM pod that attaches pre-signed payment headers to paid upstream requests. | +| **Remote signer** | In-cluster signing service used by OpenClaw and registration flows; separate from the buyer sidecar. | +| **AGENT_BASE_URL** | Environment variable injected into the default agent deployment so generated registration documents use the current tunnel URL.
| + +### 1.5 System Constraints + +| Constraint | Detail | +|-----------|--------| +| **Local-first execution** | The operator machine is the source of truth; cluster state, skills, wallet material, and configuration are rooted in local XDG paths. | +| **Actor priority** | The local operator path takes precedence over agent-developer ergonomics, which in turn take precedence over remote-buyer convenience. | +| **Backend exclusivity** | The stack supports exactly one active backend per config directory: `k3d` or `k3s`. Backend switching must tear down the old cluster first. | +| **Public exposure is optional** | The quick tunnel is dormant by default and only activates on sell flows unless a persistent DNS tunnel was provisioned. | +| **Chain domains are distinct** | Local installable networks, eRPC remote RPC aliases, sell-side payment chains, and ERC-8004 registration networks are related but not interchangeable. | +| **Least-public routing** | Frontend, eRPC, monitoring, and other operator surfaces are local-only under `hostnames: ["obol.stack"]`; public tunnel surfaces are intentionally narrower. | +| **Destructive cleanup is explicit** | `stack purge` preserves data by default; deleting root-owned persistent data requires `--force` and `sudo`. | +| **Phase discipline** | Future work must be recorded in explicit phase sections or ADR follow-ups, not blended into current-shipping behavior. | + +--- + +## 2. System Architecture + +### 2.1 High-Level Overview + +Obol Stack is a single-node, operator-managed platform with three concentric planes: + +1. **Control plane**: the `obol` CLI and XDG filesystem state. +2. **Cluster plane**: Traefik, LiteLLM, eRPC, OpenClaw, x402 services, frontend, monitoring, and Cloudflare tunnel connector. +3. **External plane**: Ollama and cloud LLM providers, ChainList, x402 facilitator, Cloudflare, and EVM chains used for payment or registration. 
+ +### 2.2 Module Decomposition + +| Module | Purpose | Key Dependencies | +|--------|---------|-----------------| +| `cmd/obol` | User-facing CLI surface | `internal/*`, `urfave/cli/v3` | +| `internal/stack` | Stack init/up/down/purge, backend management, default infra sync | `internal/embed`, `internal/model`, `internal/openclaw`, `internal/agent`, `internal/tunnel` | +| `internal/model` | LiteLLM provider configuration and model synchronization | Kubernetes ConfigMaps/Secrets, Ollama, cloud APIs | +| `internal/network` | Local node deployment and eRPC remote upstream management | Embedded network charts, ChainList, eRPC ConfigMap | +| `internal/openclaw` | Instance onboarding, overlays, dashboard, token, skills, wallet flows | Helmfile, embedded skills, DNS, LiteLLM | +| `internal/agent` | Elevates the default OpenClaw instance with monetization RBAC and heartbeat behavior | Kubernetes RBAC, local data volume | +| `internal/x402` | Sell-side verifier, pricing config, watcher, setup, metrics | x402 facilitator, Traefik ForwardAuth | +| `internal/x402/buyer` | Buy-side paid upstream proxy with pre-signed auth pools | LiteLLM, remote sellers, ConfigMaps | +| `internal/erc8004` | Registration clients, network registry, types, signer integration | eRPC, registry contracts, remote signer | +| `internal/tunnel` | Quick and DNS Cloudflare tunnel lifecycle | cloudflared, Cloudflare APIs, frontend ConfigMap | +| `internal/app` | Managed application install/sync/list/delete | ArtifactHub, OCI/HTTP charts, Helmfile | + +### 2.3 Critical Lifecycles + +#### 2.3.1 Operator Startup Lifecycle + +1. `obol stack init` creates config directories, chooses a backend, writes `.stack-id` and `.stack-backend`, and materializes default infrastructure templates. +2. `obol stack up` starts the local cluster and writes `kubeconfig.yaml`. +3. `syncDefaults()` deploys baseline infrastructure via Helmfile. +4. 
`autoConfigureLLM()` patches LiteLLM for detected Ollama models and imported cloud credentials. +5. `openclaw.SetupDefault()` creates or re-syncs the default `obol-agent` instance. +6. `agent.Init()` patches monetization RBAC and injects `HEARTBEAT.md`. +7. DNS is configured for `obol.stack` and, if provisioned, a persistent tunnel is started. + +#### 2.3.2 Sell-Side Lifecycle + +1. The operator creates a sell surface using `obol sell http ...` or `obol sell inference ...`. +2. A `ServiceOffer` CR or host-side gateway deployment is created and persisted. +3. The `monetize.py` reconciler evaluates the offer through `ModelReady`, `UpstreamHealthy`, `PaymentGateReady`, `RoutePublished`, `Registered`, and `Ready`. +4. Traefik routes public traffic through x402 ForwardAuth. +5. If registration is enabled, `/.well-known/agent-registration.json` is published and on-chain registration is attempted. + +#### 2.3.3 Buy-Side Lifecycle + +1. The agent probes a seller to read its 402 pricing response. +2. The agent pre-signs a bounded batch of ERC-3009 authorizations through the remote signer. +3. Buyer config and auth pools are stored in `llm` namespace ConfigMaps. +4. LiteLLM receives a request for `paid/<model>` and forwards to the local sidecar. +5. The sidecar retries the upstream request with `X-PAYMENT`, consumes one auth, and tracks remaining budget. + +--- + +## 3. Core Subsystems + +### 3.1 Stack Lifecycle + +#### 3.1.1 Purpose + +Provide a single CLI-managed entry point for provisioning, starting, stopping, and destroying the full local stack. + +#### 3.1.2 Inputs and Outputs + +Inputs: +- XDG or `OBOL_*` path environment variables. +- Backend selection (`k3d` or `k3s`). +- Local prerequisites such as Docker or local filesystem access. + +Outputs: +- Stack config under `$OBOL_CONFIG_DIR`. +- Persistent data under `$OBOL_DATA_DIR`. +- Runtime state under `$OBOL_STATE_DIR`. +- A working kubeconfig and running cluster.
+ +#### 3.1.3 Startup Sequence + +`stack up` is intentionally opinionated: +- Start backend. +- Write kubeconfig. +- Run Helmfile over embedded defaults. +- Auto-configure LiteLLM. +- Create or refresh default OpenClaw. +- Apply agent capabilities. +- Configure local DNS. +- Start tunnel only when persistent DNS state already exists. + +A Helmfile failure is treated as fatal and triggers an automatic `stack down` cleanup path. + +#### 3.1.4 Shutdown and Purge + +- `stack down` stops the cluster and DNS helper but preserves config and data. +- `stack purge` destroys cluster state and removes config. +- `stack purge --force` additionally removes persistent data and prompts for wallet backup before destruction. + +### 3.2 LLM Routing and Provider Management + +#### 3.2.1 Purpose + +Centralize model routing through one OpenAI-compatible gateway so OpenClaw instances and paid model paths use a single runtime interface. + +#### 3.2.2 Provider Model + +Supported provider classes on this branch: +- `ollama` +- `anthropic` +- `openai` +- custom OpenAI-compatible endpoints + +Key properties: +- LiteLLM config lives in `litellm-config` ConfigMap in namespace `llm`. +- Provider secrets live in `litellm-secrets`. +- Auto-discovery during `stack up` is best-effort, not required for later manual setup. +- After provider changes, configured models are synchronized back into OpenClaw overlays to avoid route drift. + +#### 3.2.3 Static Paid Namespace + +The buy-side path is intentionally static: +- Public model names are always `paid/`. +- LiteLLM keeps a permanent wildcard route. +- Purchased model changes update buyer ConfigMaps, not LiteLLM topology. + +This keeps the payment path isolated from the rest of model routing. 
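+A minimal sketch of that routing split (the maps stand in for the buyer ConfigMaps and the LiteLLM routing table; the function is illustrative, not a real API):
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// route models the static split described above: names under the permanent
+// "paid/" prefix resolve only against the buyer's purchased upstreams, while
+// every other model name resolves against the ordinary LiteLLM routing table.
+func route(model string, paidUpstreams, litellmRoutes map[string]string) (string, bool) {
+	if rest, ok := strings.CutPrefix(model, "paid/"); ok {
+		upstream, found := paidUpstreams[rest]
+		return upstream, found // no purchased match: the buyer sidecar answers 404
+	}
+	upstream, found := litellmRoutes[model]
+	return upstream, found
+}
+
+func main() {
+	paid := map[string]string{"remote-model": "https://seller.example/v1"}
+	local := map[string]string{"llama3": "http://ollama.llm.svc:11434"}
+	fmt.Println(route("paid/remote-model", paid, local))
+	fmt.Println(route("llama3", paid, local))
+}
+```
+
+Because the prefix check never consults LiteLLM state, changing purchased models only mutates buyer configuration, which is exactly the isolation property claimed above.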
+ +### 3.3 Network Management and eRPC + +#### 3.3.1 Chain Domains + +Obol Stack uses four separate chain domains: + +| Domain | Current Source of Truth | Examples | +|-------|--------------------------|----------| +| Local installable networks | `internal/embed/networks/` | `ethereum`, `aztec` | +| eRPC remote RPC aliases | `internal/network/chainlist.go` | `base`, `mainnet`, `polygon`, `avalanche`, `hoodi` | +| Sell-side payment chains | `cmd/obol/sell.go` | `base-sepolia`, `base`, `ethereum` | +| ERC-8004 registration chains | `internal/erc8004/networks.go` | `base-sepolia`, `base`, `ethereum` | + +Documentation and behavior must not collapse these into a single “supported networks” statement. + +#### 3.3.2 Local Networks + +Local installable networks are embedded Helmfile/chart bundles. On this branch: +- `ethereum` +- `aztec` + +`obol network install ` renders `values.yaml` from annotated templates, copies the network bundle into `$OBOL_CONFIG_DIR/networks///`, and waits for explicit `network sync` to deploy it. + +#### 3.3.3 Remote RPC Networks + +`obol network add ` uses ChainList to fetch public HTTPS RPCs, filters and ranks them, and writes them into eRPC configuration. By default: +- only free/public HTTPS endpoints are accepted +- full-tracking endpoints are rejected +- write methods remain blocked + +`network remove` removes ChainList-sourced upstreams for a chain without touching local node upstreams or custom endpoints. + +#### 3.3.4 Route Exposure + +eRPC is exposed locally at `http://obol.stack/rpc` behind Traefik. Traffic is still passed through the x402 middleware path, but the verifier returns `200` for unmatched routes or routes with no active pricing rule. + +### 3.4 OpenClaw Runtime and Agent Capabilities + +#### 3.4.1 Purpose + +Manage AI agent instances as first-class stack workloads with operator-controlled overlays, credentials, skills, and wallets. 
+ +#### 3.4.2 Instance Model + +OpenClaw instances are stored under: +- `$OBOL_CONFIG_DIR/applications/openclaw//` + +Each instance has: +- Helmfile deployment metadata +- Obol overlay values +- optional imported provider/channel settings +- skill injection into persistent volume paths + +The canonical default instance is `obol-agent`. It is re-synced idempotently by `stack up`. + +#### 3.4.3 Agent Elevation + +`agent.Init()` does not create a separate controller binary. Instead it: +- patches monetization ClusterRoleBindings and a pricing RoleBinding +- injects `HEARTBEAT.md` into the default agent workspace so heartbeat cycles run `monetize.py process --all --quick` + +This makes monetization behavior part of the default agent runtime, not a parallel control plane. + +#### 3.4.4 Operator Surfaces + +Key instance operations: +- onboard or scaffold +- sync +- retrieve or regenerate gateway token +- open dashboard +- manage skills +- backup or restore wallet material +- shell out to the embedded OpenClaw CLI + +### 3.5 Sell-Side Monetization + +#### 3.5.1 Purpose + +Expose local services through x402 payment gates and optional ERC-8004 public discovery without requiring a separate Kubernetes operator binary. + +#### 3.5.2 Operator Commands + +Current sell-side CLI surface: +- `sell inference` +- `sell http` +- `sell list` +- `sell status` +- `sell probe` +- `sell stop` +- `sell delete` +- `sell pricing` +- `sell register` + +#### 3.5.3 ServiceOffer CRD + +`ServiceOffer` is the declarative contract for sell-side workloads. Required fields are: +- `spec.upstream` +- `spec.payment` + +Optional but meaningful fields are: +- `spec.type` +- `spec.model` +- `spec.provenance` +- `spec.path` +- `spec.registration` + +Status includes: +- `conditions[]` +- `endpoint` +- `agentId` +- `registrationTxHash` + +#### 3.5.4 Reconciliation Stages + +The current skill-driven reconcile loop uses these stages: +1. `ModelReady` +2. `UpstreamHealthy` +3. `PaymentGateReady` +4. 
`RoutePublished` +5. `Registered` +6. `Ready` + +Registration is intentionally degradable. If the signer, RPC path, or gas funding is unavailable, the service can remain public and payment-gated with `Registered=True` and reason `OffChainOnly`. + +#### 3.5.5 Pricing Models + +Current pricing models on this branch: +- `perRequest` +- `perMTok` +- `perHour` +- `perEpoch` in schema, but not a first-class operator flow yet + +Current enforcement reality: +- `perRequest` is direct +- `perMTok` is approximated to a request price using `1000` tokens per request +- `perHour` is approximated to a request price using `5` minutes per request in the current monetization skill + +These approximations are current implementation behavior, not future exact-metering guarantees. + +#### 3.5.6 Standalone Inference Gateway + +`sell inference` supports two related paths: +- standalone host-side x402-gated gateway +- cluster-aware mode, where a host-side gateway is wrapped by a `ServiceOffer` and cluster routing + +Optional attestation-related inputs already exist on this branch: +- macOS Secure Enclave +- Linux TEE backends +- provenance metadata for experiment output + +### 3.6 Buy-Side Remote Inference + +#### 3.6.1 Purpose + +Allow agents to pay for remote x402-gated inference without giving the runtime access to live signing keys. 
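+The bounded-risk property of this path reduces to simple arithmetic: each pre-signed authorization can settle at most one request at the seller's unit price. A hedged sketch (the function name and the 6-decimal USDC example are illustrative):
+
+```go
+package main
+
+import "fmt"
+
+// worstCaseSpend caps the buy-side exposure: with only pre-signed ERC-3009
+// authorizations available, the sidecar can never spend more than the number
+// of remaining auths times the unit price, regardless of its behavior.
+// Amounts are in the token's atomic units (e.g. 6-decimal USDC).
+func worstCaseSpend(remainingAuths int, unitPrice uint64) uint64 {
+	if remainingAuths <= 0 {
+		return 0
+	}
+	return uint64(remainingAuths) * unitPrice
+}
+
+func main() {
+	// 40 remaining auths at 0.01 USDC (10_000 atomic units) each.
+	fmt.Println(worstCaseSpend(40, 10_000))
+}
+```
+
+This is why auth-pool exhaustion is a budget event rather than a security event: an empty pool halts spending instead of escalating to a live signer.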
+ +#### 3.6.2 Design + +The buy-side path uses: +- a pre-signing step through the remote signer +- `x402-buyer-config` ConfigMap +- `x402-buyer-auths` ConfigMap +- an `x402-buyer` sidecar in the LiteLLM pod +- a static public model namespace `paid/` + +Runtime properties: +- zero signer access in the sidecar +- bounded spending equal to remaining auth count times unit price +- OpenAI-compatible reverse proxy interfaces +- `/healthz`, `/status`, and `/metrics` endpoints + +### 3.7 Tunnel, Discovery, Frontend, and Monitoring + +#### 3.7.1 Tunnel Modes + +Current tunnel modes: +- `quick`: dormant until a sell flow requires public exposure +- `dns`: persistent hostname-based tunnel created via browser login or API-token provisioning + +When a tunnel URL becomes available, the stack updates: +- `AGENT_BASE_URL` on the default agent deployment +- the frontend configuration ConfigMap + +#### 3.7.2 Public vs Local Routes + +Local-only operator routes: +- `http://obol.stack/` +- `http://obol.stack/rpc` +- monitoring and internal admin surfaces via hostname restriction + +Public tunnel routes: +- `/services//...` +- `/.well-known/agent-registration.json` +- storefront and machine-readable service catalog surfaces + +#### 3.7.3 Frontend and Monitoring + +The stack ships: +- `obol-frontend` namespace for the dashboard +- `monitoring` namespace with kube-prometheus-stack +- a PodMonitor for the buyer sidecar + +The frontend is allowed to discover namespaces, pods, ConfigMaps, Secrets, and `ServiceOffer` resources through an explicit ClusterRoleBinding. + +### 3.8 Application Management and Supporting Operations + +#### 3.8.1 Managed Applications + +`obol app install/sync/list/delete` lets operators treat arbitrary Helm charts as managed workloads under `$OBOL_CONFIG_DIR/applications///`. 
+ +Supported chart references: +- `repo/chart` +- `repo/chart@version` +- `https://.../*.tgz` +- `oci://...` + +#### 3.8.2 Supporting Operations + +The branch also includes: +- update and upgrade commands +- flow scripts validating sell and buy paths +- optional subprojects such as `reth-erc8004-indexer` +- embedded skills for autoresearch-related workloads + +These supporting operations are part of the repository surface, but not all of them are yet first-class operator workflows. + +--- + +## 4. API / Protocol Definition + +### 4.1 CLI Surface + +| Surface | Current Commands | +|--------|-------------------| +| Stack | `stack init`, `stack up`, `stack down`, `stack purge` | +| Agent | `agent init` | +| Models | `model setup`, `model status`, `model sync`, `model pull`, `model list`, `model remove` | +| Networks | `network list`, `network install`, `network sync`, `network delete`, `network add`, `network remove`, `network status` | +| OpenClaw | `openclaw onboard`, `sync`, `token`, `list`, `delete`, `setup`, `dashboard`, `skills`, `wallet`, `cli` | +| Sell | `sell inference`, `http`, `list`, `status`, `probe`, `stop`, `delete`, `pricing`, `register` | +| Tunnel | `tunnel status`, `login`, `provision`, `restart`, `stop`, `logs` | +| Apps | `app install`, `sync`, `list`, `delete` | +| Operations | `update`, `upgrade`, `version`, Kubernetes passthrough commands | + +### 4.2 Kubernetes API and CRDs + +| Interface | Kind | Purpose | +|----------|------|---------| +| `obol.org/v1alpha1` | `ServiceOffer` | Declares sell-side services, pricing, provenance, and registration metadata | +| `gateway.networking.k8s.io/v1` | `HTTPRoute` | Exposes frontend, eRPC, public services, and registration document routes | +| `traefik.io/v1alpha1` | `Middleware` | ForwardAuth integration for x402 payment checks | +| `monitoring.coreos.com/v1` | `PodMonitor` | Scrapes buyer sidecar metrics | + +### 4.3 HTTP and Routing Surfaces + +| Surface | Location | Audience | Notes | 
+|--------|----------|----------|------| +| Frontend | `http://obol.stack/` | Local operator | Local-only hostname restriction | +| eRPC | `http://obol.stack/rpc` | Local operator, agent workloads | Route goes through Traefik middleware path | +| Public service routes | `https:///services//...` | Remote buyer | x402-gated | +| Registration document | `https:///.well-known/agent-registration.json` | Discovery clients | Public, no ForwardAuth | +| Buyer sidecar health | `http://127.0.0.1:8402/healthz` | In-cluster | Sidecar-local | +| Buyer sidecar status | `http://127.0.0.1:8402/status` | In-cluster | Sidecar-local | +| Buyer metrics | `/metrics` on buyer sidecar | Monitoring | Scraped by PodMonitor | + +### 4.4 Authentication and Authorization + +- OpenClaw dashboard and API access use a per-instance gateway token retrievable from `obol openclaw token`. +- Public sell-side routes rely on x402 payment verification rather than a user session. +- Kubernetes mutating actions are performed through local operator credentials or specific service accounts with explicit RBAC. +- The buyer sidecar authenticates payments with pre-signed vouchers, not a live signer. + +### 4.5 Rate Limiting and Quotas + +There is no global user quota service in the current branch. Effective limits are: +- finite pre-signed auth pools on the buy side +- route-level pricing configured in x402 verifier +- workload capacity imposed by local cluster resources and upstream services + +--- + +## 5. 
Data Model + +### 5.1 Filesystem Layout + +| Path | Purpose | +|------|---------| +| `$OBOL_CONFIG_DIR/.stack-id` | Persistent stack identity | +| `$OBOL_CONFIG_DIR/.stack-backend` | Active backend selection | +| `$OBOL_CONFIG_DIR/kubeconfig.yaml` | Cluster access for passthrough tools and CLI operations | +| `$OBOL_CONFIG_DIR/defaults/` | Rendered default infrastructure bundle | +| `$OBOL_CONFIG_DIR/networks///` | Local network deployment config | +| `$OBOL_CONFIG_DIR/applications///` | Managed application or OpenClaw instance config | +| `$OBOL_DATA_DIR/` | Persistent volumes, wallet data, OpenClaw workspaces | +| `$OBOL_STATE_DIR/` | Runtime logs and mutable state | + +### 5.2 Kubernetes Namespaces and Core Resources + +| Namespace | Core Resources | Purpose | +|----------|----------------|---------| +| `traefik` | Traefik, cloudflared, gateway | Ingress and tunnel connector | +| `llm` | LiteLLM, x402-buyer sidecar, buyer PodMonitor | Model gateway and buy-side runtime | +| `erpc` | eRPC, HTTPRoute, metadata ConfigMap | Blockchain RPC gateway | +| `x402` | x402-verifier, pricing config | Sell-side payment verification | +| `openclaw-obol-agent` | Default OpenClaw agent, remote signer | Canonical agent runtime | +| `openclaw-` | Additional OpenClaw instances | User-managed agent runtimes | +| `obol-frontend` | Frontend deployment, HTTPRoute, RBAC | Dashboard | +| `monitoring` | Prometheus stack | Metrics and observability | +| `reloader` | Stakater Reloader | Config/secret-triggered restarts | + +### 5.3 Key ConfigMaps, Secrets, and Documents + +| Object | Purpose | +|-------|---------| +| `litellm-config` | Model routing table for LiteLLM | +| `litellm-secrets` | Cloud provider API keys | +| `erpc-config` | eRPC upstream and network definitions | +| `obol-stack-config` | Frontend-readable stack metadata, including tunnel URL | +| `x402-pricing` | Sell-side route pricing for the verifier | +| `x402-buyer-config` | Buy-side upstream mapping | +| `x402-buyer-auths` 
| Pre-signed authorization pools | +| `cloudflared-tunnel-token` | DNS tunnel token for persistent Cloudflare tunnel | +| `so--registration` ConfigMap | Generated `agent-registration.json` for a ServiceOffer | + +### 5.4 Data Lifecycle + +- Stack identity and backend selection are created at `stack init` and persist until purge. +- Local network and application configs are created before cluster deployment and reused across syncs. +- Wallet material persists across `stack down` and ordinary deletes; explicit backup and restore flows exist for OpenClaw wallets. +- Buy-side auth pools are consumed monotonically and must be refilled. +- Registration JSON is regenerated when a ServiceOffer changes or registration status changes. + +--- + +## 6. Integration Points + +| System | Protocol | Purpose | Failure Mode | +|--------|----------|---------|-------------| +| Ollama | HTTP | Local model serving and discovery | Auto-config skips or later requests fail until operator configures a provider | +| Anthropic / OpenAI | HTTPS | Cloud model routing through LiteLLM | Provider remains unavailable; other routes continue | +| ChainList | HTTPS | Public RPC discovery for eRPC | `network add` fails or requires custom endpoint | +| x402 facilitator | HTTPS | Payment verification and settlement | Sell-side requests fail verification or operator runs verify-only/local test paths | +| Cloudflare | Browser auth, API, tunnel transport | Public exposure | Stack remains locally usable without tunnel | +| EVM chains via eRPC | JSON-RPC | Payments, registration, discovery queries | Registration degrades to off-chain or buyer/seller requests fail upstream | +| ArtifactHub / chart repos / OCI | HTTPS | Managed app installation | App install fails; core stack remains unaffected | + +--- + +## 7. 
Security Model + +### 7.1 Threat Model + +Primary threats: +- accidental public exposure of operator-only routes +- live signing key exposure to runtime components +- unintended mainnet write forwarding +- silent documentation drift that misstates operator guarantees +- orphaned or half-started infrastructure after failed deploys + +### 7.2 Wallet and Signing Boundaries + +- The buyer sidecar has no live signer access. +- Registration and other signing flows prefer the remote signer and may fall back to a private key file only when explicitly invoked. +- Secure Enclave and Linux TEE support exist for standalone inference paths, but are optional. + +### 7.3 Public Exposure Guardrails + +- Frontend, eRPC, monitoring, and similar operator surfaces are local-only under `obol.stack`. +- Public tunnel routes are intentionally narrower and centered on payment-gated services and discovery metadata. +- Quick tunnels are not started eagerly on `stack up`. + +### 7.4 Payment Trust Model + +- x402 payment proofs are verified through a facilitator. +- Non-HTTPS facilitator URLs are rejected except for loopback and container-internal development hosts. +- Route matching is explicit; unmatched routes pass through without payment requirements. + +### 7.5 RBAC Model + +- Default agent elevation is explicit and applied by `agent.Init()`. +- Frontend has a narrow but meaningful ClusterRole for discovery and `ServiceOffer` CRUD. +- Sell-side resource cleanup relies on namespaced ownership and cluster-scoped permissions where required. + +--- + +## 8. 
Error Handling + +### 8.1 Error Categories + +| Category | Example | Handling | +|----------|---------|---------| +| Prerequisite failure | backend missing, cluster not running | CLI exits non-zero with remediation hint | +| Partial deployment failure | Helmfile sync fails | stack auto-runs cleanup path | +| Unsupported chain-domain input | using an eRPC-only alias in a sell-side command | command fails with supported chain list | +| Upstream health failure | `ServiceOffer` upstream is unhealthy | reconcile stops before route publish | +| Registration failure | signer unavailable or wallet unfunded | degrade to `OffChainOnly` where supported | +| Buyer budget exhaustion | no remaining auths for `paid/` | request path fails until refill | +| Tunnel unavailability | quick or DNS tunnel cannot start | local stack remains usable; public path degraded | + +### 8.2 Error Response Contracts + +- CLI failures are non-zero exits with human-readable hints. +- x402 verifier emits HTTP `402 Payment Required` with pricing metadata. +- Buy-side proxy returns HTTP `404` when no purchased upstream matches the requested `paid/`. +- Registration degradation is recorded in `ServiceOffer.status.conditions`. + +### 8.3 Retry and Recovery + +- Model and provider configuration can be re-run safely. +- Network sync and app sync are explicit operator actions. +- Buyer auth pools can be refilled without rebuilding LiteLLM topology. +- Tunnel restarts are explicit and cheap for quick tunnels. + +--- + +## 9. 
Performance and Operations + +### 9.1 Operational Bounds + +| Metric | Current Bound | Measurement | +|--------|---------------|-------------| +| ChainList fetch timeout | 15 seconds | `internal/network/chainlist.go` timeout | +| Tunnel rollout wait | 30 seconds | `tunnel.EnsureRunning()` rollout status | +| LiteLLM rollout wait | 90 seconds | `model.RestartLiteLLM()` rollout status | +| Buyer metrics scrape interval | 30 seconds | PodMonitor definition | +| `perMTok` approximation | 1000 tokens/request | monetization skill constant | +| `perHour` approximation | 5 minutes/request | monetization skill constant | + +### 9.2 Observability + +- Prometheus stack is part of the default infrastructure. +- Buyer sidecar exports metrics for auth pools and payment attempts. +- Tunnel status, OpenClaw token flows, and sell status all have dedicated CLI surfaces. + +--- + +## 10. Phased Rollout + +### Phase 1: PR288 Baseline + +- Local-first stack lifecycle with `k3d` and `k3s` +- Default infrastructure deployment through Helmfile +- LiteLLM as the central model gateway +- eRPC local and remote RPC management +- OpenClaw instance lifecycle and the elevated `obol-agent` +- Sell-side x402 routes, `ServiceOffer` reconcile loop, `sell probe`, and optional ERC-8004 registration +- Buy-side `paid/*` remote inference path with bounded-risk sidecar +- Quick and DNS tunnel modes +- Local frontend and monitoring stack +- Managed application install/sync/list/delete + +### Phase 2: Explicit Follow-Ups + +- Replace approximation-based pricing for `perMTok` and `perHour` with exact metering where supported. +- Add operator-safe JSON, headless, and introspection surfaces to the CLI before promoting broader agent or MCP control paths. +- Package `reth-erc8004-indexer` as a first-class managed application instead of a repository-adjacent subproject. 
+- Promote autoresearch worker and coordinator workflows from skill-level building blocks into operator-visible flows with clearer provenance surfaces. +- Tighten reconcile and heartbeat latency rather than relying on the current default cadence. +- Extend and document multi-chain sell-side support only when the CLI, verifier, and registration surfaces agree on the contract. +- Extend monetized publication beyond the current inference-centric path only after explicit isolation, ownership, and routing rules are specified. +- Validate the buy-side path more deeply through LiteLLM-routed hands-off tests and in-pod skill smoke coverage. +- Enforce canonical spec drift checks through Codex hooks and CI. + +--- + +## 11. Testing Strategy + +### 11.1 Test Levels + +| Level | Tooling | What It Covers | +|-------|---------|----------------| +| Unit | `go test ./...` | Core package logic, serializers, matchers, config handling | +| Integration | package-level integration tests | Kubernetes-backed paths, OpenClaw flows, x402 verifier paths | +| Flow / E2E | `flows/flow-06`, `07`, `08`, `10` | Sell setup, verify, buy path, anvil facilitator loop | +| Skill smoke | `tests/skills_smoke_test.py`, focused Python tests | Embedded skill assets and runtime contracts | +| BDD spec | `features/*.feature` plus existing executable features | Behavioral contract for current and future implementation | + +### 11.2 Test Data Strategy + +- Use deterministic local stack IDs and local config dirs in tests where possible. +- Prefer fixture-based ChainList data for RPC selection tests. +- Treat real public tunnel URLs, facilitator endpoints, and on-chain registration as integration concerns, not unit-test assumptions. + +### 11.3 CI/CD Integration + +- Code changes that alter the operator, agent, buyer, seller, tunnel, or routing contract must update this root-level bundle. +- Operator guides in `docs/guides/` may remain for context, but they are not authoritative once this bundle exists. 
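+A drift check of the kind required here could be sketched as follows. The path prefixes and bundle file list (taken from the canonical layout in ADR-0008) are a plausible policy, not the actual hook implementation:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// bundleDrift reports whether a change set touches operator-facing code
+// without also touching the canonical root spec bundle in the same change.
+func bundleDrift(changed []string) bool {
+	codeTouched, bundleTouched := false, false
+	for _, f := range changed {
+		switch {
+		case strings.HasPrefix(f, "cmd/") || strings.HasPrefix(f, "internal/"):
+			codeTouched = true
+		case f == "SPEC.md" || f == "ARCHITECTURE.md" ||
+			f == "BEHAVIORS_AND_EXPECTATIONS.md" ||
+			strings.HasPrefix(f, "features/") || strings.HasPrefix(f, "docs/adr/"):
+			bundleTouched = true
+		}
+	}
+	return codeTouched && !bundleTouched
+}
+
+func main() {
+	fmt.Println(bundleDrift([]string{"internal/x402/verifier.go"}))            // flags drift
+	fmt.Println(bundleDrift([]string{"internal/x402/verifier.go", "SPEC.md"})) // bundle updated
+}
+```
+
+Whether this runs as a Codex hook, a CI job, or both, the contract is the same: a spec-impacting code change without a bundle update in the same turn should be flagged, with human review as the final arbiter.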
diff --git a/docs/adr/0001-local-first-stack-runtime.md b/docs/adr/0001-local-first-stack-runtime.md new file mode 100644 index 00000000..ffa8f3bc --- /dev/null +++ b/docs/adr/0001-local-first-stack-runtime.md @@ -0,0 +1,20 @@ +# ADR-0001: Local-First Stack Runtime + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 1.3, 3.1, 5.1, 6, 7.3 + +## Context + +Obol Stack serves operators running a full agent platform from their own machine. The system needs reproducible local cluster lifecycle control, predictable filesystem ownership, and a recovery path that does not depend on remote control planes. + +## Decision + +The stack remains local-first. The operator machine owns config, binaries, and persistent data, while `k3d` and `k3s` are the supported backend runtime options exposed through one `obol stack` lifecycle. Public exposure is optional and layered on top of a usable local baseline rather than required for startup. + +## Consequences + +- **Positive**: Startup, recovery, and inspection flows stay operator-centric and easier to reason about. +- **Negative**: Some cloud-native assumptions, such as always-on public endpoints or remote state stores, are intentionally deprioritized. +- **Neutral**: Future hosted or multi-node modes must be expressed as new phases rather than silently widening the local-first contract. diff --git a/docs/adr/0002-central-litellm-gateway.md b/docs/adr/0002-central-litellm-gateway.md new file mode 100644 index 00000000..e4c5fdad --- /dev/null +++ b/docs/adr/0002-central-litellm-gateway.md @@ -0,0 +1,20 @@ +# ADR-0002: Central LiteLLM Gateway + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 2.3, 3.2, 4.3, 7.1 + +## Context + +The stack needs one consistent model routing surface for local Ollama models, cloud APIs, and paid remote models. Per-instance provider wiring leads to duplicated credentials, stale model lists, and inconsistent behavior across agents. 
+ +## Decision + +LiteLLM is the central cluster-wide model gateway. OpenClaw instances and operator flows route through LiteLLM for normal model access, while provider credentials and static paid-route configuration remain centralized in the `llm` namespace. + +## Consequences + +- **Positive**: Model routing becomes uniform across operator, agent, and buyer paths. +- **Negative**: LiteLLM readiness becomes a critical dependency for most inference surfaces. +- **Neutral**: Direct-to-provider experiments remain possible, but they are exceptions to the main platform contract rather than the default architecture. diff --git a/docs/adr/0003-distinct-network-domains.md b/docs/adr/0003-distinct-network-domains.md new file mode 100644 index 00000000..afef3990 --- /dev/null +++ b/docs/adr/0003-distinct-network-domains.md @@ -0,0 +1,20 @@ +# ADR-0003: Distinct Network Domains + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 3.3, 3.5, 3.6, 4.2 + +## Context + +The platform touches several network concepts that look similar but are not interchangeable: installable local networks, remote RPC aliases, sell-side payment chains, and ERC-8004 registration networks. Previous spec work blurred those domains and created false support claims. + +## Decision + +The spec and CLI contract must keep these network domains separate. A chain appearing in one subsystem, such as the low-level x402 resolver, does not automatically expand support claims for other subsystems. Multi-chain sell-side support may only be documented once the CLI, payment verifier, and registration surfaces agree on the same contract. + +## Consequences + +- **Positive**: Support claims stay factual and users can tell which network surface they are configuring. +- **Negative**: Documentation is less compact because one generic “supported networks” list is intentionally avoided. 
+- **Neutral**: Future multi-chain expansion requires aligned implementation work across several modules before the spec can widen the contract. diff --git a/docs/adr/0004-openclaw-elevated-agent-runtime.md b/docs/adr/0004-openclaw-elevated-agent-runtime.md new file mode 100644 index 00000000..87d40dc1 --- /dev/null +++ b/docs/adr/0004-openclaw-elevated-agent-runtime.md @@ -0,0 +1,20 @@ +# ADR-0004: OpenClaw as the Elevated Agent Runtime + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 3.4, 5.2, 7.5 + +## Context + +Obol Stack needs an automation runtime that can operate inside the cluster, consume embedded skills, and act on behalf of the operator for selected workflows. Building a separate controller family for every automation path would fragment the control model. + +## Decision + +The default elevated automation runtime is an OpenClaw deployment, `obol-agent`, with carefully scoped elevated permissions and embedded skills. Additional OpenClaw instances remain operator-managed deployments and do not inherit the same elevated role automatically. + +## Consequences + +- **Positive**: The platform reuses one agent runtime model for operator workflows and skill execution. +- **Negative**: Elevated RBAC and skill distribution must be reviewed carefully because the default agent has broader authority than ordinary instances. +- **Neutral**: New autonomous behaviors should first be expressed as skills against this runtime before introducing dedicated controllers. 
diff --git a/docs/adr/0005-serviceoffer-skill-reconcile-loop.md b/docs/adr/0005-serviceoffer-skill-reconcile-loop.md new file mode 100644 index 00000000..9eacb480 --- /dev/null +++ b/docs/adr/0005-serviceoffer-skill-reconcile-loop.md @@ -0,0 +1,20 @@ +# ADR-0005: ServiceOffer-Driven Sell-Side Reconcile Loop + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 3.5, 4.2, 5.3, 8 + +## Context + +Sell-side publication needs declarative state, observable status, and reconciliation across Kubernetes routing, pricing, and optional registration. Prior proposals considered separate controllers or looser imperative flows. + +## Decision + +Sell-side publication is driven by the `ServiceOffer` custom resource and reconciled by the elevated agent's monetize skill. The reconcile loop advances through explicit stages that cover model readiness, upstream health, payment gate setup, route publication, optional registration, and final readiness. + +## Consequences + +- **Positive**: Operators get one declarative resource and one status model for sell-side lifecycle. +- **Negative**: Reconcile latency is bounded by the agent heartbeat cadence rather than a dedicated controller loop. +- **Neutral**: Future generalized agent-authored services should extend this pattern only if they preserve explicit ownership, isolation, and stage visibility. diff --git a/docs/adr/0006-static-paid-namespace-buyer-sidecar.md b/docs/adr/0006-static-paid-namespace-buyer-sidecar.md new file mode 100644 index 00000000..0e05e8df --- /dev/null +++ b/docs/adr/0006-static-paid-namespace-buyer-sidecar.md @@ -0,0 +1,20 @@ +# ADR-0006: Static Paid Namespace with a Bounded-Risk Buyer Sidecar + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 3.2.3, 3.6, 7.2, 7.4 + +## Context + +Remote paid inference needs a stable buyer-facing model contract, but giving the request path direct access to live signing authority would create a large security and spend risk. 
+ +## Decision + +Paid remote models are exposed through a static `paid/*` namespace at LiteLLM and fulfilled by a buyer sidecar that holds only a bounded pool of pre-signed authorizations. The sidecar handles payment retries and forwarding without receiving live signer authority. + +## Consequences + +- **Positive**: The buyer path is easier to integrate and materially safer than a live-signing proxy. +- **Negative**: Capacity is limited by the pre-signed auth pool and requires replenishment workflows. +- **Neutral**: Observability for auth exhaustion and payment retries becomes a first-class operational concern. diff --git a/docs/adr/0007-local-only-operator-surfaces-with-optional-public-discovery.md b/docs/adr/0007-local-only-operator-surfaces-with-optional-public-discovery.md new file mode 100644 index 00000000..79cb9dc2 --- /dev/null +++ b/docs/adr/0007-local-only-operator-surfaces-with-optional-public-discovery.md @@ -0,0 +1,20 @@ +# ADR-0007: Local-Only Operator Surfaces with Optional Public Discovery + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 3.7, 4.3, 7.3 + +## Context + +The stack needs public reachability for paid services and optional discovery, but it also exposes sensitive operator surfaces such as the frontend, eRPC gateway, and monitoring. + +## Decision + +Operator surfaces remain local-only by default. Tunnel exposure is scoped to the routes that explicitly need it, and public discovery metadata follows the current tunnel address rather than widening local control-plane surfaces. + +## Consequences + +- **Positive**: Public monetization and discovery can coexist with conservative operator safety boundaries. +- **Negative**: Public operator dashboards and remote admin UX remain out of scope for the current contract. +- **Neutral**: If public operator surfaces are ever introduced, they require an explicit architectural change rather than an incremental tunnel tweak. 
diff --git a/docs/adr/0008-canonical-root-spec-bundle-and-codex-hooks.md b/docs/adr/0008-canonical-root-spec-bundle-and-codex-hooks.md new file mode 100644 index 00000000..ace317a6 --- /dev/null +++ b/docs/adr/0008-canonical-root-spec-bundle-and-codex-hooks.md @@ -0,0 +1,20 @@ +# ADR-0008: Canonical Root-Level Spec Bundle with Codex Hook Guardrails + +**Date**: 2026-03-29 +**Status**: Accepted + +**Impacts**: SPEC Sections 10, 11.3 and CONTRIBUTING.md + +## Context + +The repository accumulated parallel plan files, stale design notes, and an incorrect `docs/specs/` bundle that drifted from both the code and the original backend-service-spec-bundler design. The project needed one canonical spec location and a lightweight mechanism to catch future drift during development. + +## Decision + +The repository follows the original backend-service-spec-bundler layout at repo root: `SPEC.md`, `ARCHITECTURE.md`, `BEHAVIORS_AND_EXPECTATIONS.md`, `CONTRIBUTING.md`, `features/`, and `docs/adr/`. Codex hooks are added as guardrails to remind the model of these conventions and to flag spec-impacting code changes when the canonical bundle was not updated in the same turn. + +## Consequences + +- **Positive**: The bundle has one authoritative location and drift becomes easier to detect. +- **Negative**: Contributors must maintain the canonical docs alongside behavior changes instead of relying on scattered planning notes. +- **Neutral**: Hooks assist developer workflow, but CI and human review still remain the final enforcement layer. 
diff --git a/docs/adr/0009-phase-2-exact-metering-after-pre-request-gate.md b/docs/adr/0009-phase-2-exact-metering-after-pre-request-gate.md new file mode 100644 index 00000000..2699a5cb --- /dev/null +++ b/docs/adr/0009-phase-2-exact-metering-after-pre-request-gate.md @@ -0,0 +1,20 @@ +# ADR-0009: Phase 2 Exact Metering After the Pre-Request Payment Gate + +**Date**: 2026-03-29 +**Status**: Proposed + +**Impacts**: SPEC Sections 3.5.5, 4.5, 10 + +## Context + +The PR288 baseline supports `perMTok` and `perHour` pricing, but current enforcement relies on approximation before execution. The platform needs a clearer future direction for exact post-response accounting without discarding the existing pre-request payment gate. + +## Decision + +Phase 2 exact metering, where implemented, should augment the current pre-request payment gate rather than replace it. Authorization remains the entry check, while measured usage becomes a post-response accounting and observability concern for supported protocols. + +## Consequences + +- **Positive**: The current gatekeeping model remains intact while exact accounting improves fidelity where it is technically feasible. +- **Negative**: The platform must operate two related billing surfaces during transition: pre-request authorization and post-response accounting. +- **Neutral**: Streaming and non-OpenAI-compatible formats may continue to use approximation until a stronger metering contract exists. diff --git a/docs/adr/0010-phase-2-agent-ready-cli-surfaces.md b/docs/adr/0010-phase-2-agent-ready-cli-surfaces.md new file mode 100644 index 00000000..747eaa95 --- /dev/null +++ b/docs/adr/0010-phase-2-agent-ready-cli-surfaces.md @@ -0,0 +1,20 @@ +# ADR-0010: Phase 2 Agent-Ready CLI Surfaces + +**Date**: 2026-03-29 +**Status**: Proposed + +**Impacts**: SPEC Sections 1.3, 4.1, 10 + +## Context + +The platform is increasingly consumed by agents as well as human operators. 
Human-first CLI ergonomics are still primary, but the repository also contains future-work notes for structured JSON output, headless prompt handling, and richer introspection. + +## Decision + +Phase 2 agent-facing improvements should add structured output, non-interactive input paths, and machine-friendly introspection without replacing the human-first operator contract. The local operator remains the primary actor, so agent-ready surfaces are an extension of the CLI rather than a separate control plane by default. + +## Consequences + +- **Positive**: Agents and future MCP adapters gain a safer path to consume the CLI without scraping human output. +- **Negative**: Every new machine-facing surface must preserve compatibility with existing operator workflows and documentation. +- **Neutral**: A dedicated MCP layer remains optional and should be introduced only if the structured CLI surface proves insufficient. diff --git a/docs/getting-started.md b/docs/getting-started.md index 13366094..3e97546c 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -1,5 +1,7 @@ # Getting Started with the Obol Stack +> This is an operator guide. The canonical system contract lives in [SPEC.md](../SPEC.md), [ARCHITECTURE.md](../ARCHITECTURE.md), and [BEHAVIORS_AND_EXPECTATIONS.md](../BEHAVIORS_AND_EXPECTATIONS.md). + This guide walks you through installing the Obol Stack, starting a local Kubernetes cluster, testing LLM inference through the AI agent, and optionally monetizing your compute. > [!IMPORTANT] diff --git a/docs/guides/monetize-inference.md b/docs/guides/monetize-inference.md index eb14e8fd..5eb44f4f 100644 --- a/docs/guides/monetize-inference.md +++ b/docs/guides/monetize-inference.md @@ -1,5 +1,7 @@ # How to Monetize Your Inference with Obol Stack +> This is an operator workflow guide. 
The canonical contract for sell-side and buy-side behavior lives in [SPEC.md](../../SPEC.md), [ARCHITECTURE.md](../../ARCHITECTURE.md), and [BEHAVIORS_AND_EXPECTATIONS.md](../../BEHAVIORS_AND_EXPECTATIONS.md). + This guide walks you through exposing a local LLM as a paid API endpoint using the Obol Stack. By the end, you'll have: - A local Ollama model serving inference @@ -10,9 +12,9 @@ This guide walks you through exposing a local LLM as a paid API endpoint using t > [!NOTE] > `--per-mtok` is supported for inference pricing, but phase 1 still charges an > approximate flat request price derived as `perMTok / 1000` using a fixed -> `1000 tok/request` assumption. Exact token metering is deferred to the -> follow-up `x402-meter` design described in -> [`docs/plans/per-token-metering.md`](../plans/per-token-metering.md). +> `1000 tok/request` assumption. Exact token metering is tracked as phase 2 +> follow-up work in [SPEC.md](../../SPEC.md#10-phased-rollout) and +> [ADR-0009](../adr/0009-phase-2-exact-metering-after-pre-request-gate.md). > [!IMPORTANT] > The monetize subsystem is alpha software on the `feat/secure-enclave-inference` branch. diff --git a/docs/guides/monetize_sell_side_testing_log.md b/docs/guides/monetize_sell_side_testing_log.md deleted file mode 100644 index befd67e0..00000000 --- a/docs/guides/monetize_sell_side_testing_log.md +++ /dev/null @@ -1,399 +0,0 @@ -# Monetize Sell-Side Testing Log - -Full lifecycle walkthrough of the hardened monetize subsystem on a fresh dev cluster, using the real x402-rs facilitator against an Anvil fork of base-sepolia. 
- -**Branch**: `fix/review-hardening` (off `feat/secure-enclave-inference`) -**Date**: 2026-02-27 -**Cluster**: `obol-stack-sweeping-man` (k3d, 1 server node) - ---- - -## Prerequisites - -```bash -# Working directory: the obol-stack repo (or worktree) -cd /path/to/obol-stack - -# Environment — set these in every terminal session -export OBOL_DEVELOPMENT=true -export OBOL_CONFIG_DIR=$(pwd)/.workspace/config -export OBOL_BIN_DIR=$(pwd)/.workspace/bin -export OBOL_DATA_DIR=$(pwd)/.workspace/data - -# Alias for brevity (optional) -alias obol="$OBOL_BIN_DIR/obol" -``` - -**External dependencies** (must be installed separately): - -| Dependency | Install | Purpose | -|-----------|---------|---------| -| Docker | [docker.com](https://docker.com) | k3d runs inside Docker | -| Foundry (`anvil`, `cast`) | `curl -L https://foundry.paradigm.xyz \| bash && foundryup` | Local base-sepolia fork | -| Rust toolchain | [rustup.rs](https://rustup.rs) | Building x402-rs facilitator | -| Python 3 + venv | System package manager | Signing the EIP-712 payment header | -| x402-rs | `git clone https://github.com/x402-rs/x402-rs ~/Development/R&D/x402-rs` | Real x402 facilitator | -| Ollama | [ollama.com](https://ollama.com) | Local LLM inference (must be running on host) | -| `/etc/hosts` entry | `echo "127.0.0.1 obol.stack" \| sudo tee -a /etc/hosts` | `obolup.sh` does this, or add manually | - ---- - -## Phase 1: Build & Cluster - -```bash -# 1. Build the obol binary from the hardened branch -go build -o .workspace/bin/obol ./cmd/obol - -# 2. Wipe any previous cluster -obol stack down 2>/dev/null; obol stack purge -f 2>/dev/null -rm -rf "$OBOL_CONFIG_DIR" "$OBOL_DATA_DIR" - -# 3. Initialize fresh cluster config -obol stack init - -# 4. Bring up the cluster -# (builds x402-verifier Docker image locally, deploys all infrastructure) -obol stack up - -# 5. 
Verify — all pods should be Running -obol kubectl get pods -A -``` - -Expected: ~18 pods across namespaces (`erpc`, `kube-system`, `llm`, `monitoring`, `obol-frontend`, `openclaw-default`, `reloader`, `traefik`, `x402`). x402-verifier should have **2 replicas**. - ---- - -## Phase 2: Verify Hardening - -```bash -# Split RBAC ClusterRoles exist -obol kubectl get clusterrole openclaw-monetize-read -obol kubectl get clusterrole openclaw-monetize-workload - -# x402 namespace Role exists -obol kubectl get role openclaw-x402-pricing -n x402 - -# x402 HA: 2 replicas -obol kubectl get deploy x402-verifier -n x402 -o jsonpath='{.spec.replicas}' -# → 2 - -# PDB active -obol kubectl get pdb -n x402 -# → x402-verifier minAvailable=1 allowedDisruptions=1 -``` - ---- - -## Phase 3: Deploy Agent - -```bash -# 6. Deploy the obol-agent singleton -# - creates namespace openclaw-obol-agent -# - deploys openclaw + remote-signer pods -# - injects 24 skills (including monetize) -# - patches all 3 RBAC bindings to the agent's ServiceAccount -obol agent init - -# 7. Verify RBAC bindings point to the agent's ServiceAccount -obol kubectl get clusterrolebinding openclaw-monetize-read-binding \ - -o jsonpath='{.subjects}' -obol kubectl get clusterrolebinding openclaw-monetize-workload-binding \ - -o jsonpath='{.subjects}' -obol kubectl get rolebinding openclaw-x402-pricing-binding -n x402 \ - -o jsonpath='{.subjects}' -# All three should show: -# [{"kind":"ServiceAccount","name":"openclaw","namespace":"openclaw-obol-agent"}] -``` - ---- - -## Phase 4: Configure Payment & Create Offer - -```bash -# 8. Configure x402 pricing (seller wallet + chain) -obol sell pricing \ - --wallet 0x70997970C51812dc3A010C7d01b50e0d17dc79C8 \ - --chain base-sepolia - -# 9. 
Verify Ollama has the model available on the host -curl -s http://localhost:11434/api/tags | python3 -c \ - "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models']]" -# Should include qwen3:0.6b — if not: -# ollama pull qwen3:0.6b - -# 10. Create ServiceOffer CR -obol sell http my-qwen \ - --type inference \ - --model qwen3:0.6b \ - --runtime ollama \ - --per-request 0.001 \ - --network base-sepolia \ - --pay-to 0x70997970C51812dc3A010C7d01b50e0d17dc79C8 \ - --namespace llm \ - --upstream ollama \ - --port 11434 \ - --path /services/my-qwen -# → serviceoffer.obol.org/my-qwen created -``` - ---- - -## Phase 5: Agent Reconciliation - -```bash -# 11. Trigger reconciliation from inside the agent pod -# (The heartbeat cron runs every 30 min by default — -# this is the same script it would execute) -obol kubectl exec -n openclaw-obol-agent deploy/openclaw -c openclaw -- \ - python3 /data/.openclaw/skills/monetize/scripts/monetize.py process --all - -# Expected output: -# Processing 1 pending offer(s)... -# Reconciling llm/my-qwen... -# Checking if model qwen3:0.6b is available... -# Model qwen3:0.6b already available -# Health-checking http://ollama.llm.svc.cluster.local:11434/health... -# Upstream reachable (HTTP 404 — acceptable for health check) -# Creating Middleware x402-my-qwen... -# Added pricing route: /services/my-qwen/* → 0.001 USDC -# Creating HTTPRoute so-my-qwen... -# ServiceOffer llm/my-qwen is Ready - -# 12. Verify all 6 conditions are True -obol sell status my-qwen --namespace llm -# → ModelReady=True -# UpstreamHealthy=True -# PaymentGateReady=True -# RoutePublished=True -# Registered=True (Skipped) -# Ready=True -``` - ---- - -## Phase 6: Test 402 Gate (No Payment) - -```bash -# 13. 
Request without payment → expect HTTP 402 -curl -s -w "\nHTTP %{http_code}" -X POST \ - "http://obol.stack:8080/services/my-qwen/v1/chat/completions" \ - -H "Content-Type: application/json" \ - -d '{"model":"qwen3:0.6b","messages":[{"role":"user","content":"Hello"}],"stream":false}' - -# Expected: HTTP 402 + JSON body: -# { -# "x402Version": 1, -# "error": "Payment required for this resource", -# "accepts": [{ -# "scheme": "exact", -# "network": "base-sepolia", -# "maxAmountRequired": "1000", -# "asset": "0x036CbD53842c5426634e7929541eC2318f3dCF7e", -# "payTo": "0x70997970C51812dc3A010C7d01b50e0d17dc79C8", -# ... -# }] -# } -``` - ---- - -## Phase 7: Start x402-rs Facilitator + Anvil - -```bash -# 14. Start Anvil forking base-sepolia (background, port 8545) -anvil --fork-url https://sepolia.base.org --port 8545 --host 0.0.0.0 --silent & - -# Verify Anvil is running: -curl -s -X POST http://localhost:8545 \ - -H "Content-Type: application/json" \ - -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' -# → {"jsonrpc":"2.0","id":1,"result":"0x14a34"} (84532 = base-sepolia) - -# 15. Build x402-rs facilitator (first time only, ~2 min) -cd ~/Development/R\&D/x402-rs/facilitator && cargo build --release && cd - - -# 16. Start facilitator with Anvil config (background, port 4040) -# config-anvil.json points RPC at host.docker.internal:8545 -~/Development/R\&D/x402-rs/facilitator/target/release/facilitator \ - --config ~/Development/R\&D/x402-rs/config-anvil.json & - -# Verify facilitator is running: -curl -s http://localhost:4040/supported -# → {"kinds":[{"x402Version":1,"scheme":"exact","network":"base-sepolia"}, ...], -# "signers":{"eip155:84532":["0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266"]}} - -# 17. Verify buyer (Anvil account 0) has USDC on the fork -cast call 0x036CbD53842c5426634e7929541eC2318f3dCF7e \ - "balanceOf(address)(uint256)" \ - 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266 \ - --rpc-url http://localhost:8545 -# → non-zero balance (e.g. 
287787514 = ~287 USDC) -``` - ---- - -## Phase 8: Patch Verifier → Local Facilitator - -```bash -# 18. Point x402-verifier at the local x402-rs facilitator -# macOS: host.docker.internal -# Linux: host.k3d.internal -obol kubectl patch configmap x402-pricing -n x402 --type merge -p '{ - "data": { - "pricing.yaml": "wallet: 0x70997970C51812dc3A010C7d01b50e0d17dc79C8\nchain: base-sepolia\nfacilitatorURL: http://host.docker.internal:4040\nverifyOnly: false\nroutes:\n- pattern: \"/services/my-qwen/*\"\n price: \"0.001\"\n description: \"ServiceOffer my-qwen\"\n payTo: \"0x70997970C51812dc3A010C7d01b50e0d17dc79C8\"\n network: \"base-sepolia\"\n" - } -}' - -# 19. Restart verifier to pick up immediately -# (otherwise the file watcher takes 60-120s) -obol kubectl rollout restart deploy/x402-verifier -n x402 -obol kubectl rollout status deploy/x402-verifier -n x402 --timeout=60s -``` - ---- - -## Phase 9: Sign Payment & Test Paid Request - -```bash -# 20. Create venv and install eth-account -python3 -m venv /tmp/x402-venv -/tmp/x402-venv/bin/pip install eth-account --quiet - -# 21. 
Write the payment signing script -cat > /tmp/x402-pay.py << 'PYEOF' -#!/usr/bin/env python3 -"""Sign an x402 V1 exact payment header using Anvil account 0.""" -import json, base64, os -from eth_account import Account -from eth_account.messages import encode_typed_data - -PRIVATE_KEY = "0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80" -PAYER = "0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266" -PAY_TO = "0x70997970C51812dc3A010C7d01b50e0d17dc79C8" -USDC = "0x036CbD53842c5426634e7929541eC2318f3dCF7e" -CHAIN_ID = 84532 -AMOUNT = "1000" # 0.001 USDC in 6-decimal micro-units -NONCE = "0x" + os.urandom(32).hex() - -signable = encode_typed_data(full_message={ - "types": { - "EIP712Domain": [ - {"name": "name", "type": "string"}, - {"name": "version", "type": "string"}, - {"name": "chainId", "type": "uint256"}, - {"name": "verifyingContract", "type": "address"}, - ], - "TransferWithAuthorization": [ - {"name": "from", "type": "address"}, - {"name": "to", "type": "address"}, - {"name": "value", "type": "uint256"}, - {"name": "validAfter", "type": "uint256"}, - {"name": "validBefore", "type": "uint256"}, - {"name": "nonce", "type": "bytes32"}, - ], - }, - "primaryType": "TransferWithAuthorization", - "domain": { - "name": "USDC", "version": "2", - "chainId": CHAIN_ID, "verifyingContract": USDC, - }, - "message": { - "from": PAYER, "to": PAY_TO, - "value": int(AMOUNT), - "validAfter": 0, "validBefore": 4294967295, - "nonce": bytes.fromhex(NONCE[2:]), - }, -}) - -signed = Account.sign_message(signable, PRIVATE_KEY) - -# IMPORTANT: x402-rs wire format requires validAfter/validBefore as STRINGS -payload = { - "x402Version": 1, - "scheme": "exact", - "network": "base-sepolia", - "payload": { - "signature": "0x" + signed.signature.hex(), - "authorization": { - "from": PAYER, "to": PAY_TO, - "value": AMOUNT, # string (decimal_u256) - "validAfter": "0", # string (UnixTimestamp) - "validBefore": "4294967295", # string (UnixTimestamp) - "nonce": NONCE, # string (B256 hex) 
- }, - }, - "resource": { - "payTo": PAY_TO, "maxAmountRequired": AMOUNT, - "asset": USDC, "network": "base-sepolia", - }, -} -print(base64.b64encode(json.dumps(payload).encode()).decode()) -PYEOF - -# 22. Generate payment header and send paid request -PAYMENT=$(/tmp/x402-venv/bin/python3 /tmp/x402-pay.py) - -curl -s -w "\nHTTP %{http_code}" -X POST \ - "http://obol.stack:8080/services/my-qwen/v1/chat/completions" \ - -H "Content-Type: application/json" \ - -H "X-PAYMENT: $PAYMENT" \ - -d '{"model":"qwen3:0.6b","messages":[{"role":"user","content":"Say hello in exactly 3 words"}],"stream":false}' - -# Expected: HTTP 200 + full Ollama inference response JSON -``` - ---- - -## Phase 10: Lifecycle Cleanup - -```bash -# 23. Stop offer (removes pricing route from ConfigMap, keeps CR) -obol sell stop my-qwen --namespace llm - -# 24. Restart verifier so removed route takes effect immediately -obol kubectl rollout restart deploy/x402-verifier -n x402 - -# 25. Verify endpoint is now free (no payment required) -curl -s -w "\nHTTP %{http_code}" -X POST \ - "http://obol.stack:8080/services/my-qwen/v1/chat/completions" \ - -H "Content-Type: application/json" \ - -d '{"model":"qwen3:0.6b","messages":[{"role":"user","content":"Hello"}],"stream":false}' -# → HTTP 200 (free endpoint, no 402) - -# 26. Full delete — removes CR + Middleware + HTTPRoute (ownerRef cascade) -obol sell delete my-qwen --namespace llm --force - -# 27. Verify everything is cleaned up -obol kubectl get serviceoffers,middleware,httproutes -n llm -# → No resources found in llm namespace. - -# 28. 
Stop background processes and clean up temp files -pkill -f "anvil.*fork-url" -pkill -f "facilitator.*config-anvil" -rm -rf /tmp/x402-venv /tmp/x402-pay.py -``` - ---- - -## Reference: Key Addresses - -| Role | Address | Note | -|------|---------|------| -| Seller (payTo) | `0x70997970C51812dc3A010C7d01b50e0d17dc79C8` | Anvil account 1 | -| Buyer (payer) | `0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266` | Anvil account 0 | -| Buyer private key | `0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80` | Anvil default — never use in production | -| USDC (base-sepolia) | `0x036CbD53842c5426634e7929541eC2318f3dCF7e` | Circle USDC on base-sepolia | -| Chain ID | `84532` | base-sepolia | - -## Reference: Key Gotchas - -| Gotcha | Detail | -|--------|--------| -| **macOS vs Linux host bridging** | macOS: `host.docker.internal`. Linux: `host.k3d.internal` (step 18) | -| **x402-rs timestamp format** | `validAfter`/`validBefore` must be **strings** (`"0"`, `"4294967295"`), not integers. x402-rs `UnixTimestamp` deserializes from stringified u64 | -| **ConfigMap propagation delay** | x402-verifier file watcher takes 60-120s. Use `kubectl rollout restart` for immediate effect | -| **Heartbeat interval** | 30 minutes by default. For interactive testing, exec into the pod and run `monetize.py process --all` manually (step 11) | -| **`/etc/hosts`** | Must have `127.0.0.1 obol.stack`. `obolup.sh` sets this during install, or add manually | -| **`OBOL_DEVELOPMENT=true`** | Required for `obol stack up` to build the x402-verifier Docker image locally instead of pulling from registry | -| **Anvil fork freshness** | Each `anvil` restart creates a fresh fork. USDC balances come from the forked base-sepolia state at the time of fork | -| **x402-rs `config-anvil.json`** | Ships with the x402-rs repo. Points `eip155:84532` RPC at `host.docker.internal:8545` (Anvil). 
Adjust if your Anvil is on a different port | diff --git a/docs/guides/monetize_test_coverage_report.md b/docs/guides/monetize_test_coverage_report.md deleted file mode 100644 index d4c0262b..00000000 --- a/docs/guides/monetize_test_coverage_report.md +++ /dev/null @@ -1,666 +0,0 @@ -# Monetize Subsystem — Test Coverage Report - -**Branch**: `fix/review-hardening` (off `feat/secure-enclave-inference`) -**Date**: 2026-02-27 -**Total integration tests**: 46 across 3 files - ---- - -## Section Overview - -``` -┌──────────────────────────────────────────────────────────────────┐ -│ TEST PYRAMID │ -│ │ -│ ▲ │ -│ ╱ ╲ Phase 8: FULL (1) │ -│ ╱ ╲ ← tunnel+Ollama+x402-rs+EIP-712 │ -│ ╱─────╲ │ -│ ╱ ╲ Phase 5+: Real Facilitator (1) │ -│ ╱ ╲ ← real x402-rs, real EIP-712 │ -│ ╱───────────╲ │ -│ ╱ ╲ Phase 6+7: Tunnel + Fork (5) │ -│ ╱ ╲ ← real Ollama, mock facilitator │ -│ ╱─────────────────╲ │ -│ ╱ ╲ Phase 4+5: Payment + E2E (8) │ -│ ╱ ╲ ← mock facilitator, real gate │ -│ ╱─────────────────╲ │ -│ ╱ ╲ Phase 3: Routing (6) │ -│ ╱ ╲ ← real Traefik, Anvil RPC │ -│ ╱───────────────────────╲ │ -│ ╱ ╲ Phase 2: RBAC + Recon (6) │ -│ ╱ ╲ ← real agent in pod │ -│ ╱─────────────────────────────╲ │ -│ ╱ ╲ Phase 1: CRD (7) │ -│ ╱ ╲ ← schema validation │ -│ ╱───────────────────────────────────╲ │ -│ ╱ ╲ Base: Inference (12)│ -│ ╱_______________________________________╲ ← Ollama + skills │ -│ │ -└──────────────────────────────────────────────────────────────────┘ -``` - ---- - -## Phase 1 — CRD Lifecycle (7 tests) - -**What it covers**: ServiceOffer custom resource schema validation, CRUD operations, printer columns, status subresource isolation. - -**Realism**: Low (data-plane only, no reconciliation or traffic). 
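The schema validation this phase exercises centers on the seller wallet field. The same check can be reproduced off-cluster; a minimal sketch using the regex from the CRD schema (`^0x[0-9a-fA-F]{40}$`), with the helper name being illustrative rather than part of the codebase:

```python
import re

# Pattern enforced by the ServiceOffer CRD schema for the seller wallet field
WALLET_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")

def is_valid_wallet(addr: str) -> bool:
    """Mirror of the API-server-side regex validation (illustrative only)."""
    return WALLET_RE.fullmatch(addr) is not None

print(is_valid_wallet("0x70997970C51812dc3A010C7d01b50e0d17dc79C8"))  # → True
print(is_valid_wallet("0x123"))                                       # → False
```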
- -``` -┌─────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ kubectl apply ──▶ ┌──────────────────┐ │ -│ │ ServiceOffer CR │ │ -│ kubectl get ──▶ │ (obol.org CRD) │ │ -│ └──────────────────┘ │ -│ kubectl patch ──▶ │ │ -│ kubectl delete──▶ ▼ │ -│ API Server validates: │ -│ ✓ wallet regex (^0x[0-9a-fA-F]{40}$)│ -│ ✓ status subresource isolation │ -│ ✓ printer columns (TYPE, PRICE) │ -│ │ -│ ┌─────────────────────────────────────────────┐ │ -│ │ NOT TESTED: reconciler, routing, payment │ │ -│ └─────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `CRD_Exists` | CRD installed in cluster | -| `CRD_CreateGet` | Spec fields round-trip correctly | -| `CRD_List` | kubectl list works | -| `CRD_StatusSubresource` | Status patch doesn't mutate spec | -| `CRD_WalletValidation` | Invalid wallet rejected by API server | -| `CRD_PrinterColumns` | `kubectl get` shows TYPE, PRICE, NETWORK | -| `CRD_Delete` | CR deletion works | - -**Gap vs real world**: No agent involvement. A real user runs `obol sell http`, not raw kubectl. - ---- - -## Phase 2 — RBAC + Reconciliation (6 tests) - -**What it covers**: Split RBAC roles exist and are bound, agent can read/write CRs from inside pod, reconciler handles unhealthy upstreams, idempotent re-processing. - -**Realism**: Medium (real agent pod, real RBAC, but no traffic or payment). 
- -``` -┌─────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌─────────────┐ RBAC Check ┌─────────────────────────┐ │ -│ │ Test Runner │ ────────────────▶ │ ClusterRole: │ │ -│ │ (kubectl get)│ │ openclaw-monetize-read │ │ -│ └─────────────┘ │ openclaw-monetize-wkld │ │ -│ │ │ Role: │ │ -│ │ │ openclaw-x402-pricing │ │ -│ │ └─────────────────────────┘ │ -│ │ │ -│ │ kubectl exec │ -│ ▼ │ -│ ┌─────────────────────────────────┐ │ -│ │ obol-agent pod │ │ -│ │ monetize.py process │──▶ ServiceOffer CR │ -│ │ monetize.py process --all │ (status conditions) │ -│ │ monetize.py list │ │ -│ └─────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ UpstreamHealthy=False (no real upstream) │ -│ HEARTBEAT_OK (no pending offers) │ -│ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ NOT TESTED: Traefik routing, x402 gate, payment, tunnel │ │ -│ └──────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `RBAC_ClusterRolesExist` | Split RBAC roles deployed by k3s manifests | -| `RBAC_BindingsPatched` | `obol agent init` patches all 3 bindings | -| `Monetize_ListEmpty` | Agent skill lists zero offers | -| `Monetize_ProcessAllEmpty` | Heartbeat returns OK with no work | -| `Monetize_ProcessUnhealthy` | Sets UpstreamHealthy=False for missing svc | -| `Monetize_Idempotent` | Second reconcile doesn't error | - -**Gap vs real world**: No upstream service exists. Reconciliation never reaches PaymentGateReady or RoutePublished. - ---- - -## Phase 3 — Routing with Anvil Upstream (6 tests) - -**What it covers**: Full 6-condition reconciliation with a real upstream (Anvil fork), Traefik Middleware + HTTPRoute creation, traffic forwarding, owner-reference cascade on delete. - -**Realism**: Medium-High (real cluster networking, real Traefik, real upstream). No payment gate yet. 
- -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌──────────┐ │ -│ │ Anvil │ ◀── Host machine (port N) │ -│ │ (fork of │ forking Base Sepolia │ -│ │ base-sep)│ │ -│ └────┬─────┘ │ -│ │ ClusterIP + EndpointSlice │ -│ │ (anvil-rpc.test-ns.svc) │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ k3d cluster │ │ -│ │ │ │ -│ │ Agent reconciles: │ │ -│ │ ✓ UpstreamHealthy (HTTP health-check to Anvil) │ │ -│ │ ✓ PaymentGateReady (Middleware created) │ │ -│ │ ✓ RoutePublished (HTTPRoute created) │ │ -│ │ ✓ Ready │ │ -│ │ │ │ -│ │ ┌─────────────┐ ┌──────────────┐ ┌──────────┐ │ │ -│ │ │ Traefik GW │────▶│ HTTPRoute │────▶│ Anvil │ │ │ -│ │ │ :8080 │ │ /services/x │ │ upstream │ │ │ -│ │ └─────────────┘ └──────────────┘ └──────────┘ │ │ -│ │ │ │ -│ │ curl POST obol.stack:8080/services/x │ │ -│ │ → eth_blockNumber response from Anvil ✓ │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌────────────────────────────────────────────────────────────┐ │ -│ │ NOT TESTED: x402 ForwardAuth (no facilitator), no 402 │ │ -│ └────────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `Route_AnvilUpstream` | Anvil responds locally | -| `Route_FullReconcile` | All 4 conditions reach True | -| `Route_MiddlewareCreated` | ForwardAuth Middleware exists | -| `Route_HTTPRouteCreated` | HTTPRoute has correct parentRef | -| `Route_TrafficRoutes` | HTTP through Traefik reaches Anvil | -| `Route_DeleteCascades` | ownerRef GC cleans up derived resources | - -**Gap vs real world**: No payment gate. Requests go straight through without x402 gating. Free endpoint, not monetized. 
- ---- - -## Phase 4 — Payment Gate (4 tests) - -**What it covers**: x402-verifier health, 402 response without payment, 402 response body format (x402 spec compliance), 200 response with mock payment. - -**Realism**: Medium-High. Real x402-verifier, real Traefik ForwardAuth. Mock facilitator always says `isValid: true`. - -``` -┌──────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌───────┐ POST /services/x ┌──────────┐ ForwardAuth │ -│ │Client │ ─────────────────────▶ │ Traefik │ ──────────────▶ │ -│ │(test) │ │ Gateway │ │ │ -│ └───────┘ └──────────┘ │ │ -│ │ │ ▼ │ -│ │ │ ┌──────────────┐│ -│ │ No X-PAYMENT header │ │ x402-verifier││ -│ │ ──────────────────▶ │ │ (real pod) ││ -│ │ │ │ ││ -│ │ ◀── 402 + pricing JSON │ │ Checks: ││ -│ │ │ │ ✓ route match││ -│ │ │ │ ✓ has header ││ -│ │ X-PAYMENT: │ │ ✓ call facil.││ -│ │ ──────────────────▶ │ │ ││ -│ │ │ │ ┌────────┐ ││ -│ │ │ │ │ Mock │ ││ -│ │ ◀── 200 + Anvil response │ │ │ Facil. │ ││ -│ │ │ │ │ always │ ││ -│ │ │ │ │ valid │ ││ -│ │ │ │ └────────┘ ││ -│ │ │ └──────────────┘│ -│ │ -│ ┌──────────────────────────────────────────────────────────────┐ │ -│ │ MOCK: facilitator (no real signature validation) │ │ -│ │ MOCK: payment header (fake JSON, not real EIP-712) │ │ -│ └──────────────────────────────────────────────────────────────┘ │ -└──────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `PaymentGate_VerifierHealthy` | /healthz and /readyz return 200 | -| `PaymentGate_402WithoutPayment` | No payment → 402 | -| `PaymentGate_RequirementsFormat` | 402 body matches x402 spec | -| `PaymentGate_200WithPayment` | Mock payment → 200 | - -**Gap vs real world**: The facilitator never validates the EIP-712 signature. Any well-formed JSON base64 header passes. Wire format bugs (string vs int types) are invisible. 
- ---- - -## Phase 5 — Full E2E CLI-Driven (3 tests) - -**What it covers**: `obol sell http` CLI → CR creation → agent reconciliation → 402 → 200 → `obol sell list/status/delete`. Heartbeat auto-reconciliation (90s wait). - -**Realism**: High for the CLI path. Still uses mock facilitator for payment. - -``` -┌──────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌────────────────┐ │ -│ │ obol sell│ │ -│ │ offer my-qwen │ ──▶ ServiceOffer CR │ -│ │ --type inference │ │ -│ │ --model qwen3 │ │ -│ │ --per-request .. │ │ -│ └────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────────────┐ │ -│ │ Agent pod (autonomous reconciliation) │ │ -│ │ │ │ -│ │ monetize.py process ──▶ 6 conditions ──▶ Ready=True │ │ -│ │ │ │ -│ │ OR: heartbeat cron (every 30min) auto-reconciles │ │ -│ └──────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────────────┐ │ -│ │ obol sell list → shows offer │ │ -│ │ obol sell status → shows all conditions │ │ -│ │ obol sell delete → cleans up CR + derived resources │ │ -│ └──────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Still uses mock facilitator for payment verification. │ -└──────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `E2E_OfferLifecycle` | Full CLI → create → reconcile → pay → delete | -| `E2E_HeartbeatReconciles` | Cron-driven reconciliation without manual trigger | -| `E2E_ListAndStatus` | CLI query commands work | - -**Gap vs real world**: Mock facilitator. No real model (Anvil upstream, not Ollama). - ---- - -## Phase 6 — Tunnel E2E + Ollama (2 tests) - -**What it covers**: Real Ollama inference through the full stack, including Cloudflare tunnel accessibility. Agent-autonomous offer management. - -**Realism**: Very High for the local path. 
Tunnel tests require CF credentials. - -``` -┌───────────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌─────────┐ POST /services/x/v1/chat/completions │ -│ │ Client │ ────────────────────────────────────────▶ │ -│ └─────────┘ │ │ -│ │ ▼ │ -│ │ ┌──────────┐ ForwardAuth ┌──────────────────┐ │ -│ │ │ Traefik │ ──────────────▶ │ x402-verifier │ │ -│ │ │ Gateway │ │ → mock facilitator│ │ -│ │ └──────────┘ └──────────────────┘ │ -│ │ │ │ -│ │ │ payment valid │ -│ │ ▼ │ -│ │ ┌──────────┐ │ -│ │ │ Ollama │ ← REAL model (qwen3:0.6b) │ -│ │ │ (llm ns) │ REAL inference response │ -│ │ └──────────┘ │ -│ │ │ -│ │ Also tests via tunnel: │ -│ │ ┌─────────────────────┐ │ -│ │ │ Cloudflare Tunnel │ ← if CF credentials configured │ -│ │ │ https:// │ │ -│ │ └─────────────────────┘ │ -│ │ │ -│ ┌────────────────────────────────────────────────────────────────┐ │ -│ │ REAL: Ollama inference, Traefik routing, x402-verifier │ │ -│ │ MOCK: facilitator (still always-valid) │ │ -│ │ OPTIONAL: CF tunnel (skipped without credentials) │ │ -│ └────────────────────────────────────────────────────────────────┘ │ -└───────────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `Tunnel_OllamaMonetized` | Real model → real inference → mock payment → response | -| `Tunnel_AgentAutonomousMonetize` | Agent creates/manages offer without CLI | - -**Gap vs real world**: Mock facilitator. Real-world buyers send real EIP-712 signatures. - ---- - -## Phase 7 — Fork Validation with Mock Facilitator (2 tests) - -**What it covers**: Anvil-fork-backed upstream with mock facilitator verify/settle tracking, agent error recovery from bad upstream state. - -**Realism**: Medium-High. Real on-chain environment (forked), but fake payment validation. 
- -``` -┌──────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌──────────┐ ┌─────────────────┐ │ -│ │ Anvil │ ◀── fork of Base Sepolia │ Mock Facilitator│ │ -│ │ (real │ real block numbers │ ✓ /verify │ │ -│ │ chain │ real chain ID 84532 │ → always valid│ │ -│ │ state) │ │ ✓ /settle │ │ -│ └──────────┘ │ → always ok │ │ -│ │ │ Tracks call │ │ -│ │ EndpointSlice │ counts only │ │ -│ ▼ └─────────────────┘ │ -│ ┌───────────────────────────────────┐ │ │ -│ │ Full reconciliation pipeline │ │ │ -│ │ ✓ UpstreamHealthy (Anvil health) │ │ │ -│ │ ✓ PaymentGateReady │ │ │ -│ │ ✓ RoutePublished │ │ │ -│ │ ✓ Ready │◀───────────┘ │ -│ │ │ │ -│ │ Also tests: │ │ -│ │ ✓ Pricing route in ConfigMap │ │ -│ │ ✓ Delete cleans up pricing route │ │ -│ │ ✓ Agent self-heals from bad state │ │ -│ └───────────────────────────────────┘ │ -│ │ -│ ┌──────────────────────────────────────────────────────────────┐ │ -│ │ MOCK: facilitator (no signature validation, no USDC check) │ │ -│ │ MOCK: payment header (fake JSON blob) │ │ -│ └──────────────────────────────────────────────────────────────┘ │ -└──────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `Fork_FullPaymentFlow` | 402 → 200 with mock, verify/settle called | -| `Fork_AgentSkillIteration` | Agent recovers from unreachable upstream | - -**Gap vs real world**: Facilitator never validates signatures. USDC balance irrelevant. - ---- - -## Phase 5+ — Real Facilitator Payment (1 test) ← CLOSEST TO PRODUCTION - -**What it covers**: The entire payment cryptography stack. Real x402-rs facilitator binary, real EIP-712 TransferWithAuthorization signatures, real USDC balance on Anvil fork, real signature validation. - -**Realism**: Very High. The only mock remaining is the chain settlement (Anvil resets after test). 
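The typed-data payload the buyer signs can be sketched as a plain dict. Note that the uint256 fields are serialized as decimal *strings*, a common source of signature mismatches; the `verifyingContract` address here is a placeholder, and the actual EIP-712 signing is done with a wallet library, not shown:

```python
import secrets

def transfer_authorization(buyer, seller, value):
    """Build the ERC-3009 TransferWithAuthorization typed data (sketch)."""
    return {
        "domain": {
            "name": "USD Coin",
            "version": "2",
            "chainId": 84532,  # Base Sepolia
            "verifyingContract": "0x" + "00" * 20,  # placeholder address
        },
        "primaryType": "TransferWithAuthorization",
        "message": {
            "from": buyer,
            "to": seller,
            "value": value,               # "1000" = 0.001 USDC (6 decimals)
            "validAfter": "0",            # string, not int
            "validBefore": "4294967295",  # string, not int
            "nonce": "0x" + secrets.token_bytes(32).hex(),  # random 32 bytes
        },
    }
```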
- -``` -┌──────────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ ┌──────────┐ Buyer: Anvil Account[0] │ -│ │ go test │ 10 USDC minted via anvil_setStorageAt │ -│ │ │ │ -│ │ Signs real EIP-712 │ -│ │ TransferWithAuthorization │ -│ │ (ERC-3009) │ -│ │ │ -│ │ ┌─────────────────────────────────────┐ │ -│ │ │ TypedData: │ │ -│ │ │ domain: USD Coin / v2 / 84532 │ │ -│ │ │ from: buyer address │ │ -│ │ │ to: seller address │ │ -│ │ │ value: "1000" (0.001 USDC) │ │ -│ │ │ validAfter: "0" ← STRING! │ │ -│ │ │ validBefore: "4294967295" ← STRING│ │ -│ │ │ nonce: random 32 bytes │ │ -│ │ └─────────────────────────────────────┘ │ -│ └──────────┘ │ -│ │ │ -│ │ X-PAYMENT: base64(envelope) │ -│ ▼ │ -│ ┌──────────┐ ForwardAuth ┌──────────────────┐ │ -│ │ Traefik │ ───────────────▶ │ x402-verifier │ │ -│ │ Gateway │ │ (real pod) │ │ -│ └──────────┘ └────────┬─────────┘ │ -│ │ │ │ -│ │ │ POST /verify │ -│ │ ▼ │ -│ │ ┌──────────────────┐ │ -│ │ │ x402-rs │ ← REAL binary │ -│ │ │ facilitator │ │ -│ │ │ │ │ -│ │ │ ✓ Decodes header │ │ -│ │ │ ✓ Validates EIP │ │ -│ │ │ 712 signature │ │ -│ │ │ ✓ Checks USDC │ │ -│ │ │ balance on │ │ -│ │ │ Anvil fork │ │ -│ │ │ ✓ Returns │ │ -│ │ │ isValid: true │ │ -│ │ └────────┬─────────┘ │ -│ │ │ │ -│ │ │ connected to: │ -│ │ ▼ │ -│ │ ┌──────────────────┐ │ -│ │ │ Anvil Fork │ ← REAL chain state │ -│ │ │ (Base Sepolia) │ │ -│ │ │ chain ID: 84532 │ │ -│ │ │ │ │ -│ │ │ Has USDC balance │ │ -│ │ │ for buyer address │ │ -│ │ └──────────────────┘ │ -│ │ │ -│ │ 200 OK │ -│ ▼ │ -│ Response from Anvil (eth_blockNumber) │ -│ │ -│ ┌───────────────────────────────────────────────────────────────────┐ │ -│ │ REAL: x402-rs binary, EIP-712 signing, USDC state, verifier, │ │ -│ │ Traefik ForwardAuth, agent reconciliation, CRD lifecycle │ │ -│ │ SIMULATED: chain (Anvil fork, not mainnet), settlement (no │ │ -│ │ actual USDC transfer, Anvil state resets) │ │ -│ 
└───────────────────────────────────────────────────────────────────┘ │ -└──────────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `Fork_RealFacilitatorPayment` | Real EIP-712 → real x402-rs → real validation → 200 | - -**Gap vs real world**: Settlement doesn't transfer real USDC (Anvil fork resets). No real L1/L2 block confirmation. No Cloudflare tunnel in this test. - ---- - -## Phase 8 — Full Stack: Tunnel + Ollama + Real Facilitator (1 test) ← PRODUCTION EQUIVALENT - -**What it covers**: Everything. Real Ollama inference, real x402-rs facilitator, real EIP-712 signatures, USDC-funded Anvil fork, and requests entering through the Cloudflare quick tunnel's dynamic `*.trycloudflare.com` URL. - -**Realism**: Maximum. This is a production sell-side scenario with the only difference being Anvil (not mainnet) and a quick tunnel (not a persistent named tunnel). - -``` -┌──────────────────────────────────────────────────────────────────────────────┐ -│ TEST BOUNDARY │ -│ │ -│ BUYER (test runner) │ -│ ┌──────────────────────────────────────────────────────────────────────┐ │ -│ │ 1. Signs real EIP-712 TransferWithAuthorization (ERC-3009) │ │ -│ │ domain: USD Coin / v2 / 84532 │ │ -│ │ from: 0xf39F... (Anvil account[0], funded with 10 USDC) │ │ -│ │ to: 0x7099... 
(seller) │ │ -│ │ value: "1000" (0.001 USDC) │ │ -│ │ nonce: random 32 bytes │ │ -│ └──────────────────────────────────────────────────────────────────────┘ │ -│ │ │ -│ │ POST https://.trycloudflare.com/services/test-tunnel-real/ │ -│ │ /v1/chat/completions │ -│ │ X-PAYMENT: base64(real EIP-712 envelope) │ -│ ▼ │ -│ ┌──────────────────────────────────────┐ │ -│ │ Cloudflare Edge (quick tunnel) │ ← REAL Cloudflare infrastructure │ -│ │ *.trycloudflare.com │ dynamic URL, non-persistent │ -│ │ TLS termination │ │ -│ └────────────────┬─────────────────────┘ │ -│ │ cloudflared connector (k3d pod) │ -│ ▼ │ -│ ┌──────────────────────────────────────┐ │ -│ │ Traefik Gateway (:443 internal) │ ← REAL Traefik, Gateway API │ -│ │ HTTPRoute: /services/test-tunnel-* │ │ -│ │ ForwardAuth middleware │ │ -│ └────────────────┬─────────────────────┘ │ -│ │ ForwardAuth request │ -│ ▼ │ -│ ┌──────────────────────────────────────┐ │ -│ │ x402-verifier (2 replicas, PDB) │ ← REAL verifier pod │ -│ │ Extracts X-PAYMENT header │ │ -│ │ Looks up pricing route in ConfigMap │ │ -│ │ Calls facilitator /verify │ │ -│ └────────────────┬─────────────────────┘ │ -│ │ POST /verify │ -│ ▼ │ -│ ┌──────────────────────────────────────┐ │ -│ │ x402-rs facilitator (host process) │ ← REAL Rust binary │ -│ │ │ │ -│ │ ✓ Decodes x402 V1 envelope │ │ -│ │ ✓ Recovers signer from EIP-712 sig │ │ -│ │ ✓ Checks USDC balance on Anvil │ │ -│ │ ✓ Validates nonce not replayed │ │ -│ │ ✓ Returns isValid: true + payer │ │ -│ └────────────────┬─────────────────────┘ │ -│ │ connected to: │ -│ ▼ │ -│ ┌──────────────────────────────────────┐ │ -│ │ Anvil Fork (host process) │ ← REAL chain state (Base Sepolia) │ -│ │ chain ID: 84532 │ USDC balances, nonce tracking │ -│ │ 10 USDC minted to buyer │ │ -│ └──────────────────────────────────────┘ │ -│ │ -│ ◀── verifier returns 200 (payment valid) │ -│ │ │ -│ ▼ Traefik forwards to upstream │ -│ ┌──────────────────────────────────────┐ │ -│ │ Ollama (llm namespace) │ ← REAL model 
inference │ -│ │ model: qwen2.5 / qwen3:0.6b │ actual LLM generation │ -│ │ │ │ -│ │ POST /v1/chat/completions │ │ -│ │ → "say hello in one word" │ │ -│ │ ← {"choices":[{"message":...}]} │ │ -│ └──────────────────────────────────────┘ │ -│ │ -│ ◀── 200 + inference response returned to buyer via tunnel │ -│ │ -│ ┌───────────────────────────────────────────────────────────────────────┐ │ -│ │ REAL: tunnel, Traefik, x402-verifier, x402-rs, EIP-712, USDC, │ │ -│ │ Ollama, agent reconciliation, CRD, RBAC, Gateway API │ │ -│ │ SIMULATED: chain (Anvil fork, not mainnet), settlement │ │ -│ │ NOT PERSISTENT: quick tunnel URL changes on restart │ │ -│ └───────────────────────────────────────────────────────────────────────┘ │ -└──────────────────────────────────────────────────────────────────────────────┘ -``` - -| Test | What It Proves | -|------|----------------| -| `Tunnel_RealFacilitatorOllama` | Buyer → CF tunnel → x402 gate → real EIP-712 validation → real Ollama inference → response via tunnel | - -**What makes this different from every other test**: - -| Component | Phase 6 (existing) | Phase 5+ (Anvil) | Phase 8 (this) | -|-----------|-------------------|-------------------|----------------| -| Inference | Real Ollama | Anvil RPC | Real Ollama | -| Facilitator | Mock (always valid) | Real x402-rs | Real x402-rs | -| Payment signature | Fake JSON blob | Real EIP-712 | Real EIP-712 | -| USDC balance | N/A | Minted on Anvil | Minted on Anvil | -| Entry point | obol.stack:8080 | obol.stack:8080 | **\*.trycloudflare.com** | -| TLS | None (HTTP) | None (HTTP) | **Real TLS** (CF edge) | - -**Gap vs real world**: Quick tunnel URL is ephemeral (not a persistent `myagent.example.com`). USDC settlement doesn't transfer real tokens (Anvil resets). No real L1/L2 block finality. 
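The `X-PAYMENT: base64(envelope)` step in both Phase 5+ and Phase 8 is a base64-encoded JSON envelope wrapping the signed authorization. A round-trip sketch (the envelope's field names are illustrative, not the exact x402 V1 wire format):

```python
import base64
import json

def encode_x_payment(envelope):
    """Pack a payment envelope into the X-PAYMENT header value (buyer side)."""
    return base64.b64encode(json.dumps(envelope).encode()).decode()

def decode_x_payment(header):
    """First thing the verifier does: base64-decode, then parse JSON."""
    return json.loads(base64.b64decode(header))
```

The verifier performs the decode half before forwarding the payload to the facilitator's `/verify` endpoint.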
- ---- - -## Base Tests — Inference + Skills (12 tests) - -**What they cover**: Ollama/Anthropic/OpenAI/Google/Zhipu inference through LiteLLM, skill staging and injection, skill visibility in pod, skill-driven agent responses. - -**Realism**: Very High for inference path. These are the "does the AI actually work" tests. - -Not directly part of the monetize subsystem, but they validate the upstream service that gets monetized. - ---- - -## Realism Comparison Matrix - -``` - CRD RBAC Agent Traefik x402 Facil. EIP-712 USDC Ollama Tunnel TLS - ─── ──── ───── ─────── ──── ────── ─────── ──── ────── ────── ─── -Phase 1 (CRD) ✓ -Phase 2 (RBAC) ✓ ✓ ✓ -Phase 3 (Route) ✓ ✓ ✓ ✓ -Phase 4 (Gate) ✓ ✓ ✓ ✓ ✓ MOCK MOCK -Phase 5 (E2E) ✓ ✓ ✓ ✓ ✓ MOCK MOCK -Phase 6 (Tunnel) ✓ ✓ ✓ ✓ ✓ MOCK MOCK ✓ ✓ ✓ -Phase 7 (Fork) ✓ ✓ ✓ ✓ ✓ MOCK MOCK N/A -Phase 5+ (Real) ✓ ✓ ✓ ✓ ✓ REAL REAL REAL -Phase 8 (FULL) ✓ ✓ ✓ ✓ ✓ REAL REAL REAL ✓ ✓ ✓ - - ✓ = real component MOCK = simulated REAL = production-equivalent -``` - ---- - -## What's Still Not Tested - -| Gap | Impact | Mitigation | -|-----|--------|------------| -| **Real USDC settlement** | Anvil fork doesn't persist transfers | Would need Base Sepolia testnet with real USDC faucet | -| **Persistent named tunnel** | Quick tunnel URL is ephemeral | Phase 8 uses quick tunnel; persistent requires `obol tunnel provision` with CF credentials | -| **Concurrent buyers** | All tests are single-buyer | Add load test with multiple signed payments | -| **ERC-8004 registration** | `obol sell register` not tested end-to-end | Would need real Base Sepolia tx (gas costs) | -| **Price change hot-reload** | Agent updates price in CR → verifier picks up new amount | Test exists partially in Phase 4 format checks | -| **Buy-side flow** | No buyer CLI/SDK test | Planned as next phase | - ---- - -## Running the Tests - -```bash -# Prerequisites -export OBOL_DEVELOPMENT=true -export OBOL_CONFIG_DIR=$(pwd)/../../.workspace/config -export 
OBOL_BIN_DIR=$(pwd)/../../.workspace/bin -export OBOL_DATA_DIR=$(pwd)/../../.workspace/data - -# Phase 1-3: CRD + RBAC + Routing (fast, ~2min) -go test -tags integration -v -timeout 5m \ - -run 'TestIntegration_CRD_|TestIntegration_RBAC_|TestIntegration_Monetize_|TestIntegration_Route_' \ - ./internal/openclaw/ - -# Phase 4-5: Payment gate + E2E (medium, ~5min) -go test -tags integration -v -timeout 10m \ - -run 'TestIntegration_PaymentGate_|TestIntegration_E2E_' \ - ./internal/openclaw/ - -# Phase 6: Tunnel + Ollama (slow, ~8min, needs Ollama model cached) -go test -tags integration -v -timeout 15m \ - -run 'TestIntegration_Tunnel_' \ - ./internal/openclaw/ - -# Phase 7: Fork validation (medium, ~5min) -go test -tags integration -v -timeout 10m \ - -run 'TestIntegration_Fork_FullPaymentFlow|TestIntegration_Fork_AgentSkillIteration' \ - ./internal/openclaw/ - -# Phase 5+: Real facilitator (medium, ~5min, needs x402-rs) -export X402_RS_DIR=/path/to/x402-rs -go test -tags integration -v -timeout 15m \ - -run 'TestIntegration_Fork_RealFacilitatorPayment' \ - ./internal/openclaw/ - -# Phase 8: FULL — tunnel + Ollama + real facilitator (~8min, needs everything) -export X402_RS_DIR=/path/to/x402-rs -go test -tags integration -v -timeout 15m \ - -run 'TestIntegration_Tunnel_RealFacilitatorOllama' \ - ./internal/openclaw/ - -# x402 verifier standalone E2E -go test -tags integration -v -timeout 10m \ - -run 'TestIntegration_PaymentGate' \ - ./internal/x402/ - -# All monetize tests -go test -tags integration -v -timeout 20m ./internal/openclaw/ -``` diff --git a/docs/monetisation-architecture-proposal.md b/docs/monetisation-architecture-proposal.md deleted file mode 100644 index 7588c935..00000000 --- a/docs/monetisation-architecture-proposal.md +++ /dev/null @@ -1,480 +0,0 @@ -# Obol Agent: Autonomous Compute Monetization - -**Branch:** `feat/secure-enclave-inference` | **Date:** 2026-02-25 | **Status:** Architecture proposal - ---- - -## 1. 
The Goal - -A singleton OpenClaw instance — the **obol-agent** — deployed via `obol agent init`, autonomously monetizes compute resources running in the Obol Stack. A user (or the frontend) declares *what* to expose via a Custom Resource; the obol-agent handles *everything else*: model pulling, health validation, payment gating, public exposure, on-chain registration, and status reporting. - -No separate controller binary. No Go operator. The obol-agent is a regular OpenClaw instance with elevated RBAC and the `monetize` skill. Only one obol-agent can exist per cluster; other OpenClaw instances retain standard read-only access. - ---- - -## 2. How It Works - -``` - ┌──────────────────────────────────┐ - │ User / Frontend / obol CLI │ - │ │ - │ kubectl apply -f offer.yaml │ - │ OR: frontend POST to k8s API │ - │ OR: obol sell http ... │ - └──────────┬───────────────────────────┘ - │ creates CR - ▼ - ┌────────────────────────────────────┐ - │ ServiceOffer CR │ - │ apiVersion: obol.org/v1alpha1 │ - │ kind: ServiceOffer │ - └──────────┬───────────────────────────┘ - │ read by - ▼ - ┌────────────────────────────────────┐ - │ obol-agent (singleton OpenClaw) │ - │ namespace: openclaw- │ - │ │ - │ Cron job (every 60s): │ - │ python3 monetize.py process --all│ - │ │ - │ `monetize` skill: │ - │ 1. Read ServiceOffer CRs │ - │ 2. Pull model (if runtime=ollama) │ - │ 3. Health-check upstream service │ - │ 4. Create ForwardAuth Middleware │ - │ 5. Create HTTPRoute │ - │ 6. Register on ERC-8004 │ - │ 7. Update CR status │ - └────────────────────────────────────┘ -``` - -The obol-agent uses its mounted ServiceAccount token to talk to the Kubernetes API — the same pattern `kube.py` already uses for read-only monitoring, but extended with write operations for Middleware and HTTPRoute resources. - -The reconciliation loop is built on OpenClaw's native **cron system**: a `{ kind: "every", everyMs: 60000 }` job runs `monetize.py process --all` every 60 seconds. 
No sidecar, no K8s CronJob — the cron scheduler runs inside the OpenClaw Gateway process and persists across pod restarts. - ---- - -## 3. Why Not a Separate Controller - -| Concern | Go operator (controller-runtime) | OpenClaw with `monetize` skill | -|---------|----------------------------------|--------------------------------| -| New binary to build/maintain | Yes — new cmd/, Dockerfile, CI | No — skill is a SKILL.md + Python script | -| Hot-updatable logic | No — rebuild + redeploy image | Yes — update skill files on PVC | -| Error handling | Hardcoded retry/backoff | AI reasons about failures, adapts | -| Watch loop | Built-in informer cache | Built-in cron: `monetize.py process --all` every 60s | -| Dependencies | controller-runtime, kubebuilder, code-gen | stdlib Python (`urllib`, `json`, `ssl`) | -| Existing infrastructure | Needs new Deployment, SA, RBAC | Uses existing OpenClaw pod, SA, skill system | - -The traditional operator pattern is the right answer when you need guaranteed sub-second reconciliation with leader election. For monetization lifecycle (deploy → expose → register → monitor), OpenClaw acting on ServiceOffer CRs via skills is simpler and leverages everything already built. - ---- - -## 4. 
The CRD - -```yaml -apiVersion: obol.org/v1alpha1 -kind: ServiceOffer -metadata: - name: qwen-inference - namespace: openclaw-default # lives alongside the OpenClaw instance -spec: - # What to serve - model: - name: Qwen/Qwen3.5-35B-A3B # Ollama model tag to pull - runtime: ollama # runtime that serves the model - - # Upstream service (Ollama already running in-cluster) - upstream: - service: ollama # k8s Service name - namespace: openclaw-default # where the service runs - port: 11434 - healthPath: /api/tags # endpoint to probe after pull - - # How to price it - pricing: - amount: "0.50" - unit: MTok # per million tokens - currency: USDC - chain: base - - # Who gets paid - wallet: "0x1234...abcd" - - # Public path - path: /services/qwen-inference - - # On-chain advertisement - register: true -``` - -```yaml -status: - conditions: - - type: ModelReady - status: "True" - reason: PullCompleted - message: "Qwen/Qwen3.5-35B-A3B pulled and loaded on ollama" - - type: UpstreamHealthy - status: "True" - reason: HealthCheckPassed - message: "Model responds to inference at ollama.openclaw-default.svc:11434" - - type: PaymentGateReady - status: "True" - reason: MiddlewareCreated - message: "ForwardAuth middleware x402-qwen-inference created" - - type: RoutePublished - status: "True" - reason: HTTPRouteCreated - message: "Exposed at /services/qwen-inference via traefik-gateway" - - type: Registered - status: "True" - reason: ERC8004Registered - message: "Registered on Base (tx: 0xabc...)" - - type: Ready - status: "True" - reason: AllConditionsMet - endpoint: "https://stack.example.com/services/qwen-inference" - observedGeneration: 1 -``` - -**Design:** -- **Namespace-scoped** — the CR lives in the same namespace as the upstream service. This preserves OwnerReference cascade (garbage collection on delete) and avoids cross-namespace complexity. 
The obol-agent's ClusterRoleBinding lets it watch ServiceOffers across all namespaces via `GET /apis/obol.org/v1alpha1/serviceoffers` (cluster-wide list). -- **Conditions, not Phase** — [deprecated by API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties). Conditions give granular insight into which step failed. -- **Status subresource** — prevents users from accidentally overwriting status. ([docs](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource)) -- **Same-namespace as upstream** — the Middleware and HTTPRoute are created alongside the upstream service. OwnerReferences work (same namespace), so deleting the ServiceOffer garbage-collects the route and middleware. ([docs](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)) - -### CRD installation - -The CRD manifest is embedded in the infrastructure helmfile (same pattern as `obol-agent.yaml`) and applied during `obol stack init`. No kubebuilder, no code-gen — just a static YAML manifest. - ---- - -## 5. The `monetize` Skill - -``` -internal/embed/skills/monetize/ -├── SKILL.md # Teaches OpenClaw when and how to use this skill -├── scripts/ -│ └── monetize.py # K8s API client for ServiceOffer lifecycle -└── references/ - └── x402-pricing.md # Pricing strategies, chain selection -``` - -### SKILL.md (summary) - -Teaches OpenClaw: -- When a user asks to monetize a service, create a ServiceOffer CR -- When asked to check monetization status, read ServiceOffer CRs and report conditions -- When asked to process offers, run the monetization workflow (health → gate → route → register) -- When asked to stop monetizing, delete the ServiceOffer CR (garbage collection handles cleanup) - -### kube.py extension - -`kube.py` gains write helpers (`api_post`, `api_patch`, `api_delete`) alongside its existing `api_get`. 
The read-only contract is preserved by convention: `kube.py` commands remain read-only; `monetize.py` imports the shared helpers and adds write operations. Pure Python stdlib — no new dependencies. - -Why not a K8s MCP server? The mounted ServiceAccount token already gives direct API access. An MCP server (e.g., Red Hat's `containers/kubernetes-mcp-server`) adds a sidecar container, image pull, and Helm chart changes for what amounts to wrapping the same REST calls. It's a known upgrade path if K8s operations outgrow script-based tooling, but adds no value today. - -### monetize.py - -``` -python3 monetize.py offers # list ServiceOffer CRs -python3 monetize.py process # run full workflow for one offer -python3 monetize.py process --all # process all pending offers -python3 monetize.py status # show conditions -python3 monetize.py create --upstream .. # create a ServiceOffer CR -python3 monetize.py delete # delete CR (cascades cleanup) -``` - -Each `process` invocation: - -1. **Read the ServiceOffer CR** from the k8s API -2. **Pull the model** — if `spec.model.runtime == ollama`, `POST /api/pull` to Ollama -3. **Health-check** — verify model responds at `..svc:` -4. **Create/update Middleware** — Traefik ForwardAuth pointing at `x402-verifier.x402.svc:8080/verify` -5. **Create/update HTTPRoute** — `parentRef: traefik-gateway`, path from spec, backend = upstream service, filter = the Middleware -6. **ERC-8004 registration** — if `spec.register`, call `signer.py` to sign and submit the registration tx -7. **Update CR status** — set conditions and endpoint - -All via the k8s REST API using the mounted ServiceAccount token. No kubectl, no client-go, no external dependencies. - ---- - -## 6. What Gets Created Per ServiceOffer - -All resources are created in the **same namespace** as the upstream service (and the ServiceOffer CR). OwnerReferences on the ServiceOffer handle cleanup. 
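The stdlib-only write path described above — authenticated calls against the in-cluster API server with the mounted ServiceAccount token — can be sketched like this. The helper names mirror the `api_post`/`api_patch` additions mentioned earlier, but the exact signatures in `kube.py` are assumptions:

```python
import json
import ssl
import urllib.request

API = "https://kubernetes.default.svc"
CA = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

def build_request(method, path, token, body=None):
    """Build an authenticated request against the in-cluster API server."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(API + path, data=data, method=method)
    req.add_header("Authorization", "Bearer " + token)
    req.add_header("Content-Type", "application/json")
    return req

def api_patch(path, token, body):
    """PATCH, e.g. a ServiceOffer /status update, using the cluster CA."""
    req = build_request("PATCH", path, token, body)
    # Status condition updates typically use a merge-patch content type.
    req.add_header("Content-Type", "application/merge-patch+json")
    ctx = ssl.create_default_context(cafile=CA)
    return urllib.request.urlopen(req, context=ctx)
```

In the pod, the token is read from `/var/run/secrets/kubernetes.io/serviceaccount/token`; no kubectl or client-go is involved.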
- -| Resource | Purpose | -|----------|---------| -| `Middleware` (traefik.io/v1alpha1) | ForwardAuth to `x402-verifier.x402.svc:8080/verify` — gates the upstream with payment | -| `HTTPRoute` (gateway.networking.k8s.io/v1) | Routes `spec.path` from Traefik Gateway to upstream, through the Middleware | - -That's it. Two resources. The upstream service already runs. The x402 verifier already runs. The Gateway already runs. The tunnel already runs. - -### Why no new namespace - -The upstream service already has a namespace. Creating a new namespace per offer would mean: -- Cross-namespace OwnerReferences don't work ([docs](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)) -- Need ReferenceGrant for cross-namespace backend refs in HTTPRoute ([docs](https://gateway-api.sigs.k8s.io/api-types/referencegrant/)) -- Broader RBAC (namespace create/delete permissions) - -Instead: Middleware and HTTPRoute live alongside the upstream. Delete the ServiceOffer CR → Kubernetes cascades the deletion. - -### Cross-namespace HTTPRoute → Gateway - -The HTTPRoute references `traefik-gateway` in the `traefik` namespace. No ReferenceGrant needed — the Gateway's `allowedRoutes.namespaces.from: All` handles this. ([Gateway API docs](https://gateway-api.sigs.k8s.io/guides/multiple-ns/)) - -### Middleware locality - -Traefik's `ExtensionRef` in HTTPRoute is a `LocalObjectReference` — Middleware must be in the same namespace as the HTTPRoute. The skill creates it there. ([traefik#11126](https://github.com/traefik/traefik/issues/11126)) - ---- - -## 7. 
RBAC: Singleton obol-agent vs Regular OpenClaw - -### Two tiers of access - -| | obol-agent (singleton) | Regular OpenClaw instances | -|---|---|---| -| **Deployed by** | `obol agent init` | `obol openclaw onboard` | -| **RBAC** | `openclaw-monetize` ClusterRole | Namespace-scoped read-only Role (chart default) | -| **Skills** | All default skills + `monetize` | Default skills only | -| **Cron** | `monetize.py process --all` every 60s | No monetization cron | -| **Count** | Exactly one per cluster | Zero or more | - -Only the obol-agent gets the elevated ClusterRole. `obol agent init` enforces the singleton constraint — it refuses to create a second obol-agent if one already exists. - -### obol-agent ClusterRole - -```yaml -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: openclaw-monetize -rules: - # Read/write ServiceOffer CRs - - apiGroups: ["obol.org"] - resources: ["serviceoffers"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - - apiGroups: ["obol.org"] - resources: ["serviceoffers/status"] - verbs: ["get", "update", "patch"] - - # Create Middleware and HTTPRoute in service namespaces - - apiGroups: ["traefik.io"] - resources: ["middlewares"] - verbs: ["get", "list", "create", "update", "patch", "delete"] - - apiGroups: ["gateway.networking.k8s.io"] - resources: ["httproutes"] - verbs: ["get", "list", "create", "update", "patch", "delete"] - - # Read pods/services/endpoints/deployments for health checks (any namespace) - - apiGroups: [""] - resources: ["pods", "services", "endpoints"] - verbs: ["get", "list"] - - apiGroups: ["apps"] - resources: ["deployments"] - verbs: ["get", "list"] - - apiGroups: [""] - resources: ["pods/log"] - verbs: ["get"] -``` - -This is bound to OpenClaw's ServiceAccount via ClusterRoleBinding — the skill needs to read services and create routes across namespaces (e.g., check health of Ollama in `openclaw-default`, create a route for an Ethereum node in 
`ethereum-knowing-wahoo`). - -### What is explicitly NOT granted - -| Excluded | Why | -|----------|-----| -| `secrets` (cluster-wide) | OpenClaw has secrets access in its own namespace only (chart default) | -| `rbac.authorization.k8s.io/*` | Cannot modify its own permissions | -| `namespaces` create/delete | Doesn't create namespaces | -| `deployments` create/update | Doesn't create workloads — gates existing ones | -| `configmaps` create (cluster-wide) | Reads config for diagnostics, doesn't write it | - -### How this gets applied - -The ClusterRole and ClusterRoleBinding are added to the OpenClaw helmfile generation in `internal/openclaw/openclaw.go`, same as the existing `rbac.create: true` overlay. When `obol openclaw onboard` runs, the chart deploys these RBAC resources alongside the pod. - -**Ref:** [RBAC Good Practices](https://kubernetes.io/docs/concepts/security/rbac-good-practices/) - -### Fix the existing `admin` RoleBinding - -The per-network `agent-rbac.yaml` currently binds the `admin` ClusterRole, which includes Secrets and RBAC manipulation. Replace with a scoped ClusterRole (read pods/services + write Middleware/HTTPRoute). - ---- - -## 8. 
Admission Policy Guardrail - -Defense-in-depth via [ValidatingAdmissionPolicy](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) (GA in k8s 1.30, available in k3s 1.31): - -```yaml -apiVersion: admissionregistration.k8s.io/v1 -kind: ValidatingAdmissionPolicy -metadata: - name: openclaw-monetize-guardrail -spec: - failurePolicy: Fail - matchConstraints: - resourceRules: - - apiGroups: ["traefik.io"] - apiVersions: ["v1alpha1"] - operations: ["CREATE", "UPDATE"] - resources: ["middlewares"] - - apiGroups: ["gateway.networking.k8s.io"] - apiVersions: ["v1"] - operations: ["CREATE", "UPDATE"] - resources: ["httproutes"] - matchConditions: - - name: is-openclaw - expression: >- - request.userInfo.username.startsWith("system:serviceaccount:openclaw-") - validations: - # HTTPRoutes must reference traefik-gateway only - - expression: >- - object.spec.parentRefs.all(ref, - ref.name == "traefik-gateway" && ref.?namespace.orValue("traefik") == "traefik" - ) - message: "OpenClaw can only attach routes to traefik-gateway" - # Middlewares must use ForwardAuth to x402-verifier only - - expression: >- - !has(object.spec.forwardAuth) || - object.spec.forwardAuth.address.startsWith("http://x402-verifier.x402.svc") - message: "ForwardAuth must point to x402-verifier" -``` - -Even if RBAC allows creating any Middleware, the admission policy ensures OpenClaw can only create ForwardAuth rules pointing at the legitimate x402 verifier. A prompt injection can't make it route traffic to an attacker-controlled auth endpoint. - ---- - -## 9. The Full Flow - -``` -1. User: "Monetize Qwen3.5-35B-A3B on Ollama at $0.50 per M token on Base" - -2. OpenClaw (using monetize skill) creates the ServiceOffer CR: - python3 monetize.py create qwen-inference \ - --model Qwen/Qwen3.5-35B-A3B --runtime ollama \ - --upstream ollama --namespace openclaw-default --port 11434 \ - --price 0.50 --unit MTok --chain base --wallet 0x... 
--register - → Creates ServiceOffer CR via k8s API - -3. OpenClaw processes the offer: - python3 monetize.py process qwen-inference - - Step 1: Pull the model through Ollama - POST http://ollama.openclaw-default.svc:11434/api/pull - {"name": "Qwen/Qwen3.5-35B-A3B"} - → Streams download progress, waits for completion - → sets condition: ModelReady=True - - Step 2: Health-check the model is loaded - POST http://ollama.openclaw-default.svc:11434/api/generate - {"model": "Qwen/Qwen3.5-35B-A3B", "prompt": "ping", "stream": false} - → 200 OK, model responds - → sets condition: UpstreamHealthy=True - - Step 3: Create ForwardAuth Middleware - POST /apis/traefik.io/v1alpha1/namespaces/openclaw-default/middlewares - → ForwardAuth → x402-verifier.x402.svc:8080/verify - → sets condition: PaymentGateReady=True - - Step 4: Create HTTPRoute - POST /apis/gateway.networking.k8s.io/v1/namespaces/openclaw-default/httproutes - → parentRef: traefik-gateway, path: /services/qwen-inference - → filter: ExtensionRef to Middleware - → backendRef: ollama:11434 - → sets condition: RoutePublished=True - - Step 5: ERC-8004 registration - python3 signer.py ... (signs registration tx) - → sets condition: Registered=True - - Step 6: Update status - PATCH /apis/obol.org/v1alpha1/.../serviceoffers/qwen-inference/status - → Ready=True, endpoint=https://stack.example.com/services/qwen-inference - -4. User: "What's the status?" - python3 monetize.py status qwen-inference - → Shows conditions table + endpoint + model info - -5. External consumer pays and calls: - POST https://stack.example.com/services/qwen-inference/v1/chat/completions - X-Payment: - → Traefik → ForwardAuth (x402-verifier) → Ollama (Qwen3.5-35B-A3B) -``` - ---- - -## 10. 
What the `obol` CLI Does - -The CLI becomes a thin CRD client — no deployment logic, no helmfile: - -```bash -obol sell http --upstream ollama --price 0.001 --chain base -# → creates ServiceOffer CR (same as kubectl apply) - -obol sell list -# → kubectl get serviceoffers (formatted) - -obol sell status qwen-inference -# → shows conditions, endpoint, pricing - -obol sell delete qwen-inference -# → deletes CR (OwnerReference cascades cleanup) -``` - -The frontend can do the same via the k8s API directly. - ---- - -## 11. What We Keep, What We Drop, What We Add - -| Component | Action | Reason | -|-----------|--------|--------| -| `cmd/x402-verifier/` | **Keep** | ForwardAuth verifier — the payment gate | -| `internal/x402/` | **Keep** | Verifier handler | -| `internal/erc8004/` | **Keep** | On-chain registration (called by `monetize.py` via `signer.py`) | -| `internal/enclave/` | **Keep** | Secure Enclave signing (orthogonal to monetization) | -| `internal/inference/gateway.go` | **Drop** | Inline x402 middleware — replaced by ForwardAuth | -| `internal/inference/store.go` | **Drop** | Deployment config on disk — replaced by CRD | -| `obol-agent.yaml` (busybox pod) | **Drop** | OpenClaw IS the agent; no separate placeholder pod | -| `agent-rbac.yaml` (`admin` binding) | **Replace** | Scoped ClusterRole instead of `admin` | -| `cmd/obol/service.go` | **Simplify** | Thin CRD client | -| `cmd/obol/monetize.go` | **Simplify** | Thin CRD client | -| `internal/embed/skills/monetize/` | **Add** | New skill: SKILL.md + `monetize.py` + references | -| ServiceOffer CRD manifest | **Add** | Intent interface, applied during `obol stack init` | -| ValidatingAdmissionPolicy | **Add** | Guardrail on what OpenClaw can create | -| `openclaw-monetize` ClusterRole | **Add** | Scoped write access for Middleware/HTTPRoute | - ---- - -## 12. 
Resolved Decisions - -| Question | Decision | Rationale | -|----------|----------|-----------| -| **Polling vs event-driven** | OpenClaw cron job, every 60s | OpenClaw has a built-in cron scheduler (`{ kind: "every", everyMs: 60000 }`). No sidecar, no K8s CronJob — runs inside the Gateway process. Jobs persist across restarts via `~/.openclaw/cron/jobs.json`. | -| **Multi-instance** | Singleton obol-agent | Only one obol-agent per cluster, enforced by `obol agent init`. Other OpenClaw instances keep read-only RBAC and no `monetize` skill. No coordination problem. | -| **CRD scope** | Namespace-scoped | OwnerReference cascade works (same namespace as Middleware/HTTPRoute). The obol-agent's ClusterRoleBinding lets it list ServiceOffers across all namespaces. Standard `kubectl get serviceoffers -A` works. | -| **K8s API access** | Extend `kube.py` with write helpers | `kube.py` gains `api_post`, `api_patch`, `api_delete` alongside `api_get`. `monetize.py` imports the shared helpers. Pure stdlib, zero new dependencies. K8s MCP server (Red Hat `containers/kubernetes-mcp-server`) is a known upgrade path but unnecessary today. 
| - ---- - -## References - -| Topic | Link | -|-------|------| -| Custom Resource Definitions | https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ | -| CRD status subresource | https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource | -| API conventions (conditions) | https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md | -| RBAC | https://kubernetes.io/docs/reference/access-authn-authz/rbac/ | -| RBAC good practices | https://kubernetes.io/docs/concepts/security/rbac-good-practices/ | -| ValidatingAdmissionPolicy | https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ | -| OwnerReferences | https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/ | -| Cross-namespace routing (Gateway API) | https://gateway-api.sigs.k8s.io/guides/multiple-ns/ | -| ReferenceGrant | https://gateway-api.sigs.k8s.io/api-types/referencegrant/ | -| Accessing API from a pod | https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/ | -| Pod Security Standards | https://kubernetes.io/docs/concepts/security/pod-security-standards/ | -| Service account tokens | https://kubernetes.io/docs/concepts/security/service-accounts/ | -| Traefik ForwardAuth | https://doc.traefik.io/traefik/reference/routing-configuration/http/middlewares/forwardauth/ | -| Traefik Middleware locality | https://github.com/traefik/traefik/issues/11126 | diff --git a/docs/plans/buy-side-testing.md b/docs/plans/buy-side-testing.md deleted file mode 100644 index 39bfc260..00000000 --- a/docs/plans/buy-side-testing.md +++ /dev/null @@ -1,214 +0,0 @@ -# Buy-Side x402 Hands-Off Testing Plan - -## Current State - -- All clusters are down, no k3d containers running -- x402 extension (`x402.py`) created in LiteLLM fork, registered in `__init__.py` -- `buy-inference` skill created: `buy.py` + `SKILL.md` + 
`references/x402-buyer-api.md` -- `buy_side_test.go` exists but bypasses LiteLLM (sends directly to mock seller) -- LiteLLM Docker image `latest` includes x402 extension - -## Gaps (ordered by dependency) - -### Gap 0: LiteLLM image with x402 extension - -**Problem**: The LiteLLM Docker image needs to include the x402 extension for buy-side payments. - -**Fix**: -1. Ensure `internal/embed/infrastructure/base/templates/llm.yaml` references the correct LiteLLM image tag -2. The LiteLLM image should include x402 extension support -3. Update `llm.yaml` to use the correct version if needed - -**Verification**: `docker run --rm litellm python -c "from litellm.extensions.providers.x402 import install_x402; print('ok')"` (if applicable) - ---- - -### Gap 1: No test routes through LiteLLM x402 extension - -**Problem**: `buy_side_test.go` patches the ConfigMap but sends the paid request directly to the mock seller at `http://127.0.0.1:`. The critical path — LiteLLM receiving a request, the x402 extension signing via remote-signer, injecting `X-PAYMENT`, forwarding to the seller — is never exercised. - -**Fix**: Add a new integration test `TestIntegration_BuySide_ThroughLiteLLM` that: - -1. Starts mock x402 seller on host (reuse `startMockX402Seller`) -2. Patches `litellm-config` ConfigMap with x402 provider pointing at mock seller -3. Restarts litellm deployment to force immediate reload (not wait 120s) -4. Port-forwards litellm:4000 to localhost -5. Sends a chat request to litellm with the purchased model name (e.g., `test-buy-x402/test-model`) -6. litellm routes to `X402Provider.chat()` → signs via remote-signer → injects X-PAYMENT → forwards to mock seller -7. Asserts: mock seller received the X-PAYMENT header, response is 200 with inference data - -**Requires**: Running cluster with litellm + remote-signer (from `obol openclaw onboard`) - -**Key detail**: The mock seller must be reachable from inside the cluster. 
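To make the seller side of this path concrete, here is a minimal sketch of a host-side mock x402 seller. The names `mockSeller` and `probe` are illustrative, not the existing `startMockX402Seller` helper, and the 402 response body is a simplified approximation of x402 pricing terms:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// mockSeller enforces x402-style gating: a request without an X-PAYMENT
// header gets 402 plus pricing terms; with the header it gets 200 and a
// canned inference response.
func mockSeller() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-PAYMENT") == "" {
			w.WriteHeader(http.StatusPaymentRequired)
			json.NewEncoder(w).Encode(map[string]any{
				"x402Version": 1,
				"accepts":     []map[string]string{{"scheme": "exact", "network": "base-sepolia"}},
			})
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"result": "inference output"})
	})
}

// probe starts the seller on 0.0.0.0 (so in-cluster pods could reach it via
// host.k3d.internal) and returns the status codes observed without and with
// an X-PAYMENT header.
func probe() (int, int) {
	ln, err := net.Listen("tcp", "0.0.0.0:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go http.Serve(ln, mockSeller())

	base := fmt.Sprintf("http://127.0.0.1:%d", ln.Addr().(*net.TCPAddr).Port)
	unpaid, err := http.Get(base)
	if err != nil {
		panic(err)
	}
	req, _ := http.NewRequest(http.MethodPost, base, nil)
	req.Header.Set("X-PAYMENT", "dummy-payload")
	paid, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	return unpaid.StatusCode, paid.StatusCode
}

func main() {
	unpaid, paid := probe()
	fmt.Println(unpaid, paid) // 402 200
}
```

In the real integration test the dial address seen from inside the cluster comes from the host-IP helper rather than 127.0.0.1.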
Use `testutil.ClusterHostIP(t)` (resolves to `host.k3d.internal` or `host.docker.internal`). Listen on `0.0.0.0` (already done in `startMockX402Seller`). - ---- - -### Gap 2: No mock remote-signer for isolated testing - -**Problem**: The x402 extension calls `POST remote-signer:9000/api/v1/sign/{addr}/typed-data`. In a full cluster, the real remote-signer handles this. But for faster/lighter tests, we have no mock. - -**Fix**: Add `testutil.StartMockRemoteSigner(t, privateKeyHex)` to provide a mock remote-signer that: - -1. Listens on `0.0.0.0:` -2. `GET /api/v1/keys` → returns `{"keys": ["
"]}` -3. `GET /healthz` → returns `{"status": "ok"}` -4. `POST /api/v1/sign/{addr}/typed-data` → uses `go-ethereum` crypto to sign EIP-712 typed data with the provided private key → returns `{"signature": "0x..."}` - -**Why**: Enables testing the LiteLLM x402 extension → remote-signer path without deploying the Rust remote-signer binary. Also enables testing `buy.py` commands (`balance` excepted) without a full cluster. - -**Scope**: ~80 lines Go. Reuses `testutil.eip712_signer.go` for signing logic. - -**Priority**: NICE-TO-HAVE for first test pass. The real remote-signer works fine in-cluster. Only needed if we want to test without a full cluster later. - ---- - -### Gap 3: buy.py skill not smoke-tested in-pod - -**Problem**: `buy.py` imports from sibling skills (`kube.py`, `signer.py`) via `sys.path.insert`. This works in theory (same pattern as `monetize.py`) but has never been tested in an actual pod where the skills are deployed at `/data/.openclaw/skills/`. - -**Fix**: Add a smoke test to verify the buy-inference skill loads correctly in-pod: - -```python -def test_buy_inference_help(): - """buy-inference skill loads and prints help.""" - result = subprocess.run( - ["python3", "/data/.openclaw/skills/buy-inference/scripts/buy.py", "--help"], - capture_output=True, text=True, timeout=10, - ) - assert result.returncode == 0 - assert "probe" in result.stdout - assert "buy" in result.stdout -``` - -**Scope**: 10 lines. - ---- - -### Gap 4: `llm.yaml` image tag configuration - -**Problem**: `internal/embed/infrastructure/base/templates/llm.yaml` needs to reference the correct LiteLLM image with x402 support. - -**Fix**: Ensure the LiteLLM deployment in `llm.yaml` uses the correct image tag: -```yaml -image: litellm:latest # or appropriate version with x402 support -``` - -**Scope**: Verify image references in llm.yaml are correct. - ---- - -## Testing Sequence - -### Phase 1: Build & Push (pre-cluster) - -``` -1. 
Ensure LiteLLM image with x402 extension is available (Gap 0) -2. Update llm.yaml image tag (Gap 4) -3. Build obol binary from worktree -4. Verify: go build ./... && go test ./... && go vet -tags integration ./internal/x402/ -``` - -### Phase 2: Cluster Up - -``` -5. OBOL_DEVELOPMENT=true obol stack init && obol stack up -6. obol openclaw onboard (deploys remote-signer + agent) -7. Verify: kubectl get pods -n llm (litellm Running) -8. Verify: kubectl get pods -n openclaw-obol-agent (remote-signer Running) -``` - -### Phase 3: Buy Skill Smoke Test - -``` -9. kubectl exec -n openclaw-obol-agent deploy/openclaw -- \ - python3 /data/.openclaw/skills/buy-inference/scripts/buy.py --help -10. kubectl exec -n openclaw-obol-agent deploy/openclaw -- \ - python3 /data/.openclaw/skills/buy-inference/scripts/buy.py list - (expect: "No purchased x402 providers.") -``` - -### Phase 4: Manual Buy-Side Walkthrough - -``` -11. Start mock seller on host: - go test -tags integration -v -run TestIntegration_BuySide_ProbeAndPurchase -timeout 10m ./internal/x402/ - (or start a real seller via: obol sell inference on another cluster) - -12. From inside the agent pod, run probe: - kubectl exec -n openclaw-obol-agent deploy/openclaw -- \ - python3 /data/.openclaw/skills/buy-inference/scripts/buy.py probe \ - http://host.k3d.internal:/v1/chat/completions - (expect: 402 pricing output) - -13. From inside the agent pod, run buy: - kubectl exec -n openclaw-obol-agent deploy/openclaw -- \ - python3 /data/.openclaw/skills/buy-inference/scripts/buy.py buy test-seller \ - --endpoint http://host.k3d.internal: \ - --model test-model --budget 10000 - (expect: provider added to litellm-config) - -14. Wait 2 min for ConfigMap reload, or force: - kubectl rollout restart -n llm deploy/litellm - kubectl rollout status -n llm deploy/litellm --timeout=60s - -15. Verify model appears in litellm: - kubectl exec -n llm deploy/litellm -- curl -s http://localhost:4000/models | jq . - -16. 
Send inference through litellm using purchased model: - kubectl exec -n llm deploy/litellm -- curl -s -X POST http://localhost:4000/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{"model":"test-seller/test-model","messages":[{"role":"user","content":"hello"}]}' - (expect: x402 extension signs payment, forwards to seller, returns 200) - -17. Check seller received X-PAYMENT header (from test logs or mock seller output) - -18. Cleanup: - kubectl exec -n openclaw-obol-agent deploy/openclaw -- \ - python3 /data/.openclaw/skills/buy-inference/scripts/buy.py remove test-seller -``` - -### Phase 5: Integration Test (automated) - -``` -19. Run the through-litellm integration test (Gap 1): - go test -tags integration -v -run TestIntegration_BuySide_ThroughLiteLLM -timeout 10m ./internal/x402/ - -20. Run existing buy-side tests: - go test -tags integration -v -run TestIntegration_BuySide -timeout 10m ./internal/x402/ -``` - -### Phase 6: Full Hands-Off (OpenClaw agent does it autonomously) - -``` -21. Trigger OpenClaw heartbeat with a task that exercises the buy skill: - "Discover x402 inference sellers, probe the first one, buy access if the price - is under 10000 micro-units, then send a test message through the purchased model." - -22. Watch logs for ~5 min: - kubectl logs -n openclaw-obol-agent deploy/openclaw -f - -23. Verify: the agent probed, bought, and used a remote model autonomously -``` - -## Minimal Critical Path - -If time is limited, the absolute minimum to verify the buy lifecycle works: - -1. **Gap 0** — ensure LiteLLM image with x402 extension is available (BLOCKER) -2. **Gap 4** — update image tag in llm.yaml (BLOCKER) -3. Build obol binary, bring up cluster, onboard openclaw -4. Start mock seller on host -5. Run `buy.py probe` + `buy.py buy` from agent pod -6. Restart litellm, send request through purchased model -7. 
Verify 200 response with X-PAYMENT header at seller - -Everything else (Gap 1 automated test, Gap 2 mock signer, Gap 3 smoke test) can follow after the manual walkthrough confirms the flow works. - -## Files to Modify - -| File | Change | Gap | -|------|--------|-----| -| `internal/embed/infrastructure/base/templates/llm.yaml` | Verify LiteLLM image tag | 4 | -| `internal/x402/buy_side_test.go` | Add `TestIntegration_BuySide_ThroughLiteLLM` | 1 | -| `internal/testutil/mock_signer.go` | New: mock remote-signer | 2 | -| `tests/skills_smoke_test.py` | Add buy-inference smoke test | 3 | diff --git a/docs/plans/cli-agent-readiness.md b/docs/plans/cli-agent-readiness.md deleted file mode 100644 index 90a5d7aa..00000000 --- a/docs/plans/cli-agent-readiness.md +++ /dev/null @@ -1,307 +0,0 @@ -# CLI Agent-Readiness Optimizations - -## Status - -**Implemented (this branch)**: -- Phase 1: Global `--output json` / `-o json` / `OBOL_OUTPUT=json` flag -- Phase 1: `OutputMode` + `IsJSON()` + `JSON()` on `internal/ui/UI` -- Phase 1: 11 commands refactored with typed JSON results (sell list/status/info, network list, model status/list, version, update, openclaw list, tunnel status) -- Phase 1: Human output redirected to stderr in JSON mode (stdout is clean JSON) -- Phase 2: `internal/validate/` package (Name, Namespace, WalletAddress, ChainName, Price, URL, Path, NoControlChars) -- Phase 2: Headless prompt paths — `Confirm`, `Select`, `Input`, `SecretInput` auto-resolve defaults in non-TTY/JSON mode -- Phase 2: `sell delete` migrated from raw `fmt.Scanln` to `u.Confirm()` -- Phase 6: `CONTEXT.md` — agent-facing context document - -- Phase 1D: `--from-json` on sell http, sell pricing, network add (`cmd/obol/input.go` helper) -- Phase 2B: `validate.Name()` wired into sell inference/http/stop/delete, `validate.URL()` in network add -- Phase 2C: model.go `promptModelPull()` migrated from bufio to `u.Select()`/`u.Input()`, openclaw onboard headless via `u.IsTTY() && !u.IsJSON()` - 
-**Deferred to follow-up**: -- Phase 3: `obol describe` schema introspection -- Phase 4: `--fields` field filtering -- Phase 5: `--dry-run` for mutating commands -- Phase 7: MCP surface (`obol mcp`) - -## Context - -The obol CLI is increasingly consumed by AI agents — Claude Code during development, OpenClaw agents in-cluster, and soon MCP clients. Today the CLI is human-optimized: colored output, spinners, interactive prompts, and hand-formatted tables. Agents need structured output, non-interactive paths, input hardening, and runtime introspection. This plan makes the CLI agent-ready while preserving human DX. - -**Strengths**: `internal/ui/` abstraction with TTY detection, `OutputMode` (human/json), `--verbose`/`--quiet`/`--output` global flags, `internal/schemas/` with JSON-tagged Go types, `internal/validate/` for input validation, `--force` pattern for non-interactive destructive ops, 23 SKILL.md files shipped in `internal/embed/skills/`, `CONTEXT.md` for agent consumption. - -**Remaining gaps**: `--from-json` for structured input, some `fmt.Printf` calls still bypass UI layer, `model.go` interactive prompts not fully migrated, `openclaw onboard` still hardwired `Interactive: true`, no schema introspection, no `--dry-run`, no field filtering, no MCP surface. - ---- - -## Phase 1: Global `--output json` + Raw JSON Input - -Structured output is table stakes. Raw JSON input (`--from-json`) is first-class — agents shouldn't have to translate nested structures into 15+ flags. - -### 1A. Extend UI struct with output mode - -**`internal/ui/ui.go`** — Add `OutputMode` type (`human`|`json`) and field to `UI` struct. Add `NewWithAllOptions(verbose, quiet bool, output OutputMode)`. Add `IsJSON() bool`. - -**`internal/ui/output.go`** — Add `JSON(v any) error` method that writes to stdout via `json.NewEncoder`. 
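A rough sketch of the stdout/stderr split; struct and method names are assumptions for illustration, not the real `internal/ui` code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// UI is an illustrative stand-in for the real internal/ui struct.
type UI struct {
	Output string // "human" or "json"
}

func (u *UI) IsJSON() bool { return u.Output == "json" }

// JSON encodes v to stdout, so agents can pipe clean JSON to jq.
func (u *UI) JSON(v any) error {
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	return enc.Encode(v)
}

// Info writes diagnostics to stderr in JSON mode, keeping stdout clean.
func (u *UI) Info(msg string) {
	if u.IsJSON() {
		fmt.Fprintln(os.Stderr, msg)
		return
	}
	fmt.Println(msg)
}

func main() {
	u := &UI{Output: "json"}
	u.Info("fetching services...") // stderr in JSON mode
	u.JSON(map[string]string{"name": "qwen-inference", "status": "Ready"})
}
```

With this split, `obol sell list -o json | jq .` sees only the encoded result on stdout while progress messages land on stderr.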
When `IsJSON()` is true, redirect `Info`/`Success`/`Detail`/`Print`/`Printf`/`Dim`/`Bold`/`Blank` to stderr (so agents get clean JSON on stdout, diagnostics on stderr). Suppress spinners in JSON mode. - -### 1B. Add global `--output` flag - -**`cmd/obol/main.go`** (lines 110-127) — Add `--output` / `-o` flag (`human`|`json`, env `OBOL_OUTPUT`, default `human`). Wire in `Before` hook to pass to `ui.NewWithAllOptions`. - -### 1C. Refactor commands to return typed results - -Don't just bolt JSON onto existing `fmt.Printf` calls. Refactor high-value commands to return typed data first, then format for human or JSON. This pays off twice: clean JSON output now, and reusable typed results for MCP later. - -**Audit note**: Raw `fmt.Printf` output is spread across `main.go:460` (version), `model.go:286` (tables), `network.go:188` (tables), and throughout `sell.go`. Each needs a return-data-then-format refactor. - -| Command | Strategy | Effort | -|---------|----------|--------| -| `sell list` | Switch kubectl arg from `-o wide` to `-o json` | Trivial | -| `sell status ` | Switch kubectl arg from `-o yaml` to `-o json` | Trivial | -| `sell status` (global) | Marshal `PricingConfig` + `store.List()` — currently raw `fmt.Printf` at `sell.go:463-498` | Medium | -| `sell info` | Already has `--json` (`sell.go:841`) — wire to global flag, deprecate local | Trivial | -| `network list` | `ListRPCNetworks()` returns `[]RPCNetworkInfo` — marshal it, but local node output also uses `fmt.Printf` at `network.go:188` | Medium | -| `model status` | Return provider status map as JSON — currently `fmt.Printf` tables at `model.go:286` | Medium | -| `model list` | `ListOllamaModels()` returns structured data | Low | -| `version` | `BuildInfo()` returns a string today — refactor to struct with fields (version, commit, date, go version) | Medium | -| `update` | Already has `--json` (`update.go:20`); wire to global flag, deprecate local | Trivial | -| `openclaw list` | Refactor to return data 
before formatting | Medium | -| `tunnel status` | Refactor to return data before formatting | Medium | - -### 1D. Raw JSON input (`--from-json`) - -Add `--from-json` flag to all commands that create resources. Accepts file path or `-` for stdin. Unmarshals into existing `internal/schemas/` types, validates, creates manifest. This is first-class, not an afterthought. - -| Command | Schema Type | Flags Bypassed | -|---------|-------------|----------------| -| `sell http` | `schemas.ServiceOfferSpec` | 15+ flags (wallet, chain, price, upstream, port, namespace, health-path, etc.) | -| `sell inference` | `schemas.ServiceOfferSpec` | 10+ flags | -| `sell pricing` | `schemas.PaymentTerms` | wallet, chain, facilitator | -| `network add` | New `RPCConfig` type | endpoint, chain-id, allow-writes | - -### Testing -- `internal/ui/ui_test.go`: OutputMode switching, JSON writes valid JSON to stdout, human methods go to stderr in JSON mode -- `cmd/obol/output_test.go`: `--output json` on each migrated command produces parseable JSON -- `cmd/obol/json_input_test.go`: `--from-json` with valid/invalid specs - ---- - -## Phase 2: Input Validation + Headless Paths - -Agents hallucinate inputs and can't answer interactive prompts. Fix both together. - -### 2A. New validation package - -**`internal/validate/validate.go`** (new) - -``` -Name(s) — k8s-safe: [a-z0-9][a-z0-9.-]*, no path traversal -Namespace(s) — same rules as Name -WalletAddress(s) — reuse x402verifier.ValidateWallet() pattern -ChainName(s) — from known set (base, base-sepolia, etc.) -Path(s) — no .., no %2e%2e, no control chars -Price(s) — valid decimal, positive -URL(s) — parseable, no control chars -NoControlChars(s) — reject \x00-\x1f except \n\t -``` - -### 2B. 
Wire into commands - -Add validation at the top of every action handler for positional args and key flags: -- **`cmd/obol/sell.go`**: name, wallet, chain, path, price, namespace, upstream URL -- **`cmd/obol/network.go`**: network name, custom RPC URL, chain ID -- **`cmd/obol/model.go`**: provider name, endpoint URL -- **`cmd/obol/openclaw.go`**: instance ID - -### 2C. Headless paths for interactive flows - -**`internal/ui/prompt.go`** — When `IsJSON() || !IsTTY()`: -- `Confirm` → return default value (no stdin read) -- `Select` → return error: "interactive selection unavailable; use --provider flag" -- `Input` → return default or error if no default -- `SecretInput` → return error: "use --api-key flag" - -**`cmd/obol/openclaw.go`** (line 36) — `openclaw onboard` is hardwired `Interactive: true`. Add a non-interactive path when all required flags are provided (`--id`, plus any other required inputs). Only fall through to interactive mode when flags are missing AND stdin is a TTY. - -**`cmd/obol/model.go`** (lines 62-84) — `model setup` enters interactive selection when `--provider` is omitted. In non-TTY/JSON mode, error with required flags instead. - -**`cmd/obol/model.go`** (lines 387-419) — `model pull` uses `bufio.NewReader(os.Stdin)` for interactive model selection. Same treatment. - -**`cmd/obol/sell.go`** (line 576-588) — `sell delete` confirmation uses raw `fmt.Scanln`. Migrate to `u.Confirm()` so the headless path is automatic. - -### Testing -- `internal/validate/validate_test.go`: Table-driven tests for path traversal variants, control char injection, valid inputs -- Test that `--output json` + missing required flags → clear error (not a hung prompt) -- Test that `openclaw onboard --id test -o json` works without interactive mode - ---- - -## Phase 3: Schema Introspection (`obol describe`) - -Let agents discover what the CLI accepts at runtime without parsing `--help` text. - -### 3A. 
Add `obol describe` command - -**`cmd/obol/describe.go`** (new) - -``` -obol describe # list all commands + flags as JSON -obol describe sell http # flags + ServiceOffer schema for that command -obol describe --schemas # dump resource schemas only -``` - -Walk urfave/cli's `*cli.Command` tree. For each command, emit: name, usage, flags (name, type, required, default, env vars, aliases), ArgsUsage. Output always JSON. - -### 3B. Schema registry - -**`internal/schemas/registry.go`** (new) — Map of schema names to JSON Schema generated from Go struct tags via `reflect`. Schemas: `ServiceOfferSpec`, `PaymentTerms`, `PriceTable`, `RegistrationSpec`. - -### 3C. Command metadata annotations - -Add `Metadata: map[string]any{"schema": "ServiceOfferSpec", "mutating": true}` to commands that create resources (sell http, sell inference, sell pricing). `obol describe` reads this and includes the schema in output. - -### Testing -- `cmd/obol/describe_test.go`: Valid JSON output, every command appears, schemas resolve, flag metadata matches actual flags - ---- - -## Phase 4: `--fields` Support - -Let agents limit response size to protect their context window. - -### 4A. Field mask implementation - -**`internal/ui/fields.go`** (new) — `FilterFields(data any, fields []string) any` using reflect on JSON tags. - -### 4B. Global `--fields` flag - -**`cmd/obol/main.go`** — Global `--fields` flag (comma-separated, requires `--output json`). Applied in `u.JSON()` before encoding. - -### Testing -- `--fields name,status` on `sell list -o json` returns only those fields -- `--fields` without `--output json` returns error - ---- - -## Phase 5: `--dry-run` for Mutating Commands - -Let agents validate before mutating. Safety rail. - -### 5A. Global `--dry-run` flag - -**`cmd/obol/main.go`** — Add `--dry-run` bool flag. - -### 5B. 
Priority commands - -| Command | Implementation | -|---------|---------------| -| `sell http` | Already builds manifest before `kubectlApply()` — return manifest instead of applying | -| `sell pricing` | Validate wallet/chain, show what would be written to ConfigMap | -| `network add` | Validate chain, show which RPCs would be added to eRPC config | -| `sell delete` | Validate name exists, show what would be deleted | - -Pattern: after validation, before execution, check `cmd.Root().Bool("dry-run")` and return a `DryRunResult{Command, Valid, WouldCreate, Manifest}` as JSON. - -### Testing -- `cmd/obol/dryrun_test.go`: `--dry-run sell http` returns manifest without kubectl apply, validation still runs in dry-run - ---- - -## Phase 6: Agent Context & Skills - -The 23 SKILL.md files are a strength, but there's no top-level `CONTEXT.md` encoding invariants agents can't intuit from `--help`. - -### 6A. Ship `CONTEXT.md` - -**`CONTEXT.md`** (repo root, also embedded in binary) — Agent-facing context file encoding: -- Always use `--output json` when parsing output programmatically -- Always use `--force` for non-interactive destructive operations -- Always use `--fields` on list commands to limit context window usage -- Always use `--dry-run` before mutating operations -- Use `obol describe ` to introspect flags and schemas -- Cluster commands require `OBOL_CONFIG_DIR` or a running stack (`obol stack up`) -- Payment wallet addresses must be 0x-prefixed, 42 chars -- Chain names: `base`, `base-sepolia` (not CAIP-2 format) - -### 6B. Update existing skills - -Review and update the 23 SKILL.md files to reference the new agent-friendly flags where relevant (e.g., the `sell` skill should mention `--from-json` and `--dry-run`). - ---- - -## Phase 7: MCP Surface (`obol mcp`) - -Expose the CLI as typed JSON-RPC tools over stdio. Depends on all previous phases. - -### 7A. 
New package `internal/mcp/` - -- `server.go` — MCP server over stdio using `github.com/mark3labs/mcp-go` -- `tools.go` — Tool definitions from the typed result functions built in Phase 1C (not by shelling out with `--output json`) -- `handlers.go` — Tool handlers that call the refactored return-typed-data functions directly - -### 7B. `obol mcp` command - -**`cmd/obol/mcp.go`** (new) — Starts MCP server. Exposes high-value tools only: -- sell: `sell_http`, `sell_list`, `sell_status`, `sell_pricing`, `sell_delete` -- network: `network_list`, `network_add`, `network_remove`, `network_status` -- model: `model_status`, `model_list`, `model_setup` -- openclaw: `openclaw_list`, `openclaw_onboard` -- utility: `version`, `update`, `tunnel_status` - -Excludes: kubectl/helm/k9s passthroughs, interactive-only commands, dangerous ops (stack purge/down). - -### Testing -- `internal/mcp/mcp_test.go`: Tool registration produces valid MCP definitions, stdin/stdout JSON-RPC round-trip - ---- - -## Key Files Summary - -| File | Changes | -|------|---------| -| `internal/ui/ui.go` | Add OutputMode, IsJSON(), NewWithAllOptions() | -| `internal/ui/output.go` | Add JSON() method, stderr redirect in JSON mode | -| `internal/ui/prompt.go` | Non-interactive behavior when JSON/non-TTY | -| `internal/ui/fields.go` | New — field mask filtering | -| `cmd/obol/main.go` | `--output`, `--dry-run`, `--fields` global flags + Before hook | -| `cmd/obol/sell.go` | JSON output, typed results, input validation, dry-run, --from-json, migrate Scanln to u.Confirm | -| `cmd/obol/network.go` | JSON output, typed results, input validation | -| `cmd/obol/model.go` | JSON output, typed results, input validation, headless paths | -| `cmd/obol/openclaw.go` | JSON output, typed results, input validation, headless onboard path | -| `cmd/obol/update.go` | Wire to global --output flag, deprecate local --json | -| `cmd/obol/describe.go` | New — schema introspection command | -| `cmd/obol/mcp.go` | New — `obol mcp` 
command | -| `internal/validate/validate.go` | New — input validation functions | -| `internal/schemas/registry.go` | New — JSON Schema generation from Go types | -| `internal/mcp/` | New package — MCP server, tools, handlers | -| `CONTEXT.md` | New — agent-facing context file | - -## Verification - -```bash -# Phase 1: JSON output + JSON input -obol sell list -o json | jq . -obol sell status -o json | jq . -obol version -o json | jq . -obol network list -o json | jq . -echo '{"upstream":{"service":"ollama","namespace":"llm","port":11434},...}' | obol sell http test --from-json - - -# Phase 2: Input validation + headless -obol sell http '../etc/passwd' --wallet 0x... --chain base-sepolia # should error -obol sell http 'valid-name' --wallet 'not-a-wallet' --chain base-sepolia # should error -echo '' | obol model setup -o json # should error with "use --provider flag", not hang - -# Phase 3: Schema introspection -obol describe | jq '.commands | length' -obol describe sell http | jq '.schema' - -# Phase 4: Fields -obol sell list -o json --fields name,namespace,status | jq . - -# Phase 5: Dry-run -obol sell http test-svc --wallet 0x... --chain base-sepolia --dry-run -o json | jq . - -# Phase 7: MCP -echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | obol mcp - -# Unit tests -go test ./internal/ui/ ./internal/validate/ ./internal/schemas/ ./internal/mcp/ ./cmd/obol/ -``` diff --git a/docs/plans/multi-network-sell.md b/docs/plans/multi-network-sell.md deleted file mode 100644 index 77b6bf01..00000000 --- a/docs/plans/multi-network-sell.md +++ /dev/null @@ -1,387 +0,0 @@ -# Multi-Network Sell Command + UX Improvements - -## Context - -The `obol sell` command currently only supports ERC-8004 registration on Base Sepolia, requires manual private key management via `--private-key-file`, and forces users to specify all flags explicitly. We want to: - -1. Support 3 registration networks: **base-sepolia**, **base**, **ethereum mainnet** -2. 
Support **multi-chain** registration: `--chain mainnet,base` registers on both, best-effort -3. Use the **remote-signer** for all signing (not private key extraction) — EIP-712 typed data + transaction signing via its REST API -4. Use **sponsored registration** (zero gas) on ethereum mainnet via howto8004.com -5. Use the **local eRPC** (`localhost/rpc`) for chain access instead of public RPCs -6. Add **interactive prompts** using `charmbracelet/huh` with good defaults -7. **Auto-discover** the remote-signer wallet address -8. Add **ethereum mainnet** as a valid x402 payment chain - -Frontend deferred to follow-up PR. EIP-7702 handled server-side by sponsor — no CLI implementation needed. - -### Network Matrix - -| Network | x402 Payment | x402 Facilitator | ERC-8004 Registration | Sponsored Reg | -|---------|-------------|-------------------|----------------------|---------------| -| base-sepolia | Yes | `facilitator.x402.rs` | Yes (direct tx via remote-signer) | No | -| base | Yes | `x402.gcp.obol.tech` | Yes (direct tx via remote-signer) | No | -| ethereum | Yes (no facilitator yet) | TBD | Yes | Yes (`sponsored.howto8004.com/api/register`) | - ---- - -## Phase 1: Multi-Network ERC-8004 Registry Config - -### `internal/erc8004/networks.go` (new) - -```go -type NetworkConfig struct { - Name string // "base-sepolia", "base", "ethereum" - ChainID int64 - RegistryAddress string // per-chain registry address - SponsorURL string // empty if no sponsor - DelegateAddress string // EIP-7702 delegate (for sponsored flow) - ERPCNetwork string // eRPC path segment: "base-sepolia", "base", "mainnet" -} - -func ResolveNetwork(name string) (NetworkConfig, error) -func ResolveNetworks(csv string) ([]NetworkConfig, error) // "mainnet,base" → []NetworkConfig -func SupportedNetworks() []NetworkConfig -``` - -Three entries: -- `base-sepolia`: chainID 84532, registry `0x8004A818BFB912233c491871b3d84c89A494BD9e`, eRPC `base-sepolia` -- `base`: chainID 8453, registry TBD (confirm 
CREATE2 address), eRPC `base` -- `ethereum` / `mainnet`: chainID 1, registry `0x8004A169FB4a3325136EB29fA0ceB6D2e539a432`, sponsor `https://sponsored.howto8004.com/api/register`, delegate `0x77fb3D2ff6dB9dcbF1b7E0693b3c746B30499eE8`, eRPC `mainnet` - -RPC URL is **not** in NetworkConfig — always use local eRPC at `http://localhost/rpc/{ERPCNetwork}` (from host via k3d port mapping). - -### `internal/erc8004/client.go` - -- Add `NewClientForNetwork(ctx, rpcBaseURL string, net NetworkConfig) (*Client, error)` — constructs RPC URL as `rpcBaseURL + "/" + net.ERPCNetwork`, uses `net.RegistryAddress` -- Keep `NewClient(ctx, rpcURL)` as backward-compat wrapper - -### Files -- `internal/erc8004/networks.go` (new) -- `internal/erc8004/networks_test.go` (new) -- `internal/erc8004/client.go` (add `NewClientForNetwork`) - ---- - -## Phase 2: Remote-Signer Integration for Registration - -### Architecture - -The remote-signer REST API at port 9000 already supports: -- `POST /api/v1/sign/{address}/transaction` — sign raw transactions -- `POST /api/v1/sign/{address}/typed-data` — sign EIP-712 typed data -- `GET /api/v1/keys` — list loaded wallet addresses - -From the host CLI, access via **temporary port-forward** to `remote-signer:9000` (same pattern as `openclaw cli`). - -### `internal/erc8004/signer.go` (new) - -```go -// RemoteSigner wraps the remote-signer REST API for ERC-8004 operations. -type RemoteSigner struct { - baseURL string // e.g. "http://localhost:19000" (port-forwarded) -} - -func NewRemoteSigner(baseURL string) *RemoteSigner - -// GetAddress returns the first loaded signing address. -func (s *RemoteSigner) GetAddress(ctx context.Context) (common.Address, error) - -// SignTransaction signs an EIP-1559 transaction for direct on-chain registration. -func (s *RemoteSigner) SignTransaction(ctx context.Context, addr common.Address, tx SignTxRequest) ([]byte, error) - -// SignTypedData signs EIP-712 typed data (for sponsored registration). 
-func (s *RemoteSigner) SignTypedData(ctx context.Context, addr common.Address, data EIP712TypedData) ([]byte, error) -``` - -### `internal/erc8004/register.go` (new) - -Two registration paths: - -**Direct on-chain** (base-sepolia, base): -1. Port-forward to remote-signer -2. `signer.GetAddress()` → wallet address -3. Build `register(agentURI)` calldata -4. Get nonce + gas estimates from eRPC -5. `signer.SignTransaction()` → signed tx -6. `eth_sendRawTransaction` via eRPC -7. Wait for receipt, parse `Registered` event - -**Sponsored** (ethereum mainnet): -1. Port-forward to remote-signer -2. `signer.GetAddress()` → wallet address -3. `signer.SignTypedData()` → EIP-712 authorization + registration intent signatures -4. POST to `net.SponsorURL` with signatures -5. Parse response `{success, agentId, txHash}` - -### Port-Forward Helper - -Reuse or adapt the pattern from `openclaw cli` (`cmd/obol/openclaw.go`). New helper: - -```go -// portForwardRemoteSigner starts a port-forward to the remote-signer in the -// given namespace and returns the local URL + cleanup function. -func portForwardRemoteSigner(cfg *config.Config, namespace string) (baseURL string, cleanup func(), err error) -``` - -### Files -- `internal/erc8004/signer.go` (new — remote-signer REST client) -- `internal/erc8004/signer_test.go` (new — HTTP mock tests) -- `internal/erc8004/register.go` (new — direct + sponsored registration flows) -- `internal/erc8004/sponsor.go` (new — sponsored API client, EIP-712 types) -- `internal/erc8004/sponsor_test.go` (new) - ---- - -## Phase 3: Wallet Auto-Discovery - -### `internal/openclaw/wallet_resolve.go` (new) - -```go -// ResolveWalletAddress returns the wallet address from the single OpenClaw instance. -// 0 instances → error, 1 → auto-select, 2+ → error suggesting --wallet. 
-func ResolveWalletAddress(cfg *config.Config) (string, error) - -// ResolveInstanceNamespace returns the namespace of the single OpenClaw instance -// (needed for port-forwarding to the remote-signer in that namespace). -func ResolveInstanceNamespace(cfg *config.Config) (string, error) -``` - -Flow: -1. `ListInstanceIDs(cfg)` → instance IDs -2. 0 → error, 1 → read wallet.json, 2+ → error with list of addresses -3. `ReadWalletMetadata(DeploymentPath(cfg, id))` → `WalletInfo.Address` - -**No private key extraction.** The address is all we need for auto-discovery. Signing goes through the remote-signer API. - -### Files -- `internal/openclaw/wallet_resolve.go` (new) -- `internal/openclaw/wallet_resolve_test.go` (new) - ---- - -## Phase 4: Rewrite `sell register` - -### `cmd/obol/sell.go` — `sellRegisterCommand` - -**New flags:** - -| Flag | Type | Default | Notes | -|------|------|---------|-------| -| `--chain` | string | `base-sepolia` | Comma-separated: `base-sepolia,base,mainnet`. Register on each, best-effort | -| `--sponsored` | bool | auto | `true` when network has sponsor URL | -| `--endpoint` | string | auto | Auto-detected from tunnel | -| `--name` | string | `Obol Agent` | Agent name for registration | -| `--description` | string | smart default | Auto-generated from stack info | -| `--image` | string | smart default | Default Obol logo URL | -| `--private-key-file` | string | | Fallback — used only if no remote-signer detected | - -**Removed:** `--private-key` (deprecated), `--rpc-url` (use local eRPC) - -**Action logic:** -1. Parse `--chain` → `erc8004.ResolveNetworks(chainCSV)` → `[]NetworkConfig` -2. Resolve wallet: try `openclaw.ResolveWalletAddress(cfg)`. If found, use remote-signer path. If not, require `--private-key-file`. -3. Resolve endpoint: `--endpoint` if set, else tunnel auto-detect -4. For each network (best-effort): - a. If sponsored + network has sponsor → sponsored path (sign EIP-712 via remote-signer, POST to sponsor) - b. 
Else → direct path (sign tx via remote-signer, broadcast via eRPC) - c. On success: print CAIP-10 registry line - d. On failure: print warning, continue to next chain -5. Update `agent-registration.json` with all successful registrations in the `registrations[]` array - -### Files -- `cmd/obol/sell.go` (rewrite `sellRegisterCommand`) -- `cmd/obol/sell_test.go` (update `TestSellRegister_Flags`) - ---- - -## Phase 5: Interactive Prompts with `charmbracelet/huh` - -### New dependency - -`go get github.com/charmbracelet/huh` - -### Signature change - -`sellCommand(cfg *config.Config)` → `sellCommand(cfg *config.Config, u *ui.UI)` (match `openclawCommand` pattern). Wire from `main.go`. - -### TTY guard - -```go -import "golang.org/x/term" -isInteractive := term.IsTerminal(int(os.Stdin.Fd())) -``` - -### `sell inference` interactive flow: - -| Field | Default | Prompt type | When prompted | -|-------|---------|-------------|---------------| -| Name | (required) | Text input | No positional arg | -| Model | (required) | Select from Ollama models | `--model` not set | -| Wallet | auto-discovered | Text (pre-filled) | Auto-discover fails | -| Chain | `base-sepolia` | Select | Using default | -| Price | `0.001` | Text (pre-filled) | Confirm or override | - -### `sell http` interactive flow: - -| Field | Default | Prompt type | When prompted | -|-------|---------|-------------|---------------| -| Name | (required) | Text input | No positional arg | -| Upstream | (required) | Text input | `--upstream` not set | -| Port | `8080` | Text (pre-filled) | Confirm | -| Wallet | auto-discovered | Text (pre-filled) | Auto-discover fails | -| Chain | `base-sepolia` | Select | `--chain` not set (remove `Required: true`) | -| Price model | `perRequest` | Select | No price flag set | -| Price value | `0.001` | Text | After model selected | -| Register? 
| `false` | Confirm | Not explicitly set | - -### `sell register` interactive flow: - -| Field | Default | Prompt type | When prompted | -|-------|---------|-------------|---------------| -| Chain(s) | `base-sepolia` | Multi-select | Using default | -| Name | `Obol Agent` | Text (pre-filled) | Confirm or override | -| Description | auto-generated | Text (pre-filled) | Confirm or override | -| Image | default logo URL | Text (pre-filled) | Confirm or override | -| Sponsored? | yes (when available) | Confirm | Network supports it | -| Endpoint | auto-detected | Text (pre-filled) | Tunnel fails | - -### Non-interactive path - -All prompts gated on `isInteractive`. When not TTY: flag validation applies, defaults used, no prompts. - -### Files -- `go.mod` / `go.sum` (add `charmbracelet/huh`) -- `cmd/obol/sell.go` (add prompts to inference, http, register) -- `cmd/obol/main.go` (wire `*ui.UI` to `sellCommand`) - ---- - -## Phase 6: x402 Payment Chain Updates - -### `cmd/obol/sell.go` — `resolveX402Chain` - -Add: -```go -case "ethereum", "ethereum-mainnet", "mainnet": - return x402.EthereumMainnet, nil -``` - -If `x402.EthereumMainnet` doesn't exist in the upstream `mark3labs/x402-go` library, define a local constant. 
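If the constant does need to be defined locally, the fallback can be sketched as follows. The `Chain` shape below is illustrative only — verify the actual type exposed by `mark3labs/x402-go` before wiring this in:

```go
package main

import (
	"fmt"
	"strings"
)

// Chain is an illustrative stand-in for the x402 chain type; check the
// real type in mark3labs/x402-go before shipping.
type Chain struct {
	Name    string
	ChainID int64
}

// EthereumMainnet is the locally defined fallback constant proposed above
// (hypothetical until the upstream library is confirmed to lack it).
var EthereumMainnet = Chain{Name: "ethereum", ChainID: 1}

// resolveX402Chain maps user-facing chain names (and aliases) to chain values.
func resolveX402Chain(name string) (Chain, error) {
	switch strings.ToLower(name) {
	case "ethereum", "ethereum-mainnet", "mainnet":
		return EthereumMainnet, nil
	case "base":
		return Chain{Name: "base", ChainID: 8453}, nil
	case "base-sepolia":
		return Chain{Name: "base-sepolia", ChainID: 84532}, nil
	}
	return Chain{}, fmt.Errorf("unsupported chain %q", name)
}

func main() {
	c, err := resolveX402Chain("mainnet")
	fmt.Println(c.Name, c.ChainID, err) // ethereum 1 <nil>
}
```

The alias set mirrors the `case` arm shown above, so `ethereum`, `ethereum-mainnet`, and `mainnet` all resolve to the same value.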
- -### `cmd/obol/sell.go` — `sellPricingCommand` - -- Auto-discover wallet via `openclaw.ResolveWalletAddress(cfg)` when `--wallet` not set -- Remove `Required: true` from `--wallet` -- Update chain usage help: `"Payment chain (base-sepolia, base, ethereum)"` - -### Files -- `cmd/obol/sell.go` (`resolveX402Chain`, `sellPricingCommand`) -- `internal/x402/config.go` (`ResolveChain` — add ethereum) -- `internal/x402/config_test.go` (add ethereum test cases) -- `cmd/obol/sell_test.go` (update `TestResolveX402Chain`) - ---- - -## Phase 7: Tests & Docs - -### Tests -- `internal/erc8004/networks_test.go`: `ResolveNetwork` all chains, `ResolveNetworks` CSV parsing -- `internal/erc8004/signer_test.go`: HTTP mock for remote-signer API -- `internal/erc8004/sponsor_test.go`: EIP-712 construction, HTTP mock -- `internal/openclaw/wallet_resolve_test.go`: 0/1/multi instance -- `cmd/obol/sell_test.go`: Updated register flags, multi-chain parsing, new x402 chains - -### Docs -- `CLAUDE.md`: Update CLI command table, add `--chain` multi-value, remove `--rpc-url` -- `internal/embed/skills/sell/SKILL.md`: New registration flow, multi-network, remote-signer -- `internal/embed/skills/discovery/SKILL.md`: Multi-network registry info -- `cmd/obol/main.go`: Update root help text for sell register - ---- - -## Dependency Graph - -``` -Phase 1 (multi-network config) - ├──→ Phase 2 (remote-signer integration + registration flows) - └──→ Phase 3 (wallet auto-discovery) - │ - v - Phase 4 (rewrite sell register) ← depends on 1+2+3 - │ - v - Phase 5 (interactive prompts) ← depends on 3 (wallet discovery) - │ - v - Phase 6 (x402 payment chains + sell pricing) - │ - v - Phase 7 (tests & docs — throughout) -``` - ---- - -## Key Design Decisions - -1. **Remote-signer for all signing** — Never extract private keys. Use `POST /api/v1/sign/{address}/transaction` for direct registration, `POST /api/v1/sign/{address}/typed-data` for sponsored EIP-712. Access via temporary port-forward. - -2. 
**Local eRPC for all chain access** — `http://localhost/rpc/{network}` via k3d port mapping. No public RPCs. eRPC already has upstreams for mainnet, base, base-sepolia. - -3. **Multi-chain `--chain mainnet,base`** — Same agentURI and wallet registered on each chain. Best-effort: if one fails, continue to next. Update `registrations[]` array in `agent-registration.json` with all successes. - -4. **Prefer remote-signer, fallback to `--private-key-file`** — Auto-discover wallet → use remote-signer. If no instance found, accept `--private-key-file` for standalone usage. - -5. **Good defaults for registration metadata** — Pre-fill name (`Obol Agent`), description, image URL. Interactive mode lets users confirm or override each. - -6. **`charmbracelet/huh` for prompts** — Modern TUI with select, input, confirm. TTY-gated. - ---- - -## Key Files Summary - -| File | Change | -|------|--------| -| `internal/erc8004/networks.go` | New — multi-network config registry | -| `internal/erc8004/signer.go` | New — remote-signer REST API client | -| `internal/erc8004/register.go` | New — direct + sponsored registration flows | -| `internal/erc8004/sponsor.go` | New — sponsored API client | -| `internal/erc8004/client.go` | Add `NewClientForNetwork` | -| `internal/openclaw/wallet_resolve.go` | New — wallet address + namespace discovery | -| `cmd/obol/sell.go` | Rewrite register, add prompts to inference/http/register/pricing | -| `cmd/obol/main.go` | Wire `*ui.UI`, update help text | -| `cmd/obol/sell_test.go` | Update all affected tests | -| `internal/x402/config.go` | Add ethereum mainnet chain | - ---- - -## Verification - -```bash -# Phase 1 -go test ./internal/erc8004/ -run TestResolveNetwork - -# Phase 2 (unit — mock remote-signer) -go test ./internal/erc8004/ -run TestRemoteSigner -go test ./internal/erc8004/ -run TestSponsored - -# Phase 3 -go test ./internal/openclaw/ -run TestResolveWallet - -# Phase 4+5 (manual — needs running cluster + tunnel) -obol sell register --chain 
base-sepolia # direct tx via remote-signer -obol sell register --chain mainnet --sponsored # zero-gas via howto8004 -obol sell register --chain mainnet,base # multi-chain best-effort -obol sell inference # interactive prompts -obol sell http # interactive prompts -obol sell register # interactive with defaults to confirm - -# Phase 6 -obol sell pricing --chain base # auto-discovers wallet - -# All unit tests -go test ./cmd/obol/ -run TestSell -go test ./internal/erc8004/ -go test ./internal/openclaw/ -run TestResolve -go test ./internal/x402/ -run TestResolveChain -``` diff --git a/docs/plans/per-token-metering.md b/docs/plans/per-token-metering.md deleted file mode 100644 index a5839286..00000000 --- a/docs/plans/per-token-metering.md +++ /dev/null @@ -1,164 +0,0 @@ -# Per-Token Metering Plan - -## Scope - -This document defines phase 2 of issue 258: exact seller-side token metering -for paid inference offers, with Prometheus-native monitoring and a lightweight -status surface on `ServiceOffer`. - -Phase 1 is already deployed separately: - -- `perMTok` is accepted by the sell flows -- the enforced x402 charge is approximated as `perMTok / 1000` -- the source pricing metadata is persisted on each pricing route -- buyer and verifier expose operational Prometheus metrics - -This document covers how to replace that approximation for non-streaming -OpenAI-compatible chat completions. - -## Goals - -- Meter actual prompt, completion, and total token usage for paid inference - routes. -- Convert measured usage into estimated USDC using the seller's `perMTok`. -- Expose seller-side metrics through Prometheus. -- Surface roll-up usage on `ServiceOffer.status.usage`. -- Keep the verifier as the pre-request payment gate. - -## Non-Goals - -- Post-pay settlement or escrow. -- Exact metering for streaming responses. -- Exact metering for non-OpenAI response formats. -- Buyer-side billing authority. Buyer token telemetry remains observational. 
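To make the pricing arithmetic concrete, the following sketch contrasts the phase-1 approximation noted in the Scope above (`perMTok / 1000` charged per request) with the exact metered charge this plan introduces. The helper names are hypothetical; the sample figures reuse the status rollup example from this plan (120,832 tokens at `perMTok` 1.25 → 0.15104 USDC):

```go
package main

import "fmt"

// approxCharge is the phase-1 per-request approximation: a flat charge of
// perMTok / 1000, applied regardless of actual token usage.
func approxCharge(perMTok float64) float64 {
	return perMTok / 1000
}

// exactCharge is the phase-2 metered charge: measured total tokens priced
// at perMTok per million tokens.
func exactCharge(totalTokens int64, perMTok float64) float64 {
	return float64(totalTokens) / 1_000_000 * perMTok
}

func main() {
	perMTok := 1.25
	fmt.Printf("%.5f\n", approxCharge(perMTok))        // 0.00125
	fmt.Printf("%.5f\n", exactCharge(120832, perMTok)) // 0.15104
}
```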
- -## Request Flow - -```text -client - -> Traefik HTTPRoute - -> x402-verifier (pre-request payment gate) - -> x402-meter - -> upstream inference service - -> x402-meter parses usage.total_tokens - -> response returned to client - -> x402-meter exports Prometheus metrics and updates ServiceOffer.status.usage -``` - -Key point: - -- `x402-verifier` still decides whether a request may proceed. -- `x402-meter` becomes the source of truth for exact usage accounting after the - upstream response is known. - -## Config Schema - -`x402-meter` is configured per monetized route. - -```yaml -apiVersion: v1 -kind: ConfigMap -metadata: - name: x402-meter-config - namespace: x402 -data: - config.yaml: | - routes: - - pattern: /services/my-qwen/v1/chat/completions - offerNamespace: llm - offerName: my-qwen - upstreamURL: http://ollama.llm.svc.cluster.local:11434 - upstreamAuth: "" - perMTok: "1.25" - priceModel: perMTok - responseFormat: openai-chat-completions -``` - -Required fields: - -- `pattern` -- `offerNamespace` -- `offerName` -- `upstreamURL` -- `perMTok` - -Optional fields: - -- `upstreamAuth` -- `responseFormat` - -## Status Schema - -`ServiceOffer.status.usage` is extended with a seller-side rollup: - -```yaml -status: - usage: - requests: 124 - promptTokens: 102400 - completionTokens: 18432 - totalTokens: 120832 - estimatedUSDC: "0.15104" - lastUpdated: "2026-03-06T12:34:56Z" -``` - -Rules: - -- `estimatedUSDC` is derived from `totalTokens / 1_000_000 * perMTok` -- values are monotonic rollups, not per-request histories -- writes should be batched to avoid excessive CR status churn - -## Prometheus Metrics - -`x402-meter` exposes `/metrics` and is scraped through a `ServiceMonitor`. 
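A minimal `ServiceMonitor` sketch for that scrape — the `app` label and the `metrics` port name are assumptions about the eventual chart, not settled values:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: x402-meter
  namespace: x402
spec:
  selector:
    matchLabels:
      app: x402-meter   # assumed Service label
  endpoints:
    - port: metrics     # assumed port name on the x402-meter Service
      path: /metrics
      interval: 30s
```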
- -Metric set: - -- `obol_x402_meter_requests_total{offer_namespace,offer_name,route}` -- `obol_x402_meter_prompt_tokens_total{offer_namespace,offer_name,route}` -- `obol_x402_meter_completion_tokens_total{offer_namespace,offer_name,route}` -- `obol_x402_meter_total_tokens_total{offer_namespace,offer_name,route}` -- `obol_x402_meter_estimated_usdc_total{offer_namespace,offer_name,route}` -- `obol_x402_meter_parse_failures_total{offer_namespace,offer_name,route}` - -Label guidance: - -- keep labels limited to offer identity and route pattern -- do not label by user, wallet, or request id - -## Buyer-Side Observational Metrics - -Buyer-side metrics remain separate from billing: - -- `x402-buyer` continues exposing request, payment, auth pool, and active model - mapping metrics. -- A later extension may parse `usage.total_tokens` from remote seller responses - and emit observational counters keyed by `upstream` and `remote_model`. -- Disagreement between buyer-observed tokens and seller-billed tokens should be - treated as an alerting or debugging signal, not a settlement input. - -## Rollout Plan - -1. Deploy `x402-meter` behind the verifier for one non-streaming paid route. -2. Validate token parsing and Prometheus scrape health. -3. Enable `ServiceOffer.status.usage` updates with rate limiting. -4. Switch sell-side status output from approximation-first to exact-usage-first - whenever meter data is present. -5. Keep the phase-1 `perMTok / 1000` approximation as a fallback for routes not - yet migrated to `x402-meter`. - -## Failure Handling - -- If the response body cannot be parsed, increment - `obol_x402_meter_parse_failures_total` and return the upstream response - unchanged. -- If the upstream omits `usage.total_tokens`, do not synthesize exact billing. -- If status updates fail, metrics must still be emitted. -- If Prometheus is unavailable, request serving must continue. 
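The failure-handling rules above can be sketched as a pass-through parser. The types mirror the OpenAI chat-completions `usage` object, and the plain counter variable stands in for `obol_x402_meter_parse_failures_total` (the real service would use a Prometheus counter):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage mirrors the OpenAI chat-completions usage object.
type usage struct {
	PromptTokens     int64 `json:"prompt_tokens"`
	CompletionTokens int64 `json:"completion_tokens"`
	TotalTokens      int64 `json:"total_tokens"`
}

type chatResponse struct {
	Usage *usage `json:"usage"`
}

// parseFailures stands in for obol_x402_meter_parse_failures_total.
var parseFailures int64

// meterResponse applies the failure-handling rules: an unparseable body
// increments the failure counter; a missing usage object yields no exact
// billing. In both cases the upstream body is returned unchanged.
func meterResponse(body []byte) (*usage, []byte) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		parseFailures++
		return nil, body
	}
	if resp.Usage == nil {
		return nil, body // upstream omitted usage: do not synthesize billing
	}
	return resp.Usage, body
}

func main() {
	u, _ := meterResponse([]byte(`{"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}`))
	fmt.Println(u.TotalTokens) // 15
	meterResponse([]byte(`not json`))
	fmt.Println(parseFailures) // 1
}
```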
- -## Open Questions - -- Whether streamed responses should be handled with token trailers, chunk - aggregation, or remain explicitly unsupported. -- Whether meter state should be derived solely from Prometheus counters or also - persisted locally for faster CR status reconciliation. diff --git a/docs/x402-test-plan.md b/docs/x402-test-plan.md deleted file mode 100644 index ed694923..00000000 --- a/docs/x402-test-plan.md +++ /dev/null @@ -1,330 +0,0 @@ -# x402 + ERC-8004 Integration Test Plan - -**Feature branch:** `feat/secure-enclave-inference` -**Scope:** 100% coverage of x402 payment gating, ERC-8004 on-chain registration, verifier service, and CLI commands. - ---- - -## 1. Coverage Inventory - -### Current State - -| Package | File | Existing Tests | Coverage | -|---------|------|---------------|----------| -| `internal/erc8004` | `client.go` | TestNewClient, TestRegister | ~60% (missing SetAgentURI, SetMetadata error paths) | -| `internal/erc8004` | `store.go` | TestStore | ~70% (missing Save errors, corrupt file) | -| `internal/erc8004` | `types.go` | none | 0% (JSON marshaling/unmarshaling) | -| `internal/erc8004` | `abi.go` | implicit via client tests | ~50% (missing ABI parse error, constant verification) | -| `internal/x402` | `verifier.go` | 11 tests | ~85% (missing SetRegistration, HandleWellKnown) | -| `internal/x402` | `matcher.go` | 8 tests | ~95% (good) | -| `internal/x402` | `config.go` | implicit via verifier | ~40% (missing LoadConfig, ResolveChain edge cases) | -| `internal/x402` | `watcher.go` | none | 0% | -| `internal/x402` | `setup.go` | none | 0% (kubectl-dependent, needs mock) | -| `cmd/obol` | `monetize.go` | none | 0% | - -### Target: 100% Function Coverage - ---- - -## 2. 
Unit Tests to Add - -### 2.1 `internal/erc8004` Package - -#### `abi_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestABI_ParsesSuccessfully` | Embedded ABI JSON parses without error | HIGH | -| `TestABI_AllFunctionsPresent` | All 10 functions present: register (3 overloads), setAgentURI, setMetadata, getMetadata, getAgentWallet, setAgentWallet, unsetAgentWallet, tokenURI | HIGH | -| `TestABI_AllEventsPresent` | All 3 events present: Registered, URIUpdated, MetadataSet | HIGH | -| `TestABI_RegisterOverloads` | 3 distinct register methods exist with correct input counts (0, 1, 2) | HIGH | -| `TestConstants_Addresses` | IdentityRegistryBaseSepolia, ReputationRegistryBaseSepolia, ValidationRegistryBaseSepolia are valid hex addresses (40 chars after 0x) | MEDIUM | -| `TestConstants_ChainID` | BaseSepoliaChainID == 84532 | LOW | - -#### `types_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestAgentRegistration_MarshalJSON` | Full struct serializes to spec-compliant JSON (type, name, description, image, services, x402Support, active, registrations, supportedTrust) | HIGH | -| `TestAgentRegistration_UnmarshalJSON` | Canonical spec JSON (from ERC8004SPEC.md) deserializes correctly | HIGH | -| `TestAgentRegistration_OmitEmptyFields` | Optional fields (description, image, registrations, supportedTrust) omitted when zero-value | MEDIUM | -| `TestServiceDef_VersionOptional` | ServiceDef without version marshals correctly (version omitempty) | MEDIUM | -| `TestOnChainReg_AgentIDNumeric` | AgentID is int64, serializes as JSON number (not string) | HIGH | -| `TestRegistrationType_Constant` | RegistrationType == `"https://eips.ethereum.org/EIPS/eip-8004#registration-v1"` | LOW | - -#### `client_test.go` (ADDITIONS to existing) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestNewClient_DialError` | Returns error when RPC URL is 
unreachable | MEDIUM | -| `TestNewClient_ChainIDError` | Returns error when eth_chainId fails | MEDIUM | -| `TestSetAgentURI` | Successful tx + wait mined (mock sendRawTransaction + receipt) | HIGH | -| `TestSetMetadata` | Successful tx + wait mined | HIGH | -| `TestRegister_NoRegisteredEvent` | Returns error when receipt has no Registered event log | HIGH | -| `TestRegister_TxError` | Returns error when sendRawTransaction fails | MEDIUM | -| `TestGetMetadata_EmptyResult` | Returns nil when contract returns empty bytes | MEDIUM | - -#### `store_test.go` (ADDITIONS to existing) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestStore_SaveOverwrite` | Second Save overwrites first | MEDIUM | -| `TestStore_LoadCorruptJSON` | Returns error on malformed JSON file | MEDIUM | -| `TestStore_SaveReadOnly` | Returns error when directory is read-only (permission denied) | LOW | - -### 2.2 `internal/x402` Package - -#### `verifier_test.go` (ADDITIONS) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestVerifier_SetRegistration` | SetRegistration stores data, HandleWellKnown returns it | HIGH | -| `TestVerifier_HandleWellKnown_NoRegistration` | Returns 404 when no registration set | HIGH | -| `TestVerifier_HandleWellKnown_JSON` | Response is valid JSON AgentRegistration with correct Content-Type | HIGH | -| `TestVerifier_ReadyzNotReady` | Returns 503 when config is nil (fresh Verifier without config) | MEDIUM | - -#### `config_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestLoadConfig_ValidYAML` | Parses complete YAML with wallet, chain, routes | HIGH | -| `TestLoadConfig_Defaults` | Empty chain defaults to "base-sepolia", empty facilitatorURL defaults | HIGH | -| `TestLoadConfig_InvalidYAML` | Returns parse error on malformed YAML | MEDIUM | -| `TestLoadConfig_FileNotFound` | Returns read error | MEDIUM | -| `TestResolveChain_AllSupported` | All 6 
chain names resolve (base, base-sepolia, polygon, polygon-amoy, avalanche, avalanche-fuji) | HIGH | -| `TestResolveChain_Aliases` | "base-mainnet" == "base", "polygon-mainnet" == "polygon", etc. | MEDIUM | -| `TestResolveChain_Unsupported` | Returns error for unknown chain name | MEDIUM | -| `TestResolveChain_ErrorMessage` | Error message lists all supported chains | LOW | - -#### `watcher_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestWatchConfig_DetectsChange` | Write new config file, watcher reloads verifier within interval | HIGH | -| `TestWatchConfig_IgnoresUnchanged` | Same mtime = no reload | MEDIUM | -| `TestWatchConfig_InvalidConfig` | Bad YAML doesn't crash watcher, verifier keeps old config | HIGH | -| `TestWatchConfig_CancelContext` | Context cancellation stops the watcher goroutine cleanly | MEDIUM | -| `TestWatchConfig_MissingFile` | Missing file logged but watcher continues | MEDIUM | - -#### `setup_test.go` (NEW — requires abstraction for kubectl) - -The `setup.go` file shells out to `kubectl`. To unit-test it, extract an interface: - -```go -// KubectlRunner abstracts kubectl execution for testing. 
-type KubectlRunner interface { - Run(args ...string) error - Output(args ...string) (string, error) -} -``` - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestSetup_PatchesSecretAndConfigMap` | Calls kubectl patch on both secret and configmap with correct args | HIGH | -| `TestSetup_NoKubeconfig` | Returns "cluster not running" error | HIGH | -| `TestAddRoute_AppendsToExisting` | Reads existing config, appends route, patches back | HIGH | -| `TestAddRoute_FirstRoute` | Adds route when routes list is empty | MEDIUM | -| `TestGetPricingConfig_EmptyResponse` | Returns empty PricingConfig when configmap has no data | MEDIUM | -| `TestGetPricingConfig_ParsesYAML` | Correct wallet/chain/routes from kubectl output | HIGH | -| `TestPatchPricingConfig_Serialization` | Generated YAML has correct format (routes array, descriptions) | MEDIUM | - ---- - -## 3. Integration Tests (//go:build integration) - -These require a running k3d cluster with `OBOL_DEVELOPMENT=true`. - -### 3.1 `internal/x402/integration_test.go` (NEW) - -**Prerequisites:** Running cluster, x402 namespace deployed. - -| Test | What it verifies | Runtime | Priority | -|------|-----------------|---------|----------| -| `TestIntegration_X402Setup` | `obol x402 setup --wallet 0x... 
--chain base-sepolia` patches configmap + secret in cluster | 30s | HIGH | -| `TestIntegration_X402Status` | `obol x402 status` reads correct config from cluster | 15s | HIGH | -| `TestIntegration_X402AddRoute` | `obol x402 setup` then AddRoute() adds route, verifiable via GetPricingConfig | 30s | MEDIUM | -| `TestIntegration_VerifierDeployment` | x402-verifier pod is running, responds to /healthz | 15s | HIGH | -| `TestIntegration_VerifierForwardAuth` | Send request to /verify endpoint with X-Forwarded-Uri, verify 200/402 behavior | 30s | HIGH | -| `TestIntegration_WellKnownEndpoint` | GET /.well-known/agent-registration.json returns valid JSON (after registration set) | 15s | MEDIUM | - -### 3.2 `internal/erc8004/integration_test.go` (NEW) - -**Prerequisites:** Base Sepolia RPC access, funded test wallet (ERC8004_PRIVATE_KEY env var). - -| Test | What it verifies | Runtime | Priority | -|------|-----------------|---------|----------| -| `TestIntegration_RegisterOnBaseSepolia` | Full register() tx on testnet, verify agentID returned | 60s | HIGH | -| `TestIntegration_SetAgentURI` | setAgentURI() after register, verify tokenURI() returns new URI | 60s | HIGH | -| `TestIntegration_SetAndGetMetadata` | setMetadata() + getMetadata() roundtrip | 60s | MEDIUM | -| `TestIntegration_GetAgentWallet` | getAgentWallet() returns owner address after registration | 30s | MEDIUM | - -**Skip logic:** -```go -func TestMain(m *testing.M) { - if os.Getenv("ERC8004_PRIVATE_KEY") == "" { - fmt.Println("Skipping ERC-8004 integration tests: ERC8004_PRIVATE_KEY not set") - os.Exit(0) - } - os.Exit(m.Run()) -} -``` - -### 3.3 End-to-End: x402 Payment Flow - -**File:** `internal/x402/e2e_test.go` (NEW, `//go:build integration`) - -**Prerequisites:** Running cluster with inference network deployed, x402 enabled, funded test wallet. - -| Test | Scenario | Steps | Priority | -|------|----------|-------|----------| -| `TestE2E_InferenceWithPayment` | Full x402 payment lifecycle | 1. 
Deploy inference network with x402Enabled=true; 2. Configure pricing via AddRoute; 3. Send request WITHOUT payment → 402; 4. Verify 402 body contains payment requirements; 5. Send request WITH valid x402 payment header → 200 | HIGH | -| `TestE2E_RegisterAndServeWellKnown` | ERC-8004 + well-known endpoint | 1. Register agent on Base Sepolia; 2. Set registration on verifier; 3. GET /.well-known/agent-registration.json → matches registration | MEDIUM | - ---- - -## 4. CLI Command Tests - -### `cmd/obol/x402_test.go` (NEW) - -Pattern: Build the CLI app, run subcommands against mocked infrastructure. - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestX402Command_Structure` | x402 has 3 subcommands: register, setup, status | HIGH | -| `TestX402Register_RequiresPrivateKey` | Fails without --private-key or ERC8004_PRIVATE_KEY | HIGH | -| `TestX402Register_TrimsHexPrefix` | 0x-prefixed key handled correctly | MEDIUM | -| `TestX402Setup_RequiresWallet` | Fails without --wallet flag | HIGH | -| `TestX402Setup_DefaultChain` | Default chain is "base-sepolia" | MEDIUM | -| `TestX402Status_NoCluster` | Graceful output when no cluster running | MEDIUM | -| `TestX402Status_NoRegistration` | Shows "not registered" message | MEDIUM | - ---- - -## 5. 
Helmfile Template Tests - -### Infrastructure Helmfile (conditional x402 resources) - -**File:** `internal/embed/infrastructure/helmfile_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestHelmfile_X402DisabledByDefault` | x402Enabled=false: no Middleware CRD rendered, no ExtensionRef on eRPC HTTPRoute | HIGH | -| `TestHelmfile_X402Enabled` | x402Enabled=true: Middleware CRD rendered with correct ForwardAuth address, ExtensionRef added to eRPC HTTPRoute | HIGH | - -### Inference Network Template - -**File:** `internal/embed/networks/inference/template_test.go` (NEW) - -| Test | What it verifies | Priority | -|------|-----------------|----------| -| `TestInferenceValues_X402EnabledField` | values.yaml.gotmpl contains x402Enabled field with @enum true,false, @default false | HIGH | -| `TestInferenceHelmfile_X402Passthrough` | x402Enabled value passed through to helmfile.yaml.gotmpl | HIGH | -| `TestInferenceGateway_ConditionalMiddleware` | gateway.yaml: Middleware CRD only rendered when x402Enabled=true | HIGH | -| `TestInferenceGateway_ConditionalExtensionRef` | gateway.yaml: ExtensionRef only present when x402Enabled=true | HIGH | - ---- - -## 6. 
Coverage Gap Analysis — Functions NOT Tested - -### internal/erc8004 - -| Function | File:Line | Test Status | Action | -|----------|-----------|-------------|--------| -| `NewClient()` | client.go:26 | TESTED | - | -| `Close()` | client.go:57 | implicit | - | -| `Register()` | client.go:63 | TESTED | Add error paths | -| `SetAgentURI()` | client.go:95 | **UNTESTED** | Add test | -| `SetMetadata()` | client.go:114 | **UNTESTED** | Add test | -| `GetMetadata()` | client.go:133 | TESTED | Add empty result | -| `TokenURI()` | client.go:150 | TESTED | - | -| `NewStore()` | store.go:30 | implicit | - | -| `Save()` | store.go:39 | TESTED | Add error paths | -| `Load()` | store.go:55 | TESTED | Add corrupt file | - -### internal/x402 - -| Function | File:Line | Test Status | Action | -|----------|-----------|-------------|--------| -| `NewVerifier()` | verifier.go:25 | TESTED | - | -| `Reload()` | verifier.go:34 | TESTED | - | -| `HandleVerify()` | verifier.go:56 | TESTED (11 cases) | - | -| `HandleHealthz()` | verifier.go:114 | TESTED | - | -| `HandleReadyz()` | verifier.go:120 | TESTED | Add nil config case | -| `SetRegistration()` | verifier.go:131 | **UNTESTED** | Add test | -| `HandleWellKnown()` | verifier.go:136 | **UNTESTED** | Add test | -| `LoadConfig()` | config.go:46 | **UNTESTED** | Add tests | -| `ResolveChain()` | config.go:69 | partial (error case only) | Add all chains | -| `WatchConfig()` | watcher.go:16 | **UNTESTED** | Add tests | -| `Setup()` | setup.go:23 | **UNTESTED** | Needs kubectl abstraction | -| `AddRoute()` | setup.go:70 | **UNTESTED** | Needs kubectl abstraction | -| `GetPricingConfig()` | setup.go:96 | **UNTESTED** | Needs kubectl abstraction | -| `matchRoute()` | matcher.go:19 | TESTED (8 cases) | - | -| `matchPattern()` | matcher.go:29 | TESTED | - | -| `globMatch()` | matcher.go:52 | TESTED | - | - ---- - -## 7. Implementation Priority - -### Phase 1: Unit tests (no cluster needed) — ~2 hours - -1. 
`internal/erc8004/abi_test.go` — ABI integrity checks -2. `internal/erc8004/types_test.go` — JSON serialization spec compliance -3. `internal/x402/config_test.go` — LoadConfig + ResolveChain -4. `internal/x402/verifier_test.go` — SetRegistration + HandleWellKnown additions -5. `internal/x402/watcher_test.go` — File watcher - -### Phase 2: Missing client methods + error paths — ~1 hour - -6. `internal/erc8004/client_test.go` — SetAgentURI, SetMetadata, error paths -7. `internal/erc8004/store_test.go` — overwrite, corrupt, permissions - -### Phase 3: Setup abstraction + tests — ~1.5 hours - -8. Extract `KubectlRunner` interface from `setup.go` -9. `internal/x402/setup_test.go` — all Setup/AddRoute/GetPricingConfig - -### Phase 4: Integration tests — ~2 hours (requires running cluster) - -10. `internal/x402/integration_test.go` — cluster-based tests -11. `internal/erc8004/integration_test.go` — Base Sepolia testnet tests - -### Phase 5: Template + CLI tests — ~1 hour - -12. Helmfile template rendering tests -13. `cmd/obol/x402_test.go` — CLI command structure + validation - ---- - -## 8. Test Execution Commands - -```bash -# Phase 1-3: Unit tests only -go test -v ./internal/erc8004/... ./internal/x402/... - -# Phase 4: Integration tests (requires cluster + testnet key) -export OBOL_CONFIG_DIR=$(pwd)/.workspace/config -export OBOL_BIN_DIR=$(pwd)/.workspace/bin -export OBOL_DATA_DIR=$(pwd)/.workspace/data -export ERC8004_PRIVATE_KEY= -go build -o .workspace/bin/obol ./cmd/obol -go test -tags integration -v -timeout 15m ./internal/x402/ ./internal/erc8004/ - -# Coverage report -go test -coverprofile=coverage.out ./internal/erc8004/... ./internal/x402/... -go tool cover -html=coverage.out -o coverage.html -``` - ---- - -## 9. 
Success Criteria - -- [ ] 100% function coverage on `internal/erc8004/` (all 10 exported functions) -- [ ] 100% function coverage on `internal/x402/` (all 14 exported functions) -- [ ] All 3 ABI register overloads verified against canonical ABI -- [ ] JSON serialization roundtrip matches ERC-8004 spec format -- [ ] WatchConfig tested with file changes, cancellation, and error recovery -- [ ] Setup/AddRoute/GetPricingConfig tested via kubectl mock -- [ ] HandleWellKnown tested (200 with data, 404 without) -- [ ] Integration tests skip gracefully when prerequisites unavailable -- [ ] `go test ./...` passes with zero failures diff --git a/features/application_management.feature b/features/application_management.feature new file mode 100644 index 00000000..5af48fb3 --- /dev/null +++ b/features/application_management.feature @@ -0,0 +1,36 @@ +@bdd +Feature: Application management + As a local operator + I want named managed applications that can be installed, synced, listed, and deleted + So that supporting workloads follow the same lifecycle discipline as the rest of the stack + + # References: SPEC Section 3.8 (Application Management and Supporting Operations), B&E Section 2.8 (Managed Applications and Supporting Operations) + + Background: + Given the operator has a running stack and access to supported application sources + + @phase1 @fast + Scenario: Installing an application creates a named managed deployment + Given the operator selects a supported application source + When the operator installs an application with a name + Then the platform records that name as the persistent application identity + And later sync and delete operations target that same managed deployment + + @phase1 @fast + Scenario: Deleting an application removes only the selected deployment + Given multiple managed applications exist + When the operator deletes one named application + Then only that application's deployment artifacts are removed + And unrelated applications remain intact + + 
@phase1 + Scenario Outline: Sync applies the current desired source state to a named application + Given the operator has an installed application from <source_kind> + When the operator runs app sync for that application + Then the deployment is reconciled against <source_kind> + + Examples: + | source_kind | + | helm chart | + | OCI chart | + | local path | diff --git a/features/buy_side_payments.feature b/features/buy_side_payments.feature new file mode 100644 index 00000000..6b84fef4 --- /dev/null +++ b/features/buy_side_payments.feature @@ -0,0 +1,35 @@ +@bdd +Feature: Buy-side remote inference + As a remote buyer + I want paid remote models to resolve through a bounded-risk payment sidecar + So that I can purchase inference without receiving direct access to signing authority + + # References: SPEC Section 3.6 (Buy-Side Remote Inference), B&E Section 2.6 (Buy-Side Payments) + + Background: + Given the cluster-wide LiteLLM gateway exposes a static paid model namespace + + @phase1 @fast + Scenario: Paid model routing uses the static paid namespace + Given a remote model has been configured for paid access + When a buyer requests that model through LiteLLM + Then the request resolves through the static paid namespace + And payment handling is delegated to the buyer sidecar + + @phase1 @fast + Scenario: Spending is bounded by the pre-signed auth pool + Given the buyer sidecar has a finite pool of pre-signed authorizations + When the sidecar forwards paid requests + Then it uses only the available authorizations in that pool + And it fails explicitly instead of escalating to live signing authority + + @phase1 + Scenario Outline: Unmapped paid models fail explicitly + Given the buyer requests <model_name> + When no remote payment mapping exists for that model + Then the request fails with an explicit unmapped-model error + + Examples: + | model_name | + | paid/unknown-model | + | paid/missing-offer | diff --git a/features/frontend_and_monitoring.feature b/features/frontend_and_monitoring.feature new file
mode 100644 index 00000000..945c12f5 --- /dev/null +++ b/features/frontend_and_monitoring.feature @@ -0,0 +1,36 @@ +@bdd +Feature: Frontend and monitoring surfaces + As a local operator + I want observability and browser surfaces that match the platform's local-first posture + So that I can inspect the stack without accidentally publishing operator-only interfaces + + # References: SPEC Section 3.7 (Tunnel, Discovery, Frontend, and Monitoring), B&E Section 2.7 (Tunnel, Discovery, Frontend, and Monitoring) + + Background: + Given the stack has deployed its default frontend and monitoring components + + @phase1 @fast + Scenario: Frontend stays on the local hostname by default + Given the operator opens the stack frontend + When no explicit architecture change has been made for public exposure + Then the frontend is served through the local hostname contract + And the public tunnel does not expose that interface + + @phase1 @fast + Scenario: Monitoring remains an operator-only surface + Given Prometheus-backed monitoring is installed + When buyers access public monetized services + Then monitoring data remains separate from buyer-facing endpoints + And operator diagnostics stay inside the local control plane + + @phase1 + Scenario Outline: Status surfaces expose operational data through the intended channel + Given the operator inspects <surface> + When the platform reports health or runtime state + Then the operator receives <operational_view> + + Examples: + | surface | operational_view | + | sell status | pricing and reconciliation | + | model status | provider and route readiness | + | tunnel | current tunnel activation | diff --git a/features/llm_routing.feature b/features/llm_routing.feature new file mode 100644 index 00000000..46fe1550 --- /dev/null +++ b/features/llm_routing.feature @@ -0,0 +1,37 @@ +@bdd +Feature: LLM routing and provider management + As a local operator + I want one model gateway for local, cloud, and paid inference routes + So that OpenClaw instances and buyers see a
consistent model contract + + # References: SPEC Section 3.2 (LLM Routing and Provider Management), B&E Section 2.2 (LLM Routing) + + Background: + Given the stack has a cluster-wide LiteLLM deployment + + @phase1 @fast + Scenario: LiteLLM is the central operator-facing gateway + Given an OpenClaw instance needs model access + When the instance sends inference traffic through the platform + Then the request is routed through LiteLLM + And provider-specific credentials remain centralized at the cluster gateway + + @phase1 @fast + Scenario: Invalid custom endpoints are rejected before publication + Given the operator supplies a custom OpenAI-compatible endpoint + When the operator runs model setup for that endpoint + Then the endpoint is validated before it is added to the route set + And broken provider entries are not published to downstream consumers + + @phase1 + Scenario Outline: Model namespaces resolve to the correct upstream class + Given LiteLLM is configured for <namespace_type> + When a request targets the <model_name> model namespace + Then the platform routes the request to <upstream_class> + + Examples: + | namespace_type | model_name | upstream_class | + | local Ollama | llama3.2:3b | the local model runtime | + | cloud Anthropic | claude-sonnet-4-5-20250929 | the Anthropic API | + | cloud OpenAI | gpt-4o | the OpenAI API | + | buy-side paid route | paid/qwen3.5:9b | the x402 buyer sidecar | diff --git a/features/network_management.feature b/features/network_management.feature new file mode 100644 index 00000000..19c1f058 --- /dev/null +++ b/features/network_management.feature @@ -0,0 +1,35 @@ +@bdd +Feature: Network management and eRPC + As a local operator + I want local chain deployments and remote RPC aliases to remain distinct + So that network support claims and routing behavior stay accurate + + # References: SPEC Section 3.3 (Network Management and eRPC), B&E Section 2.3 (Network Management) + + Background: + Given the operator has a running stack with eRPC available + + @phase1 @fast +
Scenario: Installable networks come only from embedded bundles + Given the operator wants to deploy a local network + When the operator lists installable networks + Then only embedded deployable network bundles are shown + And remote RPC aliases are not presented as local deployments + + @phase1 @fast + Scenario: Remote RPC aliases default to read-only forwarding + Given the operator adds a remote chain without allow-writes + When requests are routed through eRPC for that chain + Then write methods remain blocked by default + And read-only RPC methods continue to work + + @phase1 + Scenario Outline: Network status matches current command semantics + Given the operator has <deployment_state> + When the operator runs network status + Then the command reports <status_surface> + + Examples: + | deployment_state | status_surface | + | local and remote networks configured | global eRPC health and upstreams | + | no named local deployment selected | the current gateway summary contract | diff --git a/features/openclaw_runtime.feature b/features/openclaw_runtime.feature new file mode 100644 index 00000000..b527d115 --- /dev/null +++ b/features/openclaw_runtime.feature @@ -0,0 +1,36 @@ +@bdd +Feature: OpenClaw runtime and agent capabilities + As an agent developer + I want a canonical elevated OpenClaw runtime plus separately managed instances + So that automation and custom agents share the same safe deployment model + + # References: SPEC Section 3.4 (OpenClaw Runtime and Agent Capabilities), B&E Section 2.4 (OpenClaw Runtime) + + Background: + Given the stack has completed baseline startup + + @phase1 @fast + Scenario: The default elevated runtime is prepared automatically + Given the operator has not created any extra OpenClaw instances + When the stack deploys its defaults + Then the canonical elevated OpenClaw runtime is prepared for obol-agent workflows + And the runtime receives the elevated capabilities required by shipped skills + + @phase1 @fast + Scenario: Additional instances remain
operator-managed deployments + Given the operator has created one or more named OpenClaw instances + When the operator syncs or deletes an instance + Then the action targets the named deployment the operator selected + And other instances remain unchanged + + @phase1 + Scenario Outline: Operator surfaces resolve to the correct OpenClaw instance + Given the operator targets the <instance_id> instance + When the operator uses the <surface> command + Then the command returns data for <instance_id> + + Examples: + | instance_id | surface | + | obol-agent | token | + | obol-agent | dashboard | + | my-agent | token | diff --git a/features/sell_side_monetization.feature b/features/sell_side_monetization.feature new file mode 100644 index 00000000..dff445b2 --- /dev/null +++ b/features/sell_side_monetization.feature @@ -0,0 +1,44 @@ +@bdd +Feature: Sell-side monetization + As a local operator + I want to expose priced services through a ServiceOffer control loop + So that public buyers can discover and pay for bounded compute or HTTP endpoints + + # References: SPEC Section 3.5 (Sell-Side Monetization), B&E Section 2.5 (Sell-Side Monetization) + + Background: + Given the operator has a running stack with the elevated agent runtime available + + @phase1 @fast + Scenario: A ServiceOffer is created in the namespace the operator chose + Given the operator creates a sell-side offer with an explicit namespace + When the CLI submits the ServiceOffer resource + Then the resource is written into that namespace + And downstream pricing and routing assets are derived from that resource + + @phase1 @fast + Scenario: Probe verifies the payment gate without spending buyer funds + Given a sell-side offer has published its payment route + When the operator runs sell probe against that offer + Then the command confirms the payment gate is reachable + And no paid inference budget is consumed + + @phase1 + Scenario Outline: Pricing models remain explicit about their current billing contract + Given a sell-side offer uses the
<pricing_model> pricing model + When the offer is reconciled successfully + Then the route publishes payment terms for <pricing_model> + And operators can inspect the current pricing contract through status surfaces + + Examples: + | pricing_model | + | perRequest | + | perMTok | + | perHour | + + @phase2 + Scenario: Exact token metering supplements the pre-request payment gate + Given an inference offer uses per-token pricing + When phase 2 exact metering is enabled for that route + Then pre-request authorization still happens before execution + And post-response usage updates the seller-side accounting surfaces diff --git a/features/stack_lifecycle.feature b/features/stack_lifecycle.feature new file mode 100644 index 00000000..08d7ba29 --- /dev/null +++ b/features/stack_lifecycle.feature @@ -0,0 +1,36 @@ +@bdd +Feature: Stack lifecycle + As a local operator + I want to initialize, start, stop, and purge the stack safely + So that I can control the local platform without losing important state unexpectedly + + # References: SPEC Section 3.1 (Stack Lifecycle), B&E Section 2.1 (Stack Lifecycle) + + Background: + Given the operator is using the obol CLI against a local workspace + + @phase1 @fast + Scenario: Initialize and start a new stack + Given no stack config exists yet + When the operator runs stack init and then stack up + Then the CLI persists a stable stack identity and backend choice + And baseline infrastructure is deployed before any optional public exposure + + @phase1 @fast + Scenario: Purge without force preserves persistent data + Given a stack has existing config and persistent data + When the operator runs stack purge without force + Then the cluster state and config are removed + And persistent data remains available for later recovery + + @phase1 + Scenario Outline: Startup tolerates missing optional provider dependencies + Given the host <provider_state> + When the operator runs stack up + Then the stack reaches a usable baseline + And provider setup can be completed <recovery_path> + + Examples: + |
provider_state | recovery_path | + | has discoverable local models | automatically during startup | + | lacks local models or cloud credentials | later through model setup | diff --git a/features/tunnel_and_discovery.feature b/features/tunnel_and_discovery.feature new file mode 100644 index 00000000..050296ca --- /dev/null +++ b/features/tunnel_and_discovery.feature @@ -0,0 +1,36 @@ +@bdd +Feature: Tunnel, discovery, and public exposure + As a local operator + I want public routes to be optional and narrowly scoped + So that local control surfaces remain private while discoverable services can still be published + + # References: SPEC Section 3.7 (Tunnel, Discovery, Frontend, and Monitoring), B&E Section 2.7 (Tunnel, Discovery, Frontend, and Monitoring) + + Background: + Given the stack can run with or without a public tunnel + + @phase1 @fast + Scenario: Quick tunnels are activated on demand + Given the operator has not provisioned a persistent DNS tunnel + When the stack starts + Then the quick tunnel remains dormant until a public route needs it + And local-only operation remains available immediately + + @phase1 @fast + Scenario: Discovery metadata follows the active tunnel URL + Given a public service has discovery metadata + When the active tunnel URL changes + Then discovery metadata is refreshed to reflect the current public address + And stale public URLs are not treated as canonical + + @phase1 + Scenario Outline: Operator surfaces remain local-only unless the architecture changes deliberately + Given the operator inspects the <surface> + When the platform computes public exposure rules + Then <surface> remains local-only + + Examples: + | surface | + | frontend | + | eRPC | + | monitoring | diff --git a/plans/agent-services.md b/plans/agent-services.md deleted file mode 100644 index a05869ff..00000000 --- a/plans/agent-services.md +++ /dev/null @@ -1,567 +0,0 @@ -# Agent Services: Autonomous x402-Gated HTTP Endpoints - -**Goal:** A skill that lets OpenClaw deploy its own
HTTP services into the cluster, gate them with x402 payments, register them with ERC-8004, expose them to the public internet, and monitor earnings — turning the agent from a tool-user into an autonomous economic actor. - ---- - -## Why This Is The One - -The Obol Stack already has every piece: - -| Capability | How it exists today | -|------------|-------------------| -| Wallet | Web3Signer in-cluster, `signer.py` for signing | -| Onchain identity | `agent-identity` skill, ERC-8004 registration | -| Kubernetes cluster | k3d with Traefik gateway | -| Public internet access | Cloudflare tunnel (`obol tunnel`) | -| x402 payment infrastructure | `inference-gateway` binary, Go x402 SDK, Coinbase facilitator | -| Blockchain nodes | eRPC gateway routing to local/remote nodes | - -What's missing: **the agent can't deploy a service, price it, and collect payment.** This skill closes that gap. - ---- - -## Existing Precedent: The Inference Gateway - -The `inference` network (`internal/embed/networks/inference/`) already implements this exact pattern: - -1. User specifies a model, price, wallet, and chain -2. Helmfile deploys: Ollama pod + x402 gateway pod + Service + HTTPRoute + metadata ConfigMap -3. Gateway wraps Ollama's OpenAI-compatible API with x402 payment verification -4. Traefik routes `/inference-<name>/v1/*` to the gateway -5. Cloudflare tunnel makes it publicly accessible -6. Frontend discovers it via the metadata ConfigMap - -**The `agent-services` skill generalises this pattern** from "inference only" to "any HTTP handler the agent writes." - ---- - -## Architecture - -``` -OpenClaw pod (writes handler + config) - │ - │ 1. Agent writes handler.py (business logic) - │ 2. identity.sh registers with ERC-8004 - │ 3.
service.sh deploys via helmfile - │ - ▼ -agent-svc-<name> namespace - ┌─────────────────────────────┐ - │ Pod: agent-svc-<name> │ - │ ┌────────────────────────┐ │ - │ │ x402-proxy (sidecar) │ │ ← Verifies payment, settles via facilitator - │ │ port 8402 │ │ - │ └──────────┬─────────────┘ │ - │ │ proxy_pass │ - │ ┌──────────▼─────────────┐ │ - │ │ handler.py (main) │ │ ← Agent's business logic (plain HTTP) - │ │ port 8080 │ │ - │ └────────────────────────┘ │ - │ │ - │ ConfigMap: handler-code │ ← Agent's Python handler - │ ConfigMap: svc-metadata │ ← Pricing, endpoints, description - │ Service: agent-svc-<name> │ ← ClusterIP, port 8402 - │ HTTPRoute: agent-svc-<name>│ ← /services/<name>/* → port 8402 - └─────────────────────────────┘ - │ - ▼ - Traefik Gateway (traefik namespace) - │ - ▼ - Cloudflare Tunnel → https://<tunnel-url>/services/<name>/* -``` - -### Why a Sidecar Proxy? - -The agent writes **plain HTTP handlers** — no x402 awareness needed. A sidecar `x402-proxy` container handles all payment logic: - -1. Receives inbound request -2. If no payment header → responds `402 Payment Required` with pricing -3. If payment header present → verifies signature via facilitator -4. If valid → proxies request to handler on `localhost:8080` -5. Settles payment onchain via facilitator -6. Returns handler response with `PAYMENT-RESPONSE` header - -**Benefits:** -- Agent doesn't need to understand x402 protocol internals -- Same proxy image reused across all services (already exists as `inference-gateway`) -- Handler can be any language/framework — just serve HTTP on port 8080 -- Payment config is environment variables, not code - -### The x402 Proxy Image - -The existing `inference-gateway` (`cmd/inference-gateway/main.go`) is already a generic x402 reverse proxy. It takes `--upstream`, `--wallet`, `--price`, `--chain`, `--facilitator` flags and wraps any upstream HTTP service with x402 payment gates.
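The verify-then-proxy flow in the six numbered steps above can be sketched as a single decision function. This is a minimal illustration only: the `X-PAYMENT` header name and the 402 response shape are assumptions for the sketch, not the gateway's actual wire format.

```python
import json

# Assumed header name -- illustrative, not the gateway's confirmed contract.
PAYMENT_HEADER = "X-PAYMENT"

def gate(headers: dict, price: str, chain: str):
    """Decide how the sidecar responds before anything reaches the handler."""
    if PAYMENT_HEADER not in headers:
        # Step 2: no payment header -> 402 Payment Required with pricing terms.
        return 402, {"error": "payment required",
                     "accepts": [{"scheme": "exact", "price": price,
                                  "currency": "USDC", "chain": chain}]}
    # Steps 3-5: verify the signature with the facilitator, then proxy the
    # request to the agent's handler on localhost:8080 and settle onchain.
    return 200, {"proxy_to": "http://localhost:8080"}

status, body = gate({}, "0.10", "base")
print(status, json.dumps(body))
```

The handler container never sees unpaid traffic, which is exactly why it can stay a plain HTTP server.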
- -**Reuse strategy:** The inference gateway image (`ghcr.io/obolnetwork/inference-gateway`) can proxy any upstream, not just Ollama. For `agent-services`, the upstream is `http://localhost:8080` (the agent's handler running in the same pod). - -If needed, we can extract the generic proxy into its own image (`ghcr.io/obolnetwork/x402-proxy`) later. For now, the inference gateway binary works as-is. - ---- - -## Skill Structure - -``` -agent-services/ -├── SKILL.md -├── scripts/ -│ └── service.sh # Deploy, list, update, teardown, monitor -├── templates/ -│ ├── helmfile.yaml.gotmpl # Helmfile template for service deployment -│ ├── handler.py.tmpl # Minimal Python handler scaffold -│ └── metadata.json.tmpl # Service metadata template -└── references/ - └── x402-server-patterns.md # Pricing strategies, facilitator config, chain selection -``` - -### `service.sh` Commands - -```bash -# === Lifecycle === - -# Deploy a new service from a handler file -sh scripts/service.sh deploy \ - --name weather-api \ - --handler ./my_handler.py \ - --price 0.10 \ - --chain base \ - --wallet 0xYourAddress \ - --description "Real-time weather data" \ - --register # auto-register endpoint with ERC-8004 - -# Deploy with the scaffold template (agent fills in the handler later) -sh scripts/service.sh scaffold --name weather-api -# → Creates handler.py from template, agent edits it, then deploys - -# Update handler code (patches ConfigMap, restarts pod) -sh scripts/service.sh update --name weather-api --handler ./updated_handler.py - -# Update pricing (patches gateway config, no restart needed) -sh scripts/service.sh set-price --name weather-api --price 0.05 - -# Tear down a service (deletes namespace + all resources) -sh scripts/service.sh teardown --name weather-api - -# === Discovery === - -# List deployed services with status and URLs -sh scripts/service.sh list - -# Show service details (pricing, endpoints, health, earnings) -sh scripts/service.sh status --name weather-api - -# === 
Monitoring === - -# Check USDC earnings for a service's wallet -sh scripts/service.sh earnings --name weather-api - -# View service logs -sh scripts/service.sh logs --name weather-api [--tail 100] - -# Health check -sh scripts/service.sh health --name weather-api -``` - -### How `deploy` Works Internally - -``` -1. Validate inputs (handler file exists, chain supported, wallet valid) - -2. Create deployment directory: - $CONFIG_DIR/services/<name>/ - ├── helmfile.yaml ← generated from template - ├── handler.py ← copied from --handler - └── values.yaml ← generated (price, chain, wallet, etc.) - -3. Run helmfile sync: - helmfile -f $CONFIG_DIR/services/<name>/helmfile.yaml sync - - This creates: - - Namespace: agent-svc-<name> - - ConfigMap: handler-code (contains handler.py) - - ConfigMap: svc-metadata (pricing, description, endpoints) - - Deployment: agent-svc-<name> (2 containers: handler + x402 proxy) - - Service: agent-svc-<name> (ClusterIP, port 8402) - - HTTPRoute: agent-svc-<name> (path: /services/<name>/*) - -4. Wait for pod ready - -5. If --register flag: - sh scripts/identity.sh --from $WALLET register \ - --uri "ipfs://$(pin metadata.json)" - # Or update existing agent's service endpoints -``` - -### Handler Template (`handler.py.tmpl`) - -The agent gets a minimal scaffold to fill in. No x402 awareness needed — just return HTTP responses. - -```python -#!/usr/bin/env python3 -""" -Agent service handler — {{.Name}} -{{.Description}} - -This runs behind an x402 payment proxy. Requests that reach this -handler have already been paid for. Just return the data. - -Serve on port 8080 (the proxy forwards paid requests here).
-""" -import json -from http.server import HTTPServer, BaseHTTPRequestHandler - - -class Handler(BaseHTTPRequestHandler): - def do_GET(self): - """Handle GET requests.""" - # TODO: implement your service logic here - data = {"message": "Hello from {{.Name}}"} - - self.send_response(200) - self.send_header("Content-Type", "application/json") - self.end_headers() - self.wfile.write(json.dumps(data).encode()) - - def do_POST(self): - """Handle POST requests.""" - content_length = int(self.headers.get("Content-Length", 0)) - body = self.rfile.read(content_length) if content_length else b"" - - # TODO: process the request body - data = {"received": len(body)} - - self.send_response(200) - self.send_header("Content-Type", "application/json") - self.end_headers() - self.wfile.write(json.dumps(data).encode()) - - def log_message(self, format, *args): - """Structured logging.""" - print(f"[{{.Name}}] {args[0]}") - - -if __name__ == "__main__": - server = HTTPServer(("0.0.0.0", 8080), Handler) - print(f"[{{.Name}}] Serving on :8080") - server.serve_forever() -``` - -### Helmfile Template (`helmfile.yaml.gotmpl`) - -```yaml -releases: - - name: agent-svc-{{ .Values.name }} - namespace: agent-svc-{{ .Values.name }} - createNamespace: true - chart: bedag/raw - version: 2.1.0 - values: - - resources: - # --- Handler code as ConfigMap --- - - apiVersion: v1 - kind: ConfigMap - metadata: - name: handler-code - data: - handler.py: | -{{ .Values.handlerCode | indent 16 }} - - # --- Service metadata for discovery --- - - apiVersion: v1 - kind: ConfigMap - metadata: - name: svc-metadata - labels: - app.kubernetes.io/part-of: obol.stack - obol.stack/app: agent-service - obol.stack/service-name: {{ .Values.name }} - data: - metadata.json: | - { - "name": "{{ .Values.name }}", - "description": "{{ .Values.description }}", - "pricing": { - "pricePerRequest": "{{ .Values.price }}", - "currency": "USDC", - "chain": "{{ .Values.chain }}" - }, - "endpoints": { - "external": "{{ 
.Values.publicURL }}/services/{{ .Values.name }}", - "internal": "http://agent-svc-{{ .Values.name }}.agent-svc-{{ .Values.name }}.svc.cluster.local:8402" - } - } - - # --- Deployment: handler + x402 proxy sidecar --- - - apiVersion: apps/v1 - kind: Deployment - metadata: - name: agent-svc-{{ .Values.name }} - spec: - replicas: 1 - selector: - matchLabels: - app: agent-svc-{{ .Values.name }} - template: - metadata: - labels: - app: agent-svc-{{ .Values.name }} - spec: - containers: - # Handler container — agent's business logic - - name: handler - image: python:3.12-slim - command: ["python3", "/app/handler.py"] - ports: - - containerPort: 8080 - volumeMounts: - - name: handler-code - mountPath: /app - readinessProbe: - httpGet: - path: / - port: 8080 - initialDelaySeconds: 3 - periodSeconds: 5 - - # x402 proxy sidecar — payment verification + settlement - - name: x402-proxy - image: ghcr.io/obolnetwork/inference-gateway:latest - args: - - --listen=:8402 - - --upstream=http://localhost:8080 - - --wallet={{ .Values.wallet }} - - --price={{ .Values.price }} - - --chain={{ .Values.chain }} - - --facilitator={{ .Values.facilitator }} - ports: - - containerPort: 8402 - readinessProbe: - httpGet: - path: /health - port: 8402 - initialDelaySeconds: 5 - periodSeconds: 10 - - volumes: - - name: handler-code - configMap: - name: handler-code - - # --- Service --- - - apiVersion: v1 - kind: Service - metadata: - name: agent-svc-{{ .Values.name }} - spec: - selector: - app: agent-svc-{{ .Values.name }} - ports: - - port: 8402 - targetPort: 8402 - name: x402 - - # --- HTTPRoute (Traefik) --- - - apiVersion: gateway.networking.k8s.io/v1 - kind: HTTPRoute - metadata: - name: agent-svc-{{ .Values.name }} - spec: - parentRefs: - - name: traefik-gateway - namespace: traefik - sectionName: web - rules: - - matches: - - path: - type: PathPrefix - value: /services/{{ .Values.name }} - filters: - - type: URLRewrite - urlRewrite: - path: - type: ReplacePrefixMatch - replacePrefixMatch: / 
- backendRefs: - - name: agent-svc-{{ .Values.name }} - port: 8402 -``` - ---- - -## Integration With Existing Skills - -| Skill | Integration point | -|-------|------------------| -| `agent-identity` | `--register` flag calls `identity.sh register` or `identity.sh set-uri` to advertise the service endpoint in ERC-8004 | -| `local-ethereum-wallet` | Wallet address for x402 payment settlement; `signer.py` for any onchain operations | -| `ethereum-networks` | `rpc.sh` to check USDC balance, query payment transactions, verify settlement | -| `obol-stack` | `kube.py` to monitor service pod health, logs, events | -| `standards` | x402 protocol reference, pricing strategies, facilitator documentation | - ---- - -## RBAC Requirements - -The OpenClaw pod currently has **read-only access to its own namespace**. To deploy services, it needs: - -### Option A: Expand OpenClaw's RBAC (Simple, Less Isolated) - -Add a ClusterRole that lets OpenClaw create resources in `agent-svc-*` namespaces: - -```yaml -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: openclaw-service-deployer -rules: - - apiGroups: [""] - resources: ["namespaces", "configmaps", "services"] - verbs: ["get", "list", "create", "update", "delete"] - - apiGroups: ["apps"] - resources: ["deployments"] - verbs: ["get", "list", "create", "update", "delete"] - - apiGroups: ["gateway.networking.k8s.io"] - resources: ["httproutes"] - verbs: ["get", "list", "create", "update", "delete"] -``` - -### Option B: Deploy via `obol` CLI (Preferred, Uses Existing Patterns) - -Don't give OpenClaw direct k8s write access. Instead: - -1. `service.sh` writes the helmfile + handler to the **host PVC** (same pattern as skills injection) -2. A lightweight controller or CronJob watches for new service definitions and runs `helmfile sync` -3. 
Or: the agent calls `obol` CLI via the existing passthrough pattern - -**Recommended: Option B** — it follows the existing principle that OpenClaw doesn't mutate cluster state directly. The `obol` binary handles deployment, OpenClaw handles the intent. - -In practice, `service.sh deploy` would: -1. Write helmfile + handler + values to `$DATA_DIR/services/<name>/` -2. Call the `obol` CLI wrapper (already available in `$PATH`) to run helmfile sync -3. The `obol` CLI has full kubeconfig access and handles the deployment - -This mirrors how `obol network install` + `obol network sync` work — config is staged, then synced. - ---- - -## Service Lifecycle - -### Deploy -``` -Agent writes handler → service.sh deploy → helmfile sync → pod running → HTTPRoute active → tunnel exposes → ERC-8004 registered -``` - -### Update Handler -``` -Agent edits handler → service.sh update → ConfigMap patched → pod restarted → same URL, new logic -``` - -### Update Price -``` -service.sh set-price → x402 proxy config updated → restarts sidecar only → price change takes effect -``` - -### Teardown -``` -service.sh teardown → helmfile destroy → namespace deleted → ERC-8004 URI updated (mark inactive) -``` - -### Monitor -``` -service.sh earnings → rpc.sh checks USDC balance → shows delta since deployment -service.sh status → pod health + request count + uptime + reputation score -``` - ---- - -## Pricing Strategies (Reference Material) - -The `x402-server-patterns.md` reference would cover: - -### Scheme: `exact` (Live) -Fixed price per request. Simple, predictable. -``` -Price: $0.10 USDC per weather query -Price: $0.001 USDC per data point -``` - -### Scheme: `upto` (Emerging) -Client authorises a maximum, server settles actual cost.
Critical for metered services: -``` -LLM inference: max $0.50, settle per token generated -Compute jobs: max $1.00, settle per second of runtime -Data queries: max $0.10, settle per row returned -``` - -### Free Tier Pattern -Set price to 0 for discovery/reputation building. Upgrade later: -```bash -# Start free to build reputation -sh scripts/service.sh deploy --name weather-api --handler ./handler.py --price 0 --register - -# After building reputation, add pricing -sh scripts/service.sh set-price --name weather-api --price 0.05 -``` - -### Chain Selection -| Chain | Gas cost per settlement | Best for | -|-------|------------------------|----------| -| Base | ~$0.001 | Consumer services, micropayments | -| Base Sepolia | Free (testnet) | Development, testing | -| Polygon | ~$0.005 | Medium-value services | -| Avalanche | ~$0.01 | Higher-value services | - ---- - -## Implementation Order - -| Phase | Work | Effort | Dependencies | -|-------|------|--------|-------------| -| **1** | Create `agent-services` SKILL.md | Small | None | -| **2** | Create `service.sh` — scaffold + deploy + teardown | Large | Helmfile template | -| **3** | Create helmfile.yaml.gotmpl + handler.py.tmpl | Medium | Inference gateway image | -| **4** | Create `x402-server-patterns.md` reference | Small | None | -| **5** | Add `service.sh` — update, set-price, list, status | Medium | Phase 2 | -| **6** | Add `service.sh` — earnings monitoring, logs, health | Small | Phase 2 | -| **7** | Add `--register` flag (ERC-8004 integration) | Small | `agent-identity` skill | -| **8** | Add RBAC / obol CLI integration for deployment | Medium | Decision on Option A vs B | -| **9** | Test end-to-end: deploy → pay → earn → rate cycle | Large | All phases | - -### Phase 1-4 delivers a working MVP. Phases 5-9 add polish and integration. 
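The `upto` settlement rule described in the pricing section (settle actual metered cost, capped at the client's authorisation) can be sketched as below. Working in atomic USDC units (6 decimals) is an assumption chosen for the sketch so money math stays exact integers; the function and unit names are illustrative.

```python
def settle_upto(max_authorized: int, units_used: int, unit_price: int) -> int:
    """Settle metered usage, capped at the client's authorised maximum.
    All amounts are atomic USDC units (10^-6 USDC) to avoid float rounding."""
    return min(units_used * unit_price, max_authorized)

# LLM inference example from above: max $0.50 authorised (500_000 units),
# 300 tokens generated at $0.001/token (1_000 units each) -> settle $0.30.
settled = settle_upto(500_000, 300, 1_000)
# Usage beyond the cap settles at the authorised maximum, never more.
capped = settle_upto(500_000, 900, 1_000)
```

This is why `upto` is a better fit than `exact` for token- or time-metered services: the buyer bounds risk up front while the seller still bills actual usage.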
- ---- - -## Validation Criteria - -- [ ] Agent can scaffold a handler template with `service.sh scaffold` -- [ ] Agent can deploy a handler that serves HTTP on a public URL -- [ ] Unauthenticated requests receive `402 Payment Required` with pricing info -- [ ] Paid requests (valid x402 signature) reach the handler and return data -- [ ] Payment settles onchain (USDC transferred to agent's wallet) -- [ ] Agent can update handler code without changing the URL -- [ ] Agent can update pricing without redeploying -- [ ] Agent can tear down a service cleanly -- [ ] Agent can list deployed services with status -- [ ] Agent can check USDC earnings -- [ ] `--register` flag creates/updates ERC-8004 registration with service endpoint -- [ ] Service is discoverable by other agents via ERC-8004 + reputation queries -- [ ] All scripts are POSIX sh, work in the OpenClaw pod -- [ ] Follows existing Obol Stack patterns (helmfile, namespace isolation, Traefik HTTPRoute) - ---- - -## Open Questions - -1. **x402 proxy image:** Reuse `inference-gateway` as-is, or extract a generic `x402-proxy` image? The inference gateway already accepts `--upstream` so it works, but the name is misleading for non-inference services. - -2. **Handler language:** Start with Python-only (stdlib HTTPServer, no dependencies)? Or support a generic Docker image where the agent provides a Dockerfile? - -3. **ConfigMap size limit:** Handler code goes in a ConfigMap (1MB limit). For larger services, should we use the PVC injection pattern instead? 1MB is generous for a Python handler but could be limiting for services with bundled data. - -4. **Multi-endpoint services:** One handler = one service = one price? Or support multiple endpoints with different prices within a single service? The x402 middleware can be configured per-path. - -5. 
**Service discovery by other agents:** Beyond ERC-8004 registration, should there be an in-cluster service registry (ConfigMap-based, like the inference metadata pattern) so co-located agents can discover each other without going onchain? - -6. **Auto-restart on failure:** Should the skill configure liveness probes to auto-restart crashed handlers? The template includes readiness probes but not liveness. - -7. **Rate limiting:** Should there be built-in rate limiting to prevent abuse even with x402 payments? Or is the payment itself sufficient protection? diff --git a/plans/litellmrouting.md b/plans/litellmrouting.md deleted file mode 100644 index f4b731c4..00000000 --- a/plans/litellmrouting.md +++ /dev/null @@ -1,123 +0,0 @@ -# LiteLLM + OpenClaw Smart Routing - -## Context - -When `obol model setup anthropic` adds a cloud provider, OpenClaw can't use the new models because: -1. LiteLLM requires every model to be individually registered in `model_list` -2. OpenClaw's per-agent `models.json` persists stale config (old URLs, old model lists) -3. OpenClaw requires an explicit model allowlist — it does NOT auto-discover from `/v1/models` -4. The sync between LiteLLM config and OpenClaw config is fragile and multi-step - -**Goal**: `obol model setup anthropic` → any Claude model immediately works in OpenClaw. Same for OpenAI. Ollama models work as soon as they're pulled. Direct-to-provider wiring preserved. - -## Approach: Wildcards for Cloud + Explicit for Ollama + Host-Side Patching - -### Why This Approach - -| Feature | LiteLLM | OpenClaw | -|---------|---------|----------| -| `anthropic/*` wildcard | Works | N/A (LiteLLM-side) | -| `openai/*` wildcard | Works | N/A | -| `ollama_chat/*` wildcard | **Broken** | N/A | -| File watcher hot-reload | N/A | **Yes** — hot-applies model changes | - -**Key insight**: LiteLLM wildcards handle cloud routing, but OpenClaw needs an explicit model allowlist. 
We solve this with: (a) wildcards in LiteLLM so any model routes, and (b) writing a clean `models.json` to OpenClaw's host-side PVC which its file watcher picks up. - -### End-to-End Flows - -**`obol model setup anthropic --api-key sk-ant-...`**: -1. LiteLLM gets `anthropic/*` wildcard + API key in Secret → restarts -2. `syncOpenClawModels()` queries running LiteLLM `/v1/models` for actual available models (falls back to baked-in well-known list if cluster unreachable) -3. Writes clean `models.json` to host PVC (replaces entire file) -4. OpenClaw file watcher hot-reloads — Claude models immediately available, no pod restart - -**`obol model setup ollama`** (new models detected): -1. Explicit `ollama_chat/` entries added to LiteLLM (no wildcards) -2. `syncOpenClawModels()` queries LiteLLM, updates `models.json` -3. OpenClaw hot-reloads - -**Direct-to-provider** (`obol openclaw setup` → choose Anthropic direct): -- Unchanged — `buildDirectProviderOverlay()` is a separate code path, no LiteLLM involved - -## Changes - -### 1. LiteLLM: Wildcard entries for cloud providers - -**File**: `internal/model/model.go` — `buildModelEntries()` - -``` -anthropic → wildcard: model_name: "anthropic/*", model: "anthropic/*" - + explicit entries for requested models (better /v1/models) -openai → wildcard: model_name: "openai/*", model: "openai/*" - + explicit entries for requested models -ollama → unchanged (explicit ollama_chat/ entries) -``` - -### 2. LiteLLM: Enable `drop_params: true` - -**File**: `internal/embed/infrastructure/base/templates/llm.yaml` (line 71) - -Cross-provider compatibility — LiteLLM drops unsupported params instead of erroring when routing across providers. - -### 3. Model list: Live query + baked-in fallback - -**File**: `internal/model/model.go` — `GetConfiguredModels()` - -When syncing to OpenClaw: -1. **Try**: Query running LiteLLM pod's `/v1/models` endpoint (with `check_provider_endpoint: true` so wildcards expand to real models) -2. 
**Fallback**: Expand wildcards using baked-in `wellKnownModels` map if cluster unreachable - -```go -var wellKnownModels = map[string][]string{ - "anthropic": {"claude-sonnet-4-6", "claude-opus-4", "claude-sonnet-4-5-20250929", "claude-haiku-3-5-20241022"}, - "openai": {"gpt-4o", "gpt-4o-mini", "o3", "o3-mini"}, -} -``` - -### 4. Host-side `models.json` patching (clean replacement) - -**File**: `internal/openclaw/openclaw.go` — new `patchAgentModelsJSON()` - -Writes a **clean** `models.json` to `$DATA_DIR/openclaw-/openclaw-data/.openclaw/agents/main/agent/models.json`. Replaces entire file — no backward-compatible merge needed (the stale llmspy config never shipped). Contains only the `openai` provider pointing at LiteLLM with the current model list. - -### 5. Update `SyncOverlayModels()` — file watcher only, no helmfile re-sync - -**File**: `internal/openclaw/openclaw.go` - -After patching the overlay YAML, also call `patchAgentModelsJSON()` for each instance. **Skip helmfile re-sync** — OpenClaw's file watcher handles `models.json` changes in <1s. Only do helmfile sync when overlay YAML changes that affect the Helm release (e.g. new provider added, not just model list updates). - -### 6. Add `obol model sync` CLI command - -**File**: `cmd/obol/model.go` - -Manual escape hatch: re-reads LiteLLM config (live query) and pushes to all OpenClaw instances. Useful when new models appear after binary was built. - -### 7. Update `detectProvider()` for wildcards - -**File**: `internal/model/model.go` - -Handle wildcard model names (`anthropic/*`, `openai/*`) in provider detection logic. - -### 8. 
Tests - -- `model_test.go`: wildcard entry generation, wildcard expansion, provider detection for wildcards -- `overlay_test.go`: `models.json` clean write, end-to-end sync - -## Files to Modify - -| File | Changes | -|------|---------| -| `internal/model/model.go` | `buildModelEntries()` wildcards, `GetConfiguredModels()` live query + fallback, `detectProvider()` wildcards, `wellKnownModels` map | -| `internal/openclaw/openclaw.go` | New `patchAgentModelsJSON()`, update `SyncOverlayModels()` to patch models.json + skip helmfile sync | -| `internal/embed/infrastructure/base/templates/llm.yaml` | `drop_params: true` | -| `cmd/obol/model.go` | New `model sync` subcommand | -| `internal/model/model_test.go` | Tests for wildcards | -| `internal/openclaw/overlay_test.go` | Tests for models.json patching | - -## Verification - -1. `go build ./...` + `go test ./...` -2. `obol model setup anthropic --api-key sk-ant-...` → LiteLLM has `anthropic/*` → OpenClaw `models.json` has Claude models → inference works -3. `obol model setup ollama` → new models appear in OpenClaw -4. `obol model sync` → refreshes all instances from live LiteLLM -5. `obol openclaw setup` → direct Anthropic → still works (no LiteLLM) diff --git a/plans/monetise.md b/plans/monetise.md deleted file mode 100644 index 118eaeec..00000000 --- a/plans/monetise.md +++ /dev/null @@ -1,480 +0,0 @@ -# Obol Agent: Autonomous Compute Monetization - -**Branch:** `feat/secure-enclave-inference` | **Date:** 2026-02-25 | **Status:** Architecture proposal - ---- - -## 1. The Goal - -A singleton OpenClaw instance — the **obol-agent** — deployed via `obol agent init`, autonomously monetizes compute resources running in the Obol Stack. A user (or the frontend) declares *what* to expose via a Custom Resource; the obol-agent handles *everything else*: model pulling, health validation, payment gating, public exposure, on-chain registration, and status reporting. - -No separate controller binary. No Go operator. 
The obol-agent is a regular OpenClaw instance with elevated RBAC and the `monetize` skill. Only one obol-agent can exist per cluster; other OpenClaw instances retain standard read-only access. - ---- - -## 2. How It Works - -``` - ┌──────────────────────────────────┐ - │ User / Frontend / obol CLI │ - │ │ - │ kubectl apply -f offer.yaml │ - │ OR: frontend POST to k8s API │ - │ OR: obol sell http ... │ - └──────────┬───────────────────────────┘ - │ creates CR - ▼ - ┌────────────────────────────────────┐ - │ ServiceOffer CR │ - │ apiVersion: obol.network/v1alpha1 │ - │ kind: ServiceOffer │ - └──────────┬───────────────────────────┘ - │ read by - ▼ - ┌────────────────────────────────────┐ - │ obol-agent (singleton OpenClaw) │ - │ namespace: openclaw- │ - │ │ - │ Cron job (every 60s): │ - │ python3 monetize.py process --all│ - │ │ - │ `monetize` skill: │ - │ 1. Read ServiceOffer CRs │ - │ 2. Pull model (if runtime=ollama) │ - │ 3. Health-check upstream service │ - │ 4. Create ForwardAuth Middleware │ - │ 5. Create HTTPRoute │ - │ 6. Register on ERC-8004 │ - │ 7. Update CR status │ - └────────────────────────────────────┘ -``` - -The obol-agent uses its mounted ServiceAccount token to talk to the Kubernetes API — the same pattern `kube.py` already uses for read-only monitoring, but extended with write operations for Middleware and HTTPRoute resources. - -The reconciliation loop is built on OpenClaw's native **cron system**: a `{ kind: "every", everyMs: 60000 }` job runs `monetize.py process --all` every 60 seconds. No sidecar, no K8s CronJob — the cron scheduler runs inside the OpenClaw Gateway process and persists across pod restarts. - ---- - -## 3. 
Why Not a Separate Controller - -| Concern | Go operator (controller-runtime) | OpenClaw with `monetize` skill | -|---------|----------------------------------|--------------------------------| -| New binary to build/maintain | Yes — new cmd/, Dockerfile, CI | No — skill is a SKILL.md + Python script | -| Hot-updatable logic | No — rebuild + redeploy image | Yes — update skill files on PVC | -| Error handling | Hardcoded retry/backoff | AI reasons about failures, adapts | -| Watch loop | Built-in informer cache | Built-in cron: `monetize.py process --all` every 60s | -| Dependencies | controller-runtime, kubebuilder, code-gen | stdlib Python (`urllib`, `json`, `ssl`) | -| Existing infrastructure | Needs new Deployment, SA, RBAC | Uses existing OpenClaw pod, SA, skill system | - -The traditional operator pattern is the right answer when you need guaranteed sub-second reconciliation with leader election. For monetization lifecycle (deploy → expose → register → monitor), OpenClaw acting on ServiceOffer CRs via skills is simpler and leverages everything already built. - ---- - -## 4. 
The CRD - -```yaml -apiVersion: obol.network/v1alpha1 -kind: ServiceOffer -metadata: - name: qwen-inference - namespace: openclaw-default # lives alongside the OpenClaw instance -spec: - # What to serve - model: - name: Qwen/Qwen3.5-35B-A3B # Ollama model tag to pull - runtime: ollama # runtime that serves the model - - # Upstream service (Ollama already running in-cluster) - upstream: - service: ollama # k8s Service name - namespace: openclaw-default # where the service runs - port: 11434 - healthPath: /api/tags # endpoint to probe after pull - - # How to price it - pricing: - amount: "0.50" - unit: MTok # per million tokens - currency: USDC - chain: base - - # Who gets paid - wallet: "0x1234...abcd" - - # Public path - path: /services/qwen-inference - - # On-chain advertisement - register: true -``` - -```yaml -status: - conditions: - - type: ModelReady - status: "True" - reason: PullCompleted - message: "Qwen/Qwen3.5-35B-A3B pulled and loaded on ollama" - - type: UpstreamHealthy - status: "True" - reason: HealthCheckPassed - message: "Model responds to inference at ollama.openclaw-default.svc:11434" - - type: PaymentGateReady - status: "True" - reason: MiddlewareCreated - message: "ForwardAuth middleware x402-qwen-inference created" - - type: RoutePublished - status: "True" - reason: HTTPRouteCreated - message: "Exposed at /services/qwen-inference via traefik-gateway" - - type: Registered - status: "True" - reason: ERC8004Registered - message: "Registered on Base (tx: 0xabc...)" - - type: Ready - status: "True" - reason: AllConditionsMet - endpoint: "https://stack.example.com/services/qwen-inference" - observedGeneration: 1 -``` - -**Design:** -- **Namespace-scoped** — the CR lives in the same namespace as the upstream service. This preserves OwnerReference cascade (garbage collection on delete) and avoids cross-namespace complexity. 
The obol-agent's ClusterRoleBinding lets it watch ServiceOffers across all namespaces via `GET /apis/obol.network/v1alpha1/serviceoffers` (cluster-wide list). -- **Conditions, not Phase** — [deprecated by API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties). Conditions give granular insight into which step failed. -- **Status subresource** — prevents users from accidentally overwriting status. ([docs](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource)) -- **Same-namespace as upstream** — the Middleware and HTTPRoute are created alongside the upstream service. OwnerReferences work (same namespace), so deleting the ServiceOffer garbage-collects the route and middleware. ([docs](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)) - -### CRD installation - -The CRD manifest is embedded in the infrastructure helmfile (same pattern as `obol-agent.yaml`) and applied during `obol stack init`. No kubebuilder, no code-gen — just a static YAML manifest. - ---- - -## 5. The `monetize` Skill - -``` -internal/embed/skills/monetize/ -├── SKILL.md # Teaches OpenClaw when and how to use this skill -├── scripts/ -│ └── monetize.py # K8s API client for ServiceOffer lifecycle -└── references/ - └── x402-pricing.md # Pricing strategies, chain selection -``` - -### SKILL.md (summary) - -Teaches OpenClaw: -- When a user asks to monetize a service, create a ServiceOffer CR -- When asked to check monetization status, read ServiceOffer CRs and report conditions -- When asked to process offers, run the monetization workflow (health → gate → route → register) -- When asked to stop monetizing, delete the ServiceOffer CR (garbage collection handles cleanup) - -### kube.py extension - -`kube.py` gains write helpers (`api_post`, `api_patch`, `api_delete`) alongside its existing `api_get`. 
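A sketch of what the write side could look like — assuming only the standard in-pod ServiceAccount token and CA paths; the function names and the split into a pure request builder are illustrative, not the actual `kube.py` layout:

```python
import json
import ssl
import urllib.request

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
API_SERVER = "https://kubernetes.default.svc"

def build_request(method, path, token, body=None):
    """Build an authenticated k8s API request (pure function, easy to test)."""
    return urllib.request.Request(
        API_SERVER + path,
        data=None if body is None else json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method=method,
    )

def api_post(path, body):
    """POST a resource using the mounted ServiceAccount token and cluster CA."""
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=CA_PATH)
    with urllib.request.urlopen(build_request("POST", path, token, body), context=ctx) as resp:
        return json.load(resp)
```

`api_patch` and `api_delete` are the same shape with a different `method` (PATCH additionally needs a merge-patch `Content-Type`).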
The read-only contract is preserved by convention: `kube.py` commands remain read-only; `monetize.py` imports the shared helpers and adds write operations. Pure Python stdlib — no new dependencies. - -Why not a K8s MCP server? The mounted ServiceAccount token already gives direct API access. An MCP server (e.g., Red Hat's `containers/kubernetes-mcp-server`) adds a sidecar container, image pull, and Helm chart changes for what amounts to wrapping the same REST calls. It's a known upgrade path if K8s operations outgrow script-based tooling, but adds no value today. - -### monetize.py - -``` -python3 monetize.py offers # list ServiceOffer CRs -python3 monetize.py process # run full workflow for one offer -python3 monetize.py process --all # process all pending offers -python3 monetize.py status # show conditions -python3 monetize.py create --upstream .. # create a ServiceOffer CR -python3 monetize.py delete # delete CR (cascades cleanup) -``` - -Each `process` invocation: - -1. **Read the ServiceOffer CR** from the k8s API -2. **Pull the model** — if `spec.model.runtime == ollama`, `POST /api/pull` to Ollama -3. **Health-check** — verify model responds at `..svc:` -4. **Create/update Middleware** — Traefik ForwardAuth pointing at `x402-verifier.x402.svc:8080/verify` -5. **Create/update HTTPRoute** — `parentRef: traefik-gateway`, path from spec, backend = upstream service, filter = the Middleware -6. **ERC-8004 registration** — if `spec.register`, call `signer.py` to sign and submit the registration tx -7. **Update CR status** — set conditions and endpoint - -All via the k8s REST API using the mounted ServiceAccount token. No kubectl, no client-go, no external dependencies. - ---- - -## 6. What Gets Created Per ServiceOffer - -All resources are created in the **same namespace** as the upstream service (and the ServiceOffer CR). OwnerReferences on the ServiceOffer handle cleanup. 
- -| Resource | Purpose | -|----------|---------| -| `Middleware` (traefik.io/v1alpha1) | ForwardAuth to `x402-verifier.x402.svc:8080/verify` — gates the upstream with payment | -| `HTTPRoute` (gateway.networking.k8s.io/v1) | Routes `spec.path` from Traefik Gateway to upstream, through the Middleware | - -That's it. Two resources. The upstream service already runs. The x402 verifier already runs. The Gateway already runs. The tunnel already runs. - -### Why no new namespace - -The upstream service already has a namespace. Creating a new namespace per offer would mean: -- Cross-namespace OwnerReferences don't work ([docs](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)) -- Need ReferenceGrant for cross-namespace backend refs in HTTPRoute ([docs](https://gateway-api.sigs.k8s.io/api-types/referencegrant/)) -- Broader RBAC (namespace create/delete permissions) - -Instead: Middleware and HTTPRoute live alongside the upstream. Delete the ServiceOffer CR → Kubernetes cascades the deletion. - -### Cross-namespace HTTPRoute → Gateway - -The HTTPRoute references `traefik-gateway` in the `traefik` namespace. No ReferenceGrant needed — the Gateway's `allowedRoutes.namespaces.from: All` handles this. ([Gateway API docs](https://gateway-api.sigs.k8s.io/guides/multiple-ns/)) - -### Middleware locality - -Traefik's `ExtensionRef` in HTTPRoute is a `LocalObjectReference` — Middleware must be in the same namespace as the HTTPRoute. The skill creates it there. ([traefik#11126](https://github.com/traefik/traefik/issues/11126)) - ---- - -## 7. 
RBAC: Singleton obol-agent vs Regular OpenClaw - -### Two tiers of access - -| | obol-agent (singleton) | Regular OpenClaw instances | -|---|---|---| -| **Deployed by** | `obol agent init` | `obol openclaw onboard` | -| **RBAC** | `openclaw-monetize` ClusterRole | Namespace-scoped read-only Role (chart default) | -| **Skills** | All default skills + `monetize` | Default skills only | -| **Cron** | `monetize.py process --all` every 60s | No monetization cron | -| **Count** | Exactly one per cluster | Zero or more | - -Only the obol-agent gets the elevated ClusterRole. `obol agent init` enforces the singleton constraint — it refuses to create a second obol-agent if one already exists. - -### obol-agent ClusterRole - -```yaml -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: openclaw-monetize -rules: - # Read/write ServiceOffer CRs - - apiGroups: ["obol.network"] - resources: ["serviceoffers"] - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - - apiGroups: ["obol.network"] - resources: ["serviceoffers/status"] - verbs: ["get", "update", "patch"] - - # Create Middleware and HTTPRoute in service namespaces - - apiGroups: ["traefik.io"] - resources: ["middlewares"] - verbs: ["get", "list", "create", "update", "patch", "delete"] - - apiGroups: ["gateway.networking.k8s.io"] - resources: ["httproutes"] - verbs: ["get", "list", "create", "update", "patch", "delete"] - - # Read pods/services/endpoints/deployments for health checks (any namespace) - - apiGroups: [""] - resources: ["pods", "services", "endpoints"] - verbs: ["get", "list"] - - apiGroups: ["apps"] - resources: ["deployments"] - verbs: ["get", "list"] - - apiGroups: [""] - resources: ["pods/log"] - verbs: ["get"] -``` - -This is bound to OpenClaw's ServiceAccount via ClusterRoleBinding — the skill needs to read services and create routes across namespaces (e.g., check health of Ollama in `openclaw-default`, create a route for an Ethereum node in 
`ethereum-knowing-wahoo`). - -### What is explicitly NOT granted - -| Excluded | Why | -|----------|-----| -| `secrets` (cluster-wide) | OpenClaw has secrets access in its own namespace only (chart default) | -| `rbac.authorization.k8s.io/*` | Cannot modify its own permissions | -| `namespaces` create/delete | Doesn't create namespaces | -| `deployments` create/update | Doesn't create workloads — gates existing ones | -| `configmaps` create (cluster-wide) | Reads config for diagnostics, doesn't write it | - -### How this gets applied - -The ClusterRole and ClusterRoleBinding are added to the OpenClaw helmfile generation in `internal/openclaw/openclaw.go`, same as the existing `rbac.create: true` overlay. When `obol openclaw onboard` runs, the chart deploys these RBAC resources alongside the pod. - -**Ref:** [RBAC Good Practices](https://kubernetes.io/docs/concepts/security/rbac-good-practices/) - -### Fix the existing `admin` RoleBinding - -The per-network `agent-rbac.yaml` currently binds the `admin` ClusterRole, which includes Secrets and RBAC manipulation. Replace with a scoped ClusterRole (read pods/services + write Middleware/HTTPRoute). - ---- - -## 8. 
Admission Policy Guardrail - -Defense-in-depth via [ValidatingAdmissionPolicy](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) (GA in k8s 1.30, available in k3s 1.31): - -```yaml -apiVersion: admissionregistration.k8s.io/v1 -kind: ValidatingAdmissionPolicy -metadata: - name: openclaw-monetize-guardrail -spec: - failurePolicy: Fail - matchConstraints: - resourceRules: - - apiGroups: ["traefik.io"] - apiVersions: ["v1alpha1"] - operations: ["CREATE", "UPDATE"] - resources: ["middlewares"] - - apiGroups: ["gateway.networking.k8s.io"] - apiVersions: ["v1"] - operations: ["CREATE", "UPDATE"] - resources: ["httproutes"] - matchConditions: - - name: is-openclaw - expression: >- - request.userInfo.username.startsWith("system:serviceaccount:openclaw-") - validations: - # HTTPRoutes must reference traefik-gateway only - - expression: >- - object.spec.parentRefs.all(ref, - ref.name == "traefik-gateway" && ref.?namespace.orValue("traefik") == "traefik" - ) - message: "OpenClaw can only attach routes to traefik-gateway" - # Middlewares must use ForwardAuth to x402-verifier only - - expression: >- - !has(object.spec.forwardAuth) || - object.spec.forwardAuth.address.startsWith("http://x402-verifier.x402.svc") - message: "ForwardAuth must point to x402-verifier" -``` - -Even if RBAC allows creating any Middleware, the admission policy ensures OpenClaw can only create ForwardAuth rules pointing at the legitimate x402 verifier. A prompt injection can't make it route traffic to an attacker-controlled auth endpoint. - ---- - -## 9. The Full Flow - -``` -1. User: "Monetize Qwen3.5-35B-A3B on Ollama at $0.50 per M token on Base" - -2. OpenClaw (using monetize skill) creates the ServiceOffer CR: - python3 monetize.py create qwen-inference \ - --model Qwen/Qwen3.5-35B-A3B --runtime ollama \ - --upstream ollama --namespace openclaw-default --port 11434 \ - --price 0.50 --unit MTok --chain base --wallet 0x... 
--register - → Creates ServiceOffer CR via k8s API - -3. OpenClaw processes the offer: - python3 monetize.py process qwen-inference - - Step 1: Pull the model through Ollama - POST http://ollama.openclaw-default.svc:11434/api/pull - {"name": "Qwen/Qwen3.5-35B-A3B"} - → Streams download progress, waits for completion - → sets condition: ModelReady=True - - Step 2: Health-check the model is loaded - POST http://ollama.openclaw-default.svc:11434/api/generate - {"model": "Qwen/Qwen3.5-35B-A3B", "prompt": "ping", "stream": false} - → 200 OK, model responds - → sets condition: UpstreamHealthy=True - - Step 3: Create ForwardAuth Middleware - POST /apis/traefik.io/v1alpha1/namespaces/openclaw-default/middlewares - → ForwardAuth → x402-verifier.x402.svc:8080/verify - → sets condition: PaymentGateReady=True - - Step 4: Create HTTPRoute - POST /apis/gateway.networking.k8s.io/v1/namespaces/openclaw-default/httproutes - → parentRef: traefik-gateway, path: /services/qwen-inference - → filter: ExtensionRef to Middleware - → backendRef: ollama:11434 - → sets condition: RoutePublished=True - - Step 5: ERC-8004 registration - python3 signer.py ... (signs registration tx) - → sets condition: Registered=True - - Step 6: Update status - PATCH /apis/obol.network/v1alpha1/.../serviceoffers/qwen-inference/status - → Ready=True, endpoint=https://stack.example.com/services/qwen-inference - -4. User: "What's the status?" - python3 monetize.py status qwen-inference - → Shows conditions table + endpoint + model info - -5. External consumer pays and calls: - POST https://stack.example.com/services/qwen-inference/v1/chat/completions - X-Payment: - → Traefik → ForwardAuth (x402-verifier) → Ollama (Qwen3.5-35B-A3B) -``` - ---- - -## 10. 
What the `obol` CLI Does - -The CLI becomes a thin CRD client — no deployment logic, no helmfile: - -```bash -obol sell http --upstream ollama --price 0.001 --chain base -# → creates ServiceOffer CR (same as kubectl apply) - -obol sell list -# → kubectl get serviceoffers (formatted) - -obol sell status qwen-inference -# → shows conditions, endpoint, pricing - -obol sell delete qwen-inference -# → deletes CR (OwnerReference cascades cleanup) -``` - -The frontend can do the same via the k8s API directly. - ---- - -## 11. What We Keep, What We Drop, What We Add - -| Component | Action | Reason | -|-----------|--------|--------| -| `cmd/x402-verifier/` | **Keep** | ForwardAuth verifier — the payment gate | -| `internal/x402/` | **Keep** | Verifier handler | -| `internal/erc8004/` | **Keep** | On-chain registration (called by `monetize.py` via `signer.py`) | -| `internal/enclave/` | **Keep** | Secure Enclave signing (orthogonal to monetization) | -| `internal/inference/gateway.go` | **Drop** | Inline x402 middleware — replaced by ForwardAuth | -| `internal/inference/store.go` | **Drop** | Deployment config on disk — replaced by CRD | -| `obol-agent.yaml` (busybox pod) | **Drop** | OpenClaw IS the agent; no separate placeholder pod | -| `agent-rbac.yaml` (`admin` binding) | **Replace** | Scoped ClusterRole instead of `admin` | -| `cmd/obol/service.go` | **Simplify** | Thin CRD client | -| `cmd/obol/monetize.go` | **Simplify** | Thin CRD client | -| `internal/embed/skills/monetize/` | **Add** | New skill: SKILL.md + `monetize.py` + references | -| ServiceOffer CRD manifest | **Add** | Intent interface, applied during `obol stack init` | -| ValidatingAdmissionPolicy | **Add** | Guardrail on what OpenClaw can create | -| `openclaw-monetize` ClusterRole | **Add** | Scoped write access for Middleware/HTTPRoute | - ---- - -## 12. 
Resolved Decisions - -| Question | Decision | Rationale | -|----------|----------|-----------| -| **Polling vs event-driven** | OpenClaw cron job, every 60s | OpenClaw has a built-in cron scheduler (`{ kind: "every", everyMs: 60000 }`). No sidecar, no K8s CronJob — runs inside the Gateway process. Jobs persist across restarts via `~/.openclaw/cron/jobs.json`. | -| **Multi-instance** | Singleton obol-agent | Only one obol-agent per cluster, enforced by `obol agent init`. Other OpenClaw instances keep read-only RBAC and no `monetize` skill. No coordination problem. | -| **CRD scope** | Namespace-scoped | OwnerReference cascade works (same namespace as Middleware/HTTPRoute). The obol-agent's ClusterRoleBinding lets it list ServiceOffers across all namespaces. Standard `kubectl get serviceoffers -A` works. | -| **K8s API access** | Extend `kube.py` with write helpers | `kube.py` gains `api_post`, `api_patch`, `api_delete` alongside `api_get`. `monetize.py` imports the shared helpers. Pure stdlib, zero new dependencies. K8s MCP server (Red Hat `containers/kubernetes-mcp-server`) is a known upgrade path but unnecessary today. 
| - ---- - -## References - -| Topic | Link | -|-------|------| -| Custom Resource Definitions | https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ | -| CRD status subresource | https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource | -| API conventions (conditions) | https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md | -| RBAC | https://kubernetes.io/docs/reference/access-authn-authz/rbac/ | -| RBAC good practices | https://kubernetes.io/docs/concepts/security/rbac-good-practices/ | -| ValidatingAdmissionPolicy | https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/ | -| OwnerReferences | https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/ | -| Cross-namespace routing (Gateway API) | https://gateway-api.sigs.k8s.io/guides/multiple-ns/ | -| ReferenceGrant | https://gateway-api.sigs.k8s.io/api-types/referencegrant/ | -| Accessing API from a pod | https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/ | -| Pod Security Standards | https://kubernetes.io/docs/concepts/security/pod-security-standards/ | -| Service account tokens | https://kubernetes.io/docs/concepts/security/service-accounts/ | -| Traefik ForwardAuth | https://doc.traefik.io/traefik/reference/routing-configuration/http/middlewares/forwardauth/ | -| Traefik Middleware locality | https://github.com/traefik/traefik/issues/11126 | diff --git a/plans/skills-host-path-injection-v3.md b/plans/skills-host-path-injection-v3.md deleted file mode 100644 index 4c54228d..00000000 --- a/plans/skills-host-path-injection-v3.md +++ /dev/null @@ -1,120 +0,0 @@ -# Skills Host-Path Injection v3 - -## Problem - -The ConfigMap-based skill injection (tar → kubectl create configmap → init container extraction → rollout restart) is fragile, complex, and failed in practice. We need a simpler approach. 
- -## Solution - -Write embedded skills directly to the host filesystem path that maps to `/data/.openclaw/skills/` inside the OpenClaw container. This is the native skills directory that OpenClaw watches with a file watcher. No ConfigMap, no init container, no restart needed. - -## Key Discovery: Volume Mount Chain - -``` -HOST $DATA_DIR - → k3d volume mount → /data on all k3d nodes - → local-path-provisioner → /data/// - → PVC mount in container → /data -``` - -- **PVC name** (from chart): `openclaw-data` -- **Namespace**: `openclaw-` (e.g. `openclaw-default`) -- **Container mount**: `/data` (persistence.mountPath) -- **State dir**: `/data/.openclaw` (OPENCLAW_STATE_DIR env) -- **Native skills dir watched by OpenClaw**: `/data/.openclaw/skills/` - -## Host Path Formula - -``` -$DATA_DIR / openclaw- / openclaw-data / .openclaw / skills / -``` - -| Mode | Concrete Path | -|------|---------------| -| **Dev** | `.workspace/data/openclaw-/openclaw-data/.openclaw/skills/` | -| **Prod** | `~/.local/share/obol/openclaw-/openclaw-data/.openclaw/skills/` | - -## Implementation Steps - -### 1. Add `skillsVolumePath()` helper - -Returns the host-side path to `/data/.openclaw/skills/` inside the PVC. - -```go -func skillsVolumePath(cfg *config.Config, id string) string { - namespace := fmt.Sprintf("%s-%s", appName, id) - return filepath.Join(cfg.DataDir, namespace, "openclaw-data", ".openclaw", "skills") -} -``` - -### 2. Add `injectSkillsToVolume()` function - -Copies staged skills from config dir directly to the host PVC path. -Called BEFORE helmfile sync so skills are present at first pod boot. - -### 3. Rewrite `SkillsSync()` for runtime use - -`obol openclaw skills sync --from ` now copies to host path instead of creating ConfigMap. - -### 4. Remove old ConfigMap machinery from `doSync()` - -- Remove `ensureNamespaceExists()` call (only existed for pre-creating ConfigMap) -- Remove `syncStagedSkills()` call -- Replace with `injectSkillsToVolume()` call - -### 5. 
Disable chart skills feature in overlay - -Change overlay from: -```yaml -skills: - enabled: true - createDefault: false -``` -To: -```yaml -skills: - enabled: false -``` - -This removes the init container, ConfigMap volume, and `skills.load.extraDirs` config entirely. OpenClaw uses its native file watcher on `/data/.openclaw/skills/`. - -### 6. Update `copyWorkspaceToPod()` to use host path - -Same pattern — write directly to `$DATA_DIR/openclaw-<id>/openclaw-data/.openclaw/workspace/` instead of kubectl cp. - -## Revised Data Flow - -``` -Embedded skills (internal/embed/skills/) - │ stageDefaultSkills() - ▼ -$CONFIG_DIR/applications/openclaw/<id>/skills/ ← staged source - │ injectSkillsToVolume() - ▼ -$DATA_DIR/openclaw-<id>/openclaw-data/.openclaw/skills/ ← host PVC path - │ k3d volume mount - ▼ -Container: /data/.openclaw/skills/ ← native watched dir - │ OpenClaw file watcher - ▼ -Skills loaded ✓ -``` - -## Revised `doSync()` Flow - -**Before**: ensureNamespace → stageSkills → syncStagedSkills(ConfigMap) → helmfile sync → copyWorkspaceToPod(kubectl cp) - -**After**: stageSkills → injectSkillsToVolume(host path) → helmfile sync → copyWorkspaceToVolume(host path) - -## Files Modified - -- `internal/openclaw/openclaw.go` — all changes -- `internal/openclaw/overlay_test.go` — update expected overlay output - -## What Gets Deleted - -- `syncStagedSkills()` function -- ConfigMap creation logic in `SkillsSync()` (rewritten for host-path) -- `ensureNamespaceExists()` call in `doSync()` (before helmfile sync) -- `skills.enabled: true` / `skills.createDefault: false` from overlay -- tar archiving, kubectl delete/create configmap, rollout restart diff --git a/plans/skills-system-redesign-v2.md b/plans/skills-system-redesign-v2.md deleted file mode 100644 index be6fc0ac..00000000 --- a/plans/skills-system-redesign-v2.md +++ /dev/null @@ -1,253 +0,0 @@ -# Skills System Redesign v2 — Final Implementation Record - -> Distilled from v1 notes + Opus analysis. All open questions resolved. 
Implementation complete. -> The original `skills-system-redesign.md` is preserved as-is for reference. - ---- - -## Guiding Principles - -1. **Stock openclaw feel** — the user should not notice they're in a k8s pod. Lean on native openclaw CLI for skill management. -2. **Don't overengineer** — no custom registries, no git sparse-checkout, no lock files for MVP. Ship the simplest thing that works. -3. **Two delivery channels**: compile-time (embedded in obol binary, staged to host, pushed as ConfigMap) and runtime (`kubectl exec` running native openclaw-cli in-pod). -4. **Smart default resolution** — 0 instances: prompt setup. 1 instance: assume it. 2+ instances: require name. - ---- - -## Architecture - -``` - ┌─────────────────────────────┐ - │ obol CLI binary │ - │ (embedded SKILL.md files) │ - └────────────┬────────────────┘ - │ - ┌────────────────────────┼────────────────────────┐ - │ │ │ - ┌────────▼────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐ - │ obol openclaw │ │ obol openclaw │ │ obol openclaw │ - │ onboard / sync │ │ skills add/remove │ │ skills list │ - │ (compile-time) │ │ (runtime) │ │ (runtime) │ - └────────┬────────┘ └─────────┬─────────┘ └─────────┬─────────┘ - │ │ │ - │ stageDefaultSkills │ kubectl exec │ kubectl exec - │ → host config dir │ -c openclaw │ -c openclaw - │ syncStagedSkills │ openclaw skills add │ openclaw skills list - │ → ConfigMap │ (native openclaw CLI) │ (native openclaw CLI) - │ │ │ - └────────────────────────┼────────────────────────┘ - │ - ┌────────▼────────┐ - │ OpenClaw Pod │ - │ ConfigMap mount │ - │ + PVC-backed │ - │ ~/.openclaw/ │ - │ skills/ │ - └─────────────────┘ -``` - -### How skills reach the pod - -| Channel | Mechanism | When | Persistence | -|---------|-----------|------|-------------| -| **Compile-time** (Obol defaults) | Embedded → staged to `$CONFIG_DIR/.../skills/` → pushed as ConfigMap via `SkillsSync()` | Every `doSync()` (onboard and sync) | ConfigMap — chart mounts it | -| **Runtime add/remove** 
| `kubectl exec -c openclaw deploy/openclaw -- node openclaw.mjs skills add <skill>` | User runs `obol openclaw skills add ...` | PVC — survives restarts | -| **Runtime list** | `kubectl exec -c openclaw deploy/openclaw -- node openclaw.mjs skills list` | User runs `obol openclaw skills list` | Read-only | - -### Why ConfigMap over kubectl cp - -The initial implementation used `kubectl cp` to copy skills directly into the pod. This required the pod to be Running, which fails on first deploy when the image pull takes >60s. The ConfigMap approach: -- Works without waiting for the pod (namespace is sufficient) -- Skills are available when the pod starts (chart's init container extracts them) -- Self-healing: `doSync()` stages defaults if missing, pushes every sync -- The host-path PV backing each PVC remains a fallback if ConfigMap hits limits - ---- - -## Part 1: Default Instance Resolution - -### Implementation: `internal/openclaw/resolve.go` - -```go -func ResolveInstance(cfg *config.Config, args []string) (id string, remaining []string, err error) -func ListInstanceIDs(cfg *config.Config) ([]string, error) -``` - -- **0 instances** → error: `no OpenClaw instances found — run 'obol agent init' to create one` -- **1 instance** → auto-select, return args unchanged -- **2+ instances** → consume `args[0]` if it matches an instance name, else error listing all - -Wired into all subcommands: `sync`, `setup`, `delete`, `token`, `dashboard`, `cli`, `skills`. - -Not needed for: `onboard` (creates new), `list` (shows all). - -### Tests: `internal/openclaw/resolve_test.go` - -9 unit tests covering all 0/1/2+ scenarios, including edge cases (no args, unknown name). 
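The 0/1/2+ resolution rules above can be sketched in a few lines. This is a simplified, dependency-free illustration: the real `ResolveInstance` takes a `*config.Config` and discovers instance IDs from disk via `ListInstanceIDs`, so the `installed` slice parameter and the multi-instance error wording here are illustrative stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveInstance mirrors the documented rules: zero instances is an error,
// one instance is auto-selected with args passed through unchanged, and with
// several instances the first argument must name one of them.
func resolveInstance(installed, args []string) (id string, remaining []string, err error) {
	switch len(installed) {
	case 0:
		return "", nil, fmt.Errorf("no OpenClaw instances found — run 'obol agent init' to create one")
	case 1:
		return installed[0], args, nil // auto-select; args unchanged
	default:
		if len(args) > 0 {
			for _, name := range installed {
				if name == args[0] {
					return name, args[1:], nil // consume the instance name
				}
			}
		}
		return "", nil, fmt.Errorf("multiple instances installed, specify one of: %s", strings.Join(installed, ", "))
	}
}

func main() {
	id, rest, err := resolveInstance([]string{"alpha", "beta"}, []string{"beta", "sync"})
	fmt.Println(id, rest, err)
}
```

The pass-through of `remaining` is what lets every subcommand treat the instance name as optional without re-parsing its own flags.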
- ---- - -## Part 2: Compile-Time Skills (Default Obol Skills) - -### What we embed - -``` -internal/embed/skills/ -├── hello/ -│ └── SKILL.md -└── ethereum/ - └── SKILL.md -``` - -### Delivery (two-stage: stage on host, push as ConfigMap) - -**Stage 1 — `stageDefaultSkills(deploymentDir)`** (called during `Onboard()` before sync, and inside `doSync()` for self-healing): - -- Writes embedded skills to `$CONFIG_DIR/applications/openclaw/<id>/skills/` -- **Skips** if `skills/` directory already exists (user customisation takes precedence) - -**Stage 2 — `syncStagedSkills(cfg, id, deploymentDir)`** (called inside `doSync()` after helmfile sync): - -- Checks `skills/` dir has subdirectories -- Calls existing `SkillsSync()` to package into ConfigMap `openclaw-<id>-skills` -- Chart's `extract-skills` init container unpacks it on pod (re)start - -**Self-healing**: `doSync()` calls `stageDefaultSkills()` before `syncStagedSkills()`. Instances created before the skills feature get defaults on their next sync. - -### Files - -| File | Status | -|------|--------| -| `internal/embed/skills/hello/SKILL.md` | Created | -| `internal/embed/skills/ethereum/SKILL.md` | Created | -| `internal/embed/embed.go` | Modified — `skillsFS`, `CopySkills()`, `GetEmbeddedSkillNames()` | -| `internal/openclaw/openclaw.go` | Modified — `stageDefaultSkills()`, `syncStagedSkills()`, wired into `Onboard()` + `doSync()` | - ---- - -## Part 3: Runtime Skill Management (`obol openclaw skills`) - -### CLI structure - -``` -obol openclaw skills [instance-name] -├── add <skill> → kubectl exec -c openclaw ... node openclaw.mjs skills add <skill> -├── remove <skill> → kubectl exec -c openclaw ... node openclaw.mjs skills remove <skill> -├── list → kubectl exec -c openclaw ... 
node openclaw.mjs skills list -└── sync --from <dir> → packages local dir as ConfigMap (existing SkillsSync mechanism) -``` - -### Implementation - -Thin wrappers in `internal/openclaw/openclaw.go`: - -```go -func SkillAdd(cfg, id, args) → cliViaKubectlExec(cfg, ns, ["skills", "add", ...args]) -func SkillRemove(cfg, id, args) → cliViaKubectlExec(cfg, ns, ["skills", "remove", ...args]) -func SkillList(cfg, id) → cliViaKubectlExec(cfg, ns, ["skills", "list"]) -``` - -`cliViaKubectlExec` uses `-c openclaw` to explicitly target the main container (pod has an `extract-skills` init container that confuses the default container selection). - -### Files - -| File | Status | -|------|--------| -| `cmd/obol/openclaw.go` | Modified — `skills` subcommand group with `add`, `remove`, `list`, `sync` | -| `internal/openclaw/openclaw.go` | Modified — `SkillAdd()`, `SkillRemove()`, `SkillList()` | - ---- - -## Part 4: CLI Structure (Final) - -``` -obol openclaw -├── onboard [--id <id>] [--force] [--no-sync] -├── sync [instance-name] -├── setup [instance-name] -├── list -├── delete [instance-name] -├── token [instance-name] -├── dashboard [instance-name] -├── cli [instance-name] [-- <args>] -└── skills [instance-name] - ├── add <skill> - ├── remove <skill> - ├── list - └── sync --from <dir> -``` - -All subcommands (except `onboard` and `list`) auto-resolve the instance when only one exists. - ---- - -## Part 5: Default Obol Skill Content - -### `hello` (SKILL.md) - -Smoke test. Says hello when invoked, confirms skills are loaded. - -### `ethereum` (SKILL.md) - -Ethereum JSON-RPC access via eRPC. Key details: -- Base URL: `http://erpc.erpc.svc.cluster.local:4000` -- Discovery: `GET /` returns config with connected networks -- RPC pattern: `POST /rpc/<network>` with standard JSON-RPC -- Read-only: no write transactions -- Common methods: `eth_blockNumber`, `eth_syncing`, `eth_getBalance`, `eth_call`, `eth_chainId`, etc. 
- ---- - -## Decisions Made (resolving v1 open questions) - -| Question | Decision | Rationale | -|---|---|---| -| ConfigMap 1MB limit | **Not a concern for MVP** — text SKILL.md files are tiny | Can switch to PVC host-path if needed | -| Skill dependencies | **No** | Skills are independent instruction files | -| Private repo support | **Deferred** — `kubectl exec openclaw skills add` handles natively | Pod fetches from wherever openclaw-cli can | -| Helm chart init container | **Already exists** — `extract-skills` init container unpacks ConfigMap | No chart changes needed | -| Skill validation | **No** — trust skill author | Broken skills just won't work | -| Community skill registry | **Not for MVP** | GitHub repos are sufficient | -| Lock file | **Not for MVP** | Skills are embedded (versioned with binary) or runtime-added | -| GitHub fetching in obol CLI | **Not for MVP** | openclaw-cli in pod does this natively | -| Skill naming | **Plain names** — `hello`, `ethereum` | No `@obol/` prefix needed | -| Sandboxed skills | **Not for MVP** | Docker-in-k8s-in-Docker is fragile | -| Host-path PV for skills | **Fallback option** | Every PVC gets a hostPath PV; can write directly if ConfigMap hits limits | -| `skill` vs `skills` | **`skills` (plural)** | Matches openclaw-cli convention (`node openclaw.mjs skills ...`) | -| kubectl cp vs ConfigMap | **ConfigMap** | No pod readiness dependency; self-healing on every sync | -| Container targeting | **`-c openclaw` explicit** | Pod has `extract-skills` init container; must target main container | - ---- - -## What We Built - -1. **`ResolveInstance()`** — smart instance selection (0/1/2+ logic) for all openclaw subcommands -2. **2 embedded SKILL.md files** — `hello`, `ethereum` -3. **`stageDefaultSkills()` + `syncStagedSkills()`** — two-stage delivery: host staging → ConfigMap push -4. **Self-healing in `doSync()`** — stages defaults for pre-existing instances on next sync -5. 
**`obol openclaw skills add/remove/list`** — thin wrappers around `kubectl exec -c openclaw ... openclaw skills ...` -6. **`-c openclaw`** in `cliViaKubectlExec()` — explicit container targeting - -### Files created -- `internal/openclaw/resolve.go` -- `internal/openclaw/resolve_test.go` -- `internal/embed/skills/hello/SKILL.md` -- `internal/embed/skills/ethereum/SKILL.md` - -### Files modified -- `internal/embed/embed.go` — skills embed + `CopySkills()` + `GetEmbeddedSkillNames()` -- `internal/openclaw/openclaw.go` — staging, syncing, skill CLI wrappers, `-c openclaw` -- `cmd/obol/openclaw.go` — `ResolveInstance` refactor, `skills` subcommand group - ---- - -## Future Work (Phase 4+) - -| Skill | Priority | Notes | -|-------|----------|-------| -| `obol-wallet` | Nice to have | Web3Signer operations | -| `obol-doctor` | Next release | Stack health diagnostics | -| `obol-tunnel` | Future | Cloudflare tunnel management | -| `obol-deploy` | Future | Deploy apps/networks into the stack | - -When the skill set grows beyond ~10 skills or community contributions start, consider extracting to `github.com/ObolNetwork/openclaw-skills`. diff --git a/plans/skills-system-redesign.md b/plans/skills-system-redesign.md deleted file mode 100644 index d8f40465..00000000 --- a/plans/skills-system-redesign.md +++ /dev/null @@ -1,895 +0,0 @@ -/sc:workflow the ./plans/skills-system-redesign is a concatenation of my notes, and your plans (annotated by me answering your questions). I want you to study both, and take my choices into your implementation. Key things to consider are the refresh of how we do `default` openclaw instances (if we have none, prompt setup, 1 assume its a given, 2+ expect a name mid command you take out and use to route correctly ) in the obol cli. For compile time skills, we will copy them from obol-cli binary to the localhost path that corresponds with the openclaw-gateway's `~/.openclaw/skills`. 
For run time skill addition using the `obol openclaw skill` commands, lets try the approach of `kubectl exec ... ` running the openclaw-cli on the openclaw-gateway container, with the k8s secret auth token loaded etc. ask me any clarifying questions. don't overengineer features if you don't have to, we want the user to feel like they're using stock openclaw. output it as a new refined plan and keep this one. (Maybe do a cleaned version of this as an interim? we need to sort out the disjointed bits and multiple-choice etc) - -_______________ [My notes] _______ -Agent skills in obol openclaw - -Ideas gathering phase: -Local folder, obol-cli command to zip to .tgz and push to config map. Openclaw chart to detect and uncompress. -github.com/ObolNetwork/skills -Openclaw chart pulls these locally in an init script -Openclaw chart has helm sub packages which just contain skill repos? -What’s the advantage? To manage dependencies helm natively? -We create a derivative openclaw dockerfile, and embed skills in the image? -Review opus’s design -Lots of configurability, needs a tl;dr. -The idea of some skills in the cli so it can handle network/github api rate limits is cool. With local ollama someday you could have an offline, skill enabled obol agent. Should the skills just be in the chart though? Need to answer it about constraints -Some skills like using the stack itself may make more sense than the openclaw chart. The skill to use the stack is broader than that application. - - -we should figure out how a helm chart can bundle a set of skills, that other apps can find at runtime. -does the web3signer app expose a config map other namespaces can read? caps us at 1mb for all skills it exports -can they have shared disk across all apps (i.e. create a PV with them on it)? not easily but maybe if all the pvcs mount as read only that would work? -serve them like a webserver and expose a standard service to find them? 
..svc.cluster.local/ -Reloading: “Changes to skills are picked up on the next agent turn when the watcher is enabled.” openclaw hot reloads files on disk -We’ll probably have to make this work for openclaw plugins almost as fast. - -Key note: -__________ -Locations and precedence -Skills are loaded from three places: -Bundled skills: shipped with the install (npm package or OpenClaw.app) -Managed/local skills: ~/.openclaw/skills -Workspace skills: /skills -If a skill name conflicts, precedence is: /skills (highest) → ~/.openclaw/skills → bundled skills (lowest) Additionally, you can configure extra skill folders (lowest precedence) via skills.load.extraDirs in ~/.openclaw/openclaw.json. - -__________ - -Actions: -We should sandbox skills by default maybe? (thats docker in k8s in docker though, so maybe asking for trouble? + routing difficulties to resources in the stack? - -Sandboxed skills + env vars -When a session is sandboxed, skill processes run inside Docker. The sandbox does not inherit the host process.env. Use one of: -agents.defaults.sandbox.docker.env (or per-agent agents.list[].sandbox.docker.env) -bake the env into your custom sandbox image -Global env and skills.entries..env/apiKey apply to host runs only. - - -~/.openclaw/openclaw.json - -{ - skills: { - allowBundled: ["gemini", "peekaboo"], - load: { - extraDirs: ["~/Projects/agent-scripts/skills", "~/Projects/oss/some-skill-pack/skills"], - watch: true, - watchDebounceMs: 250, - }, - install: { - preferBrew: true, - nodeManager: "npm", // npm | pnpm | yarn | bun (Gateway runtime still Node; bun not recommended) - }, - entries: { - "nano-banana-pro": { - enabled: true, - apiKey: "GEMINI_KEY_HERE", - env: { - GEMINI_API_KEY: "GEMINI_KEY_HERE", - }, - }, - peekaboo: { enabled: true }, - sag: { enabled: false }, - }, - }, -} - - - -Conclusion: - -We need to correctly set the openclaw config in our chart, and consider openclaw’s location precedence (above). 
If for example we put popular named skills in high inheritance places, that would put us in charge of the skill. (eth-wingman, etc) -Management commands: -Stick to openclaw standard and map straight into the gateway. -Requires a change to the obol openclaw CLI structure, I think its worth it. -When obol openclaw is called, first, we count how many instances are installed -If none are installed, prompt the user to do obol agent init -If exactly one is installed, assume that is default, pipe the rest of the commands into the openclaw cli (temporary pod, or the on-host way we have now). It needs to be able to speak to the openclaw gateway. -It needs to be coming from an IP that openclaw will accept for security reasons. -[I guess this depends on what part of the code writes the skill files. If its the CLI, then these files would appear on the host, and we’d be back to packaging them like i would like to avoid.] -1. We could exec on the openclaw container itself and do everything local to the container runtime, that should sort auth and folder writing perms? -2. Plan b, we could on the host write to: $HOME/.config/obol/applications/obol/openclaw/playful-rabbit/.openclaw/skills/ and rely on openclaw’s hot reload behaviour -If more than one instance is installed, then we have to interpret the next word of command input as a petname, use it to decide the host path to write the skill to, or the correct gateway to kubectl exec on, before giving the remainder to a correctly configured openclaw cli (if needed) - -Pre-installed skills: -We need: -[Must have] Ethereum Network (erpc) -[Nice to have] Ethereum Wallet (web3signer) -[Next Release] Obol Doctor (debugs the obol stack) (also will be an obol doctor) -[A plan for a skill to sensibly deploy stuff into the stack] -Open question: -How does LiteLLM handle the idea of telling llms about tools they can opt to use? Is it just dealt with at the openclaw layer and not a concern of the llm? 
-Same problem as always, how do we get this folder structure on the disk for the openclaw instance. -We can start with: -Embedding the skills in obol-cli (golang) and copying them to the appropriate host directory upon creation/import of a new openclaw instance -Import should be cognizant of folder conficts in case this is an imported openclaw. -Will we have an obol/ namespace in ~/.openclaw/skills? is that standard? - -Nice to haves / hardening: -We should see if we can get the sandboxed skills feature to work -Let people turn off telemetry? (for now we want plenty though to show adoption). - - - -Telemetry (install counts) -When you run clawhub sync while logged in, the CLI sends a minimal snapshot to compute install counts. You can disable this entirely: -export CLAWHUB_DISABLE_TELEMETRY=1 - - - - -https://docs.openclaw.ai/tools/skills-config#skills-config - - -_______________ [Opus 4.6] _______ - -# Skills System Redesign: Remote Skills & Default Skill Packs - -## Problem Statement - -OpenClaw skills are currently filesystem-based directories with a `SKILL.md` file. The only way to get custom skills into a deployed OpenClaw instance is: - -1. Have them locally on disk -2. Run `obol openclaw skills sync --from ` to package them into a ConfigMap -3. Run `obol openclaw sync ` to apply - -There is no way to: -- Pull skills from public GitHub repositories -- Share skills across the community -- Ship default Obol-specific skills with the stack -- Manage skill dependencies or versions -- Add/remove individual skills without replacing the entire set - -## Goals - -1. **`npx`-like skill fetching** - `obol openclaw skills add github.com/org/repo` pulls skills from GitHub -2. **Default Obol skill pack** - ships useful Obol Stack skills out of the box (k8s ops, network management, monitoring) -3. **Skill registry resolution** - shorthand like `@obol/ethereum` resolves to a known GitHub location -4. 
**Declarative skill management** - skills can be listed in config for reproducible setups -5. **Backward compatible** - existing `skills sync --from ` still works - -## Current Architecture - -### How OpenClaw Loads Skills (npm package internals) - -``` -Load precedence (highest → lowest): -1. /skills/ — per-agent workspace skills -2. ~/.openclaw/skills/ — managed/local skills -3. Bundled skills (npm package) — 40+ built-in skills -4. skills.load.extraDirs — additional paths from openclaw.json -``` - -Each skill is a directory containing `SKILL.md` with YAML frontmatter: - -```markdown ---- -name: my-skill -description: What it does -metadata: - openclaw: - requires: - bins: ["kubectl"] - env: ["KUBECONFIG"] ---- - -# Agent instructions for using this skill... -``` - -### How Obol Stack Delivers Skills Today - -``` -obol openclaw skills sync --from - │ - ├─ tar -czf skills.tgz -C . - ├─ kubectl delete configmap openclaw--skills (if exists) - ├─ kubectl create configmap openclaw--skills --from-file=skills.tgz= - └─ prints "To apply, re-sync: obol openclaw sync " -``` - -The Helm chart (remote `obol/openclaw v0.1.3`) mounts this ConfigMap and extracts it into the pod's skills directory. - -### Overlay Values (current) - -```yaml -skills: - enabled: true - createDefault: true # chart creates empty ConfigMap placeholder -``` - -### Key Constraints - -- The Helm chart is **remote** (`obol/openclaw` from `obolnetwork.github.io/helm-charts/`), not in this repo ANSWER: You can update this chart, its adjacent to you in ../helm-charts. 
-- Skills ConfigMap has a **1MB limit** (etcd object size limit) — fine for text-based SKILL.md files but limits total skill count ANSWER: lets modify folders on localhost, which are mapped straight into the pods PVs, and openclaw runs a file watcher so it will just detect and reload -- The pod needs skills at filesystem paths — whatever we do must end up as files in the container -- OpenClaw's `skills.load.extraDirs` config and `skills.entries` per-skill config are available levers. ANSWER: and knowing the right host path to write to to end up at ~/.openclaw/skills - ---- - -## Proposed Design - -### Architecture Overview - -``` - ┌─────────────────────────────┐ - │ GitHub / Git Repositories │ - │ │ - │ github.com/ObolNetwork/ │ - │ openclaw-skills/ │ - │ github.com/user/ │ - │ my-custom-skill/ │ - └──────────┬──────────────────┘ - │ - ┌───────────────────┼───────────────────┐ - │ │ │ - ┌─────▼─────┐ ┌───────▼───────┐ ┌──────▼──────┐ - │ CLI Fetch │ │ Init Container│ │ Declarative│ - │ (dev UX) │ │ (GitOps) │ │ Config │ - └─────┬─────┘ └───────┬───────┘ └──────┬──────┘ - │ │ │ - ▼ ▼ ▼ - ┌──────────────────────────────────────────────────┐ - │ Local Skills Directory │ - │ $CONFIG_DIR/applications/openclaw//skills/ │ - │ │ - │ ├── @obol/ │ - │ │ ├── kubernetes/SKILL.md │ - │ │ ├── ethereum/SKILL.md │ - │ │ └── monitoring/SKILL.md │ - │ ├── @user/ │ - │ │ └── custom-skill/SKILL.md │ - │ └── skills.lock.json │ - └──────────────────┬───────────────────────────────┘ - │ - │ obol openclaw skills sync - │ (tar → ConfigMap → helmfile sync) - ▼ - ┌──────────────────┐ - │ OpenClaw Pod │ - │ /skills/ mount │ - └──────────────────┘ -``` - -### Component 1: Skill Source Resolution (`internal/openclaw/skills/`) - -A new `skills` subpackage that handles fetching skills from various sources. 
- -#### Source Types - -```go -// SkillSource represents a fetchable skill location -type SkillSource struct { - Type string // "github", "local", "builtin" - Owner string // GitHub org/user - Repo string // Repository name - Path string // Subdirectory within repo (optional) - Ref string // Git ref: tag, branch, commit (default: HEAD) - Alias string // Local name override -} -``` - -#### Resolution Rules - -| Input | Resolves To | -|-------|-------------| -| `@obol/kubernetes` | `github.com/ObolNetwork/openclaw-skills/skills/kubernetes@latest` | -| `@obol/ethereum` | `github.com/ObolNetwork/openclaw-skills/skills/ethereum@latest` | -| `github.com/user/repo` | Clone entire repo, find all `SKILL.md` files | -| `github.com/user/repo/path/to/skill` | Clone repo, use specific subdirectory | -| `github.com/user/repo@v1.2.0` | Clone at specific tag | -| `./local/path` | Copy from local filesystem (existing behavior) | - -#### Registry File - -A simple JSON registry embedded in the obol CLI binary that maps shorthand names to GitHub sources: - -```go -//go:embed skills-registry.json -var defaultRegistry []byte -``` - -```json -{ - "version": 1, - "prefix": "@obol", - "repository": "github.com/ObolNetwork/openclaw-skills", - "skills": { - "kubernetes": { - "path": "skills/kubernetes", - "description": "Kubernetes cluster operations via kubectl", - "requires": { "bins": ["kubectl"] } - }, - "ethereum": { - "path": "skills/ethereum", - "description": "Ethereum node management and monitoring", - "requires": { "bins": ["kubectl"] } - }, - "monitoring": { - "path": "skills/monitoring", - "description": "Prometheus/Grafana monitoring operations" - }, - "network-ops": { - "path": "skills/network-ops", - "description": "Obol network install/sync/delete operations" - }, - "tunnel": { - "path": "skills/tunnel", - "description": "Cloudflare tunnel management" - } - } -} -``` - -### Component 2: CLI Commands (`cmd/obol/openclaw.go`) - -Expand the `skills` subcommand group: - -``` 
-obol openclaw skills -├── add [--ref ] # Fetch skill(s) from GitHub or local path -├── remove # Remove an installed skill -├── list [--remote] # List installed skills (or available @obol skills) -├── sync # Push local skills dir → ConfigMap → pod -├── update [|--all] # Update skill(s) to latest version -└── init [--defaults] # Initialize skills dir with default Obol pack -``` - -#### `obol openclaw skills add` — the npx-like command - -```bash -# Add from the Obol registry (shorthand) -obol openclaw skills add @obol/kubernetes -obol openclaw skills add @obol/ethereum @obol/monitoring - -# Add from any public GitHub repo -obol openclaw skills add github.com/someuser/cool-skill -obol openclaw skills add github.com/someuser/skill-pack/skills/specific-one - -# Add from GitHub with version pinning -obol openclaw skills add github.com/someuser/cool-skill@v2.0.0 - -# Add from local directory (replaces old --from behavior) -obol openclaw skills add ./my-local-skills/custom-skill - -# Add all default Obol skills -obol openclaw skills add @obol/defaults -``` - -**Flow:** - -``` -obol openclaw skills add @obol/kubernetes - │ - ├─ Resolve "@obol/kubernetes" → github.com/ObolNetwork/openclaw-skills/skills/kubernetes - ├─ Sparse checkout (or GitHub API tarball) of just that path - ├─ Validate: SKILL.md exists with valid frontmatter - ├─ Copy to: $CONFIG_DIR/applications/openclaw//skills/@obol/kubernetes/ - ├─ Update skills.lock.json with source, ref, commit SHA - ├─ Print: "✓ Added @obol/kubernetes" - └─ Print: "Run 'obol openclaw skills sync ' to deploy" -``` - -#### `obol openclaw skills init` — bootstrap with defaults - -```bash -# Initialize with the default Obol skill pack -obol openclaw skills init default --defaults - -# This is equivalent to: -obol openclaw skills add @obol/defaults -obol openclaw skills sync default -``` - -#### `obol openclaw skills list` - -```bash -$ obol openclaw skills list default -Installed skills for openclaw/default: - - @obol/kubernetes 
Kubernetes cluster operations v1.0.0 (up to date) - @obol/ethereum Ethereum node management v1.0.0 (up to date) - @obol/monitoring Prometheus/Grafana operations v1.0.0 (update: v1.1.0) - custom-skill My custom skill from local local - -Total: 4 skill(s) - -$ obol openclaw skills list --remote -Available skills from @obol registry: - - @obol/kubernetes Kubernetes cluster operations via kubectl - @obol/ethereum Ethereum node management and monitoring - @obol/monitoring Prometheus/Grafana monitoring operations - @obol/network-ops Obol network install/sync/delete operations - @obol/tunnel Cloudflare tunnel management -``` - -### Component 3: Skills Lock File - -Track installed skills and their versions for reproducibility: - -```json -{ - "version": 1, - "skills": { - "@obol/kubernetes": { - "source": "github.com/ObolNetwork/openclaw-skills", - "path": "skills/kubernetes", - "ref": "v1.0.0", - "commit": "abc123def456", - "installed": "2026-02-18T12:00:00Z" - }, - "@obol/ethereum": { - "source": "github.com/ObolNetwork/openclaw-skills", - "path": "skills/ethereum", - "ref": "v1.0.0", - "commit": "abc123def456", - "installed": "2026-02-18T12:00:00Z" - }, - "custom-skill": { - "source": "local", - "path": "/Users/dev/my-skills/custom-skill", - "installed": "2026-02-18T14:00:00Z" - } - } -} -``` - -### Component 4: GitHub Fetching Strategy - -Two approaches, use **GitHub API tarball** as primary (no git dependency): - -```go -// Primary: GitHub API tarball download (no git required) -func fetchFromGitHub(owner, repo, path, ref string) (string, error) { - // GET https://api.github.com/repos/{owner}/{repo}/tarball/{ref} - // Extract only the files under {path}/ - // Return path to extracted directory -} - -// Fallback: git sparse-checkout (for private repos or rate limiting) -func fetchViaGit(repoURL, path, ref string) (string, error) { - // git clone --depth 1 --filter=blob:none --sparse - // git sparse-checkout set - // Return path to checked out directory -} -``` - 
-**Rate limiting**: GitHub API allows 60 requests/hour unauthenticated, 5000 with a token. For the `add` command this is fine (one request per skill add). Support `GITHUB_TOKEN` env var for authenticated requests. - -### Component 5: Default Skills in Onboard Flow - -Modify `Onboard()` to optionally install default skills: - -```go -// In Onboard(), after writing overlay and helmfile: -if opts.Sync { - // Install default Obol skills if skills dir is empty - skillsDir := filepath.Join(deploymentDir, "skills") - if _, err := os.Stat(skillsDir); os.IsNotExist(err) { - fmt.Println("Installing default Obol skills...") - if err := installDefaultSkills(skillsDir); err != nil { - fmt.Printf("Warning: could not install default skills: %v\n", err) - // Non-fatal — continue with deployment - } - } - // Skills sync happens as part of doSync -} -``` - -The default skills should be fetched from `@obol/defaults` (which maps to a curated list). If network is unavailable, fall back to a minimal embedded skill set. - -### Component 6: Embedded Fallback Skills - -For air-gapped or offline scenarios, embed a minimal set of skills directly in the CLI binary: - -```go -//go:embed skills/kubernetes/SKILL.md -//go:embed skills/network-ops/SKILL.md -var embeddedSkills embed.FS -``` - -These serve as a fallback when GitHub is unreachable during `skills init --defaults`. - -### Component 7: Overlay Values Enhancement - -Update `generateOverlayValues()` to support skill configuration in the Helm values: - -```yaml -skills: - enabled: true - createDefault: true - # NEW: Configure per-skill settings via overlay - entries: - kubernetes: - enabled: true - ethereum: - enabled: true - env: - ETHEREUM_NETWORK: "mainnet" - monitoring: - enabled: true -``` - -This maps to OpenClaw's `skills.entries` configuration, giving operators control over which skills are active and their per-skill environment. 
- -### Component 8: Automatic Skills Sync on Deploy - -Modify `doSync()` to automatically package and push skills if the local skills directory exists: - -```go -func doSync(cfg *config.Config, id string) error { - deploymentDir := deploymentPath(cfg, id) - - // Auto-sync skills if local skills directory exists - skillsDir := filepath.Join(deploymentDir, "skills") - if info, err := os.Stat(skillsDir); err == nil && info.IsDir() { - entries, _ := os.ReadDir(skillsDir) - // Only sync if there are actual skill directories (not just lock file) - hasSkills := false - for _, e := range entries { - if e.IsDir() { - hasSkills = true - break - } - } - if hasSkills { - fmt.Println("Syncing skills to cluster...") - if err := SkillsSync(cfg, id, skillsDir); err != nil { - fmt.Printf("Warning: skills sync failed: %v\n", err) - } - } - } - - // ... existing helmfile sync logic -} -``` - -This removes the two-step manual process. Adding a skill and syncing the deployment automatically picks it up. - ---- - -## Proposed Obol Default Skills - -These would live in `github.com/ObolNetwork/openclaw-skills`: - -### `@obol/kubernetes` - -```markdown ---- -name: kubernetes -description: Kubernetes cluster operations for the Obol Stack -metadata: - openclaw: - requires: - bins: ["kubectl"] - env: ["KUBECONFIG"] ---- - -# Kubernetes Operations - -You have access to kubectl configured for the Obol Stack k3d cluster. 
- -## Capabilities -- List, describe, and inspect pods, services, deployments across all namespaces -- View pod logs and events -- Check resource usage and node status -- Debug failing pods (describe, logs, events) - -## Conventions -- The stack uses k3d with namespaces per deployment -- Network deployments: `ethereum-<id>`, `helios-<id>`, `aztec-<id>` -- Infrastructure: `traefik`, `erpc`, `monitoring`, `llm`, `obol-frontend` -- Use `kubectl get all -n <namespace>` for a namespace overview -``` - -### `@obol/ethereum` - -```markdown ---- -name: ethereum -description: Ethereum node management and monitoring -metadata: - openclaw: - requires: - bins: ["kubectl"] ---- - -# Ethereum Node Management - -Manage Ethereum network deployments in the Obol Stack. - -## Capabilities -- Monitor execution and beacon client sync status -- Check peer counts and network connectivity -- View client logs for debugging -- Monitor disk usage and resource consumption -- Check chain head and sync progress - -## Common Operations -- Sync status: `kubectl -n ethereum-<id> logs deploy/execution -f` -- Beacon status: `kubectl -n ethereum-<id> logs deploy/beacon -f` -- Resource usage: `kubectl -n ethereum-<id> top pods` -``` - -### `@obol/monitoring` - -```markdown ---- -name: monitoring -description: Prometheus and Grafana monitoring operations -metadata: - openclaw: - requires: - bins: ["kubectl"] ---- - -# Monitoring Operations - -Access Prometheus metrics and Grafana dashboards for the Obol Stack. - -## Capabilities -- Query Prometheus for metrics -- Check alerting rules and firing alerts -- Monitor resource usage trends -- Access Grafana dashboards -``` - -### `@obol/network-ops` - -```markdown ---- -name: network-ops -description: Obol network deployment lifecycle operations -metadata: - openclaw: - requires: - bins: ["kubectl"] ---- - -# Network Operations - -Manage the full lifecycle of blockchain network deployments.
- -## Capabilities -- List installed network deployments -- Check deployment health and sync status -- Monitor resource consumption per deployment -- Assist with network configuration decisions -``` - -### `@obol/tunnel` - -```markdown ---- -name: tunnel -description: Cloudflare tunnel management for public access -metadata: - openclaw: - requires: - bins: ["kubectl"] ---- - -# Tunnel Management - -Manage Cloudflare tunnels for exposing Obol Stack services publicly. - -## Capabilities -- Check tunnel status and connectivity -- View tunnel logs for debugging -- Monitor tunnel routes and DNS configuration -``` - ---- - -## Implementation Phases - -### Phase 1: Core Skill Fetching (MVP) - -**Files to create/modify:** - -| File | Action | Description | -|------|--------|-------------| -| `internal/openclaw/skills/resolve.go` | Create | Source resolution (GitHub URL parsing, @obol shorthand) | -| `internal/openclaw/skills/fetch.go` | Create | GitHub tarball download + extraction | -| `internal/openclaw/skills/lock.go` | Create | Lock file read/write | -| `internal/openclaw/skills/registry.go` | Create | Embedded registry loading | -| `internal/openclaw/skills/skills-registry.json` | Create | Default @obol skill registry | -| `cmd/obol/openclaw.go` | Modify | Add `skills add`, `skills remove`, `skills list`, `skills update` subcommands | -| `internal/openclaw/openclaw.go` | Modify | Update `SkillsSync` to work with new skills dir layout | - -**Deliverables:** -- `obol openclaw skills add <source>` works with GitHub URLs and @obol shorthand -- `obol openclaw skills remove <name>` removes a skill -- `obol openclaw skills list` shows installed skills -- Lock file tracks installed skills -- Existing `skills sync --from` still works - -### Phase 2: Default Skills & Auto-Install - -**Files to create/modify:** - -| File | Action | Description | -|------|--------|-------------| -| `internal/openclaw/skills/defaults.go` | Create | Default skill installation logic | -|
`internal/openclaw/skills/embedded/` | Create | Minimal embedded fallback skills | -| `internal/openclaw/openclaw.go` | Modify | Wire default skills into `Onboard()` flow | -| `internal/openclaw/openclaw.go` | Modify | Auto-sync skills in `doSync()` | - -**Deliverables:** -- `obol openclaw skills init --defaults` bootstraps default skills -- `Onboard()` installs defaults on first deploy (with network fallback to embedded) -- `doSync()` automatically packages skills if present -- No more two-step manual skills sync - -### Phase 3: Skill Pack Repository - -**External repository:** `github.com/ObolNetwork/openclaw-skills` - -| Path | Description | -|------|-------------| -| `skills/kubernetes/SKILL.md` | K8s cluster operations | -| `skills/ethereum/SKILL.md` | Ethereum node management | -| `skills/monitoring/SKILL.md` | Prometheus/Grafana ops | -| `skills/network-ops/SKILL.md` | Network lifecycle management | -| `skills/tunnel/SKILL.md` | Cloudflare tunnel management | -| `README.md` | Contributing guide for community skills | - -**Deliverables:** -- Public repo with curated Obol skills -- CI validation that all skills have valid SKILL.md frontmatter -- Tagged releases for version pinning - -### Phase 4: Helm Chart Integration (Upstream) - -Changes to the **remote** `obol/openclaw` Helm chart (separate repo): - -- Support `skills.sources` in values for declarative skill fetching via init container -- Init container that can `git clone` or download skills from configured sources -- This enables GitOps workflows where skills are declared in values, not manually pushed - -```yaml -# Future values-obol.yaml -skills: - enabled: true - sources: - - name: obol-defaults - repo: github.com/ObolNetwork/openclaw-skills - ref: v1.0.0 - path: skills/ - - name: custom - repo: github.com/myorg/my-skills - ref: main - entries: - kubernetes: - enabled: true - ethereum: - enabled: true -``` - -This phase requires coordination with the upstream openclaw Helm chart maintainers. 
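For the Phase 1 `resolve.go` piece, the `@obol` shorthand and raw GitHub URL spellings could be normalized along these lines. The `Source` struct and its field split are assumptions, not the final design:

```go
package main

import (
	"fmt"
	"strings"
)

// Source is a resolved skill location. Illustrative shape only.
type Source struct {
	Repo string // e.g. "github.com/ObolNetwork/openclaw-skills"
	Path string // subdirectory inside the repo ("" = repo root)
	Name string // skill name used for the local directory
}

// resolve normalizes the two supported spellings:
//
//	@obol/<name>                -> curated ObolNetwork/openclaw-skills repo
//	github.com/owner/repo[/sub] -> arbitrary community repo
func resolve(spec string) (Source, error) {
	if name, ok := strings.CutPrefix(spec, "@obol/"); ok && name != "" {
		return Source{
			Repo: "github.com/ObolNetwork/openclaw-skills",
			Path: "skills/" + name,
			Name: name,
		}, nil
	}
	if strings.HasPrefix(spec, "github.com/") {
		parts := strings.Split(spec, "/")
		if len(parts) < 3 || parts[1] == "" || parts[2] == "" {
			return Source{}, fmt.Errorf("expected github.com/owner/repo, got %q", spec)
		}
		return Source{
			Repo: strings.Join(parts[:3], "/"),
			Path: strings.Join(parts[3:], "/"),
			Name: parts[len(parts)-1],
		}, nil
	}
	return Source{}, fmt.Errorf("unrecognized skill source %q", spec)
}

func main() {
	s, _ := resolve("@obol/kubernetes")
	fmt.Println(s.Repo, s.Path) // github.com/ObolNetwork/openclaw-skills skills/kubernetes
	s, _ = resolve("github.com/ethbuilder/validator-skill")
	fmt.Println(s.Repo, s.Name) // github.com/ethbuilder/validator-skill validator-skill
}
```

Keeping resolution as a pure function like this also makes the `@obol` → registry mapping trivial to unit-test before any network code exists.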
- ---- - -## Directory Layout (Post-Implementation) - -``` -$CONFIG_DIR/applications/openclaw/<id>/ -├── values-obol.yaml -├── helmfile.yaml -├── values-obol.secrets.json -└── skills/ # NEW: managed skills directory - ├── skills.lock.json # Tracks sources, versions, commits - ├── @obol/ # Namespaced by source - │ ├── kubernetes/ - │ │ └── SKILL.md - │ ├── ethereum/ - │ │ └── SKILL.md - │ ├── monitoring/ - │ │ └── SKILL.md - │ ├── network-ops/ - │ │ └── SKILL.md - │ └── tunnel/ - │ └── SKILL.md - └── @someuser/ # Community skills - └── custom-skill/ - └── SKILL.md -``` - ---- - -## CLI UX Examples - -### First-time setup with defaults - -```bash -$ obol agent init -Generated deployment ID: default - ✓ Ollama detected at http://localhost:11434 - -✓ OpenClaw instance configured! - Installing default Obol skills... - ✓ Added @obol/kubernetes - ✓ Added @obol/ethereum - ✓ Added @obol/monitoring - ✓ Added @obol/network-ops - ✓ Added @obol/tunnel - -Deploying to cluster... - Syncing skills to cluster... - ✓ Skills ConfigMap updated: openclaw-default-skills - Running helmfile sync... - -✓ OpenClaw installed with 5 default skills! -``` - -### Adding a community skill - -```bash -$ obol openclaw skills add github.com/ethbuilder/validator-skill -Fetching github.com/ethbuilder/validator-skill... - ✓ Found valid SKILL.md (name: validator-ops, description: Ethereum validator management) - ✓ Added to skills/ethbuilder/validator-ops/ - -Run 'obol openclaw skills sync default' to deploy - -$ obol openclaw skills sync default -Syncing skills to cluster... - ✓ Skills ConfigMap updated: openclaw-default-skills - Running helmfile sync... -✓ Skills deployed (6 skills) -``` - -### Updating skills - -```bash -$ obol openclaw skills update --all -Checking for updates... - @obol/kubernetes v1.0.0 → v1.1.0 (updated) - @obol/ethereum v1.0.0 (up to date) - @obol/monitoring v1.0.0 → v1.0.1 (updated) - @obol/network-ops v1.0.0 (up to date) - @obol/tunnel v1.0.0 (up to date) - -Updated 2 skill(s).
Run 'obol openclaw skills sync default' to deploy. -``` - ---- - -## Open Questions - -1. **ConfigMap size limit**: With many skills, we may hit the 1MB etcd limit. Should we split into multiple ConfigMaps or use a PVC-based approach for large skill sets? - -2. **Skill dependencies**: Should skills be able to declare dependencies on other skills? (e.g., `@obol/ethereum` depends on `@obol/kubernetes`). Adds complexity but prevents broken skill chains. - -3. **Private repository support**: Should we support `GITHUB_TOKEN` for private repos from day one, or add it later? The fetch code should accept it but the UX can wait. - -4. **Helm chart init container (Phase 4)**: This requires upstream chart changes. Should we propose the chart changes early and develop in parallel, or wait until the CLI-side is proven? - -5. **Skill validation**: Should `skills add` validate that the skill's `requires.bins` are available in the target pod image, or just warn? Strict validation prevents broken skills but may be too rigid. - -6. **Community skill registry**: Beyond `@obol/` shorthand, should there be a discoverable registry (like npm) for community skills? Or is GitHub search + convention (`openclaw-skill-*` repos) sufficient? 

- ---- - -## Risk Assessment - -| Risk | Probability | Impact | Mitigation | -|------|-------------|--------|------------| -| GitHub API rate limiting | Medium | Low | Support `GITHUB_TOKEN`, cache downloads, embedded fallback | -| ConfigMap size limit | Low | Medium | Monitor archive sizes, split if needed, document limits | -| Upstream chart incompatibility | Low | High | Test against pinned chart version, coordinate with chart maintainers | -| Stale/broken community skills | Medium | Low | Validation on `skills add`, clear error messages, `skills check` command | -| Network unavailable during init | Medium | Medium | Embedded fallback skills, graceful degradation | - ---- - -## Success Criteria - -- [ ] `obol openclaw skills add @obol/kubernetes` fetches and installs the skill in <5 seconds -- [ ] `obol agent init` installs default skills automatically on first deploy -- [ ] `obol openclaw skills list` shows all installed skills with version info -- [ ] Community skills from arbitrary GitHub repos work without special configuration -- [ ] Existing `skills sync --from <dir>` workflow continues to work unchanged -- [ ] Default Obol skills provide meaningful agent capabilities for stack operations -- [ ] Skills survive pod restarts (ConfigMap-backed persistence) -- [ ] Lock file enables reproducible skill sets across environments - - diff --git a/plans/terminal-ux-improvement.md b/plans/terminal-ux-improvement.md deleted file mode 100644 index a331e7af..00000000 --- a/plans/terminal-ux-improvement.md +++ /dev/null @@ -1,135 +0,0 @@ -# Plan: Obol Stack CLI Terminal UX Improvement - -## Context - -The obol CLI (`cmd/obol`) and the bootstrap installer (`obolup.sh`) had inconsistent terminal output styles. obolup.sh had a clean visual language (colored `==>`, `✓`, `!`, `✗` prefixes, suppressed subprocess output), while the Go CLI used raw `fmt.Println` with no colors, no spinners, and direct subprocess passthrough that flooded the terminal with helmfile/k3d/kubectl output.
Invalid commands produced poor error messages with no suggestions. - -**Goal**: Unify the visual language across both tools, capture subprocess output behind spinners, and add `--verbose`/`--quiet` flags for different user needs. - -**Decision**: User chose "Capture + spinner" for subprocess handling and Charmbracelet lipgloss as the styling library. - -## What Was Built - -### New Package: `internal/ui/` (7 files) - -| File | Exports | Purpose | -|------|---------|---------| -| `ui.go` | `UI` struct, `New(verbose)`, `NewWithOptions(verbose, quiet)` | Core type with TTY detection, verbose/quiet flags | -| `output.go` | `Info`, `Success`, `Warn`, `Error`, `Print`, `Printf`, `Detail`, `Dim`, `Bold`, `Blank` | Colored message functions matching obolup.sh's `log_*` style. Quiet mode suppresses all except Error/Warn. | -| `exec.go` | `Exec(ExecConfig)`, `ExecOutput(ExecConfig)` | Subprocess capture: spinner by default, streams with `--verbose`, dumps captured output on error | -| `spinner.go` | `RunWithSpinner(msg, fn)` | Braille spinner (`⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏`) — minimal goroutine impl, no bubbletea | -| `prompt.go` | `Confirm`, `Select`, `Input`, `SecretInput` | Thin wrappers around `bufio.Reader` with lipgloss formatting | -| `errors.go` | `FormatError`, `FormatActionableError` | Structured error display with hints and next-step commands | -| `suggest.go` | `SuggestCommand`, `findSimilarCommands` | Levenshtein distance for "did you mean?" on unknown commands | - -### Output Style (unified across both tools) - -``` -==> Starting cluster... (blue, top-level header — no indent) - ✓ Cluster created (green, subordinate result — 2-space indent) - ! DNS config skipped (yellow, warning — 2-space indent) -✗ Helmfile sync failed (red, error — no indent) -``` - -### Subprocess Capture Pattern - -- **Default** (TTY, not verbose): Spinner + buffer. Success → ` ✓ msg (Xs)`. Failure → `✗ msg` + dump captured output. 
-- **`--verbose`**: Stream subprocess output live, each line prefixed with dim ` │ `. -- **Non-TTY** (pipe/CI): Plain text, no spinner, live stream. -- **Exception**: Passthrough commands (`obol kubectl`, `obol helm`, `obol k9s`, `obol openclaw cli`) keep direct stdin/stdout piping. - -### Global Flags - -| Flag | Env Var | Effect | -|------|---------|--------| -| `--verbose` | `OBOL_VERBOSE=1` | Stream subprocess output live with `│` prefix | -| `--quiet` / `-q` | `OBOL_QUIET=1` | Suppress all output except errors and warnings | - -### CLI Improvements - -- **Colored errors**: `log.Fatal(err)` replaced with `✗ error message` (red) -- **"Did you mean?"**: Levenshtein-based command suggestions on typos (`obol netwerk` → "Did you mean? obol network") -- **Interactive prompts**: `obol model setup` uses styled select menu + hidden API key input via `ui.SecretInput` - -## Phased Rollout (as executed) - -### Phase 1: Foundation -Created `internal/ui/` package (7 files), added lipgloss dependency, wired `--verbose` flag, `Before` hook, `CommandNotFound` handler, replaced `log.Fatal` with colored error output. - -**Files created**: `internal/ui/*.go` -**Files modified**: `go.mod`, `cmd/obol/main.go` - -### Phase 2: Stack Lifecycle (highest impact) -Migrated `stack init/up/down/purge` — the noisiest commands. Added `*ui.UI` to `Backend` interface. Converted ~8 subprocess passthrough sites to `u.Exec()`. `waitForAPIServer` and polling loops wrapped in spinners. - -**Files modified**: `internal/stack/stack.go`, `internal/stack/backend.go`, `internal/stack/backend_k3d.go`, `internal/stack/backend_k3s.go`, `internal/stack/backend_test.go`, `internal/stack/stack_test.go`, `cmd/obol/bootstrap.go`, `cmd/obol/main.go` - -### Phase 3: Network + OpenClaw + App + Agent -Migrated network install/sync/delete, openclaw onboard/sync/setup/delete/skills, app install/sync/delete, and agent init. Cascaded `*ui.UI` through all call chains. Converted confirmation prompts to `u.Confirm()`. 
- -**Files modified**: `internal/network/network.go`, `internal/openclaw/openclaw.go`, `internal/openclaw/skills_injection_test.go`, `internal/app/app.go`, `internal/agent/agent.go`, `cmd/obol/network.go`, `cmd/obol/openclaw.go`, `cmd/obol/main.go` - -### Phase 4: Update, Tunnel, Model -Migrated remaining internal packages. `update.ApplyUpgrades` helmfile sync captured. All tunnel operations use `u.Exec()` (except interactive `cloudflared login` and `logs -f`). `model.ConfigureLLMSpy` status messages styled. - -**Files modified**: `internal/update/update.go`, `internal/tunnel/tunnel.go`, `internal/tunnel/login.go`, `internal/tunnel/provision.go`, `internal/model/model.go`, `cmd/obol/update.go`, `cmd/obol/model.go`, `cmd/obol/main.go` - -### Phase 5: Polish -Added `--quiet` / `-q` global flag with `OBOL_QUIET` env var. Quiet mode suppresses all output except errors/warnings. Migrated `obol model setup` interactive prompt to use `ui.Select()` + `ui.SecretInput()`. Fixed `cmd/obol/update.go` to use `getUI(c)` instead of `ui.New(false)`. - -**Files modified**: `internal/ui/ui.go`, `internal/ui/output.go`, `cmd/obol/main.go`, `cmd/obol/update.go`, `cmd/obol/model.go` - -### Phase 6: obolup.sh Alignment -Aligned the bash installer's output to match the Go CLI's visual hierarchy: -- `log_success`/`log_warn` gained 2-space indent (subordinate to `log_info`) -- Banner replaced from Unicode box (`╔═══╗`) to ASCII art logo (matches `obol --help`) -- Added `log_dim()` function and `DIM`/`BOLD` ANSI codes -- Instruction blocks indented consistently (2-space for text, 4-space for commands) - -**Files modified**: `obolup.sh` - -## Dependencies Added - -``` -github.com/charmbracelet/lipgloss — styles, colors, NO_COLOR support, TTY degradation -``` - -Transitive: `muesli/termenv`, `lucasb-eyer/go-colorful`, `mattn/go-runewidth`, `rivo/uniseg`, `xo/terminfo`. `mattn/go-isatty` was already an indirect dep. 
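The "minimal goroutine" spinner described in the `internal/ui` package table could be approximated as follows. The frame interval, channel shape, and output formatting are assumptions; only the braille frame set and the obolup.sh-style `✓` marker come from the plan itself:

```go
package main

import (
	"fmt"
	"time"
)

var frames = []rune("⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏")

// frame returns the braille glyph for animation tick i, wrapping around.
func frame(i int) rune { return frames[i%len(frames)] }

// runWithSpinner animates on one line while fn runs, then prints a
// success or failure marker. Plain goroutine + ticker, no TUI framework.
func runWithSpinner(msg string, fn func() error) error {
	result := make(chan error, 1)
	go func() { result <- fn() }()
	t := time.NewTicker(80 * time.Millisecond)
	defer t.Stop()
	for i := 0; ; i++ {
		select {
		case err := <-result:
			if err != nil {
				fmt.Printf("\r✗ %s\n", msg)
				return err
			}
			fmt.Printf("\r  ✓ %s\n", msg)
			return nil
		case <-t.C:
			fmt.Printf("\r%c %s", frame(i), msg) // redraw in place
		}
	}
}

func main() {
	_ = runWithSpinner("Syncing skills to cluster...", func() error {
		time.Sleep(250 * time.Millisecond) // stand-in for a real subprocess
		return nil
	})
}
```

A real implementation would additionally buffer the subprocess output for dumping on failure and skip the animation entirely on non-TTY streams, as the capture pattern above specifies.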
- -## Files Inventory - -**New files (7)**: -- `internal/ui/ui.go` -- `internal/ui/output.go` -- `internal/ui/exec.go` -- `internal/ui/spinner.go` -- `internal/ui/prompt.go` -- `internal/ui/errors.go` -- `internal/ui/suggest.go` - -**Modified Go files (~25)**: -- `go.mod`, `go.sum` -- `cmd/obol/main.go`, `cmd/obol/bootstrap.go`, `cmd/obol/network.go`, `cmd/obol/openclaw.go`, `cmd/obol/model.go`, `cmd/obol/update.go` -- `internal/stack/stack.go`, `internal/stack/backend.go`, `internal/stack/backend_k3d.go`, `internal/stack/backend_k3s.go` -- `internal/network/network.go` -- `internal/openclaw/openclaw.go` -- `internal/app/app.go` -- `internal/agent/agent.go` -- `internal/update/update.go` -- `internal/tunnel/tunnel.go`, `internal/tunnel/login.go`, `internal/tunnel/provision.go` -- `internal/model/model.go` -- `internal/stack/backend_test.go`, `internal/stack/stack_test.go`, `internal/openclaw/skills_injection_test.go` - -**Modified shell (1)**: -- `obolup.sh` - -## Verification - -1. `go build ./...` — compiles clean -2. `go vet ./...` — no issues -3. `go test ./...` — all 7 test packages pass -4. `bash -n obolup.sh` — syntax valid -5. `obol netwerk` — shows "Did you mean? obol network" -6. `obol --quiet network list` — suppresses output -7. `obol network list` — shows colored output with bold headers -8. 
`obol app install` — shows colored `✗` error with examples From 77216d04468990f174dda0025ef85f3627fc901e Mon Sep 17 00:00:00 2001 From: bussyjd Date: Sun, 29 Mar 2026 15:54:01 +0200 Subject: [PATCH 5/5] docs: remove legacy spec bundle artifacts --- .../backend-service-spec-bundler/SKILL.md | 138 -- .../backend-service-spec-bundler/reference.md | 218 --- .../templates/ADR.md | 18 - .../templates/ARCHITECTURE.md | 200 --- .../templates/BEHAVIORS_AND_EXPECTATIONS.md | 117 -- .../templates/CONTRIBUTING.md | 33 - .../templates/SPEC.md | 220 --- .../templates/feature.feature | 35 - .claude/skills/backend-service-spec-bundler | 1 - .../escrow_contract_cross_reference.md | 283 ---- .../diagrams/option-a-settlement-flow.md | 290 ---- docs/issues/features/FEATURE_REVIEW.md | 547 ------- .../commit_reveal_work_verification.feature | 63 - .../end_to_end_autoresearch_round.feature | 59 - .../erc8004_identity_lifecycle.feature | 105 -- .../features/escrow_round_lifecycle.feature | 87 - docs/issues/features/leaderboard_api.feature | 66 - ...ier_worker_discovery_with_fallback.feature | 47 - ...culation_with_anti_monopoly_parity.feature | 92 -- ...ard_pool_distribution_across_roles.feature | 59 - .../features/round_state_continuity.feature | 68 - docs/issues/issue-autoresearch-helm-chart.md | 1315 --------------- .../issue-reth-erc8004-indexer-helm-chart.md | 277 ---- docs/specs/ARCHITECTURE.md | 966 ----------- docs/specs/BEHAVIORS_AND_EXPECTATIONS.md | 667 -------- docs/specs/CONTRIBUTING.md | 199 --- docs/specs/SPEC.md | 1452 ----------------- docs/specs/adr/0001-local-first-k3d.md | 62 - docs/specs/adr/0002-litellm-gateway.md | 62 - docs/specs/adr/0003-x402-payment-gating.md | 65 - .../adr/0004-pre-signed-erc3009-buyer.md | 61 - docs/specs/adr/0005-traefik-gateway-api.md | 62 - docs/specs/adr/0006-erc8004-identity.md | 67 - docs/specs/features/buy_payments.feature | 152 -- docs/specs/features/erc8004_identity.feature | 149 -- docs/specs/features/llm_routing.feature | 147 -- 
docs/specs/features/network_rpc.feature | 149 -- docs/specs/features/sell_monetization.feature | 203 --- docs/specs/features/stack_lifecycle.feature | 166 -- docs/specs/features/tunnel_exposure.feature | 190 --- 40 files changed, 9157 deletions(-) delete mode 100644 .agents/skills/backend-service-spec-bundler/SKILL.md delete mode 100644 .agents/skills/backend-service-spec-bundler/reference.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/ADR.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/SPEC.md delete mode 100644 .agents/skills/backend-service-spec-bundler/templates/feature.feature delete mode 120000 .claude/skills/backend-service-spec-bundler delete mode 100644 docs/issues/analysis/escrow_contract_cross_reference.md delete mode 100644 docs/issues/diagrams/option-a-settlement-flow.md delete mode 100644 docs/issues/features/FEATURE_REVIEW.md delete mode 100644 docs/issues/features/commit_reveal_work_verification.feature delete mode 100644 docs/issues/features/end_to_end_autoresearch_round.feature delete mode 100644 docs/issues/features/erc8004_identity_lifecycle.feature delete mode 100644 docs/issues/features/escrow_round_lifecycle.feature delete mode 100644 docs/issues/features/leaderboard_api.feature delete mode 100644 docs/issues/features/multi_tier_worker_discovery_with_fallback.feature delete mode 100644 docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature delete mode 100644 docs/issues/features/reward_pool_distribution_across_roles.feature delete mode 100644 docs/issues/features/round_state_continuity.feature delete mode 100644 docs/issues/issue-autoresearch-helm-chart.md delete mode 100644 
docs/issues/issue-reth-erc8004-indexer-helm-chart.md delete mode 100644 docs/specs/ARCHITECTURE.md delete mode 100644 docs/specs/BEHAVIORS_AND_EXPECTATIONS.md delete mode 100644 docs/specs/CONTRIBUTING.md delete mode 100644 docs/specs/SPEC.md delete mode 100644 docs/specs/adr/0001-local-first-k3d.md delete mode 100644 docs/specs/adr/0002-litellm-gateway.md delete mode 100644 docs/specs/adr/0003-x402-payment-gating.md delete mode 100644 docs/specs/adr/0004-pre-signed-erc3009-buyer.md delete mode 100644 docs/specs/adr/0005-traefik-gateway-api.md delete mode 100644 docs/specs/adr/0006-erc8004-identity.md delete mode 100644 docs/specs/features/buy_payments.feature delete mode 100644 docs/specs/features/erc8004_identity.feature delete mode 100644 docs/specs/features/llm_routing.feature delete mode 100644 docs/specs/features/network_rpc.feature delete mode 100644 docs/specs/features/sell_monetization.feature delete mode 100644 docs/specs/features/stack_lifecycle.feature delete mode 100644 docs/specs/features/tunnel_exposure.feature diff --git a/.agents/skills/backend-service-spec-bundler/SKILL.md b/.agents/skills/backend-service-spec-bundler/SKILL.md deleted file mode 100644 index 99bb8c40..00000000 --- a/.agents/skills/backend-service-spec-bundler/SKILL.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -name: backend-service-spec-bundler -description: > - Generate a complete specification bundle for a backend service from an idea. Produces SPEC.md, - ARCHITECTURE.md, BEHAVIORS_AND_EXPECTATIONS.md, BDD feature files, ADRs, and CONTRIBUTING.md - through an interactive discovery process. Spec-only — no implementation code. -argument-hint: "[idea description]" -allowed-tools: Read, Write, Edit, Glob, Grep, Bash(ls *), Bash(mkdir *) ---- - -# Spec Bundle Generator - -You are a specification architect. Your job is to take a user's idea and produce a -**complete, implementation-ready specification bundle** through structured discovery. - -You produce specs. 
You do NOT write implementation code. Ever. - -**Audience**: This skill is for seasoned developers. Do not dumb things down, do not -default to the "popular" choice, and do not assume comfort zones. The user can handle -type-safe languages, binary protocols, manual memory considerations, and lean infrastructure. -Your job is to find the *right* choice, not the *familiar* one. - -The user's idea: **$ARGUMENTS** - ---- - -## Phase 1: Idea Intake & Discovery - -Before writing anything, you must deeply understand the idea. Run an interactive -discovery session with the user. Ask questions in **batches of 3-5** (not one at a time, -not 20 at once). Cover these areas across 2-4 rounds: - -### Round 1 — Core Understanding -- What problem does this solve? Who is the primary user/actor? -- What are the 3-5 core capabilities (the "must haves")? -- What does this system explicitly NOT do? (Anti-scope) -- Is there a preferred tech stack, language, or platform? (Do NOT suggest defaults — ask openly. If the user doesn't have a preference, explore options based on their performance, safety, and deployment needs. Consider the full spectrum: Rust, Go, Java, Kotlin, C++, Zig, etc. — not just Python/JS/TypeScript.) - -### Round 2 — Boundaries & Constraints -- What external systems does it integrate with? (APIs, databases, services) -- What are the hard constraints? (Security, compliance, performance, infrastructure) -- How will users interact with it? (CLI, API, web UI, mobile, SDK) -- What is the deployment model? (Cloud, self-hosted, embedded, serverless) -- **How complex does the architecture really need to be?** Start from the simplest viable option and justify upward. Consider: plain filesystem, SQLite, embedded stores, in-memory state, single-process — before reaching for Redis, Postgres, Kafka, or distributed anything. Ask: "Could this run as a single binary with an embedded database, or is there a concrete reason it can't?" 
-- **Wire format**: If the system has an API boundary or inter-service communication, explore the full range: SBE, FlatBuffers, Protocol Buffers, Cap'n Proto, MessagePack, CBOR — not just JSON. JSON is fine for config and human-readable output, but for wire protocols, ask about throughput, payload size, and schema evolution needs before defaulting to it. - -### Round 3 — Behaviors & Edge Cases -- What happens on the "happy path" for the top 3 use cases? -- What are the failure modes? (Network down, bad input, rate limits, partial failures) -- Are there any phased delivery plans? (MVP first, then iterate) -- Are there non-functional requirements? (Latency targets, throughput, uptime) - -### Round 4 (if needed) — Clarification -- Resolve any ambiguities from prior rounds -- Confirm assumptions before proceeding - -**Rules for discovery:** -- Do NOT proceed to Phase 2 until the user confirms you have enough to start. -- Summarize your understanding back to the user before moving on. -- If the user provides a rich initial description, adapt — skip questions they've already answered. -- It's fine to propose reasonable defaults and ask "does this sound right?" rather than asking open-ended questions for everything. -- **Simplicity bias**: Always challenge complexity. If the user says "I need Redis for caching", ask what the cache hit rate and dataset size are — maybe an in-process LRU cache or SQLite is enough. If they say "microservices", ask what the team size and deployment cadence are — maybe a modular monolith is the right call. The goal is the simplest architecture that meets the actual requirements, not the one that looks impressive on a diagram. -- **No tech stack defaults**: Never pre-fill a tech choice. If the user hasn't stated a language, do not suggest Python or JavaScript by default. Ask what matters to them (type safety, performance, ecosystem, team expertise) and let the answer drive the recommendation. 
Treat languages like Rust, Go, Java, Kotlin, C#, and C++ as equally valid starting points. -- **No serialization defaults**: Do not assume JSON for wire formats. Ask about payload characteristics (size, frequency, schema stability, latency sensitivity) and recommend accordingly. Binary formats (SBE, FlatBuffers, Protobuf) are first-class options, not exotic choices. - ---- - -## Phase 2: Spec Bundle Generation - -Once discovery is complete, generate the following files. Read the templates in -`${CLAUDE_SKILL_DIR}/templates/` for the exact structure of each document. - -### Output Structure - -``` -/ - SPEC.md # Exhaustive technical specification - ARCHITECTURE.md # C4 diagrams and structural overview - BEHAVIORS_AND_EXPECTATIONS.md # Behavioral contract (trigger/expected/rationale) - CONTRIBUTING.md # Developer rules (non-negotiable) - features/ # BDD Gherkin feature files - .feature - .feature - ... - docs/adr/ # Architecture Decision Records - 0001-.md - ... -``` - -### Generation Order - -Generate in this order (each document builds on the previous): - -1. **SPEC.md** — The authoritative technical blueprint. Everything else derives from this. -2. **ARCHITECTURE.md** — Visual/structural companion to the spec. C4 diagrams in Mermaid. -3. **BEHAVIORS_AND_EXPECTATIONS.md** — Behavioral contract. Every behavior maps to a testable scenario. -4. **BDD Feature Files** — One `.feature` file per major feature area. Every scenario traces back to B&E + SPEC sections. -5. **ADRs** — One per significant architectural decision made during discovery. -6. **CONTRIBUTING.md** — Developer rules derived from constraints and architectural decisions. 
- -### Cross-Reference System - -Maintain bidirectional cross-references between all documents: -- Each feature file header must reference: `# References: SPEC Section X, B&E Section Y` -- Each behavior in B&E must note which SPEC section defines the underlying system -- ARCHITECTURE.md must link to SPEC.md for full details -- ADRs must reference which SPEC sections they impact - -### Quality Requirements - -- **No vagueness**: Every section must be specific enough for an engineer (or an AI agent) to implement from without asking questions. -- **No implementation code**: Specs describe WHAT and WHY, not HOW at the code level. Pseudocode for algorithms is acceptable. Actual implementation code is not. -- **Testable behaviors**: Every behavior in B&E must be expressible as a Gherkin scenario. -- **Mermaid diagrams**: All architecture diagrams use Mermaid syntax (C4, sequence, flowchart). -- **Phased delivery**: If the user mentioned phases, tag features and scenarios with `@phase1`, `@phase2`, etc. -- **Terminology table**: SPEC.md must include a glossary of domain-specific terms. -- **Constraints table**: SPEC.md must include a system constraints table. - ---- - -## Phase 3: Review & Iteration - -After generating all files: - -1. Present a summary of what was generated (file list with brief descriptions). -2. Ask if the user wants to review any specific document. -3. Iterate on feedback — update specs, not code. -4. Confirm the bundle is complete before finishing. - ---- - -## Important Reminders - -- You are a spec writer, not an implementer. If the user asks you to write code, remind them this skill produces specifications only. -- Adapt the templates to the user's domain — don't blindly copy crypto/blockchain terminology from the templates into an unrelated project. -- The templates show STRUCTURE, not content to copy. The content must come from the discovery session. 
-- If a section from the template doesn't apply (e.g., no multi-chain support for a web app), omit it. Don't include empty sections. -- If the user's project needs sections not in the templates, add them. The templates are a starting point, not a ceiling. diff --git a/.agents/skills/backend-service-spec-bundler/reference.md b/.agents/skills/backend-service-spec-bundler/reference.md deleted file mode 100644 index 0c013a3d..00000000 --- a/.agents/skills/backend-service-spec-bundler/reference.md +++ /dev/null @@ -1,218 +0,0 @@ -# Spec Bundle Methodology Reference - -This document describes the specification-first, behavior-driven methodology used by the -specbundle skill. Claude should internalize these principles when generating spec bundles. - ---- - -## Core Philosophy - -**Specs are the product, not the code.** A well-written spec bundle should allow any -competent engineer — or an AI coding agent — to implement the system without asking -clarifying questions. If the implementer needs to guess, the spec failed. - -**Simplicity is the default.** Every piece of infrastructure in the spec must justify its -existence. The question is never "why not use Postgres?" — it's "why not SQLite?" or "why -not the filesystem?" Start from the leanest possible architecture (single process, embedded -storage, in-memory state) and only add complexity when the requirements demand it. A system -that runs as a single binary with zero external dependencies is not "toy" — it's the ideal -starting point. Complexity is added, never assumed. - -**No comfort-zone defaults.** This methodology targets experienced developers who value -type safety, performance, and correctness. Do not default to JavaScript, Python, or JSON -because they're popular. Treat the full spectrum of languages (Rust, Go, Java, Kotlin, C++, -Zig, C#) and serialization formats (SBE, FlatBuffers, Protobuf, Cap'n Proto, MessagePack, -CBOR) as first-class options. 
The right choice comes from the project's constraints, not -from what's most common on Stack Overflow. - -## The Three-Tier Model - -Every component in the system gets three documents that form a layered specification: - -### Tier 1: Technical Specification (SPEC.md) - -The authoritative blueprint. Contains: -- System scope and anti-scope -- Terminology glossary (every domain term defined) -- System constraints table (hard limits that shape all decisions) -- Module decomposition with dependencies -- Complete API/protocol definitions -- Data model and storage architecture -- Security model -- Error handling taxonomy -- Performance targets -- Phased rollout plan -- Testing strategy - -**Key quality test**: Can someone implement from this document alone? If no, it's incomplete. - -### Tier 2: Architecture Document (ARCHITECTURE.md) - -The visual/structural companion. Contains: -- Design philosophy (3-5 guiding principles) -- C4 diagrams (Context, Container, Component) in Mermaid -- Module decomposition table with SPEC cross-references -- Sequence diagrams for major data flows -- Storage architecture overview -- Deployment model diagram -- Network topology -- Security architecture (trust boundaries, auth flows) - -**Key quality test**: Can someone understand the system structure in 10 minutes from diagrams alone? - -### Tier 3: Behavioral Contract (BEHAVIORS_AND_EXPECTATIONS.md) - -The testable behavioral specification. Contains: -- Desired behaviors: Trigger → Expected → Rationale -- Undesired behaviors: Trigger → Expected → Risk -- Edge cases: Scenario → Expected Handling → Rationale -- Performance expectations with targets and degradation handling -- Guardrail definitions (non-negotiable constraints) - -**Key quality test**: Can every entry be expressed as a Gherkin scenario? - -## BDD Feature Files - -One `.feature` file per major feature area. 
Each file: -- Uses `@bdd` tag at the top -- Includes a user story (As a / I want / So that) -- Has a header comment referencing SPEC and B&E sections -- Uses a Background block for common preconditions -- Tags scenarios with phase (`@phase1`, `@phase2`) and speed (`@fast`) -- Each scenario traces back to a specific B&E entry - -**Naming convention**: `{feature-area}.feature` (e.g., `trading.feature`, `authentication.feature`) - -## Architecture Decision Records (ADRs) - -One ADR per significant architectural decision. Each ADR: -- Is numbered sequentially: `0001-`, `0002-`, etc. -- Has three sections: Context, Decision, Consequences -- Is immutable once accepted (supersede, don't edit) -- Lives in `docs/adr/` - -**When to write an ADR**: Technology choices, protocol decisions, deployment model, database -selection, major pattern decisions, constraint trade-offs. - -## Cross-Reference System - -All documents maintain bidirectional references: - -``` -SPEC.md Section 4 ←→ ARCHITECTURE.md Section 3 (Module Decomposition) - ←→ B&E Section 2.1 (Desired Behaviors for this subsystem) - ←→ features/pipeline.feature (BDD scenarios) - ←→ docs/adr/0003-*.md (Decision that shaped this subsystem) -``` - -## The Discovery Process - -The spec writer's job during discovery is to extract: - -1. **The problem** — What pain exists today? -2. **The actors** — Who uses this and what are their goals? -3. **The scope** — What's in and what's explicitly out? -4. **The constraints** — What's non-negotiable? (Security, compliance, performance, infra) -5. **The interfaces** — How do users and external systems interact with it? -6. **The data** — What's stored, where, and for how long? -7. **The happy paths** — Walk through the top 3-5 use cases end to end -8. **The failure modes** — What breaks and how does the system recover? -9. **The phases** — What's MVP vs. future? -10. **The decisions** — What choices were already made and why? - -Good discovery questions are specific, not generic. 
"What's your tech stack?" is fine. -"Tell me about your requirements" is too vague. - -### Challenging Complexity During Discovery - -The spec writer must actively push back on unnecessary complexity. Common patterns to challenge: - -| User Says | Challenge With | -|-----------|---------------| -| "We need Redis for caching" | What's the dataset size? Could an in-process LRU or SQLite handle it? | -| "Postgres for the database" | What's the data volume and query complexity? Would SQLite or embedded RocksDB suffice? | -| "Kafka for event streaming" | What's the throughput? Could an in-process queue or simple file-based log work? | -| "Microservices architecture" | What's the team size? How often do components deploy independently? Would a modular monolith be simpler? | -| "JSON REST API" | What are the payload sizes and call frequency? Would a binary format (SBE, FlatBuffers, Protobuf) be more appropriate? | -| "Docker + Kubernetes" | Could this ship as a single binary / fat JAR? What's the actual scaling requirement? | -| "Python/Node for the backend" | What are the latency and throughput requirements? Is GC pressure a concern? Would a compiled language be a better fit? | - -The goal is not to reject these technologies — they're all valid in the right context. The goal -is to ensure they earn their place in the spec through concrete requirements, not habit. - -## Adapting to Different Domains - -The templates are derived from a crypto/DeFi project but the methodology is domain-agnostic. 
-When applying to other domains: - -- Replace domain-specific sections with relevant ones (e.g., "Multi-Chain Support" becomes - "Multi-Region Support" for a distributed web app) -- Adjust the constraint table for the domain (e.g., HIPAA for healthcare, PCI for payments) -- Scale the number of feature files to the project's complexity -- Adjust phasing to the project's delivery model -- Add domain-specific guardrails (e.g., "never expose PII" for a user-facing app) - -## The Complexity Ladder - -When specifying infrastructure, start at the bottom and move up only when a concrete -requirement forces you to: - -### Storage -``` -Filesystem (flat files, append-only logs) - ↓ need indexed queries? -SQLite / DuckDB (embedded, zero-config, single-file) - ↓ need concurrent write-heavy workloads from multiple processes? -Postgres / MySQL (server-based RDBMS) - ↓ need horizontal write scaling or document flexibility? -Distributed stores (CockroachDB, Cassandra, MongoDB) -``` - -### Caching -``` -In-process cache (LRU map, Caffeine, moka) - ↓ need shared cache across multiple processes? -Redis / Memcached - ↓ need persistence + cache semantics? -Redis with AOF / embedded KeyDB -``` - -### Message Passing -``` -In-process channels / queues (channels, ring buffers, Disruptor) - ↓ need cross-process or cross-machine messaging with low latency? -ZeroMQ / Aeron (brokerless, low-latency — work over IPC, TCP, and UDP/multicast) - ↓ need built-in durability, replay, or managed routing? -NATS / RabbitMQ - ↓ need massive throughput + log semantics + consumer groups? -Kafka / Redpanda -``` - -### Serialization -``` -Binary schema-driven (SBE, FlatBuffers, Cap'n Proto) — zero-copy, type-safe, fast - ↓ need schema evolution with broad ecosystem support? -Protobuf / MessagePack / CBOR — compact, good tooling - ↓ need human readability for debugging/config? 
-JSON / YAML / TOML — readable, verbose, no type safety at the wire level -``` - -### Deployment -``` -Single binary / fat JAR (zero dependencies) - ↓ need process isolation or heterogeneous runtimes? -Containers (Docker) - ↓ need orchestration across machines? -Kubernetes / Nomad -``` - -Each step up the ladder adds operational cost, failure modes, and cognitive load. The spec -must justify every step taken. - -## Multiple Subsystems - -Large projects may need separate spec bundles for distinct subsystems. In this case: -- Each subsystem gets its own SPEC, ARCHITECTURE, and B&E file (prefixed with the subsystem name) -- Feature files are organized per subsystem -- A top-level README ties them together -- Cross-references work across subsystem documents diff --git a/.agents/skills/backend-service-spec-bundler/templates/ADR.md b/.agents/skills/backend-service-spec-bundler/templates/ADR.md deleted file mode 100644 index e3b2e584..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/ADR.md +++ /dev/null @@ -1,18 +0,0 @@ -# ADR-{NNNN}: {Decision Title} - -**Date**: {date} -**Status**: Accepted - -## Context - - - -## Decision - - - -## Consequences - -- **Positive**: {benefits of this decision} -- **Negative**: {costs, trade-offs, or risks accepted} -- **Neutral**: {side effects that are neither good nor bad} diff --git a/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md b/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md deleted file mode 100644 index aa9403a8..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/ARCHITECTURE.md +++ /dev/null @@ -1,200 +0,0 @@ -# {Project Name} Architecture - -**Version**: 1.0.0-draft -**Status**: Draft -**Last Updated**: {date} - -This document provides a visual and structural overview of the {project name} system. For the full technical specification, see [SPEC.md](SPEC.md). - ---- - -## Table of Contents - -1. [System Overview](#1-system-overview) -2. 
[Component Diagrams](#2-component-diagrams) -3. [Module Decomposition](#3-module-decomposition) -4. [Data Flow Diagrams](#4-data-flow-diagrams) -5. [Storage Architecture](#5-storage-architecture) -6. [Deployment Model](#6-deployment-model) -7. [Network Topology](#7-network-topology) -8. [Security Architecture](#8-security-architecture) - ---- - -## 1. System Overview - -### Design Philosophy - - - -{Project name} is built around these principles: - -1. **{Principle 1}**: {explanation} -2. **{Principle 2}**: {explanation} -3. **{Principle 3}**: {explanation} - -### System Constraints - -| Constraint | Impact on Architecture | -|-----------|----------------------| -| {constraint} | {how it shapes the design} | - ---- - -## 2. Component Diagrams - -### 2.1 C4 Context Diagram - -Shows {project name} in relation to all external systems. - -```mermaid -C4Context - title {Project Name} — System Context - - Person(user, "{User Role}", "{User description}") - - System(sys, "{Project Name}", "{One-line description}") - - System_Ext(ext1, "{External System 1}", "{Description}") - System_Ext(ext2, "{External System 2}", "{Description}") - - Rel(user, sys, "{interaction}") - Rel(sys, ext1, "{protocol}", "{what flows}") - Rel(sys, ext2, "{protocol}", "{what flows}") -``` - -### 2.2 C4 Container Diagram - -Zooms into {project name} to show its internal containers. 
- -```mermaid -C4Container - title {Project Name} — Container Diagram - - Container_Boundary(boundary, "{Project Name}") { - Container(comp1, "{Component 1}", "{Tech}", "{Purpose}") - Container(comp2, "{Component 2}", "{Tech}", "{Purpose}") - ContainerDb(db, "{Database}", "{Type}", "{What it stores}") - } - - System_Ext(ext1, "{External System}", "{Description}") - - Rel(comp1, comp2, "{interaction}") - Rel(comp2, db, "{protocol}") - Rel(comp1, ext1, "{protocol}") -``` - -### 2.3 C4 Component Diagram (optional — for complex subsystems) - -```mermaid -C4Component - title {Subsystem Name} — Component Diagram - - Component(c1, "{Component}", "{Tech}", "{Purpose}") - Component(c2, "{Component}", "{Tech}", "{Purpose}") - - Rel(c1, c2, "{interaction}") -``` - ---- - -## 3. Module Decomposition - -| Module | Purpose | Key Dependencies | SPEC Reference | -|--------|---------|-----------------|----------------| -| {module} | {purpose} | {deps} | Section {N} | - ---- - -## 4. Data Flow Diagrams - -### 4.1 {Primary Flow} (e.g., "User Request Lifecycle") - -```mermaid -sequenceDiagram - participant U as User - participant A as {Component A} - participant B as {Component B} - participant E as {External System} - - U->>A: {action} - A->>B: {internal call} - B->>E: {external call} - E-->>B: {response} - B-->>A: {result} - A-->>U: {response} -``` - -### 4.2 {Secondary Flow} - - - ---- - -## 5. Storage Architecture - -### 5.1 Overview - - - -### 5.2 Schema Summary - - - -| Store | Entity | Key Fields | Purpose | -|-------|--------|-----------|---------| -| {store} | {entity} | {fields} | {purpose} | - ---- - -## 6. Deployment Model - -### 6.1 Deployment Diagram - -```mermaid -graph TD - subgraph "{Environment}" - A["{Component}"] --> B["{Component}"] - B --> C["{Store}"] - end -``` - -### 6.2 Infrastructure Requirements - -| Resource | Requirement | Notes | -|----------|-------------|-------| -| {resource} | {spec} | {notes} | - ---- - -## 7. Network Topology - - - ---- - -## 8. 
Security Architecture - -### 8.1 Trust Boundaries - - - -### 8.2 Authentication Flow - -```mermaid -sequenceDiagram - participant C as Client - participant S as Server - participant A as Auth Provider - - C->>S: Request + credentials - S->>A: Validate - A-->>S: Token/result - S-->>C: Authenticated response -``` - -### 8.3 Data Encryption - -| Data | At Rest | In Transit | -|------|---------|-----------| -| {data type} | {method} | {method} | diff --git a/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md b/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md deleted file mode 100644 index 268a12e1..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/BEHAVIORS_AND_EXPECTATIONS.md +++ /dev/null @@ -1,117 +0,0 @@ -# {Project Name} — Behaviors and Expectations - -**Version**: 1.0.0-draft -**Status**: Draft -**Last Updated**: {date} - -This document defines the behavioral contract for {project name}. Every behavior described here maps to one or more testable scenarios in the [BDD feature files](features/). - ---- - -## Table of Contents - -1. [Introduction](#1-introduction) -2. [Desired Behaviors](#2-desired-behaviors) -3. [Undesired Behaviors](#3-undesired-behaviors) -4. [Edge Cases](#4-edge-cases) -5. [Performance Expectations](#5-performance-expectations) -6. [Guardrail Definitions](#6-guardrail-definitions) - ---- - -## 1. Introduction - -### 1.1 Purpose - -This document is the behavioral specification for {project name}. It defines what the system should do, what it must not do, how it handles edge cases, and what performance it must achieve. 
- -It serves as: -- A contract between the product and engineering teams -- The source of truth for BDD feature file scenarios -- A test oracle for integration and adversarial testing - -### 1.2 How to Read This Document - -**Desired behaviors** (Section 2) follow this format: -- **Trigger**: What user action or system state initiates the behavior -- **Expected**: What the system should do -- **Rationale**: Why this behavior matters - -**Undesired behaviors** (Section 3) add: -- **Risk**: What goes wrong if this behavior occurs - -**Edge cases** (Section 4) describe unusual or boundary scenarios with expected handling. - -**Cross-references**: Section numbers reference [SPEC.md](SPEC.md) where the underlying system is defined. - ---- - -## 2. Desired Behaviors - - - -### 2.1 {Feature Area 1} - -#### B-2.1.1: {Behavior Name} - -**Trigger**: {What initiates this behavior} -**Expected**: {What the system does in response} -**Rationale**: {Why this matters} - -#### B-2.1.2: {Behavior Name} - -**Trigger**: ... -**Expected**: ... -**Rationale**: ... - -### 2.2 {Feature Area 2} - - - ---- - -## 3. Undesired Behaviors - - - -### 3.1 {Category} - -#### U-3.1.1: {Undesired Behavior Name} - -**Trigger**: {What could cause this} -**Expected**: {What should happen instead} -**Risk**: {What goes wrong if this occurs} - ---- - -## 4. Edge Cases - - - -### 4.1 {Category} - -#### E-4.1.1: {Edge Case Name} - -**Scenario**: {Description of the unusual situation} -**Expected Handling**: {What the system does} -**Rationale**: {Why this handling was chosen} - ---- - -## 5. Performance Expectations - -| Behavior | Target | Measurement | Degradation Handling | -|----------|--------|-------------|---------------------| -| {behavior} | {target} | {how measured} | {what happens if missed} | - ---- - -## 6. 
Guardrail Definitions - - - -### 6.1 {Guardrail Category} - -| Guardrail | Rule | Enforcement | Violation Response | -|-----------|------|-------------|-------------------| -| {name} | {rule} | {how enforced} | {what happens} | diff --git a/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md b/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md deleted file mode 100644 index 99744680..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/CONTRIBUTING.md +++ /dev/null @@ -1,33 +0,0 @@ -# {Project Name} — Developer Rules - -These rules are non-negotiable. Every contributor must follow them. - - diff --git a/.agents/skills/backend-service-spec-bundler/templates/SPEC.md b/.agents/skills/backend-service-spec-bundler/templates/SPEC.md deleted file mode 100644 index 9a472976..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/SPEC.md +++ /dev/null @@ -1,220 +0,0 @@ -# {Project Name} Technical Specification - -**Version**: 1.0.0-draft -**Status**: Draft -**Last Updated**: {date} - ---- - -## Table of Contents - - - -1. [Introduction](#1-introduction) -2. [System Architecture](#2-system-architecture) -3. [Core Subsystems](#3-core-subsystems) -4. [API / Protocol Definition](#4-api--protocol-definition) -5. [Data Model](#5-data-model) -6. [Integration Points](#6-integration-points) -7. [Security Model](#7-security-model) -8. [Error Handling](#8-error-handling) -9. [Performance](#9-performance) -10. [Phased Rollout](#10-phased-rollout) -11. [Testing Strategy](#11-testing-strategy) - ---- - -## 1. Introduction - -### 1.1 Purpose - - - -This document is the authoritative technical specification for {project name}. It defines every subsystem, protocol, interface, and behavioral contract that the implementation must satisfy. - -### 1.2 Scope - - - -The system: -- {capability 1} -- {capability 2} -- ... - -The system does **not**: -- {anti-scope 1} -- {anti-scope 2} -- ... 
- -### 1.3 Terminology & Glossary - - - -| Term | Definition | -|------|-----------| -| **{Term}** | {Definition} | - -### 1.4 System Constraints - - - -| Constraint | Detail | -|-----------|--------| -| **{Constraint}** | {Detail and impact} | - ---- - -## 2. System Architecture - -### 2.1 High-Level Component Diagram - - - -### 2.2 Module Decomposition - -| Module | Purpose | Key Dependencies | -|--------|---------|-----------------| -| {module} | {purpose} | {deps} | - -### 2.3 Request/Response Lifecycle - - - ---- - -## 3. Core Subsystems - - - -### 3.1 {Subsystem A} - -#### 3.1.1 Purpose -#### 3.1.2 Inputs & Outputs -#### 3.1.3 Logic -#### 3.1.4 Configuration -#### 3.1.5 Error States - -### 3.2 {Subsystem B} - - - ---- - -## 4. API / Protocol Definition - - - -### 4.1 {Interface Type} (e.g., REST API, gRPC, CLI) - -| Endpoint / Command | Method | Description | Request | Response | -|-------------------|--------|-------------|---------|----------| -| {endpoint} | {method} | {desc} | {schema} | {schema} | - -### 4.2 Authentication & Authorization - - - -### 4.3 Rate Limiting & Quotas - ---- - -## 5. Data Model - -### 5.1 Storage Architecture - - - -### 5.2 Schema Definitions - - - -### 5.3 Data Lifecycle - - - ---- - -## 6. Integration Points - - - -| System | Protocol | Purpose | Failure Mode | -|--------|----------|---------|-------------| -| {system} | {protocol} | {purpose} | {what happens when it's down} | - ---- - -## 7. Security Model - -### 7.1 Threat Model - - - -### 7.2 Authentication - -### 7.3 Input Validation & Sanitization - -### 7.4 Data Protection - ---- - -## 8. Error Handling - -### 8.1 Error Categories - -| Category | Example | Handling | -|----------|---------|---------| -| {category} | {example} | {how it's handled} | - -### 8.2 Error Response Format - -### 8.3 Retry & Recovery - ---- - -## 9. 
Performance - -### 9.1 Targets - -| Metric | Target | Measurement | -|--------|--------|-------------| -| {metric} | {target} | {how measured} | - -### 9.2 Bottlenecks & Mitigations - ---- - -## 10. Phased Rollout - - - -### Phase 1: {name} -- Scope: {what's included} -- Success criteria: {how to know it's done} - -### Phase 2: {name} -- Scope: ... -- Depends on: Phase 1 completion - ---- - -## 11. Testing Strategy - -### 11.1 Test Levels - -| Level | Tool | What It Covers | -|-------|------|---------------| -| Unit | {framework} | Individual functions/methods | -| Integration | {framework} | Subsystem interactions | -| BDD | Cucumber/Gherkin | Behavioral scenarios from B&E doc | -| Performance | {tool} | Latency, throughput targets | - -### 11.2 Test Data Strategy - -### 11.3 CI/CD Integration diff --git a/.agents/skills/backend-service-spec-bundler/templates/feature.feature b/.agents/skills/backend-service-spec-bundler/templates/feature.feature deleted file mode 100644 index d020f5a0..00000000 --- a/.agents/skills/backend-service-spec-bundler/templates/feature.feature +++ /dev/null @@ -1,35 +0,0 @@ -@bdd -Feature: {Feature Name} - As a {actor/role} - I want {capability} - So that {benefit} - - # References: SPEC Section {N} ({Section Name}), B&E Section {M} ({Section Name}) - - Background: - Given {common precondition for all scenarios} - - @phase1 @fast - Scenario: {Happy path scenario name} - Given {context/setup} - When {action the user takes} - Then {expected outcome} - And {additional assertion} - - @phase1 @fast - Scenario: {Validation/error scenario name} - Given {context/setup} - When {action with invalid input or edge case} - Then {expected error handling} - And {system remains in valid state} - - @phase1 - Scenario Outline: {Parameterized scenario name} - Given {context with } - When {action with } - Then {expected } - - Examples: - | parameter | input | outcome | - | {val1} | {val1} | {val1} | - | {val2} | {val2} | {val2} | diff --git 
a/.claude/skills/backend-service-spec-bundler b/.claude/skills/backend-service-spec-bundler deleted file mode 120000 index cb158e36..00000000 --- a/.claude/skills/backend-service-spec-bundler +++ /dev/null @@ -1 +0,0 @@ -../../.agents/skills/backend-service-spec-bundler \ No newline at end of file diff --git a/docs/issues/analysis/escrow_contract_cross_reference.md b/docs/issues/analysis/escrow_contract_cross_reference.md deleted file mode 100644 index 6203ef1b..00000000 --- a/docs/issues/analysis/escrow_contract_cross_reference.md +++ /dev/null @@ -1,283 +0,0 @@ -# Escrow Feature vs. AuthCaptureEscrow Contract: Cross-Reference Analysis - -**Date:** 2025-03-27 -**Contract:** AuthCaptureEscrow at 0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff (Base + Base Sepolia) -**Source:** https://github.com/base/commerce-payments (src/AuthCaptureEscrow.sol) -**Feature:** escrow_round_lifecycle.feature - ---- - -## 1. Multiple capture() Calls Per Authorization - -**Feature assumes:** capture() called once per worker (lines 39-40: capture for 0xAAA then 0xBBB) - -**Contract reality: YES — multiple captures ARE supported.** - -From AuthCaptureEscrow.sol line 258-260: -``` -/// @dev Can be called multiple times up to cumulative authorized amount -``` - -The logic at lines 283-291: -```solidity -if (state.capturableAmount < amount) { - revert InsufficientAuthorization(...) -} -state.capturableAmount -= uint120(amount); -state.refundableAmount += uint120(amount); -``` - -Each capture reduces capturableAmount and increases refundableAmount. The test -`test_succeeds_withMultipleCaptures` in capture.t.sol confirms two consecutive -captures work correctly. - -**Verdict: Feature assumption is correct for multiple calls. BUT SEE ISSUE #4 BELOW.** - ---- - -## 2. 
void() After Partial Captures (Partial Void) - -**Feature assumes:** void() returns "remaining 30 USDC" after captures (line 42) - -**Contract reality: YES — partial void works correctly.** - -From void() at lines 304-316: -```solidity -uint256 authorizedAmount = paymentState[paymentInfoHash].capturableAmount; -if (authorizedAmount == 0) revert ZeroAuthorization(paymentInfoHash); -paymentState[paymentInfoHash].capturableAmount = 0; -_sendTokens(paymentInfo.operator, paymentInfo.token, paymentInfo.payer, authorizedAmount); -``` - -After partial captures, capturableAmount holds only the REMAINING uncaptured amount. -void() returns exactly that remainder to the payer. It does NOT touch refundableAmount -(previously captured funds). - -**Verdict: Feature assumption is correct.** Capture 70 USDC, void returns 30 USDC. - ---- - -## 3. reclaim() After authorizationExpiry - -**Feature assumes:** "platform wallet calls reclaim() directly" with "no operator signature required" (lines 67-69) - -**Contract reality: PARTIALLY CORRECT — important nuances.** - -From reclaim() at lines 323-340: -```solidity -function reclaim(PaymentInfo calldata paymentInfo) - external nonReentrant onlySender(paymentInfo.payer) { - - if (block.timestamp < paymentInfo.authorizationExpiry) { - revert BeforeAuthorizationExpiry(...) 
- } - uint256 authorizedAmount = paymentState[paymentInfoHash].capturableAmount; - if (authorizedAmount == 0) revert ZeroAuthorization(paymentInfoHash); - - paymentState[paymentInfoHash].capturableAmount = 0; - _sendTokens(paymentInfo.operator, paymentInfo.token, paymentInfo.payer, authorizedAmount); -} -``` - -Key findings: -- reclaim() is restricted to `onlySender(paymentInfo.payer)` — the PAYER must call it -- No operator involvement needed — feature is correct about "no operator signature" -- Requires `block.timestamp >= authorizationExpiry` — feature's >= condition is correct -- Returns only capturableAmount (remaining after any captures) -- The payer needs the full PaymentInfo struct to call reclaim (for hash verification) - -**ISSUE:** Feature says "platform wallet calls reclaim()". This works ONLY IF the platform -wallet address equals paymentInfo.payer. In our model, the platform locks its own USDC, -so it IS the payer. This is fine but must be explicitly ensured. - -**ISSUE:** The feature says "full 100 USDC returns." This is only true if NO captures -were made before the crash. If any captures happened, reclaim returns only the remainder. -After authorizationExpiry, capture() also stops working (block.timestamp >= check), so -there's no race — once expired, only reclaim works. - -**Verdict: Feature is correct IF platform wallet == payer in PaymentInfo.** - ---- - -## 4. CRITICAL: Receiver Address Per Capture Call - -**Feature assumes:** Different receiver per capture: -``` -capture() is called for "0xAAA" with 42 USDC (line 39) -capture() is called for "0xBBB" with 28 USDC (line 40) -``` - -**Contract reality: NO — receiver is FIXED in PaymentInfo. 
THIS IS A BLOCKING ISSUE.** - -From capture() at line 295: -```solidity -_distributeTokens(paymentInfo.token, paymentInfo.receiver, amount, feeBps, feeReceiver); -``` - -The PaymentInfo struct (lines 27-52) has a SINGLE `receiver` field: -```solidity -struct PaymentInfo { - address operator; - address payer; - address receiver; // <-- FIXED per authorization - ... -} -``` - -The PaymentInfo hash is computed from ALL fields including receiver. You CANNOT change -the receiver between capture calls because: -1. capture() takes the full PaymentInfo as calldata -2. The hash must match the authorization's hash -3. Changing receiver = different hash = InsufficientAuthorization revert - -**ALL captures from one authorization go to the SAME receiver address.** - -### Workaround Options: - -**Option A: Separate Authorization Per Worker** -Create individual PaymentInfo (with unique salt+receiver) per worker. Each gets its own -authorize() call. Pros: Direct payment to workers. Cons: N authorize() transactions per -round (gas-expensive), requires knowing workers before round starts, complex payer -signature management. - -**Option B: Distributor Contract as Receiver (RECOMMENDED)** -Set receiver to a Splitter/Distributor contract that the platform controls. Flow: -1. authorize() with receiver = DistributorContract -2. capture(fullAmount) sends all to DistributorContract -3. DistributorContract.distribute() pays individual workers -Pros: Single authorize+capture. Cons: Extra contract, extra step, workers don't see -direct on-chain escrow commitment to their individual share. - -**Option C: Platform Wallet as Receiver + Direct Transfers** -Set receiver = platform wallet. After capture, platform sends to workers via standard -ERC20 transfers. Pros: Simplest. Cons: Workers must trust platform for last-mile -distribution — undermines the escrow trust model. 
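To make Option B concrete, here is a minimal, untested sketch of a distributor set as the single `receiver` in PaymentInfo. This is illustrative only, not code from base/commerce-payments; the contract name, interface, and access model are assumptions:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical sketch for Option B — NOT from base/commerce-payments.
interface IERC20 {
    function transfer(address to, uint256 amount) external returns (bool);
}

/// Set as paymentInfo.receiver. A single capture() lands the full captured
/// amount here; the platform then splits it among workers off the escrow path.
contract RewardDistributor {
    address public immutable platform;
    IERC20 public immutable token;

    constructor(address platform_, IERC20 token_) {
        platform = platform_;
        token = token_;
    }

    function distribute(address[] calldata workers, uint256[] calldata amounts) external {
        require(msg.sender == platform, "only platform");
        require(workers.length == amounts.length, "length mismatch");
        for (uint256 i = 0; i < workers.length; i++) {
            require(token.transfer(workers[i], amounts[i]), "transfer failed");
        }
    }
}
```

The trade-off stays as stated above: one authorize + one capture, at the cost of an extra contract and a last-mile distribution step that the escrow itself does not enforce.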
- -**Option D: Multiple Authorizations with Multicall** -Use the deployed Multicall3 (at 0xcA11bde05977b3631167028862bE2a173976CA11) to batch -multiple authorize() calls in one transaction, each with a different worker as receiver. -Requires separate payer signatures per authorization. Could work with EIP-7702 batch -or Smart Wallet batching. - -**Verdict: Feature file is INCORRECT. Must be redesigned. Option B or D recommended.** - ---- - -## 5. Refund Flow — Per-Capture or Global? - -**Feature assumes:** "refund() for 42 USDC" (line 74) — appears per-worker amount - -**Contract reality: GLOBAL refund pool, not per-capture.** - -From refund() at lines 351-374: -```solidity -uint120 captured = paymentState[paymentInfoHash].refundableAmount; -if (captured < amount) revert RefundExceedsCapture(amount, captured); -paymentState[paymentInfoHash].refundableAmount = captured - uint120(amount); -``` - -Key findings: -- refundableAmount is the CUMULATIVE total of all captures for that paymentInfoHash -- refund() can return any amount up to that cumulative total -- Refund goes to the PAYER (paymentInfo.payer), not to any specific receiver/worker -- The OPERATOR must provide the refund tokens via a tokenCollector -- Refund has its own expiry: paymentInfo.refundExpiry - -**IMPORTANT:** The operator must have liquidity to fund the refund. The contract pulls -tokens FROM the operator (via OperatorRefundCollector), not from the receiver. This -means: -- After capture sends tokens to the receiver, the operator can't automatically refund -- The operator needs to acquire tokens independently to execute a refund -- In our model, this means the platform (as operator) must hold enough USDC to cover - potential refunds - -**Feature says "42 USDC returns to the platform wallet."** This is correct IF -platform wallet == payer. The refund goes to paymentInfo.payer. - -**Verdict: Feature is approximately correct but oversimplifies. 
Refund is global, -requires operator to supply tokens, and goes to payer not receiver.** - --- - -## 6. Reentrancy and Ordering Issues Not Covered - -### Covered by Contract -- **ReentrancyGuardTransient** on all state-changing functions (authorize, capture, void, - reclaim, refund, charge). Uses Solady's transient storage variant. -- **Single authorization** enforced: `hasCollectedPayment` flag prevents double-authorize -- **Void idempotency**: void sets capturableAmount=0, second void reverts ZeroAuthorization -- Reentrancy tests in reentrancy.t.sol confirm protection works - -### Issues NOT Covered by Feature Scenarios - -**A. Authorization Expiry Race Condition** -The feature doesn't cover the case where authorizationExpiry is reached DURING the -capture sequence. If captures for workers are sequential (not batched), the last capture -could fail because block.timestamp >= authorizationExpiry. Must set authorizationExpiry -with generous buffer (feature says "round end + 1 hour" which helps). - -**B. Refund After Partial Void** -After void(), refundableAmount still holds the captured amount. refund() can still be -called. The feature has no scenario for: capture some, void remainder, THEN refund. - -**C. Payer Reclaim vs Operator Void Race** -Both void() (operator) and reclaim() (payer) clear capturableAmount. If both are -attempted near authorizationExpiry: -- Before expiry: only void() works (reclaim reverts) -- After expiry: both become callable. void() has NO timestamp check, so it can be - called at any time; the operator could void() even after authorizationExpiry, - before the payer calls reclaim(). First one to execute wins (both set - capturableAmount to 0). - -**D. Fee Rounding on Multiple Small Captures** -Multiple small captures may accumulate rounding errors in fees vs. one large capture. -The contract calculates: `feeAmount = amount * feeBps / 10_000` per capture.
With many -small captures, total fees could differ by a few wei from a single large capture. - -**E. Front-Running by Operator** -The operator could theoretically front-run a payer's reclaim() with a capture() just -before authorizationExpiry. The feature doesn't cover adversarial operator behavior. -Since in our model the operator IS the platform, this is self-defeating, but worth noting. - -**F. PaymentInfo Replay Protection** -Each PaymentInfo (identified by hash) can only be authorized once (hasCollectedPayment). -A unique `salt` field prevents hash collisions across rounds. The feature doesn't -explicitly test salt uniqueness per round — if the same salt is reused, authorize() -will revert with PaymentAlreadyCollected. - ---- - -## Summary of Feature File Accuracy - -| Check | Status | Notes | -|-------|--------|-------| -| Multiple capture() calls | CORRECT | Supported, decrements capturableAmount | -| Partial void after captures | CORRECT | Returns only remaining capturableAmount | -| reclaim() after expiry | CORRECT* | *Only if platform wallet == payer | -| Different receiver per capture | **INCORRECT** | **Receiver is fixed in PaymentInfo** | -| Refund flow | MOSTLY CORRECT | Global pool, operator must supply tokens | -| Reentrancy protection | ADEQUATE | transient reentrancy guard on all functions | - -## REQUIRED CHANGES TO FEATURE FILE - -1. **CRITICAL:** Lines 39-40 and 50 must be redesigned. Cannot capture to different - worker addresses from one authorization. Must either: - - Use a distributor/splitter contract as the single receiver - - Create separate authorizations per worker - - Use platform as receiver + off-chain distribution - -2. **Line 42:** void() call is correct but should specify it returns to the PAYER - (paymentInfo.payer), not generically to "platform wallet" - -3. **Line 67-68:** reclaim() must be called by the PAYER address. Feature should - clarify platform wallet == paymentInfo.payer - -4. 
**Lines 73-75:** refund() pulls tokens FROM the operator, not from the receiver. - The platform (operator) must have separate USDC to fund refunds. - -5. **Missing scenario:** Salt uniqueness per round to avoid PaymentAlreadyCollected - -6. **Missing scenario:** Authorization expiry race during sequential captures - -7. **Missing scenario:** What happens if authorizationExpiry is set too tight and - the last capture fails diff --git a/docs/issues/diagrams/option-a-settlement-flow.md b/docs/issues/diagrams/option-a-settlement-flow.md deleted file mode 100644 index ad3060e7..00000000 --- a/docs/issues/diagrams/option-a-settlement-flow.md +++ /dev/null @@ -1,290 +0,0 @@ -# Option A: Platform Wallet Settlement — No Custom Contracts - -## The Core Idea - -The platform wallet is both the **payer** (locks funds in escrow) and the -**receiver** (gets them back via capture). Then it distributes to workers -and innovators with standard USDC transfers. The "distributor" is a script, -not a smart contract. - -``` -ZERO custom Solidity. -ZERO new deployments. -Only existing audited Commerce Payments contracts + standard ERC20 transfers. 
-``` - -## Full Flow Diagram - -``` - BASE L2 (on-chain) -┌──────────────────────────────────────────────────────────────────────┐ -│ │ -│ ┌─────────────────┐ ┌────────────────────────────┐ │ -│ │ USDC Contract │ │ AuthCaptureEscrow │ │ -│ │ (Base Mainnet) │ │ 0xBdEA...0cff │ │ -│ │ │ │ (5x audited, deployed) │ │ -│ │ │ │ │ │ -│ │ balanceOf(plat) │ │ ┌────────────────────────┐ │ │ -│ │ balanceOf(w1) │ │ │ TokenStore │ │ │ -│ │ balanceOf(w2) │ │ │ (holds escrowed USDC) │ │ │ -│ │ balanceOf(inn) │ │ └────────────────────────┘ │ │ -│ └────────┬────────┘ └─────────────┬──────────────┘ │ -│ │ │ │ -└───────────│─────────────────────────────────────│────────────────────┘ - │ │ - │ OBOL-STACK CLUSTER │ -┌───────────│─────────────────────────────────────│────────────────────┐ -│ │ │ │ -│ ┌────────▼─────────────────────────────────────▼──────────────┐ │ -│ │ ESCROW ROUND MANAGER │ │ -│ │ (Python script in pod) │ │ -│ │ │ │ -│ │ Holds the platform wallet private key (or Secure Enclave) │ │ -│ │ This is the OPERATOR and the PAYER and the RECEIVER │ │ -│ └─────────────────────────┬───────────────────────────────────┘ │ -│ │ │ -│ ┌────────────────┼────────────────┐ │ -│ │ │ │ │ -│ ┌────────▼──────┐ ┌──────▼──────┐ ┌───────▼─────┐ │ -│ │ Reward Engine │ │ Verifier │ │ Discovery │ │ -│ │ (OPOW calc) │ │ (proofs) │ │ (ERC-8004) │ │ -│ └───────────────┘ └─────────────┘ └─────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -## Step-by-Step: One Round - -``` -STEP 1: AUTHORIZE -════════════════════════════════════════════════════════════════ - - Platform wallet signs ERC-3009 receiveWithAuthorization: - from: platform_wallet (payer) - to: AuthCaptureEscrow - value: 100 USDC (this round's pool) - - PaymentInfo struct: - operator: platform_wallet ← same entity - payer: platform_wallet ← same entity - receiver: platform_wallet ← SAME ENTITY (this is the trick) - token: USDC - maxAmount: 100 USDC - authorizationExpiry: round_end + 1 hour - 
refundExpiry: round_end + 24 hours - - Call: AuthCaptureEscrow.authorize(paymentInfo, 100, collector, sig) - - Result: - ┌──────────────────┐ ┌──────────────────┐ - │ Platform Wallet │ ──$100──▶ TokenStore │ - │ balance: -100 │ │ (escrowed) │ - │ │ │ capturableAmt=100│ - └──────────────────┘ └──────────────────┘ - - WHY: The 100 USDC is now LOCKED. Platform can't spend it on - anything else. Workers can verify on-chain that the commitment - is real before starting work. - - -STEP 2: WORKERS DO WORK (during the round) -════════════════════════════════════════════════════════════════ - - Workers submit experiments, proofs are verified. - No money moves. This is the same flow as today. - - ┌──────────┐ precommit ┌──────────┐ - │ Worker 1 │ ───────────▶ │ Verifier │ - │ (spark1) │ benchmark │ │ - │ │ ───────────▶ │ records │ - │ │ proof │ quals │ - │ │ ───────────▶ │ │ - └──────────┘ └──────────┘ - - ┌──────────┐ precommit ┌──────────┐ - │ Worker 2 │ ───────────▶ │ Verifier │ - │ (spark2) │ benchmark │ │ - │ │ ───────────▶ │ │ - │ │ proof │ │ - │ │ ───────────▶ │ │ - └──────────┘ └──────────┘ - - -STEP 3: REWARD ENGINE COMPUTES SHARES -════════════════════════════════════════════════════════════════ - - Pool: 100 USDC - Split: 70% workers, 20% innovators, 10% operator - - Worker influence (from OPOW parity formula): - Worker 1: influence = 0.6 → reward = 70 * 0.6 = 42 USDC - Worker 2: influence = 0.4 → reward = 70 * 0.4 = 28 USDC - - Innovator adoption: - "muon-v3": adoption 75% → reward = 20 * 0.75 = 15 USDC - "adamw": adoption 25% → reward = 20 * 0.25 = 5 USDC - - Operator: 10 USDC - - Total to distribute: 42 + 28 + 15 + 5 + 10 = 100 USDC - - -STEP 4: CAPTURE (single call) -════════════════════════════════════════════════════════════════ - - The escrow round manager calls capture() for the FULL distributable amount. - Since receiver = platform_wallet, the USDC comes right back to us. 
- - Call: AuthCaptureEscrow.capture(paymentInfo, 100, feeBps=0, feeReceiver) - - Result: - ┌──────────────────┐ ┌──────────────────┐ - │ TokenStore │ ──$100──▶ Platform Wallet │ - │ capturableAmt=0 │ │ balance: +100 │ - │ │ │ (back to us) │ - └──────────────────┘ └──────────────────┘ - - WHY capture to ourselves instead of just voiding? - Because capture creates an ON-CHAIN RECORD of settlement. - Anyone can verify the round was settled by reading the events. - void() would look like the round was cancelled. - - -STEP 5: DISTRIBUTE (standard ERC20 transfers) -════════════════════════════════════════════════════════════════ - - Now the platform wallet holds the USDC and distributes directly. - These are plain USDC.transfer() calls. No custom contract. - - ┌──────────────────┐ - │ Platform Wallet │ - │ balance: 100 │ - │ │ - │ transfer(W1, 42)│──── 42 USDC ────▶ Worker 1 wallet - │ transfer(W2, 28)│──── 28 USDC ────▶ Worker 2 wallet - │ transfer(I1, 15)│──── 15 USDC ────▶ Innovator 1 wallet - │ transfer(I2, 5)│──── 5 USDC ────▶ Innovator 2 wallet - │ (keep 10) │ Operator keeps 10 - │ │ - │ balance: 10 │ - └──────────────────┘ - - Each transfer is a separate on-chain tx. - With 10 workers + 5 innovators = 15 transfers ≈ 15 * $0.001 gas ≈ $0.015 - (Base L2 gas is extremely cheap) - - -STEP 6: VOID (cleanup — usually a no-op) -════════════════════════════════════════════════════════════════ - - If we captured the full amount, void() has nothing to return. - If we captured less (e.g., some workers didn't qualify), void() - returns the remainder to the platform wallet. - - Call: AuthCaptureEscrow.void(paymentInfo) - - Result: capturableAmount (if any) returns to payer. - - -STEP 7: NEXT ROUND -════════════════════════════════════════════════════════════════ - - Repeat from Step 1 with the new round's pool. - The operator's 10 USDC stays in the platform wallet. - New x402 payments from buyers add to the next pool. 
-``` - -## Why This Works - -``` -QUESTION ANSWER -────────────────────────────────────────────────────────────────── -"Isn't receiver=payer circular?" Yes, intentionally. We use - the escrow for COMMITMENT - (locked, verifiable on-chain) - not for routing. - -"Why not just transfer directly Because authorize() creates - to workers without escrow?" a verifiable on-chain commitment. - Workers see the locked pool - BEFORE doing work. Without it, - workers have to trust the - platform will pay. - -"What if the manager crashes?" After authorizationExpiry, - platform wallet calls reclaim(). - Money comes back. Workers - don't get paid for that round, - but no funds are lost. - -"What if a transfer to a worker The other transfers already - fails?" succeeded (each is independent). - Retry the failed one. USDC - transfer failures are almost - always gas-related, not - permanent. - -"Can workers verify they'll Yes. On-chain: - get paid?" 1. Read capturableAmount = pool - 2. Read authorizationExpiry > now - 3. PaymentInfo is deterministic - from round params - 4. If locked, the math determines - their share (public formula) - -"What about front-running the capture() is called by the - distribution?" operator (platform wallet) only. - Nobody else can capture. - Workers receive standard ERC20 - transfers after capture. - -"Is this as secure as a custom MORE secure. We use 5x-audited - distributor contract?" Commerce Payments + standard - ERC20 transfers. A custom - contract is a new attack surface. 
-``` - -## What Each Party Sees On-Chain - -``` -WORKER'S PERSPECTIVE: - Before work: - → AuthCaptureEscrow.getPaymentState(hash) shows capturableAmount = 100 USDC - → "Pool is committed, I'll get paid if I do good work" - - After round: - → USDC.Transfer(platform_wallet → my_wallet, 42 USDC) - → "I got paid" - - Audit trail: - → Authorized event (round start, 100 USDC locked) - → Captured event (round end, 100 USDC settled) - → Transfer events (42 USDC to me, 28 to other worker, etc) - -INNOVATOR'S PERSPECTIVE: - → Same as worker but smaller amount (adoption-weighted) - -PLATFORM OPERATOR'S PERSPECTIVE: - → authorize(): 100 USDC leaves wallet to escrow - → capture(): 100 USDC returns to wallet from escrow - → transfer(): 90 USDC leaves wallet to participants - → Net: kept 10 USDC (operator share) - → All events auditable on BaseScan -``` - -## Contract Interaction Summary - -``` -CALL WHO SIGNS CONTRACT WHAT HAPPENS -───────────────────────────────────────────────────────────────────────────────────────────── -authorize(paymentInfo, amt) platform wallet AuthCaptureEscrow USDC locked -capture(paymentInfo, amt) platform wallet AuthCaptureEscrow USDC returned to platform -void(paymentInfo) platform wallet AuthCaptureEscrow remainder returned -reclaim(paymentInfo) platform wallet AuthCaptureEscrow safety recovery -USDC.transfer(worker, amt) platform wallet USDC ERC20 worker gets paid -USDC.transfer(innovator, amt) platform wallet USDC ERC20 innovator gets paid - -Total contracts called: 2 (AuthCaptureEscrow + USDC) -Custom contracts deployed: 0 -New audits needed: 0 -``` diff --git a/docs/issues/features/FEATURE_REVIEW.md b/docs/issues/features/FEATURE_REVIEW.md deleted file mode 100644 index 3c6f5379..00000000 --- a/docs/issues/features/FEATURE_REVIEW.md +++ /dev/null @@ -1,547 +0,0 @@ -# Feature File Review: Gherkin Best Practices, ERC-8004 & x402 Edge Cases - -## Files Reviewed -1. multi_tier_worker_discovery_with_fallback.feature (47 lines) -2. 
reward_pool_distribution_across_roles.feature (59 lines) -3. end_to_end_autoresearch_round.feature (59 lines) -4. opow_influence_calculation_with_anti_monopoly_parity.feature (69 lines) -5. escrow_round_lifecycle.feature (76 lines) -6. commit_reveal_work_verification.feature (63 lines) - ---- - -## 1. GHERKIN BEST PRACTICES ISSUES - -### 1a. Missing tags for filtering -- discovery.feature: has @discovery, good -- reward.feature: has @rewards, good -- e2e.feature: has @e2e @slow, good -- opow.feature: has @opow @critical, good -- escrow.feature: has @escrow @critical, good -- verification.feature: has @verification @critical, good -- ISSUE: No @erc8004 or @x402 tags on relevant scenarios. These cross-cutting - concerns should be tagged for protocol-specific test runs. - -### 1b. Over-long E2E scenario -- end_to_end_autoresearch_round.feature "Complete round with two honest workers" - has ~25 steps with inline comments. Gherkin best practice says scenarios should - be 5-10 steps max. The comments (# Round setup, # Worker experiments, etc.) - are a code smell indicating this should be split into focused scenarios or use - a scenario outline. -- RECOMMENDATION: Split into "Round initialization with escrow", "Worker - experiments and verification", "Reward settlement" as separate scenarios - chained via shared state, or keep as a single narrative but trim to essential - assertions. - -### 1c. Magic numbers without context -- escrow.feature line 39: "42 USDC" and "28 USDC" appear without showing the - math. A reader must compute 100 * 0.7 * 0.6 = 42 themselves. Add a comment - or use a scenario outline with formula reference. -- opow.feature: penalty values (0.65, 0.11, 0.05) lack formula reference. - Consider adding a comment showing the parity formula being applied. - -### 1d. Inconsistent Background granularity -- escrow.feature Background specifies contract address "0xBdEA0D1bcC5...", - which is good for precision. 
-- discovery.feature Background uses only OASF skill filter but no contract - address or chain ID. Since the ERC-8004 contract is at a fixed address on - Base, this should be specified. - -### 1e. No "Rule:" groupings in some files -- e2e.feature lacks Rule: groupings entirely. Even for an E2E feature, Rules - help organize the phases (setup, execution, settlement). - ---- - -## 2. ERC-8004 METADATA READING GAPS (Question 1) - -### What exists: -- discovery.feature mentions "ERC-8004 NFT metadata is read for each agent" - (line 24) and "workers with the model_versioning skill are returned" but - this is extremely shallow. - -### What is MISSING: - -#### 2a. No tokenURI resolution scenario -The ERC-8004 spec requires calling tokenURI(tokenId) to get the registration -JSON URL. There is no scenario testing: -- tokenURI returns a valid HTTPS URL -- tokenURI returns an IPFS URL (needs gateway resolution) -- tokenURI returns empty/malformed data -- tokenURI call reverts (token burned or contract paused) - -RECOMMENDED SCENARIO: -```gherkin -Scenario: Discovery resolves tokenURI to registration JSON - Given worker "0xW001" has ERC-8004 token ID 12345 - When the coordinator calls tokenURI(12345) - Then a valid registration JSON URL is returned - And the JSON is fetched and parsed - -Scenario: Discovery handles IPFS tokenURI with gateway fallback - Given worker "0xW001" has tokenURI "ipfs://Qm..." - When the coordinator resolves the tokenURI - Then the IPFS gateway is used to fetch the registration JSON - And the registration is successfully parsed -``` - -#### 2b. No registration JSON schema validation scenario -The ERC-8004 AgentRegistration document has specific required fields (name, -description, services[], supportedTrust[]). 
No scenario tests: -- JSON missing required fields -- JSON with unknown/extra fields -- JSON with invalid service types -- JSON with services[].endpoint that is unreachable - -RECOMMENDED SCENARIO: -```gherkin -Scenario: Discovery rejects agent with malformed registration JSON - Given agent "0xW003" has registration JSON missing "services" field - When the coordinator parses the registration - Then agent "0xW003" is excluded from discovery results - And a warning is logged with the token ID and missing field -``` - -#### 2c. No OASF taxonomy filtering scenario -The Background says `the OASF skill filter is "devops_mlops/model_versioning"` -but there is NO scenario testing: -- How the skill filter maps to registration JSON fields -- What happens when an agent has multiple skills (partial match) -- What happens when an agent has no skills listed -- Hierarchical taxonomy matching (e.g., "devops_mlops/*" wildcard) -- The ServiceOffer CRD has services[].name with types: web, A2A, MCP, OASF, - ENS, DID, email — but the feature file never references OASF as a service type - -RECOMMENDED SCENARIOS: -```gherkin -Scenario: Discovery filters agents by OASF taxonomy path - Given agent "0xW001" has OASF service with skill "devops_mlops/model_versioning" - And agent "0xW002" has OASF service with skill "security/threat_detection" - When the coordinator discovers workers with filter "devops_mlops/model_versioning" - Then only agent "0xW001" is returned - -Scenario: Discovery supports wildcard OASF taxonomy matching - Given agent "0xW001" has skill "devops_mlops/model_versioning" - And agent "0xW002" has skill "devops_mlops/container_orchestration" - When the coordinator discovers workers with filter "devops_mlops/*" - Then both agents are returned - -Scenario: Agent with no OASF service entry is excluded from skill-filtered queries - Given agent "0xW003" has only a "web" service entry (no OASF) - When the coordinator discovers workers with any OASF skill filter - Then agent 
"0xW003" is not in the results -``` - ---- - -## 3. x402 PaymentRequirements & ESCROW SCHEME (Question 2) - -### What exists: -- escrow.feature tests authorize(), capture(), void(), reclaim(), refund() -- e2e.feature mentions "x402 payments were collected" as a precondition -- ServiceOffer CRD defines payment.scheme as "exact" (only enum value) - -### What is MISSING: - -#### 3a. No PaymentRequirements generation scenario -The ServiceOffer CRD (serviceoffer-crd.yaml) defines x402 PaymentRequirements -fields (payTo, network, scheme, maxTimeoutSeconds, price) but NO feature tests: -- PaymentRequirements struct generation from ServiceOffer spec -- CAIP-2 network resolution ("base-sepolia" -> "eip155:84532") -- maxTimeoutSeconds enforcement -- Price calculation (perRequest vs perMTok vs perHour) - -RECOMMENDED SCENARIOS: -```gherkin -@x402 -Scenario: ServiceOffer generates valid x402 PaymentRequirements - Given a ServiceOffer with network "base-sepolia" and payTo "0xAAA" - And price.perRequest is "0.01" - And scheme is "exact" - When the reconciler generates PaymentRequirements - Then the network field is "eip155:84532" (CAIP-2) - And the payTo field is "0xAAA" - And the maxAmountRequired matches "0.01" in USDC base units - -Scenario: Escrow scheme PaymentRequirements includes authorization metadata - Given the escrow round manager is preparing round 7 - And the reward pool is 60 USDC - When PaymentRequirements are generated for the escrow scheme - Then the scheme field is "escrow" - And the authorizationId references the current round - And the maxAmountRequired equals 60 USDC - And the authorizationExpiry is set to round_end + grace_period -``` - -#### 3b. No "exact" vs "escrow" scheme distinction -The CRD only allows scheme: "exact". But the escrow round lifecycle clearly -uses a different payment flow (authorize/capture/void). 
There is no scenario -showing how the two schemes coexist: -- Worker earns via x402 "exact" scheme (instant per-request payments) -- Platform collects those payments, then uses escrow for reward distribution -- The feature files treat these as independent but never show the handoff - -RECOMMENDED SCENARIO: -```gherkin -Scenario: x402 exact payments flow into escrow pool for next round - Given workers served 200 x402 "exact" scheme requests in round N - And the total collected USDC is 200 - When round N+1 begins - Then the escrow pool is 200 * 30% = 60 USDC - And the escrow authorization uses the "escrow" scheme internally - And workers can verify the pool amount on-chain -``` - ---- - -## 4. MISSING EDGE CASE SCENARIOS (Question 3) - -### 4a. Worker re-registration mid-round -NO SCENARIO EXISTS. Critical gap because ERC-8004 allows updating registration -at any time. - -```gherkin -@erc8004 -Scenario: Worker updates ERC-8004 registration mid-round - Given worker "0xW001" is participating in round 5 - And worker "0xW001" updates their registration JSON to remove the - "devops_mlops/model_versioning" skill - When the round completes - Then worker "0xW001" still qualifies for round 5 (snapshot at round start) - But worker "0xW001" is NOT discovered for round 6 - -Scenario: Worker re-registers with a different address mid-round - Given worker "0xW001" is participating in round 5 - And worker "0xW001" registers a new ERC-8004 token with address "0xW001b" - When the round completes - Then only the original registration is used for round 5 settlement -``` - -### 4b. ERC-8004 NFT transfer during round -NO SCENARIO EXISTS. Since ERC-8004 tokens are NFTs, they can be transferred. 
-This could cause: -- Worker loses ownership of their identity mid-round -- New owner could try to claim rewards -- The payTo address no longer matches the NFT owner - -```gherkin -@erc8004 -Scenario: ERC-8004 NFT transferred during active round - Given worker "0xW001" owns ERC-8004 token 12345 - And worker "0xW001" is a qualifier in round 5 - When token 12345 is transferred to "0xATTACKER" during the round - Then capture() still pays "0xW001" (the address that did the work) - And the new NFT owner "0xATTACKER" receives nothing for round 5 - -Scenario: Discovery uses token ownership snapshot at round start - Given worker "0xW001" owned token 12345 at block 1000 (round start) - And token 12345 was transferred to "0xW002" at block 1005 - When the coordinator discovers workers at round start - Then "0xW001" is the registered worker, not "0xW002" -``` - -### 4c. x402 facilitator timeout -NO SCENARIO EXISTS. The ServiceOffer CRD has maxTimeoutSeconds (default: 300). - -```gherkin -@x402 -Scenario: x402 payment verification times out - Given a ServiceOffer with maxTimeoutSeconds of 300 - And a buyer sends a payment header - When the x402 facilitator does not respond within 300 seconds - Then the payment is considered failed - And the request is rejected with HTTP 402 - And no USDC is deducted from the buyer - -Scenario: x402 facilitator timeout during round does not affect escrow - Given the escrow authorization for round 5 is already locked - And an x402 facilitator timeout occurs during the round - Then the escrow authorization remains valid - And workers can still submit proofs - But the affected request is not counted toward x402 revenue for round 6 -``` - -### 4d. BaseScan rate limiting -NO SCENARIO EXISTS. BaseScan free tier is 5 req/sec. With 18,512 ERC-8004 -holders, pagination + metadata fetching will hit limits. 
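The backoff-then-fallback behavior this gap describes can be sketched outside Gherkin. A minimal Python illustration, assuming a hypothetical injectable `fetch` callable that returns `(status, retry_after, body)`; `fetch_with_backoff` is an illustrative name, not part of any existing codebase:

```python
import time

def fetch_with_backoff(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a rate-limited discovery request on HTTP 429.

    `fetch` is a hypothetical callable returning (status, retry_after, body).
    Honors the server's Retry-After hint when present, otherwise backs off
    exponentially; returns None after max_retries so the caller can fall
    back to the next discovery tier (e.g. 8004scan).
    """
    for attempt in range(max_retries + 1):
        status, retry_after, body = fetch()
        if status != 429:
            return body
        if attempt == max_retries:
            break  # give up; caller falls back to the next tier
        # Prefer the server-provided Retry-After delay over our own schedule.
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay)
    return None
```

Returning `None` (rather than raising) keeps the 3-tier fallback decision in the caller, matching the "falls back to 8004scan.io" scenario below.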
- -```gherkin -@discovery -Scenario: BaseScan API returns HTTP 429 rate limit - Given the coordinator is using BaseScan for discovery - And the BaseScan API returns HTTP 429 after 5 requests - When the coordinator discovers workers - Then the coordinator implements exponential backoff - And retries after the Retry-After header period - And eventually returns partial results with a warning - -Scenario: BaseScan rate limiting causes fallback to 8004scan - Given the coordinator is using BaseScan for discovery - And the BaseScan API consistently returns HTTP 429 - When 3 consecutive retries fail - Then the coordinator falls back to 8004scan.io - And workers are still discovered successfully -``` - -### 4e. Chain reorg affecting indexer data -NO SCENARIO EXISTS. The Reth indexer syncs Base chain data. Base produces blocks -roughly every 2 seconds, but finality only comes with L1 settlement, and shallow -reorgs do happen. - -```gherkin -@discovery -Scenario: Chain reorg removes a recent ERC-8004 registration - Given the Reth indexer has synced to block 1000 - And agent "0xNEW" was registered at block 999 - When a 2-block reorg occurs at block 999 - And the new chain does not contain the registration transaction - Then the indexer re-processes blocks 999-1000 - And agent "0xNEW" is removed from discovery results - -Scenario: Coordinator uses confirmation depth for registration finality - Given the Reth indexer requires 12-block confirmation depth - And a new registration appears at block 1000 - When the current block is 1005 (only 5 confirmations) - Then the registration is not yet included in discovery results - When the current block reaches 1012 - Then the registration becomes discoverable -``` - ---- - -## 5. LEADERBOARD API FEATURE (Question 4) - -YES, a leaderboard feature is needed.
Currently: -- The issue doc says "Exposes leaderboard API (GET /leaderboard, GET /round/:id)" -- The e2e feature mentions it once (line 43): "the leaderboard API shows both - workers with correct earnings" -- But there is NO dedicated feature file testing the leaderboard API - -RECOMMENDED: New file `leaderboard_api.feature`: - -```gherkin -@leaderboard @api -Feature: Leaderboard API - The reward engine exposes a leaderboard API that shows per-worker - earnings, influence, and qualification status across rounds. - - Background: - Given the autoresearch chart is deployed - And rounds 1-3 have completed with verified workers - - Rule: Current round leaderboard shows live state - - Scenario: GET /leaderboard returns ranked workers by cumulative earnings - When a client requests GET /leaderboard - Then workers are returned sorted by total USDC earned descending - And each entry includes address, total_earned, rounds_participated, avg_influence - - Scenario: GET /leaderboard includes only ERC-8004 registered workers - Given worker "0xAAA" is registered on ERC-8004 - And worker "0xBBB" was registered but token was burned - When a client requests GET /leaderboard - Then worker "0xAAA" appears in results - And worker "0xBBB" does not appear - - Rule: Per-round details are queryable - - Scenario: GET /round/:id returns round-specific data - When a client requests GET /round/3 - Then the response includes: - | field | type | - | round_id | integer | - | pool_usdc | decimal | - | num_qualifiers | integer | - | num_excluded | integer | - | captures | array | - | voided_usdc | decimal | - | escrow_tx_hash | string | - - Scenario: GET /round/:id for non-existent round returns 404 - When a client requests GET /round/9999 - Then the response status is 404 - - Rule: Leaderboard reflects escrow settlement accurately - - Scenario: Leaderboard updates only after capture() confirms on-chain - Given round 5 just completed - And capture() has been called for worker "0xAAA" - But the 
transaction is still pending
-    When a client requests GET /leaderboard
-    Then worker "0xAAA" earnings do NOT include round 5 yet
-    When the capture transaction confirms
-    Then worker "0xAAA" earnings include round 5
-```
-
----
-
-## 6. ROUND-OVER-ROUND STATE FEATURE (Question 5)
-
-YES, a round-over-round state feature is needed. Currently:
-- reward.feature line 45 mentions "the unadopted share rolls into the next round"
-- The issue doc describes void() returning uncaptured funds for the next round
-- But there is NO feature testing the cumulative/rollover mechanics
-
-RECOMMENDED: New file `round_over_round_state.feature`:
-
-```gherkin
-@rounds @state
-Feature: Round-over-round state (pool rollover, cumulative earnings)
-  The reward engine maintains state across rounds, rolling uncaptured
-  funds into the next round's pool and tracking cumulative worker
-  performance.
-
-  Background:
-    Given the reward pool percentage is 30%
-    And the platform wallet holds 1000 USDC
-
-  Rule: Uncaptured funds roll into the next round
-
-    Scenario: Voided USDC from round N increases round N+1 pool
-      Given round 1 collected 200 USDC in x402 payments
-      And the round 1 pool was 60 USDC
-      And only 40 USDC was captured (void returned 20 USDC)
-      When round 2 begins
-      Then the round 2 pool includes the 20 USDC rollover
-      And the total round 2 pool is (round2_x402_revenue * 30%) + 20 USDC
-
-    Scenario: Fully captured round has zero rollover
-      Given round 1 pool was 60 USDC
-      And all 60 USDC was captured across workers
-      When round 2 begins
-      Then the round 2 pool is exactly (round2_x402_revenue * 30%)
-
-  Rule: Cumulative earnings are tracked per worker
-
-    Scenario: Worker earnings accumulate across rounds
-      Given worker "0xAAA" earned 42 USDC in round 1
-      And worker "0xAAA" earned 35 USDC in round 2
-      When the cumulative earnings are queried
-      Then worker "0xAAA" has total earnings of 77 USDC
-
-    Scenario: Worker who skips a round retains prior earnings
-      Given worker "0xAAA" earned 42 USDC in round 1
-      And worker "0xAAA" did not participate in round 2
-      When the cumulative earnings are queried after round 2
-      Then worker "0xAAA" still has total earnings of 42 USDC
-
-  Rule: Round numbering is monotonic and gap-free
-
-    Scenario: Failed round start still increments round counter
-      Given round 5 completed successfully
-      And round 6 failed to authorize escrow (insufficient funds)
-      When the next successful authorization occurs
-      Then it is labeled round 7 (round 6 is recorded as failed)
-      And round 6 shows zero pool and zero captures in history
-```
-
----
-
-## 7. ERC-8004 IDENTITY REVOCATION/DEACTIVATION MID-ROUND (Question 6)
-
-NO SCENARIOS EXIST. This is a critical gap. ERC-8004 tokens can be:
-- Burned (destroying the identity)
-- Transferred (changing ownership)
-- Have their registration JSON updated (removing services)
-
-RECOMMENDED SCENARIOS (add to discovery.feature or new file):
-
-```gherkin
-@erc8004 @critical
-Rule: ERC-8004 identity changes during active round
-
-  Scenario: Worker's ERC-8004 token is burned mid-round
-    Given worker "0xW001" has ERC-8004 token 12345
-    And worker "0xW001" is a qualifier in round 5 with verified proofs
-    When token 12345 is burned during the round
-    Then worker "0xW001" STILL receives their capture for round 5
-      (work was verified before identity was revoked)
-    But worker "0xW001" is NOT discovered for round 6
-
-  Scenario: Worker's ERC-8004 registration JSON is updated to remove services
-    Given worker "0xW001" has OASF service "devops_mlops/model_versioning"
-    And worker "0xW001" is participating in round 5
-    When worker "0xW001" updates their registration JSON to remove all services
-    Then round 5 continues using the snapshot taken at round start
-    And worker "0xW001" is excluded from round 6 discovery
-
-  Scenario: ERC-8004 contract is paused during active round
-    Given the ERC-8004 contract at 0x8004...9432 is paused
-    And round 5 has already started with discovered workers
-    When the round completes
-    Then captures are still executed (escrow is independent of ERC-8004)
-    But round 6 discovery fails with "ERC-8004 contract paused" error
-
-  Scenario: Worker address is sanctioned/blocklisted mid-round
-    Given worker "0xW001" is a qualifier in round 5
-    When address "0xW001" appears on a sanctions list
-    Then the escrow round manager skips capture for "0xW001"
-    And the uncaptured amount is voided back to the platform
-    And the event is logged with the sanctions reason
-```
-
----
-
-## 8. ADDITIONAL MISSING SCENARIOS
-
-### 8a. Concurrent round edge cases
-```gherkin
-Scenario: Two rounds cannot be active simultaneously
-  Given round 5 is in progress with active escrow authorization
-  When the system attempts to start round 6
-  Then the start is rejected with "round 5 still active"
-  And no new escrow authorization is created
-```
-
-### 8b. Gas price spike during settlement
-```gherkin
-Scenario: Gas price spike during capture phase
-  Given round 5 has 10 workers to pay
-  And 5 captures have succeeded
-  When the Base gas price exceeds the configured maxGasPrice
-  Then remaining captures are queued for retry
-  And the escrow authorization has not expired yet
-  And captures resume when gas price drops
-```
-
-### 8c. Zero-worker round
-```gherkin
-Scenario: Round starts but no workers respond
-  Given round 5 authorized 60 USDC in escrow
-  And no workers submitted precommitments
-  When the round duration expires
-  Then void() returns all 60 USDC to the platform
-  And the round is recorded with zero qualifiers
-```
-
-### 8d. Duplicate ERC-8004 registrations (same address, multiple tokens)
-```gherkin
-Scenario: Worker holds multiple ERC-8004 tokens with same skill
-  Given address "0xW001" owns tokens 12345 and 12346
-  And both tokens have "devops_mlops/model_versioning" skill
-  When the coordinator discovers workers
-  Then "0xW001" appears only once (deduplicated by address)
-  And the most recent registration is used
-```
-
----
-
-## 9. SUMMARY OF RECOMMENDATIONS
-
-| Priority | Gap | Affected File(s) | Action |
-|----------|-----|------------------|--------|
-| P0 | No tokenURI resolution testing | discovery.feature | Add 3 scenarios |
-| P0 | No OASF taxonomy filtering tests | discovery.feature | Add 3 scenarios |
-| P0 | No ERC-8004 identity revocation mid-round | NEW: erc8004_identity_lifecycle.feature | Create file |
-| P0 | No NFT transfer during round | NEW or discovery.feature | Add 2 scenarios |
-| P1 | No PaymentRequirements generation tests | NEW: x402_payment_requirements.feature | Create file |
-| P1 | No leaderboard API feature | NEW: leaderboard_api.feature | Create file |
-| P1 | No round-over-round state feature | NEW: round_over_round_state.feature | Create file |
-| P1 | No BaseScan rate limiting scenarios | discovery.feature | Add 2 scenarios |
-| P1 | No chain reorg scenarios | discovery.feature | Add 2 scenarios |
-| P2 | No x402 facilitator timeout scenarios | escrow.feature or NEW | Add 2 scenarios |
-| P2 | No gas spike during settlement | escrow.feature | Add 1 scenario |
-| P2 | E2E scenario too long | e2e.feature | Refactor |
-| P3 | Missing @erc8004 @x402 tags | All files | Add tags |
-| P3 | Magic numbers without context | escrow.feature, opow.feature | Add comments |
-
-Total: 6 existing files need enhancements, 3-4 new feature files recommended.
diff --git a/docs/issues/features/commit_reveal_work_verification.feature b/docs/issues/features/commit_reveal_work_verification.feature
deleted file mode 100644
index cecb9cfc..00000000
--- a/docs/issues/features/commit_reveal_work_verification.feature
+++ /dev/null
@@ -1,63 +0,0 @@
-@verification @critical
-Feature: Commit-reveal work verification
-  Workers commit to results via a Merkle root before learning
-  which nonces will be sampled. This prevents retroactive
-  fabrication of results.
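The commit-reveal flow described by this feature file can be sketched in a few lines of Python. This is an illustrative model only: the hash function, odd-node padding rule, and nonce-sampling scheme below are assumptions for the sketch, not the project's specified implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Hash leaves, then hash pairwise upward; duplicate the last node on odd levels.
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    # Collect the sibling at each level on the path from leaf to root.
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(leaf: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

def sample_nonces(random_hash: bytes, num_nonces: int, count: int = 5) -> list[int]:
    # Deterministic: the same random_hash always selects the same nonces,
    # but the hash is unknown when the worker commits the root.
    nonces, i = [], 0
    while len(nonces) < count:
        n = int.from_bytes(h(random_hash + i.to_bytes(8, "big"))[:8], "big") % num_nonces
        if n not in nonces:
            nonces.append(n)
        i += 1
    return sorted(nonces)
```

The worker publishes `merkle_root(results)` before the random hash exists; the verifier later derives the sample with `sample_nonces` and checks each revealed result with `verify_proof`, so a fabricated result cannot be swapped in after the fact.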
-
-  Background:
-    Given the verifier is running
-    And the neuralnet_optimizer challenge is active
-    And the sample count is 5 nonces per benchmark
-
-  Rule: Honest workers pass verification
-
-    Scenario: Worker with valid proofs becomes a qualifier
-      Given worker "0xAAA" precommits a benchmark with 100 nonces
-      And the verifier assigns a random hash and track
-      When worker "0xAAA" submits a Merkle root over 100 results
-      And the verifier samples 5 nonces for verification
-      And worker "0xAAA" submits valid Merkle proofs for all 5
-      Then worker "0xAAA" is recorded as a qualifier
-      And the benchmark quality scores are accepted
-
-    Scenario: Re-execution confirms claimed quality
-      Given worker "0xAAA" claims val_bpb of 3.2 for nonce 42
-      When the verifier re-executes nonce 42 with the same settings
-      Then the re-executed val_bpb matches the claimed 3.2
-      And the proof is accepted
-
-  Rule: Dishonest workers fail verification
-
-    Scenario: Invalid Merkle proof is rejected
-      Given worker "0xCCC" submitted a Merkle root
-      And the verifier sampled nonces [7, 23, 45, 61, 89]
-      When worker "0xCCC" submits a proof for nonce 23 that does not match the root
-      Then the verification fails for worker "0xCCC"
-      And worker "0xCCC" is excluded from qualifiers for this round
-      And no escrow capture is made for worker "0xCCC"
-
-    Scenario: Worker who inflates quality scores is caught
-      Given worker "0xCCC" claims val_bpb of 2.8 for nonce 42
-      When the verifier re-executes nonce 42 with the same settings
-      And the re-executed val_bpb is 3.5
-      Then the quality mismatch is detected
-      And the verification fails for worker "0xCCC"
-
-    Scenario: Worker who times out on proof submission is excluded
-      Given worker "0xCCC" submitted a Merkle root
-      And the verifier sampled 5 nonces
-      When worker "0xCCC" does not submit proofs within 300 seconds
-      Then worker "0xCCC" is excluded from qualifiers
-      And the round proceeds without them
-
-  Rule: Sampling is fair and deterministic
-
-    Scenario: Nonce sampling is deterministic from the round seed
-      Given the same benchmark settings and random hash
-      When nonces are sampled twice
-      Then the same 5 nonces are selected both times
-
-    Scenario: Worker cannot predict which nonces will be sampled
-      Given the random hash is derived from a future block hash
-      When the worker commits their Merkle root
-      Then the sampled nonces have not yet been determined
diff --git a/docs/issues/features/end_to_end_autoresearch_round.feature b/docs/issues/features/end_to_end_autoresearch_round.feature
deleted file mode 100644
index d7b4fa35..00000000
--- a/docs/issues/features/end_to_end_autoresearch_round.feature
+++ /dev/null
@@ -1,59 +0,0 @@
-@e2e @slow
-Feature: End-to-end autoresearch round
-  A complete round from escrow authorization through worker
-  experiments to reward distribution and settlement.
-
-  Background:
-    Given the autoresearch chart is deployed with default values
-    And an Anvil fork of Base Sepolia is running
-    And the platform wallet holds 500 USDC
-    And 2 GPU workers are registered on ERC-8004:
-      | address | skill                         | gpu        |
-      | 0xW001  | devops_mlops/model_versioning | NVIDIA T4  |
-      | 0xW002  | devops_mlops/model_versioning | NVIDIA A10 |
-    And 1 innovator submitted algorithm "muon-opt-v2" for neuralnet_optimizer
-
-  Scenario: Complete round with two honest workers
-    # Round setup
-    Given 100 USDC of x402 payments were collected in the previous round
-    When a new round begins
-    Then 30 USDC is authorized in escrow
-
-    # Worker experiments
-    When worker "0xW001" precommits a benchmark with 50 nonces
-    And worker "0xW002" precommits a benchmark with 50 nonces
-    And both workers submit Merkle roots over their results
-    And the verifier samples 5 nonces from each worker
-    And both workers submit valid Merkle proofs
-    Then both workers are recorded as qualifiers
-
-    # Reward calculation
-    When the round duration expires
-    Then the reward engine computes influence for both workers
-    And both workers have balanced challenge participation
-    And influence is split proportionally to qualifier count
-
-    # Settlement
-    When captures are executed
-    Then worker "0xW001" receives their earned USDC via capture()
-    And worker "0xW002" receives their earned USDC via capture()
-    And innovator "muon-opt-v2" receives adoption-weighted USDC
-    And the operator receives 10% of the pool
-    And void() returns any remainder to the platform wallet
-    And the leaderboard API shows both workers with correct earnings
-    And the next round begins with a new authorization
-
-  Scenario: Round where one worker submits fraudulent proofs
-    Given 100 USDC of x402 payments were collected
-    When a new round begins
-    Then 30 USDC is authorized in escrow
-
-    When worker "0xW001" submits valid proofs for all sampled nonces
-    And worker "0xW002" submits a proof with a quality mismatch
-    Then worker "0xW001" is a qualifier
-    And worker "0xW002" is excluded
-
-    When captures are executed
-    Then worker "0xW001" receives the entire worker pool share
-    And worker "0xW002" receives nothing
-    And void() returns worker "0xW002"'s unclaimed share to the platform
diff --git a/docs/issues/features/erc8004_identity_lifecycle.feature b/docs/issues/features/erc8004_identity_lifecycle.feature
deleted file mode 100644
index 42cb8e43..00000000
--- a/docs/issues/features/erc8004_identity_lifecycle.feature
+++ /dev/null
@@ -1,105 +0,0 @@
-@erc8004 @identity
-Feature: ERC-8004 identity lifecycle during rounds
-  Workers are identified by ERC-8004 agent NFTs on Base.
-  The system must handle registration, metadata updates,
-  NFT transfers, and deactivation gracefully during
-  active rounds.
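The registration-validation behavior this feature exercises (skip malformed JSON, require an x402-compatible service entry, filter by OASF skill) can be sketched as below. The field names `services`, `type`, `endpoint`, and `skills` are assumptions for illustration; the actual ERC-8004 registration JSON schema governs.

```python
import logging

def validate_registration(reg) -> tuple[bool, str]:
    # Returns (ok, detail): detail is the x402 endpoint on success,
    # or a human-readable reason for skipping the worker on failure.
    if not isinstance(reg, dict):
        return False, "registration is not a JSON object"
    services = reg.get("services")
    if not isinstance(services, list):
        return False, "missing services list"
    for svc in services:
        if isinstance(svc, dict) and svc.get("type") == "x402" and svc.get("endpoint"):
            return True, svc["endpoint"]
    return False, "no x402-compatible endpoint in services"

def discover(candidates: list[tuple[str, dict]], skill: str) -> list[dict]:
    # Invalid registrations are skipped with a warning; discovery continues
    # with the remaining workers rather than aborting.
    found = []
    for addr, reg in candidates:
        ok, detail = validate_registration(reg)
        if not ok:
            logging.warning("skipping %s: %s", addr, detail)
            continue
        if skill in reg.get("skills", []):
            found.append({"address": addr, "endpoint": detail})
    return found
```

A worker with no x402 endpoint or the wrong skill simply drops out of the result list, matching the "excluded with a warning" scenarios above.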
-
-  Background:
-    Given the ERC-8004 Identity Registry is at "0x8004A169FB4a3325136EB29fA0ceB6D2e539a432"
-    And the OASF skill filter is "devops_mlops/model_versioning"
-
-  Rule: Only registered agents can participate
-
-    Scenario: Worker with valid ERC-8004 registration is discovered
-      Given worker "0xW001" holds agent NFT token ID 12345
-      And the NFT metadata includes skill "devops_mlops/model_versioning"
-      And the registration JSON at .well-known/agent-registration.json is valid
-      When the coordinator discovers workers
-      Then worker "0xW001" appears in the results
-      And the worker's x402 endpoint is read from the registration services list
-
-    Scenario: Worker without ERC-8004 registration is excluded
-      Given worker "0xW002" has no agent NFT
-      When the coordinator discovers workers
-      Then worker "0xW002" does not appear in the results
-
-    Scenario: Worker with wrong OASF skill is filtered out
-      Given worker "0xW003" holds agent NFT token ID 12346
-      And the NFT metadata includes skill "communication/chat" but not "devops_mlops/model_versioning"
-      When the coordinator discovers workers with skill filter
-      Then worker "0xW003" does not appear in the results
-
-  Rule: Metadata updates are reflected in discovery
-
-    Scenario: Worker updates best_val_bpb in registration metadata
-      Given worker "0xW001" registered with best_val_bpb of 3.5
-      When worker "0xW001" calls URIUpdated with best_val_bpb of 3.1
-      And the discovery cache TTL expires
-      Then the coordinator sees worker "0xW001" with best_val_bpb 3.1
-
-    Scenario: Worker adds a new OASF skill to their registration
-      Given worker "0xW004" registered with skill "data_processing/etl"
-      When worker "0xW004" updates metadata to add "devops_mlops/model_versioning"
-      Then worker "0xW004" becomes discoverable by the coordinator
-
-  Rule: Identity is snapshotted at benchmark acceptance
-
-    # At the moment a worker's benchmark is accepted (precommit confirmed),
-    # the verifier snapshots: ownerOf(tokenId), payout wallet (from
-    # registration JSON), and registration metadata. All reward routing
-    # for that round uses the SNAPSHOT, not live on-chain state.
-    # This prevents mid-round transfers or metadata updates from
-    # redirecting or nullifying rewards after work is accepted.
-
-    Scenario: Snapshot captures payout wallet at benchmark acceptance
-      Given worker "0xW001" holds agent NFT token ID 12345
-      And the registration JSON lists payout wallet "0xPAY1"
-      When worker "0xW001"'s precommit is accepted by the verifier
-      Then the verifier snapshots owner "0xW001" and payout "0xPAY1"
-      And rewards for this round are routed to "0xPAY1" regardless of later changes
-
-    Scenario: NFT transferred mid-round does not redirect rewards
-      Given worker "0xW001" is a qualifier in the current round
-      And the snapshot records payout wallet "0xPAY1"
-      When worker "0xW001" transfers their agent NFT to "0xNEW"
-      And "0xNEW" updates the registration payout to "0xPAY_NEW"
-      And the round completes
-      Then the RewardDistributor sends "0xW001"'s share to "0xPAY1"
-      And "0xNEW" is the registered owner for subsequent rounds
-
-    Scenario: Worker deactivates registration mid-round
-      Given worker "0xW001" is a qualifier in the current round
-      And worker "0xW001" sets registration active=false
-      When the round completes
-      Then the distribution includes "0xW001"'s verified work from this round
-      And "0xW001" is excluded from discovery in the next round
-
-    Scenario: Metadata URI update mid-round does not affect current snapshot
-      Given worker "0xW001" is a qualifier with snapshotted best_val_bpb 3.2
-      When worker "0xW001" calls URIUpdated with best_val_bpb 2.8
-      Then the current round still uses the snapshotted 3.2
-      And the next round's discovery will reflect 2.8
-
-    Scenario: Burned agent NFT removes worker from future rounds only
-      Given worker "0xW001" holds agent NFT token ID 12345
-      And worker "0xW001" is a qualifier in the current round
-      When the NFT is burned (transferred to address zero)
-      Then the current round's rewards are still distributed per snapshot
-      And worker "0xW001" is removed from all discovery backends
-      And "0xW001" cannot participate in subsequent rounds
-
-  Rule: Registration JSON schema is validated
-
-    Scenario: Malformed registration JSON is rejected
-      Given worker "0xW005" has a tokenURI pointing to invalid JSON
-      When the discovery client fetches the registration
-      Then worker "0xW005" is skipped with a schema validation warning
-      And discovery continues with remaining workers
-
-    Scenario: Registration JSON with missing x402 endpoint is skipped
-      Given worker "0xW006" has valid registration JSON
-      But the services list contains no x402-compatible endpoint
-      When the coordinator discovers workers
-      Then worker "0xW006" is excluded
-      And a warning is logged about missing x402 endpoint
diff --git a/docs/issues/features/escrow_round_lifecycle.feature b/docs/issues/features/escrow_round_lifecycle.feature
deleted file mode 100644
index 15490a9f..00000000
--- a/docs/issues/features/escrow_round_lifecycle.feature
+++ /dev/null
@@ -1,87 +0,0 @@
-@escrow @critical
-Feature: Escrow round lifecycle
-  The escrow round manager locks USDC in the Commerce Payments
-  AuthCaptureEscrow contract at the start of each round and
-  distributes earnings to verified workers at round end.
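The authorize → capture → void arithmetic exercised by the escrow scenarios can be modeled off-chain as below. This is a sketch only: the 70% worker share, the single capture to a RewardDistributor-style receiver, and the function/field names are assumptions drawn from the scenarios, not the contract interface. Amounts are integer micro-USDC so rounding dust is handled explicitly.

```python
USDC = 10**6  # USDC uses 6 decimal places

def plan_round(pool: int, influence: dict[str, float], worker_pct: float = 0.70) -> dict:
    """Plan one escrow round: authorize the full pool, capture the worker
    share in a single call to the distributor, and void the remainder."""
    worker_pool = int(pool * worker_pct) if influence else 0
    shares = {w: int(worker_pool * f) for w, f in influence.items()}
    if shares:
        # Assign integer-rounding dust to the top-influence worker so the
        # distributor's transfers sum exactly to the captured amount.
        shares[max(influence, key=influence.get)] += worker_pool - sum(shares.values())
    return {
        "authorize": pool,           # locked at round start
        "capture": worker_pool,      # one capture() to the distributor receiver
        "void": pool - worker_pool,  # uncaptured remainder back to the platform
        "distribute": shares,        # per-worker ERC20 transfers
    }
```

With a 100 USDC pool and 60%/40% influence this reproduces the scenario's 70 USDC capture, 42/28 split, and 30 USDC void; with no verified workers nothing is captured and the full pool is voided.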
-
-  Background:
-    Given the autoresearch chart is deployed on a k3s cluster
-    And an Anvil fork of Base Sepolia is running
-    And the platform wallet holds 1000 USDC
-    And the AuthCaptureEscrow contract is at "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff"
-    And the reward pool percentage is 30%
-
-  Rule: Funds must be locked before any work begins
-
-    Scenario: Round starts with successful escrow authorization
-      Given 200 USDC of x402 payments were collected in the previous round
-      When a new round begins
-      Then the escrow round manager calls authorize() for 60 USDC
-      And the AuthCaptureEscrow capturableAmount equals 60 USDC
-      And the authorizationExpiry is set to round end plus 1 hour grace
-      And workers can verify the commitment on-chain
-
-    Scenario: Round start fails when platform wallet has insufficient USDC
-      Given the platform wallet holds 0 USDC
-      When a new round begins
-      Then the escrow round manager logs an authorization failure
-      And no work is accepted for this round
-      And the previous round's uncaptured funds are not affected
-
-  Rule: Workers are paid proportionally to verified influence
-
-    # NOTE: The AuthCaptureEscrow receiver is FIXED per PaymentInfo.
-    # All captures from one authorize() go to the SAME receiver address.
-    # We use a RewardDistributor contract as the single receiver,
-    # which then splits USDC to individual workers via ERC20 transfers.
-    #
-    # Flow: authorize(receiver=RewardDistributor) → capture(full worker pool)
-    #       → RewardDistributor.distribute(workers[], amounts[])
-
-    Scenario: Two verified workers receive proportional rewards
-      Given a round with 100 USDC authorized in escrow
-      And the escrow receiver is the RewardDistributor contract
-      And worker "0xAAA" has 60% influence
-      And worker "0xBBB" has 40% influence
-      And both workers passed commit-reveal verification
-      When the round completes
-      Then capture() is called once for 70 USDC to the RewardDistributor
-      And the RewardDistributor transfers 42 USDC to "0xAAA"
-      And the RewardDistributor transfers 28 USDC to "0xBBB"
-      And the platform fee receiver gets 2% of the capture
-      And void() is called for the remaining 30 USDC
-      And the remaining USDC returns to the platform wallet
-
-    Scenario: Unverified worker receives no distribution
-      Given a round with 100 USDC authorized in escrow
-      And worker "0xAAA" passed verification with 100% influence
-      And worker "0xCCC" failed commit-reveal verification
-      When the round completes
-      Then capture() sends the worker pool to the RewardDistributor
-      And the RewardDistributor transfers funds only to "0xAAA"
-      And worker "0xCCC" receives nothing
-      And void() returns the uncaptured remainder to the platform wallet
-
-    Scenario: Round with no verified workers voids entirely
-      Given a round with 100 USDC authorized in escrow
-      And no workers submitted valid proofs
-      When the round completes
-      Then no capture() is called
-      And void() returns the full 100 USDC to the platform wallet
-
-  Rule: Funds are always recoverable
-
-    Scenario: Platform reclaims funds after manager crash
-      Given a round with 100 USDC authorized in escrow
-      And the escrow round manager process has crashed
-      When the authorizationExpiry passes
-      Then the platform wallet calls reclaim() directly
-      And the full 100 USDC returns to the platform wallet
-      And no operator signature is required
-
-    Scenario: Operator refunds a worker after post-capture fraud discovery
-      Given worker "0xAAA" received a 42 USDC capture in round 5
-      And fraud is discovered within the refund window
-      When the operator calls refund() for 42 USDC
-      Then 42 USDC returns to the platform wallet
-      And the refund is recorded in the round history
diff --git a/docs/issues/features/leaderboard_api.feature b/docs/issues/features/leaderboard_api.feature
deleted file mode 100644
index e8802583..00000000
--- a/docs/issues/features/leaderboard_api.feature
+++ /dev/null
@@ -1,66 +0,0 @@
-@leaderboard @api
-Feature: Leaderboard API
-  The reward engine exposes a REST API showing per-round
-  rankings, cumulative earnings, and worker performance
-  history.
-
-  Background:
-    Given the reward engine is running
-    And 3 completed rounds exist in the history
-
-  Rule: Current round leaderboard reflects live state
-
-    Scenario: Leaderboard shows workers ranked by influence
-      Given round 4 is in progress
-      And worker "0xAAA" has influence 0.45
-      And worker "0xBBB" has influence 0.35
-      And worker "0xCCC" has influence 0.20
-      When GET /leaderboard is called
-      Then the response contains 3 workers in descending influence order
-      And each entry includes worker address, influence, and estimated reward
-
-    Scenario: Leaderboard includes innovator rankings
-      Given algorithm "muon-v3" has 60% adoption
-      And algorithm "adamw-base" has 40% adoption
-      When GET /leaderboard?role=innovator is called
-      Then the response shows innovators ranked by adoption percentage
-
-  Rule: Historical round data is queryable
-
-    Scenario: Completed round data includes settlement details
-      When GET /round/3 is called
-      Then the response includes:
-        | field             | description                          |
-        | round_id          | 3                                    |
-        | pool_amount       | total USDC in the reward pool        |
-        | worker_rewards    | per-worker capture amounts           |
-        | innovator_rewards | per-innovator adoption earnings      |
-        | operator_reward   | operator share                       |
-        | escrow_tx_hash    | authorize() transaction hash         |
-        | capture_tx_hashes | list of capture() transaction hashes |
-        | void_tx_hash      | void() transaction hash              |
-        | round_start       | ISO 8601 timestamp                   |
-        | round_end         | ISO 8601 timestamp                   |
-
-    Scenario: Round history respects retention limit
-      Given the retention is set to 100 rounds
-      And 150 rounds have completed
-      When GET /round/10 is called
-      Then a 404 is returned
-      When GET /round/51 is called
-      Then the round data is returned
-
-  Rule: Cumulative earnings are tracked per participant
-
-    Scenario: Worker cumulative earnings span multiple rounds
-      Given worker "0xAAA" earned 42 USDC in round 1
-      And worker "0xAAA" earned 35 USDC in round 2
-      And worker "0xAAA" earned 50 USDC in round 3
-      When GET /leaderboard?cumulative=true is called
-      Then worker "0xAAA" shows total earnings of 127 USDC
-
-    Scenario: Leaderboard is empty before first round completes
-      Given no rounds have completed yet
-      When GET /leaderboard is called
-      Then the response contains an empty workers list
-      And the response includes round_in_progress=true
diff --git a/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature b/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature
deleted file mode 100644
index 0a64940a..00000000
--- a/docs/issues/features/multi_tier_worker_discovery_with_fallback.feature
+++ /dev/null
@@ -1,47 +0,0 @@
-@discovery
-Feature: Multi-tier worker discovery with fallback
-  The coordinator discovers GPU workers through a prioritized
-  chain of discovery backends. If the preferred backend is
-  unavailable, it falls back to the next tier automatically.
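The tiered fallback and TTL caching this feature describes can be sketched as below. The backend names and the cache that ignores the skill key (a single filter is assumed) are illustrative choices, not the real client (the issue spec describes a Go `DiscoveryClient` interface; this Python sketch only models the control flow).

```python
import time

class DiscoveryError(Exception):
    pass

def discover_workers(backends, skill):
    # backends: ordered (name, query) pairs, highest priority first, e.g.
    # [("reth-indexer", q1), ("basescan", q2), ("8004scan", q3)].
    errors = []
    for name, query in backends:
        try:
            return name, query(skill)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise DiscoveryError("no discovery backend available: " + "; ".join(errors))

class CachedDiscovery:
    """Serve repeated queries from cache within the TTL to reduce API calls."""

    def __init__(self, backends, ttl=300.0):
        self.backends, self.ttl = backends, ttl
        self._result, self._at = None, float("-inf")

    def discover(self, skill):
        if time.monotonic() - self._at < self.ttl:
            return self._result  # no external API call within the TTL
        self._result = discover_workers(self.backends, skill)
        self._at = time.monotonic()
        return self._result
```

If every tier raises, the aggregated `DiscoveryError` carries each backend's failure reason, matching the "clear error" scenario; callers can then proceed with zero workers.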
-
-  Background:
-    Given the OASF skill filter is "devops_mlops/model_versioning"
-
-  Rule: Discovery uses the highest-priority available backend
-
-    Scenario: Coordinator uses Reth indexer when available
-      Given the reth-erc8004-indexer is deployed in the cluster
-      And the indexer has synced past the latest registration
-      When the coordinator discovers workers
-      Then the query goes to the Reth indexer API
-      And workers with the model_versioning skill are returned
-
-    Scenario: Coordinator falls back to BaseScan when indexer is down
-      Given the reth-erc8004-indexer is not deployed
-      And a BaseScan API key is configured
-      When the coordinator discovers workers
-      Then the query goes to the BaseScan API
-      And ERC-8004 NFT metadata is read for each agent
-      And workers with the model_versioning skill are returned
-
-    Scenario: Coordinator falls back to 8004scan as last resort
-      Given the reth-erc8004-indexer is not deployed
-      And no BaseScan API key is configured
-      When the coordinator discovers workers
-      Then the query goes to 8004scan.io
-      And workers with the model_versioning skill are returned
-
-    Scenario: All backends unavailable produces a clear error
-      Given no discovery backends are reachable
-      When the coordinator discovers workers
-      Then a "no discovery backend available" error is returned
-      And the round proceeds with zero workers
-
-  Rule: Discovery results are cached to reduce API calls
-
-    Scenario: Repeated queries within TTL use cached results
-      Given the cache TTL is 300 seconds
-      And a discovery query succeeded 60 seconds ago
-      When the coordinator discovers workers again
-      Then no external API call is made
-      And the cached results are returned
diff --git a/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature b/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature
deleted file mode 100644
index e4bc40b4..00000000
--- a/docs/issues/features/opow_influence_calculation_with_anti_monopoly_parity.feature
+++ /dev/null
@@ -1,92 +0,0 @@
-@opow @critical
-Feature: OPOW influence calculation with anti-monopoly parity
-  The reward engine computes per-worker influence using a parity
-  formula that penalizes concentration on a single challenge.
-  Workers must diversify across all active challenges to maximize
-  their earnings.
-
-  Background:
-    Given the imbalance multiplier is 3.0
-
-  Rule: Diversified workers earn more than concentrated workers
-
-    Scenario: Equally diversified worker has zero penalty
-      Given 4 active challenges
-      And worker "0xAAA" has qualifier fractions:
-        | challenge | fraction |
-        | c001      | 0.25     |
-        | c002      | 0.25     |
-        | c003      | 0.25     |
-        | c004      | 0.25     |
-      When influence is calculated
-      Then worker "0xAAA" imbalance is 0.0
-      And worker "0xAAA" penalty factor is 1.0
-
-    Scenario: Fully concentrated worker is severely penalized
-      Given 4 active challenges
-      And worker "0xBBB" has qualifier fractions:
-        | challenge | fraction |
-        | c001      | 1.00     |
-        | c002      | 0.00     |
-        | c003      | 0.00     |
-        | c004      | 0.00     |
-      When influence is calculated
-      Then worker "0xBBB" imbalance is 1.0
-      And the imbalance-multiplier product is 3.0
-      And worker "0xBBB" penalty factor is less than 0.05
-
-    Scenario: Concentrated worker earns less despite equal total output
-      Given 2 active challenges and a worker pool of 100 USDC
-      And worker "0xAAA" submitted 50 proofs to c001 and 50 to c002
-      And worker "0xBBB" submitted 100 proofs to c001 and 0 to c002
-      When influence is calculated and rewards are distributed
-      Then worker "0xAAA" earns more than worker "0xBBB"
-      And the ratio of earnings exceeds 5:1
-
-    Scenario Outline: Parity penalty scales with concentration
-      Given 2 active challenges
-      And a worker has qualifier fractions <f1> and <f2>
-      When influence is calculated
-      Then the penalty factor is approximately <penalty>
-
-      Examples:
-        | f1   | f2   | penalty |
-        | 0.50 | 0.50 | 1.00    |
-        | 0.70 | 0.30 | 0.62    |
-        | 0.90 | 0.10 | 0.15    |
-        | 1.00 | 0.00 | 0.05    |
-
-  Rule: Influence values are normalized across all workers
-
-    Scenario: Total influence sums to 1.0
-      Given 3 workers with varying qualifier fractions
-      When influence is calculated for all workers
-      Then the sum of all influence values equals 1.0
-
-    Scenario: Single worker in a round gets full influence
-      Given 1 worker who participated in all active challenges
-      When influence is calculated
-      Then that worker's influence is 1.0
-      And they receive the entire worker pool
-
-  Rule: Single-challenge rounds disable the parity penalty
-
-    Scenario: With only one active challenge all workers get zero imbalance
-      Given 1 active challenge
-      And worker "0xAAA" has qualifier fraction 0.8 in c001
-      And worker "0xBBB" has qualifier fraction 0.2 in c001
-      When influence is calculated
-      Then worker "0xAAA" imbalance is 0.0
-      And worker "0xBBB" imbalance is 0.0
-      And influence is proportional to qualifier count only
-
-  Rule: New challenges phase in gradually
-
-    Scenario: Newly added challenge does not immediately penalize existing workers
-      Given 2 active challenges c001 and c002
-      And challenge c003 is added with a phase-in period of 100 blocks
-      And worker "0xAAA" has proofs in c001 and c002 but not c003
-      When influence is calculated at block 10 of the phase-in
-      Then worker "0xAAA" receives a blended penalty
-      And the c003 weight is 10% of its final weight
-      And the penalty is less severe than after full phase-in
diff --git a/docs/issues/features/reward_pool_distribution_across_roles.feature b/docs/issues/features/reward_pool_distribution_across_roles.feature
deleted file mode 100644
index 3db33d85..00000000
--- a/docs/issues/features/reward_pool_distribution_across_roles.feature
+++ /dev/null
@@ -1,59 +0,0 @@
-@rewards
-Feature: Reward pool distribution across roles
-  The reward engine splits the pool among innovators, workers,
-  and operators according to configured percentages. Worker
-  distribution is influence-weighted. Innovator distribution
-  is adoption-weighted.
-
-  Background:
-    Given the pool split is 20% innovators, 70% workers, 10% operators
-    And a round with 100 USDC in the reward pool
-
-  Rule: Pool splits match configured percentages
-
-    Scenario: Standard round distributes to all three roles
-      When the round completes with verified workers
-      Then 20 USDC is allocated to innovators
-      And 70 USDC is allocated to workers
-      And 10 USDC is allocated to operators
-
-  Rule: Workers earn by influence
-
-    Scenario: Workers are paid proportionally to influence
-      Given the worker pool is 70 USDC
-      And worker "0xAAA" has influence 0.6
-      And worker "0xBBB" has influence 0.4
-      When worker rewards are distributed
-      Then worker "0xAAA" earns 42 USDC
-      And worker "0xBBB" earns 28 USDC
-
-  Rule: Innovators earn by adoption
-
-    Scenario: Algorithm author earns when workers adopt their code
-      Given the innovator pool is 20 USDC for the neuralnet_optimizer challenge
-      And algorithm "fast-muon-v3" by innovator "0xINN1" has 75% adoption
-      And algorithm "baseline-adamw" by innovator "0xINN2" has 25% adoption
-      When innovator rewards are distributed
-      Then innovator "0xINN1" earns 15 USDC
-      And innovator "0xINN2" earns 5 USDC
-
-    Scenario: Unadopted algorithm earns nothing
-      Given the innovator pool is 20 USDC
-      And algorithm "untested-v1" has 0% adoption
-      When innovator rewards are distributed
-      Then the author of "untested-v1" earns 0 USDC
-      And the unadopted share rolls into the next round
-
-  Rule: Gamma scaling adjusts for challenge count
-
-    Scenario Outline: Reward scales with number of active challenges
-      Given gamma parameters a=1.0, b=0.5, c=0.3
-      And <n> challenges are active
-      When the gamma value is calculated
-      Then the scaling factor is approximately <gamma>
-
-      Examples:
-        | n | gamma |
-        | 1 | 0.63  |
-        | 3 | 0.80  |
-        | 7 | 0.94  |
diff --git a/docs/issues/features/round_state_continuity.feature b/docs/issues/features/round_state_continuity.feature
deleted file mode 100644
index ce9d1adc..00000000
--- a/docs/issues/features/round_state_continuity.feature
+++ /dev/null
@@ -1,68 +0,0 @@
-@rounds @state
-Feature: Round-over-round state continuity
-  The reward pool, uncaptured funds, and participant state
-  carry over correctly between rounds. No funds are lost
-  or double-counted during transitions.
-
-  Background:
-    Given the autoresearch chart is deployed
-    And the reward pool percentage is 30%
-
-  Rule: Uncaptured funds roll into the next round
-
-    Scenario: Voided funds increase the next round's pool
-      Given round 1 had 100 USDC in the pool
-      And 70 USDC was captured to workers
-      And void() returned 30 USDC to the platform wallet
-      And 50 USDC of new x402 payments arrived during round 1
-      When round 2 begins
-      Then the pool for round 2 is 15 USDC from new payments plus the 30 USDC rollover
-      And authorize() locks 45 USDC in escrow
-
-    Scenario: Unadopted innovator share rolls into next round
-      Given round 1 had 20 USDC in the innovator pool
-      And algorithm "untested-v1" had 0% adoption
-      And 5 USDC of the innovator pool was unadopted
-      When round 2 begins
-      Then the unadopted 5 USDC is added to round 2's innovator pool
-
-  Rule: Round transitions are atomic
-
-    Scenario: No gap between rounds allows work to go unrecorded
-      Given round 1 is ending
-      And worker "0xAAA" submits a proof at the round boundary
-      When the round transitions
-      Then the proof is attributed to round 1 if submitted before the cutoff
-      Or attributed to round 2 if submitted after the cutoff
-      And the proof is never lost or double-counted
-
-    Scenario: Authorize for new round happens after void of previous round
-      Given round 1 is completing
-      When captures and void are executed for round 1
-      Then authorize() for round 2 is called only after void() confirms
-      And there is no period where two rounds have active escrow authorizations
-
-  Rule: Worker state resets each round
-
-    Scenario: Worker's influence is recalculated fresh each round
-      Given worker "0xAAA" had 80% influence in round 1
-      And worker "0xBBB" joins in round 2 with equal qualifier count
-      When round 2 influence is calculated
-      Then round 1 influence values have no effect
-      And both workers compete on round 2 qualifiers only
-
-    Scenario: Worker who was excluded in round N can rejoin in round N+1
-      Given worker "0xCCC" failed verification in round 3
-      And worker "0xCCC" received no capture in round 3
-      When round 4 begins
-      Then worker "0xCCC" is eligible to submit benchmarks
-      And their round 3 failure does not affect round 4 influence
-
-  Rule: Platform wallet balance is tracked across rounds
-
-    Scenario: Cumulative earnings are auditable from on-chain events
-      Given 5 rounds have completed
-      When the audit script reads all authorize/capture/void/reclaim events
-      Then the sum of all captures equals total worker + innovator + operator payouts
-      And the sum of all voids equals total uncaptured rollover
-      And the platform wallet balance matches expected remainder
diff --git a/docs/issues/issue-autoresearch-helm-chart.md b/docs/issues/issue-autoresearch-helm-chart.md
deleted file mode 100644
index 11b3f792..00000000
--- a/docs/issues/issue-autoresearch-helm-chart.md
+++ /dev/null
@@ -1,1315 +0,0 @@
-# Autoresearch infrastructure Helm chart with verified reward distribution
-
-## Summary
-
-Extract the autoresearch components from PR #288 into a standalone Helm chart that adds a round-based reward engine, commit-reveal work verification, and escrow-based reward settlement using the x402 Commerce Payments Protocol. This chart depends on the reth-erc8004-indexer (or its BaseScan/8004scan fallback) for worker discovery and on the base stack for x402 payment settlement.
-
-## Motivation
-
-PR #288 introduced the autoresearch coordinator, worker, and publish skills as embedded agent skills.
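The anti-monopoly parity formula exercised by the opow feature file deleted above can be reproduced numerically. The exact definitions below (imbalance as the squared coefficient of variation of a worker's qualifier fractions divided by n−1, and penalty = exp(−multiplier × imbalance), in the style of TIG's opow) are an assumption; they are chosen because they reproduce the scenario values (imbalance 1.0 for full concentration, penalties 1.00 / 0.62 / 0.15 / <0.05 across the Examples table).

```python
import math

def imbalance(fractions: list[float]) -> float:
    # CV^2 / (n - 1): 0.0 for an even split across challenges,
    # 1.0 for full concentration on a single challenge.
    n = len(fractions)
    if n <= 1:
        return 0.0  # single-challenge rounds disable the penalty
    mean = sum(fractions) / n
    var = sum((f - mean) ** 2 for f in fractions) / n
    return (var / mean**2) / (n - 1)

def penalty_factor(fractions: list[float], multiplier: float = 3.0) -> float:
    # Multiplies a worker's raw influence; diversification keeps it near 1.0.
    return math.exp(-multiplier * imbalance(fractions))
```

Under these assumptions a worker fully concentrated on one of four challenges has imbalance 1.0 and keeps under 5% of their influence, while an evenly split worker is unpenalized.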
This was the right starting point for validating the flow, but the economic layer — how workers get paid fairly, how results are verified, and how bad actors are penalized — is missing. - -Today's gaps: - -1. **Workers self-report val_bpb** — no independent verification of claimed results -2. **Direct 1:1 payment** — buyer pays seller, no reward pool or merit-based distribution -3. **No skin-in-the-game** — workers can submit garbage with no penalty -4. **Naive worker selection** — coordinator picks first available, not best performer -5. **No anti-monopoly** — a single well-resourced worker can capture all experiments -6. **Local provenance only** — results stored on disk, no on-chain attestation - -The autoresearch Helm chart addresses all six by adding infrastructure-level components that the skills can rely on. - -## Architecture - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ obol-stack cluster │ -│ │ -│ ┌─────────────────┐ ┌───────────────────────────────────────┐ │ -│ │ base chart │ │ autoresearch chart │ │ -│ │ │ │ │ │ -│ │ traefik │ │ ┌─────────────┐ ┌───────────────┐ │ │ -│ │ x402-verifier │ │ │ reward │ │ challenge │ │ │ -│ │ ollama │ │ │ engine │ │ registry │ │ │ -│ │ litellm │ │ │ (per-round │ │ (configmap) │ │ │ -│ │ │ │ │ OPOW calc) │ │ │ │ │ -│ └────────┬────────┘ │ └──────┬──────┘ └───────────────┘ │ │ -│ │ │ │ │ │ -│ │ x402 │ ┌──────▼──────┐ ┌───────────────┐ │ │ -│ │ payments │ │ verifier │ │ escrow round │ │ │ -│ │ │ │ (commit- │ │ manager │ │ │ -│ │ │ │ reveal │ │ (authorize/ │ │ │ -│ │ │ │ proofs) │ │ capture/void)│ │ │ -│ │ │ └─────────────┘ └───────────────┘ │ │ -│ │ │ │ │ -│ │ └───────────────────────────────────────┘ │ -│ │ │ │ -│ ┌────────▼────────────┐ ┌───────▼───────────┐ │ -│ │ discovery │ │ GPU workers │ │ -│ │ (reth-indexer / │ │ (ServiceOffers │ │ -│ │ BaseScan / │ │ with x402 gate) │ │ -│ │ 8004scan) │ │ │ │ -│ └─────────────────────┘ └────────────────────┘ │ 
-└─────────────────────────────────────────────────────────────────────┘ -``` - -### Component Responsibilities - -**Reward Engine** — runs per-round reward calculation: -- Reads x402 payment logs from the round period -- Reads worker qualifier data from the verifier -- Computes influence per worker using OPOW-style parity formula -- Instructs the escrow round manager on per-worker capture amounts -- Exposes leaderboard API (GET /leaderboard, GET /round/:id) - -**Challenge Registry** — ConfigMap defining active challenges: -- Challenge parameters (quality metric, difficulty, tracks) -- Instance generation seeds (deterministic from block hash + nonce) -- Quality thresholds and verification parameters -- Lifecycle: challenges added/retired via values.yaml updates - -**Verifier** — commit-reveal work verification: -- Workers submit Merkle root of results before knowing which will be sampled -- Verifier randomly samples N nonces for re-execution -- Workers submit Merkle proofs for sampled nonces -- Proofs verified against committed root — unverified workers receive no capture - -**Escrow Round Manager** — manages per-round USDC settlement via x402 Commerce Payments: -- At round start: calls authorize() to lock the round's reward pool in escrow -- At round end: calls capture() per worker for their earned amount -- Uncaptured funds: void() returns them to the pool for the next round -- Safety net: reclaim() recovers funds if the manager fails -- See "Escrow-Based Reward Settlement" section below for full design - -## Escrow-Based Reward Settlement - -### Why Not a Custom Escrow Smart Contract - -A bespoke `USDCEscrow.sol` for worker bond/slash would require: -- Writing, auditing, and deploying a novel Solidity contract -- Taking on security liability for custom code handling real funds -- Requiring workers to lock upfront capital (barrier to entry) -- Managing adversarial slash mechanics (workers may refuse to participate) -- Paying gas costs for deposit/withdraw/slash 
per worker per round - -This is the highest-risk, highest-cost path. Instead, we use infrastructure that already exists and is already audited. - -### The x402 Commerce Payments Protocol - -Base (Coinbase) maintains the Commerce Payments Protocol — a set of audited smart contracts for escrow-based payments. These contracts are: - -- **Already deployed** on Base Mainnet and Base Sepolia at deterministic addresses -- **Audited 5 times**: 3x by Coinbase Protocol Security, 2x by Spearbit -- **Battle-tested**: used by Coinbase Commerce for merchant payments -- **Zero deployment cost**: we call existing contracts, not deploy new ones - -#### Deployed Contract Addresses (same on mainnet + sepolia) - -| Contract | Address | Purpose | -|----------|---------|---------| -| AuthCaptureEscrow | `0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff` | Core escrow: authorize/capture/void/reclaim/refund | -| ERC3009PaymentCollector | `0x0E3dF9510de65469C4518D7843919c0b8C7A7757` | Collects USDC via ERC-3009 receiveWithAuthorization | -| Permit2PaymentCollector | `0x992476B9Ee81d52a5BdA0622C333938D0Af0aB26` | Collects USDC via Permit2 signatures | -| PreApprovalPaymentCollector | `0x1b77ABd71FCD21fbe2398AE821Aa27D1E6B94bC6` | Pre-approved payment collection | -| OperatorRefundCollector | `0x934907bffd0901b6A21e398B9C53A4A38F02fa5d` | Handles refund flows | - -References: -- Contracts repo: https://github.com/base/commerce-payments -- x402 escrow scheme spec (WIP): https://github.com/coinbase/x402/pull/1425 -- x402 escrow scheme issue: https://github.com/coinbase/x402/issues/839 -- Reference TypeScript implementation: https://github.com/BackTrackCo/x402r-scheme (npm: @x402r/evm) - -#### AuthCaptureEscrow Lifecycle Functions - -```solidity -// Core data structure — all fields are client-signed, tamper-proof -struct PaymentInfo { - address operator; // who can capture/void/refund - address payer; // reward pool wallet - address receiver; // worker address (set per-capture) - address token; // 
USDC contract address - uint120 maxAmount; // total pool for this round - uint48 preApprovalExpiry; // deadline to submit authorization - uint48 authorizationExpiry; // deadline to capture; after this payer reclaims - uint48 refundExpiry; // deadline for post-capture refunds - uint16 minFeeBps; // fee floor (basis points) - uint16 maxFeeBps; // fee ceiling (basis points) - address feeReceiver; // platform fee recipient - uint256 salt; // unique per-round nonce -} - -// Round start: lock USDC in escrow -function authorize( - PaymentInfo calldata paymentInfo, - uint120 amount, // amount to lock (up to maxAmount) - address tokenCollector, // ERC3009PaymentCollector address - bytes calldata collectorData // ERC-3009 receiveWithAuthorization signature -) external nonReentrant; - -// Round end: pay the RewardDistributor the total earned amount -function capture( - PaymentInfo calldata paymentInfo, - uint120 amount, // total pool to distribute - uint16 feeBps, // platform fee (within signed bounds) - address feeReceiver // platform fee recipient -) external nonReentrant; -// IMPORTANT: receiver is FIXED in PaymentInfo. All captures from one -// authorize() go to the SAME receiver. For multi-worker distribution, -// set receiver = RewardDistributor contract, then call distribute(). -// Can be called MULTIPLE TIMES (partial captures), sum <= maxAmount. -// Must be called BEFORE authorizationExpiry. - -// Round end: return uncaptured funds to pool -function void( - PaymentInfo calldata paymentInfo -) external nonReentrant; -// Returns ALL remaining escrowed funds (capturableAmount) to payer. -// Can be called AFTER partial captures — only returns what remains. -// Callable by operator at any time (no expiry gate on void itself). 
- -// Safety net: payer self-recovers if operator disappears -function reclaim( - PaymentInfo calldata paymentInfo -) external nonReentrant; -// Only callable AFTER authorizationExpiry has passed -// No operator needed — payer calls directly - -// Post-capture correction: return captured funds -function refund( - PaymentInfo calldata paymentInfo, - uint120 amount, - address refundCollector, - bytes calldata collectorData -) external nonReentrant; -// Only within refundExpiry window -// amount <= refundableAmount (previously captured) -``` - -Expiry ordering enforced by contract: `preApprovalExpiry <= authorizationExpiry <= refundExpiry` - -### Verified: receiver == payer Is Valid - -Option A sets `receiver = payer = platform_wallet` in PaymentInfo. This is -intentional and verified against the contract source: - -1. **AuthCaptureEscrow** (`_validatePayment()` at line 480) checks: - amount <= maxAmount, expiry ordering, fee bounds. It **never** checks - `payer != receiver`. No constraint exists. - -2. **ERC-3009** (`receiveWithAuthorization`) requires `to == msg.sender` - ("caller must be the payee" — Circle's FiatTokenV2/EIP3009.sol). But - `to` is set to the `ERC3009PaymentCollector` contract address, not to - `paymentInfo.receiver`. The collector is an intermediary: - - ``` - authorize: payer → collector (to==msg.sender ✓) → TokenStore (locked) - capture: TokenStore → receiver (== payer, standard ERC20 transfer) - distribute: payer wallet → worker1, worker2, ... (standard ERC20 transfers) - ``` - -3. **Capture to self** is just a TokenStore → platform_wallet ERC20 transfer. - No special-case logic, no revert condition. - -Source: `base/commerce-payments/src/AuthCaptureEscrow.sol` lines 236-295, -`src/collectors/ERC3009PaymentCollector.sol` lines 42-49, -`circlefin/stablecoin-evm/contracts/v2/EIP3009.sol` (receiveWithAuthorization). 
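-
-To make the three-point argument concrete, here is a minimal Python sketch of the `_validatePayment()` constraints listed above (field names mirror the Solidity struct; this is an illustration, not the contract code). Note the deliberate absence of any `payer != receiver` check:
-
-```python
-from dataclasses import dataclass
-
-@dataclass
-class PaymentInfo:
-    """Mirrors the Solidity PaymentInfo struct (illustrative only)."""
-    operator: str
-    payer: str
-    receiver: str
-    token: str
-    max_amount: int            # USDC base units (6 decimals)
-    pre_approval_expiry: int   # unix timestamps
-    authorization_expiry: int
-    refund_expiry: int
-    min_fee_bps: int
-    max_fee_bps: int
-
-def validate_payment(info: PaymentInfo, amount: int) -> None:
-    """Re-states the _validatePayment() constraints described above.
-
-    There is deliberately NO `payer != receiver` check: the contract
-    never enforces one, which is what makes Option A
-    (receiver == payer == platform_wallet) valid.
-    """
-    if amount > info.max_amount:
-        raise ValueError("amount exceeds maxAmount")
-    if not (info.pre_approval_expiry <= info.authorization_expiry <= info.refund_expiry):
-        raise ValueError("expiry ordering violated")
-    if not (0 <= info.min_fee_bps <= info.max_fee_bps <= 10_000):
-        raise ValueError("fee bounds violated")
-
-# receiver == payer passes validation, exactly as the contract source allows
-pool = PaymentInfo("0xOperatorEOA", "0xPlatformWallet", "0xPlatformWallet",
-                   "0xUSDC", 100_000_000, 1_700_000_000, 1_700_003_600,
-                   1_700_086_400, 0, 200)
-validate_payment(pool, 100_000_000)  # no exception
-```
-
-The on-chain contract enforces these same bounds at authorize() time; the sketch only demonstrates that `receiver == payer` violates none of them.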
- -### Inverted Trust Model — Why It's Better - -The Commerce Payments escrow protects the **payer** (the reward pool), not the service provider (the worker). This inversion is actually the superior design for our use case: - -``` -TRADITIONAL BOND MODEL (what we rejected): - Worker posts capital → Platform slashes on fraud → Worker loses money - Problems: - - Workers need upfront capital (barrier to entry) - - Slashing is adversarial (discourages participation) - - Custom contract (security liability) - - Gas per deposit/withdraw/slash - -INVERTED ESCROW MODEL (what we use): - Platform locks reward pool → Workers earn by doing verified work → Platform captures proportionally - Advantages: - - Workers need ZERO upfront capital - - "Penalty" for bad work = not getting paid (natural, non-adversarial) - - Uses 5x-audited contracts (zero security liability) - - Gas only for authorize (once/round) + capture (once/worker) - - reclaim() = platform safety net if manager crashes - - refund() = can recover funds if fraud discovered post-capture -``` - -The economic effect is equivalent: -- In the bond model: bad worker loses $X deposit -- In the escrow model: bad worker earns $0 while honest workers split their share -- Both create strong incentives for honest work, but the escrow model does it without requiring workers to risk capital - -### Per-Round Escrow Flow - -``` -ROUND START: - │ - │ 1. Reward engine calculates pool for this round - │ pool = sum(x402_payments_last_round) * pool_percentage - │ - │ 2. Platform wallet signs ERC-3009 receiveWithAuthorization - │ to: AuthCaptureEscrow (0xBdEA...0cff) - │ amount: pool_amount (e.g., 100 USDC) - │ validAfter: now - │ validBefore: now + round_duration + grace_period - │ - │ 3. 
Escrow round manager calls authorize() - │ maxAmount: pool_amount - │ authorizationExpiry: round_end + 1 hour (grace for computation) - │ refundExpiry: round_end + 24 hours (fraud discovery window) - │ operator: escrow_round_manager_address - │ - │ Funds are now LOCKED in TokenStore. Platform cannot spend them - │ on anything else. Workers can see the commitment on-chain. - │ - ▼ -DURING ROUND: - │ - │ Workers submit experiments via existing coordinator loop: - │ THINK → CLAIM → RUN → VERIFY (commit-reveal proofs) - │ - │ Verifier records qualifiers per worker per challenge. - │ No money moves during the round. - │ - ▼ -ROUND END: - │ - │ 4. Reward engine computes per-worker earnings: - │ influence[i] = OPOW_formula(qualifiers, parity_penalty) - │ reward[i] = worker_pool * influence[i] - │ - │ 5. Settlement via RewardDistributor: - │ NOTE: AuthCaptureEscrow receiver is FIXED per PaymentInfo. - │ All captures go to the SAME address. We set receiver = - │ RewardDistributor contract, then distribute to workers. - │ - │ a) capture(paymentInfo, worker_total + innovator_total + operator_total, - │ feeBps, feeReceiver) - │ → USDC moves from escrow to RewardDistributor - │ → Platform fee (e.g., 2%) to feeReceiver - │ b) RewardDistributor.distribute( - │ workers[], worker_amounts[], - │ innovators[], innovator_amounts[], - │ operator, operator_amount) - │ → ERC20 transfers from distributor to each participant - │ - │ 6. Return uncaptured funds: - │ void(paymentInfo) - │ → Remaining USDC (pool - captured) returns to platform - │ → void() has no expiry gate — callable any time by operator - │ → Rolls into next round's pool - │ - │ 7. If manager crashes or bugs out: - │ After authorizationExpiry, platform calls reclaim() - │ → ALL remaining escrowed funds return safely - │ → Reclaim does NOT affect already-captured amounts - │ → Already-captured funds are in RewardDistributor - │ (operator can call emergency withdraw if needed) - │ - ▼ -NEXT ROUND starts. Repeat from step 1. 
-``` - -### On-Chain Verification of Commitment - -Workers can independently verify that the reward pool is committed before doing work: - -``` -On-chain check (any worker, any block explorer): - 1. Read PaymentState for the current round's paymentInfo hash - 2. hasCollectedPayment == true → funds are locked - 3. capturableAmount == expected pool size → correct amount - 4. authorizationExpiry > current block → still active - -This makes the reward commitment credible and transparent without -any trust in the platform beyond the audited contract logic. -``` - -## x402-rs Implications - -### Current State - -x402-rs (v1.4.5) ships the `exact` and `upto` schemes but has **zero escrow support** today. No branch, no issue, no WIP code tracking the upstream escrow scheme. - -However, x402-rs has an excellent scheme extension system designed for exactly this kind of addition. - -### Scheme Extension Architecture - -x402-rs uses a trait-based plugin system for payment schemes: - -``` -Three core traits (from x402-rs/crates/core/): - - X402SchemeId - → identifies scheme: namespace ("eip155") + scheme name ("escrow") - - X402SchemeFacilitatorBuilder

- → factory that creates scheme handlers from JSON config - - X402SchemeFacilitator - → verify(payload, requirements) → VerifyResult - → settle(payload, requirements) → SettleResult - → supported(requirements) → bool - -Registration: - SchemeBlueprints registry → SchemeRegistry at runtime - New schemes register via: blueprints.and_register(V2Eip155Escrow) - -Config-driven activation: - {"id": "v2-eip155-escrow", "chains": "eip155:*", "config": {...}} -``` - -Reference: `x402-rs/docs/how-to-write-a-scheme.md` provides a step-by-step guide. - -### What x402-rs Needs for Escrow - -Once the upstream x402 escrow scheme spec (PR #1425) stabilizes, x402-rs needs: - -``` -New directory: - crates/chains/x402-chain-eip155/src/v2_eip155_escrow/ - ├── mod.rs # scheme registration - ├── types.rs # PaymentInfo, PaymentState, escrow extras - ├── facilitator.rs # verify + settle (authorize/capture/void) - ├── client.rs # sign ERC-3009 for escrow - └── server.rs # generate PaymentRequirements with escrow fields - -Key implementation points: - - verify(): - - Validate ERC-3009 signature for the authorized amount - - Simulate authorize() call against AuthCaptureEscrow - - Check operator, expiries, fee bounds match requirements - - Verify payer has sufficient USDC balance + allowance - - settle(): - - Determine settlement method from requirements.extra: - "authorize" → call operator.authorize() - "capture" → call operator.capture() - "void" → call operator.void() - - Submit transaction, wait for receipt (60s timeout) - - Return tx hash + network + payer address - - supported(): - - Check chain ID matches configured chains - - Check scheme name == "escrow" (or "commerce" if renamed) - -Registration (in facilitator/src/schemes.rs): - blueprints.and_register(V2Eip155Escrow) - // ~15 lines of boilerplate, same pattern as exact/upto -``` - -### Stateful vs Stateless Facilitator - -Current x402-rs facilitator is **stateless** — each verify/settle is independent. 
The escrow scheme introduces a **session concept** (authorize → use → capture/void) that spans multiple HTTP requests. - -Two approaches discussed in upstream x402 issue #839: - -``` -"Dumb facilitator" (recommended, aligns with x402-rs): - - Facilitator remains stateless - - Session tracking happens in the escrow round manager (our Helm chart) - - Facilitator only handles individual authorize/capture/void calls - - Each call is self-contained (paymentInfo hash identifies the session) - - No facilitator-side state storage needed - -"Smart facilitator" (rejected): - - Facilitator tracks session lifecycle internally - - Requires persistent state, adds complexity - - Goes against x402 principle of minimal facilitator trust -``` - -The "dumb facilitator" approach means x402-rs needs **no architectural changes** to its core — the escrow scheme handler is just another verify/settle implementation, same as exact. The session lifecycle lives in our escrow round manager, not in the facilitator. - -### Contribution Path - -This represents a concrete contribution opportunity for the obol-stack team back to x402-rs: - -``` -Phase 1: Use reference impl directly - - The BackTrackCo/x402r-scheme npm package implements the escrow - scheme for TypeScript x402 clients - - Our escrow round manager can call Commerce Payments contracts - directly via ethers/viem without going through x402-rs facilitator - - This works TODAY, no upstream dependency - -Phase 2: Port to x402-rs (contribute upstream) - - Once PR #1425 merges and the spec stabilizes - - Implement V2Eip155Escrow scheme in x402-rs - - Estimated effort: 2-3 days for a Rust developer - - The "upto" scheme implementation (variable settlement amounts) - is the closest analog and provides the template - - File path: crates/chains/x402-chain-eip155/src/v2_eip155_escrow/ - - Submit as PR to x402-rs/x402-rs - -Phase 3: Native x402 flow - - Once x402-rs ships escrow scheme support - - Escrow round manager uses x402 HTTP flow natively: 
- 402 response with scheme="escrow" → client signs → facilitator settles - - The entire authorize/capture/void flow goes through standard - x402 payment headers, same as current per-request payments - - Workers see escrow commitments as standard x402 PaymentRequirements -``` - -### Dependency Timeline - -``` -TODAY: PR #1425 is open, spec under review - Commerce Payments contracts are deployed and audited - Reference impl exists (BackTrackCo/x402r-scheme) - → We can build Phase 1 NOW - -WEEKS: PR #1425 merges (spec only, no SDK code) - → Spec is stable, safe to build Phase 2 - -MONTHS: x402-rs adds escrow scheme - → Phase 3, native flow - -Our Helm chart should work in ALL three phases: - values.yaml: - escrow: - mode: direct # Phase 1: call contracts directly - # mode: x402-rs # Phase 3: use x402 facilitator -``` - -## Reward Distribution — Detailed Design - -### Round Lifecycle - -``` -Round N starts - │ - ├─ Escrow round manager calls authorize() - │ └─ USDC locked in Commerce Payments escrow - │ - ├─ Workers submit experiments (x402 paid per-request as today) - │ └─ Each submission: precommit → benchmark → proof - │ - ├─ Verifier checks proofs, records qualifiers - │ - ├─ Round N ends (configurable duration, default: 1 hour) - │ - ├─ Reward engine runs: - │ │ - │ ├─ 1. Collect x402 payment totals from round - │ │ (total_pool = sum of payments * pool_percentage) - │ │ - │ ├─ 2. For each worker, compute challenge factors: - │ │ factor[c] = worker_qualifiers[c] / total_qualifiers[c] - │ │ - │ ├─ 3. Compute parity (anti-monopoly): - │ │ weighted_avg = mean(factors, weights) - │ │ variance = weighted_var(factors, weights) - │ │ imbalance = variance / (weighted_avg * (1 - weighted_avg)) - │ │ penalty = exp(-imbalance_multiplier * imbalance) - │ │ - │ ├─ 4. Compute influence: - │ │ weight[i] = weighted_avg[i] * penalty[i] - │ │ influence[i] = weight[i] / sum(weights) - │ │ - │ ├─ 5. 
Split pool: - │ │ innovator_pool = total_pool * innovator_share (e.g., 20%) - │ │ worker_pool = total_pool * worker_share (e.g., 70%) - │ │ operator_pool = total_pool * operator_share (e.g., 10%) - │ │ - │ ├─ 6. Settle via RewardDistributor: - │ │ capture(paymentInfo, total_distributable, feeBps, feeReceiver) - │ │ → single capture to RewardDistributor (receiver is fixed per PaymentInfo) - │ │ - │ └─ 7. Distribute from RewardDistributor: - │ RewardDistributor.distribute(workers[], amounts[], innovators[], ...) - │ → ERC20 transfers to each worker (influence-weighted) - │ → ERC20 transfers to each innovator (adoption-weighted, - │ only for algorithms above adoption_threshold or merged) - │ → Unadopted innovator share held in distributor for next round - │ - ├─ Escrow round manager calls void() - │ └─ Remaining USDC returns to pool wallet - │ - └─ Round N+1 starts -``` - -### Anti-Monopoly Formula - -> **Design note:** This formula is inspired by OPOW (Optimizable Proof of Work) -> research but is a **deliberate simplification** for our use case. It omits -> several mechanisms present in the full OPOW specification — specifically: -> challenge-factor weighting, capped self/delegated deposit factors, legacy -> track multipliers, and phase-in blending for new challenges. These are -> omitted because obol-stack uses USDC (not a staking token with weighted -> deposits) and starts with a small challenge set where these refinements -> add complexity without proportionate benefit. As the challenge set grows -> beyond 4+ challenges, these mechanisms should be revisited. - -Workers MUST participate across ALL active challenges to earn maximum rewards. 
Concentrating on a single challenge triggers an exponential penalty: - -``` -Given: - N challenges, worker i has qualifier fraction f[c] in each challenge c - w[c] = weight per challenge = 1/N (equal) - -Compute: - avg_i = Σ(w[c] * f_i[c]) - var_i = Σ(w[c] * (f_i[c] - avg_i)²) - imb_i = var_i / (avg_i * (1 - avg_i)) max value = 1.0 - pen_i = e^(-k * imb_i) where k = configurable multiplier - - influence_i = normalize(avg_i * pen_i) - -Effect (with k = 3.0): - Worker spreading effort across 4 challenges equally: - f = [0.25, 0.25, 0.25, 0.25] → imbalance = 0 → k*imb = 0 → penalty = 1.0 - - Worker concentrating on 1 challenge: - f = [1.0, 0.0, 0.0, 0.0] → imbalance = 1.0 → k*imb = 3.0 → penalty ≈ 0.05 - - The concentrated worker earns ~5% of what the diversified worker earns - despite producing the same total output. - -Verified penalty values for 2 challenges (k = 3.0): - f = [0.50, 0.50] → imb = 0.00 → penalty = 1.00 - f = [0.70, 0.30] → imb = 0.16 → penalty = 0.62 - f = [0.90, 0.10] → imb = 0.64 → penalty = 0.15 - f = [1.00, 0.00] → imb = 1.00 → penalty = 0.05 - -Note: with a single active challenge, imbalance is forced to 0.0 -(avg approaches 1.0, denominator approaches 0, clamped by epsilon). -All workers get penalty = 1.0 regardless. The parity mechanism -activates only with ≥2 challenges. -``` - -### Commit-Reveal Verification - -> **Design note:** This verification flow is inspired by OPOW proof-of-work -> verification but uses stratified sampling (above/below median quality -> regions) rather than flat random sampling. Flat sampling is easier to game -> by hiding low-quality work in unsampled bundles. Stratified sampling ensures -> both high-quality and low-quality results are checked. 
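-
-For concreteness, the parity math from the Anti-Monopoly Formula section can be checked with a short standalone Python sketch (an illustration, not the planned `opow.py`):
-
-```python
-import math
-
-def parity_penalty(fractions, k=3.0, eps=1e-9):
-    """Anti-monopoly parity penalty for one worker.
-
-    fractions: the worker's qualifier fraction f[c] per challenge,
-    with equal challenge weights w[c] = 1/N as in the formula above.
-    """
-    n = len(fractions)
-    w = 1.0 / n
-    avg = sum(w * f for f in fractions)
-    var = sum(w * (f - avg) ** 2 for f in fractions)
-    denom = max(avg * (1.0 - avg), eps)   # epsilon clamp: single challenge => imbalance 0
-    imb = min(var / denom, 1.0)           # clamped to max 1.0
-    return math.exp(-k * imb)
-
-# Reproduces the verified 2-challenge penalty table for k = 3.0:
-for f in ([0.50, 0.50], [0.70, 0.30], [0.90, 0.10], [1.00, 0.00]):
-    print(f, round(parity_penalty(f), 2))
-# [0.5, 0.5] 1.0 / [0.7, 0.3] 0.62 / [0.9, 0.1] 0.15 / [1.0, 0.0] 0.05
-```
-
-With k = 3.0 the fully concentrated worker keeps roughly 5% of its weight, matching the table above, and a single-challenge deployment yields penalty = 1.0 for everyone.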
- -``` -Step 1: PRECOMMIT - Worker → Verifier: {challenge_id, settings, num_nonces} - Verifier → Worker: {benchmark_id, rand_hash, track_id} - Worker pays: base_fee + per_nonce_fee * num_nonces (via x402 exact scheme) - -Step 2: BENCHMARK - Worker runs all nonces, builds Merkle tree over results - Worker → Verifier: {merkle_root, solution_quality[]} - Verifier computes median quality across all nonces - Verifier performs STRATIFIED sampling: - - samples_above_median nonces from the above-median quality set - - samples_below_median nonces from the below-median quality set - This ensures both high- and low-quality results are verified, - preventing workers from hiding bad results in one quality tier. - Verifier → Worker: {sampled_nonces: [above1..., below1...]} - -Step 3: PROOF - Worker → Verifier: {merkle_proofs for sampled nonces} - Each proof: {nonce, solution, runtime_hash, quality} - Verifier checks: - - proof.nonce ∈ sampled_nonces - - hash(proof) produces leaf in Merkle tree - - Merkle branch validates against committed root - - solution quality matches claimed quality (re-execute) - - If any proof fails → worker is NOT a qualifier → no capture for them - If worker times out on proofs → same result, no capture - - Note: no slashing occurs. The penalty is simply not earning. - This is enforced by the escrow — uncaptured funds void() back to pool. 
-``` - -## Helm Chart Structure - -``` -charts/autoresearch/ -├── Chart.yaml -├── values.yaml -├── templates/ -│ ├── _helpers.tpl -│ ├── reward-engine-deployment.yaml -│ ├── reward-engine-service.yaml -│ ├── verifier-deployment.yaml -│ ├── verifier-service.yaml -│ ├── escrow-round-manager-deployment.yaml -│ ├── reward-distributor-configmap.yaml # RewardDistributor contract address -│ ├── challenge-registry-configmap.yaml -│ ├── commerce-payments-configmap.yaml # AuthCaptureEscrow + collector addresses -│ ├── servicemonitor.yaml # Prometheus metrics -│ └── tests/ -│ ├── test-reward-engine.yaml -│ ├── test-verifier.yaml -│ └── test-escrow-round-manager.yaml -└── scripts/ - ├── rewards.py # OPOW influence + pool distribution - ├── opow.py # Parity formula, imbalance calculation - ├── verifier.py # Commit-reveal, Merkle proof checking - └── escrow_round_manager.py # authorize/capture/void lifecycle -``` - -### values.yaml - -```yaml -rounds: - duration: 3600 # 1 hour per round - overlap: 0 # no overlapping rounds - -rewards: - poolPercentage: 0.30 # 30% of x402 payments go to reward pool - distribution: - innovators: 0.20 # 20% to algorithm authors - workers: 0.70 # 70% to GPU workers (OPOW) - operators: 0.10 # 10% to infrastructure operators - gamma: # reward scaling by active challenges - a: 1.0 - b: 0.5 - c: 0.3 - imbalanceMultiplier: 3.0 # parity penalty strength - -verification: - samplesAboveMedian: 3 # nonces sampled from above-median quality - samplesBelowMedian: 2 # nonces sampled from below-median quality - minActiveQuality: 3.5 # val_bpb threshold to qualify (lower is better) - proofTimeout: 300 # seconds to submit proofs - adoptionThreshold: 0.01 # minimum adoption fraction to earn innovator rewards - -escrow: - mode: direct # direct | x402-rs (Phase 1 vs Phase 3) - chain: base-sepolia # chain for Commerce Payments contracts - # AuthCaptureEscrow — same address on mainnet + sepolia - authCaptureEscrow: "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff" - # ERC3009PaymentCollector — same
address on mainnet + sepolia - erc3009Collector: "0x0E3dF9510de65469C4518D7843919c0b8C7A7757" - # OperatorRefundCollector — same address on mainnet + sepolia - refundCollector: "0x934907bffd0901b6A21e398B9C53A4A38F02fa5d" - # RewardDistributor — deployed per-cluster, receives all captures - # then distributes to individual workers/innovators via ERC20 transfers. - # Required because AuthCaptureEscrow receiver is fixed per PaymentInfo. - rewardDistributor: "" # set after deploying the distributor contract - # USDC contract addresses - usdcAddress: - base: "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913" - base-sepolia: "0x036CbD53842c5426634e7929541eC2318f3dCF7e" - # Escrow timing - authorizationGrace: 3600 # 1 hour after round end to capture - refundWindow: 86400 # 24 hours post-capture for refunds - # Platform fee - feeBps: 200 # 2% platform fee on captures - # Operator wallet (calls capture/void) - operatorWallet: "" # set by operator, signs capture txs - -challenges: - - name: neuralnet_optimizer - qualityMetric: val_bpb - qualityDirection: lower_is_better - minActiveQuality: 3.5 - baseFee: "0.001" # USDC per benchmark precommit - perNonceFee: "0.0001" # USDC per nonce - oasfSkill: "devops_mlops/model_versioning" - tracks: - default: - noncesPerBundle: 10 - maxQualifiersPerTrack: 100 - -discovery: - preferredBackend: auto # auto | reth | basescan | 8004scan - rethIndexerUrl: "" # auto-detected from cluster if empty - basescanApiKey: "" - cacheTtlSeconds: 300 - -leaderboard: - enabled: true - port: 8080 - retentionRounds: 100 # keep last 100 rounds of history - -image: - repository: ghcr.io/obolnetwork/autoresearch - tag: latest -``` - -## Dependency Chain - -```yaml -# Chart.yaml -apiVersion: v2 -name: autoresearch -version: 0.1.0 -dependencies: - - name: reth-erc8004-indexer - version: ">=0.1.0" - repository: "file://../reth-erc8004-indexer" - condition: discovery.preferredBackend == "reth" # optional dependency -``` - -The autoresearch chart MUST work without the 
reth-indexer installed (falls back to BaseScan/8004scan). The indexer is a "nice to have" for operators who want real-time, self-hosted discovery. - -## Relationship to Existing Skills - -The embedded skills in PR #288 remain as the **agent-facing interface**. The Helm chart provides the **infrastructure** those skills rely on: - -``` -SKILL USES FROM CHART -───────────────────────────────────────────────────── -autoresearch-coordinator reward-engine API (leaderboard) - coordinate.py verifier API (commit-reveal) - discovery client (worker lookup) - -autoresearch-worker escrow round manager (round status) - worker_api.py verifier API (proof submission) - -autoresearch (publish) reward-engine API (earnings query) - publish.py (mostly unchanged) -``` - -The coordinator's loop becomes: - -``` -THINK → pick hypothesis -CLAIM → discover worker via FallbackClient -CHECK → verify round escrow is authorized (on-chain) ← NEW -RUN → submit experiment, pay via x402 exact scheme -VERIFY → commit-reveal proof cycle ← NEW -SCORE → verifier records qualifiers ← NEW -REWARD → round-end OPOW distribution via capture() ← NEW -``` - -## Test Plan - -### Unit tests - -- [ ] `opow.py`: imbalance calculation with known inputs -- [ ] `opow.py`: parity penalty with balanced vs concentrated workers -- [ ] `opow.py`: influence normalization sums to 1.0 -- [ ] `rewards.py`: pool split matches configured percentages -- [ ] `rewards.py`: adoption-weighted innovator distribution -- [ ] `rewards.py`: gamma scaling with 1, 3, 7 active challenges -- [ ] `verifier.py`: Merkle tree construction and proof verification -- [ ] `verifier.py`: random nonce sampling is deterministic from seed -- [ ] `verifier.py`: unverified workers excluded from qualifiers -- [ ] `escrow_round_manager.py`: authorize() call construction -- [ ] `escrow_round_manager.py`: capture() per-worker amount calculation -- [ ] `escrow_round_manager.py`: void() after all captures -- [ ] `escrow_round_manager.py`: reclaim() on manager 
timeout/crash - -### Integration tests - -- [ ] Full round lifecycle: authorize → precommit → benchmark → proof → capture → void -- [ ] Worker with valid proofs receives capture proportional to influence -- [ ] Worker with invalid proofs gets no capture, funds void back to pool -- [ ] Anti-monopoly: worker in 1/4 challenges earns <<< worker in 4/4 challenges -- [ ] Innovator whose algorithm is adopted earns proportional to adoption -- [ ] Leaderboard API returns correct rankings after round completion -- [ ] Round transitions correctly (no double-counting, no missed payments) -- [ ] reclaim() works after authorizationExpiry if manager fails mid-round -- [ ] refund() works within refundExpiry window for post-capture corrections -- [ ] On-chain escrow state matches expected capturableAmount at each step - -### Integration tests (Commerce Payments on Anvil fork) - -- [ ] authorize() locks correct USDC amount in TokenStore -- [ ] Multiple capture() calls to different workers succeed -- [ ] sum(captures) cannot exceed maxAmount -- [ ] capture() fails after authorizationExpiry -- [ ] void() returns remaining funds to payer -- [ ] reclaim() works only after authorizationExpiry -- [ ] refund() works within refundExpiry window -- [ ] ERC-3009 nonce prevents replay of authorization - -### BDD scenarios - -```gherkin -Scenario: Honest worker earns proportional reward - Given a round with 100 USDC authorized in escrow - And worker A has 60% influence and worker B has 40% - When the round completes and captures are executed - Then worker A receives 42 USDC (60% of 70% worker pool) - And worker B receives 28 USDC (40% of 70% worker pool) - And 30 USDC is captured for innovators (20%) and operators (10%) - And void() returns 0 USDC (fully distributed) - -Scenario: Concentrated worker is penalized - Given challenges X and Y are both active - And worker A submits 100 proofs to X and 100 to Y - And worker B submits 200 proofs to X and 0 to Y - When influence is calculated - Then 
worker A influence > worker B influence - And worker B penalty factor < 0.10 - -Scenario: Worker fails verification — no capture - Given worker C submits a benchmark with invalid Merkle proofs - When the verifier checks the proofs - Then worker C is excluded from qualifiers - And no capture() is called for worker C - And worker C's share remains in escrow - And void() returns worker C's unclaimed share to pool - -Scenario: Manager crash — funds are safe - Given a round with 100 USDC authorized in escrow - And the escrow round manager crashes mid-round - When authorizationExpiry passes - Then the platform wallet calls reclaim() - And all 100 USDC returns to the platform wallet - And no funds are permanently locked -``` - -## Migration from PR #288 - -Files that stay in PR #288 (base sell/buy flow): -- `cmd/obol/sell.go` — sell command -- `internal/inference/store.go` — provenance types -- `internal/schemas/payment.go` — x402 payment parsing -- `flows/flow-06,07,08,10` — sell/buy flow tests -- `ralph-m1.md` — sell flow validation - -Files that move to this PR: -- `internal/embed/skills/autoresearch-coordinator/` — coordinator skill -- `internal/embed/skills/autoresearch-worker/` — worker skill -- `internal/embed/skills/autoresearch/` — publish skill -- `ralph-m2.md` — autoresearch validation (becomes test plan reference) -- `Dockerfile.worker` — worker container image -- `tests/test_autoresearch_worker.py` — worker tests - -New files: -- `charts/autoresearch/` — full Helm chart as described above -- `internal/discovery/` — shared with indexer PR (or imported) - -## Open Questions - -1. **Round duration**: 1 hour seems right for autoresearch experiments (5-10 min each, gives workers time for multiple submissions). Need to validate with real GPU workloads. - -2. **Pool funding source**: Currently "30% of x402 payments." Alternative: fixed per-round emission funded by the operator (simpler but requires operator treasury). 
Or a hybrid — operator seeds the pool, x402 payments top it up. - -3. **Innovator identity**: How does an algorithm author register? Options: (a) anyone who submits a train.py that gets adopted, (b) explicit registration via ERC-8004 with algorithm metadata, (c) git-based — author identified by commit signature in provenance. - -4. **Multi-challenge readiness**: Currently only `neuralnet_optimizer` (val_bpb) exists. The parity formula needs ≥2 challenges to be meaningful. Second challenge candidates: inference latency optimization, model compression ratio, data preprocessing throughput. - -5. **x402 escrow scheme timeline**: PR #1425 is under active review but spec-only. The reference implementation exists at BackTrackCo/x402r-scheme. We should build Phase 1 (direct contract calls) now and add Phase 3 (native x402-rs flow) when the ecosystem catches up. This is a clear contribution opportunity for obol-stack back to x402-rs. - -6. **Operator wallet security**: The operator wallet can call capture/void/refund. It should be a multisig or hardware wallet, not a hot key stored in a Kubernetes Secret or ConfigMap. Consider integration with the existing Secure Enclave key support in obol-stack's sell command.
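For concreteness, a minimal Python sketch of the two scaling formulas the open questions and feature specs refer to. Both readings are assumptions inferred from the scenario tables, not confirmed implementations of `opow.py`/`rewards.py`: the parity penalty is read as `exp(-multiplier * CV^2 / (n - 1))`, where CV is the coefficient of variation of a worker's qualifier fractions (this reproduces the boundary rows of the scenario tables exactly; mid-table values are only approximate), and gamma is read as `a * (1 - b * exp(-c * n))`, which matches all three gamma examples.

```python
import math

def imbalance(fractions):
    """Squared coefficient of variation of qualifier fractions.

    Assumed reading: (1, 0, 0, 0) across 4 challenges yields 3.0,
    matching the "imbalance is 3.0" scenario; a uniform split yields 0.
    """
    n = len(fractions)
    mean = sum(fractions) / n
    if mean == 0:
        return 0.0
    var = sum((f - mean) ** 2 for f in fractions) / n
    return var / (mean ** 2)

def parity_penalty(fractions, multiplier=3.0):
    """Penalty factor in (0, 1]; 1.0 means no penalty (assumed formula)."""
    n = len(fractions)
    if n < 2:
        return 1.0  # parity is meaningless with a single challenge
    return math.exp(-multiplier * imbalance(fractions) / (n - 1))

def gamma(num_challenges, a=1.0, b=0.5, c=0.3):
    """Reward scaling by active challenge count (assumed formula)."""
    return a * (1.0 - b * math.exp(-c * num_challenges))
```

Under this reading, a fully concentrated worker across 4 challenges gets `exp(-3) ≈ 0.05`, a balanced worker gets exactly 1.0, and `gamma(1) ≈ 0.63`, `gamma(3) ≈ 0.80`, `gamma(7) ≈ 0.94`, consistent with the Examples tables.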
- -## References - -- Commerce Payments Protocol: https://github.com/base/commerce-payments -- x402 escrow scheme spec PR: https://github.com/coinbase/x402/pull/1425 -- x402 escrow scheme discussion: https://github.com/coinbase/x402/issues/839 -- Reference TypeScript implementation: https://github.com/BackTrackCo/x402r-scheme -- x402-rs scheme extension guide: https://github.com/x402-rs/x402-rs/blob/main/docs/how-to-write-a-scheme.md -- x402-rs facilitator scheme registration: https://github.com/x402-rs/x402-rs/tree/main/crates/facilitator/src -- AuthCaptureEscrow on BaseScan: https://basescan.org/address/0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff -- ERC-8004 Identity Registry on BaseScan: https://basescan.org/token/0x8004A169FB4a3325136EB29fA0ceB6D2e539a432 -- BaseScan ERC-8004 metadata announcement: https://x.com/etherscan/status/2037131140608434517 - -## Gherkin Feature Specifications - -The following `.feature` files define the executable BDD specifications for the autoresearch economic layer. Each feature covers one bounded area of behavior using declarative, domain-level language. Step definitions target the autoresearch Python scripts and the Commerce Payments contracts on an Anvil fork. - -### Feature: Escrow Round Lifecycle - -```gherkin -@escrow @critical -Feature: Escrow round lifecycle - The escrow round manager locks USDC in the Commerce Payments - AuthCaptureEscrow contract at the start of each round and - distributes earnings to verified workers at round end. 
- - Background: - Given the autoresearch chart is deployed on a k3s cluster - And an Anvil fork of Base Sepolia is running - And the platform wallet holds 1000 USDC - And the AuthCaptureEscrow contract is at "0xBdEA0D1bcC5966192B070Fdf62aB4EF5b4420cff" - And the reward pool percentage is 30% - - Rule: Funds must be locked before any work begins - - Scenario: Round starts with successful escrow authorization - Given 200 USDC of x402 payments were collected in the previous round - When a new round begins - Then the escrow round manager calls authorize() for 60 USDC - And the AuthCaptureEscrow capturableAmount equals 60 USDC - And the authorizationExpiry is set to round end plus 1 hour grace - And workers can verify the commitment on-chain - - Scenario: Round start fails when platform wallet has insufficient USDC - Given the platform wallet holds 0 USDC - When a new round begins - Then the escrow round manager logs an authorization failure - And no work is accepted for this round - And the previous round's uncaptured funds are not affected - - Rule: Workers are paid proportionally to verified influence - - Scenario: Two verified workers receive proportional captures - Given a round with 100 USDC authorized in escrow - And worker "0xAAA" has 60% influence - And worker "0xBBB" has 40% influence - And both workers passed commit-reveal verification - When the round completes - Then capture() is called for "0xAAA" with 42 USDC - And capture() is called for "0xBBB" with 28 USDC - And the platform fee receiver gets 2% of each capture - And void() is called for the remaining 30 USDC - And the remaining USDC returns to the platform wallet - - Scenario: Unverified worker receives no capture - Given a round with 100 USDC authorized in escrow - And worker "0xAAA" passed verification with 50% influence - And worker "0xCCC" failed commit-reveal verification - When the round completes - Then capture() is called for "0xAAA" with 35 USDC - And no capture() is called for "0xCCC" - And 
void() returns 65 USDC to the platform wallet - - Scenario: Round with no verified workers voids entirely - Given a round with 100 USDC authorized in escrow - And no workers submitted valid proofs - When the round completes - Then void() is called - And the full 100 USDC returns to the platform wallet - - Rule: Funds are always recoverable - - Scenario: Platform reclaims funds after manager crash - Given a round with 100 USDC authorized in escrow - And the escrow round manager process has crashed - When the authorizationExpiry passes - Then the platform wallet calls reclaim() directly - And the full 100 USDC returns to the platform wallet - And no operator signature is required - - Scenario: Operator refunds a worker after post-capture fraud discovery - Given worker "0xAAA" received a 42 USDC capture in round 5 - And fraud is discovered within the refund window - When the operator calls refund() for 42 USDC - Then 42 USDC returns to the platform wallet - And the refund is recorded in the round history -``` - -### Feature: OPOW Influence and Anti-Monopoly - -```gherkin -@opow @critical -Feature: OPOW influence calculation with anti-monopoly parity - The reward engine computes per-worker influence using a parity - formula that penalizes concentration on a single challenge. - Workers must diversify across all active challenges to maximize - their earnings. 
- - Background: - Given the imbalance multiplier is 3.0 - - Rule: Diversified workers earn more than concentrated workers - - Scenario: Equally diversified worker has zero penalty - Given 4 active challenges - And worker "0xAAA" has qualifier fractions: - | challenge | fraction | - | c001 | 0.25 | - | c002 | 0.25 | - | c003 | 0.25 | - | c004 | 0.25 | - When influence is calculated - Then worker "0xAAA" imbalance is 0.0 - And worker "0xAAA" penalty factor is 1.0 - - Scenario: Fully concentrated worker is severely penalized - Given 4 active challenges - And worker "0xBBB" has qualifier fractions: - | challenge | fraction | - | c001 | 1.00 | - | c002 | 0.00 | - | c003 | 0.00 | - | c004 | 0.00 | - When influence is calculated - Then worker "0xBBB" imbalance is 3.0 - And worker "0xBBB" penalty factor is less than 0.05 - - Scenario: Concentrated worker earns less despite equal total output - Given 2 active challenges and a worker pool of 100 USDC - And worker "0xAAA" submitted 50 proofs to c001 and 50 to c002 - And worker "0xBBB" submitted 100 proofs to c001 and 0 to c002 - When influence is calculated and rewards are distributed - Then worker "0xAAA" earns more than worker "0xBBB" - And the ratio of earnings exceeds 5:1 - - Scenario Outline: Parity penalty scales with concentration - Given 2 active challenges - And a worker has qualifier fractions <f1> and <f2> - When influence is calculated - Then the penalty factor is approximately <penalty> - - Examples: - | f1 | f2 | penalty | - | 0.50 | 0.50 | 1.00 | - | 0.70 | 0.30 | 0.65 | - | 0.90 | 0.10 | 0.11 | - | 1.00 | 0.00 | 0.05 | - - Rule: Influence values are normalized across all workers - - Scenario: Total influence sums to 1.0 - Given 3 workers with varying qualifier fractions - When influence is calculated for all workers - Then the sum of all influence values equals 1.0 - - Scenario: Single worker in a round gets full influence - Given 1 worker who participated in all active challenges - When influence is calculated - Then that
worker's influence is 1.0 - And they receive the entire worker pool -``` - -### Feature: Commit-Reveal Work Verification - -```gherkin -@verification @critical -Feature: Commit-reveal work verification - Workers commit to results via a Merkle root before learning - which nonces will be sampled. This prevents retroactive - fabrication of results. - - Background: - Given the verifier is running - And the neuralnet_optimizer challenge is active - And the sample count is 5 nonces per benchmark - - Rule: Honest workers pass verification - - Scenario: Worker with valid proofs becomes a qualifier - Given worker "0xAAA" precommits a benchmark with 100 nonces - And the verifier assigns a random hash and track - When worker "0xAAA" submits a Merkle root over 100 results - And the verifier samples 5 nonces for verification - And worker "0xAAA" submits valid Merkle proofs for all 5 - Then worker "0xAAA" is recorded as a qualifier - And the benchmark quality scores are accepted - - Scenario: Re-execution confirms claimed quality - Given worker "0xAAA" claims val_bpb of 3.2 for nonce 42 - When the verifier re-executes nonce 42 with the same settings - Then the re-executed val_bpb matches the claimed 3.2 - And the proof is accepted - - Rule: Dishonest workers fail verification - - Scenario: Invalid Merkle proof is rejected - Given worker "0xCCC" submitted a Merkle root - And the verifier sampled nonces [7, 23, 45, 61, 89] - When worker "0xCCC" submits a proof for nonce 23 that does not match the root - Then the verification fails for worker "0xCCC" - And worker "0xCCC" is excluded from qualifiers for this round - And no escrow capture is made for worker "0xCCC" - - Scenario: Worker who inflates quality scores is caught - Given worker "0xCCC" claims val_bpb of 2.8 for nonce 42 - When the verifier re-executes nonce 42 with the same settings - And the re-executed val_bpb is 3.5 - Then the quality mismatch is detected - And the verification fails for worker "0xCCC" - - Scenario: 
Worker who times out on proof submission is excluded - Given worker "0xCCC" submitted a Merkle root - And the verifier sampled 5 nonces - When worker "0xCCC" does not submit proofs within 300 seconds - Then worker "0xCCC" is excluded from qualifiers - And the round proceeds without them - - Rule: Sampling is fair and deterministic - - Scenario: Nonce sampling is deterministic from the round seed - Given the same benchmark settings and random hash - When nonces are sampled twice - Then the same 5 nonces are selected both times - - Scenario: Worker cannot predict which nonces will be sampled - Given the random hash is derived from a future block hash - When the worker commits their Merkle root - Then the sampled nonces have not yet been determined -``` - -### Feature: Reward Pool Distribution - -```gherkin -@rewards -Feature: Reward pool distribution across roles - The reward engine splits the pool among innovators, workers, - and operators according to configured percentages. Worker - distribution is influence-weighted. Innovator distribution - is adoption-weighted. 
- - Background: - Given the pool split is 20% innovators, 70% workers, 10% operators - And a round with 100 USDC in the reward pool - - Rule: Pool splits match configured percentages - - Scenario: Standard round distributes to all three roles - When the round completes with verified workers - Then 20 USDC is allocated to innovators - And 70 USDC is allocated to workers - And 10 USDC is allocated to operators - - Rule: Workers earn by influence - - Scenario: Workers are paid proportionally to influence - Given the worker pool is 70 USDC - And worker "0xAAA" has influence 0.6 - And worker "0xBBB" has influence 0.4 - When worker rewards are distributed - Then worker "0xAAA" earns 42 USDC - And worker "0xBBB" earns 28 USDC - - Rule: Innovators earn by adoption - - Scenario: Algorithm author earns when workers adopt their code - Given the innovator pool is 20 USDC for the neuralnet_optimizer challenge - And algorithm "fast-muon-v3" by innovator "0xINN1" has 75% adoption - And algorithm "baseline-adamw" by innovator "0xINN2" has 25% adoption - When innovator rewards are distributed - Then innovator "0xINN1" earns 15 USDC - And innovator "0xINN2" earns 5 USDC - - Scenario: Unadopted algorithm earns nothing - Given the innovator pool is 20 USDC - And algorithm "untested-v1" has 0% adoption - When innovator rewards are distributed - Then the author of "untested-v1" earns 0 USDC - And the unadopted share rolls into the next round - - Rule: Gamma scaling adjusts for challenge count - - Scenario Outline: Reward scales with number of active challenges - Given gamma parameters a=1.0, b=0.5, c=0.3 - And <n> challenges are active - When the gamma value is calculated - Then the scaling factor is approximately <gamma> - - Examples: - | n | gamma | - | 1 | 0.63 | - | 3 | 0.80 | - | 7 | 0.94 | -``` - -### Feature: Worker Discovery and Fallback - -```gherkin -@discovery -Feature: Multi-tier worker discovery with fallback - The coordinator discovers GPU workers through a prioritized - chain of
discovery backends. If the preferred backend is - unavailable, it falls back to the next tier automatically. - - Background: - Given the OASF skill filter is "devops_mlops/model_versioning" - - Rule: Discovery uses the highest-priority available backend - - Scenario: Coordinator uses Reth indexer when available - Given the reth-erc8004-indexer is deployed in the cluster - And the indexer has synced past the latest registration - When the coordinator discovers workers - Then the query goes to the Reth indexer API - And workers with the model_versioning skill are returned - - Scenario: Coordinator falls back to BaseScan when indexer is down - Given the reth-erc8004-indexer is not deployed - And a BaseScan API key is configured - When the coordinator discovers workers - Then the query goes to the BaseScan API - And ERC-8004 NFT metadata is read for each agent - And workers with the model_versioning skill are returned - - Scenario: Coordinator falls back to 8004scan as last resort - Given the reth-erc8004-indexer is not deployed - And no BaseScan API key is configured - When the coordinator discovers workers - Then the query goes to 8004scan.io - And workers with the model_versioning skill are returned - - Scenario: All backends unavailable produces a clear error - Given no discovery backends are reachable - When the coordinator discovers workers - Then a "no discovery backend available" error is returned - And the round proceeds with zero workers - - Rule: Discovery results are cached to reduce API calls - - Scenario: Repeated queries within TTL use cached results - Given the cache TTL is 300 seconds - And a discovery query succeeded 60 seconds ago - When the coordinator discovers workers again - Then no external API call is made - And the cached results are returned -``` - -### Feature: End-to-End Round - -```gherkin -@e2e @slow -Feature: End-to-end autoresearch round - A complete round from escrow authorization through worker - experiments to reward distribution and 
settlement. - - Background: - Given the autoresearch chart is deployed with default values - And an Anvil fork of Base Sepolia is running - And the platform wallet holds 500 USDC - And 2 GPU workers are registered on ERC-8004: - | address | skill | gpu | - | 0xW001 | devops_mlops/model_versioning | NVIDIA T4 | - | 0xW002 | devops_mlops/model_versioning | NVIDIA A10 | - And 1 innovator submitted algorithm "muon-opt-v2" for neuralnet_optimizer - - Scenario: Complete round with two honest workers - # Round setup - Given 100 USDC of x402 payments were collected in the previous round - When a new round begins - Then 30 USDC is authorized in escrow - - # Worker experiments - When worker "0xW001" precommits a benchmark with 50 nonces - And worker "0xW002" precommits a benchmark with 50 nonces - And both workers submit Merkle roots over their results - And the verifier samples 5 nonces from each worker - And both workers submit valid Merkle proofs - Then both workers are recorded as qualifiers - - # Reward calculation - When the round duration expires - Then the reward engine computes influence for both workers - And both workers have balanced challenge participation - And influence is split proportionally to qualifier count - - # Settlement - When captures are executed - Then worker "0xW001" receives their earned USDC via capture() - And worker "0xW002" receives their earned USDC via capture() - And innovator "muon-opt-v2" receives adoption-weighted USDC - And the operator receives 10% of the pool - And void() returns any remainder to the platform wallet - And the leaderboard API shows both workers with correct earnings - And the next round begins with a new authorization - - Scenario: Round where one worker submits fraudulent proofs - Given 100 USDC of x402 payments were collected - When a new round begins - Then 30 USDC is authorized in escrow - - When worker "0xW001" submits valid proofs for all sampled nonces - And worker "0xW002" submits a proof with a quality 
mismatch - Then worker "0xW001" is a qualifier - And worker "0xW002" is excluded - - When captures are executed - Then worker "0xW001" receives the entire worker pool share - And worker "0xW002" receives nothing - And void() returns worker "0xW002"'s unclaimed share to the platform -``` - -## Labels - -`component:autoresearch` `component:rewards` `component:x402` `priority:high` `size:XL` diff --git a/docs/issues/issue-reth-erc8004-indexer-helm-chart.md b/docs/issues/issue-reth-erc8004-indexer-helm-chart.md deleted file mode 100644 index 13217e69..00000000 --- a/docs/issues/issue-reth-erc8004-indexer-helm-chart.md +++ /dev/null @@ -1,277 +0,0 @@ -# Extract reth-erc8004-indexer into standalone Helm chart with discovery fallback - -## Summary - -Carve the `reth-erc8004-indexer/` component out of PR #288 into its own PR with a dedicated Helm chart, proper test coverage, and a multi-tier discovery architecture that can fall back to Etherscan/BaseScan's native ERC-8004 metadata support when a full Reth node is impractical. - -## Motivation - -The reth-erc8004-indexer currently lives as a loose directory in the repo with a Dockerfile but no Helm chart, no CI, and no tests beyond ralph-m3's manual validation checklist. The autoresearch coordinator (and any future service that needs agent discovery) depends on a working ERC-8004 query API, but today that dependency is either: - -1. **8004scan.io** — a third-party centralized API we don't control -2. **reth-erc8004-indexer** — a custom Reth binary that requires syncing an entire Base L2 node - -Neither option is great for all deployment scenarios. Meanwhile, **Etherscan/BaseScan announced native ERC-8004 metadata display** (operational status, x402 support, services) on NFT detail pages. This creates a third discovery tier that's reliable, free, and doesn't require running infrastructure. 
- -The indexer should ship as a properly tested, independently installable Helm chart that the autoresearch chart (and others) can declare as an optional dependency. - -## Scope - -### In scope - -- [ ] Move `reth-erc8004-indexer/` and `Dockerfile.reth-erc8004-indexer` into a self-contained Helm chart at `charts/reth-erc8004-indexer/` -- [ ] Implement 3-tier discovery fallback in the coordinator's discovery client -- [ ] Integration tests for the indexer API surface -- [ ] CI pipeline for building the Reth binary image -- [ ] Document deployment scenarios (full node, lightweight, external-only) - -### Out of scope - -- Autoresearch reward engine or OPOW mechanics (separate issue) -- Changes to the autoresearch coordinator loop logic -- ERC-8004 registration/minting changes - -## Architecture: 3-Tier Discovery - -The coordinator and any other discovery consumer should attempt sources in priority order: - -``` -Priority 1: Internal Reth Indexer (self-hosted, real-time) - │ - │ OBOL_INDEXER_API_URL=http://reth-indexer:3400 - │ Latency: <100ms, block-level freshness - │ Cost: runs a full Base L2 node (~500GB disk, ongoing sync) - │ - ▼ if unavailable or not deployed -Priority 2: BaseScan / Etherscan API (hosted, reliable) - │ - │ BASESCAN_API_URL=https://api.basescan.org/api - │ BASESCAN_API_KEY= - │ Latency: <500ms, near real-time - │ Cost: free tier = 5 calls/sec, Pro = 100K calls/day - │ Coverage: ERC-8004 metadata now displayed natively - │ - agent operational status - │ - x402 support flag - │ - registered services list - │ - NFT detail page with full metadata - │ - ▼ if unavailable or no API key -Priority 3: 8004scan.io (community, best-effort) - │ - │ SCAN_API_URL=https://www.8004scan.io/api/v1/public - │ Latency: <1s, minutes behind chain - │ Cost: free, no key required - │ Risk: third-party, no SLA - │ - ▼ if all unavailable - Error: no discovery backend available -``` - -### BaseScan Integration Details - -As of March 26, 2026, BaseScan displays ERC-8004 
metadata on NFT detail pages: -- Contract: `0x8004A169FB4a3325136EB29fA0ceB6D2e539a432` (18,512 holders, 45,198 transfers) -- Each agent NFT page shows: operational status, x402 support, services, metadata -- BaseScan API can query token holders, transfer events, and read contract state - -The BaseScan adapter needs to: -1. Query NFT holders of the Identity Registry contract -2. For each token ID, read the metadata URI via `tokenURI(tokenId)` -3. Fetch the off-chain registration JSON from the metadata URI -4. Filter by OASF skill/domain taxonomy (same as 8004scan queries) -5. Cache results with configurable TTL (default: 5 minutes) - -This is more work per query than the indexer (N+1 calls vs single query), but it requires **zero infrastructure** and uses a highly reliable API. - -### Discovery Client Interface - -```go -// internal/discovery/discovery.go - -type Agent struct { - TokenID string - ChainID uint64 - Owner string - Name string - Endpoint string - Skills []string - Domains []string - Metadata map[string]interface{} - X402Support bool -} - -type DiscoveryClient interface { - ListAgents(ctx context.Context, opts ListOptions) ([]Agent, error) - SearchAgents(ctx context.Context, query string, limit int) ([]Agent, error) - GetAgent(ctx context.Context, chainID uint64, tokenID string) (*Agent, error) - Health(ctx context.Context) error -} - -type ListOptions struct { - Skill string // OASF skill filter, e.g. "devops_mlops/model_versioning" - Domain string // OASF domain filter - ChainID uint64 // filter by chain - Limit int - SortBy string // "registered_at", "name" -} -``` - -Three implementations: `RethIndexerClient`, `BaseScanClient`, `EightKScanClient`. -A `FallbackClient` wraps all three and tries in priority order. 
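The priority-order logic is simple enough to sketch compactly. The snippet below is an illustrative Python model of the same behavior (the real implementation is the Go `FallbackClient` above); the backend objects are hypothetical stand-ins for `RethIndexerClient`, `BaseScanClient`, and `EightKScanClient`, and the method shape is assumed, not the actual Go API.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    token_id: str
    skills: list = field(default_factory=list)

class DiscoveryError(Exception):
    """Raised when every discovery tier has failed."""

class FallbackClient:
    """Tries each backend in priority order: Reth indexer, BaseScan, 8004scan.

    Each backend is any object with a list_agents(skill) method that raises
    on failure (tier not deployed, missing API key, network error, ...).
    """
    def __init__(self, backends):
        self.backends = backends

    def list_agents(self, skill):
        errors = []
        for backend in self.backends:
            try:
                return backend.list_agents(skill)
            except Exception as exc:
                # Record why this tier failed, then fall through to the next.
                errors.append(f"{type(backend).__name__}: {exc}")
        raise DiscoveryError(
            "no discovery backend available: " + "; ".join(errors)
        )
```

The first healthy tier wins; only when every tier raises does the caller see the "no discovery backend available" error the feature spec requires, with per-tier failure reasons preserved for debugging.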
- -### Cluster Topology - -``` -┌─────────────────────────────────────────────────────────────┐ -│ obol-stack cluster (k3d / k3s) │ -│ │ -│ ┌─────────────────────────┐ ┌──────────────────────┐ │ -│ │ reth-erc8004-indexer │ │ autoresearch chart │ │ -│ │ (optional Helm chart) │ │ (depends on discovery)│ │ -│ │ │ │ │ │ -│ │ ┌─────────┐ ┌────────┐ │ │ coordinator │ │ -│ │ │ Reth │ │ SQLite │ │ │ │ │ │ -│ │ │ ExEx │→│ WAL │ │ │ ▼ │ │ -│ │ │ (Base) │ │ store │ │ │ FallbackClient │ │ -│ │ └─────────┘ └────────┘ │ │ ├→ RethIndexer? │ │ -│ │ │ │ │ ├→ BaseScan? │ │ -│ │ ┌────▼──────┐ │ │ └→ 8004scan? │ │ -│ │ │ REST API │◄────────│───│─── GET /agents?skill= │ │ -│ │ │ :3400 │ │ │ │ │ -│ │ └───────────┘ │ └──────────────────────┘ │ -│ └─────────────────────────┘ │ -│ │ -│ OR (lightweight mode) │ -│ │ -│ ┌──────────────────────┐ │ -│ │ autoresearch chart │ │ -│ │ │ ┌──────────────────────┐ │ -│ │ FallbackClient ─────│────→│ api.basescan.org │ │ -│ │ (no indexer needed) │ │ (ERC-8004 metadata │ │ -│ │ │ │ natively supported) │ │ -│ └──────────────────────┘ └──────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -## Helm Chart Structure - -``` -charts/reth-erc8004-indexer/ -├── Chart.yaml -├── values.yaml -├── templates/ -│ ├── _helpers.tpl -│ ├── statefulset.yaml # Reth + ExEx + API in one pod -│ ├── service.yaml # ClusterIP on port 3400 (API) + 30303 (P2P) -│ ├── pvc.yaml # Persistent volume for chain data + SQLite -│ ├── configmap.yaml # Reth config (Base chain, ExEx params) -│ ├── servicemonitor.yaml # Prometheus metrics (optional) -│ └── tests/ -│ └── test-api.yaml # Helm test: curl /health + /api/v1/public/stats -└── README.md -``` - -### values.yaml (key fields) - -```yaml -replicaCount: 1 - -image: - repository: ghcr.io/obolnetwork/reth-erc8004-indexer - tag: latest - -reth: - chain: base - dataDir: /data/reth - syncMode: full # full | archive - httpPort: 8545 - p2pPort: 30303 - -indexer: - apiPort: 3400 - dbPath: 
/data/indexer.db - identityRegistry: "0x8004A169FB4a3325136EB29fA0ceB6D2e539a432" - reputationRegistry: "0x8004BAa17C55a88189AE136b182e5fdA19dE9b63" - -persistence: - enabled: true - size: 500Gi # Base L2 chain data - storageClass: "" # use cluster default - -resources: - requests: - cpu: 2 - memory: 8Gi - limits: - cpu: 4 - memory: 16Gi -``` - -## Test Plan - -### Unit tests (Rust) - -- [ ] `storage.rs`: insert/query/update/delete agents in SQLite -- [ ] `storage.rs`: pagination, sorting, search with LIKE/FTS -- [ ] `indexer.rs`: parse `Registered`, `URIUpdated`, `MetadataSet` event logs -- [ ] `indexer.rs`: handle reorgs (rollback indexed data on chain reorg) -- [ ] `api.rs`: response shape matches 8004scan API contract - -### Integration tests (against running instance) - -- [ ] `/health` returns 200 with sync status -- [ ] `/api/v1/public/agents` returns paginated list -- [ ] `/api/v1/public/agents?protocol=OASF&search=model_versioning` filters correctly -- [ ] `/api/v1/public/agents/{chain_id}/{token_id}` returns single agent with full metadata -- [ ] `/api/v1/public/stats` returns registry statistics -- [ ] Response shapes are wire-compatible with 8004scan (coordinator works against both) - -### Discovery fallback tests - -- [ ] FallbackClient uses Reth indexer when available -- [ ] FallbackClient falls back to BaseScan when indexer is down -- [ ] FallbackClient falls back to 8004scan when BaseScan has no API key -- [ ] FallbackClient returns error when all three are unavailable -- [ ] BaseScan adapter correctly reads ERC-8004 NFT metadata via token API -- [ ] Cache TTL is respected (no redundant API calls within window) - -### Autoresearch-specific tests - -- [ ] Coordinator discovers workers with `devops_mlops/model_versioning` skill via each tier -- [ ] Coordinator reads `best_val_bpb` from worker metadata via each tier -- [ ] Coordinator probes discovered workers via x402 (402 response = alive) -- [ ] End-to-end: register worker → indexer picks up → 
coordinator discovers → probe succeeds - -## Migration from PR #288 - -Files to move into this PR: - -``` -reth-erc8004-indexer/ → charts/reth-erc8004-indexer/src/ (or keep at root with chart alongside) -Dockerfile.reth-erc8004-indexer → charts/reth-erc8004-indexer/Dockerfile -ralph-m3.md → reference for test plan, then remove -``` - -New files: - -``` -charts/reth-erc8004-indexer/ → Helm chart (as above) -internal/discovery/ → Go discovery client with fallback -internal/discovery/discovery.go → interface + FallbackClient -internal/discovery/reth.go → RethIndexerClient -internal/discovery/basescan.go → BaseScanClient -internal/discovery/eightkscan.go → EightKScanClient (8004scan) -internal/discovery/*_test.go → tests for each -``` - -## Acceptance Criteria - -1. `helm install indexer charts/reth-erc8004-indexer` deploys and syncs on a k3s cluster with Base chain -2. Coordinator discovers workers via the indexer with zero code changes to coordinate.py (SCAN_API_URL points to indexer) -3. When indexer is not installed, coordinator automatically falls back to BaseScan or 8004scan -4. All tests in the test plan pass in CI -5. Docker image builds in CI and publishes to ghcr.io/obolnetwork/reth-erc8004-indexer - -## Labels - -`component:indexer` `component:discovery` `priority:high` `size:L` diff --git a/docs/specs/ARCHITECTURE.md b/docs/specs/ARCHITECTURE.md deleted file mode 100644 index a6405da0..00000000 --- a/docs/specs/ARCHITECTURE.md +++ /dev/null @@ -1,966 +0,0 @@ -# Obol Stack -- Architecture Document - -> **Version:** 1.0.0 -> **Date:** 2026-03-27 -> **Companion to:** [SPEC.md](./SPEC.md) -> **Audience:** Seasoned developers, agentic workflows, and system integrators. - ---- - -## Table of Contents - -1. [Design Philosophy](#1-design-philosophy) -2. [C4 Diagrams](#2-c4-diagrams) -3. [Module Decomposition](#3-module-decomposition) -4. [Data Flow Diagrams](#4-data-flow-diagrams) -5. [Storage Architecture](#5-storage-architecture) -6.
[Deployment Model](#6-deployment-model) -7. [Network Topology](#7-network-topology) -8. [Security Architecture](#8-security-architecture) - ---- - -## 1. Design Philosophy - -Five guiding principles govern every architectural decision in Obol Stack. They are listed in order of precedence -- when principles conflict, the higher-numbered principle yields to the lower. - -### 1.1 Local-First Sovereignty - -The operator's machine is the source of truth. All infrastructure runs inside a local k3d/k3s cluster, all state lives on the local filesystem under XDG-compliant paths, and no cloud account is required to start. Public exposure (Cloudflare tunnels) is opt-in and layered on top, never a prerequisite. This ensures the operator retains full custody of keys, models, and data at all times. - -*SPEC cross-ref: Section 1.3 (System Constraints), Section 2.3 (Configuration Hierarchy).* - -### 1.2 Configuration-Driven Infrastructure - -Infrastructure is declared, not scripted. Two-stage templating (CLI flags to Go templates to Helmfile to Kubernetes manifests) ensures that every deployed resource traces back to a versioned configuration file. Embedded assets (`internal/embed/`) ship default configurations; operators override via flags or values files. Helmfile is the single deployment orchestrator -- there are no imperative `kubectl apply` calls in the steady-state path. - -*SPEC cross-ref: Section 3.3.3 (Two-Stage Templating), Section 2.1 (High-Level Overview).* - -### 1.3 Payment-Gated by Default - -Every publicly exposed service is protected by x402 micropayments unless explicitly exempted. The ForwardAuth pattern means Traefik itself enforces payment before traffic ever reaches the upstream. This is not an afterthought bolt-on -- the payment gate is a first-class infrastructure primitive deployed alongside the service via the ServiceOffer reconciliation loop. 
- -*SPEC cross-ref: Section 3.4 (Monetize -- Sell Side), Section 4.1 (x402 Payment Protocol).* - -### 1.4 Bounded Trust, Bounded Spending - -The system minimizes trust surfaces at every layer. The buy-side sidecar has zero signer access; it can only spend pre-signed vouchers, bounding maximum loss to N * price. The sell-side verifier delegates to an external facilitator for settlement, never holding funds. Wallet private keys live in encrypted keystores or hardware enclaves, accessed only through a remote-signer REST API. RBAC scopes the agent to exactly the Kubernetes verbs it needs. - -*SPEC cross-ref: Section 7.2 (Payment Security), Section 7.3 (Wallet Security), Section 7.5 (RBAC).* - -### 1.5 Progressive Disclosure - -A single `obol stack up` gives operators a working cluster with auto-configured LLM routing, an AI agent, and local blockchain access. Advanced features -- selling inference, buying remote models, on-chain registration, Secure Enclave keys -- are activated incrementally through explicit CLI commands. Failures in optional subsystems (Ollama down, no cloud API key, tunnel unavailable) degrade gracefully with warnings, never blocking the core startup path. - -*SPEC cross-ref: Section 3.1.3 (Startup Sequence), Section 8.2 (Graceful Degradation).* - ---- - -## 2. C4 Diagrams - -### 2.1 Context Diagram (Level 1) - -The system boundary is the operator's machine. External systems interact via well-defined protocols. - -```mermaid -C4Context - title Obol Stack -- System Context - - Person(operator, "Operator", "Manages cluster via obol CLI") - Person(buyer, "Remote Buyer", "Purchases inference via x402") - - System(obol, "Obol Stack", "Local k3d/k3s cluster with AI agent,
payment-gated inference, blockchain networks") - - System_Ext(cloudflare, "Cloudflare", "Tunnel service for public exposure") - System_Ext(facilitator, "x402 Facilitator", "Payment verification and settlement
(facilitator.x402.rs)") - System_Ext(base, "Base L2", "ERC-8004 identity registry,
USDC settlement (Base Sepolia / Mainnet)") - System_Ext(ollama, "Ollama", "Local LLM inference engine
(host process)") - System_Ext(chainlist, "ChainList API", "Public RPC endpoint discovery") - System_Ext(cloud_llm, "Cloud LLM Providers", "Anthropic, OpenAI APIs") - - Rel(operator, obol, "Manages via CLI") - Rel(buyer, obol, "x402 payments over HTTPS") - Rel(obol, cloudflare, "HTTPS/QUIC tunnel") - Rel(obol, facilitator, "HTTPS POST /verify") - Rel(obol, base, "JSON-RPC (contract calls)") - Rel(obol, ollama, "HTTP /api/tags, /v1/*") - Rel(obol, chainlist, "HTTPS GET") - Rel(obol, cloud_llm, "HTTPS /v1/*") -``` - -### 2.2 Container Diagram (Level 2) - -Inside the k3d cluster, containers are organized by namespace. Each namespace represents a deployment unit with distinct responsibilities. - -```mermaid -C4Container - title Obol Stack -- Container Diagram (k3d Cluster) - - Person(operator, "Operator") - Person(buyer, "Remote Buyer") - - System_Boundary(cluster, "k3d / k3s Cluster") { - - Container(traefik, "Traefik Gateway", "Gateway API", "Ingress controller.
Routes local and public traffic.
ForwardAuth to x402-verifier.") - Container(cloudflared, "cloudflared", "Cloudflare Tunnel", "Exposes public routes
to the internet.") - Container(storefront, "Storefront", "busybox httpd", "Static landing page
at tunnel hostname root.") - - Container(litellm, "LiteLLM", "Python, port 4000", "OpenAI-compatible proxy.
Routes to Ollama, cloud,
or paid/* via sidecar.") - Container(x402_buyer, "x402-buyer", "Go sidecar, port 8402", "Buy-side payment attachment.
Pre-signed ERC-3009 auths.
Runs in LiteLLM Pod.") - - Container(x402_verifier, "x402-verifier", "Go, port 8080", "ForwardAuth middleware.
Route matching, 402 responses,
facilitator delegation.") - - Container(agent, "OpenClaw Agent", "Python", "AI agent singleton.
Skills via PVC injection.
monetize.py reconciler.") - Container(remote_signer, "Remote Signer", "REST API, port 9000", "Keystore-backed signing.
In-namespace access only.") - - Container(erpc, "eRPC", "Go, port 4000", "Blockchain RPC gateway.
Multiplexes upstreams,
caches eth_call.") - Container(frontend, "Frontend", "React, nginx", "Dashboard UI.
Local-only (obol.stack).") - Container(prometheus, "Prometheus", "Monitoring", "Metrics collection.
ServiceMonitor + PodMonitor.") - } - - System_Ext(ollama, "Ollama (Host)") - System_Ext(facilitator, "x402 Facilitator") - System_Ext(internet, "Public Internet") - - Rel(operator, traefik, "HTTP :80 / :443") - Rel(internet, cloudflared, "HTTPS/QUIC") - Rel(cloudflared, traefik, "HTTP") - Rel(buyer, cloudflared, "HTTPS") - - Rel(traefik, x402_verifier, "ForwardAuth POST /verify") - Rel(traefik, litellm, "/services/* (after 200)") - Rel(traefik, frontend, "/ (obol.stack)") - Rel(traefik, erpc, "/rpc (obol.stack)") - Rel(traefik, storefront, "/ (tunnel hostname)") - - Rel(litellm, ollama, "ollama/* models") - Rel(litellm, x402_buyer, "paid/* models :8402") - Rel(x402_buyer, internet, "x402 payment + request") - Rel(x402_verifier, facilitator, "POST /verify") - - Rel(agent, litellm, "Inference requests") - Rel(agent, remote_signer, "Sign transactions :9000") -``` - -### 2.3 Component Diagram -- Monetize Subsystem (Level 3) - -The monetize subsystem is the most architecturally complex part of Obol Stack. It spans the CLI, a Kubernetes CRD, a Python reconciler, Traefik middleware, and the x402-verifier. - -```mermaid -C4Component - title Monetize Subsystem -- Component Diagram - - Container_Boundary(cli_boundary, "obol CLI") { - Component(sell_cmd, "sell.go", "urfave/cli", "Parses flags, validates input,
creates ServiceOffer CR,
triggers tunnel activation.") - Component(schemas_pkg, "schemas/", "Go", "ServiceOffer struct definitions,
payment validation,
price approximation.") - } - - Container_Boundary(agent_boundary, "OpenClaw Agent Pod") { - Component(reconciler, "monetize.py", "Python", "6-stage reconciliation loop.
Watches ServiceOffer CRs.
Creates child resources with
ownerReferences for GC.") - } - - Container_Boundary(k8s_boundary, "Kubernetes API") { - Component(serviceoffer_crd, "ServiceOffer CRD", "obol.org/v1alpha1", "Declarative sell-side API.
Spec: type, model, upstream,
payment, path, registration.") - Component(middleware, "Traefik Middleware", "traefik.io", "ForwardAuth middleware
pointing to x402-verifier.") - Component(httproute, "HTTPRoute", "gateway.networking.k8s.io", "Public route:
/services//*
No hostname restriction.") - Component(pricing_cm, "x402-pricing ConfigMap", "x402 ns", "Route rules: pattern,
price, wallet, chain.") - Component(registration, "Registration Resources", "traefik ns", "ConfigMap + httpd + HTTPRoute
for /.well-known/ and /skill.md") - } - - Container_Boundary(verifier_boundary, "x402-verifier") { - Component(verifier_core, "Verifier", "Go", "ForwardAuth handler.
Route matching, 402 generation,
facilitator delegation.") - Component(watcher, "WatchConfig", "Go", "Polls pricing YAML every 5s.
Atomic config swap on change.") - Component(matcher, "Matcher", "Go", "First-match route resolution.
Exact, prefix, glob patterns.") - } - - Rel(sell_cmd, serviceoffer_crd, "kubectl apply") - Rel(sell_cmd, schemas_pkg, "Validate + build CR") - Rel(reconciler, serviceoffer_crd, "Watch (10s loop)") - Rel(reconciler, middleware, "Stage 3: Create") - Rel(reconciler, pricing_cm, "Stage 3: Patch routes[]") - Rel(reconciler, httproute, "Stage 4: Create") - Rel(reconciler, registration, "Stage 5: Create") - Rel(watcher, pricing_cm, "Poll mtime (5s)") - Rel(watcher, verifier_core, "Atomic Reload()") - Rel(verifier_core, matcher, "Match URI to route") -``` - ---- - -## 3. Module Decomposition - -Every Go package, its purpose, key dependencies, and SPEC cross-references. - -### 3.1 CLI Layer - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `cmd/obol` | CLI entry point and command definitions | `main.go`, `sell.go`, `network.go`, `openclaw.go`, `model.go`, `bootstrap.go`, `update.go` | `urfave/cli/v3`, all `internal/` packages | 4.2 | - -### 3.2 Core Infrastructure - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/config` | XDG-compliant configuration resolution | `config.go` | (stdlib only) | 2.3 | -| `internal/stack` | Cluster lifecycle (init, up, down, purge) | `stack.go`, `backend.go`, `backend_k3d.go`, `backend_k3s.go` | `config`, `embed`, `model`, `openclaw`, `tunnel`, `agent`, `dns` | 3.1 | -| `internal/embed` | Embedded assets (infrastructure, networks, skills) | `embed.go` | `embed` (stdlib) | 2.1, 3.6.2 | -| `internal/kubectl` | Kubernetes API wrapper with auto-KUBECONFIG | `kubectl.go` | `config` | 2.3 | -| `internal/ui` | Terminal UI (spinners, prompts, branded output) | `ui.go`, `spinner.go`, `prompt.go`, `brand.go`, `errors.go`, `exec.go`, `output.go`, `suggest.go` | (stdlib, terminal libs) | 8.1 | -| `internal/version` | Build version information | `version.go` | (stdlib only) | -- | -| 
`internal/update` | Self-update and dependency management | `update.go`, `github.go`, `charts.go`, `hint.go` | `version` | -- | -| `internal/dns` | Local DNS resolver for `obol.stack` hostname | `resolver.go` | (stdlib only) | 3.7.5 | - -### 3.3 LLM and Inference - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/model` | LiteLLM gateway configuration and provider management | `model.go` | `config`, `kubectl` | 3.2 | -| `internal/inference` | Standalone x402 inference gateway (bare metal / VM) | `gateway.go`, `container.go`, `store.go`, `client.go`, `enclave_middleware.go` | `enclave`, `tee`, `x402` | 3.9 | -| `internal/enclave` | Apple Secure Enclave key management (P-256, ECIES) | `enclave.go`, `enclave_darwin.go`, `enclave_stub.go`, `ecies.go`, `sysctl_darwin.go` | (CGo / Security.framework on macOS) | 3.9.6 | -| `internal/tee` | TEE attestation (TDX, SNP, Nitro) and key management | `tee.go`, `key.go`, `coco.go`, `verify.go`, `attest_*.go` | (platform-specific) | 3.9.6, 7.4 | - -### 3.4 Monetize (Sell Side) - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/x402` | x402-verifier: ForwardAuth, route matching, config hot-reload | `verifier.go`, `config.go`, `matcher.go`, `watcher.go`, `setup.go`, `validate.go`, `metrics.go` | `x402-go` (library) | 3.4.4, 3.4.5 | -| `internal/schemas` | ServiceOffer CRD types, payment validation, pricing math | `serviceoffer.go`, `payment.go`, `registration.go` | (stdlib only) | 3.4.3, 5.3 | -| `internal/embed/skills/monetize/` | Python reconciler (`monetize.py`) for 6-stage ServiceOffer reconciliation | `monetize.py`, `SKILL.md` | Kubernetes Python client | 3.4.2 | - -### 3.5 Monetize (Buy Side) - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/x402/buyer` | 
x402-buyer sidecar: reverse proxy, pre-signed auth pool, state tracking | `proxy.go`, `signer.go`, `config.go`, `state.go`, `metrics.go` | `x402-go` | 3.5 | -| `cmd/x402-buyer` | Sidecar binary entry point | `main.go` | `internal/x402/buyer` | 3.5 | -| `internal/embed/skills/buy-inference/` | Agent skill for discovery and purchasing remote inference | `SKILL.md`, `scripts/buy.py` | Python, Kubernetes client | 3.5.2 | - -### 3.6 Identity and Blockchain - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/erc8004` | ERC-8004 Identity Registry client (register, metadata, URI) | `client.go`, `types.go`, `abi.go` | `go-ethereum` | 3.8 | -| `internal/network` | Blockchain RPC gateway management (eRPC, ChainList, local nodes) | `network.go`, `erpc.go`, `rpc.go`, `chainlist.go`, `resolve.go`, `parser.go` | `config`, `kubectl`, `embed` | 3.3 | - -### 3.7 Agent and Tunnel - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/openclaw` | OpenClaw agent deployment, wallet generation, version management | `openclaw.go`, `wallet.go`, `resolve.go` | `config`, `embed`, `kubectl` | 3.6 | -| `internal/agent` | Agent RBAC patching and singleton management | `agent.go` | `kubectl` | 7.5 | -| `internal/tunnel` | Cloudflare tunnel lifecycle (quick/dns modes, storefront, URL propagation) | `tunnel.go`, `state.go`, `provision.go`, `cloudflare.go`, `login.go`, `agent.go`, `stackid.go` | `config`, `kubectl` | 3.7 | - -### 3.8 Applications - -| Module | Purpose | Key Files | Dependencies | SPEC Section | -|--------|---------|-----------|-------------|-------------| -| `internal/app` | Helm chart application management (install, sync, list, delete) | `app.go`, `chart.go`, `artifacthub.go`, `metadata.go`, `resolve.go` | `config`, `kubectl`, `embed` | 4.2 | - ---- - -## 4. 
Data Flow Diagrams - -### 4.1 Stack Initialization and Startup - -This diagram traces the full lifecycle from `obol stack init` through `obol stack up` to a running cluster with all services operational. - -```mermaid -sequenceDiagram - participant Op as Operator - participant CLI as obol CLI - participant Cfg as internal/config - participant Emb as internal/embed - participant Back as Backend (k3d/k3s) - participant HF as Helmfile - participant LLM as autoConfigureLLM - participant OC as OpenClaw Setup - participant Ag as agent.Init - participant Tun as Tunnel - - Note over Op,Tun: Phase 1: Initialization (obol stack init) - - Op->>CLI: obol stack init [--backend k3d] - CLI->>Cfg: Resolve paths (XDG / env / dev mode) - CLI->>CLI: Generate petname cluster ID - CLI->>CLI: Persist .stack-id, .stack-backend - CLI->>Back: Init(cfg, stackID) - Note over Back: Generate k3d.yaml / k3s config
Resolve Ollama host for backend - CLI->>Emb: Copy infrastructure defaults to $CONFIG_DIR/defaults/ - Note over Emb: Template substitution:
{{OLLAMA_HOST}}, {{OLLAMA_HOST_IP}}, {{CLUSTER_ID}} - - Note over Op,Tun: Phase 2: Cluster Startup (obol stack up) - - Op->>CLI: obol stack up - CLI->>Back: Up(cfg, stackID) - Back-->>CLI: kubeconfig bytes - CLI->>CLI: Write $CONFIG_DIR/kubeconfig.yaml - - Note over Op,Tun: Phase 3: Infrastructure Deployment - - CLI->>HF: syncDefaults() -- helmfile sync - Note over HF: Deploys in order:
1. Traefik (GatewayClass + Gateway)
2. eRPC
3. LiteLLM + x402-buyer sidecar
4. x402-verifier
5. Monitoring (Prometheus)
6. Frontend
7. cloudflared
8. ServiceOffer CRD + RBAC - - Note over Op,Tun: Phase 4: Auto-Configuration - - CLI->>LLM: autoConfigureLLM() - LLM->>LLM: Query Ollama /api/tags (host) - LLM->>LLM: Detect cloud API keys (env vars) - LLM->>LLM: Read ~/.openclaw/openclaw.json (agent model) - LLM->>LLM: Patch litellm-config ConfigMap - LLM->>LLM: Patch litellm-secrets Secret - LLM->>LLM: Single LiteLLM restart - - CLI->>OC: SetupDefault() - Note over OC: Deploy singleton agent
Inject skills PVC
($DATA_DIR/openclaw-/openclaw-data/) - - CLI->>Ag: agent.Init() -- patchMonetizeBinding() - Note over Ag: Ensure ClusterRoleBinding subjects
include openclaw SA - - Note over Op,Tun: Phase 5: Tunnel Activation - - CLI->>Tun: Check tunnel state ($CONFIG_DIR/tunnel/cloudflared.json) - alt DNS tunnel provisioned (persistent hostname) - CLI->>Tun: EnsureRunning() - Tun->>Tun: Propagate URL to agent, frontend, storefront - else Quick tunnel (default) - Note over Tun: Dormant -- activates on first obol sell - end - - CLI-->>Op: Stack ready -``` - -*SPEC cross-ref: Section 3.1.2 (Operations), Section 3.1.3 (Startup Sequence), Section 3.2.4 (Logic).* - -### 4.2 Sell-Side: ServiceOffer Creation to Public Route - -This traces the complete path from an operator running `obol sell http` through the 6-stage reconciliation loop to a publicly accessible, payment-gated route. - -```mermaid -sequenceDiagram - participant Op as Operator - participant CLI as obol sell http - participant Val as schemas/ (validation) - participant K8s as Kubernetes API - participant Tun as Tunnel - participant Rec as monetize.py (Reconciler) - participant Ver as x402-verifier - participant TF as Traefik - participant Reg as ERC-8004 Registry - - Op->>CLI: obol sell http myapi --wallet 0x... --chain base-sepolia --price 0.001 --upstream svc --port 8080 --namespace ns - - CLI->>Val: Validate chain, price, wallet, upstream - Val-->>CLI: ServiceOffer struct - - CLI->>K8s: Create ServiceOffer CR (openclaw-obol-agent ns) - CLI->>Tun: EnsureTunnelForSell() - Note over Tun: Start quick tunnel if dormant
or verify DNS tunnel running - - Note over Rec: Reconciliation loop runs every 10 seconds - - rect rgb(240, 248, 255) - Note over Rec: Stage 1: ModelReady - Rec->>K8s: Read ServiceOffer spec.model - Rec->>Rec: Validate model exists (inference type)
or skip (HTTP type) - Rec->>K8s: Update condition ModelReady=True - end - - rect rgb(240, 255, 240) - Note over Rec: Stage 2: UpstreamHealthy - Rec->>K8s: GET upstream.service:port/healthPath - Rec->>K8s: Update condition UpstreamHealthy=True - end - - rect rgb(255, 248, 240) - Note over Rec: Stage 3: PaymentGateReady - Rec->>K8s: Create Traefik Middleware (ForwardAuth -> x402-verifier) - Rec->>K8s: Patch x402-pricing ConfigMap (add route rule) - Note over Ver: WatchConfig detects mtime change (5s poll) - Ver->>Ver: Atomic Reload() with new routes - Rec->>K8s: Update condition PaymentGateReady=True - end - - rect rgb(248, 240, 255) - Note over Rec: Stage 4: RoutePublished - Rec->>K8s: Create HTTPRoute /services/myapi/* (no hostname restriction) - Note over TF: Route live: /services/myapi/* -> ForwardAuth -> upstream - Rec->>K8s: Update condition RoutePublished=True - end - - rect rgb(255, 255, 240) - Note over Rec: Stage 5: Registered - Rec->>K8s: Create registration ConfigMap (agent-registration.json) - Rec->>K8s: Create httpd Deployment + Service - Rec->>K8s: Create HTTPRoute for /.well-known/ and /skill.md - Rec->>Reg: Mint ERC-8004 NFT (via remote-signer) - Rec->>K8s: Update condition Registered=True, status.agentId - end - - rect rgb(240, 255, 255) - Note over Rec: Stage 6: Ready - Rec->>K8s: Set status.endpoint = tunnel_url/services/myapi - Rec->>K8s: Update condition Ready=True - end - - Op->>Op: obol sell status myapi -> Ready -``` - -*SPEC cross-ref: Section 3.4.2 (Sell-Side Flow), Section 3.4.3 (ServiceOffer CRD), Section 3.4.4 (x402-verifier).* - -### 4.3 Buy-Side: Discovery to Paid Inference - -This traces the agent's journey from discovering a remote seller to making paid inference requests through the local LiteLLM gateway. 
- -```mermaid -sequenceDiagram - participant Agent as OpenClaw Agent - participant Buy as buy.py - participant Seller as Remote Seller - participant K8s as Kubernetes API - participant LiteLLM as LiteLLM :4000 - participant Sidecar as x402-buyer :8402 - - Note over Agent,Sidecar: Phase 1: Discovery - - Agent->>Buy: buy.py probe - Buy->>Seller: GET /services//v1/models - Seller-->>Buy: 402 PaymentRequired + PaymentRequirements JSON - Note over Buy: Extract: price, wallet, chain, asset,
available models - - Buy-->>Agent: Probe results (price, models) - - Note over Agent,Sidecar: Phase 2: Purchase (Pre-sign Authorizations) - - Agent->>Buy: buy.py buy --seller --model --count N - Buy->>Buy: Generate N random nonces (32 bytes each) - Buy->>Buy: Sign N ERC-3009 TransferWithAuthorization
(via remote-signer at :9000) - Note over Buy: Each auth: {from, to, value, validAfter,
validBefore, nonce, signature} - - Buy->>K8s: Create/Patch x402-buyer-config ConfigMap
(upstream URL, model, chain, price) - Buy->>K8s: Create/Patch x402-buyer-auths ConfigMap
(array of pre-signed auths) - - Note over Sidecar: Config watcher detects change
Mutex-guarded Reload() rebuilds handlers - - Note over Agent,Sidecar: Phase 3: Paid Inference - - Agent->>LiteLLM: POST /v1/chat/completions
model: "paid/" - LiteLLM->>LiteLLM: Route paid/* -> openai/* -> :8402/v1 - LiteLLM->>Sidecar: POST /v1/chat/completions
model: "" - - Sidecar->>Sidecar: Resolve model -> upstream handler - Sidecar->>Seller: POST /services//v1/chat/completions - Seller-->>Sidecar: 402 PaymentRequired - - Sidecar->>Sidecar: Pop pre-signed auth from pool (mutex) - Sidecar->>Sidecar: Build X-PAYMENT header (base64 PaymentPayload) - Sidecar->>Seller: Retry with X-PAYMENT header - Seller-->>Sidecar: 200 OK + inference response - - Sidecar->>Sidecar: Mark nonce consumed (onConsume callback) - Sidecar-->>LiteLLM: 200 OK - LiteLLM-->>Agent: Chat completion response -``` - -*SPEC cross-ref: Section 3.5.2 (Buy-Side Flow), Section 3.5.3 (Architecture), Section 3.5.5 (Model Resolution).* - -### 4.4 Payment Flow: x402 Request Lifecycle - -This is the canonical request-level flow for a client paying for access to a service. It shows the interplay between Traefik, the verifier, the facilitator, and the upstream. - -```mermaid -sequenceDiagram - participant Client as Client / Buyer - participant TF as Traefik Gateway - participant Ver as x402-verifier - participant Match as Route Matcher - participant Fac as x402 Facilitator - participant Chain as Base L2 (USDC) - participant Up as Upstream Service - - Client->>TF: GET /services/myapi/data - - TF->>Ver: POST /verify
X-Forwarded-Uri: /services/myapi/data
X-Forwarded-Method: GET - - Ver->>Match: Match("/services/myapi/data") - Match-->>Ver: RouteRule{price: "1000", wallet: "0x...", chain: "base-sepolia"} - - alt No X-PAYMENT header - Ver-->>TF: 402 Payment Required - Note over Ver: Response body:
{x402Version: 1, accepts: [{
scheme: "exact",
network: "eip155:84532",
maxAmountRequired: "1000",
payTo: "0x...",
asset: "0x036C..." (USDC)
}]} - TF-->>Client: 402 + PaymentRequirements - - Note over Client: Sign ERC-3009
TransferWithAuthorization
(EIP-712 typed data) - - Client->>TF: GET /services/myapi/data
X-PAYMENT: base64(PaymentPayload) - TF->>Ver: POST /verify (with X-PAYMENT) - end - - Ver->>Ver: Decode X-PAYMENT (base64 -> JSON) - Ver->>Fac: POST /verify
{payload, paymentRequirements} - - alt Facilitator: verify only mode - Fac->>Fac: Validate signature (EIP-712) - Fac->>Fac: Check authorization fields - Fac-->>Ver: {valid: true, settled: false} - else Facilitator: verify + settle - Fac->>Chain: Submit TransferWithAuthorization tx - Chain-->>Fac: Tx confirmed - Fac-->>Ver: {valid: true, settled: true, txHash: "0x..."} - end - - Ver-->>TF: 200 OK - Note over Ver: Sets Authorization header
if route has upstreamAuth - - TF->>Up: GET /data (+ Authorization header) - Up-->>TF: 200 OK + response body - TF-->>Client: 200 OK + response -``` - -*SPEC cross-ref: Section 4.1.1 (Request Flow), Section 4.1.2 (PaymentRequired Response), Section 4.1.3 (PaymentPayload).* - ---- - -## 5. Storage Architecture - -### 5.1 Filesystem State - -All persistent state lives under three XDG-compliant directory trees. In development mode (`OBOL_DEVELOPMENT=true`), these collapse into `.workspace/`. - -``` -$OBOL_CONFIG_DIR/ # ~/.config/obol or .workspace/config -├── .stack-id # Cluster petname (e.g., "fluffy-penguin") -├── .stack-backend # "k3d" or "k3s" -├── kubeconfig.yaml # Kubernetes API access -├── tunnel/ -│ └── cloudflared.json # Tunnel state (mode, hostname, IDs) -├── defaults/ # Embedded infrastructure (templated) -│ ├── helmfile.yaml -│ ├── base/templates/*.yaml -│ ├── cloudflared/ -│ └── values/ -└── networks/// # Per-network deployment configs - ├── helmfile.yaml - └── values.yaml - -$OBOL_DATA_DIR/ # ~/.local/share/obol or .workspace/data -├── openclaw-/ -│ ├── openclaw-data/ -│ │ └── .openclaw/skills/ # 23 embedded skills (host-path PVC) -│ └── keystore/ # Web3 V3 encrypted keystores -└── local-path-provisioner/ # k3s PVC backing store (root-owned) - -$OBOL_BIN_DIR/ # ~/.local/bin or .workspace/bin -└── obol # CLI binary -``` - -*SPEC cross-ref: Section 2.3 (Configuration Hierarchy), Section 5.1 (Configuration Files).* - -### 5.2 Kubernetes ConfigMaps - -ConfigMaps are the primary in-cluster configuration mechanism. They serve as the control plane for runtime behavior changes without Pod restarts (where hot-reload is supported). 
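
The hot-reload mechanism can be sketched as an mtime poll plus an atomic swap, mirroring the x402-verifier's 5-second poll of `pricing.yaml`. This is a minimal Python illustration of the pattern, not the shipped Go `WatchConfig`; the class and method names are mine:

```python
import os
import threading


class ConfigWatcher:
    """Poll a config file's mtime and atomically swap in the re-parsed
    result. Readers always see a complete old or new config, never a mix."""

    def __init__(self, path, parse, interval=5.0):
        self.path, self.parse, self.interval = path, parse, interval
        self._lock = threading.Lock()
        self._mtime = os.stat(path).st_mtime
        with open(path) as f:
            self._config = parse(f.read())

    def current(self):
        with self._lock:
            return self._config  # snapshot; callers must not mutate it

    def poll_once(self):
        """One poll tick: re-parse and swap only if the mtime moved."""
        mtime = os.stat(self.path).st_mtime
        if mtime == self._mtime:
            return False
        with open(self.path) as f:
            new_config = self.parse(f.read())  # parse outside the lock
        with self._lock:  # the swap itself is the only locked step
            self._config, self._mtime = new_config, mtime
        return True
```

Parsing outside the lock keeps the critical section to a pointer swap, which is why a config change never requires a Pod restart for the hot-reloadable ConfigMaps below.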
- -| ConfigMap | Namespace | Key(s) | Purpose | Hot-Reload | -|-----------|-----------|--------|---------|-----------| -| `litellm-config` | `llm` | `config.yaml` | LiteLLM model_list, routing rules | No (restart required) | -| `x402-pricing` | `x402` | `pricing.yaml` | Verifier route rules, wallet, chain, facilitator URL | Yes (5s poll) | -| `x402-buyer-config` | `llm` | `config.json` | Buyer upstream definitions (URL, model, chain, price) | Yes (mutex reload) | -| `x402-buyer-auths` | `llm` | `auths.json` | Pre-signed ERC-3009 authorization pools | Yes (mutex reload) | -| `erpc-config` | `erpc` | `erpc.yaml` | RPC projects, networks, upstreams | No (restart required) | -| `obol-stack-config` | `obol-frontend` | `config.json` | Frontend dashboard configuration (tunnel URL) | Yes (volume mount) | -| `tunnel-storefront` | `traefik` | `index.html`, `mime.types` | Static HTML landing page content | Yes (volume mount) | - -### 5.3 Kubernetes Secrets - -| Secret | Namespace | Key(s) | Purpose | -|--------|-----------|--------|---------| -| `litellm-secrets` | `llm` | `LITELLM_MASTER_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` | LiteLLM authentication credentials | -| `x402-secrets` | `x402` | (verifier credentials) | Verifier operational secrets | -| `openclaw-wallet` | `openclaw-obol-agent` | Keystore JSON | Agent wallet encrypted private key | - -### 5.4 Persistent Volume Claims - -| PVC | Namespace | Backing | Purpose | Ownership | -|-----|-----------|---------|---------|----------| -| Skills PVC | `openclaw-obol-agent` | Host-path (`$DATA_DIR/openclaw-/openclaw-data/`) | Skill injection into agent container | Root-owned (k3s provisioner) | -| Local-path PVCs | Various | Host-path (`$DATA_DIR/local-path-provisioner/`) | Blockchain node data, application state | Root-owned (`purge -f` to remove) | - -### 5.5 Wallet Keystores - -Wallet state spans both the filesystem and Kubernetes: - -``` -Filesystem: - $DATA_DIR/openclaw-/keystore/UTC----

.json - -Kubernetes: - Secret/openclaw-wallet (openclaw-obol-agent ns) - └── keystore.json (same content, accessible to remote-signer Pod) - -Remote Signer: - Deployment/remote-signer (openclaw-obol-agent ns, port 9000) - └── Loads keystore from mounted Secret - └── REST API: POST /sign, GET /address -``` - -*SPEC cross-ref: Section 5.4 (Wallet), Section 7.3 (Wallet Security), Section 3.6.3 (Wallet Generation).* - ---- - -## 6. Deployment Model - -### 6.1 k3d Cluster Topology - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Host Machine │ -│ │ -│ ┌──────────┐ ┌──────────┐ ┌──────────────────────────┐ │ -│ │ Ollama │ │ obol CLI │ │ Docker Desktop / Engine │ │ -│ │ :11434 │ │ │ │ │ │ -│ └──────────┘ └──────────┘ │ ┌──────────────────────┐ │ │ -│ │ │ k3d Cluster │ │ │ -│ │ │ (k3s v1.35.1-k3s1) │ │ │ -│ │ │ │ │ │ -│ │ │ 1 server node │ │ │ -│ │ │ Port mappings: │ │ │ -│ │ │ 80:80 (HTTP) │ │ │ -│ │ │ 8080:80 (HTTP alt) │ │ │ -│ │ │ 443:443 (HTTPS) │ │ │ -│ │ │ 8443:443 (HTTPS alt)│ │ │ -│ │ │ │ │ │ -│ │ └──────────────────────┘ │ │ -│ └──────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -**Backend variants:** - -| Property | k3d (default) | k3s (bare metal) | -|----------|--------------|------------------| -| Runtime | Docker container | Direct k3s binary | -| Ollama access | `host.docker.internal` (macOS) / `host.k3d.internal` (Linux) | `127.0.0.1` (loopback) | -| Port binding | Docker port mapping | Direct host binding | -| Data isolation | Docker volumes + host-path mounts | Direct filesystem | -| Backend switch | Destroys old cluster automatically | Destroys old cluster automatically | - -*SPEC cross-ref: Section 2.4 (Backend Abstraction), Section 3.1.4 (Ollama Host Resolution).* - -### 6.2 Namespace Layout - -Each namespace represents a failure domain and RBAC boundary. Resources within a namespace share a security context. 
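
The Ollama-access row of the backend variants table above reduces to a small host resolver. A sketch under the table's assumptions (the function name is mine, not the CLI's; the real logic lives in the backend implementations):

```python
def resolve_ollama_host(backend: str, goos: str) -> str:
    """Pick the address the cluster uses to reach host Ollama, per the
    backend variants table: k3d traffic goes through Docker's host alias,
    while bare-metal k3s can use loopback directly."""
    if backend == "k3s":
        return "127.0.0.1"  # direct host binding, no container boundary
    if backend == "k3d":
        return "host.docker.internal" if goos == "darwin" else "host.k3d.internal"
    raise ValueError(f"unknown backend: {backend}")
```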
- -```mermaid -graph TB - subgraph "traefik" - GC[GatewayClass: traefik] - GW[Gateway: traefik-gateway
:80, :443] - CFD[Deployment: cloudflared] - SF[Deployment: tunnel-storefront] - SF_SVC[Service: tunnel-storefront :8080] - SF_CM[ConfigMap: tunnel-storefront] - SF_HR[HTTPRoute: tunnel-storefront] - end - - subgraph "llm" - LLMD[Deployment: litellm
Containers: litellm :4000, x402-buyer :8402] - LLC[ConfigMap: litellm-config] - LLS[Secret: litellm-secrets] - BC[ConfigMap: x402-buyer-config] - BA[ConfigMap: x402-buyer-auths] - end - - subgraph "x402" - VD[Deployment: x402-verifier :8080] - VP[ConfigMap: x402-pricing] - VS[Secret: x402-secrets] - VSM[ServiceMonitor: x402-verifier] - end - - subgraph "openclaw-obol-agent" - OC[Deployment: openclaw] - RS[Deployment: remote-signer :9000] - WS[Secret: openclaw-wallet] - end - - subgraph "erpc" - ED[Deployment: erpc :4000] - EC[ConfigMap: erpc-config] - end - - subgraph "obol-frontend" - FD[Deployment: frontend] - FC[ConfigMap: obol-stack-config] - end - - subgraph "monitoring" - PD[Prometheus Stack] - end - - subgraph "network-petname (dynamic)" - EX[Execution Layer :8545] - CL[Consensus Layer] - end - - subgraph "cluster-scoped" - CRD[CRD: ServiceOffer obol.org] - CR[ClusterRole: openclaw-monetize] - CRB[ClusterRoleBinding: openclaw-monetize-binding] - end -``` - -*SPEC cross-ref: Section 5.2 (Kubernetes Resources by Namespace).* - ---- - -## 7. Network Topology - -### 7.1 Traefik Gateway API Routing - -Traefik operates as the single ingress point using the Kubernetes Gateway API (not legacy Ingress). All traffic classification happens at the Gateway level based on hostname and path. - -```mermaid -flowchart TB - subgraph "External Traffic" - Internet((Public Internet)) - Local((Local Machine
obol.stack)) - end - - Internet --> CF[cloudflared
Tunnel Pod] - CF --> GW - - Local --> GW[Traefik Gateway
:80 / :443] - - GW --> ClassifyHostname{Hostname?} - - ClassifyHostname -->|"obol.stack"| LocalRoutes - ClassifyHostname -->|"* (any / tunnel)"| PublicRoutes - - subgraph LocalRoutes["Local-Only Routes (hostnames: obol.stack)"] - direction TB - LR1["/ -> Frontend"] - LR2["/rpc -> eRPC"] - end - - subgraph PublicRoutes["Public Routes (no hostname restriction)"] - direction TB - PR1["/services/* -> ForwardAuth -> Upstream"] - PR2["/.well-known/* -> ERC-8004 httpd"] - PR3["/skill.md -> Service Catalog"] - PR4["/ (tunnel host) -> Storefront"] - end - - PR1 --> FA{x402 ForwardAuth} - FA -->|"No X-PAYMENT"| R402[402 Payment Required] - FA -->|"Valid X-PAYMENT"| R200[200 -> Upstream Service] - FA -->|"No route match"| PASS[200 -> Pass Through] -``` - -### 7.2 Route Classification Rules - -| Route | Hostname Restriction | Protection | Target | HTTPRoute Location | -|-------|---------------------|-----------|--------|-------------------| -| `/` | `obol.stack` | None (local network) | Frontend | `obol-frontend` ns | -| `/rpc` | `obol.stack` | None (local network) | eRPC | `erpc` ns | -| `/services//*` | None (public) | x402 ForwardAuth | Upstream service | `openclaw-obol-agent` ns | -| `/.well-known/agent-registration.json` | None (public) | None (read-only) | ERC-8004 httpd | `traefik` ns | -| `/skill.md` | None (public) | None (read-only) | Service catalog httpd | `traefik` ns | -| `/` (tunnel hostname) | None (public) | None (static HTML) | Storefront httpd | `traefik` ns | - -**Security invariant:** Internal services (frontend, eRPC, LiteLLM admin, Prometheus) MUST have `hostnames: ["obol.stack"]` to prevent tunnel exposure. See Section 8.2 for trust boundary details. - -### 7.3 Internal Service Communication - -All internal traffic uses Kubernetes ClusterIP services with DNS resolution (`..svc.cluster.local`). 
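
A minimal sketch of the classification logic in the table above (the helper is illustrative -- in the real stack this decision is made declaratively by Traefik HTTPRoutes and hostname restrictions, not by application code):

```python
def classify_route(hostname: str, path: str) -> str:
    """Mirror the route classification table: obol.stack hosts local-only
    targets; everything else is treated as public/tunnel traffic, with
    /services/* the only payment-gated prefix."""
    if hostname == "obol.stack":
        return "erpc" if path.startswith("/rpc") else "frontend"
    if path.startswith("/services/"):
        return "x402-forwardauth"  # payment gate before the upstream
    if path.startswith("/.well-known/") or path == "/skill.md":
        return "registration-httpd"  # read-only discovery endpoints
    return "storefront"  # tunnel hostname root
```

The security invariant follows directly: because tunnel traffic never carries the `obol.stack` hostname, the first branch is unreachable from the internet.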
- -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ Cluster-Internal Traffic (ClusterIP, no external exposure) │ -│ │ -│ LiteLLM :4000 │ -│ ├── ollama/* ──> Ollama Service :11434 ──> host Ollama │ -│ ├── paid/* ──> x402-buyer :8402 (localhost, same Pod) │ -│ ├── anthropic/* ──> api.anthropic.com (egress) │ -│ └── openai/* ──> api.openai.com (egress) │ -│ │ -│ x402-buyer :8402 │ -│ └── upstream ──> Remote seller (egress, x402 payment attached) │ -│ │ -│ x402-verifier :8080 │ -│ └── facilitator ──> facilitator.x402.rs (egress, HTTPS) │ -│ │ -│ OpenClaw Agent │ -│ ├── LiteLLM ──> litellm.llm.svc:4000 │ -│ └── Remote Signer ──> remote-signer:9000 (same namespace) │ -│ │ -│ eRPC :4000 │ -│ ├── Local nodes ──> -execution..svc:8545 │ -│ └── Remote RPCs ──> ChainList endpoints (egress) │ -│ │ -│ Prometheus │ -│ ├── x402-verifier ──> ServiceMonitor scrape /metrics │ -│ └── x402-buyer ──> PodMonitor scrape /metrics │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -*SPEC cross-ref: Section 2.2 (Routing Architecture), Section 6.2 (Internal Service Communication), Section 7.1 (Tunnel Exposure).* - ---- - -## 8. Security Architecture - -### 8.1 Trust Boundaries - -The system has four trust boundaries, each with distinct threat models and protection mechanisms. - -```mermaid -graph TB - subgraph TB1["Trust Boundary 1: Host Machine"] - CLI[obol CLI] - Ollama[Ollama] - Docker[Docker] - Keystore["Wallet Keystores
(encrypted, filesystem)"] - SE["Secure Enclave
(hardware, macOS)"] - end - - subgraph TB2["Trust Boundary 2: k3d Cluster"] - subgraph TB2a["TB 2a: Local-Only Zone"] - FE[Frontend] - ERPC[eRPC] - Prom[Prometheus] - LLMAdmin[LiteLLM Admin] - end - - subgraph TB2b["TB 2b: Payment-Gated Zone"] - Verifier[x402-verifier] - Services["/services/* upstreams"] - end - - subgraph TB2c["TB 2c: Agent Zone (RBAC-scoped)"] - Agent[OpenClaw Agent] - Signer[Remote Signer] - Wallet[Wallet Secret] - end - end - - subgraph TB3["Trust Boundary 3: Tunnel (Public Internet)"] - CF[cloudflared] - Buyers[Remote Buyers] - end - - subgraph TB4["Trust Boundary 4: External Services"] - Fac[x402 Facilitator] - Chain[Base L2] - CloudLLM[Cloud LLM APIs] - end - - TB3 -->|"Only: /services/*, /.well-known/,
/skill.md, / (storefront)"| TB2b - TB2b -->|"ForwardAuth"| Verifier - TB2c -->|"RBAC: openclaw-monetize ClusterRole"| TB2b - TB2c -->|"Port 9000, in-namespace only"| Signer - Verifier -->|"HTTPS POST"| Fac - Agent -->|"JSON-RPC"| Chain -``` - -### 8.2 Authentication and Authorization Flows - -| Flow | Mechanism | Protection | -|------|-----------|-----------| -| **Public -> Service** | x402 payment (EIP-712 signed ERC-3009) | Facilitator verifies signature + settles on-chain. No payment = 402 rejection. | -| **Local -> Frontend/eRPC** | Hostname restriction (`obol.stack`) | Only reachable from local machine via hosts file or DNS resolver. Tunnel traffic cannot match. | -| **Agent -> Kubernetes API** | ServiceAccount `openclaw` + RBAC | `openclaw-monetize` ClusterRole: CRUD on ServiceOffers, Middlewares, HTTPRoutes, ConfigMaps, Services, Deployments. Read-only on Pods, Endpoints, logs. | -| **Agent -> Signing** | Remote-signer REST API (port 9000) | In-namespace only (no Service exposed outside namespace). Keystore decryption at signer startup. | -| **Buyer -> Remote Seller** | Pre-signed ERC-3009 auths via X-PAYMENT header | Zero signer access. Finite auth pool. Max loss = N * price per auth. | -| **CLI -> Cluster** | kubeconfig (auto-generated, file-permission protected) | `0600` permissions on kubeconfig. Port drift handled by regeneration. | - -### 8.3 Wallet Isolation - -Three distinct wallet isolation models serve different security requirements: - -``` -1. Software Wallet (Default) - ┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐ - │ Keystore File │────>│ Remote Signer │────>│ Agent Pod │ - │ (scrypt + AES) │ │ :9000 │ │ (REST only) │ - │ $DATA_DIR/... │ │ In-namespace │ │ │ - └─────────────────┘ └──────────────────┘ └─────────────┘ - Key at rest: encrypted. Key in use: signer memory only. - -2. 
Secure Enclave (macOS, standalone gateway) - ┌─────────────────┐ ┌──────────────────┐ - │ Apple SEP │────>│ Inference Gateway│ - │ (P-256, never │ │ (ECIES decrypt, │ - │ exported) │ │ ECDSA sign) │ - └─────────────────┘ └──────────────────┘ - Key never leaves hardware. SIP enforced. - -3. TEE (Linux, confidential computing) - ┌─────────────────┐ ┌──────────────────┐ - │ TEE Enclave │────>│ Inference Gateway│ - │ (TDX/SNP/Nitro) │ │ (attestation + │ - │ Key in-enclave │ │ ECIES decrypt) │ - └─────────────────┘ └──────────────────┘ - Key bound to attestation. Hardware-signed quote. -``` - -*SPEC cross-ref: Section 7.3 (Wallet Security), Section 7.4 (Enclave / TEE Security).* - -### 8.4 Pre-Signed Authorization Pool (Buy Side) - -The buy-side security model eliminates private key exposure from the sidecar entirely: - -``` -┌──────────────────────────────────────────────────────┐ -│ buy.py (Agent context, has signer access) │ -│ │ -│ 1. Generate N random 32-byte nonces │ -│ 2. For each nonce, sign ERC-3009 via remote-signer │ -│ 3. Write signed auths to ConfigMap │ -│ │ -│ Output: N pre-signed TransferWithAuthorization │ -│ Each authorizes exactly $price USDC transfer │ -└───────────────────────┬──────────────────────────────┘ - │ ConfigMap (x402-buyer-auths) - v -┌──────────────────────────────────────────────────────┐ -│ x402-buyer sidecar (NO signer access) │ -│ │ -│ - Pops one auth per 402 response (mutex-guarded) │ -│ - Marks nonce consumed (StateStore, crash-safe) │ -│ - Cannot generate new auths │ -│ - Cannot modify auth values │ -│ - Max spend = N * price (bounded by pool size) │ -│ - Pool exhausted -> 404 (agent must pre-sign more) │ -└──────────────────────────────────────────────────────┘ -``` - -*SPEC cross-ref: Section 3.5.3 (Architecture), Section 7.2 (Payment Security).* - -### 8.5 RBAC Model - -The `openclaw-monetize` ClusterRole is the sole RBAC grant for the agent. 
It follows the principle of least privilege across API groups: - -| API Group | Resources | Verbs | Rationale | -|-----------|-----------|-------|-----------| -| `obol.org` | `serviceoffers`, `serviceoffers/status` | get, list, watch, create, update, patch, delete | Full lifecycle of sell-side CRDs | -| `traefik.io` | `middlewares` | get, list, create, update, patch, delete | ForwardAuth middleware for x402 gating | -| `gateway.networking.k8s.io` | `httproutes` | get, list, create, update, patch, delete | Public route publication | -| (core) | `configmaps`, `services`, `deployments` | get, list, create, update, patch, delete | Pricing ConfigMap, registration httpd, storefront | -| (core) | `pods`, `endpoints`, `pods/log` | get, list | Health checks, debugging (read-only) | - -**Binding:** ClusterRoleBinding `openclaw-monetize-binding` binds to ServiceAccount `openclaw` in namespace `openclaw-obol-agent`. The `patchMonetizeBinding()` function in `internal/agent/agent.go` ensures the subjects array is populated, guarding against race conditions during initial cluster setup. 
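The empty-subjects guard described above can be modeled as follows. This is a minimal sketch with simplified stand-in types; the real `patchMonetizeBinding()` in `internal/agent/agent.go` works against the Kubernetes API (e.g., via client-go), and the `Subject`/`ClusterRoleBinding` structs and boolean return here are illustrative assumptions.

```go
package main

import "fmt"

// Subject and ClusterRoleBinding are simplified stand-ins for the
// Kubernetes RBAC types (illustrative only, not the client-go types).
type Subject struct {
	Kind      string
	Name      string
	Namespace string
}

type ClusterRoleBinding struct {
	Name     string
	Subjects []Subject
}

// patchMonetizeBinding ensures the openclaw ServiceAccount is present in the
// binding's subjects, guarding against the race where the manifest was
// applied with an empty subjects array. Returns true if a patch was needed.
func patchMonetizeBinding(b *ClusterRoleBinding) bool {
	want := Subject{Kind: "ServiceAccount", Name: "openclaw", Namespace: "openclaw-obol-agent"}
	for _, s := range b.Subjects {
		if s == want {
			return false // already bound: nothing to patch
		}
	}
	b.Subjects = append(b.Subjects, want)
	return true // binding was patched
}

func main() {
	// Empty subjects models the race condition at initial cluster setup.
	b := &ClusterRoleBinding{Name: "openclaw-monetize-binding"}
	fmt.Println(patchMonetizeBinding(b)) // true: subject added
	fmt.Println(patchMonetizeBinding(b)) // false: idempotent on retry
}
```

The idempotent check-then-append shape is what makes the patch safe to run on every `obol agent init`, whether the binding already has subjects or not.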
- -*SPEC cross-ref: Section 7.5 (RBAC), Section 5.2 (Kubernetes Resources).* - -### 8.6 Threat Model Summary - -| Threat | Mitigation | Residual Risk | -|--------|-----------|---------------| -| Tunnel exposes internal services | `hostnames: ["obol.stack"]` restriction on all local-only HTTPRoutes | Misconfiguration (test: never create public routes for internal services) | -| Replay attack on x402 payments | Random 32-byte nonces, `validBefore`/`validAfter` windows, facilitator deduplication | Facilitator availability | -| Buyer overspending | Pre-signed auth pool with finite size, nonce consumption tracking | Pool size set at purchase time | -| Wallet key extraction | Encrypted keystore (scrypt), remote-signer pattern, Secure Enclave (non-exportable) | Software wallet in memory during signing | -| Reconciler privilege escalation | ClusterRole scoped to specific API groups and verbs | Agent code compromise could create arbitrary routes | -| Supply chain (container images) | Pinned image tags (LiteLLM, k3s, OpenClaw), version consistency tests | Upstream image compromise before pin update | -| ConfigMap propagation delay | 60-120s k3d file watcher; 5s verifier poll | Brief window where stale config serves requests | - -*SPEC cross-ref: Section 7.1 (Tunnel Exposure), Section 7.2 (Payment Security), Section 9.4 (Known Latencies).* diff --git a/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md b/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md deleted file mode 100644 index 37796692..00000000 --- a/docs/specs/BEHAVIORS_AND_EXPECTATIONS.md +++ /dev/null @@ -1,667 +0,0 @@ -# Obol Stack -- Behaviors and Expectations - -**Version**: 1.0.0 -**Status**: Living document -**Last Updated**: 2026-03-27 - -This document defines the behavioral contract for Obol Stack. Every behavior described here maps to one or more testable scenarios expressible as Gherkin. Cross-references point to [SPEC.md](SPEC.md) where the underlying system is defined. 
Existing BDD feature files live in [features/](features/) and `internal/x402/features/`. - ---- - -## Table of Contents - -1. [Introduction](#1-introduction) -2. [Desired Behaviors](#2-desired-behaviors) -3. [Undesired Behaviors](#3-undesired-behaviors) -4. [Edge Cases](#4-edge-cases) -5. [Performance Expectations](#5-performance-expectations) -6. [Guardrail Definitions](#6-guardrail-definitions) - ---- - -## 1. Introduction - -### 1.1 Purpose - -This document is the behavioral specification for Obol Stack. It defines what the system should do, what it must not do, how it handles edge cases, and what performance it must achieve. - -It serves as: -- A contract between the product and engineering teams -- The source of truth for BDD feature file scenarios -- A test oracle for integration and adversarial testing -- A guardrail reference that CI and code review can enforce - -### 1.2 How to Read This Document - -**Desired behaviors** (Section 2) follow this format: -- **Trigger**: What user action or system state initiates the behavior -- **Expected**: What the system should do -- **Rationale**: Why this behavior matters - -**Undesired behaviors** (Section 3) add: -- **Risk**: What goes wrong if this behavior occurs - -**Edge cases** (Section 4) describe unusual or boundary scenarios with expected handling. - -**Cross-references** use the notation `SPEC SS X.Y` to reference sections in [SPEC.md](SPEC.md). For example, `SPEC SS 3.1` refers to Section 3.1 (Stack Lifecycle). - -Every behavior in this document MUST be expressible as a Gherkin `Given / When / Then` scenario. Inline Gherkin examples are included for critical behaviors. 
- -### 1.3 Relationship to SPEC.md - -| This Document | SPEC.md | -|---------------|---------| -| Describes *what should happen* | Describes *how things are built* | -| Trigger / Expected / Rationale | Architecture, data model, APIs | -| Test-oriented (Gherkin-compatible) | Implementation-oriented | -| Guardrails are non-negotiable | Constraints are structural | - ---- - -## 2. Desired Behaviors - -### 2.1 Stack Lifecycle - -> SPEC SS 3.1 -- Stack Lifecycle - -#### B-2.1.1: Stack initialization generates a unique cluster identity - -**Trigger**: Operator runs `obol stack init`. -**Expected**: The CLI generates a petname-based stack ID, resolves absolute paths for ConfigDir/DataDir/BinDir, writes the backend config file (`.stack-backend`), and copies embedded infrastructure defaults with template substitution (`{{OLLAMA_HOST}}`, `{{OLLAMA_HOST_IP}}`, `{{CLUSTER_ID}}`). The stack ID is persisted at `$OBOL_CONFIG_DIR/.stack-id`. -**Rationale**: Unique naming prevents namespace collisions between clusters. Absolute paths are required because Docker volume mounts reject relative paths. - -```gherkin -Scenario: Stack init creates unique identity - Given no stack has been initialized - When the operator runs "obol stack init" - Then a ".stack-id" file exists in the config directory - And the stack ID is a two-word petname - And a ".stack-backend" file exists with value "k3d" - And the defaults directory contains rendered infrastructure templates -``` - -#### B-2.1.2: Stack init with --force preserves existing stack ID - -**Trigger**: Operator runs `obol stack init --force` on an already-initialized stack. -**Expected**: The cluster config is regenerated, but the existing stack ID is preserved. The previous backend's cluster is destroyed before initializing the new one. -**Rationale**: Preserving the stack ID maintains continuity for data directories and PVC paths. Destroying the old backend prevents orphaned Docker containers. 
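The template substitution performed during `obol stack init` (B-2.1.1) can be sketched as below. The `renderDefaults` helper, the `strings.Replacer` approach, and the sample values (`brave-otter`, `host.k3d.internal`) are illustrative assumptions, not the actual CLI code; only the placeholder names come from the behavior above.

```go
package main

import (
	"fmt"
	"strings"
)

// renderDefaults substitutes the {{...}} placeholders that stack init
// replaces when copying embedded infrastructure templates into the
// config directory. The real CLI may use a different mechanism.
func renderDefaults(tmpl, clusterID, ollamaHost, ollamaHostIP string) string {
	r := strings.NewReplacer(
		"{{CLUSTER_ID}}", clusterID,
		"{{OLLAMA_HOST}}", ollamaHost,
		"{{OLLAMA_HOST_IP}}", ollamaHostIP,
	)
	return r.Replace(tmpl)
}

func main() {
	tmpl := "cluster: {{CLUSTER_ID}}\nollama: http://{{OLLAMA_HOST}}:11434"
	// "brave-otter" stands in for a generated two-word petname stack ID.
	fmt.Println(renderDefaults(tmpl, "brave-otter", "host.k3d.internal", "172.17.0.1"))
}
```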
- -#### B-2.1.3: Stack up deploys full infrastructure and auto-configures LLM - -**Trigger**: Operator runs `obol stack up` after `obol stack init`. -**Expected**: The CLI creates the k3d/k3s cluster, exports the kubeconfig, runs `syncDefaults()` (helmfile sync for all infrastructure), auto-configures LiteLLM with detected Ollama models and cloud provider API keys (single restart), deploys the OpenClaw agent singleton, patches agent RBAC, and starts the DNS tunnel if provisioned. The stack is fully operational after completion. -**Rationale**: One command must bring the entire stack from zero to operational. Auto-configuration eliminates manual `obol model setup` for the common case. - -```gherkin -Scenario: Stack up brings cluster to operational state - Given the stack has been initialized - When the operator runs "obol stack up" - Then the k3d cluster is running - And a kubeconfig file exists in the config directory - And the Traefik gateway is accepting connections on port 80 - And LiteLLM is running in the "llm" namespace - And the x402-verifier is running in the "x402" namespace - And the OpenClaw agent is running in the "openclaw-obol-agent" namespace -``` - -#### B-2.1.4: Stack down preserves config and data - -**Trigger**: Operator runs `obol stack down`. -**Expected**: The cluster is stopped. Config directory, data directory, and all PVCs are preserved. The DNS resolver is stopped. -**Rationale**: Operators expect to stop and restart without losing state. PVC data (wallets, skills, blockchain data) is valuable. - -#### B-2.1.5: Stack purge removes cluster and config - -**Trigger**: Operator runs `obol stack purge`. -**Expected**: The cluster is destroyed and the config directory is removed. Root-owned PVC data in the data directory is NOT removed unless `--force` is passed (which invokes sudo). -**Rationale**: Root-owned local-path-provisioner directories cannot be removed by regular users. The `--force` flag makes the destructive scope explicit. 
- -#### B-2.1.6: Helmfile sync failure triggers automatic cleanup - -**Trigger**: `helmfile sync` fails during `obol stack up`. -**Expected**: The cluster is automatically stopped via `Down()`. The operator receives a clear error message and can fix the issue before retrying. -**Rationale**: A partially deployed cluster is worse than no cluster. Auto-cleanup prevents orphaned resources. - ---- - -### 2.2 LLM Routing - -> SPEC SS 3.2 -- LLM Routing - -#### B-2.2.1: Auto-configuration detects Ollama models - -**Trigger**: `obol stack up` runs `autoConfigureLLM()` and Ollama is running on the host. -**Expected**: The CLI queries `http://localhost:11434/api/tags`, discovers available models, adds them to the `litellm-config` ConfigMap as `ollama/<model>` entries pointing at `http://ollama.llm.svc:11434`, and restarts LiteLLM exactly once. -**Rationale**: Agent chat must work immediately after stack up without manual model configuration. - -```gherkin -Scenario: LLM auto-configuration detects Ollama models - Given Ollama is running with models "qwen3.5:9b" and "llama3.2:3b" - When the operator runs "obol stack up" - Then the litellm-config ConfigMap contains an entry for "qwen3.5:9b" - And the litellm-config ConfigMap contains an entry for "llama3.2:3b" - And LiteLLM was restarted exactly once -``` - -#### B-2.2.2: Auto-configuration detects cloud provider API keys - -**Trigger**: `obol stack up` runs `autoConfigureLLM()` and `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` is set in the environment. -**Expected**: The detected provider is added to `litellm-config` as a wildcard entry (e.g., `anthropic/*`) and the API key is stored in `litellm-secrets`. LiteLLM restart is batched with Ollama model configuration (single restart). -**Rationale**: Cloud providers should be available to the agent without manual setup when keys are present. - -#### B-2.2.3: Paid inference routes through x402-buyer sidecar - -**Trigger**: A request for model `paid/<model>` arrives at LiteLLM.
-**Expected**: LiteLLM matches the `paid/*` catch-all entry and proxies to `http://127.0.0.1:8402/v1`. The x402-buyer sidecar handles payment attachment and upstream routing. -**Rationale**: The static `paid/*` route means no LiteLLM fork or dynamic config is needed for buy-side payments. The sidecar pattern keeps payment logic isolated. - -#### B-2.2.4: Manual model setup validates before adding - -**Trigger**: Operator runs `obol model setup custom --name foo --endpoint http://example.com --model bar`. -**Expected**: The CLI validates the endpoint is reachable and the model exists before adding it to LiteLLM config. -**Rationale**: Prevents broken routes in LiteLLM that would cause silent inference failures. - ---- - -### 2.3 Network Management - -> SPEC SS 3.3 -- Network / RPC Gateway - -#### B-2.3.1: Adding a chain by ID fetches public RPCs - -**Trigger**: Operator runs `obol network add <chain-id>`. -**Expected**: The CLI queries the ChainList API for public RPC endpoints for the given chain ID, adds them as upstreams in the `erpc-config` ConfigMap, registers the network, and restarts eRPC. Write methods (`eth_sendRawTransaction`) are blocked by default. -**Rationale**: ChainList provides curated public endpoints. Blocking writes by default prevents accidental mainnet transactions. - -```gherkin -Scenario: Adding a chain blocks write methods by default - Given the stack is running - When the operator runs "obol network add 1" - Then the erpc-config contains upstreams for chain ID 1 - And write methods are blocked on all upstreams for chain ID 1 -``` - -#### B-2.3.2: Adding a chain with --allow-writes enables write methods - -**Trigger**: Operator runs `obol network add <chain-id> --allow-writes`. -**Expected**: Write methods are allowed on the configured upstreams for that chain. -**Rationale**: Some use cases (transaction submission, contract deployment) require write access. The flag makes this an explicit opt-in.
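The default-deny policy for write methods (B-2.3.1 / B-2.3.2) amounts to a small per-upstream filter. A minimal sketch follows; `writeMethods` and `allowMethod` are hypothetical names, and only `eth_sendRawTransaction` is named as a write method by the spec.

```go
package main

import "fmt"

// writeMethods lists JSON-RPC methods treated as writes.
// eth_sendRawTransaction is the method named in the spec; a real
// implementation might classify additional methods.
var writeMethods = map[string]bool{
	"eth_sendRawTransaction": true,
}

// allowMethod models the per-upstream policy: reads always pass, writes
// pass only when the chain was added with --allow-writes.
func allowMethod(method string, allowWrites bool) bool {
	if writeMethods[method] {
		return allowWrites
	}
	return true
}

func main() {
	fmt.Println(allowMethod("eth_call", false))               // true: reads always allowed
	fmt.Println(allowMethod("eth_sendRawTransaction", false)) // false: writes blocked by default
	fmt.Println(allowMethod("eth_sendRawTransaction", true))  // true: --allow-writes opt-in
}
```

Keeping the classification in a lookup table keeps the default-deny invariant auditable: a chain added without the flag can never route a listed write method.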
- -#### B-2.3.3: Local Ethereum nodes register as priority upstreams with writes blocked - -**Trigger**: `obol network install ethereum` deploys a local execution layer node. -**Expected**: The local node is registered in eRPC as a priority upstream via `RegisterERPCUpstream()`, but write methods are blocked on the local upstream. Write requests are routed to remote upstreams instead. -**Rationale**: Local nodes provide low-latency reads. Writes to a local-only node would not propagate to the real network. - -#### B-2.3.4: Removing a chain cleans up all upstreams - -**Trigger**: Operator runs `obol network remove <chain-id>`. -**Expected**: All upstreams for that chain ID are removed from `erpc-config`. The network entry is also removed. eRPC is restarted. -**Rationale**: Clean removal prevents stale routing entries. - ---- - -### 2.4 Sell-Side Monetization - -> SPEC SS 3.4 -- Monetize: Sell Side - -#### B-2.4.1: Selling an HTTP service creates a ServiceOffer CR - -**Trigger**: Operator runs `obol sell http <name> --wallet 0x... --chain base-sepolia --price 0.001 --upstream <service> --port <port> --namespace <namespace>`. -**Expected**: A `ServiceOffer` CR is created in the `openclaw-obol-agent` namespace with the specified payment terms, upstream reference, and path (`/services/<name>`). The CLI also calls `EnsureTunnelForSell()` to activate the tunnel if dormant. -**Rationale**: The ServiceOffer CRD is the declarative API. The tunnel must be active for public access.
- -```gherkin -Scenario: Operator sells HTTP service via CLI - Given the stack is running - When the operator runs "obol sell http myapi --wallet 0xABC --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm" - Then a ServiceOffer "myapi" exists in namespace "openclaw-obol-agent" - And the ServiceOffer has payment.payTo "0xABC" - And the ServiceOffer has payment.network "base-sepolia" - And the ServiceOffer has path "/services/myapi" -``` - -#### B-2.4.2: Agent reconciles ServiceOffer through 6 stages - -**Trigger**: A `ServiceOffer` CR exists and the `monetize.py` reconciler is running. -**Expected**: The reconciler watches for ServiceOffer CRs and progresses them through 6 stages (every 10 seconds): - -1. **ModelReady** -- Model availability verified (inference type) or skipped (HTTP type). -2. **UpstreamHealthy** -- Health check passes against `upstream.healthPath`. -3. **PaymentGateReady** -- Traefik `Middleware` (ForwardAuth) and pricing route in `x402-pricing` ConfigMap are created. -4. **RoutePublished** -- `HTTPRoute` created routing `/services/<name>/*` through the ForwardAuth middleware to the upstream. -5. **Registered** -- ERC-8004 on-chain registration and `/.well-known/agent-registration.json` published. -6. **Ready** -- All conditions met, endpoint URL set in status. - -All created resources have `ownerReferences` pointing to the ServiceOffer for automatic garbage collection. -**Rationale**: The 6-stage reconciliation provides observability into the sell-side pipeline. OwnerReferences ensure clean deletion.
- -```gherkin -Scenario: Agent reconciles ServiceOffer to Ready - Given a ServiceOffer "myapi" exists - When the agent reconciles the ServiceOffer - Then the ServiceOffer status condition "ModelReady" is "True" - And the ServiceOffer status condition "UpstreamHealthy" is "True" - And the ServiceOffer status condition "PaymentGateReady" is "True" - And the ServiceOffer status condition "RoutePublished" is "True" - And the ServiceOffer status condition "Registered" is "True" - And the ServiceOffer status condition "Ready" is "True" - And a Middleware "x402-myapi" exists in the offer namespace - And an HTTPRoute "so-myapi" exists in the offer namespace -``` - -#### B-2.4.3: x402-verifier returns 402 for unpaid requests to priced routes - -**Trigger**: An HTTP request arrives at a path matching a pricing route in `x402-pricing` ConfigMap, without an `X-PAYMENT` header. -**Expected**: Traefik forwards the request to the x402-verifier via ForwardAuth. The verifier matches the `X-Forwarded-Uri` against configured routes (first match wins), finds a price, and returns HTTP 402 with a `PaymentRequirements` JSON body containing `x402Version`, `accepts` array (scheme, network, maxAmountRequired, resource, asset, payTo, maxTimeoutSeconds). -**Rationale**: The 402 response tells clients exactly how to pay. This is the core x402 protocol handshake. - -```gherkin -Scenario: Unpaid request returns 402 with pricing - Given a priced route exists at "/services/myapi/*" with price "1000" - When a client sends a POST to "/services/myapi/v1/chat/completions" without X-PAYMENT - Then the response status is 402 - And the response body contains "x402Version" with value 1 - And the response body contains an "accepts" array with the route price -``` - -#### B-2.4.4: x402-verifier passes paid requests after facilitator verification - -**Trigger**: An HTTP request with a valid `X-PAYMENT` header arrives at a priced route. 
-**Expected**: The verifier extracts the payment payload, delegates to the x402 facilitator for verification, and upon success returns 200 OK to Traefik (which then forwards to the upstream). The verifier optionally sets an `Authorization` header for upstream auth. -**Rationale**: Payment verification is delegated to the facilitator to avoid on-chain logic in the hot path. - -#### B-2.4.5: x402-verifier passes unpriced routes without payment - -**Trigger**: An HTTP request arrives at a path that does NOT match any pricing route. -**Expected**: The verifier returns 200 OK immediately (free route). -**Rationale**: Not all routes behind the ForwardAuth middleware require payment. Discovery endpoints and health checks should be freely accessible. - -#### B-2.4.6: Pricing config hot-reloads without restart - -**Trigger**: The `x402-pricing` ConfigMap is updated (e.g., new route added by reconciler). -**Expected**: `WatchConfig()` detects the file modification within 5 seconds (polling interval), parses the new config, and atomically swaps it via `Verifier.Reload()`. In-flight requests are not affected. -**Rationale**: Adding or removing services should not require verifier downtime. Atomic pointer swap ensures lock-free reads on the hot path. - -#### B-2.4.7: Per-million-token pricing approximated as per-request in Phase 1 - -**Trigger**: Operator sets `--per-mtok 1.00` on `obol sell http`. -**Expected**: The effective per-request price is calculated as `perMTok / 1000` (using `ApproxTokensPerRequest = 1000`). Both `perMTok` and the computed `perRequest` (as `price`) are stored in the pricing route. -**Rationale**: Phase 1 does not have exact token metering. A fixed approximation of 1000 tokens per request provides a reasonable baseline. - -#### B-2.4.8: Deleting a ServiceOffer cleans up all owned resources - -**Trigger**: Operator runs `obol sell delete <name>`. -**Expected**: The ServiceOffer CR is deleted.
All resources with ownerReferences (Middleware, HTTPRoute, pricing route ConfigMap entry, registration resources) are garbage-collected by Kubernetes. -**Rationale**: Clean deletion prevents orphaned routes and stale pricing entries. - ---- - -### 2.5 Buy-Side Payments - -> SPEC SS 3.5 -- Monetize: Buy Side - -#### B-2.5.1: Buyer probe discovers pricing from 402 response - -**Trigger**: Agent runs `buy.py probe` against a remote seller endpoint. -**Expected**: The probe sends an unpaid request, receives a 402 response, and extracts pricing information (payTo, price, network, asset) from the `PaymentRequirements` body. -**Rationale**: Discovery-driven purchasing lets agents find and pay for services without hardcoded pricing. - -```gherkin -Scenario: Buyer discovers pricing via probe - Given a remote seller is serving at "https://seller.example.com/services/qwen" - When the agent runs "buy.py probe" against the seller endpoint - Then the probe returns 402 with pricing info - And the pricing contains payTo, price, and network -``` - -#### B-2.5.2: Buyer pre-signs ERC-3009 authorizations into ConfigMaps - -**Trigger**: Agent runs `buy.py buy` with a seller endpoint and count. -**Expected**: The agent pre-signs N `TransferWithAuthorization` (ERC-3009) vouchers using the wallet private key, stores them in the `x402-buyer-auths` ConfigMap, and configures the upstream in `x402-buyer-config` ConfigMap. The sidecar hot-reloads the new config. -**Rationale**: Pre-signing moves the expensive signing operation out of the hot path. Bounded pool size limits maximum financial exposure. - -#### B-2.5.3: Paid inference through sidecar attaches payment on 402 - -**Trigger**: LiteLLM proxies a `paid/<model>` request to the x402-buyer sidecar, which forwards to the remote seller and receives a 402. -**Expected**: The sidecar intercepts the 402, pops one pre-signed authorization from the pool, constructs an `X-PAYMENT` header, and retries the request.
The seller verifies payment and returns the inference result. The sidecar returns the result to LiteLLM, which returns it to the agent. -**Rationale**: Transparent payment attachment means the agent sees standard OpenAI API responses. The sidecar handles the full x402 handshake internally. - -```gherkin -Scenario: Paid inference through sidecar - Given the buyer has 5 pre-signed authorizations for "seller-qwen" - When the agent requests model "paid/qwen3.5:9b" - Then LiteLLM proxies to the x402-buyer sidecar - And the sidecar sends an unpaid request to the seller - And the seller returns 402 - And the sidecar pops one authorization and retries with X-PAYMENT - And the seller returns 200 with inference content - And the agent receives the inference result - And the buyer has 4 remaining authorizations -``` - -#### B-2.5.4: Model resolution strips prefixes correctly - -**Trigger**: A request for `paid/openai/qwen3.5:9b` arrives at the x402-buyer sidecar. -**Expected**: The sidecar strips `paid/` and `openai/` prefixes to resolve `qwen3.5:9b`, looks up the model in `modelRoutes`, and dispatches to the correct upstream handler. -**Rationale**: LiteLLM adds `openai/` when routing through the `paid/*` catch-all. The sidecar must handle both prefixed and bare model names. - -#### B-2.5.5: Sidecar exposes status, health, and metrics endpoints - -**Trigger**: Monitoring or liveness probes query the sidecar. -**Expected**: `/status` returns JSON with remaining/spent auth counts per upstream. `/healthz` returns 200 for liveness. `/metrics` returns Prometheus-format metrics scraped via `PodMonitor`. -**Rationale**: Observability into auth pool state is critical for operational awareness. Prometheus integration enables alerting on low pool levels. - ---- - -### 2.6 Tunnel Management - -> SPEC SS 3.7 -- Tunnel Management - -#### B-2.6.1: Quick tunnel activates on first sell - -**Trigger**: Operator runs `obol sell http` and no DNS tunnel is provisioned. 
-**Expected**: The quick tunnel mode activates, starting the cloudflared pod. A random `*.trycloudflare.com` URL is assigned. The URL is propagated to the OpenClaw agent (`AGENT_BASE_URL`), the frontend ConfigMap, and the storefront. The tunnel URL is ephemeral and changes on restart. -**Rationale**: Quick mode provides zero-configuration public access. Activation on first sell means the tunnel is dormant until needed, reducing attack surface. - -```gherkin -Scenario: Quick tunnel activates on first sell - Given the stack is running with no DNS tunnel provisioned - And the tunnel is dormant - When the operator runs "obol sell http myapi ..." - Then the cloudflared pod is running in the "traefik" namespace - And a quick tunnel URL is assigned - And AGENT_BASE_URL is set on the OpenClaw Deployment -``` - -#### B-2.6.2: DNS tunnel persists across restarts - -**Trigger**: Operator runs `obol tunnel login --hostname stack.example.com`, provisions the tunnel, and later restarts the stack. -**Expected**: After `obol stack up`, the DNS tunnel automatically starts with the same stable hostname. Tunnel state (mode, hostname, accountID, zoneID, tunnelID) is persisted at `$OBOL_CONFIG_DIR/tunnel/cloudflared.json`. -**Rationale**: Stable hostnames are required for on-chain ERC-8004 registration and consistent discovery URLs. - -#### B-2.6.3: Tunnel URL propagation updates all consumers - -**Trigger**: A tunnel becomes active (either quick or DNS). -**Expected**: The tunnel URL is propagated to 4 consumers: -1. OpenClaw agent `AGENT_BASE_URL` environment variable. -2. Frontend `obol-stack-config` ConfigMap in `obol-frontend` namespace. -3. Agent overlay Helmfile state values. -4. Storefront landing page content. - -**Rationale**: Multiple components need the tunnel URL. Centralized propagation prevents URL drift. - -#### B-2.6.4: Storefront deploys at tunnel hostname root - -**Trigger**: Tunnel becomes active. 
-**Expected**: `CreateStorefront()` deploys 4 resources in the `traefik` namespace: a ConfigMap with HTML content, a busybox httpd Deployment (5m CPU, 8Mi RAM), a ClusterIP Service on port 8080, and an HTTPRoute for the tunnel hostname root (`/`). -**Rationale**: The storefront provides a human-readable landing page for visitors who navigate to the tunnel URL directly. - ---- - -### 2.7 ERC-8004 Identity - -> SPEC SS 3.8 -- ERC-8004 Identity - -#### B-2.7.1: On-chain registration mints agent NFT - -**Trigger**: The reconciler reaches stage 5 (Registered) for a ServiceOffer with `registration.enabled: true`, or the operator runs `obol sell register`. -**Expected**: The ERC-8004 client calls `Register(ctx, key, agentURI)` on the Identity Registry contract (Base Sepolia: `0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D`), minting an NFT. The returned `agentId` (token ID) and `registrationTxHash` are stored in the ServiceOffer status. -**Rationale**: On-chain identity enables decentralized agent discovery and reputation. - -```gherkin -Scenario: Agent registers on-chain via ERC-8004 - Given the ServiceOffer "myapi" is at stage "RoutePublished" - And the wallet has sufficient Base Sepolia ETH for gas - When the reconciler processes stage 5 (Registered) - Then an agent NFT is minted on the Identity Registry - And the ServiceOffer status has a non-empty "agentId" - And the ServiceOffer status has a non-empty "registrationTxHash" -``` - -#### B-2.7.2: Registration document served at well-known endpoint - -**Trigger**: A client fetches `/.well-known/agent-registration.json` via the tunnel. -**Expected**: The endpoint returns an `AgentRegistration` JSON document containing the agent name, description, services, x402Support (true), registrations (agentId + registry address), and supportedTrust array. -**Rationale**: The well-known endpoint is the standard ERC-8004 discovery mechanism. 
- -#### B-2.7.3: Metadata update via SetAgentURI - -**Trigger**: Agent metadata changes (e.g., new service added, description updated). -**Expected**: The ERC-8004 client calls `SetAgentURI(ctx, key, agentId, newURI)` to update the on-chain metadata pointer. -**Rationale**: On-chain metadata must stay current with the agent's actual capabilities. - ---- - -### 2.8 Security - -> SPEC SS 7 -- Security Model - -#### B-2.8.1: Local-only routes restricted by hostname - -**Trigger**: Any HTTPRoute for an internal service (frontend, eRPC, monitoring). -**Expected**: The HTTPRoute has `hostnames: ["obol.stack"]`, ensuring the route only matches requests with `Host: obol.stack`. Requests arriving via the tunnel (with the tunnel hostname) do not match. -**Rationale**: Internal services contain sensitive data (blockchain RPCs, inference admin, Prometheus metrics) and must not be reachable from the public internet. - -```gherkin -Scenario: Frontend is not accessible via tunnel - Given the tunnel is active with hostname "stack.example.com" - When a client sends GET "/" with Host "stack.example.com" - Then the response is the storefront landing page - And the response is NOT the frontend dashboard - -Scenario: Frontend is accessible locally - When a client sends GET "/" with Host "obol.stack" - Then the response is the frontend dashboard -``` - -#### B-2.8.2: RBAC binding patched by agent init - -**Trigger**: `obol agent init` runs during `obol stack up`. -**Expected**: `patchMonetizeBinding()` ensures the `openclaw-monetize-binding` ClusterRoleBinding has the `openclaw` ServiceAccount in `openclaw-obol-agent` namespace as a subject. This grants the agent CRUD access to ServiceOffers, Middlewares, HTTPRoutes, ConfigMaps, Services, and Deployments. -**Rationale**: The agent needs these permissions for the 6-stage reconciliation. Patching at init time handles the race condition where the binding may exist with empty subjects. - ---- - -## 3. 
Undesired Behaviors - -### 3.1 Security Violations - -#### U-3.1.1: Internal services exposed via tunnel (CRITICAL) - -**Trigger**: An HTTPRoute for the frontend, eRPC, LiteLLM admin, or monitoring is created without `hostnames: ["obol.stack"]` restriction. -**Expected**: This MUST NOT happen. All internal service HTTPRoutes MUST include the hostname restriction. Code review and CI must reject any change that removes hostname restrictions from internal routes. -**Risk**: Exposing the frontend exposes cluster management. Exposing eRPC exposes blockchain RPCs (potentially including write-enabled chains). Exposing LiteLLM admin exposes inference configuration and API keys. Exposing Prometheus exposes internal metrics and potentially secrets. This is the highest-severity security violation in the system. - -```gherkin -Scenario: Internal HTTPRoutes must have hostname restrictions - Given the stack is running - When I inspect the HTTPRoute for the frontend - Then it has hostnames containing "obol.stack" - When I inspect the HTTPRoute for eRPC - Then it has hostnames containing "obol.stack" -``` - -### 3.2 LLM Configuration - -#### U-3.2.1: Model without tool support assigned to agent - -**Trigger**: A model that does not support function/tool calling is configured as the agent's primary model. -**Expected**: The system should warn the operator that the model lacks tool support, as the OpenClaw agent relies on tool calling for skill execution and infrastructure management. -**Risk**: The agent silently fails to use skills, producing degraded responses with no indication of the root cause. - -#### U-3.2.2: drop_params silently strips tool definitions - -**Trigger**: LiteLLM's `drop_params` setting is enabled (default in some configs) and the model does not natively support tool parameters. -**Expected**: The system should NOT silently strip tool definitions. If a model does not support tools, the error should surface rather than being hidden. 
-**Risk**: Tool calls appear to succeed but the model never sees the tool definitions, leading to non-functional agent behavior that is extremely difficult to diagnose. - -### 3.3 Infrastructure Drift - -#### U-3.3.1: Kubeconfig port drift after restart - -**Trigger**: The k3d cluster is restarted and the API server is assigned a different port. -**Expected**: The kubeconfig should be refreshed during `obol stack up`. If the port drifts and the kubeconfig is stale, all kubectl operations fail. -**Risk**: All CLI commands that interact with the cluster fail with connection errors. The fix is `k3d kubeconfig write -o $CONFIG_DIR/kubeconfig.yaml --overwrite`, but the operator may not know this. - -#### U-3.3.2: RBAC binding empty subjects race condition - -**Trigger**: `obol agent init` runs before k3s has fully applied the `openclaw-monetize-binding` ClusterRoleBinding manifest. -**Expected**: The `patchMonetizeBinding()` function should handle this by creating or patching the binding. If the race occurs and is not handled, the agent has no permissions. -**Risk**: The 6-stage reconciliation silently fails on all stages that require Kubernetes API access (stages 3-6). The ServiceOffer status shows unhelpful error messages. - -### 3.4 Caching and Staleness - -#### U-3.4.1: eRPC cache staleness for balance queries - -**Trigger**: A paid request settles on-chain, and `buy.py balance` is called within 10 seconds. -**Expected**: The balance query may return a stale value because eRPC caches `eth_call` results for 10 seconds (unfinalized block TTL). -**Risk**: The agent or operator sees an incorrect USDC balance. This is cosmetic (the on-chain state is correct) but confusing. Operators should be aware of the ~10-second lag. - ---- - -## 4. Edge Cases - -### 4.1 Infrastructure Dependencies - -#### E-4.1.1: No Ollama running during stack up - -**Scenario**: The operator runs `obol stack up` but Ollama is not installed or not running on the host. 
-**Expected Handling**: `autoConfigureLLM()` fails to reach `http://localhost:11434/api/tags`. The failure is non-fatal: a warning is printed, and LiteLLM starts without local model entries. The operator can install/start Ollama later and run `obol model setup` manually. -**Rationale**: Ollama is not a hard dependency. Cloud-only configurations are valid. The stack must be operational even without local inference. - -```gherkin -Scenario: Stack up without Ollama - Given Ollama is not running - When the operator runs "obol stack up" - Then a warning is printed about Ollama not being available - And LiteLLM is running with no Ollama model entries - And the stack is otherwise fully operational -``` - -#### E-4.1.2: No cloud provider API keys available - -**Scenario**: Neither `ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, nor `OPENAI_API_KEY` is set in the environment. -**Expected Handling**: `autoConfigureLLM()` prints a warning for each missing provider. LiteLLM starts with only Ollama models (if available) or an empty model list. The operator can add keys later via `obol model setup`. -**Rationale**: Cloud API keys are not required. Local-only inference with Ollama is a valid configuration. - -### 4.2 Blockchain Operations - -#### E-4.2.1: Wallet lacks Base Sepolia ETH for registration - -**Scenario**: The reconciler reaches stage 5 (Registered) but the agent wallet has insufficient ETH to pay gas for the ERC-8004 registration transaction. -**Expected Handling**: The `Register()` call fails with a transaction error. The reconciler logs the error, sets the `Registered` condition to `False` with a message indicating insufficient gas, and retries on the next loop (10 seconds). The ServiceOffer remains at stage 4 (RoutePublished) -- the service is functional but not registered on-chain. -**Rationale**: Gas availability is an external dependency. The service should still work (stages 1-4 are complete) even if on-chain registration is pending. 
- -#### E-4.2.2: All discovery backends unavailable - -**Scenario**: The x402 facilitator, ChainList API, and blockchain RPC endpoints are all unreachable. -**Expected Handling**: Each subsystem degrades independently: -- Facilitator unavailable: x402-verifier cannot verify payments, returns 500. Existing free routes still work. -- ChainList unavailable: `obol network add ` fails with an error. Custom endpoints (`--endpoint`) still work. -- RPC unavailable: eRPC returns errors for blockchain queries. Local nodes (if deployed) still serve reads. - -**Rationale**: External service failures should not cascade. Each failure is isolated to its subsystem. - -### 4.3 Timing and Propagation - -#### E-4.3.1: ConfigMap propagation delay (60-120 seconds) - -**Scenario**: The reconciler updates the `x402-pricing` ConfigMap with a new route, but the x402-verifier does not see the change for up to 120 seconds. -**Expected Handling**: The k3d file watcher takes 60-120 seconds to propagate ConfigMap changes to mounted volumes. The verifier's `WatchConfig()` polls every 5 seconds for file modification time changes. Net worst-case delay: ~125 seconds from ConfigMap update to verifier reload. During this window, requests to the new route will pass through unpriced (free). -**Rationale**: This is a known k3d limitation. The window is short and the failure mode is permissive (free access, not blocked access). For immediate effect, the operator can force a pod restart. - -```gherkin -Scenario: Pricing route available after propagation delay - Given the x402-verifier is running - When a new pricing route is added to the x402-pricing ConfigMap - Then within 125 seconds the verifier serves the new pricing route - And requests to the route return 402 until payment is provided -``` - -#### E-4.3.2: ExternalName services with Traefik Gateway API - -**Scenario**: An operator creates an ExternalName Service expecting it to work as a Traefik upstream via Gateway API HTTPRoutes. 
-**Expected Handling**: ExternalName services do NOT work with Traefik Gateway API. The operator must use a ClusterIP Service with manually managed Endpoints instead. -**Rationale**: This is a known Traefik limitation. The `obol sell http` command creates ClusterIP Services to avoid this issue. - -### 4.4 Payment Edge Cases - -#### E-4.4.1: Pre-signed auth pool exhausted - -**Scenario**: All pre-signed ERC-3009 authorizations for a given upstream have been consumed. -**Expected Handling**: The `PreSignedSigner.Sign()` call returns an error. The x402-buyer sidecar returns a 404 for the model, indicating no purchased upstream is available. The agent must run `buy.py` to pre-sign additional authorizations. -**Rationale**: Bounded pool size is a security feature (maximum loss = N * price). Exhaustion is an expected operational event, not an error. - -```gherkin -Scenario: Auth pool exhaustion returns 404 - Given the buyer has 1 remaining authorization for "seller-qwen" - When the agent makes a paid request that consumes the last authorization - Then the request succeeds - When the agent makes another paid request - Then the sidecar returns 404 for the model - And the /status endpoint shows 0 remaining for "seller-qwen" -``` - -#### E-4.4.2: Quick tunnel URL changes after restart - -**Scenario**: The cluster is restarted (`obol stack down` then `obol stack up`) while in quick tunnel mode. -**Expected Handling**: The quick tunnel gets a new random `*.trycloudflare.com` URL. All URL consumers are re-propagated with the new URL. Any previous ERC-8004 registration with the old URL becomes stale. -**Rationale**: Quick mode is explicitly ephemeral. Operators needing stable URLs should use DNS mode (`obol tunnel login --hostname`). - ---- - -## 5. 
Performance Expectations - -| Behavior | Target | Measurement | Degradation Handling | -|----------|--------|-------------|---------------------| -| x402 ForwardAuth verify (no payment) | < 5ms | Time from ForwardAuth request to 402 response (local) | Lock-free `atomic.Pointer` config reads; pre-resolved chain map | -| x402 ForwardAuth verify (with payment) | < 600ms | Includes facilitator round-trip (100-500ms network) | Facilitator timeout; returns 500 on timeout | -| x402-buyer auth pop | < 1ms | Single mutex lock + O(1) pool pop per `Sign()` call | Mutex contention only under extreme concurrency | -| Route matching (verifier) | < 1ms | First-match short-circuit; no regex per request | Degenerate case: many routes, all glob patterns | -| Buyer model routing | < 1ms | `sync.RWMutex` concurrent reads; rebuild only on `Reload()` | Write lock held briefly during config reload | -| Pricing config hot-reload | < 10s | Poll interval (5s) + parse + atomic swap | Worst case: 5s poll + parse time; old config serves during swap | -| ConfigMap propagation (k3d) | 60-120s | k3d file watcher interval | Force pod restart for immediate effect | -| Quick tunnel URL availability | 10-20s | Time from pod start to URL assignment | Cloudflare registration latency; retry on failure | -| Helmfile sync (initial) | 2-5 min | Full infrastructure deployment | Progress reported via Helmfile output | -| LiteLLM restart | 10-30s | Pod termination + startup | In-flight requests may fail during restart window | -| `obol stack up` (cold start) | 3-7 min | Cluster creation + helmfile sync + auto-config | Depends on Docker image cache state | - ---- - -## 6. 
Guardrail Definitions - -### 6.1 Network Security - -| Guardrail | Rule | Enforcement | Violation Response | -|-----------|------|-------------|-------------------| -| Hostname restrictions on internal HTTPRoutes | Frontend, eRPC, monitoring, and LiteLLM admin HTTPRoutes MUST have `hostnames: ["obol.stack"]` | Code review; embedded template validation; `embed_crd_test.go` | Block PR merge; revert if deployed | -| Public routes limited to safe endpoints | Only `/services/*`, `/.well-known/*`, `/skill.md`, and `/` (storefront) may lack hostname restrictions | Template review; BDD integration tests | Block PR merge | -| Facilitator URL must use HTTPS | `ValidateFacilitatorURL()` rejects non-HTTPS facilitator URLs (loopback exempted for testing) | Runtime validation in CLI | CLI returns error; operation aborted | - -### 6.2 Payment Security - -| Guardrail | Rule | Enforcement | Violation Response | -|-----------|------|-------------|-------------------| -| x402 payment verification before resource access | Every request to `/services/*` MUST pass through x402 ForwardAuth | Traefik Middleware with ForwardAuth; reconciler creates Middleware in stage 3 | Middleware missing = route not published (stage 4 blocks) | -| Bounded spending on buyer sidecar | Maximum financial exposure = N * price, where N = pre-signed auth count | Finite pool in `PreSignedSigner`; no signer access in sidecar | Pool exhaustion returns 404; no additional spending possible | -| Replay protection | Every ERC-3009 authorization uses a unique random 32-byte nonce | `StateStore` tracks consumed nonces; `crypto/rand` for generation | Double-spend attempt rejected by contract | -| Zero signer access in buyer sidecar | The x402-buyer sidecar MUST NOT have access to any private key | Architecture: sidecar receives only pre-signed vouchers via ConfigMap | No private key mounted, injected, or accessible | - -### 6.3 Configuration Integrity - -| Guardrail | Rule | Enforcement | Violation Response | 
-|-----------|------|-------------|-------------------| -| KUBECONFIG auto-set for all K8s tools | `obol kubectl`, `obol helm`, `obol helmfile`, `obol k9s` MUST set `KUBECONFIG=$OBOL_CONFIG_DIR/kubeconfig.yaml` | CLI passthrough implementation sets env before exec | Without this, tools use default kubeconfig and target wrong cluster | -| OpenClaw version pinning consistency | Version in `OPENCLAW_VERSION` file, `openclawImageTag` Go const, and `obolup.sh` MUST agree | `TestOpenClawVersionConsistency` unit test | Test failure blocks CI | -| Two-stage templating separation | Stage 1 (CLI flags to Go templates) and Stage 2 (Helmfile to K8s) MUST NOT be mixed | Code review; template structure in `internal/embed/networks/` | Mixing stages causes unpredictable template rendering | -| Absolute paths in Docker volume mounts | All paths passed to Docker/k3d MUST be absolute | Resolved at `obol stack init` time; `config.Config` stores absolute paths | Relative paths cause Docker mount failures | - -### 6.4 Data Safety - -| Guardrail | Rule | Enforcement | Violation Response | -|-----------|------|-------------|-------------------| -| Wallet backup before purge | `PromptBackupBeforePurge()` MUST run before `obol stack purge` when wallets exist | CLI implementation checks for keystore files | Operator prompted to backup; can force with flag | -| Config hot-reload preserves previous on error | If parsing a new config file fails, the verifier/buyer MUST keep the previous valid config | Error handling in `WatchConfig()` and `Reload()` | Log error; continue serving with old config | -| OwnerReferences on reconciler-created resources | All Kubernetes resources created by `monetize.py` MUST have ownerReferences pointing to the ServiceOffer | Reconciler implementation sets ownerReferences on every create | Missing ownerReferences cause resource leaks on ServiceOffer deletion | -| Backend switching destroys old cluster | Changing from k3d to k3s (or vice versa) via `obol stack init 
--backend` MUST destroy the old backend first | `Init()` checks `.stack-backend` and calls `Destroy()` on mismatch | Orphaned Docker containers or k3s processes | diff --git a/docs/specs/CONTRIBUTING.md b/docs/specs/CONTRIBUTING.md deleted file mode 100644 index 20750187..00000000 --- a/docs/specs/CONTRIBUTING.md +++ /dev/null @@ -1,199 +0,0 @@ -# Developer Rules — Non-negotiable - -> References: SPEC Sections 1–9, ARCHITECTURE Section 1 (Design Philosophy) - -These rules derive from architectural decisions and hard-won operational experience. -Violating them creates silent failures, security holes, or infrastructure drift. - ---- - -## 1. Never Expose Internal Services via Tunnel - -Every HTTPRoute for frontend, eRPC, LiteLLM, or monitoring **must** carry `hostnames: ["obol.stack"]`. -Removing this restriction exposes admin UIs and RPC endpoints to the public internet through the Cloudflare tunnel. - -**Do this:** -```yaml -hostnames: - - "obol.stack" -``` - -**Not this:** -```yaml -# hostnames: [] ← CRITICAL: makes the route reachable via tunnel -``` - -*Why:* The tunnel exposes all routes without hostname restrictions. Internal services have no authentication layer beyond network isolation. (SPEC §7.1, ADR-0005) - ---- - -## 2. Two-Stage Templating Is Sacred - -Stage 1 (CLI flags → Go templates → `values.yaml`) and Stage 2 (Helmfile → K8s manifests) must stay separate. Never leak Helmfile template syntax into Stage 1 or vice versa. - -**Do this:** -```go -// Stage 1: Go template produces values.yaml -tmpl.Execute(out, map[string]string{"ChainID": "8453"}) -// Stage 2: helmfile sync --state-values-file values.yaml -``` - -**Not this:** -```go -// Mixing stages: Go template emitting {{ .Values.x }} Helm syntax -tmpl.Execute(out, "{{ .Values.chainID }}") // breaks Stage 2 -``` - -*Why:* Mixed stages produce undebuggable template errors. The separation enables `values.yaml` to be inspected as plain YAML between stages. (SPEC §3.3) - ---- - -## 3. 
Absolute Paths for Docker Volume Mounts - -All paths passed to k3d/Docker must be absolute. Relative paths resolve differently inside containers vs. host, causing silent mount failures. - -**Do this:** -```go -absPath, _ := filepath.Abs(cfg.DataDir) -// Use absPath in k3d volume mount -``` - -**Not this:** -```go -// Relative path: works on host, empty inside container -mount := ".workspace/data:/data" -``` - -*Why:* Resolved at `obol stack init` and stored in config. k3d volume mounts require host-absolute paths. (SPEC §1.3) - ---- - -## 4. Bound Spending on Buy-Side — Never Hot-Wallet the Sidecar - -The x402-buyer sidecar reads pre-signed ERC-3009 vouchers from a ConfigMap. It never holds signing keys. Maximum loss = N × price where N is the voucher pool size. - -**Do this:** -```go -// Sidecar pops one pre-signed auth per request -auth := pool.Pop(upstream) -``` - -**Not this:** -```go -// Signing in the sidecar: unbounded spending if compromised -sig, _ := wallet.Sign(transferAuth) -``` - -*Why:* A compromised sidecar with signing keys could drain the wallet. Pre-signed vouchers bound the blast radius by design. (SPEC §3.5, ADR-0004) - ---- - -## 5. KUBECONFIG Must Auto-Set for All K8s Tools - -Every command that touches Kubernetes (`kubectl`, `helm`, `helmfile`, `k9s`, and internal functions) must set `KUBECONFIG=$OBOL_CONFIG_DIR/kubeconfig.yaml`. Never rely on the user's default kubeconfig. - -**Do this:** -```go -cmd.Env = append(os.Environ(), "KUBECONFIG="+cfg.KubeconfigPath()) -``` - -**Not this:** -```go -// Omitting KUBECONFIG: hits user's default cluster, not obol's -cmd := exec.Command("kubectl", "apply", "-f", manifest) -``` - -*Why:* Users may have multiple clusters. Omitting KUBECONFIG operates on the wrong cluster, potentially destroying production workloads. (SPEC §1.3, §3.1) - ---- - -## 6. 
Version Pins Must Agree Across Three Locations - -OpenClaw version is pinned in `internal/openclaw/OPENCLAW_VERSION` (source of truth), `openclawImageTag` constant in `openclaw.go`, and `OPENCLAW_VERSION` in `obolup.sh`. All three must match. `TestOpenClawVersionConsistency` enforces this. - -**Do this:** -``` -# Update all three when bumping: -internal/openclaw/OPENCLAW_VERSION ← Renovate watches this -internal/openclaw/openclaw.go ← openclawImageTag const -obolup.sh ← OPENCLAW_VERSION variable -``` - -**Not this:** -``` -# Updating only one: CI passes, runtime pulls wrong image -echo "0.1.8" > internal/openclaw/OPENCLAW_VERSION -# Forgot openclaw.go and obolup.sh → version drift -``` - -*Why:* Mismatched versions cause the binary to deploy a different image than obolup.sh installs, producing silent behavioral differences. (SPEC §3.6) - ---- - -## 7. ServiceOffer Cleanup via OwnerReferences - -When the reconciler creates Kubernetes resources (Middleware, HTTPRoute, ConfigMap, Service, Deployment) for a ServiceOffer, every resource must carry an `ownerReference` back to the ServiceOffer CR. This enables automatic garbage collection on delete. - -**Do this:** -```python -owner_ref = { - "apiVersion": "obol.org/v1alpha1", - "kind": "ServiceOffer", - "name": offer["metadata"]["name"], - "uid": offer["metadata"]["uid"], -} -``` - -**Not this:** -```python -# Orphaned resources: deleting the ServiceOffer leaves routing artifacts -kubectl.create(middleware) # no ownerReference -``` - -*Why:* Without owner references, `obol sell delete` leaves orphaned Middleware and HTTPRoutes that continue routing traffic to dead upstreams. (SPEC §3.4) - ---- - -## 8. Conventional Commits, Scoped PRs - -Use conventional commit prefixes (`feat:`, `fix:`, `test:`, `docs:`, `chore:`). Keep PRs scoped — separate formatting changes from logic changes. Never mix refactoring with feature work in the same PR. 
- -**Do this:** -``` -feat: add per-mtok pricing to sell http command -fix: restore tunnel open/close dropped during cherry-pick -``` - -**Not this:** -``` -update sell command and fix formatting and add tests -``` - -*Why:* Scoped commits enable clean reverts, meaningful changelogs, and reviewable diffs. Mixed PRs are unreviewable and un-revertable. - ---- - -## 9. Integration Tests Skip Gracefully - -Integration tests use `//go:build integration` and must skip (not fail) when prerequisites are missing (no cluster, no Ollama, no API keys). Unit tests must never require a running cluster. - -**Do this:** -```go -//go:build integration - -func TestIntegration_SellFlow(t *testing.T) { - if os.Getenv("OBOL_DEVELOPMENT") == "" { - t.Skip("requires OBOL_DEVELOPMENT=true and running cluster") - } -} -``` - -**Not this:** -```go -// No build tag, fails in CI without cluster -func TestSellFlow(t *testing.T) { - // Calls kubectl internally → fails everywhere -} -``` - -*Why:* CI runs `go test ./...` without a cluster. Failing tests block unrelated PRs. (SPEC §10) diff --git a/docs/specs/SPEC.md b/docs/specs/SPEC.md deleted file mode 100644 index 1d9ba147..00000000 --- a/docs/specs/SPEC.md +++ /dev/null @@ -1,1452 +0,0 @@ -# Obol Stack -- Technical Specification - -> **Version:** 1.0.0 -> **Date:** 2026-03-27 -> **Status:** Living document reflecting the current codebase on the `main` branch. - ---- - -## Table of Contents - -1. [Introduction](#1-introduction) -2. [System Architecture](#2-system-architecture) -3. [Core Subsystems](#3-core-subsystems) -4. [API and Protocol Definition](#4-api-and-protocol-definition) -5. [Data Model](#5-data-model) -6. [Integration Points](#6-integration-points) -7. [Security Model](#7-security-model) -8. [Error Handling](#8-error-handling) -9. [Performance](#9-performance) -10. [Testing Strategy](#10-testing-strategy) - ---- - -## 1. 
Introduction - -### 1.1 Purpose - -Obol Stack is a framework for AI agents to run decentralized infrastructure locally. It deploys a k3d/k3s Kubernetes cluster containing an OpenClaw AI agent, blockchain networks, payment-gated inference via the x402 protocol, and Cloudflare tunnels for public exposure. All management is done through the `obol` CLI binary, built with Go and `github.com/urfave/cli/v3`. - -### 1.2 Terminology - -| Term | Definition | -|------|-----------| -| **x402** | HTTP 402 Payment Required protocol for micropayments. Clients attach EIP-712 signed `PaymentPayload` headers; servers verify via a facilitator service. | -| **ERC-8004** | Ethereum standard for on-chain agent identity. Defines an `IdentityRegistryUpgradeable` (ERC-721) with metadata storage. | -| **ServiceOffer** | Custom Kubernetes resource (`obol.org`) declaring a sell-side service with pricing, upstream, and registration metadata. | -| **ForwardAuth** | Traefik middleware pattern where every request is first forwarded to an auth service (`x402-verifier`) before reaching the upstream. | -| **ERC-3009** | `TransferWithAuthorization` -- gasless USDC transfers via pre-signed EIP-712 authorizations. | -| **Facilitator** | Third-party x402 service that verifies payment signatures and optionally settles on-chain. Default: `https://facilitator.x402.rs`. | -| **LiteLLM** | OpenAI-compatible proxy that routes inference requests to Ollama, Anthropic, OpenAI, or paid remote sellers. | -| **eRPC** | Blockchain RPC gateway that multiplexes and caches requests across multiple upstream RPC providers. | -| **OpenClaw** | AI agent runtime deployed as a singleton Kubernetes Deployment with skills injected via host-path PVC. | -| **Petname** | Two-word deterministic identifier (e.g., `fluffy-penguin`) generated via `dustinkirkland/golang-petname` for unique cluster/deployment naming. | -| **CAIP-2** | Chain Agnostic Improvement Proposal for network identifiers (e.g., `eip155:84532` for Base Sepolia). 
| -| **Storefront** | Static HTML landing page served at the tunnel hostname root via busybox httpd. | -| **Sidecar** | The `x402-buyer` container running alongside LiteLLM in the same Pod, handling buy-side payment attachment. | - -### 1.3 System Constraints - -| Constraint | Detail | -|-----------|--------| -| **Absolute paths** | Docker volume mounts require absolute paths, resolved at `obol stack init` time. | -| **Two-stage templating** | Stage 1: CLI flags populate Go templates in `values.yaml.gotmpl`. Stage 2: Helmfile renders final Kubernetes manifests. Stages must not be mixed. | -| **Unique namespaces** | Every deployment (network, app) gets a unique namespace: `-`. | -| **OBOL_DEVELOPMENT=true** | Required for `obol stack up` to auto-build and import local Docker images (x402-verifier, x402-buyer). | -| **Root-owned PVCs** | k3s local-path-provisioner creates root-owned directories. `obol stack purge -f` (sudo) required to remove. | -| **Single cluster** | One k3d/k3s cluster per config directory. Multiple stacks require separate `OBOL_CONFIG_DIR` values. | -| **OpenClaw version pinning** | Version must agree in 3 places: `OPENCLAW_VERSION` file, `openclawImageTag` Go const, `obolup.sh` shell const. `TestOpenClawVersionConsistency` enforces this. | -| **ConfigMap propagation delay** | k3d file watcher takes 60-120 seconds to pick up manifest changes. | - -### 1.4 Dependencies - -| Dependency | Minimum Version | Purpose | -|-----------|----------------|---------| -| Docker | 20.10.0 | Container runtime for k3d backend | -| Go | 1.25 | Build toolchain | -| kubectl | 1.35.0 | Kubernetes API client | -| Helm | 3.19.4 | Chart management | -| k3d | 5.8.3 | k3d cluster management (default backend) | -| Helmfile | 1.2.3 | Declarative Helm chart orchestration | -| k9s | 0.50.18 | Cluster TUI (optional) | -| k3s | v1.35.1-k3s1 | Kubernetes distribution (via `rancher/k3s` image or binary) | - ---- - -## 2. 
System Architecture - -### 2.1 High-Level Overview - -The system is composed of two parts: `obolup.sh` (bootstrap installer with pinned dependency versions) and the `obol` CLI (Go binary managing all lifecycle operations). - -```mermaid -graph TD - subgraph "Host Machine" - CLI["obol CLI (Go binary)"] - Ollama["Ollama (host)"] - Docker["Docker / Podman"] - end - - subgraph "k3d / k3s Cluster" - subgraph "traefik ns" - GW["Traefik Gateway<br/>(Gateway API)"] - CF["cloudflared"] - SF["Storefront httpd"] - end - - subgraph "llm ns" - LiteLLM["LiteLLM :4000"] - Buyer["x402-buyer :8402<br/>(sidecar)"] - end - - subgraph "x402 ns" - Verifier["x402-verifier<br/>(ForwardAuth)"] - end - - subgraph "openclaw-obol-agent ns" - Agent["OpenClaw Agent"] - RS["Remote Signer :9000"] - end - - subgraph "erpc ns" - ERPC["eRPC Gateway"] - end - - subgraph "obol-frontend ns" - FE["Frontend"] - end - - subgraph "monitoring ns" - Prom["Prometheus"] - end - - subgraph "network-petname ns" - EL["Execution Layer"] - CL["Consensus Layer"] - end - end - - Internet["Public Internet"] - - CLI --> Docker - CLI --> GW - Ollama -.->|host.docker.internal| LiteLLM - Internet --> CF --> GW - GW -->|/services/*| Verifier -->|200 OK| GW --> LiteLLM - GW -->|obol.stack| FE - GW -->|obol.stack/rpc| ERPC - LiteLLM --> Buyer -->|x402 payment| Internet - Agent --> RS - ERPC --> EL -``` - -### 2.2 Routing Architecture - -Traefik serves as the cluster ingress using the Kubernetes Gateway API. A single `GatewayClass` (`traefik`) and `Gateway` (`traefik-gateway`) in the `traefik` namespace handle all HTTP/HTTPS traffic. - -```mermaid -flowchart LR - subgraph "Request Classification" - direction TB - R1["Local-only<br/>hostnames: obol.stack"] - R2["Public<br/>no hostname restriction"] - end - - R1 -->|"/"| FE["Frontend"] - R1 -->|"/rpc"| ERPC["eRPC"] - - R2 -->|"/services/name/*"| FA["x402 ForwardAuth"] --> US["Upstream Service"] - R2 -->|"/.well-known/*"| WK["ERC-8004 httpd"] - R2 -->|"/skill.md"| SK["Service Catalog"] - R2 -->|"/ (tunnel host)"| SF["Storefront"] -``` - -**Routing rules:** - -- **Local-only routes** are restricted to `hostnames: ["obol.stack"]`. This ensures the frontend, eRPC, LiteLLM admin, and monitoring are never reachable via the Cloudflare tunnel. -- **Public routes** have no hostname restriction and are intentionally exposed via the tunnel. The `/services/*` path is protected by x402 ForwardAuth. Discovery endpoints (`/.well-known/`, `/skill.md`) and the storefront landing page are unauthenticated. - -### 2.3 Configuration Hierarchy - -``` -Config{ConfigDir, BinDir, DataDir, StateDir} - -Precedence (each directory type): - 1. Explicit env var (OBOL_CONFIG_DIR, OBOL_BIN_DIR, OBOL_DATA_DIR, OBOL_STATE_DIR) - 2. XDG standard (XDG_CONFIG_HOME/obol, ~/.local/bin, XDG_DATA_HOME/obol, XDG_STATE_HOME/obol) - 3. OBOL_DEVELOPMENT=true -> .workspace/{config,bin,data,state} -``` - -**Source:** `internal/config/config.go` - -### 2.4 Backend Abstraction - -The `Backend` interface (`internal/stack/backend.go`) abstracts the Kubernetes runtime: - -| Method | Description | -|--------|-----------| -| `Init(cfg, ui, stackID)` | Generate backend-specific cluster configuration | -| `Up(cfg, ui, stackID)` | Create/start cluster, return kubeconfig bytes | -| `Down(cfg, ui, stackID)` | Stop cluster without destroying config/data | -| `Destroy(cfg, ui, stackID)` | Remove cluster entirely | -| `DataDir(cfg)` | Return storage path for local-path-provisioner | -| `Prerequisites(cfg)` | Check required software/permissions | -| `IsRunning(cfg, stackID)` | Check if cluster is currently running | - -**Implementations:** -
-- `K3sBackend`: Bare-metal k3s binary. Ollama host is `127.0.0.1` (no Docker networking). - -Backend choice is persisted in `.stack-backend` file. Switching backends triggers automatic destruction of the old cluster to prevent orphaned resources. - ---- - -## 3. Core Subsystems - -### 3.1 Stack Lifecycle - -**Source:** `internal/stack/stack.go`, `internal/stack/backend.go`, `internal/stack/backend_k3d.go`, `internal/stack/backend_k3s.go` - -#### 3.1.1 Purpose - -Manage the full lifecycle of the local Kubernetes cluster: initialization, startup (with infrastructure deployment), shutdown, and purge. - -#### 3.1.2 Operations - -| Command | Function | Behavior | -|---------|----------|---------| -| `obol stack init` | `Init()` | Generate cluster ID (petname), resolve absolute paths, write backend config, copy embedded infrastructure defaults, resolve Ollama host for backend | -| `obol stack up` | `Up()` | Create cluster, export kubeconfig, `syncDefaults()` (helmfile sync), auto-configure LiteLLM, deploy OpenClaw, apply agent RBAC, start DNS tunnel if persistent | -| `obol stack down` | `Down()` | Stop cluster (preserves config + data), stop DNS resolver | -| `obol stack purge` | `Purge()` | Destroy cluster, remove config dir; `--force` also removes root-owned data dir via sudo | - -#### 3.1.3 Startup Sequence - -```mermaid -sequenceDiagram - participant User - participant CLI as obol CLI - participant Backend as K3d/K3s Backend - participant Helmfile - participant LiteLLM - participant OpenClaw - participant Tunnel - - User->>CLI: obol stack up - CLI->>Backend: Up(cfg, stackID) - Backend-->>CLI: kubeconfig bytes - CLI->>CLI: Write kubeconfig - CLI->>Helmfile: syncDefaults (helmfile sync) - Note over Helmfile: Deploy infrastructure
(Traefik, eRPC, x402, LiteLLM, etc.) - Helmfile-->>CLI: Infrastructure deployed - CLI->>LiteLLM: autoConfigureLLM() - Note over LiteLLM: Detect Ollama models
Detect cloud provider API keys
Patch ConfigMap + Secret
Single restart - CLI->>OpenClaw: SetupDefault() - Note over OpenClaw: Deploy singleton agent
Inject skills via PVC - CLI->>CLI: agent.Init() (RBAC patching) - CLI->>Tunnel: Check tunnel state - alt DNS tunnel provisioned - CLI->>Tunnel: EnsureRunning() - else Quick tunnel - Note over Tunnel: Dormant until first sell - end - CLI-->>User: Stack started -``` - -#### 3.1.4 Ollama Host Resolution - -The Ollama host varies by backend and OS: - -| Backend | OS | Ollama Host | IP Resolution | -|---------|----|-------------|---------------| -| k3d | macOS | `host.docker.internal` | Docker Desktop gateway `192.168.65.254` | -| k3d | Linux | `host.k3d.internal` | `docker0` bridge IP | -| k3s | any | `127.0.0.1` | Loopback (k3s runs on host) | - -#### 3.1.5 Configuration - -- **Stack ID:** Persisted in `$OBOL_CONFIG_DIR/.stack-id`. Preserved across `--force` reinit. -- **Backend choice:** Persisted in `$OBOL_CONFIG_DIR/.stack-backend`. -- **Embedded defaults:** Copied to `$OBOL_CONFIG_DIR/defaults/` with template substitution (`{{OLLAMA_HOST}}`, `{{OLLAMA_HOST_IP}}`, `{{CLUSTER_ID}}`). - -#### 3.1.6 Error States - -| Error | Cause | Recovery | -|-------|-------|---------| -| `stack ID not found` | `Init()` not called | Run `obol stack init` | -| `port(s) already in use` | Conflicting service on 80/443/8080/8443 | Stop conflicting service | -| `helmfile sync failed` | Infrastructure deployment error | Cluster auto-stopped via `Down()`, fix and retry | -| `prerequisites check failed` | Missing Docker/k3s binary | Install prerequisites | - ---- - -### 3.2 LLM Routing - -**Source:** `internal/model/model.go` - -#### 3.2.1 Purpose - -Configure and manage the LiteLLM gateway (port 4000) as the central OpenAI-compatible inference proxy, routing requests to local Ollama, cloud providers, or paid remote sellers. 
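The routing decision reduces to prefix matching on the requested model name. A minimal sketch, assuming the three `model_list` wildcard shapes shown in §3.2.5 (the function name `resolveBackend` is illustrative, not from the codebase — LiteLLM itself performs the real routing from the generated `config.yaml`):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBackend is an illustrative sketch of how the model_list
// wildcards map a requested model name onto a backend. Endpoints are
// the ones documented in this section.
func resolveBackend(model string) string {
	switch {
	case strings.HasPrefix(model, "paid/"):
		// Permanent catch-all entry pointing at the x402-buyer sidecar.
		return "http://127.0.0.1:8402/v1"
	case strings.HasPrefix(model, "anthropic/"), strings.HasPrefix(model, "openai/"):
		// Cloud wildcard entries; the API key comes from the environment.
		return "cloud:" + model[:strings.Index(model, "/")]
	default:
		// Anything else is assumed to be a locally pulled Ollama model.
		return "http://ollama.llm.svc:11434"
	}
}

func main() {
	for _, m := range []string{"qwen3.5:9b", "anthropic/claude-sonnet-4-6", "paid/qwen3.5:9b"} {
		fmt.Printf("%-28s -> %s\n", m, resolveBackend(m))
	}
}
```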
- -#### 3.2.2 Inputs / Outputs - -| Input | Source | Description | -|-------|--------|-------------| -| Ollama models | Host Ollama API (`/api/tags`) | Auto-detected during `obol stack up` | -| Cloud API keys | Environment variables | `ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN`, `OPENAI_API_KEY` | -| OpenClaw config | `~/.openclaw/openclaw.json` | Agent model preference for cloud provider detection | - -| Output | Target | Description | -|--------|--------|-------------| -| `litellm-config` ConfigMap | `llm` namespace | YAML `config.yaml` with `model_list` entries | -| `litellm-secrets` Secret | `llm` namespace | Master key + provider API keys | -| LiteLLM Deployment restart | `llm` namespace | Triggered after config patches | - -#### 3.2.3 Provider Configuration - -Known providers are defined statically: - -| Provider | EnvVar | Alt EnvVars | Notes | -|----------|--------|-------------|-------| -| `anthropic` | `ANTHROPIC_API_KEY` | `CLAUDE_CODE_OAUTH_TOKEN` | Claude models | -| `openai` | `OPENAI_API_KEY` | -- | GPT models | -| `ollama` | -- | -- | Local, no API key | - -#### 3.2.4 Logic - -1. **Auto-configuration** (`autoConfigureLLM`): Detects Ollama models and cloud provider API keys. Patches all providers first, then performs a single LiteLLM restart. -2. **Manual configuration** (`ConfigureLiteLLM`): `obol model setup --provider `. Patches Secret + ConfigMap + restarts. -3. **Paid inference routing**: Static `paid/*` model alias routes through the `x402-buyer` sidecar at `http://127.0.0.1:8402`. The LiteLLM config contains a permanent catch-all entry; the sidecar handles payment attachment. 
- -#### 3.2.5 LiteLLM Config Structure - -```yaml -model_list: - - model_name: "qwen3.5:9b" # Ollama model - litellm_params: - model: "ollama/qwen3.5:9b" - api_base: "http://ollama.llm.svc:11434" - - model_name: "anthropic/*" # Cloud wildcard - litellm_params: - model: "anthropic/*" - api_key: "os.environ/ANTHROPIC_API_KEY" - - model_name: "paid/*" # Buy-side sidecar - litellm_params: - model: "openai/*" - api_base: "http://127.0.0.1:8402/v1" -``` - -#### 3.2.6 Error States - -| Error | Cause | Recovery | -|-------|-------|---------| -| `cluster not running` | Kubeconfig missing | Run `obol stack up` | -| `no models to configure` | Empty model list for provider | Ensure Ollama has models or provide API key | -| Auto-configure failures | Non-fatal | User can run `obol model setup` manually | - ---- - -### 3.3 Network / RPC Gateway - -**Source:** `internal/network/erpc.go`, `internal/network/rpc.go`, `internal/network/network.go`, `internal/network/resolve.go` - -#### 3.3.1 Purpose - -Manage blockchain RPC routing through the eRPC gateway. Add/remove chains with public or custom RPC endpoints, deploy local Ethereum nodes, and register them as priority upstreams. - -#### 3.3.2 eRPC ConfigMap Structure - -The eRPC configuration is stored in the `erpc-config` ConfigMap in the `erpc` namespace under the key `erpc.yaml`. It defines projects with networks and upstreams: - -```yaml -projects: - - id: main - networks: - - architecture: evm - evm: - chainId: 1 - upstreams: - - id: local-ethereum-fluffy-penguin - endpoint: http://ethereum-execution.ethereum-fluffy-penguin.svc.cluster.local:8545 - evm: - chainId: 1 - - id: chainlist-ethereum-1 - endpoint: https://eth.llamarpc.com - evm: - chainId: 1 -``` - -#### 3.3.3 Two-Stage Templating - -Network deployments use two-stage templating: - -1. **Stage 1 (CLI flags -> Go templates):** `values.yaml.gotmpl` files in `internal/embed/networks/` use `@enum`, `@default`, `@description` annotations. 
CLI flags populate these templates to produce `values.yaml`. -2. **Stage 2 (Helmfile -> K8s):** `helmfile sync --state-values-file values.yaml --state-values-set id=` renders final Kubernetes manifests. - -#### 3.3.4 Write Method Blocking - -By default, eRPC blocks write methods (`eth_sendRawTransaction`) on all upstreams. The `--allow-writes` flag on `obol network add` enables write methods for a specific chain. Local Ethereum nodes registered via `RegisterERPCUpstream()` always have writes blocked -- write requests are routed to remote upstreams instead. - -#### 3.3.5 Operations - -| Command | Function | Description | -|---------|----------|-------------| -| `obol network add` | `AddPublicRPCs()` / `AddCustomRPC()` | Add chain by ID (ChainList) or custom endpoint | -| `obol network remove` | `RemoveRPC()` | Remove chain from eRPC | -| `obol network list` | `ListRPCNetworks()` | Show configured chains and upstreams | -| `obol network install` | `Install()` | Deploy local Ethereum node (two-stage template) | -| `obol network sync` | `Sync()` | Re-sync helmfile for a deployed network | -| `obol network status` | `Status()` | Show deployment status | - ---- - -### 3.4 Monetize -- Sell Side - -**Source:** `cmd/obol/sell.go`, `internal/x402/`, `internal/schemas/`, `internal/embed/skills/monetize/` - -#### 3.4.1 Purpose - -Enable operators to sell access to cluster services (inference, HTTP endpoints) via x402 micropayments. The sell side creates ServiceOffer CRDs, reconciles them through 6 stages, and publishes payment-gated routes via Traefik. - -#### 3.4.2 Sell-Side Flow - -```mermaid -sequenceDiagram - participant Operator - participant CLI as obol sell http - participant K8s as Kubernetes API - participant Reconciler as monetize.py - participant Verifier as x402-verifier - participant Traefik - - Operator->>CLI: obol sell http myapi --wallet 0x... 
--price 0.001 - CLI->>K8s: Create ServiceOffer CR - CLI->>CLI: EnsureTunnelForSell() - - loop Reconciliation (every 10s) - Reconciler->>K8s: Watch ServiceOffer CRs - Reconciler->>Reconciler: Stage 1: ModelReady - Reconciler->>Reconciler: Stage 2: UpstreamHealthy - Reconciler->>Verifier: Stage 3: PaymentGateReady
(create Middleware + pricing route) - Reconciler->>Traefik: Stage 4: RoutePublished
(create HTTPRoute) - Reconciler->>K8s: Stage 5: Registered
(ERC-8004 on-chain) - Reconciler->>K8s: Stage 6: Ready - end - - Note over Traefik: /services/myapi/* -> ForwardAuth -> upstream -``` - -#### 3.4.3 ServiceOffer CRD - -The `ServiceOffer` CRD (`obol.org`) is the declarative API for sell-side services: - -**Spec fields:** - -| Field | Type | Description | -|-------|------|-------------| -| `type` | `WorkloadType` | `inference` or `fine-tuning` | -| `model` | `ModelSpec` | `{name, runtime}` -- LLM model metadata | -| `upstream` | `UpstreamSpec` | `{service, namespace, port, healthPath}` -- target K8s Service | -| `payment` | `PaymentTerms` | `{scheme, network, payTo, maxTimeoutSeconds, price}` | -| `path` | `string` | URL path prefix (default: `/services/`) | -| `registration` | `RegistrationSpec` | ERC-8004 metadata `{enabled, name, description, image, services, supportedTrust}` | - -**Status fields:** - -| Field | Type | Description | -|-------|------|-------------| -| `conditions[]` | `Condition` | 6 condition types tracking reconciliation progress | -| `endpoint` | `string` | Published URL | -| `agentId` | `string` | ERC-8004 token ID | -| `registrationTxHash` | `string` | On-chain registration transaction hash | - -#### 3.4.4 x402-verifier (ForwardAuth) - -**Source:** `internal/x402/verifier.go`, `internal/x402/config.go`, `internal/x402/matcher.go`, `internal/x402/watcher.go` - -The x402-verifier runs in the `x402` namespace as a Deployment. Traefik sends every request matching a ForwardAuth Middleware to `POST /verify`. The verifier: - -1. Reads `X-Forwarded-Uri` from the request headers. -2. Matches against `PricingConfig.Routes[]` (first match wins). -3. No match -> `200 OK` (free route). -4. Match + no `X-PAYMENT` header -> `402 Payment Required` with `PaymentRequirements` body. -5. Match + `X-PAYMENT` header -> delegates to `x402-go` middleware for verification/settlement. -6. Verified -> `200 OK` (optionally sets `Authorization` header for upstream auth). 
- -**Configuration hot-reload:** `WatchConfig()` polls the pricing YAML file every 5 seconds for modification time changes, then atomically swaps the `PricingConfig` via `Verifier.Reload()`. This handles ConfigMap volume mount updates (kubelet symlink swaps) without fsnotify. - -**Route matching** (`internal/x402/matcher.go`): - -| Pattern Type | Example | Behavior | -|-------------|---------|---------| -| Exact | `/health` | Matches only `/health` | -| Prefix | `/rpc/*` | Matches `/rpc/`, `/rpc/a/b/c` | -| Glob | `/inference-*/v1/*` | Segment-level wildcards via `path.Match` | - -#### 3.4.5 Pricing - -```go -// PricingConfig (YAML: x402-pricing ConfigMap) -type PricingConfig struct { - Wallet string // USDC recipient address - Chain string // e.g., "base-sepolia" - FacilitatorURL string // default: "https://facilitator.x402.rs" - VerifyOnly bool // skip settlement (testing) - Routes []RouteRule // first-match pricing rules -} - -type RouteRule struct { - Pattern string // URL path pattern - Price string // USDC per request - PayTo string // per-route wallet override - Network string // per-route chain override - UpstreamAuth string // Authorization header for upstream - PriceModel string // metadata: "perRequest", "perMTok" - PerMTok string // original per-million-token price - ApproxTokensPerRequest int // fixed estimate (default: 1000) - OfferNamespace string // originating ServiceOffer - OfferName string // originating ServiceOffer -} -``` - -**Phase 1 pricing approximation:** When `perMTok` is set, the effective per-request price is `perMTok / 1000` (using `ApproxTokensPerRequest = 1000`). Exact token metering is planned for phase 2. 
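For example, a route with `perMTok: "1.00"` resolves to 0.001 USDC per request under the fixed 1000-token estimate. A sketch of the conversion (the function name is hypothetical; the division-by-1000 shortcut above is just this formula with the default estimate):

```go
package main

import "fmt"

// effectivePerRequest converts a per-million-token price into a flat
// per-request price using the fixed token estimate (default 1000).
// Phase-1 approximation only; exact metering is planned for phase 2.
func effectivePerRequest(perMTok float64, approxTokens int) float64 {
	if approxTokens == 0 {
		approxTokens = 1000 // documented default for ApproxTokensPerRequest
	}
	return perMTok * float64(approxTokens) / 1_000_000
}

func main() {
	fmt.Println(effectivePerRequest(1.00, 1000)) // prints 0.001
}
```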
- -#### 3.4.6 Supported Chains - -| Chain | Name | CAIP-2 | -|-------|------|--------| -| Base Mainnet | `base` | `eip155:8453` | -| Base Sepolia | `base-sepolia` | `eip155:84532` | -| Polygon Mainnet | `polygon` | `eip155:137` | -| Polygon Amoy | `polygon-amoy` | `eip155:80002` | -| Avalanche Mainnet | `avalanche` | `eip155:43114` | -| Avalanche Fuji | `avalanche-fuji` | `eip155:43113` | - -#### 3.4.7 CLI Commands - -| Command | Description | -|---------|-------------| -| `obol sell http ` | Sell access to an HTTP service with x402 gating | -| `obol sell inference ` | Sell inference via standalone gateway (bare metal) | -| `obol sell list` | List active ServiceOffers | -| `obol sell status ` | Show reconciliation status for a ServiceOffer | -| `obol sell stop ` | Scale down a sold service | -| `obol sell delete ` | Delete ServiceOffer and all owned resources | -| `obol sell pricing` | Configure global wallet and chain | -| `obol sell register` | Trigger ERC-8004 on-chain registration | - -#### 3.4.8 Error States - -| Error | Cause | Recovery | -|-------|-------|---------| -| `unsupported chain` | Invalid chain name in `--chain` | Use one of: base, base-sepolia, polygon, polygon-amoy, avalanche, avalanche-fuji | -| `facilitator URL must use HTTPS` | Non-HTTPS facilitator (not localhost) | Use HTTPS URL or loopback for testing | -| Reconciler stuck at stage | Upstream unhealthy, wallet missing, tunnel down | Check `obol sell status ` for condition messages | - ---- - -### 3.5 Monetize -- Buy Side - -**Source:** `internal/x402/buyer/config.go`, `internal/x402/buyer/signer.go`, `internal/x402/buyer/proxy.go`, `internal/x402/buyer/state.go` - -#### 3.5.1 Purpose - -Enable agents to purchase inference from remote x402-gated sellers using pre-signed ERC-3009 `TransferWithAuthorization` vouchers. The `x402-buyer` sidecar runs as a second container in the `litellm` Deployment. 
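The bounded-spend property described below (maximum loss = N * price) follows from the voucher pool being finite and single-use. A minimal sketch of such a pop-only pool, with simplified types — the real `PreSignedSigner` also persists consumed nonces through `StateStore` to survive restarts:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// auth stands in for a pre-signed ERC-3009 TransferWithAuthorization.
type auth struct{ Nonce, Signature string }

var errExhausted = errors.New("pre-signed auth pool exhausted")

// pool is finite and pop-only: once an authorization is handed out it
// is gone, so total spend can never exceed len(auths) * price.
type pool struct {
	mu    sync.Mutex
	auths []auth
}

func (p *pool) pop() (auth, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.auths) == 0 {
		return auth{}, errExhausted
	}
	a := p.auths[0]
	p.auths = p.auths[1:]
	return a, nil
}

func main() {
	p := &pool{auths: []auth{{Nonce: "0x01"}, {Nonce: "0x02"}}}
	for {
		a, err := p.pop()
		if err != nil {
			fmt.Println(err) // pool drained: agent must pre-sign more vouchers
			return
		}
		fmt.Println("spent", a.Nonce)
	}
}
```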
- -#### 3.5.2 Buy-Side Flow - -```mermaid -sequenceDiagram - participant Agent as OpenClaw Agent - participant LiteLLM - participant Buyer as x402-buyer sidecar - participant Seller as Remote Seller - - Agent->>LiteLLM: POST /v1/chat/completions
model: "paid/qwen3.5:9b" - LiteLLM->>Buyer: Proxy to :8402/v1/chat/completions
model: "qwen3.5:9b" - Buyer->>Seller: POST /services/qwen/v1/chat/completions - Seller-->>Buyer: 402 PaymentRequired - Note over Buyer: Pop pre-signed auth
from pool - Buyer->>Seller: Retry with X-PAYMENT header - Seller-->>Buyer: 200 OK + inference response - Buyer-->>LiteLLM: 200 OK - LiteLLM-->>Agent: Chat completion -``` - -#### 3.5.3 Architecture - -The sidecar has zero signer access. Spending is bounded by design: maximum loss = N * price, where N is the number of pre-signed authorizations in the pool. - -**Components:** - -| Component | Role | -|-----------|------| -| `Proxy` | OpenAI-compatible reverse proxy with model-based routing | -| `PreSignedSigner` | Implements `x402.Signer` by popping from a finite auth pool | -| `StateStore` | Tracks consumed nonces to prevent double-spend across restarts | -| `X402Transport` | HTTP transport that intercepts 402 responses and attaches payments | - -#### 3.5.4 Configuration - -```json -// x402-buyer-config ConfigMap -{ - "upstreams": { - "seller-qwen": { - "url": "https://seller.example.com/services/qwen", - "remoteModel": "qwen3.5:9b", - "network": "base-sepolia", - "payTo": "0x...", - "asset": "0x...", - "price": "1000" - } - } -} -``` - -```json -// x402-buyer-auths ConfigMap (pre-signed ERC-3009 authorizations) -{ - "seller-qwen": [ - { - "signature": "0x...", - "from": "0x...", - "to": "0x...", - "value": "1000", - "validAfter": "0", - "validBefore": "115792089237316195423570985008687907853269984665640564039457584007913129639935", - "nonce": "0x..." 
- } - ] -} -``` - -#### 3.5.5 Model Resolution - -The proxy strips `paid/` and `openai/` prefixes from the requested model name to resolve the upstream: - -``` -"paid/openai/qwen3.5:9b" -> "qwen3.5:9b" -> lookup in modelRoutes -> upstream handler -``` - -#### 3.5.6 Endpoints - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/v1/chat/completions` | POST | OpenAI chat completions (model-routed) | -| `/chat/completions` | POST | OpenAI chat completions (no `/v1` prefix) | -| `/v1/responses` | POST | OpenAI responses API | -| `/responses` | POST | OpenAI responses API (no `/v1` prefix) | -| `/upstream//...` | ANY | Direct upstream access (compatibility) | -| `/status` | GET | JSON with remaining/spent auths per upstream | -| `/healthz` | GET | Liveness probe | -| `/metrics` | GET | Prometheus metrics | - -#### 3.5.7 Error States - -| Error | Cause | Recovery | -|-------|-------|---------| -| `pre-signed auth pool exhausted` | All vouchers consumed | Agent runs `buy.py` to pre-sign more | -| `no purchased upstream mapped` | Model not in buyer config | Agent runs `buy.py probe` + `buy.py buy` | -| Payment failure from seller | Invalid/expired auth, insufficient balance | Check auth validity, top up USDC | - ---- - -### 3.6 OpenClaw and Skills - -**Source:** `internal/openclaw/openclaw.go`, `internal/openclaw/wallet.go`, `internal/openclaw/resolve.go`, `internal/embed/` - -#### 3.6.1 Purpose - -Deploy and manage the OpenClaw AI agent as a singleton Kubernetes Deployment, inject skills via host-path PVC, and manage agent wallets for on-chain operations. - -#### 3.6.2 Agent Deployment - -The agent is deployed as a singleton Deployment named `openclaw` in the `openclaw-obol-agent` namespace. Skills are delivered via host-path PVC injection to `$DATA_DIR/openclaw-/openclaw-data/.openclaw/skills/`. 
- -**23 embedded skills** in `internal/embed/skills/`: - -| Category | Skills | -|----------|--------| -| Infrastructure | ethereum-networks, ethereum-local-wallet, obol-stack, distributed-validators, monetize, discovery, buy-inference, maintain-inference | -| Ethereum Dev | addresses, building-blocks, concepts, gas, indexing, l2s, orchestration, security, standards, ship, testing, tools, wallets | -| Frontend | frontend-playbook, frontend-ux, qa, why | - -#### 3.6.3 Wallet Generation - -`GenerateWallet()` in `internal/openclaw/wallet.go`: - -1. Generate secp256k1 private key. -2. Derive Ethereum address (Keccak-256 of uncompressed public key, last 20 bytes). -3. Encrypt private key using Web3 V3 keystore format (scrypt KDF, AES-128-CTR cipher). -4. Write keystore JSON to `$DATA_DIR/openclaw-/keystore/`. -5. Deploy remote-signer REST API at port 9000 in the same namespace. - -#### 3.6.4 Cloud Provider Detection - -During `obol stack up`, `autoDetectCloudProvider()`: - -1. Reads `~/.openclaw/openclaw.json` for agent model preference. -2. Extracts provider from model name (e.g., `anthropic/claude-sonnet-4-6` -> `anthropic`). -3. Resolves API key: primary env var -> alt env vars -> `.env` file (dev mode). -4. Patches LiteLLM with the provider + key. - -#### 3.6.5 Version Pinning - -Three locations must agree: - -| Location | File | Format | -|----------|------|--------| -| Source of truth | `internal/openclaw/OPENCLAW_VERSION` | Plain text | -| Go constant | `internal/openclaw/openclaw.go` | `openclawImageTag` const | -| Shell constant | `obolup.sh` | `OPENCLAW_VERSION` variable | - -`TestOpenClawVersionConsistency` in `internal/openclaw/version_test.go` enforces consistency. 
- ---- - -### 3.7 Tunnel Management - -**Source:** `internal/tunnel/tunnel.go`, `internal/tunnel/state.go`, `internal/tunnel/provision.go`, `internal/tunnel/cloudflare.go`, `internal/tunnel/agent.go` - -#### 3.7.1 Purpose - -Manage Cloudflare tunnels that expose the cluster to the public internet, enabling remote access to x402-gated services and agent discovery endpoints. - -#### 3.7.2 Tunnel Modes - -| Mode | Activation | URL | Persistence | -|------|-----------|-----|-------------| -| `quick` | Dormant by default; activates on first `obol sell` | Random `*.trycloudflare.com` | Ephemeral (changes on restart) | -| `dns` | `obol tunnel login --hostname stack.example.com` | Stable user-controlled hostname | Persistent across restarts | - -#### 3.7.3 State - -Tunnel state is persisted at `$OBOL_CONFIG_DIR/tunnel/cloudflared.json`: - -```go -type tunnelState struct { - Mode string // "quick" or "dns" - Hostname string // e.g., "stack.example.com" - AccountID string // Cloudflare account ID - ZoneID string // Cloudflare zone ID - TunnelID string // Cloudflare tunnel ID - TunnelName string // Tunnel name - UpdatedAt time.Time // Last state update -} -``` - -#### 3.7.4 Lifecycle - -```mermaid -stateDiagram-v2 - [*] --> Dormant: obol stack up (quick mode) - Dormant --> Active: obol sell http / obol tunnel restart - Active --> Dormant: obol tunnel stop (scale to 0) - [*] --> Active: obol stack up (dns mode, auto-start) - Active --> [*]: obol stack down / purge - - state Active { - [*] --> Running - Running --> Restarting: obol tunnel restart - Restarting --> Running - } -``` - -#### 3.7.5 URL Propagation - -When a tunnel becomes active, the URL is propagated to multiple consumers: - -1. **obol-agent env:** `AGENT_BASE_URL` on the OpenClaw Deployment (for `monetize.py` registration JSON). -2. **Frontend ConfigMap:** `obol-stack-config` in `obol-frontend` namespace (dashboard URL). -3. **Agent overlay:** Helmfile state values for consistency across syncs. -4. 
**Storefront:** Busybox httpd landing page at the tunnel hostname root. - -#### 3.7.6 Storefront Resources - -`CreateStorefront()` deploys 4 Kubernetes resources in the `traefik` namespace: - -- `ConfigMap/tunnel-storefront`: HTML content + mime types -- `Deployment/tunnel-storefront`: busybox httpd serving the ConfigMap (5m CPU, 8Mi RAM) -- `Service/tunnel-storefront`: ClusterIP on port 8080 -- `HTTPRoute/tunnel-storefront`: Routes tunnel hostname root to the storefront - ---- - -### 3.8 ERC-8004 Identity - -**Source:** `internal/erc8004/client.go`, `internal/erc8004/types.go`, `internal/erc8004/abi.go` - -#### 3.8.1 Purpose - -Register AI agents on-chain using the ERC-8004 Identity Registry, enabling decentralized agent discovery and identity verification. - -#### 3.8.2 Contract - -| Property | Value | -|----------|-------| -| Standard | ERC-721 (IdentityRegistryUpgradeable) | -| Base Sepolia | `0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D` | -| Base Mainnet | `0x8004A169...` (abbreviated) | - -#### 3.8.3 Client Operations - -| Method | Description | -|--------|-------------| -| `Register(ctx, key, agentURI)` | Mint new agent NFT, returns `agentId` (token ID) | -| `SetAgentURI(ctx, key, agentId, uri)` | Update the agent's metadata URI | -| `SetMetadata(ctx, key, agentId, entries)` | Set on-chain metadata key-value pairs | -| `GetMetadata(ctx, agentId, key)` | Read on-chain metadata | -| `TokenURI(ctx, agentId)` | Read the agent's metadata URI | - -#### 3.8.4 Agent Registration Document - -Served at `/.well-known/agent-registration.json`: - -```go -type AgentRegistration struct { - Type string // "https://eips.ethereum.org/EIPS/eip-8004#registration-v1" - Name string // Agent name - Description string // Human-readable description - Image string // Agent icon URL - Services []ServiceDef // Endpoints (web, A2A, MCP, OASF) - X402Support bool // Always true for Obol Stack agents - Active bool // Service availability - Registrations []OnChainReg // On-chain records 
[{agentId, agentRegistry}] - SupportedTrust []string // ["reputation", "crypto-economic", "tee-attestation"] -} -``` - -#### 3.8.5 Error States - -| Error | Cause | Recovery | -|-------|-------|---------| -| `erc8004: dial` | RPC endpoint unreachable | Check network connectivity, verify RPC URL | -| `erc8004: register tx` | Transaction submission failed | Check wallet balance (gas), verify contract address | -| `erc8004: wait mined` | Transaction not mined | Retry, check network congestion | - ---- - -### 3.9 Standalone Inference Gateway - -**Source:** `internal/inference/gateway.go`, `internal/inference/container.go`, `internal/inference/store.go`, `internal/inference/client.go` - -#### 3.9.1 Purpose - -Provide a standalone, bare-metal OpenAI-compatible HTTP gateway with x402 payment gating and optional hardware-backed encryption (Secure Enclave or TEE). - -#### 3.9.2 Configuration - -```go -type GatewayConfig struct { - ListenAddr string // default ":8402" - UpstreamURL string // e.g., "http://localhost:11434" - WalletAddress string // USDC recipient - PricePerRequest string // default "0.001" - Chain x402.ChainConfig // default BaseSepolia - FacilitatorURL string // default "https://facilitator.x402.rs" - VerifyOnly bool // skip settlement - EnclaveTag string // macOS Secure Enclave key tag - VMMode bool // Apple Containerization VM - VMImage string // default "ollama/ollama:latest" - VMCPUs int // default 4 - VMMemoryMB int // default 8192 - VMHostPort int // default 11435 - VMBinary string // default "container" - TEEType string // "tdx", "snp", "nitro", "stub" - ModelHash string // SHA-256 of served model - NoPaymentGate bool // disable x402 (cluster mode) -} -``` - -#### 3.9.3 Middleware Stack - -The gateway composes middleware layers from innermost to outermost: - -``` -Client -> x402 Payment Gate -> Enclave/TEE Decrypt -> Reverse Proxy -> Upstream (Ollama) -``` - -| Layer | Condition | Behavior | -|-------|-----------|---------| -| x402 Payment Gate | 
`!NoPaymentGate` | Returns 402 for unpaid requests | -| Enclave Middleware | `EnclaveTag != ""` or `TEEType != ""` | Decrypts `application/x-obol-encrypted` bodies via SE/TEE key | -| Reverse Proxy | Always | Forwards to upstream inference service | - -#### 3.9.4 Endpoints - -| Endpoint | Auth | Description | -|----------|------|-------------| -| `GET /health` | None | Liveness probe | -| `GET /v1/enclave/pubkey` | None | SE/TEE public key (enclave mode only) | -| `GET /v1/attestation` | None | TEE attestation report (TEE mode only) | -| `POST /v1/chat/completions` | x402 | Chat completions (payment-gated) | -| `POST /v1/completions` | x402 | Text completions (payment-gated) | -| `POST /v1/embeddings` | x402 | Embeddings (payment-gated) | -| `GET /v1/models` | x402 | Model list (payment-gated) | -| `* /` | None | Passthrough to upstream | - -#### 3.9.5 VM Mode - -When `--vm` is set, the gateway: - -1. Starts an OCI container via Apple Containerization (`container` CLI). -2. Maps the container's Ollama port 11434 to `VMHostPort` on the host. -3. Overrides `UpstreamURL` with `http://localhost:<VMHostPort>`. 
- -#### 3.9.6 Encryption Scheme (Enclave / TEE) - -**Source:** `internal/enclave/enclave.go` - -The `Key` interface provides hardware-backed P-256 key management: - -| Method | Description | -|--------|-------------| -| `PublicKeyBytes()` | Uncompressed 65-byte SEC1 public key | -| `Sign(digest)` | ECDSA signature via SE/TEE private key | -| `ECDH(peerPubKey)` | Diffie-Hellman shared secret | -| `Decrypt(ciphertext)` | Full ECIES decryption | -| `Persistent()` | Whether key survives process restart | - -**Wire format:** - -``` -[1 byte] version (0x01) -[65 bytes] uncompressed ephemeral public key -[12 bytes] AES-GCM nonce -[n bytes] ciphertext -[16 bytes] AES-GCM authentication tag -``` - -**Implementations:** - -| Platform | Backend | Source | -|----------|---------|-------| -| macOS (CGO) | Apple Secure Enclave (Security.framework) | `enclave_darwin.go` | -| Linux TEE | TDX, SNP, Nitro, or stub | `internal/tee/` | -| Other | `ErrNotSupported` | `enclave_stub.go` | - ---- - -## 4. API and Protocol Definition - -### 4.1 x402 Payment Protocol - -#### 4.1.1 Request Flow - -```mermaid -sequenceDiagram - participant Client - participant Traefik - participant Verifier as x402-verifier - participant Facilitator - participant Upstream - - Client->>Traefik: GET /services/myapi/data - Traefik->>Verifier: ForwardAuth (X-Forwarded-Uri: /services/myapi/data) - Verifier->>Verifier: Match route -> price $0.001 - Verifier-->>Traefik: 402 PaymentRequired - Traefik-->>Client: 402 + PaymentRequirements JSON - - Note over Client: Sign ERC-3009 TransferWithAuthorization - - Client->>Traefik: GET /services/myapi/data
X-PAYMENT: base64(PaymentPayload) - Traefik->>Verifier: ForwardAuth + X-PAYMENT - Verifier->>Facilitator: POST /verify (PaymentPayload) - Facilitator-->>Verifier: {valid: true} - Verifier-->>Traefik: 200 OK + Authorization header - Traefik->>Upstream: GET /data (Authorization: Bearer sk-...) - Upstream-->>Client: 200 OK + response -``` - -#### 4.1.2 PaymentRequired Response (402) - -```json -{ - "x402Version": 1, - "accepts": [ - { - "scheme": "exact", - "network": "eip155:84532", - "maxAmountRequired": "1000", - "resource": "https://seller.example.com/services/myapi/data", - "asset": "0x036CbD53842c5426634e7929541eC2318f3dCF7e", - "payTo": "0x...", - "maxTimeoutSeconds": 300 - } - ] -} -``` - -#### 4.1.3 PaymentPayload (X-PAYMENT header) - -```json -{ - "x402Version": 1, - "scheme": "exact", - "network": "eip155:84532", - "payload": { - "signature": "0x...", - "authorization": { - "from": "0x...", - "to": "0x...", - "value": "1000", - "validAfter": "0", - "validBefore": "115792089237316195423570985008687907853269984665640564039457584007913129639935", - "nonce": "0x..." 
- } - } -} -``` - -### 4.2 CLI Command Tree - -``` -obol -├── stack -│ ├── init [--force] [--backend k3d|k3s] -│ ├── up -│ ├── down -│ └── purge [--force] -├── agent -│ └── init -├── network -│ ├── list -│ ├── install [--id ] [flags] -│ ├── add [--allow-writes] [--endpoint ] -│ ├── remove -│ ├── status -│ ├── sync -│ └── delete -├── sell -│ ├── inference --model [--price|--per-mtok] [--vm] -│ ├── http --wallet --chain [--price|--per-request|--per-mtok] -│ │ --upstream --port --namespace [--health-path ] -│ ├── list -│ ├── status -│ ├── stop -│ ├── delete -│ ├── pricing --wallet --chain -│ └── register --name --private-key-file -├── openclaw -│ ├── onboard -│ ├── setup -│ ├── sync -│ ├── list -│ ├── delete -│ ├── dashboard -│ ├── cli -│ ├── token -│ └── skills -├── model -│ ├── setup [--provider ] [custom --name --endpoint --model] -│ └── status -├── app -│ ├── install [--id ] -│ ├── sync -│ ├── list -│ └── delete -├── tunnel -│ ├── status -│ ├── login [--hostname ] -│ ├── provision -│ ├── restart -│ └── logs [--follow] -├── kubectl (passthrough, auto KUBECONFIG) -├── helm (passthrough, auto KUBECONFIG) -├── helmfile (passthrough, auto KUBECONFIG) -├── k9s (passthrough, auto KUBECONFIG) -├── update -├── upgrade -└── version -``` - ---- - -## 5. 
Data Model - -### 5.1 Configuration Files - -| File | Location | Format | Purpose | -|------|----------|--------|---------| -| `.stack-id` | `$CONFIG_DIR/` | Plain text | Cluster petname identifier | -| `.stack-backend` | `$CONFIG_DIR/` | Plain text | `k3d` or `k3s` | -| `kubeconfig.yaml` | `$CONFIG_DIR/` | YAML | Kubernetes API access | -| `cloudflared.json` | `$CONFIG_DIR/tunnel/` | JSON | Tunnel state (mode, hostname, IDs) | -| `defaults/` | `$CONFIG_DIR/` | Helmfile + YAML | Infrastructure deployment manifests | -| `networks///` | `$CONFIG_DIR/` | Helmfile + YAML | Per-network deployment configs | - -### 5.2 Kubernetes Resources (by namespace) - -| Namespace | Resources | -|-----------|-----------| -| `traefik` | GatewayClass, Gateway, cloudflared Deployment, tunnel-storefront (Deployment, Service, ConfigMap, HTTPRoute) | -| `llm` | LiteLLM Deployment (+ x402-buyer sidecar), `litellm-config` ConfigMap, `litellm-secrets` Secret | -| `x402` | x402-verifier Deployment, `x402-pricing` ConfigMap, `x402-secrets` Secret, ServiceMonitor | -| `openclaw-obol-agent` | OpenClaw Deployment, remote-signer Deployment, wallet Secret, RBAC (ClusterRole, ClusterRoleBinding) | -| `erpc` | eRPC Deployment, `erpc-config` ConfigMap | -| `obol-frontend` | Frontend Deployment, `obol-stack-config` ConfigMap | -| `monitoring` | Prometheus stack | -| `-` | Execution layer, consensus layer, per-network resources | -| (cluster-scoped) | ServiceOffer CRD (`obol.org`), `openclaw-monetize` ClusterRole | - -### 5.3 ServiceOffer CRD Schema - -```yaml -apiVersion: obol.org/v1alpha1 -kind: ServiceOffer -metadata: - name: my-inference - namespace: openclaw-obol-agent -spec: - type: inference # WorkloadType: inference | fine-tuning - model: - name: qwen3.5:9b - runtime: ollama - upstream: - service: litellm - namespace: llm - port: 4000 - healthPath: /health/readiness - payment: - scheme: exact # x402 payment scheme - network: base-sepolia # Human-friendly chain name - payTo: "0x..." 
# USDC recipient - maxTimeoutSeconds: 300 - price: - perRequest: "0.001" # USDC per request - perMTok: "1.00" # USDC per million tokens (phase 1: /1000) - perHour: "5.00" # USDC per compute-hour (fine-tuning) - path: /services/my-inference # URL path prefix - registration: - enabled: true - name: "My Inference Agent" - description: "Sells qwen3.5:9b inference" - image: "https://example.com/icon.png" - services: - - name: web - endpoint: "" # Auto-filled from tunnel URL - supportedTrust: - - reputation -status: - conditions: - - type: ModelReady - status: "True" - - type: UpstreamHealthy - status: "True" - - type: PaymentGateReady - status: "True" - - type: RoutePublished - status: "True" - - type: Registered - status: "True" - - type: Ready - status: "True" - endpoint: "https://stack.example.com/services/my-inference" - agentId: "42" - registrationTxHash: "0x..." -``` - -### 5.4 Wallet (Web3 V3 Keystore) - -```json -{ - "address": "aabbccdd...", - "crypto": { - "cipher": "aes-128-ctr", - "ciphertext": "...", - "cipherparams": { "iv": "..." }, - "kdf": "scrypt", - "kdfparams": { "dklen": 32, "n": 262144, "r": 8, "p": 1, "salt": "..." }, - "mac": "..." - }, - "id": "uuid", - "version": 3 -} -``` - ---- - -## 6. 
Integration Points - -### 6.1 External Services - -| Service | Protocol | Purpose | Configuration | -|---------|----------|---------|---------------| -| Cloudflare Tunnel | HTTPS/QUIC | Public internet exposure | `obol tunnel login` / auto-provisioned | -| x402 Facilitator | HTTPS POST | Payment verification + settlement | `facilitatorURL` (default: `https://facilitator.x402.rs`) | -| ChainList API | HTTPS GET | Public RPC endpoint discovery | Used by `obol network add ` | -| Ollama API | HTTP | Local LLM inference | `http://localhost:11434` (host) | -| Anthropic API | HTTPS | Cloud LLM inference | `ANTHROPIC_API_KEY` env var | -| OpenAI API | HTTPS | Cloud LLM inference | `OPENAI_API_KEY` env var | -| Base Sepolia RPC | HTTPS | ERC-8004 registration + ERC-3009 settlement | Via eRPC or direct endpoint | - -### 6.2 Internal Service Communication - -```mermaid -graph LR - subgraph "Ingress" - T[Traefik :80/:443] - end - - subgraph "Auth" - V[x402-verifier :8080] - end - - subgraph "Compute" - L[LiteLLM :4000] - B[x402-buyer :8402] - O[Ollama :11434] - end - - subgraph "Data" - E[eRPC :4000] - EL[Execution Layer :8545] - end - - subgraph "Agent" - A[OpenClaw] - RS[Remote Signer :9000] - end - - T -->|ForwardAuth| V - T -->|upstream| L - T -->|local| E - L -->|ollama/*| O - L -->|paid/*| B - B -->|x402| Internet((Internet)) - E --> EL - A --> RS - A --> L -``` - ---- - -## 7. Security Model - -### 7.1 Tunnel Exposure - -The Cloudflare tunnel is the primary attack surface. The security model ensures only intentionally public endpoints are reachable via the tunnel. 
- -| Route | Exposure | Protection | -|-------|----------|-----------| -| `/services/*` | Public via tunnel | x402 payment gate (ForwardAuth) | -| `/.well-known/agent-registration.json` | Public via tunnel | Read-only, no sensitive data | -| `/skill.md` | Public via tunnel | Read-only service catalog | -| `/` (tunnel hostname) | Public via tunnel | Static HTML storefront | -| `/` (obol.stack) | Local only | `hostnames: ["obol.stack"]` restriction | -| `/rpc` | Local only | `hostnames: ["obol.stack"]` restriction | -| LiteLLM admin | Local only | Not exposed via any HTTPRoute | -| Prometheus | Local only | `hostnames: ["obol.stack"]` restriction | - -**Invariants (NEVER violate):** - -- Frontend and eRPC HTTPRoutes MUST have `hostnames: ["obol.stack"]`. -- Internal services MUST NOT have HTTPRoutes without hostname restrictions. -- The frontend, RPC gateway, monitoring, and LiteLLM admin MUST NOT be reachable via the tunnel. - -### 7.2 Payment Security - -| Property | Mechanism | -|----------|-----------| -| Payment integrity | EIP-712 typed signatures verified by facilitator | -| Replay protection | Random 32-byte nonces in ERC-3009 authorizations | -| Bounded spending (buyer) | Finite pool of pre-signed auths; max loss = N * price | -| Zero signer access (buyer) | Sidecar has no private key; only pre-signed vouchers | -| Facilitator HTTPS | `ValidateFacilitatorURL()` enforces HTTPS (loopback exempted) | -| Settlement verification | Facilitator verifies on-chain before confirming | - -### 7.3 Wallet Security - -| Property | Mechanism | -|----------|-----------| -| Key generation | secp256k1 via `crypto/rand` | -| Key storage | Web3 V3 keystore (scrypt KDF + AES-128-CTR) | -| Key access | Remote-signer REST API (port 9000, in-namespace only) | -| Enclave keys | Apple Secure Enclave (P-256, private key never leaves hardware) | -| TEE keys | Generated inside TEE (TDX/SNP/Nitro), bound to attestation | -| Wallet backup | `PromptBackupBeforePurge()` before destructive 
operations | - -### 7.4 Enclave / TEE Security - -| Property | macOS Secure Enclave | Linux TEE | -|----------|---------------------|-----------| -| Key generation | In-hardware (SEP) | In-enclave | -| Private key access | Never exported | Never exported | -| SIP requirement | `CheckSIP()` enforced | N/A | -| Attestation | N/A | Hardware-signed quote binding pubkey + model hash | -| Persistence | Keychain (persistent or ephemeral) | Per-enclave instance | - -### 7.5 RBAC - -The `openclaw-monetize` ClusterRole grants the OpenClaw agent CRUD access to: - -- ServiceOffers (`obol.org`) -- Middlewares (`traefik.io`) -- HTTPRoutes (`gateway.networking.k8s.io`) -- ConfigMaps, Services, Deployments (core) -- Read-only: Pods, Endpoints, logs - -Bound to ServiceAccount `openclaw` in `openclaw-obol-agent` namespace via ClusterRoleBinding. Patched by `obol agent init` via `patchMonetizeBinding()`. - ---- - -## 8. Error Handling - -### 8.1 Error Handling Strategy - -The codebase uses a layered error handling approach: - -| Layer | Strategy | -|-------|---------| -| CLI commands | Return `error` to `urfave/cli` which prints and exits non-zero | -| Non-fatal operations | Log warning via `u.Warnf()`, continue execution | -| Infrastructure deployment | Fatal: auto-cleanup via `Down()` on helmfile sync failure | -| Config hot-reload | Log error, keep previous config (verifier, buyer) | -| Network operations | `kubectl.EnsureCluster()` guard at entry points | - -### 8.2 Graceful Degradation - -| Component | Failure Mode | Behavior | -|-----------|-------------|---------| -| Ollama not running | Auto-configure skipped | LiteLLM starts without local models; user can add later | -| Cloud API key missing | Warning printed | Provider not configured; manual `obol model setup` possible | -| OpenClaw setup fails | Warning printed | User can run `obol openclaw onboard` manually | -| Tunnel not available | Warning printed | Services work locally; sell commands will start tunnel on demand | -| 
DNS resolver fails | Warning printed | `obol.stack` hostname resolution may not work; IP access still works | -| Pre-signed auths exhausted | 404 for model | Agent must pre-sign more via `buy.py` | - -### 8.3 Atomic Operations - -| Operation | Atomicity Mechanism | -|-----------|-------------------| -| Config reload (verifier) | `atomic.Pointer` swap | -| Config reload (buyer) | Mutex-guarded `Reload()` rebuilds all handlers | -| Auth consumption | Mutex-guarded pop from pool with `onConsume` callback | -| Tunnel state | File write with `0600` permissions | -| Backend switching | Destroy old backend before initializing new one | - ---- - -## 9. Performance - -### 9.1 Resource Allocation - -| Component | CPU Request | Memory Request | CPU Limit | Memory Limit | -|-----------|-----------|---------------|----------|-------------| -| Storefront httpd | 5m | 8Mi | 20m | 16Mi | -| x402-verifier | (cluster default) | (cluster default) | -- | -- | -| LiteLLM | (cluster default) | (cluster default) | -- | -- | -| x402-buyer sidecar | (cluster default) | (cluster default) | -- | -- | -| OpenClaw agent | (cluster default) | (cluster default) | -- | -- | - -### 9.2 Caching - -| Cache | TTL | Purpose | -|-------|-----|---------| -| eRPC `eth_call` | 10s (unfinalized) | Avoid redundant RPC calls | -| x402-verifier chain resolution | Permanent (per-load) | Pre-resolve all chain configs during `load()` | -| LiteLLM model routing | Permanent (until restart) | Static model_list in ConfigMap | - -### 9.3 Hot Paths - -| Path | Optimization | -|------|-------------| -| x402 ForwardAuth verify | `atomic.Pointer` for lock-free config reads; pre-resolved chain map | -| x402-buyer auth pop | Single mutex lock per Sign() call; O(1) pool pop | -| Route matching | First-match short-circuit; no regex compilation per request | -| Buyer model routing | `sync.RWMutex` for concurrent reads; rebuild only on Reload() | - -### 9.4 Known Latencies - -| Operation | Typical Latency | Notes | 
-|-----------|----------------|-------| -| ConfigMap propagation | 60-120s | k3d file watcher interval | -| Quick tunnel URL | 10-20s | Cloudflare registration after pod start | -| x402 facilitator verify | 100-500ms | Network round-trip to facilitator | -| Helmfile sync (initial) | 2-5min | Full infrastructure deployment | -| LiteLLM restart | 10-30s | Pod termination + startup | - ---- - -## 10. Testing Strategy - -### 10.1 Test Organization - -| Category | Build Tag | Location | Prerequisites | -|----------|----------|----------|---------------| -| Unit tests | (none) | `*_test.go` alongside source | `go test ./...` | -| Integration tests | `integration` | `internal/openclaw/integration_test.go` | Running cluster + Ollama + `OBOL_DEVELOPMENT=true` | -| BDD tests | `integration` | `internal/x402/bdd_integration_test.go` | Running cluster | - -### 10.2 Unit Test Coverage - -| Package | Key Test Files | Coverage Focus | -|---------|---------------|----------------| -| `internal/x402` | `config_test.go`, `verifier_test.go`, `matcher_test.go`, `validate_test.go`, `watcher_test.go` | Pricing config parsing, route matching, ForwardAuth responses, HTTPS validation | -| `internal/x402/buyer` | `signer_test.go`, `proxy_test.go` | Auth pool exhaustion, model resolution, payment attachment | -| `internal/erc8004` | `abi_test.go`, `client_test.go`, `types_test.go` | ABI encoding, registration document schema | -| `internal/schemas` | `serviceoffer_test.go`, `payment_test.go` | CRD field validation, price approximation | -| `internal/network` | `erpc_test.go`, `chainlist_test.go`, `resolve_test.go` | ConfigMap patching, chain resolution | -| `internal/model` | `model_test.go` | Provider detection, model entry building | -| `internal/stack` | `stack_test.go`, `backend_test.go`, `backend_k3s_test.go` | Backend abstraction, port checking | -| `internal/openclaw` | `wallet_test.go`, `wallet_backup_test.go`, `overlay_test.go`, `version_test.go`, `resolve_test.go` | Keystore 
generation, version consistency, instance resolution | -| `internal/inference` | `gateway_test.go`, `store_test.go`, `client_test.go`, `enclave_middleware_test.go` | Gateway handler, deployment persistence, encryption middleware | -| `internal/tunnel` | `tunnel_test.go` | URL parsing, state management | -| `internal/embed` | `embed_crd_test.go` | CRD + RBAC validation of embedded manifests | -| `cmd/obol` | `sell_test.go` | CLI flag parsing and validation | - -### 10.3 Integration Tests - -Integration tests use `//go:build integration` and require: - -```bash -export OBOL_DEVELOPMENT=true -export OBOL_CONFIG_DIR=$(pwd)/.workspace/config -export OBOL_BIN_DIR=$(pwd)/.workspace/bin -export OBOL_DATA_DIR=$(pwd)/.workspace/data -go build -o .workspace/bin/obol ./cmd/obol -go test -tags integration -v -timeout 15m ./internal/openclaw/ -``` - -**Key integration test:** `TestIntegration_Tunnel_SellDiscoverBuySidecar_QuotaAndBalance` validates the full paid-inference commerce loop (requires `qwen3.5:9b` model): - -1. Sell inference via `obol sell http` -2. Discover service via tunnel -3. Buy inference using pre-signed auths -4. Verify quota consumption and balance - -### 10.4 BDD Tests - -Gherkin-style BDD tests in `internal/x402/features/` exercise the x402 payment flow end-to-end using `godog`: - -- Payment verification happy path -- Payment rejection (insufficient funds, wrong chain) -- Route matching edge cases -- Config hot-reload during operation - -### 10.5 Version Consistency Tests - -`TestOpenClawVersionConsistency` in `internal/openclaw/version_test.go` reads all 3 version-pinning locations and fails if they disagree. This prevents version drift between the Go binary and the shell installer. - -### 10.6 Running Tests - -```bash -# All unit tests -go test ./... 
- -# Single test -go test -v -run 'TestMatchRoute' ./internal/x402/ - -# Integration tests (requires running cluster) -go test -tags integration -v -timeout 15m ./internal/openclaw/ - -# Full commerce loop (requires qwen3.5:9b) -go test -tags integration -v -run TestIntegration_Tunnel_SellDiscoverBuySidecar_QuotaAndBalance \ - -timeout 30m ./internal/openclaw/ - -# Check compilation only -go build ./... -``` diff --git a/docs/specs/adr/0001-local-first-k3d.md b/docs/specs/adr/0001-local-first-k3d.md deleted file mode 100644 index 99c10872..00000000 --- a/docs/specs/adr/0001-local-first-k3d.md +++ /dev/null @@ -1,62 +0,0 @@ -# ADR-0001: Local-First Kubernetes via k3d - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -Obol Stack needs a reproducible local Kubernetes cluster that supports: - -- Port forwarding from the host (80, 443, 8080, 8443) for Traefik ingress. -- Docker image import for locally built images (x402-verifier, x402-buyer) during development. -- Fast startup times (under 60 seconds) for developer iteration. -- Consistent behavior across macOS and Linux. -- Access to host services (Ollama) from within the cluster. - -The main alternatives considered were: - -| Option | Pros | Cons | -|--------|------|------| -| **k3d** | Docker-based, fast startup, native image import, multi-platform, port mapping via k3d config | Requires Docker, k3s-only | -| **minikube** | Multi-driver (Docker, HyperKit, VirtualBox), wide adoption | Slower startup, heavier resource usage, image import via registry or `minikube image load` | -| **kind** | Docker-based, widely used for CI | No native port mapping (requires manual extraPortMappings), no built-in Ollama host routing | -| **bare-metal k3s** | No Docker dependency, direct host access | Requires root or systemd, harder to isolate, no image import | - -## Decision - -Use **k3d** as the default Kubernetes backend for local development and operation. 
- -Additionally, implement a `Backend` interface (`internal/stack/backend.go`) to abstract the runtime, allowing a secondary `K3sBackend` for bare-metal deployments where Docker is unavailable (e.g., production edge nodes). - -## Rationale - -1. **Port forwarding**: k3d natively maps host ports to the k3s server in its YAML config, avoiding manual iptables or NodePort workarounds. -2. **Image import**: `k3d image import` loads locally built Docker images directly into the cluster, critical for `OBOL_DEVELOPMENT=true` builds of x402-verifier and x402-buyer. -3. **Fast startup**: k3d cluster creation completes in 10-30 seconds, compared to 60-120 seconds for minikube. -4. **Host access**: k3d provides `host.docker.internal` (macOS) and `host.k3d.internal` (Linux) for Ollama connectivity. -5. **k3s compatibility**: k3d wraps k3s, so manifests placed in `/var/lib/rancher/k3s/server/manifests/` auto-apply on startup -- used for infrastructure deployment. - -## Consequences - -### Positive - -- Reproducible single-cluster setup with a declarative k3d YAML config. -- `obol stack up` reliably creates, configures, and tears down clusters. -- Development workflow is fast: build image, import, restart pod. -- The `Backend` interface means k3s bare-metal is also supported without code duplication. - -### Negative - -- **Docker dependency**: Operators must have Docker or Podman running. This excludes minimal environments without containerization. -- **Single cluster**: One k3d cluster per config directory. Multiple stacks require separate `OBOL_CONFIG_DIR` values. -- **Port conflicts**: k3d binds host ports 80/443/8080/8443 directly; other services using these ports cause startup failure. -- **Kubeconfig port drift**: The k3d API server port can change between cluster restarts, requiring `k3d kubeconfig write` to refresh. -- **ConfigMap propagation delay**: k3d's file watcher introduces 60-120 second delays for manifest changes placed in the k3s manifests directory. 
-- **Ollama host resolution varies**: `host.docker.internal` on macOS, `host.k3d.internal` on Linux, `127.0.0.1` for k3s -- resolved at `obol stack init` time. - -## SPEC References - -- Section 2.4 -- Backend Abstraction -- Section 3.1 -- Stack Lifecycle -- Section 1.3 -- System Constraints (absolute paths, single cluster) -- Section 3.1.4 -- Ollama Host Resolution diff --git a/docs/specs/adr/0002-litellm-gateway.md b/docs/specs/adr/0002-litellm-gateway.md deleted file mode 100644 index bc6475b2..00000000 --- a/docs/specs/adr/0002-litellm-gateway.md +++ /dev/null @@ -1,62 +0,0 @@ -# ADR-0002: LiteLLM as Unified LLM Gateway - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -The OpenClaw agent and cluster services need to access LLM inference from multiple providers: - -- **Ollama** (local, no API key) for on-device models like qwen3.5:9b. -- **Anthropic** (cloud, API key) for Claude models. -- **OpenAI** (cloud, API key) for GPT models. -- **Paid remote sellers** (x402-gated) for purchased inference from other agents. - -The application layer (OpenClaw, LiteLLM overlays, downstream apps) should not need to know which provider serves a given model. A single OpenAI-compatible endpoint simplifies routing, auth, and configuration. 
- -Alternatives considered: - -| Option | Pros | Cons | -|--------|------|------| -| **LiteLLM** | OpenAI-compatible proxy, multi-provider, ConfigMap-driven, wildcard routing | `drop_params` behavior can silently discard unsupported fields, restart required for config changes | -| **Direct provider SDKs** | No proxy overhead, full parameter control | Each consumer must handle auth + routing per provider, no unified API | -| **vLLM / llm-d** | High-performance serving, GPU scheduling | Different abstraction layer (model serving, not routing); evaluated and rejected for this role | -| **Custom proxy** | Full control | Maintenance burden, reimplements LiteLLM's model routing | - -## Decision - -Use **LiteLLM** (deployed as a Kubernetes Deployment in the `llm` namespace on port 4000) as the unified LLM gateway for all inference routing. - -## Rationale - -1. **Single API surface**: All consumers (OpenClaw agent, apps, tests) use `http://litellm.llm.svc:4000/v1` with standard OpenAI client libraries. -2. **Multi-provider routing**: LiteLLM's `model_list` supports exact names (Ollama models), wildcards (`anthropic/*`, `openai/*`), and catch-alls (`paid/*`). -3. **ConfigMap-driven**: The `litellm-config` ConfigMap and `litellm-secrets` Secret are patched by Go code (`internal/model/model.go`) without forking LiteLLM. -4. **Auto-configuration**: During `obol stack up`, `autoConfigureLLM()` detects Ollama models and cloud API keys, patches config + secret, and performs a single restart. -5. **Paid inference integration**: The static `paid/*` route forwards to the `x402-buyer` sidecar at `http://127.0.0.1:8402/v1`, keeping the LiteLLM image unmodified. -6. **Per-instance overlay**: `buildLiteLLMRoutedOverlay()` reuses the "ollama" provider slot pointing at `litellm.llm.svc:4000/v1`, enabling app-level model aliasing without additional infrastructure. - -## Consequences - -### Positive - -- Unified endpoint for all LLM access -- no provider-specific client code needed. 
-- Adding a new provider is a ConfigMap patch + Secret update + restart. -- Paid inference works through vanilla LiteLLM with a static route to the buyer sidecar. -- `dangerouslyDisableDeviceAuth` is enabled for Traefik-proxied access, avoiding auth double-gate. - -### Negative - -- **`drop_params` risk**: LiteLLM silently drops parameters not supported by the target provider. This can cause subtle behavior differences between providers for the same model name. -- **Restart required**: Config changes require a Deployment restart (10-30 second latency). There is no live-reload mechanism. -- **Single point of failure**: All inference routes through one LiteLLM pod. Pod failure means no inference until restart. -- **ConfigMap complexity**: The `litellm-config` ConfigMap grows with every provider and model. Patching logic in `internal/model/model.go` must handle merges carefully. -- **Version coupling**: Pinned LiteLLM image (v1.82.3 as of writing, pinned for supply chain security) must be updated when new provider features are needed. - -## SPEC References - -- Section 3.2 -- LLM Routing -- Section 3.2.4 -- Logic (autoConfigureLLM, paid inference routing) -- Section 3.2.5 -- LiteLLM Config Structure -- Section 3.5 -- Monetize Buy Side (paid/* route) -- Section 3.6.4 -- Cloud Provider Detection diff --git a/docs/specs/adr/0003-x402-payment-gating.md b/docs/specs/adr/0003-x402-payment-gating.md deleted file mode 100644 index f67e414c..00000000 --- a/docs/specs/adr/0003-x402-payment-gating.md +++ /dev/null @@ -1,65 +0,0 @@ -# ADR-0003: x402 Payment Gating for Services - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -Obol Stack needs a mechanism for operators to monetize cluster services (inference, HTTP endpoints). The payment system must be: - -- **Permissionless**: No API key registration, no account creation, no subscription management. -- **Per-request**: Each request is independently priced and paid for. 
-- **Gasless for buyers**: Buyers should not need to pay blockchain gas for every inference request. -- **Machine-to-machine**: AI agents must be able to pay autonomously without human interaction. -- **Composable**: Payment gating should work with any HTTP service behind Traefik, not just inference. - -Alternatives considered: - -| Option | Pros | Cons | -|--------|------|------| -| **x402 (HTTP 402)** | Permissionless, per-request, gasless via ERC-3009, standard HTTP, agent-native | Facilitator dependency, USDC-only, limited chain support | -| **API keys** | Simple, widely understood | Requires user registration, key management, not agent-native | -| **Stripe/subscriptions** | Established, fiat currency | Requires merchant account, not permissionless, not agent-to-agent | -| **Lightning Network** | Per-request micropayments, mature | Bitcoin-only, requires channel management, different user base | -| **State channels** | Low latency, off-chain | Complex setup, requires both parties online, custom protocol | - -## Decision - -Use the **x402 protocol** (HTTP 402 Payment Required) with **ERC-3009** (TransferWithAuthorization) for gasless USDC micropayments, implemented via **Traefik ForwardAuth** middleware. - -## Rationale - -1. **HTTP-native**: x402 uses standard HTTP 402 status codes. Any HTTP client can discover pricing by making an unauthenticated request. Payment is attached as an `X-PAYMENT` header. -2. **Gasless for buyers**: ERC-3009 `TransferWithAuthorization` allows pre-signed USDC transfers. The buyer signs once; the facilitator settles on-chain. No gas from the buyer. -3. **Traefik ForwardAuth**: The x402-verifier runs as a ForwardAuth middleware. Every request matching a Middleware is sent to `POST /verify`. This cleanly separates payment from business logic -- the upstream service never sees payment details. -4. **Facilitator delegation**: Payment verification and settlement are delegated to a trusted facilitator (`https://facilitator.x402.rs`). 
This simplifies the verifier to a stateless proxy. -5. **Multi-chain support**: The system supports Base, Polygon, and Avalanche (mainnet + testnet). Chain configuration is per-route. -6. **Agent-native**: AI agents can programmatically discover pricing (402 response), sign payments (ERC-3009), and consume services without human intervention. - -## Consequences - -### Positive - -- Any HTTP service can be monetized by adding a ServiceOffer CR -- no code changes to the upstream. -- Agents discover pricing automatically via the 402 response. -- USDC stablecoin avoids cryptocurrency price volatility. -- The ForwardAuth pattern means payment logic is fully decoupled from service logic. -- Route-level pricing: different paths can have different prices, wallets, and chains. - -### Negative - -- **Facilitator dependency**: Payment verification requires the facilitator to be reachable. If `facilitator.x402.rs` is down, all paid requests fail. No offline fallback exists. -- **USDC-only**: Only USDC is supported as the payment asset. Other stablecoins or tokens require facilitator support. -- **Limited chain support**: Only 6 chains (3 mainnets + 3 testnets) are supported. Adding new chains requires code changes to the chain resolution logic. -- **Phase 1 pricing approximation**: `perMTok` pricing is approximated as `perMTok / 1000` (fixed 1000 tokens per request). Exact token metering is deferred to phase 2. -- **HTTPS requirement**: The facilitator URL must use HTTPS (loopback exempted for testing). This prevents local-only facilitator setups without TLS. -- **Settlement latency**: Facilitator verification adds 100-500ms per request. This is acceptable for inference but may be too slow for high-frequency API calls. 
- -## SPEC References - -- Section 3.4 -- Monetize Sell Side -- Section 4.1 -- x402 Payment Protocol -- Section 3.4.4 -- x402-verifier (ForwardAuth) -- Section 3.4.5 -- Pricing -- Section 3.4.6 -- Supported Chains -- Section 7.2 -- Payment Security diff --git a/docs/specs/adr/0004-pre-signed-erc3009-buyer.md b/docs/specs/adr/0004-pre-signed-erc3009-buyer.md deleted file mode 100644 index e173f450..00000000 --- a/docs/specs/adr/0004-pre-signed-erc3009-buyer.md +++ /dev/null @@ -1,61 +0,0 @@ -# ADR-0004: Pre-Signed ERC-3009 Voucher Pool for Buy-Side Payments - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -The OpenClaw agent needs to purchase inference from remote x402-gated sellers. The buy-side payment mechanism must satisfy: - -- **No hot wallet in the sidecar**: The x402-buyer sidecar must never have access to a private key. A compromised sidecar should not drain the wallet. -- **Bounded spending**: The maximum possible loss must be known and capped at deployment time. -- **Low latency**: Payment attachment must not add significant overhead to each inference request. -- **Restart resilience**: Consumed vouchers must not be reused after a sidecar restart. 
- -Alternatives considered: - -| Option | Pros | Cons | -|--------|------|------| -| **Pre-signed ERC-3009 vouchers** | Zero signer in sidecar, bounded loss (N * price), O(1) per request | Finite pool requires replenishment, storage in ConfigMap | -| **Hot wallet in sidecar** | Sign on demand, no pool management | Compromised sidecar = drained wallet, unbounded loss | -| **Allowance (ERC-20 approve)** | Standard pattern, no pre-signing | Unbounded spending once approved, requires revocation | -| **Permit (ERC-2612)** | Gasless approval | Still requires a signer for each permit, not supported by all tokens | -| **Payment channel** | Amortized gas, high throughput | Complex setup, requires both parties online, custom protocol | - -## Decision - -Pre-sign a **bounded batch of ERC-3009 `TransferWithAuthorization`** vouchers using the agent wallet (via `buy.py`), store them in Kubernetes ConfigMaps, and have the `x402-buyer` sidecar pop one voucher per paid request. - -## Rationale - -1. **Zero signer access**: The sidecar only reads from ConfigMaps. It has no private key, no signing capability, no wallet access. The `PreSignedSigner` implements the `x402.Signer` interface by popping from a finite pool. -2. **Bounded loss**: If the sidecar is compromised or misbehaves, the maximum loss is exactly `N * price` where N is the number of pre-signed vouchers. This is decided at `buy.py buy --count N` time. -3. **O(1) per request**: Popping a voucher is a mutex-guarded array pop. No cryptographic operations at request time. No network calls for signing. -4. **Restart resilience**: The `StateStore` persists consumed nonces. On restart, the sidecar reloads the state and skips already-consumed vouchers. -5. **ConfigMap-native**: Vouchers and upstream config are standard Kubernetes ConfigMaps, managed by `buy.py` (the agent's buy skill). No custom storage backend. -6. **Separation of concerns**: The agent (`buy.py`) handles discovery, negotiation, and pre-signing. 
The sidecar handles only payment attachment and forwarding. LiteLLM routes via the static `paid/*` entry. - -## Consequences - -### Positive - -- Security posture is strong: compromised sidecar has no signing capability and bounded financial exposure. -- The sidecar is stateless except for the consumed-nonce tracker. Scaling or replacing it is trivial. -- `buy.py` can pre-sign vouchers for multiple sellers, each with different prices and chains. -- The LiteLLM configuration is static (`paid/* -> :8402`); no dynamic reconfiguration needed per seller. - -### Negative - -- **Pool exhaustion**: When all vouchers are consumed, the sidecar returns `pre-signed auth pool exhausted`. The agent must run `buy.py` again to replenish. There is no automatic replenishment. -- **ConfigMap size limits**: Kubernetes ConfigMaps have a ~1MB limit. Each voucher is ~500 bytes of JSON, so the practical limit is ~2000 vouchers per ConfigMap. Large pools may need sharding. -- **No partial spending**: Each voucher is for a fixed amount. If the seller's price changes, existing vouchers may become invalid (underpayment) or wasteful (overpayment). -- **Nonce tracking persistence**: The `StateStore` must survive restarts. If the state file is lost, there is a risk of attempting to reuse consumed nonces (which will fail on-chain, but wastes a request). -- **Double-spend prevention is on-chain**: The ERC-3009 contract itself prevents double-spend. If two sidecars share the same pool, only the first submission of each nonce succeeds. 
- -## SPEC References - -- Section 3.5 -- Monetize Buy Side -- Section 3.5.3 -- Architecture (zero signer access, bounded spending) -- Section 3.5.4 -- Configuration (ConfigMap structure) -- Section 3.5.7 -- Error States (pool exhaustion) -- Section 7.2 -- Payment Security (bounded spending, replay protection) diff --git a/docs/specs/adr/0005-traefik-gateway-api.md b/docs/specs/adr/0005-traefik-gateway-api.md deleted file mode 100644 index 1e6f2b69..00000000 --- a/docs/specs/adr/0005-traefik-gateway-api.md +++ /dev/null @@ -1,62 +0,0 @@ -# ADR-0005: Traefik with Kubernetes Gateway API - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -Obol Stack requires an ingress layer that supports: - -- **Per-route middleware**: x402 ForwardAuth must apply only to `/services/*` routes, not to all traffic. -- **Hostname-based access control**: Internal services (frontend, eRPC, monitoring) must be restricted to `obol.stack` hostname, while public routes (x402-gated services, discovery endpoints) must be accessible via the Cloudflare tunnel hostname. -- **Dynamic route creation**: The monetize reconciler creates HTTPRoutes programmatically when ServiceOffers reach the RoutePublished stage. -- **Standard CRDs**: Routes should be managed as Kubernetes resources with ownerReferences for automatic garbage collection. 
- -Alternatives considered: - -| Option | Pros | Cons | -|--------|------|------| -| **Traefik + Gateway API** | Per-route middleware via Middleware CRD, hostname filtering on HTTPRoute, standard K8s Gateway API CRDs, built into k3s | Traefik-specific Middleware CRD (`traefik.io`), newer API surface | -| **Traefik + Ingress** | Simple, widely supported | No per-route middleware (annotations are per-Ingress), hostname restrictions are less granular | -| **Nginx Ingress** | Mature, widely deployed | No native ForwardAuth per route (requires custom annotations), no Gateway API support in standard controller | -| **Istio service mesh** | Full mTLS, advanced routing | Heavy resource footprint, complex for a local-first stack, overkill for HTTP routing | -| **Envoy Gateway** | Gateway API native | Less mature, no built-in ForwardAuth equivalent, additional deployment | - -## Decision - -Use **Traefik** as the cluster ingress controller with the **Kubernetes Gateway API** (GatewayClass, Gateway, HTTPRoute) for routing, combined with Traefik-specific **Middleware** CRDs (`traefik.io`) for ForwardAuth. - -## Rationale - -1. **Built into k3s**: Traefik is the default ingress controller in k3s/k3d. No additional installation or configuration needed. -2. **Gateway API HTTPRoute**: The `HTTPRoute` CRD supports `hostnames` filtering natively. Setting `hostnames: ["obol.stack"]` on internal routes ensures they are never matched by tunnel traffic (which arrives with the tunnel hostname). -3. **ForwardAuth Middleware**: Traefik's `Middleware` CRD (`traefik.io/v1alpha1`) supports `forwardAuth` configuration. The x402-verifier is referenced as a ForwardAuth target on per-route HTTPRoutes, so only `/services/*` traffic is payment-gated. -4. **OwnerReferences**: HTTPRoutes and Middlewares created by the monetize reconciler set ownerReferences to the ServiceOffer CR. Deleting a ServiceOffer cascades deletion to all routing resources. -5. 
**Single Gateway**: One `Gateway` resource (`traefik-gateway` in `traefik` namespace) handles all HTTP/HTTPS traffic. Routes reference it via `parentRefs`. -6. **Security by default**: The hostname restriction pattern makes it structurally impossible to accidentally expose internal services via the tunnel. Adding a new internal service requires explicitly setting `hostnames: ["obol.stack"]`. - -## Consequences - -### Positive - -- Clean separation between local-only routes (hostname-restricted) and public routes (no hostname restriction). -- The reconciler creates standard Kubernetes resources (HTTPRoute, Middleware) that are visible via `kubectl` and benefit from RBAC. -- ForwardAuth is applied per-route, not globally. Free routes (health, discovery) bypass the verifier entirely. -- Automatic garbage collection via ownerReferences prevents orphaned routes when ServiceOffers are deleted. -- The routing architecture is auditable: `kubectl get httproutes -A` shows all routes with their hostname restrictions. - -### Negative - -- **Traefik-specific Middleware**: The `Middleware` CRD is not part of the standard Gateway API. This couples the stack to Traefik. Migrating to another Gateway API controller would require replacing ForwardAuth with a different mechanism. -- **ExternalName incompatibility**: Traefik's Gateway API implementation does not support `ExternalName` Services. All upstreams must use `ClusterIP` + `Endpoints`, which required workarounds for cross-namespace routing. -- **GatewayClass singleton**: Only one `GatewayClass` (`traefik`) exists. Multi-tenant scenarios with different ingress controllers are not supported. -- **No mTLS**: Traefik in this configuration does not provide mutual TLS between services. Inter-service communication within the cluster is unencrypted (acceptable for a local-first stack). -- **Hostname discipline required**: Developers must remember to add `hostnames: ["obol.stack"]` to every internal HTTPRoute. 
The SPEC and CLAUDE.md document this as a security invariant, and code review must enforce it. - -## SPEC References - -- Section 2.2 -- Routing Architecture -- Section 3.4.4 -- x402-verifier (ForwardAuth) -- Section 7.1 -- Tunnel Exposure (security model, hostname restrictions) -- Section 5.2 -- Kubernetes Resources (traefik namespace) -- Section 7.5 -- RBAC (Middleware CRD access) diff --git a/docs/specs/adr/0006-erc8004-identity.md b/docs/specs/adr/0006-erc8004-identity.md deleted file mode 100644 index cdee0afa..00000000 --- a/docs/specs/adr/0006-erc8004-identity.md +++ /dev/null @@ -1,67 +0,0 @@ -# ADR-0006: ERC-8004 NFT-Based Identity Registry - -**Status:** Accepted -**Date:** 2026-03-27 - -## Context - -AI agents deployed via Obol Stack need a decentralized identity mechanism that supports: - -- **On-chain discoverability**: Other agents and users should be able to find and verify an agent's identity using public blockchain data. -- **Metadata storage**: The identity should carry structured metadata (name, description, services, trust mechanisms) that is machine-readable. -- **Ownership and control**: The agent operator must control the identity, with the ability to update metadata and transfer ownership. -- **Integration with x402**: The identity should declare x402 payment support so buyers know the agent accepts micropayments. -- **Graceful degradation**: Registration should work even without on-chain funds, falling back to off-chain-only discovery. 
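The off-chain discovery path mentioned above centers on an agent-registration document served over HTTP. The following is an illustrative sketch, not a canonical document: the field names mirror the AgentRegistration schema referenced in this ADR and its feature spec (including the registry address and `agentId` used in the examples there), but the concrete values and the exact shape of each `services` entry are assumptions.

```json
{
  "type": "https://eips.ethereum.org/EIPS/eip-8004#registration-v1",
  "name": "My Inference Agent",
  "description": "Sells qwen3.5:9b inference",
  "x402Support": true,
  "active": true,
  "registrations": [
    {
      "agentId": 42,
      "agentRegistry": "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D"
    }
  ],
  "services": [
    { "type": "web", "url": "https://stack.example.com/services/qwen" }
  ],
  "supportedTrust": ["reputation"]
}
```

In the OffChainOnly degraded mode, the same document is served with an empty `registrations` array; once the wallet is funded and the on-chain mint succeeds, the array is populated with the minted `agentId`.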
- -Alternatives considered: - -| Option | Pros | Cons | -|--------|------|------| -| **ERC-8004 Identity Registry** | NFT-based (ERC-721), on-chain metadata, purpose-built for agents, `.well-known` discovery | Base Sepolia deployment, NFT mint cost (gas), newer standard | -| **ENS (Ethereum Name Service)** | Established, human-readable names | Ethereum mainnet gas costs, annual renewal, no structured agent metadata | -| **DID (Decentralized Identifiers)** | W3C standard, multi-chain | No single registry, resolution complexity, no native NFT ownership | -| **Custom registry contract** | Full control over schema | Maintenance burden, no ecosystem adoption, reinvents the wheel | -| **DNS TXT records** | Simple, widely supported | Centralized, no ownership proof, no structured metadata | - -## Decision - -Use **ERC-8004** (`IdentityRegistryUpgradeable`, an ERC-721 contract) on **Base Sepolia** (testnet) and **Base Mainnet** for on-chain agent identity registration, combined with a `.well-known/agent-registration.json` endpoint for HTTP-based discovery. - -## Rationale - -1. **Purpose-built for agents**: ERC-8004 defines a standard schema for agent identity with metadata, services, trust mechanisms, and x402 support declaration. It is designed for the agent economy, not adapted from another use case. -2. **NFT ownership**: Each agent gets an ERC-721 token. The token holder controls the identity. This integrates naturally with wallet-based operations (the same wallet that receives x402 payments owns the identity). -3. **On-chain + off-chain**: The on-chain registration stores the `agentURI` pointing to `/.well-known/agent-registration.json`. The JSON document contains the full metadata. This hybrid approach keeps gas costs low while providing rich metadata. -4. **Base L2**: Deploying on Base (an Ethereum L2) keeps gas costs low compared to Ethereum mainnet. Base Sepolia is used for testnet development. -5. 
**Graceful degradation**: If the wallet lacks ETH for gas, the system falls back to `OffChainOnly` mode. The `.well-known` endpoint is still served and the agent is discoverable via HTTP, but no on-chain record exists. When funded, the agent can upgrade to full on-chain registration. -6. **`.well-known` convention**: The `/.well-known/agent-registration.json` endpoint follows established web conventions (RFC 8615). Any HTTP client can discover the agent's capabilities without blockchain access. - -## Consequences - -### Positive - -- Agents are discoverable both on-chain (via ERC-8004 registry queries) and off-chain (via HTTP `.well-known`). -- The identity is controlled by the operator's wallet -- no centralized authority can revoke it. -- The `AgentRegistration` JSON schema includes `x402Support: true`, enabling automated buyer discovery. -- The `services` array supports multiple endpoint types (web, A2A, MCP, OASF), making the identity extensible. -- The `supportedTrust` array declares trust mechanisms (reputation, crypto-economic, tee-attestation), enabling trust-aware agent interactions. -- OffChainOnly degradation means the monetize flow is never blocked by lack of gas funds. - -### Negative - -- **NFT mint cost**: Registering an agent requires ETH for gas on Base Sepolia/Mainnet. While cheap on L2, it is not free. -- **Base chain dependency**: The identity is tied to the Base network. Agents on other chains would need bridge or multi-chain registration (not currently supported). -- **Contract upgrade risk**: The registry uses `IdentityRegistryUpgradeable`. A malicious or buggy upgrade could affect all registered agents. -- **Newer standard**: ERC-8004 has less ecosystem adoption than ENS or DIDs. Tooling and indexer support is limited. -- **Registration latency**: Minting an NFT requires waiting for transaction confirmation (10-30 seconds on Base). The reconciler handles this asynchronously. 
-- **Metadata not on-chain by default**: The bulk of the metadata lives at the `.well-known` HTTP endpoint, not on-chain. If the agent goes offline, only the `agentURI` remains on-chain, and the full metadata becomes unavailable. - -## SPEC References - -- Section 3.8 -- ERC-8004 Identity -- Section 3.8.2 -- Contract (addresses) -- Section 3.8.3 -- Client Operations (Register, SetMetadata, GetMetadata) -- Section 3.8.4 -- Agent Registration Document (JSON schema) -- Section 3.8.5 -- Error States -- Section 3.4.2 -- Sell-Side Flow (Stage 5: Registered) -- Section 7.1 -- Tunnel Exposure (/.well-known public route) diff --git a/docs/specs/features/buy_payments.feature b/docs/specs/features/buy_payments.feature deleted file mode 100644 index 29b552be..00000000 --- a/docs/specs/features/buy_payments.feature +++ /dev/null @@ -1,152 +0,0 @@ -# References: -# SPEC.md Section 3.5 — Monetize Buy Side -# SPEC.md Section 4.1 — x402 Payment Protocol -# SPEC.md Section 7.2 — Payment Security - -Feature: Buy-Side Payments - As an AI agent - I want to purchase inference from remote x402-gated sellers using pre-signed vouchers - So that I can access paid models without exposing a hot wallet - - Background: - Given the cluster is running - And the x402-buyer sidecar is running in the "litellm" Deployment - And a remote seller is available at "https://seller.example.com/services/qwen" - And the seller prices inference at "0.001" USDC per request on "base-sepolia" - - # ------------------------------------------------------------------- - # Probe and discovery - # ------------------------------------------------------------------- - - Scenario: Probe discovers seller pricing via 402 response - When the agent runs "buy.py probe https://seller.example.com/services/qwen" - Then the probe sends a request to the seller endpoint - And the seller responds with HTTP 402 and PaymentRequirements - And the probe extracts: - | field | value | - | scheme | exact | - | network | eip155:84532 | - | 
maxAmountRequired | 1000 | - | payTo | 0xSELLER | - | asset | 0x036CbD... | - And the agent receives the pricing information - - Scenario: Probe handles non-402 seller response - Given the seller endpoint responds with HTTP 200 (no payment required) - When the agent runs "buy.py probe https://seller.example.com/services/free" - Then the probe reports the endpoint does not require payment - - # ------------------------------------------------------------------- - # Pre-signed ERC-3009 vouchers - # ------------------------------------------------------------------- - - Scenario: Pre-signed ERC-3009 vouchers stored in ConfigMap - When the agent runs "buy.py buy --count 10 --seller https://seller.example.com/services/qwen" - Then 10 ERC-3009 TransferWithAuthorization vouchers are pre-signed - And the vouchers are stored in the "x402-buyer-auths" ConfigMap in "llm" namespace - And each voucher contains: - | field | description | - | signature | EIP-712 typed signature | - | from | buyer wallet address | - | to | seller payTo address | - | value | price per request in base units | - | validAfter | 0 (immediately valid) | - | validBefore | max uint256 (no expiry) | - | nonce | unique random 32-byte nonce | - And the "x402-buyer-config" ConfigMap contains the upstream mapping for the seller - - Scenario: Buyer config maps model to upstream - Given vouchers have been pre-signed for seller "seller-qwen" - When the "x402-buyer-config" ConfigMap is inspected - Then it contains an upstream entry: - | field | value | - | url | https://seller.example.com/services/qwen | - | remoteModel | qwen3.5:9b | - | network | base-sepolia | - | payTo | 0xSELLER | - - # ------------------------------------------------------------------- - # Paid request flow - # ------------------------------------------------------------------- - - Scenario: Paid request consumes one voucher and forwards to seller - Given the buyer has 5 pre-signed vouchers for upstream "seller-qwen" - When the agent sends a 
chat completion request for model "paid/qwen3.5:9b" - Then LiteLLM routes the request to the x402-buyer sidecar at ":8402" - And the sidecar strips the "paid/" prefix to resolve model "qwen3.5:9b" - And the sidecar forwards the request to the seller - And the seller responds with HTTP 402 - And the sidecar pops one voucher from the pool - And the sidecar retries the request with the X-PAYMENT header - And the seller responds with HTTP 200 and the inference result - And the remaining voucher count is 4 - - Scenario: Paid request with openai prefix is resolved correctly - Given the buyer has vouchers for upstream "seller-qwen" - When a request arrives for model "paid/openai/qwen3.5:9b" - Then the sidecar strips both "paid/" and "openai/" prefixes - And resolves to model "qwen3.5:9b" - And routes to the correct upstream - - Scenario: Voucher consumption is atomic - Given the buyer has 1 pre-signed voucher for upstream "seller-qwen" - When two concurrent requests arrive for model "paid/qwen3.5:9b" - Then exactly one request consumes the voucher - And the other request receives an error indicating pool exhaustion - - # ------------------------------------------------------------------- - # Voucher pool exhaustion - # ------------------------------------------------------------------- - - Scenario: Voucher pool exhaustion returns error - Given the buyer has 0 pre-signed vouchers for upstream "seller-qwen" - When a request arrives for model "paid/qwen3.5:9b" - Then the sidecar returns an error: "pre-signed auth pool exhausted" - And no request is forwarded to the seller - - Scenario: No purchased upstream mapped returns error - Given no buyer config exists for model "paid/unknown-model" - When a request arrives for model "paid/unknown-model" - Then the sidecar returns an error: "no purchased upstream mapped" - - # ------------------------------------------------------------------- - # Sidecar status and observability - # 
------------------------------------------------------------------- - - Scenario: Sidecar /status endpoint reports remaining vouchers - Given the buyer started with 10 vouchers for "seller-qwen" - And 3 vouchers have been consumed - When I send a GET request to the sidecar at "/status" - Then the response is JSON with: - | upstream | remaining | spent | - | seller-qwen | 7 | 3 | - - Scenario: Sidecar /healthz returns liveness status - When I send a GET request to the sidecar at "/healthz" - Then the response is HTTP 200 - - Scenario: Sidecar /metrics exposes Prometheus metrics - When I send a GET request to the sidecar at "/metrics" - Then the response contains Prometheus-format metrics - And a PodMonitor in the "llm" namespace scrapes the sidecar - - # ------------------------------------------------------------------- - # State persistence across restarts - # ------------------------------------------------------------------- - - Scenario: Consumed nonces survive sidecar restart - Given the buyer has consumed 3 vouchers with specific nonces - When the x402-buyer sidecar is restarted - Then the StateStore reloads consumed nonces - And the previously consumed vouchers are not reused - And the remaining voucher count reflects prior consumption - - # ------------------------------------------------------------------- - # Security properties - # ------------------------------------------------------------------- - - Scenario: Sidecar has zero signer access - Given the x402-buyer sidecar is running - Then the sidecar container has no private key mounted - And the sidecar can only use pre-signed authorizations from ConfigMaps - And maximum loss is bounded to N * price where N is the voucher count diff --git a/docs/specs/features/erc8004_identity.feature b/docs/specs/features/erc8004_identity.feature deleted file mode 100644 index 65015f78..00000000 --- a/docs/specs/features/erc8004_identity.feature +++ /dev/null @@ -1,149 +0,0 @@ -# References: -# SPEC.md Section 3.8 — 
ERC-8004 Identity -# SPEC.md Section 3.4.2 — Sell-Side Flow (Stage 5: Registered) -# SPEC.md Section 7.1 — Tunnel Exposure (/.well-known) -# SPEC.md Section 5.3 — ServiceOffer CRD Schema (registration spec) - -Feature: ERC-8004 Identity - As an AI agent operator - I want to register my agent on-chain using the ERC-8004 Identity Registry - So that other agents and users can discover and verify my agent's identity - - Background: - Given the cluster is running - And a wallet is available with a private key - And the Base Sepolia RPC endpoint is reachable - - # ------------------------------------------------------------------- - # Agent registration on Base Sepolia - # ------------------------------------------------------------------- - - Scenario: Agent registers on Base Sepolia Identity Registry - Given the wallet has sufficient ETH for gas on Base Sepolia - When I run "obol sell register --name my-agent --private-key-file /path/to/keyfile" - Then a Register transaction is submitted to the Identity Registry at "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D" - And the transaction mints an ERC-721 NFT for the agent - And the returned agentId is the minted token ID - And the agent URI is set to the tunnel URL "/.well-known/agent-registration.json" - - Scenario: Registration during sell-side reconciliation (Stage 5) - Given a ServiceOffer CR "myapi" has reached stage 4 (RoutePublished) - And registration is enabled with name "My Inference Agent" - And the tunnel URL is "https://stack.example.com" - When the reconciler evaluates stage 5 - Then the agent is registered on Base Sepolia - And the ServiceOffer status is updated with: - | field | value | - | agentId | | - | registrationTxHash | | - And the condition "Registered" is set to "True" - - Scenario: Registration submits correct agent metadata - When the agent is registered with: - | field | value | - | name | My Inference Agent | - | description | Sells qwen3.5:9b inference | - | image | https://example.com/icon.png | 
- Then the registration transaction includes the metadata - And the agent URI points to the /.well-known endpoint - - # ------------------------------------------------------------------- - # Registration JSON at /.well-known - # ------------------------------------------------------------------- - - Scenario: Registration JSON served at /.well-known endpoint - Given an agent has been registered with agentId "42" - And the tunnel is active at "https://stack.example.com" - When a GET request is made to "https://stack.example.com/.well-known/agent-registration.json" - Then the response is HTTP 200 with Content-Type "application/json" - And the JSON body conforms to the AgentRegistration schema: - | field | value | - | type | https://eips.ethereum.org/EIPS/eip-8004#registration-v1 | - | name | My Inference Agent | - | x402Support | true | - | active | true | - And the "registrations" array contains: - | agentId | agentRegistry | - | 42 | 0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D | - And the "services" array contains at least one service endpoint - - Scenario: Registration JSON includes supported trust mechanisms - Given the ServiceOffer has supportedTrust ["reputation"] - When the registration JSON is served - Then the "supportedTrust" array contains "reputation" - - Scenario: Registration JSON httpd Deployment is minimal - Given the registration JSON has been published - When I inspect the httpd Deployment in "traefik" namespace - Then it uses a busybox image serving the ConfigMap content - And an HTTPRoute routes "/.well-known/agent-registration.json" to the httpd Service - - # ------------------------------------------------------------------- - # Metadata update - # ------------------------------------------------------------------- - - Scenario: Metadata update via SetMetadata - Given an agent is registered with agentId "42" - When SetMetadata is called with: - | key | value | - | description | Updated inference service | - | version | 2.0 | - Then a SetMetadata 
transaction is submitted to the registry - And the on-chain metadata is updated for agentId "42" - - Scenario: Agent URI update via SetAgentURI - Given an agent is registered with agentId "42" - And the tunnel hostname changes to "new-stack.example.com" - When SetAgentURI is called with the new URI - Then the on-chain agent URI is updated to "https://new-stack.example.com/.well-known/agent-registration.json" - - Scenario: Read metadata from registry - Given an agent is registered with agentId "42" and metadata key "description" = "Inference service" - When GetMetadata is called for agentId "42" and key "description" - Then the returned value is "Inference service" - - Scenario: Read token URI from registry - Given an agent is registered with agentId "42" - When TokenURI is called for agentId "42" - Then the returned URI is the agent's metadata endpoint - - # ------------------------------------------------------------------- - # Degraded mode without ETH - # ------------------------------------------------------------------- - - Scenario: Registration degrades to OffChainOnly without ETH - Given the wallet has zero ETH on Base Sepolia - When the reconciler evaluates registration (stage 5) - Then no on-chain Register transaction is submitted - And the /.well-known/agent-registration.json is still created and served - But the "registrations" array in the JSON is empty - And the condition "Registered" is set to "True" with reason "OffChainOnly" - - Scenario: OffChainOnly agent upgrades to on-chain after funding - Given the agent was registered in OffChainOnly mode - And the wallet has been funded with ETH - When the reconciler re-evaluates registration - Then the on-chain Register transaction is submitted - And the "registrations" array is populated with the agentId - And the condition "Registered" reason is updated to "OnChain" - - # ------------------------------------------------------------------- - # Error states - # 
------------------------------------------------------------------- - - Scenario: Registration fails when RPC is unreachable - Given the Base Sepolia RPC endpoint is unreachable - When the reconciler evaluates registration - Then the error "erc8004: dial" is recorded - And the condition "Registered" is set to "False" - And the reconciler retries on the next loop - - Scenario: Registration fails when transaction is not mined - Given the RPC endpoint is reachable but the network is congested - When the Register transaction is submitted - Then the error "erc8004: wait mined" may occur - And the reconciler retries on the next loop - - Scenario: Registration uses correct contract address per chain - When registering on Base Sepolia - Then the contract address "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D" is used diff --git a/docs/specs/features/llm_routing.feature b/docs/specs/features/llm_routing.feature deleted file mode 100644 index b71c9218..00000000 --- a/docs/specs/features/llm_routing.feature +++ /dev/null @@ -1,147 +0,0 @@ -# References: -# SPEC.md Section 3.2 — LLM Routing -# SPEC.md Section 3.6.4 — Cloud Provider Detection -# SPEC.md Section 3.5 — Monetize Buy Side (paid inference routing) - -Feature: LLM Routing - As an operator - I want the LiteLLM gateway to auto-discover and route to all available LLM providers - So that the OpenClaw agent can use local, cloud, and paid remote models through a single endpoint - - Background: - Given the cluster is running - And the LiteLLM Deployment exists in the "llm" namespace - - # ------------------------------------------------------------------- - # Auto-detection of Ollama models - # ------------------------------------------------------------------- - - Scenario: Auto-detect Ollama models during stack up - Given Ollama is running on the host with models: - | model | - | qwen3.5:9b | - | llama3.2:3b | - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains entries for "qwen3.5:9b" 
and "llama3.2:3b" - And each Ollama model entry has provider "ollama" and api_base pointing to the Ollama service - And the LiteLLM Deployment is restarted exactly once - - Scenario: Auto-configure skips Ollama when not running - Given Ollama is not running on the host - When "obol stack up" runs autoConfigureLLM - Then no Ollama model entries are added to "litellm-config" - And a warning is logged: Ollama not available - And the stack up continues without failure - - Scenario: Auto-configure updates models on subsequent stack up - Given the cluster was previously started with Ollama model "qwen3.5:9b" - And Ollama now has models "qwen3.5:9b" and "deepseek-r1:7b" - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains entries for both models - And the LiteLLM Deployment is restarted - - # ------------------------------------------------------------------- - # Cloud provider detection from environment variables - # ------------------------------------------------------------------- - - Scenario: Detect Anthropic provider from ANTHROPIC_API_KEY - Given the environment variable "ANTHROPIC_API_KEY" is set - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains a wildcard entry "anthropic/*" - And the "litellm-secrets" Secret contains the Anthropic API key - - Scenario: Detect Anthropic provider from CLAUDE_CODE_OAUTH_TOKEN - Given the environment variable "CLAUDE_CODE_OAUTH_TOKEN" is set - And "ANTHROPIC_API_KEY" is not set - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains a wildcard entry "anthropic/*" - And the "litellm-secrets" Secret contains the OAuth token as the Anthropic key - - Scenario: Detect OpenAI provider from OPENAI_API_KEY - Given the environment variable "OPENAI_API_KEY" is set - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains a wildcard entry "openai/*" - And the "litellm-secrets" Secret contains 
the OpenAI API key - - Scenario: Detect cloud provider from OpenClaw agent model preference - Given the file "~/.openclaw/openclaw.json" specifies agent model "anthropic/claude-sonnet-4-6" - And the environment variable "ANTHROPIC_API_KEY" is set - When "obol stack up" runs autoConfigureLLM - Then the Anthropic provider is auto-configured - - Scenario: No cloud provider configured when API keys are absent - Given no cloud provider API keys are set in the environment - When "obol stack up" runs autoConfigureLLM - Then no cloud provider entries are added to "litellm-config" - And a warning is logged for each missing provider - And the stack up continues without failure - - # ------------------------------------------------------------------- - # Manual model setup - # ------------------------------------------------------------------- - - Scenario: Manual provider setup via obol model setup - Given the cluster is running - When I run "obol model setup --provider anthropic" - And I provide the API key - Then the "litellm-config" ConfigMap is patched with the Anthropic wildcard entry - And the "litellm-secrets" Secret is updated - And the LiteLLM Deployment is restarted - - # ------------------------------------------------------------------- - # Custom endpoint validation - # ------------------------------------------------------------------- - - Scenario: Custom endpoint passes reachability test - Given a custom inference endpoint is running at "http://myhost:8080/v1" - When I run "obol model setup custom --name my-model --endpoint http://myhost:8080/v1 --model gpt-4" - Then the endpoint reachability test passes - And the "litellm-config" ConfigMap contains the custom model entry - And the LiteLLM Deployment is restarted - - Scenario: Custom endpoint fails reachability test - Given no service is running at "http://unreachable:8080/v1" - When I run "obol model setup custom --name my-model --endpoint http://unreachable:8080/v1 --model gpt-4" - Then the command fails 
with a reachability error - And the "litellm-config" ConfigMap is not modified - - # ------------------------------------------------------------------- - # Model ranking - # ------------------------------------------------------------------- - - Scenario: Cloud providers are preferred over local Ollama - Given Ollama is running with model "qwen3.5:9b" - And the environment variable "ANTHROPIC_API_KEY" is set - When "obol stack up" runs autoConfigureLLM - Then the "litellm-config" ConfigMap contains entries for both providers - And cloud provider entries appear before Ollama entries in the model list - - # ------------------------------------------------------------------- - # Paid inference routing through buyer sidecar - # ------------------------------------------------------------------- - - Scenario: Paid inference routes through buyer sidecar - Given the "litellm-config" ConfigMap contains the permanent "paid/*" entry - When a request arrives for model "paid/qwen3.5:9b" - Then LiteLLM routes the request to "http://127.0.0.1:8402/v1" - And the x402-buyer sidecar handles payment attachment - - Scenario: LiteLLM config contains permanent paid catch-all - When the "litellm-config" ConfigMap is loaded - Then it contains a model entry with name "paid/*" - And the entry has provider "openai" and api_base "http://127.0.0.1:8402/v1" - - # ------------------------------------------------------------------- - # Error states - # ------------------------------------------------------------------- - - Scenario: Model setup fails when cluster is not running - Given the cluster is not running - When I run "obol model setup --provider anthropic" - Then the command fails with "cluster not running" - - Scenario: Model setup fails with empty model list - Given the cluster is running - And Ollama has no models loaded - When I run "obol model setup --provider ollama" - Then the command fails with "no models to configure" diff --git a/docs/specs/features/network_rpc.feature 
b/docs/specs/features/network_rpc.feature
deleted file mode 100644
index 77d9047d..00000000
--- a/docs/specs/features/network_rpc.feature
+++ /dev/null
@@ -1,149 +0,0 @@
-# References:
-# SPEC.md Section 3.3 — Network / RPC Gateway
-# SPEC.md Section 3.3.3 — Two-Stage Templating
-# SPEC.md Section 3.3.4 — Write Method Blocking
-# SPEC.md Section 6.1 — External Services (ChainList API)
-
-Feature: Network RPC Gateway
-  As an operator
-  I want to manage blockchain RPC routing through the eRPC gateway
-  So that my cluster can interact with multiple blockchain networks reliably
-
-  Background:
-    Given the cluster is running
-    And the eRPC Deployment exists in the "erpc" namespace
-    And the "erpc-config" ConfigMap exists in the "erpc" namespace
-
-  # -------------------------------------------------------------------
-  # Add public RPCs from ChainList
-  # -------------------------------------------------------------------
-
-  Scenario: Add public RPCs from ChainList by chain ID
-    When I run "obol network add 1"
-    Then the eRPC config is patched with chain ID 1 (Ethereum Mainnet)
-    And public RPC endpoints from ChainList are added as upstreams
-    And each upstream has an ID prefixed with "chainlist-"
-    And a network entry with "evm.chainId: 1" is added to the project
-
-  Scenario: Add multiple chains
-    When I run "obol network add 1"
-    And I run "obol network add 137"
-    Then the eRPC config contains upstreams for both chain ID 1 and chain ID 137
-    And network entries exist for both chains
-
-  Scenario: Adding the same chain twice is idempotent
-    Given chain ID 1 is already configured in eRPC
-    When I run "obol network add 1"
-    Then the eRPC config is not duplicated for chain ID 1
-    And existing upstreams are preserved
-
-  # -------------------------------------------------------------------
-  # Add custom RPC endpoint
-  # -------------------------------------------------------------------
-
-  Scenario: Add custom RPC endpoint for a chain
-    When I run "obol network add 1 --endpoint https://my-node.example.com/rpc"
-    Then the eRPC config contains a custom upstream with the provided endpoint
-    And the upstream is associated with chain ID 1
-
-  Scenario: Custom endpoint is validated before adding
-    When I run "obol network add 1 --endpoint https://unreachable.example.com/rpc"
-    Then the endpoint reachability is checked
-    And if the endpoint is unreachable, a warning is displayed
-
-  # -------------------------------------------------------------------
-  # Write method blocking
-  # -------------------------------------------------------------------
-
-  Scenario: Write methods are blocked by default
-    Given chain ID 1 is configured in eRPC without --allow-writes
-    When an eth_sendRawTransaction request arrives at eRPC for chain 1
-    Then eRPC blocks the request
-    And returns an error indicating write methods are not allowed
-
-  Scenario: Write methods are allowed with --allow-writes flag
-    When I run "obol network add 1 --allow-writes"
-    Then the eRPC config for chain ID 1 allows eth_sendRawTransaction
-    And write requests are forwarded to the upstream
-
-  Scenario: Local Ethereum nodes always have writes blocked
-    Given a local Ethereum node is deployed as "ethereum-fluffy-penguin"
-    And the node is registered as a priority upstream in eRPC
-    When an eth_sendRawTransaction request arrives for the local node's chain
-    Then the write method is blocked on the local upstream
-    And the write request is routed to remote upstreams instead
-
-  # -------------------------------------------------------------------
-  # Remove chain RPCs
-  # -------------------------------------------------------------------
-
-  Scenario: Remove chain RPCs from eRPC
-    Given chain ID 1 is configured in eRPC with multiple upstreams
-    When I run "obol network remove 1"
-    Then all upstreams for chain ID 1 are removed from the eRPC config
-    And the network entry for chain ID 1 is removed
-
-  Scenario: Remove non-existent chain is a no-op
-    Given chain ID 999 is not configured in eRPC
-    When I run "obol network remove 999"
-    Then the command completes without error
-    And the eRPC config is unchanged
-
-  # -------------------------------------------------------------------
-  # eRPC status and listing
-  # -------------------------------------------------------------------
-
-  Scenario: eRPC status shows upstream counts
-    Given the eRPC config has:
-      | chain | upstream_count |
-      | 1     | 3              |
-      | 137   | 2              |
-    When I run "obol network list"
-    Then the output lists configured chains with their upstream counts:
-      | chain_id | name             | upstreams |
-      | 1        | Ethereum Mainnet | 3         |
-      | 137      | Polygon Mainnet  | 2         |
-
-  # -------------------------------------------------------------------
-  # Local Ethereum node deployment
-  # -------------------------------------------------------------------
-
-  Scenario: Install local Ethereum node with two-stage templating
-    When I run "obol network install ethereum --id fluffy-penguin"
-    Then Stage 1 renders values.yaml from values.yaml.gotmpl with CLI flags
-    And Stage 2 runs "helmfile sync" with the rendered values and id "fluffy-penguin"
-    And the node is deployed in namespace "ethereum-fluffy-penguin"
-    And the node is registered as a priority upstream in eRPC
-
-  Scenario: Install local Ethereum node with auto-generated petname
-    When I run "obol network install ethereum"
-    Then a petname ID is auto-generated
-    And the node is deployed in namespace "ethereum-"
-
-  Scenario: Local node registered as priority upstream in eRPC
-    Given a local Ethereum node "ethereum-fluffy-penguin" is deployed
-    Then the eRPC config contains an upstream with:
-      | field    | value                                                                    |
-      | id       | local-ethereum-fluffy-penguin                                            |
-      | endpoint | http://ethereum-execution.ethereum-fluffy-penguin.svc.cluster.local:8545 |
-
-  # -------------------------------------------------------------------
-  # Network sync and status
-  # -------------------------------------------------------------------
-
-  Scenario: Network sync re-runs helmfile for deployed network
-    Given a local Ethereum node is deployed with id "fluffy-penguin"
-    When I run "obol network sync ethereum fluffy-penguin"
-    Then helmfile sync is re-run for the "ethereum-fluffy-penguin" deployment
-
-  Scenario: Network status shows deployment health
-    Given a local Ethereum node is deployed with id "fluffy-penguin"
-    When I run "obol network status ethereum fluffy-penguin"
-    Then the output shows the deployment status of the Ethereum node
-    And includes pod readiness and sync status
-
-  Scenario: Network delete removes deployment and eRPC upstream
-    Given a local Ethereum node is deployed with id "fluffy-penguin"
-    When I run "obol network delete ethereum fluffy-penguin"
-    Then the namespace "ethereum-fluffy-penguin" is deleted
-    And the local upstream for "fluffy-penguin" is removed from the eRPC config
diff --git a/docs/specs/features/sell_monetization.feature b/docs/specs/features/sell_monetization.feature
deleted file mode 100644
index adb01647..00000000
--- a/docs/specs/features/sell_monetization.feature
+++ /dev/null
@@ -1,203 +0,0 @@
-# References:
-# SPEC.md Section 3.4 — Monetize Sell Side
-# SPEC.md Section 4.1 — x402 Payment Protocol
-# SPEC.md Section 5.3 — ServiceOffer CRD Schema
-# SPEC.md Section 7.1 — Tunnel Exposure
-# SPEC.md Section 3.4.4 — x402-verifier (ForwardAuth)
-
-Feature: Sell-Side Monetization
-  As an operator
-  I want to sell access to cluster services via x402 micropayments
-  So that I can earn USDC for every inference request served
-
-  Background:
-    Given the cluster is running
-    And a wallet is configured with address "0xSELLER"
-    And the chain is set to "base-sepolia"
-
-  # -------------------------------------------------------------------
-  # ServiceOffer creation
-  # -------------------------------------------------------------------
-
-  Scenario: obol sell http creates a ServiceOffer CR
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm"
-    Then a ServiceOffer CR named "myapi" is created in "openclaw-obol-agent" namespace
-    And the ServiceOffer spec contains:
-      | field                    | value           |
-      | payment.scheme           | exact           |
-      | payment.network          | base-sepolia    |
-      | payment.payTo            | 0xSELLER        |
-      | payment.price.perRequest | 0.001           |
-      | upstream.service         | litellm         |
-      | upstream.port            | 4000            |
-      | upstream.namespace       | llm             |
-      | path                     | /services/myapi |
-
-  Scenario: obol sell http with per-mtok pricing
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --per-mtok 1.00 --upstream litellm --port 4000 --namespace llm"
-    Then a ServiceOffer CR named "myapi" is created
-    And the ServiceOffer spec contains:
-      | field                 | value |
-      | payment.price.perMTok | 1.00  |
-
-  Scenario: obol sell http with health path
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm --health-path /health/readiness"
-    Then the ServiceOffer spec has upstream.healthPath "/health/readiness"
-
-  Scenario: obol sell http activates tunnel on first sell
-    Given no tunnel is currently active
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm"
-    Then EnsureTunnelForSell() is called
-    And the quick-mode tunnel is activated
-
-  Scenario: obol sell http rejects unsupported chain
-    When I run "obol sell http myapi --wallet 0xSELLER --chain ethereum-mainnet --price 0.001 --upstream litellm --port 4000 --namespace llm"
-    Then the command fails with "unsupported chain"
-
-  Scenario: obol sell http rejects non-HTTPS facilitator
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm --facilitator http://example.com"
-    Then the command fails with "facilitator URL must use HTTPS"
-
-  # -------------------------------------------------------------------
-  # 6-stage reconciliation
-  # -------------------------------------------------------------------
-
-  Scenario: Stage 1 — ModelReady
-    Given a ServiceOffer CR "myapi" exists with type "inference" and model "qwen3.5:9b"
-    When the reconciler evaluates stage 1
-    Then the condition "ModelReady" is set to "True" if the model is available in LiteLLM
-    And the condition "ModelReady" is set to "False" with a message if the model is not available
-
-  Scenario: Stage 2 — UpstreamHealthy
-    Given the ServiceOffer CR "myapi" has condition "ModelReady" = "True"
-    And the upstream service "litellm" in namespace "llm" is healthy at "/health/readiness"
-    When the reconciler evaluates stage 2
-    Then the condition "UpstreamHealthy" is set to "True"
-
-  Scenario: Stage 2 — UpstreamHealthy fails on unhealthy upstream
-    Given the ServiceOffer CR "myapi" has condition "ModelReady" = "True"
-    And the upstream service "litellm" in namespace "llm" returns 503 at "/health/readiness"
-    When the reconciler evaluates stage 2
-    Then the condition "UpstreamHealthy" is set to "False" with a message indicating the health check failed
-
-  Scenario: Stage 3 — PaymentGateReady
-    Given the ServiceOffer CR "myapi" has condition "UpstreamHealthy" = "True"
-    When the reconciler evaluates stage 3
-    Then a Traefik Middleware resource of type ForwardAuth is created
-    And the "x402-pricing" ConfigMap is updated with a route entry for "/services/myapi/*"
-    And the route entry contains the price, wallet, and chain from the ServiceOffer
-    And the condition "PaymentGateReady" is set to "True"
-
-  Scenario: Stage 4 — RoutePublished
-    Given the ServiceOffer CR "myapi" has condition "PaymentGateReady" = "True"
-    When the reconciler evaluates stage 4
-    Then an HTTPRoute resource is created for path "/services/myapi"
-    And the HTTPRoute references the ForwardAuth Middleware
-    And traffic matching "/services/myapi/*" is routed to the upstream service
-    And the condition "RoutePublished" is set to "True"
-
-  Scenario: Stage 5 — Registered (ERC-8004 on-chain)
-    Given the ServiceOffer CR "myapi" has condition "RoutePublished" = "True"
-    And registration is enabled in the ServiceOffer spec
-    And the tunnel URL is available
-    When the reconciler evaluates stage 5
-    Then an ERC-8004 registration is submitted on Base Sepolia
-    And the status field "agentId" is set to the minted token ID
-    And the status field "registrationTxHash" is set
-    And a ConfigMap with agent-registration.json is created
-    And an httpd Deployment serves /.well-known/agent-registration.json
-    And the condition "Registered" is set to "True"
-
-  Scenario: Stage 5 — Registration degrades without ETH
-    Given the ServiceOffer CR "myapi" has condition "RoutePublished" = "True"
-    And the wallet has zero ETH for gas
-    When the reconciler evaluates stage 5
-    Then the registration degrades to OffChainOnly mode
-    And the /.well-known/agent-registration.json is still served
-    But no on-chain transaction is submitted
-
-  Scenario: Stage 6 — Ready
-    Given all 5 prior conditions are "True"
-    When the reconciler evaluates stage 6
-    Then the condition "Ready" is set to "True"
-    And the status field "endpoint" is set to the full public URL
-
-  Scenario: Reconciled resources have ownerReferences for auto-GC
-    Given a ServiceOffer CR "myapi" has reached "Ready" state
-    When I inspect the Middleware, HTTPRoute, ConfigMap, and httpd Deployment
-    Then each resource has an ownerReference pointing to the ServiceOffer CR
-    And deleting the ServiceOffer cascades deletion to all owned resources
-
-  # -------------------------------------------------------------------
-  # x402-verifier behavior
-  # -------------------------------------------------------------------
-
-  Scenario: x402-verifier responds 402 with pricing for unauthenticated requests
-    Given the route "/services/myapi/*" is configured with price "0.001" USDC
-    When a request arrives at "/services/myapi/data" without an X-PAYMENT header
-    Then the x402-verifier responds with HTTP 402
-    And the response body contains PaymentRequirements JSON with:
-      | field                        | value        |
-      | x402Version                  | 1            |
-      | accepts[0].scheme            | exact        |
-      | accepts[0].network           | eip155:84532 |
-      | accepts[0].maxAmountRequired | 1000         |
-
-  Scenario: x402-verifier passes through requests with valid payment
-    Given the route "/services/myapi/*" is configured with price "0.001" USDC
-    When a request arrives at "/services/myapi/data" with a valid X-PAYMENT header
-    Then the x402-verifier delegates verification to the facilitator
-    And the facilitator confirms the payment is valid
-    And the x402-verifier responds with HTTP 200
-    And the upstream receives the request with an Authorization header
-
-  Scenario: x402-verifier passes through unmatched routes for free
-    Given the route "/services/myapi/*" is configured in pricing
-    When a request arrives at "/health" which matches no pricing route
-    Then the x402-verifier responds with HTTP 200
-    And the request proceeds to the upstream without payment
-
-  Scenario: x402-verifier hot-reloads pricing config
-    Given the x402-verifier is running with route "/services/old/*"
-    When the "x402-pricing" ConfigMap is updated to add route "/services/new/*"
-    And 5 seconds elapse for the config watcher poll
-    Then the verifier accepts the new route "/services/new/*"
-    And the old route "/services/old/*" is no longer active
-
-  # -------------------------------------------------------------------
-  # Pricing models
-  # -------------------------------------------------------------------
-
-  Scenario: perRequest pricing is used directly
-    Given a ServiceOffer with price.perRequest = "0.001"
-    When the reconciler creates the pricing route
-    Then the route price is "1000" (0.001 USDC in base units)
-    And the route priceModel is "perRequest"
-
-  Scenario: perMTok pricing is converted at 1000 tokens per request
-    Given a ServiceOffer with price.perMTok = "1.00"
-    When the reconciler creates the pricing route
-    Then the effective perRequest price is perMTok / 1000
-    And the route contains both perMTok and the approximated perRequest
-    And approxTokensPerRequest is set to 1000
-
-  # -------------------------------------------------------------------
-  # CLI management commands
-  # -------------------------------------------------------------------
-
-  Scenario: obol sell list shows active ServiceOffers
-    Given ServiceOffers "myapi" and "myinference" exist
-    When I run "obol sell list"
-    Then the output lists both ServiceOffers with their status
-
-  Scenario: obol sell status shows reconciliation progress
-    Given a ServiceOffer "myapi" is stuck at stage 2 (UpstreamHealthy = False)
-    When I run "obol sell status myapi"
-    Then the output shows each condition with its status
-    And the "UpstreamHealthy" condition shows the failure message
-
-  Scenario: obol sell delete removes ServiceOffer and owned resources
-    Given a ServiceOffer "myapi" exists at "Ready" state
-    When I run "obol sell delete myapi"
-    Then the ServiceOffer CR is deleted
-    And all owned resources (Middleware, HTTPRoute, ConfigMaps) are garbage collected
diff --git a/docs/specs/features/stack_lifecycle.feature b/docs/specs/features/stack_lifecycle.feature
deleted file mode 100644
index 275ab2b7..00000000
--- a/docs/specs/features/stack_lifecycle.feature
+++ /dev/null
@@ -1,166 +0,0 @@
-# References:
-# SPEC.md Section 3.1 — Stack Lifecycle
-# SPEC.md Section 2.4 — Backend Abstraction
-# SPEC.md Section 5.1 — Configuration Files
-# SPEC.md Section 1.3 — System Constraints
-
-Feature: Stack Lifecycle
-  As an operator
-  I want to manage the full lifecycle of my local Kubernetes cluster
-  So that I can run decentralized AI infrastructure reproducibly
-
-  Background:
-    Given Docker is running
-    And the obol CLI is installed
-
-  # -------------------------------------------------------------------
-  # obol stack init
-  # -------------------------------------------------------------------
-
-  Scenario: Stack init generates cluster ID and writes config
-    When I run "obol stack init"
-    Then a petname cluster ID is generated
-    And the file "$OBOL_CONFIG_DIR/.stack-id" contains the cluster ID
-    And the file "$OBOL_CONFIG_DIR/.stack-backend" contains "k3d"
-    And embedded infrastructure defaults are copied to "$OBOL_CONFIG_DIR/defaults/"
-    And template variables "OLLAMA_HOST", "OLLAMA_HOST_IP", "CLUSTER_ID" are substituted in defaults
-
-  Scenario: Stack init resolves absolute paths for Docker volume mounts
-    When I run "obol stack init"
-    Then all paths in the generated k3d config are absolute
-    And no relative paths appear in volume mount declarations
-
-  Scenario: Stack init preserves existing cluster ID on force reinit
-    Given I have previously run "obol stack init"
-    And the cluster ID is "fluffy-penguin"
-    When I run "obol stack init --force"
-    Then the cluster ID remains "fluffy-penguin"
-    And the backend config is regenerated
-
-  Scenario: Stack init with k3s backend
-    When I run "obol stack init --backend k3s"
-    Then the file "$OBOL_CONFIG_DIR/.stack-backend" contains "k3s"
-    And the Ollama host is resolved as "127.0.0.1"
-
-  Scenario: Stack init fails without Docker when using k3d backend
-    Given Docker is not running
-    When I run "obol stack init --backend k3d"
-    Then the command fails with "prerequisites check failed"
-
-  # -------------------------------------------------------------------
-  # obol stack up
-  # -------------------------------------------------------------------
-
-  Scenario: Stack up creates k3d cluster and deploys infrastructure
-    Given I have run "obol stack init"
-    When I run "obol stack up"
-    Then a k3d cluster is created with the persisted stack ID
-    And kubeconfig is written to "$OBOL_CONFIG_DIR/kubeconfig.yaml"
-    And helmfile sync deploys infrastructure to the cluster
-    And the following namespaces exist:
-      | namespace           |
-      | traefik             |
-      | llm                 |
-      | x402                |
-      | openclaw-obol-agent |
-      | erpc                |
-      | obol-frontend       |
-      | monitoring          |
-
-  Scenario: Stack up auto-configures LiteLLM with Ollama models
-    Given I have run "obol stack init"
-    And Ollama is running on the host with model "qwen3.5:9b"
-    When I run "obol stack up"
-    Then the "litellm-config" ConfigMap in "llm" namespace contains model "qwen3.5:9b"
-    And the LiteLLM Deployment is restarted once
-
-  Scenario: Stack up deploys OpenClaw agent with skills
-    Given I have run "obol stack init"
-    When I run "obol stack up"
-    Then the OpenClaw Deployment exists in "openclaw-obol-agent" namespace
-    And skills are injected via host-path PVC
-    And the "openclaw-monetize" ClusterRoleBinding is patched with the openclaw ServiceAccount
-
-  Scenario: Stack up auto-starts DNS tunnel when provisioned
-    Given I have run "obol stack init"
-    And a DNS tunnel is provisioned with hostname "stack.example.com"
-    When I run "obol stack up"
-    Then the Cloudflare tunnel is started
-    And the tunnel URL is propagated to "AGENT_BASE_URL" on the OpenClaw Deployment
-
-  Scenario: Stack up keeps quick tunnel dormant until first sell
-    Given I have run "obol stack init"
-    And no DNS tunnel is provisioned
-    When I run "obol stack up"
-    Then the Cloudflare tunnel is not started
-    And the cloudflared Deployment has zero replicas
-
-  Scenario: Stack up is idempotent
-    Given I have run "obol stack init" and "obol stack up"
-    And the cluster is running
-    When I run "obol stack up" again
-    Then the cluster remains in a healthy state
-    And no duplicate resources are created
-    And all existing services remain accessible
-
-  Scenario: Stack up cleans up on helmfile sync failure
-    Given I have run "obol stack init"
-    And helmfile sync will fail due to a malformed template
-    When I run "obol stack up"
-    Then the command fails
-    And the cluster is automatically stopped via Down()
-
-  Scenario: Stack up binds expected ports
-    Given I have run "obol stack init"
-    And ports 80, 8080, 443, and 8443 are available
-    When I run "obol stack up"
-    Then the k3d cluster binds host ports 80, 8080, 443, and 8443
-
-  Scenario: Stack up fails when ports are occupied
-    Given I have run "obol stack init"
-    And port 80 is already in use by another service
-    When I run "obol stack up"
-    Then the command fails with "port(s) already in use"
-
-  # -------------------------------------------------------------------
-  # obol stack down
-  # -------------------------------------------------------------------
-
-  Scenario: Stack down deletes cluster but preserves config
-    Given the cluster is running
-    When I run "obol stack down"
-    Then the k3d cluster is deleted
-    And the file "$OBOL_CONFIG_DIR/.stack-id" still exists
-    And the file "$OBOL_CONFIG_DIR/kubeconfig.yaml" still exists
-    And the directory "$OBOL_CONFIG_DIR/defaults/" still exists
-
-  Scenario: Stack down stops the DNS resolver
-    Given the cluster is running
-    And the DNS resolver for "obol.stack" is active
-    When I run "obol stack down"
-    Then the DNS resolver is stopped
-
-  # -------------------------------------------------------------------
-  # obol stack purge
-  # -------------------------------------------------------------------
-
-  Scenario: Stack purge removes config directory
-    Given the cluster is running
-    When I run "obol stack purge"
-    Then the k3d cluster is destroyed
-    And the directory "$OBOL_CONFIG_DIR" is removed
-    But the directory "$OBOL_DATA_DIR" still exists
-
-  Scenario: Stack purge with force removes root-owned PVCs
-    Given the cluster is running
-    And root-owned PVC data exists in "$OBOL_DATA_DIR"
-    When I run "obol stack purge --force"
-    Then the k3d cluster is destroyed
-    And the directory "$OBOL_CONFIG_DIR" is removed
-    And the directory "$OBOL_DATA_DIR" is removed via sudo
-
-  Scenario: Stack purge prompts for wallet backup
-    Given the cluster is running
-    And a wallet exists at "$OBOL_DATA_DIR/openclaw-/keystore/"
-    When I run "obol stack purge"
-    Then the user is prompted to back up the wallet before proceeding
diff --git a/docs/specs/features/tunnel_exposure.feature b/docs/specs/features/tunnel_exposure.feature
deleted file mode 100644
index e1ca7356..00000000
--- a/docs/specs/features/tunnel_exposure.feature
+++ /dev/null
@@ -1,190 +0,0 @@
-# References:
-# SPEC.md Section 3.7 — Tunnel Management
-# SPEC.md Section 7.1 — Tunnel Exposure (Security Model)
-# SPEC.md Section 2.2 — Routing Architecture
-# SPEC.md Section 3.7.6 — Storefront Resources
-
-Feature: Tunnel Exposure
-  As an operator
-  I want the Cloudflare tunnel to expose only payment-gated and discovery endpoints
-  So that internal services remain protected while public services are accessible
-
-  Background:
-    Given the cluster is running
-    And the Traefik Gateway is deployed in the "traefik" namespace
-
-  # -------------------------------------------------------------------
-  # Quick mode tunnel activation
-  # -------------------------------------------------------------------
-
-  Scenario: Quick mode tunnel activates on first sell command
-    Given no tunnel is currently active
-    And the tunnel mode is "quick"
-    When I run "obol sell http myapi --wallet 0xSELLER --chain base-sepolia --price 0.001 --upstream litellm --port 4000 --namespace llm"
-    Then the quick tunnel is activated
-    And the tunnel URL is a random "*.trycloudflare.com" hostname
-    And the cloudflared Deployment is scaled to 1 replica
-
-  Scenario: Quick mode tunnel stays dormant during stack up
-    Given the tunnel mode is "quick"
-    When I run "obol stack up"
-    Then the cloudflared Deployment has zero replicas
-    And no tunnel URL is assigned
-
-  Scenario: Quick mode tunnel URL changes on restart
-    Given the quick tunnel is active with URL "https://abc123.trycloudflare.com"
-    When I run "obol tunnel restart"
-    Then the tunnel URL changes to a new "*.trycloudflare.com" hostname
-    And the new URL is propagated to all consumers
-
-  # -------------------------------------------------------------------
-  # DNS mode tunnel
-  # -------------------------------------------------------------------
-
-  Scenario: DNS mode tunnel with stable hostname
-    Given I have run "obol tunnel login --hostname stack.example.com"
-    When I run "obol stack up"
-    Then the tunnel is automatically started
-    And the tunnel URL is "https://stack.example.com"
-    And the URL persists across restarts
-
-  Scenario: DNS tunnel state is persisted
-    Given a DNS tunnel is provisioned with hostname "stack.example.com"
-    When I inspect "$OBOL_CONFIG_DIR/tunnel/cloudflared.json"
-    Then the state contains:
-      | field    | value             |
-      | mode     | dns               |
-      | hostname | stack.example.com |
-
-  # -------------------------------------------------------------------
-  # URL propagation
-  # -------------------------------------------------------------------
-
-  Scenario: Tunnel URL propagated to agent AGENT_BASE_URL
-    Given the tunnel is active with URL "https://stack.example.com"
-    When the tunnel URL is propagated
-    Then the OpenClaw Deployment in "openclaw-obol-agent" namespace has env var "AGENT_BASE_URL" set to "https://stack.example.com"
-
-  Scenario: Tunnel URL propagated to frontend ConfigMap
-    Given the tunnel is active with URL "https://stack.example.com"
-    When the tunnel URL is propagated
-    Then the "obol-stack-config" ConfigMap in "obol-frontend" namespace contains the tunnel URL
-
-  Scenario: Tunnel URL propagated to storefront
-    Given the tunnel is active with URL "https://stack.example.com"
-    When the tunnel URL is propagated
-    Then the storefront resources are created in the "traefik" namespace
-
-  # -------------------------------------------------------------------
-  # Internal services NOT accessible via tunnel
-  # -------------------------------------------------------------------
-
-  Scenario: Frontend is not accessible via tunnel hostname
-    Given the tunnel is active
-    When a request arrives via the tunnel hostname for path "/"
-    Then the request is routed to the storefront landing page
-    And the frontend application is NOT served
-    Because the frontend HTTPRoute has hostnames restricted to "obol.stack"
-
-  Scenario: eRPC is not accessible via tunnel hostname
-    Given the tunnel is active
-    When a request arrives via the tunnel hostname for path "/rpc"
-    Then the request does NOT reach the eRPC gateway
-    Because the eRPC HTTPRoute has hostnames restricted to "obol.stack"
-
-  Scenario: LiteLLM admin is not exposed via any route
-    Given the tunnel is active
-    When a request arrives via the tunnel hostname for any path
-    Then LiteLLM admin endpoints are never reachable
-    Because no HTTPRoute exists for LiteLLM without hostname restrictions
-
-  Scenario: Prometheus monitoring is not accessible via tunnel
-    Given the tunnel is active
-    When a request arrives via the tunnel hostname for monitoring paths
-    Then the monitoring endpoints are NOT reachable
-    Because monitoring HTTPRoutes have hostnames restricted to "obol.stack"
-
-  Scenario: Internal services remain accessible locally via obol.stack
-    Given the tunnel is active
-    When a request arrives with Host header "obol.stack" for path "/"
-    Then the frontend application is served
-    And when a request arrives with Host header "obol.stack" for path "/rpc"
-    Then the eRPC gateway handles the request
-
-  # -------------------------------------------------------------------
-  # /services/* accessible and x402-gated via tunnel
-  # -------------------------------------------------------------------
-
-  Scenario: Public service route is accessible via tunnel with payment
-    Given the tunnel is active
-    And a ServiceOffer "myapi" is in "Ready" state
-    When a request arrives via the tunnel hostname for path "/services/myapi/data" with valid payment
-    Then the x402-verifier validates the payment
-    And the request is forwarded to the upstream service
-    And the upstream responds successfully
-
-  Scenario: Public service route returns 402 without payment via tunnel
-    Given the tunnel is active
-    And a ServiceOffer "myapi" is in "Ready" state
-    When a request arrives via the tunnel hostname for path "/services/myapi/data" without payment
-    Then the x402-verifier returns HTTP 402 with PaymentRequirements
-
-  # -------------------------------------------------------------------
-  # Discovery endpoints via tunnel
-  # -------------------------------------------------------------------
-
-  Scenario: Agent registration JSON accessible via tunnel
-    Given the tunnel is active
-    And an ERC-8004 registration has been published
-    When a request arrives via the tunnel hostname for "/.well-known/agent-registration.json"
-    Then the response contains the AgentRegistration JSON
-    And the JSON includes:
-      | field         | type   |
-      | type          | string |
-      | name          | string |
-      | x402Support   | true   |
-      | active        | true   |
-      | services      | array  |
-      | registrations | array  |
-
-  Scenario: Skill catalog accessible via tunnel
-    Given the tunnel is active
-    And a /skill.md route is published
-    When a request arrives via the tunnel hostname for "/skill.md"
-    Then the response contains the machine-readable service catalog
-
-  # -------------------------------------------------------------------
-  # Storefront landing page
-  # -------------------------------------------------------------------
-
-  Scenario: Storefront landing page served at tunnel root
-    Given the tunnel is active with URL "https://stack.example.com"
-    When a request arrives at "https://stack.example.com/"
-    Then the storefront static HTML page is served
-    And the storefront is served by the busybox httpd in the "traefik" namespace
-
-  Scenario: Storefront resources are created correctly
-    Given the tunnel is active
-    When the storefront is deployed
-    Then the following resources exist in the "traefik" namespace:
-      | kind       | name              |
-      | ConfigMap  | tunnel-storefront |
-      | Deployment | tunnel-storefront |
-      | Service    | tunnel-storefront |
-      | HTTPRoute  | tunnel-storefront |
-    And the Deployment uses busybox httpd with 5m CPU and 8Mi RAM requests
-
-  # -------------------------------------------------------------------
-  # Tunnel management
-  # -------------------------------------------------------------------
-
-  Scenario: obol tunnel status shows tunnel state
-    Given the quick tunnel is active with URL "https://abc123.trycloudflare.com"
-    When I run "obol tunnel status"
-    Then the output shows the tunnel mode as "quick"
-    And the output shows the current tunnel URL
-
-  Scenario: obol tunnel logs shows cloudflared output
-    Given the tunnel is active
-    When I run "obol tunnel logs"
-    Then the output streams logs from the cloudflared pod
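The pricing-model scenarios in the deleted sell_monetization.feature pin down an arithmetic contract: perRequest "0.001" USDC becomes route price "1000" (USDC has 6 decimals), and perMTok "1.00" is approximated to a perRequest price at 1000 tokens per request. A minimal Go sketch of that conversion follows; the function names `usdcBaseUnits` and `perMTokToPerRequest` are hypothetical illustrations, not the obol reconciler's actual API.

```go
package main

import (
	"fmt"
	"math/big"
)

// usdcBaseUnits converts a decimal USDC amount string (e.g. "0.001")
// into 6-decimal base units, the form the pricing scenarios expect
// ("0.001" -> 1000). Hypothetical helper, not the obol implementation.
func usdcBaseUnits(price string) (int64, error) {
	r, ok := new(big.Rat).SetString(price)
	if !ok {
		return 0, fmt.Errorf("invalid price %q", price)
	}
	// Scale by 10^6 (USDC's 6 decimals).
	r.Mul(r, big.NewRat(1_000_000, 1))
	if !r.IsInt() {
		return 0, fmt.Errorf("price %q is finer than 1 base unit", price)
	}
	return r.Num().Int64(), nil
}

// perMTokToPerRequest approximates a per-request price in base units
// from a per-million-token price, using the feature file's fixed
// assumption of 1000 tokens per request (so perRequest = perMTok / 1000).
func perMTokToPerRequest(perMTok string) (int64, error) {
	units, err := usdcBaseUnits(perMTok)
	if err != nil {
		return 0, err
	}
	return units / 1000, nil
}

func main() {
	v, _ := usdcBaseUnits("0.001")
	fmt.Println(v) // 1000
	w, _ := perMTokToPerRequest("1.00")
	fmt.Println(w) // 1000 base units = 0.001 USDC per request
}
```

Both scenarios therefore converge on the same route price of 1000 base units, which also matches the `accepts[0].maxAmountRequired` value in the x402-verifier 402-response scenario.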