
feat: monetization path — sell inference (M1), autoresearch (M2), indexing (M3)#288

Draft
bussyjd wants to merge 14 commits into main from feat/monetize-path

Conversation


bussyjd commented Mar 23, 2026

Summary

Consolidates PRs #265, #269, #279, #287 into a single integration branch with validated monetization paths. Each milestone was validated with ralph-loop autoresearch (5 iterations, real user paths, real x402 payments).

Supersedes: #265, #269, #279, #287 (all closed)

Three Monetization Paths

```
M1: Sell Inference    → obol sell http --upstream litellm --price 0.001
M2: Sell GPU Compute  → obol sell http --upstream worker-api --per-request 0.01 --register
M3: Sell Indexing     → obol sell http --upstream indexer --per-request 0.0001
```

All three use the same ServiceOffer CRD, same 6-stage reconciliation, same x402 payment rails, same ERC-8004 discovery.

What's New

| Component | Source PR | What |
|---|---|---|
| monetize-guide skill | #287 | 7-phase Claude Code composition for selling |
| obol sell probe | #287 | Verify endpoint is live + returns 402 |
| autoresearch skill | #265 | Autonomous LLM optimization experiments |
| autoresearch-coordinator skill | #265 | Discover GPU workers via ERC-8004, pay via x402 |
| autoresearch-worker skill | #265 | GPU training worker + REST API |
| reth-erc8004-indexer | #269 | Reth ExEx indexing ERC-8004 into SQLite |
| OASF taxonomy fix | new | Aligned with official agntcy/oasf schema |
| CRD skills/domains fields | new | `--register-skills` / `--register-domains` now work |
| Coordinator x402 V1 parsing | new | `parse_402_pricing` handles `accepts[]` array |

OASF Taxonomy (validated against agntcy/oasf)

| Use Case | Skill | Domain |
|---|---|---|
| Inference | `natural_language_processing/natural_language_generation/text_completion` | `technology/data_science` |
| Autoresearch | `devops_mlops/model_versioning` | `research_and_development/scientific_research` |

Validation Method

Each milestone was validated with a ralph-loop (5 iterations, --dangerously-skip-permissions):

  • M1: flow-06/07/10/08 all pass — full sell→verify→pay→inference E2E with real Anvil fork + x402-rs facilitator
  • M2: Worker ServiceOffer + coordinator discovery + x402 pricing probe validated
  • M3: Reth indexer builds (14/14 tests), 8004scan API serves 2890 agents, coordinator fallback works

Findings: /tmp/m1-findings.md, /tmp/m2-findings.md, /tmp/m3-findings.md

Known Limitations

Review Guidance

@OisinKyne — please review:

  • cmd/obol/sell.go changes (sell probe + per-hour approximation) — overlaps with your "feat: CLI agent-readiness optimizations (#268)" (#284)
  • internal/embed/infrastructure/base/templates/serviceoffer-crd.yaml — new fields
  • OASF taxonomy alignment in all skill SKILL.md files

Test Plan

  • go build ./... && go test ./... — all pass
  • M1: sell pricing → sell http → heartbeat → 402 → paid request → settlement
  • M2: worker ServiceOffer → coordinator discovery → x402 probe
  • M3: cargo build --release + cargo test --lib (14/14)
  • Manual: obol sell probe <name> -n <ns> after ServiceOffer Ready
  • Rebase on main after "feat: real user flow validation scripts + heartbeat timing fixes" (#282) is merged (heartbeat fix)

bussyjd and others added 14 commits March 23, 2026 11:04
Add monetize-guide skill (SKILL.md + seller-prompt reference) that teaches
Claude Code the end-to-end flow: pre-flight checks, model detection,
pricing research via ERC-8004 registry, user-confirmed pricing, sell
execution, reconciliation monitoring, and endpoint verification.

Add `obol sell probe <name> -n <ns>` command that hits the public tunnel
URL and verifies the endpoint returns 402 with valid x402 pricing headers.
Closes the feedback loop so Claude can confirm a service is live.
--per-hour was passing the raw hourly price as the per-request charge
(e.g., $0.50/hour charged $0.50 per HTTP request). Now approximates
using a 5-minute experiment budget: perRequest = perHour * 5/60.

Also rewrites worker_api.py to use stdlib http.server (no Flask dep).
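The corrected approximation can be sketched as follows. The real implementation is Go code in sell.go; this Python mirror and its name are illustrative.

```python
def approximate_request_price_from_per_hour(per_hour: float,
                                            budget_minutes: float = 5.0) -> float:
    """Approximate a per-request price from an hourly rate.

    Assumes each request consumes a fixed experiment budget
    (default 5 minutes):  perRequest = perHour * budgetMinutes / 60
    """
    return per_hour * budget_minutes / 60.0
```

So a $0.50/hour rate charges roughly $0.042 per request, instead of the old buggy behaviour of charging the full $0.50 for every HTTP request.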
…erage

Critical:
- Fix path traversal in worker_api.py: validate experiment_id with regex
- Add GGUF format guard in publish.py before ollama_create()
- Run worker container as non-root user in Dockerfile

Medium:
- Replace manual provenance struct-to-map with JSON round-trip in sell.go
- Fix weak test assertion in TestApproximateRequestPriceFromPerHour
- Guard int(amount) cast in coordinate.py with try/except
- Remove domain-specific default "val_bpb" from CRD metricName field
- Guard maxTimeoutSeconds parse in monetize.py

Low:
- Add provenance propagation test for build_registration_doc
- Fix doc type inconsistency in coordination-protocol.md
- Assert all 6 Provenance fields in store_test.go

# Conflicts:
#	cmd/obol/sell.go
#	internal/embed/infrastructure/base/templates/serviceoffer-crd.yaml
#	internal/embed/skills/autoresearch-coordinator/references/coordination-protocol.md
#	internal/embed/skills/autoresearch-coordinator/scripts/coordinate.py
#	internal/embed/skills/autoresearch/SKILL.md
#	internal/embed/skills/autoresearch/scripts/publish.py
#	internal/embed/skills/sell/scripts/monetize.py
#	internal/inference/store_test.go
#	tests/test_sell_registration_metadata.py
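The path-traversal fix listed under Critical can be sketched like this. It is a Python illustration only: the exact regex and function name in worker_api.py are assumptions.

```python
import re

# Only allow safe identifier characters: no "/", "\", "..", or null bytes
# can pass, so the id can never escape the experiments directory.
_EXPERIMENT_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$")


def validate_experiment_id(experiment_id: str) -> str:
    """Reject any experiment_id that is not a plain identifier."""
    if not _EXPERIMENT_ID_RE.fullmatch(experiment_id):
        raise ValueError(f"invalid experiment_id: {experiment_id!r}")
    return experiment_id
```

Validating the identifier before any filesystem access is what closes the traversal: `../../etc/passwd` is rejected at the door rather than filtered after path joining.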
- Add `obol sell probe <name> -n <ns>` — sends unauthenticated request
  through Traefik to verify the endpoint returns 402 with x402 pricing.
- Create flow-06-sell-setup.sh: idempotent ServiceOffer creation
- Create flow-07-sell-verify.sh: wait for reconciliation + verify conditions
- Create flow-10-anvil-facilitator.sh: Anvil fork + x402-rs facilitator
- Create flow-08-buy.sh: EIP-712 sign + paid request through Traefik

All flows pass end-to-end: sell → verify → pay → inference (HTTP 200).
- Add skills and domains array fields to ServiceOffer CRD registration
  schema. The --register-skills CLI flag was rejected by strict CRD
  validation because the fields were missing.
- Fix coordinator.py parse_402_pricing to handle x402 V1 standard
  format (accepts[] array) in addition to flat top-level fields.

Validated: worker ServiceOffer → 402 gate → .well-known → discovery →
coordinator probe.
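The V1 parsing fix can be sketched as follows. The field names mirror the x402 payment-requirements shape; the actual `parse_402_pricing` in the coordinator may differ in detail.

```python
def parse_402_pricing(body: dict) -> dict:
    """Extract x402 pricing from a 402 response body.

    Handles both the flat legacy shape, where pricing fields sit at the
    top level, and the x402 V1 shape, where the same fields live inside
    an accepts[] array of payment requirements.
    """
    accepts = body.get("accepts")
    if accepts:
        # V1: take the first advertised payment requirement.
        return dict(accepts[0])
    # Legacy flat shape.
    return {k: v for k, v in body.items() if k not in ("error", "x402Version")}
```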
Paths validated against https://github.com/agntcy/oasf (cloned at R&D/oasf/).

Skills:
- natural_language_processing/text_generation/chat_completion → natural_language_processing/natural_language_generation/text_completion
- machine_learning/model_optimization → analytical_skills/model_optimization

Domains:
- technology/artificial_intelligence → technology/data_science
- technology/artificial_intelligence/research → research_and_development/scientific_research

Fixed in: monetize.py, coordinate.py, monetize-guide SKILL.md, autoresearch SKILL.md,
autoresearch-worker SKILL.md, coordinator references, seller-prompt.md
analytical_skills/model_optimization was invented (doesn't exist in OASF).
Closest real path: devops_mlops/model_versioning (validated against
agntcy/oasf schema repo at R&D/oasf/).

Also confirms prior fixes:
- natural_language_processing/natural_language_generation/text_completion
- technology/data_science
- research_and_development/scientific_research
LiteLLM does NOT auto-append /v1 for openai/ provider routes.
WarnAndStripV1Suffix was removing /v1 from api_base before storing
in the ConfigMap, causing LiteLLM to hit /chat/completions (404)
instead of /v1/chat/completions.

Verified against LiteLLM source (gpt_transformation.py:get_complete_url):
  api_base is passed as-is to the OpenAI SDK base_url, which appends
  /chat/completions directly. The /v1 must be in api_base.

Removed from all 3 call sites:
- cmd/obol/model.go (obol model setup custom)
- internal/openclaw/openclaw.go (promptForDirectProvider)
- internal/openclaw/openclaw.go (promptForCustomProvider)
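The behaviour is easy to demonstrate with a few lines of Python. This is a simplified model of the URL construction described above, not LiteLLM's actual code:

```python
def openai_complete_url(api_base: str) -> str:
    # The OpenAI SDK appends the request path directly to base_url;
    # it never inserts /v1 on the client's behalf.
    return api_base.rstrip("/") + "/chat/completions"


# With /v1 preserved, the request hits the real endpoint:
assert openai_complete_url("http://vllm.lan:8000/v1") == \
    "http://vllm.lan:8000/v1/chat/completions"
# Stripping /v1 (the old WarnAndStripV1Suffix behaviour) yields a 404 path:
assert openai_complete_url("http://vllm.lan:8000") == \
    "http://vllm.lan:8000/chat/completions"
```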
Adds a new "External LAN Resources" section to the monetize-guide SKILL.md
covering the flow for selling GPU servers or inference endpoints on the
local network (e.g., DGX Spark running vLLM).

The path: obol model setup custom (bridge into LiteLLM) → obol sell http
(create ServiceOffer pointing at LiteLLM). Documents that LAN IPs are
reachable from k3d without additional config, and that --endpoint must
include /v1 since LiteLLM does not auto-append it.

Validated end-to-end with Nemotron 3 Super 120B on 2x DGX Spark:
- obol model setup custom validates and adds to LiteLLM
- obol sell http creates ServiceOffer
- Agent heartbeat reconciles in ~90s → all 6 conditions True
- 402 gating works locally and through Cloudflare tunnel
- .well-known and /skill.md discovery updated automatically
…cherry-pick

Cherry-picking PR #265 into feat/monetize-path had conflicts in sell.go
that dropped 3 blocks of code from main:

1. obol sell inference — cluster-aware routing: detects k3d cluster,
   creates K8s Service+Endpoints bridge to host gateway, creates
   ServiceOffer, auto-starts tunnel via EnsureTunnelForSell()

2. obol sell http — auto-tunnel: calls EnsureTunnelForSell() after
   creating a ServiceOffer so the endpoint is immediately public

3. obol sell delete — auto-stop tunnel: when the last ServiceOffer is
   deleted, stops the quick tunnel and removes the storefront

Also restores:
- NoPaymentGate field on Deployment and GatewayConfig structs
- createHostService(), resolveHostIP(), buildInferenceServiceOfferSpec()
- net, runtime, strconv, stack imports
LiteLLM PyPI packages 1.82.7 and 1.82.8 contain a malicious .pth file
(litellm_init.pth) that exfiltrates environment variables, SSH keys,
cloud credentials, and Kubernetes configs to an external endpoint.

See: BerriAI/litellm#24512

Our template used the floating tag `main-stable` which could pull a
compromised build. Pin to `main-v1.82.3` (confirmed safe, matches
the version currently running in our clusters).

Never use floating tags for security-sensitive dependencies.
The cherry-pick of PR #265 dropped _build_skill_md() and
_publish_skill_md() from monetize.py, along with their 3 call sites
in cmd_process. This meant /skill.md would never be created or updated
on a fresh cluster.

Restores:
- _build_skill_md(): generates service catalog markdown from Ready offers
- _publish_skill_md(): creates/updates ConfigMap + Deployment + Service +
  HTTPRoute for the /skill.md endpoint
- 3 call sites in cmd_process:
  1. Empty skill.md when no offers exist
  2. Full skill.md when all offers are Ready
  3. Regenerate after reconciliation loop
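For illustration, the catalog-rendering half of this logic could look roughly like the following. This is a hedged sketch: the real `_build_skill_md()` signature and offer fields may differ.

```python
def build_skill_md(offers: list) -> str:
    """Render a /skill.md service catalog from ServiceOffer summaries.

    Mirrors the two cases above: an empty catalog when no offers exist,
    and one entry per Ready offer otherwise.
    """
    lines = ["# Services", ""]
    ready = [o for o in offers if o.get("ready")]
    if not ready:
        lines.append("No services are currently offered.")
    for o in ready:
        lines.append(f"- **{o['name']}**: {o['price']} per request at {o['url']}")
    return "\n".join(lines) + "\n"
```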
No tests existed for tunnel state persistence or auto-stop decision
logic — this is why the cherry-pick drift went undetected.

New tests in tunnel_lifecycle_test.go:
- State round-trip (save/load, quick & dns modes)
- Missing state file returns (nil, nil)
- State overwrite replaces previous
- File permissions (0600 for non-secret metadata)
- UpdatedAt timestamp refresh on save
- tunnelModeAndURL derivation
- shouldAutoStopTunnel decision logic (5 cases covering the logic
  from sell delete: stop quick tunnels when empty, never stop dns)
- Exported LoadTunnelState wrapper
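The decision logic those tests cover can be sketched as follows: a Python mirror of the Go `shouldAutoStopTunnel`, where the names and shape are assumptions.

```python
def should_auto_stop_tunnel(mode: str, remaining_offers: int) -> bool:
    """Decide whether `obol sell delete` should stop the tunnel.

    Quick (ephemeral) tunnels are stopped once the last ServiceOffer is
    gone; DNS-backed tunnels are user-managed and never auto-stopped.
    """
    if mode == "dns":
        return False
    return mode == "quick" and remaining_offers == 0
```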