1 change: 1 addition & 0 deletions .workspace
4 changes: 4 additions & 0 deletions autoresearch.checks.sh
@@ -0,0 +1,4 @@
#!/bin/bash
set -euo pipefail
go build ./...
go test ./... # unit tests only (no -tags integration)
3 changes: 3 additions & 0 deletions autoresearch.config.json
@@ -0,0 +1,3 @@
{
  "workingDir": "/Users/bussyjd/Development/Obol_Workbench/obol-stack/.worktrees/autoresearch"
}
11 changes: 11 additions & 0 deletions autoresearch.jsonl
@@ -0,0 +1,11 @@
{"type":"config","name":"Obol Stack Real User Flow Validation","metricName":"steps_passed","metricUnit":"","bestDirection":"higher"}
{"run":1,"commit":"f1bbe63","metric":44,"metrics":{"total_steps":57},"status":"keep","description":"Baseline: 44/57 steps passed. Failures: exec-in-container (flow-03), LiteLLM inference timeout (flow-03), ServiceOffer not reconciled (flow-06), 404 on /services (flow-07/08), x402 metrics missing (flow-07), false passes on cast balance checks (flow-10/08).","timestamp":1773861210844,"segment":0}
{"run":2,"commit":"f155993","metric":45,"metrics":{"total_steps":57},"status":"keep","description":"+1: flow-03 all fixed (python3 exec, LiteLLM auth, right model, tool-calls), flow-08 discovery fixed. Heartbeat still not firing in 8min window.","timestamp":1773863343535,"segment":0}
{"run":3,"commit":"1001739","metric":56,"metrics":{"total_steps":57},"status":"keep","description":"56/57: massive jump from 44. Only remaining failure: blockrun-llm not installed (§2.3 paid inference). All timing, flow script, cast env, and heartbeat fixes working.","timestamp":1773864045469,"segment":0}
{"run":4,"commit":"71ae55a","metric":58,"metrics":{"total_steps":58},"status":"keep","description":"58/58 all passing! Native EIP-712/ERC-3009 payment signing replaces blockrun-llm, heartbeat ConfigMap re-patched after tunnel sync. +1 step from prerequisites check.","timestamp":1773865239172,"segment":0}
{"run":5,"commit":"1720955","metric":59,"metrics":{"total_steps":60},"status":"keep","description":"59/60: flow reorder fixed verifier metrics. Still 1 remaining (metrics per-pod load balancing). Heartbeat intermittently misses 8min window. Tunnel sync idempotency fix in progress.","timestamp":1773867817767,"segment":0}
{"run":6,"commit":"047e6dc","metric":61,"metrics":{"total_steps":61},"status":"keep","description":"61/61 perfect score! All flows passing. Rollout wait before heartbeat poll eliminates timing race.","timestamp":1773868214159,"segment":0}
{"run":7,"commit":"4dd2e8e","metric":61,"metrics":{"total_steps":61},"status":"keep","description":"61/61 confirmed stable on 2nd consecutive run. +38.6% from baseline of 44.","timestamp":1773868628792,"segment":0}
{"run":8,"commit":"0bb590c","metric":62,"metrics":{"total_steps":62},"status":"keep","description":"62/62: added eRPC accessibility check covering monetize §1.6 gap. All documented user flow steps now covered.","timestamp":1773869201018,"segment":0}
{"run":9,"commit":"a846853","metric":62,"metrics":{"total_steps":62},"status":"keep","description":"62/62 stable on 3rd consecutive run. +40.9% from baseline of 44. All user flows fully validated end-to-end.","timestamp":1773869625692,"segment":0}
{"run":10,"commit":"25c988a","metric":62,"metrics":{"total_steps":62},"status":"keep","description":"62/62 with docs fixes. getting-started LiteLLM auth fixed, monetize §1.6 eRPC path corrected, /.well-known clarified.","timestamp":1773870193118,"segment":0}
104 changes: 104 additions & 0 deletions autoresearch.md
@@ -0,0 +1,104 @@
# Autoresearch: Obol Stack Real User Flow Validation

## Objective
Validate that every documented user journey in Obol Stack works exactly as a
real human would experience it. Fix CLI bugs, error messages, timing issues,
and UX problems. Improve the flow scripts themselves when they're incomplete.

## Metric
steps_passed (count, higher is better) — each flow script emits STEP/PASS/FAIL.
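
As a rough illustration of that protocol, the helpers below sketch what `flows/lib.sh` might provide. That file is not part of this diff, so the function bodies and the exact line format (`STEP: ...`, `PASS: ...`, `FAIL: ...`) are assumptions inferred from how `autoresearch.sh` greps the flow output.

```shell
#!/bin/bash
# Hypothetical sketch of the STEP/PASS/FAIL helpers (lib.sh is not shown in this diff).
STEPS=0
PASSED=0

step() { STEPS=$((STEPS + 1)); echo "STEP: $*"; }
pass() { PASSED=$((PASSED + 1)); echo "PASS: $*"; }
fail() { echo "FAIL: $*"; }

# Run a command as one step: PASS on exit code 0, FAIL otherwise.
run_step() {
  local desc="$1"; shift
  step "$desc"
  if "$@" >/dev/null 2>&1; then
    pass "$desc"
  else
    fail "$desc (exit $?)"
  fi
}

# Summary line for the whole flow script.
emit_metrics() { echo "METRIC steps_passed=$PASSED total_steps=$STEPS"; }
```

Because every step prints exactly one `STEP:` line and at most one `PASS:` line, counting those prefixes is enough to recover the metric without parsing anything else.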

## Source of Truth for User Flows
- `docs/getting-started.md` — Steps 1-6 (install → inference → agent → networks)
- `docs/guides/monetize-inference.md` — Parts 1-4 (sell → buy → facilitator → lifecycle)

Every numbered section in these docs MUST have a corresponding step in a flow script.
If a doc section has no flow coverage, that is a gap — add it.

## Self-Improving Research Rules
When a flow fails, determine WHY before fixing anything:

1. **Missing prerequisite?** (e.g., model not pulled, Anvil not running, Foundry
not installed, USDC not funded) → Read the docs above, find the setup step,
ADD it to the flow script, and re-run.

2. **Wrong command/flags?** (e.g., wrong --namespace, missing --port) → Run
`obol <cmd> --help`, read the guide section, fix the flow script.

3. **CLI bug or bad error message?** (e.g., panic, misleading output, wrong exit
code) → Fix the Go source code in cmd/obol/ or internal/, rebuild, re-run.

4. **Timing/propagation issue?** (e.g., 503 because verifier not ready yet) →
Add polling with `obol sell status` or `obol kubectl wait`. If the wait is
unreasonable (>5min), fix the underlying readiness logic in Go.

5. **Doc is wrong?** (e.g., doc says --per-request but CLI wants --price) →
Fix the doc AND update the flow script. The CLI is the source of truth.

The flow scripts AND the obol-stack code are BOTH in scope for modification.
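
For the timing case (rule 4), a polling loop could look like the sketch below. `poll_until` is a hypothetical name and is not the `poll_step` helper the flow scripts use; it only illustrates retrying until success with a bounded number of attempts.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: poll_until <max_attempts> <interval_seconds> <command...>
poll_until() {
  local max="$1" interval="$2"; shift 2
  local i
  for ((i = 1; i <= max; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if ((i < max)); then sleep "$interval"; fi
  done
  echo "not ready after $max attempt(s)" >&2
  return 1
}
```

A call such as `poll_until 60 5 $CURL_OBOL -sf --max-time 5 http://obol.stack:8080/` would then retry for roughly five minutes before giving up.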

## Files in Scope
### Flow scripts (improve coverage, fix invocations)
- flows/*.sh

### CLI commands (fix bugs, improve UX)
- cmd/obol/sell.go, cmd/obol/openclaw.go, cmd/obol/main.go
- cmd/obol/network.go, cmd/obol/model.go, cmd/obol/stack.go

### Internal logic (fix timing, readiness, error handling)
- internal/stack/stack.go
- internal/openclaw/openclaw.go
- internal/agent/agent.go
- internal/x402/config.go, internal/x402/setup.go

### Documentation (fix if CLI disagrees)
- docs/getting-started.md
- docs/guides/monetize-inference.md

## Off Limits (do NOT modify)
- internal/embed/infrastructure/ (K8s templates — too risky)
- internal/x402/buyer/ (sidecar — separate domain)
- .workspace/ (runtime state)

## Constraints
0. SKIP flow-05-network.sh entirely — do NOT deploy Ethereum clients (reth/lighthouse).
They consume too much disk and network bandwidth. The user will add network coverage later.
1. STRICTLY FORBIDDEN: `go run`, direct `kubectl`, curl to pod IPs, `--force` flags
a user wouldn't know, and skipping propagation waits
2. All commands must use the built obol binary (`$OBOL_BIN_DIR/obol`)
3. All cluster HTTP access through `obol.stack:8080` or tunnel URL (not localhost)
EXCEPT for documented port-forwards (LiteLLM §3c-3d, agent §5)
4. Must wait for real propagation (poll, don't sleep fixed durations)
5. `go build ./...` and `go test ./...` must pass after every change
6. NEVER run `obol stack down` or `obol stack purge`
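
Part of constraint 1 can be checked mechanically. The sketch below is a hypothetical lint, not something that exists in the repo: it flags `go run`, `kubectl` invoked directly rather than through the obol binary, and `--force`. The patterns are heuristics; pod-IP curls and skipped waits are not covered.

```shell
#!/bin/bash
# Hypothetical lint for flow scripts (heuristic; covers only part of constraint 1).
# Prints offending lines as "<line number>:<text>" and returns 1 if any are found.
lint_flow() {
  local file="$1" hits
  # Drop comment-only lines, then match:
  #   go run | kubectl at the start of a command | --force
  hits=$(grep -nvE '^[[:space:]]*#' "$file" \
    | grep -E '(go run|(^[0-9]+:|[;&|(][[:space:]]*)[[:space:]]*kubectl[[:space:]]|--force)' \
    || true)
  if [ -n "$hits" ]; then
    echo "$hits"
    return 1
  fi
}
```

A line such as `"$OBOL" kubectl get pods` passes, because `kubectl` there is an argument to the obol binary rather than the command itself.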

## Branching Strategy
Each category of fix goes on its own branch off `main`. Create branches as needed:
- `fix/flow-scripts` — flow script improvements (wrong flags, missing steps, harness fixes)
- `fix/cli-ux` — CLI bugs, error messages, exit codes (Go code in `cmd/obol/`)
- `fix/timing` — readiness/polling/propagation fixes (Go code in `internal/`)
- `fix/docs` — documentation corrections (`docs/`)

Commit each fix individually with a descriptive message. Do NOT push — just commit locally.
Always create a NEW commit (never amend). The user will review branches on wakeup.
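
The mapping from a changed file to its branch can be written down as a small helper. `branch_for_path` is a hypothetical name used only for illustration; the categories and the off-limits paths come from the lists in this document.

```shell
#!/bin/bash
# Hypothetical helper: pick the fix branch for a changed file.
# Returns non-zero for off-limits paths and for paths outside the listed categories.
branch_for_path() {
  case "$1" in
    internal/embed/infrastructure/*|internal/x402/buyer/*|.workspace/*)
      return 1 ;;  # off limits
    flows/*)    echo "fix/flow-scripts" ;;
    cmd/obol/*) echo "fix/cli-ux" ;;
    internal/*) echo "fix/timing" ;;
    docs/*)     echo "fix/docs" ;;
    *)          return 1 ;;
  esac
}
```

With that in place, `git switch -c "$(branch_for_path cmd/obol/sell.go)" main` would create `fix/cli-ux` off `main` on first use; later fixes in the same category would use plain `git switch` followed by a normal `git commit` (never `--amend`).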

## Port-Forward vs Traefik Surfaces

| Surface | Access Method | Doc Reference |
|---------|--------------|---------------|
| LiteLLM direct | `obol kubectl port-forward -n llm svc/litellm 8001:4000` | getting-started §3c-3d |
| Agent inference | `obol kubectl port-forward -n openclaw-<id> svc/openclaw 18789:18789` | getting-started §5 |
| Frontend | `http://obol.stack:8080/` | getting-started §2 |
| eRPC | `http://obol.stack:8080/rpc` | monetize §1.6 |
| Monetized endpoints | `http://obol.stack:8080/services/<name>/*` | monetize §1.6 |
| Discovery | `<tunnel>/.well-known/*` | monetize §2.1 |
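
The two port-forward surfaces in the table run as background processes, so a flow has to stop them when it finishes. The flow scripts call `cleanup_pid` for this; its definition lives in `flows/lib.sh`, which is not shown here, so the body below is only a guess at a reasonable implementation.

```shell
#!/bin/bash
# Assumed implementation (lib.sh is not shown): stop a background process
# and reap it, tolerating a process that has already exited.
cleanup_pid() {
  local pid="$1"
  [ -n "$pid" ] || return 0
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
}
```

Registering it with `trap 'cleanup_pid "$PF_PID"' EXIT` right after starting the port-forward would also cover early exits, at the cost of one extra line per flow.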

## Initial State
- Cluster was wiped clean — no k3d cluster exists
- flow-02 will handle `obol stack init` + `obol stack up` automatically
- obol binary is pre-built at `.workspace/bin/obol`
- macOS DNS: use `$CURL_OBOL` (defined in lib.sh) for `obol.stack` URLs to bypass mDNS delays
- First run will be slow (~5 min for stack up) — subsequent iterations skip init/up
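
`$CURL_OBOL` is defined in `flows/lib.sh`, which this diff does not include. One plausible definition, shown purely as an assumption, pins `obol.stack` to a fixed address with curl's `--resolve` option so that no DNS or mDNS lookup takes place. The address `127.0.0.1` is a guess that fits a local k3d cluster; adjust it if the load balancer listens elsewhere.

```shell
#!/bin/bash
# Assumed definition (not taken from lib.sh): skip name resolution for obol.stack:8080.
# --resolve takes <host>:<port>:<address>; 127.0.0.1 is a guess for a local k3d cluster.
CURL_OBOL="curl --resolve obol.stack:8080:127.0.0.1"

# Left unquoted at the call site on purpose so the string splits into
# command + arguments, matching how the flow scripts invoke it:
#   $CURL_OBOL -sf --max-time 5 http://obol.stack:8080/
```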

## What's Been Tried
(Agent updates this section as experiments accumulate)
45 changes: 45 additions & 0 deletions autoresearch.sh
@@ -0,0 +1,45 @@
#!/bin/bash
set -euo pipefail

OBOL_ROOT="$(cd "$(dirname "$0")" && pwd)"
source "$OBOL_ROOT/flows/lib.sh"

# Rebuild binary (what a dev does after code changes)
go build -o "$OBOL" ./cmd/obol || { echo "METRIC steps_passed=0"; exit 1; }

TOTAL_PASSED=0
TOTAL_STEPS=0

run_flow() {
  local script="$1"
  echo ""
  echo "=== Running: $script ==="
  local output
  output=$(bash "$script" 2>&1) || true
  local passed; passed=$(echo "$output" | grep -c "^PASS:" || true)
  local steps; steps=$(echo "$output" | grep -c "^STEP:" || true)
  TOTAL_PASSED=$((TOTAL_PASSED + passed))
  TOTAL_STEPS=$((TOTAL_STEPS + steps))
  # `|| true`: a flow that prints no STEP/PASS/FAIL lines must not abort the run under `set -e` + pipefail
  echo "$output" | grep -E "^(STEP|PASS|FAIL):" || true
}

# Dependency order:
# - flow-05 (network) is omitted on purpose; see autoresearch.md constraint 0
# - flow-10 (anvil) must run before flow-08 (buy)
# - flow-06 (sell setup) must run before flow-07 (sell verify)
for flow in \
  flows/flow-01-prerequisites.sh \
  flows/flow-02-stack-init-up.sh \
  flows/flow-03-inference.sh \
  flows/flow-04-agent.sh \
  flows/flow-06-sell-setup.sh \
  flows/flow-10-anvil-facilitator.sh \
  flows/flow-07-sell-verify.sh \
  flows/flow-08-buy.sh \
  flows/flow-09-lifecycle.sh; do
  [ -f "$OBOL_ROOT/$flow" ] && run_flow "$OBOL_ROOT/$flow"
done

echo ""
echo "METRIC steps_passed=$TOTAL_PASSED"
echo "METRIC total_steps=$TOTAL_STEPS"
23 changes: 23 additions & 0 deletions flows/flow-01-prerequisites.sh
@@ -0,0 +1,23 @@
#!/bin/bash
# Flow 01: Prerequisites — validate environment before any cluster work.
# No cluster needed. Checks: Docker, Ollama, obol binary.
source "$(dirname "$0")/lib.sh"

# Docker must be running
run_step "Docker daemon running" docker info

# Ollama must be serving
run_step_grep "Ollama serving models" "models" curl -sf http://localhost:11434/api/tags

# obol binary must exist and be executable
step "obol binary exists"
if [ -x "$OBOL" ]; then
  pass "obol binary exists at $OBOL"
else
  fail "obol binary not found at $OBOL"
fi

# obol version should return something
run_step_grep "obol version" "Version" "$OBOL" version

emit_metrics
37 changes: 37 additions & 0 deletions flows/flow-02-stack-init-up.sh
@@ -0,0 +1,37 @@
#!/bin/bash
# Flow 02: Stack Init + Up — getting-started.md §1-2.
# Idempotent: checks if cluster exists, skips init if so.
source "$(dirname "$0")/lib.sh"

# §1: Initialize — skip if cluster already running
step "Check if cluster exists"
if "$OBOL" kubectl cluster-info >/dev/null 2>&1; then
  pass "Cluster already running — skipping init"
else
  run_step "obol stack init" "$OBOL" stack init
  run_step "obol stack up" "$OBOL" stack up
fi

# §2: Verify the cluster — wait for all pods to be Running/Completed
run_step_grep "Nodes ready" "Ready" "$OBOL" kubectl get nodes

# Poll for all pods healthy (fresh cluster needs ~3-4 min for images to pull)
step "All pods Running or Completed (polling, max 60x5s)"
for i in $(seq 1 60); do
  pod_output=$("$OBOL" kubectl get pods -A --no-headers 2>&1)
  bad_pods=$(echo "$pod_output" | grep -v -E "Running|Completed" || true)
  if [ -z "$bad_pods" ]; then
    pass "All pods healthy (attempt $i)"
    break
  fi
  if [ "$i" -eq 60 ]; then
    fail "Unhealthy pods after 300s: $(echo "$bad_pods" | head -3)"
  fi
  sleep 5
done

# Frontend via Traefik — wait up to 5 min for DNS + Traefik to be ready
poll_step "Frontend at http://obol.stack:8080/" 60 5 \
$CURL_OBOL -sf --max-time 5 http://obol.stack:8080/

emit_metrics
63 changes: 63 additions & 0 deletions flows/flow-03-inference.sh
@@ -0,0 +1,63 @@
#!/bin/bash
# Flow 03: LLM Inference — getting-started.md §3a-3d.
# Tests: host Ollama, in-cluster connectivity, LiteLLM inference, tool-calls.
source "$(dirname "$0")/lib.sh"

# §3a: Verify Ollama has models
run_step_grep "Ollama has models on host" "models" \
curl -sf http://localhost:11434/api/tags

# §3b: In-cluster Ollama connectivity — exec into litellm pod (already running)
step "In-cluster Ollama reachable from litellm pod"
out=$("$OBOL" kubectl exec -n llm deployment/litellm -c litellm -- \
  wget -qO- http://ollama.llm.svc.cluster.local:11434/api/tags 2>&1) || true
if echo "$out" | grep -q "models"; then
  pass "In-cluster Ollama reachable"
else
  fail "In-cluster Ollama unreachable — ${out:0:200}"
fi

# §3c: Inference through LiteLLM (port-forward is the documented user path)
step "LiteLLM port-forward + inference"
"$OBOL" kubectl port-forward -n llm svc/litellm 8001:4000 &>/dev/null &
PF_PID=$!

# Poll until port 8001 is accepting connections
for i in $(seq 1 15); do
  if curl -sf --max-time 2 http://localhost:8001/health >/dev/null 2>&1; then
    break
  fi
  sleep 2
done

out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{\"model\":\"$FLOW_MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2? Answer with just the number.\"}],\"max_tokens\":50,\"stream\":false}" 2>&1) || true

if echo "$out" | grep -q "choices"; then
  pass "LiteLLM inference returned choices"
else
  fail "LiteLLM inference failed — ${out:0:200}"
fi

# §3d: Tool-call passthrough
step "Tool-call passthrough"
tool_out=$(curl -sf --max-time 120 -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"'"$FLOW_MODEL"'",
"messages":[{"role":"user","content":"What is the weather in London?"}],
"tools":[{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}],
"max_tokens":100,"stream":false
}' 2>&1) || true

if echo "$tool_out" | grep -q "tool_calls\|get_weather"; then
  pass "Tool-call passthrough works"
else
  # Small models may not support tool calls reliably; this is still counted as a FAIL
  fail "Tool-call not returned (model may not support it) — ${tool_out:0:200}"
fi

cleanup_pid "$PF_PID"

emit_metrics
51 changes: 51 additions & 0 deletions flows/flow-04-agent.sh
@@ -0,0 +1,51 @@
#!/bin/bash
# Flow 04: Agent Init + Inference — getting-started.md §4-5.
# Tests: agent init, openclaw list, token, agent gateway inference.
source "$(dirname "$0")/lib.sh"

# §4: Deploy AI Agent (idempotent)
run_step "obol agent init" "$OBOL" agent init

# List agent instances
run_step_grep "openclaw list shows instances" "obol-agent\|default" "$OBOL" openclaw list

# §5: Test Agent Inference
step "Get openclaw token"
TOKEN=$("$OBOL" openclaw token obol-agent 2>/dev/null || "$OBOL" openclaw token default 2>/dev/null || true)
if [ -n "$TOKEN" ]; then
  pass "Got token: ${TOKEN:0:8}..."
else
  fail "Failed to get openclaw token"
  emit_metrics
  exit 0
fi

# Determine the namespace for port-forward
NS=$("$OBOL" openclaw list 2>/dev/null | grep -oE 'openclaw-[a-z0-9-]+' | head -1 || true)
# Explicit fallback: an empty match does not fail the pipeline unless pipefail is set
NS="${NS:-openclaw-obol-agent}"

step "Agent inference via port-forward"
"$OBOL" kubectl port-forward -n "$NS" svc/openclaw 18789:18789 &>/dev/null &
PF_PID=$!

# Poll until port 18789 is accepting connections
for i in $(seq 1 15); do
  if curl -sf --max-time 2 http://localhost:18789/health >/dev/null 2>&1; then
    break
  fi
  sleep 2
done

out=$(curl -sf --max-time 120 -X POST http://localhost:18789/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d "{\"model\":\"$FLOW_MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"What is 2+2?\"}],\"max_tokens\":50,\"stream\":false}" 2>&1) || true

if echo "$out" | grep -q "choices"; then
  pass "Agent inference returned response"
else
  fail "Agent inference failed — ${out:0:200}"
fi

cleanup_pid "$PF_PID"

emit_metrics
33 changes: 33 additions & 0 deletions flows/flow-05-network.sh
@@ -0,0 +1,33 @@
#!/bin/bash
# Flow 05: Network management — getting-started.md §6.
# Not run by autoresearch.sh (autoresearch.md constraint 0). It deploys no Ethereum clients and
# covers only: network list, RPC add/remove, and eRPC gateway health.
source "$(dirname "$0")/lib.sh"

# List available networks (local nodes + remote RPCs)
run_step_grep "network list" "ethereum\|Remote\|Local" "$OBOL" network list

# eRPC gateway health via obol network status
run_step_grep "eRPC gateway status" "eRPC\|Pod\|Upstream" "$OBOL" network status

# Add a public RPC for base-sepolia (documented user path for RPC access)
run_step "network add base-sepolia RPC" "$OBOL" network add base-sepolia --count 1

# Verify it appears in list
run_step_grep "base-sepolia in network list" "base-sepolia\|84532" "$OBOL" network list

# eRPC is accessible at /rpc/evm/<chainId> — base-sepolia is chain 84532
step "eRPC base-sepolia via Traefik (/rpc/evm/84532)"
out=$($CURL_OBOL -sf --max-time 10 "http://obol.stack:8080/rpc/evm/84532" \
  -X POST -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' 2>&1) || true
if echo "$out" | grep -q '"result"'; then
  pass "eRPC eth_chainId returned result"
else
  fail "eRPC eth_chainId failed — ${out:0:200}"
fi

# Remove the RPC we added (cleanup)
run_step "network remove base-sepolia" "$OBOL" network remove base-sepolia

emit_metrics