One FinOps toolkit, three surfaces over the same cost data — audit what you spend, predict what a PR will cost, and ask about both in plain language.
flowchart LR
SRC["Cloud accounts · AWS · GCP · Azure<br/>+ Terraform plans"]
subgraph SYS["CloudOracle — one FinOps toolkit"]
direction TB
V1["v1 — Audit<br/>ingest live spend, run rules<br/>→ executive PDF + dashboard"]
V2["v2 — PR check<br/>price a Terraform plan pre-merge<br/>→ GitHub PR cost comment"]
V3["v3 — Insights Agent<br/>ask FinOps questions in plain language<br/>→ natural-language answers"]
end
SRC --> V1
SRC --> V2
V1 -. cost data .-> V3
A Go FinOps toolkit spanning three modes — two from the same oracle binary, plus a polyglot Python agent extension:
- v1: Audit existing cloud spend. Ingest live EC2/RDS/EBS/Lambda inventory from AWS, GCP, or Azure into Postgres, run deterministic rules over it, and produce an executive PDF + dashboard with an LLM-narrated summary. See docs/v1-guide.md.
- v2: Predict the cost impact of a Terraform PR before merge. Read `terraform show -json plan.tfplan`, look up every changing resource in the AWS Pricing API, and post (or upsert) a Markdown comment on the PR with the net monthly delta, top movers, and a 1–3 sentence LLM narrative. Ships as a GitHub Action and as the `oracle pr-check` subcommand. See docs/v2-guide.md.
- v3: Insights Agent. Polyglot Go + Python extension adding agentic FinOps analysis on top of v1/v2 cost data: a hand-rolled LangGraph supervisor over specialist agents, RAG over a FinOps corpus (pgvector), production guardrails, real billing via AWS Cost Explorer, and a CLI + HTTP surface. See the v3 Insights Agent description below, docs/v3-guide.md, and insights-agent/README.md.
A Python sibling of the Go server that lets you ask FinOps questions in
natural language. The agent decides which /api/v1 endpoint to call, fetches
the data over HTTP, and answers in the user's language — surfacing the
"snapshot approximation" caveat when accuracy matters.
flowchart LR
U([User]) -->|"How much did I spend on AWS?"| CLI[insights-agent CLI<br/>Python 3.12]
CLI --> G[LangGraph supervisor<br/>3 specialists + synthesize]
G -->|"bind_tools"| LLM[Gemini 2.5 Flash]
LLM -->|"HTTP tool call"| T[CloudOracle tools<br/>cost-summary / cost-by-service / recommendations / cost-trends / inventory]
T -->|"GET /api/v1/* + X-API-Key"| GO[CloudOracle Go<br/>oracle serve]
GO -->|"SQL"| DB[(PostgreSQL<br/>cost_snapshots)]
GO -->|"data_source: snapshots_approximation / billing_aws_cost_explorer / heuristic_rules"| T
LLM -->|"knowledge tool call"| R[finops_knowledge_search<br/>RAG]
R -->|"similarity search"| VDB[(pgvector<br/>finops_knowledge)]
T --> LLM
R --> LLM
LLM -->|"natural-language answer"| CLI
CLI --> U
The agent ships five HTTP tools — two cost endpoints (totals per provider,
per-service breakdown), a savings-recommendations endpoint ("where can I save
money?") from the rule-based analyzer, a cost-trends endpoint ("is my spend
growing?") with a per-day series and precomputed change summary, and a
resource-inventory endpoint ("what do I have?") — plus a sixth RAG tool,
finops_knowledge_search, that answers conceptual / policy questions from a
curated FinOps corpus embedded in pgvector. RAG is optional (enabled by
DATABASE_URL). Setup, env vars, the RAG ingestion step, and the smoke test are
documented in insights-agent/README.md.
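As a rough illustration of the tool-call path described above, the sketch below builds an authenticated request for one endpoint and attaches a caveat when the response reports approximate totals. It is not the agent's actual code: the helper names, the caveat wording, and the assumption that the JSON response carries a top-level `data_source` field are hypothetical. Only the `/api/v1` prefix, the `X-API-Key` header, and the `snapshots_approximation` value come from the description above.

```python
import json
import urllib.request
from typing import Optional

# Value the Go server reports when totals are derived from snapshots.
SNAPSHOT_SOURCE = "snapshots_approximation"


def build_request(base_url: str, api_key: str, endpoint: str) -> urllib.request.Request:
    """Build an authenticated GET for a /api/v1 endpoint (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/api/v1/{endpoint.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


def caveat_for(payload: dict) -> Optional[str]:
    """Return a caveat to surface when the data is a snapshot approximation."""
    if payload.get("data_source") == SNAPSHOT_SOURCE:
        # Wording is illustrative, not the agent's actual message.
        return "Totals are a snapshot approximation, not billed amounts."
    return None


def fetch(base_url: str, api_key: str, endpoint: str, timeout: float = 10.0) -> dict:
    """Call the endpoint and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(build_request(base_url, api_key, endpoint), timeout=timeout) as resp:
        return json.load(resp)
```

A caller would pass the result of `fetch(...)` to `caveat_for(...)` and append the returned text to the answer only when it is not `None`.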
CloudOracle parses a Terraform plan, prices every changing resource, and posts a PR comment like this:
Net monthly change: +$389.35 🔴
The Aurora cluster instance dominates this change at ~$204/month, over half the total. If this is intended for a non-production environment, an `aws_db_instance` running `db.t3.medium` would land around $60/mo for similar functional coverage.

| Resource | Action | Δ Monthly | Confidence |
|---|---|---|---|
| `aws_rds_cluster_instance.aurora` | 🆕 create | +$204.40 | low |
| `aws_db_instance.db` | 🆕 create | +$71.36 | low |
| `aws_instance.web` | 🆕 create | +$64.74 | low |

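To make the numbers in a comment like this concrete, here is a small sketch of how per-resource deltas could be rolled up into a net change and a top-movers list. It is an illustration only, not the project's Go aggregator: the `Estimate` type and `summarize` function are hypothetical, and the three rows are the ones listed above. The sample comment's +$389.35 net is larger than their sum, presumably because it also covers resources that are not among the top movers.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Estimate:
    address: str    # Terraform resource address
    action: str     # create / update / replace / delete
    delta: Decimal  # monthly change in USD (negative means a saving)


def summarize(estimates, top_n=3):
    """Return (net monthly delta, largest movers by absolute delta)."""
    net = sum((e.delta for e in estimates), Decimal("0"))
    movers = sorted(estimates, key=lambda e: abs(e.delta), reverse=True)[:top_n]
    return net, movers


rows = [
    Estimate("aws_instance.web", "create", Decimal("64.74")),
    Estimate("aws_rds_cluster_instance.aurora", "create", Decimal("204.40")),
    Estimate("aws_db_instance.db", "create", Decimal("71.36")),
]
net, movers = summarize(rows)
print(f"Net monthly change (these rows only): {net:+.2f}")
for e in movers:
    print(f"{e.address}  {e.action}  {e.delta:+.2f}")
```

`Decimal` is used instead of `float` so that currency sums stay exact when many small line items are added together.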
Drop this workflow into .github/workflows/cost-comment.yml:

    name: Terraform Plan Cost Comment
    on:
      pull_request:
        paths: ['**.tf']
    permissions:
      pull-requests: write
      id-token: write
      contents: read
    jobs:
      cost:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsCloudOracle
              aws-region: us-east-2
          - uses: hashicorp/setup-terraform@v3
          - run: terraform init && terraform plan -out=tf.plan
          - run: terraform show -json tf.plan > tf-plan.json
          - uses: Cro22/CloudOracle@v2.0.0
            with:
              plan-file: tf-plan.json
            env:
              ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

For Action inputs, CLI flags, exit codes, LLM narrative behavior, and the list of supported resources, see docs/v2-guide.md.

    docker compose up --build
    docker compose exec app /app/cloudoracle seed --count 120
    # → open http://localhost:8080

The synthetic provider needs no credentials. To run against AWS / GCP / Azure, see docs/cloud-providers.md. For the full walkthrough (PDF reports, dashboard, LLM setup, exports, trends), see docs/v1-guide.md.

| Component | Technology |
|---|---|
| Language | Go 1.25 |
| Database | PostgreSQL 16 (Alpine) |
| DB Driver | pgx v5 (connection pool) |
| AWS SDK | aws-sdk-go-v2 (EC2, RDS, Lambda, STS) |
| GCP SDK | Google Cloud Go (Compute, SQL, Functions) |
| Azure SDK | Azure SDK for Go (Compute, SQL, App Service) |
| Concurrency | golang.org/x/sync/errgroup |
| Logging | log/slog (structured, text/JSON) |
| PDF | go-pdf/fpdf |
| LLM | Gemini / Claude / OpenAI |
| Testing | testing + httptest |
| Containers | Docker Compose + multi-stage Dockerfile |
- docs/v3-guide.md: Insights Agent architecture (supervisor, RAG, guardrails), the `/api/v1` contract + `data_source` semantics, real billing, and how to run the CLI/HTTP surface
- docs/v2-guide.md: Terraform PR cost analysis (Action inputs, CLI flags, exit codes, supported resources)
- docs/v1-guide.md: Cloud cost audit walkthrough (seed, analyze, PDF, dashboard, LLM setup, sample output)
- docs/architecture.md: v1/v2 internal layout, analyzer + LLM provider design, architecture decisions, lessons learned
- docs/cloud-providers.md: AWS, GCP, Azure setup (credentials, IAM scopes, region config)
- docs/configuration.md: environment variables reference
- docs/testing.md: unit and integration test strategy and coverage
- Milestone 8.0: Authenticated `/api/v1/cost-summary` and `/api/v1/cost-by-service` Go endpoints (X-API-Key, snapshot-derived totals with explicit `data_source` disclaimer, machine-readable error codes)
- Milestone 8.1: Python `insights-agent` sibling, a LangGraph `create_react_agent` graph with two CloudOracle tools, Gemini provider, pydantic-settings config, structlog matching the Go slog format, CLI with `--verbose` / `--json` flags, 92% test coverage with mocked LLM + mocked HTTP. See insights-agent/
- Milestone 8.2: Additional agent tools, each backed by a new authenticated v1 endpoint: `GET /api/v1/recommendations` (rule-based savings, `data_source: heuristic_rules`), `GET /api/v1/cost-trends` (per-day series with precomputed change/direction), and `GET /api/v1/inventory` (resource counts + cost by provider/service, `data_source: live_inventory`), wired as the `cloudoracle_recommendations` / `cloudoracle_cost_trends` / `cloudoracle_inventory` tools. The agent now ships 5 tools
- Milestone 8.3: pgvector + RAG over a curated FinOps corpus: packaged markdown knowledge base, Gemini embeddings (mirroring the LLM-provider ABC), `langchain-postgres` PGVector store (compose image → `pgvector/pgvector:pg16`), `insights-agent-ingest` CLI, and a `finops_knowledge_search` tool the agent uses for conceptual/policy questions with source citations. Optional via `DATABASE_URL`; retrieval path unit-tested offline with an in-memory store
- Milestone 8.4: Hand-rolled supervisor multi-agent graph replacing `create_react_agent`: a `StateGraph` where a tool-call-routing supervisor delegates to three specialist workers (cost analyst, savings advisor, concept expert; each runs its own hand-rolled ReAct loop) and a synthesizer composes the answer, with a hop cap. Driveable end-to-end by the scripted fake model; `create_react_agent` kept as the simple graph
- Milestone 8.5: Production guardrails: per-run cost/usage caps (`RunLimits`); layered semantic answer validation (deterministic figure-grounding against tool observations, then an optional LLM judge); deterministic no-LLM fallback on run failure or failed validation; and a FastAPI HTTP surface (`POST /ask`, `GET /health`, optional `X-API-Key`) sharing one `GeminiAgentRunner` with the CLI
- Milestone 8.7: Real billing integration behind a `billing.Source` abstraction: the v1 cost endpoints now consume normalized cost records, with the snapshot approximation as the default source and an AWS Cost Explorer source (real unblended cost, `data_source: billing_aws_cost_explorer`) selectable via `CLOUDORACLE_BILLING_PROVIDER=aws_cost_explorer`. GCP (BigQuery export) and Azure (Cost Management) sources can plug into the same interface next

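The "deterministic figure-grounding" check in Milestone 8.5 can be pictured roughly as follows: pull every dollar figure out of the drafted answer and require each one to appear in what the tools returned. This is a guess at the idea, not the agent's implementation; the regular expressions, the `ungrounded_figures` name, and the exact-match policy are assumptions made for illustration.

```python
import re
from decimal import Decimal

# Dollar amounts in prose, e.g. "$1,234.50" or "$800.25".
_MONEY = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)")
# Bare numbers as they might appear in a JSON tool observation.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def _answer_figures(text: str) -> set:
    return {Decimal(m.replace(",", "")) for m in _MONEY.findall(text)}


def _observed_numbers(observations: list) -> set:
    seen = set()
    for obs in observations:
        seen |= {Decimal(m) for m in _NUMBER.findall(obs)}
    return seen


def ungrounded_figures(answer: str, observations: list) -> set:
    """Return dollar amounts in the answer that no tool observation contains."""
    return _answer_figures(answer) - _observed_numbers(observations)


obs = ['{"total": 1234.5, "by_service": {"ec2": 800.25}}']
ok = ungrounded_figures("You spent $1,234.50, mostly EC2 at $800.25.", obs)
bad = ungrounded_figures("You spent $999.00.", obs)
print(ok, bad)
```

An empty result would let the answer through to the next layer (the optional LLM judge); a non-empty one would trigger the fallback. Exact matching is deliberately strict here, so a real check would probably need a tolerance for rounded figures such as "~$204".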
- Terraform plan parser: `internal/iac` reads `terraform show -json` into a typed `Plan` model with action classification (create / update / replace / delete / no-op) and `after_unknown` handling
- AWS Pricing API client + cache: `internal/pricing.Client` wraps AWS SDK v2 `pricing:GetProducts`; `internal/pricing.Cache` adds a 7-day disk cache keyed by service+filters
- Per-resource estimators: EC2, EBS, RDS, Aurora cluster instance, Lambda, NAT gateway with breakdown line items and assumption notes
- CostDiff aggregator: `internal/diff.Analyze` collapses per-resource estimates into a plan-wide picture with Created / Deleted / Updated / Replaced / Skipped slices, top movers, and aggregate confidence
- Markdown renderer: `internal/diff.RenderMarkdown` produces the canonical PR comment (header / top movers table / full breakdown / caveats / marker footer), templated and golden-tested
- LLM-narrated PR comment: `RenderMarkdownWithLLM` swaps the templated narrative for a 1–3 sentence LLM output with caveat grouping, sanity checks (length cap, preamble strip, paragraph-break warn), and silent fallback to the templated text on any failure
- GitHub REST client: `internal/github.PostOrUpdateComment` lists, finds-by-marker, and PATCHes / POSTs; paginated with cap, body truncation guard at 60KB, multi-match resolution to most-recently-updated
- `oracle pr-check` subcommand: orchestrates the whole pipeline, with differentiated exit codes (1 input / 2 pricing / 3 output / 4 github) and `--no-llm` / `--post` switches
- GitHub Action packaging: `Dockerfile.action`, `action.yml`, POSIX `entrypoint.sh` that auto-extracts the PR number from `GITHUB_REF` on `pull_request[_target]` events; reference workflows under `.github/examples/`

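The list / find-by-marker / PATCH-or-POST flow of the GitHub client can be sketched as a pure decision function. This is an illustration under assumptions, not the `internal/github` code: the marker string, the truncation notice, the dictionary shape of a comment (modeled loosely on the `id`, `body`, and `updated_at` fields GitHub returns), and the reading of "60KB" as 60 × 1024 bytes are all hypothetical.

```python
MARKER = "<!-- cloudoracle:pr-check -->"  # hypothetical hidden marker footer
MAX_BODY_BYTES = 60 * 1024                # assumed meaning of the 60KB guard
NOTICE = "\n\n_(comment truncated)_\n"    # hypothetical truncation notice


def truncate(body: str, limit: int = MAX_BODY_BYTES) -> str:
    """Cut the body to the byte limit while keeping the marker at the end."""
    raw = body.encode("utf-8")
    if len(raw) <= limit:
        return body
    tail = NOTICE + MARKER
    budget = limit - len(tail.encode("utf-8"))
    # errors="ignore" drops a multi-byte character split by the cut.
    return raw[:budget].decode("utf-8", errors="ignore") + tail


def plan_upsert(comments: list, body: str) -> tuple:
    """Decide between creating a comment and updating our previous one.

    Returns (method, comment_id, body); comment_id is None for a POST.
    """
    if MARKER not in body:
        body = body + "\n" + MARKER
    body = truncate(body)
    ours = [c for c in comments if MARKER in (c.get("body") or "")]
    if not ours:
        return ("POST", None, body)
    # ISO 8601 UTC timestamps sort lexicographically, so max() finds the newest.
    latest = max(ours, key=lambda c: c["updated_at"])
    return ("PATCH", latest["id"], body)
```

Keeping the decision separate from the HTTP calls makes the multi-match and truncation rules easy to unit-test without a network.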
- LLM-powered analysis: executive summaries generated by Gemini / Claude / OpenAI
- PDF report generation with executive summary and severity-coded tables
- Real AWS integration via SDK (EC2, RDS, EBS, Lambda with STS validation and graceful degradation)
- Multi-cloud support (GCP, Azure) with Compute, SQL, Disks, and Functions for each provider
- Cost trend tracking over time (automatic snapshots on seed + `trend` command)
- Parallel fetch with `errgroup` and per-service `context.WithTimeout`
- Structured logging with `log/slog` (text or JSON output, level-configurable)
- Centralized configuration loaded once and injected as typed structs
- Export findings to JSON/CSV (stdout or file, RFC 4180 escaping, pipeline-friendly)
- Web dashboard with cost visualizations (React + Recharts + Tailwind v4, embedded in the Go binary via `go:embed`, served by `oracle serve`)
- SDK-client interfaces for real-provider unit tests: every provider fetcher (AWS / GCP / Azure) is exercised against fake SDK clients, covering pagination, per-service errors, and graceful degradation
- Fail-fast configuration validation: `config.Load() (Config, error)` accumulates every invalid env var into a single readable error, with cross-field rules (provider=gcp without `GOOGLE_CLOUD_PROJECT`, `LLM_PROVIDER=claude` without `ANTHROPIC_API_KEY`, etc.)
- Resilient LLM HTTP layer: shared `RoundTripper` retries 429/5xx/network errors with exponential-backoff-with-full-jitter, honors `Retry-After`, replays request bodies, cancellable via context
- testcontainers-based integration tests: real Postgres 16 in Docker via `testcontainers-go`, gated by `//go:build integration`, with a full seed → analyze E2E test and a GitHub Actions workflow that runs both unit and integration tiers

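The retry policy named in the "Resilient LLM HTTP layer" item (exponential backoff with full jitter, honoring `Retry-After`) boils down to a small delay calculation. The sketch below shows that calculation in Python for brevity; the real layer is a Go `RoundTripper`, and the base delay, the cap, the function names, and the choice to handle only the delta-seconds form of `Retry-After` are assumptions rather than the project's values.

```python
import random
from typing import Callable, Optional

# 429 plus every 5xx status is treated as retryable, as in the list above.
RETRYABLE_STATUS = {429} | set(range(500, 600))


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int,
                  retry_after: Optional[str] = None,
                  base: float = 0.5,
                  cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # A server-provided delay overrides the computed backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Full jitter: pick uniformly in [0, min(cap, base * 2**attempt)).
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Injecting `rng` keeps the function deterministic under test: `backoff_delay(3, rng=lambda: 1.0)` evaluates to the ceiling for the fourth attempt, while production code would leave the default random source in place.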
Apache 2.0