CloudOracle

One FinOps toolkit, three surfaces over the same cost data — audit what you spend, predict what a PR will cost, and ask about both in plain language.

flowchart LR
    SRC["Cloud accounts · AWS · GCP · Azure<br/>+ Terraform plans"]

    subgraph SYS["CloudOracle — one FinOps toolkit"]
      direction TB
      V1["v1 — Audit<br/>ingest live spend, run rules<br/>→ executive PDF + dashboard"]
      V2["v2 — PR check<br/>price a Terraform plan pre-merge<br/>→ GitHub PR cost comment"]
      V3["v3 — Insights Agent<br/>ask FinOps questions in plain language<br/>→ natural-language answers"]
    end

    SRC --> V1
    SRC --> V2
    V1 -. cost data .-> V3

A Go FinOps toolkit spanning three modes — two from the same oracle binary, plus a polyglot Python agent extension:

  • v1 — Audit existing cloud spend. Ingest live compute, database, disk, and function inventory (EC2/RDS/EBS/Lambda on AWS, plus the GCP and Azure equivalents) into Postgres, run deterministic rules over it, and produce an executive PDF + dashboard with an LLM-narrated summary. See docs/v1-guide.md.
  • v2 — Predict cost impact of a Terraform PR before merge. Read terraform show -json plan.tfplan, look every changing resource up against the AWS Pricing API, and post (or upsert) a Markdown comment on the PR with the net monthly delta, top movers, and a 1–3 sentence LLM narrative. Ships as a GitHub Action and as the oracle pr-check subcommand. See docs/v2-guide.md.
  • v3 — Insights Agent. Polyglot Go + Python extension adding agentic FinOps analysis on top of v1/v2 cost data — a hand-rolled LangGraph supervisor over specialist agents, RAG over a FinOps corpus (pgvector), production guardrails, real billing via AWS Cost Explorer, and a CLI + HTTP surface. See v3 — Insights Agent below, docs/v3-guide.md, and insights-agent/README.md.

v3 — Insights Agent

A Python sibling of the Go server that lets you ask FinOps questions in natural language. The agent decides which /api/v1 endpoint to call, fetches the data over HTTP, and answers in the user's language — surfacing the "snapshot approximation" caveat when accuracy matters.

flowchart LR
    U([User]) -->|"How much did I spend on AWS?"| CLI[insights-agent CLI<br/>Python 3.12]
    CLI --> G[LangGraph supervisor<br/>3 specialists + synthesize]
    G -->|"bind_tools"| LLM[Gemini 2.5 Flash]
    LLM -->|"HTTP tool call"| T[CloudOracle tools<br/>cost-summary / cost-by-service / recommendations / cost-trends / inventory]
    T -->|"GET /api/v1/* + X-API-Key"| GO[CloudOracle Go<br/>oracle serve]
    GO -->|"SQL"| DB[(PostgreSQL<br/>cost_snapshots)]
    GO -->|"data_source: snapshots_approximation / billing_aws_cost_explorer / heuristic_rules"| T
    LLM -->|"knowledge tool call"| R[finops_knowledge_search<br/>RAG]
    R -->|"similarity search"| VDB[(pgvector<br/>finops_knowledge)]
    T --> LLM
    R --> LLM
    LLM -->|"natural-language answer"| CLI
    CLI --> U

The agent ships five HTTP tools — two cost endpoints (totals per provider, per-service breakdown), a savings-recommendations endpoint ("where can I save money?") from the rule-based analyzer, a cost-trends endpoint ("is my spend growing?") with a per-day series and precomputed change summary, and a resource-inventory endpoint ("what do I have?") — plus a sixth RAG tool, finops_knowledge_search, that answers conceptual / policy questions from a curated FinOps corpus embedded in pgvector. RAG is optional (enabled by DATABASE_URL). Setup, env vars, the RAG ingestion step, and the smoke test are documented in insights-agent/README.md.
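
Each HTTP tool in the diagram above reduces to an authenticated GET against oracle serve. The following is a minimal Go sketch of one such call, for readers who want to probe the API without the Python agent. Only the /api/v1/cost-summary path, the X-API-Key header, and the data_source field name come from this README. The helper names, the CLOUDORACLE_API_KEY variable, the localhost:8080 base URL (borrowed from the v1 quick start), and the assumption that data_source is a top-level string in the JSON body are illustrative; docs/v3-guide.md is the place to confirm the real contract.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// parseDataSource extracts the data_source field from a response body.
// Assumption: data_source is a top-level string in a JSON object.
func parseDataSource(body []byte) (string, error) {
	var payload struct {
		DataSource string `json:"data_source"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	if payload.DataSource == "" {
		return "", errors.New("response has no data_source field")
	}
	return payload.DataSource, nil
}

// fetchDataSource is a hypothetical helper: it sends one GET with the
// X-API-Key header and returns the data_source reported by the server.
func fetchDataSource(ctx context.Context, client *http.Client, baseURL, path, apiKey string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+path, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-API-Key", apiKey)

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", path, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseDataSource(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// CLOUDORACLE_API_KEY is an assumed variable name for this sketch only.
	src, err := fetchDataSource(ctx, http.DefaultClient,
		"http://localhost:8080", "/api/v1/cost-summary", os.Getenv("CLOUDORACLE_API_KEY"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("data_source:", src)
}
```

Reading data_source before trusting the figures mirrors what the agent does when it surfaces the snapshot-approximation caveat.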

v2 — Quick start

CloudOracle parses a Terraform plan, prices every changing resource, and posts a PR comment like this:

💰 Cloud Cost Impact

Net monthly change: +$389.35 🔴

The Aurora cluster instance dominates this change at ~$204/month — over half the total. If this is intended for a non-production environment, an aws_db_instance running db.t3.medium would land around $60/mo for similar functional coverage.

Top movers by cost impact

| Resource | Action | Δ Monthly | Confidence |
| --- | --- | --- | --- |
| aws_rds_cluster_instance.aurora | 🆕 create | +$204.40 | low |
| aws_db_instance.db | 🆕 create | +$71.36 | low |
| aws_instance.web | 🆕 create | +$64.74 | low |

Drop this workflow into .github/workflows/cost-comment.yml:

name: Terraform Plan Cost Comment
on:
  pull_request:
    paths: ['**.tf']

permissions:
  pull-requests: write
  id-token: write
  contents: read

jobs:
  cost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsCloudOracle
          aws-region: us-east-2
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init && terraform plan -out=tf.plan
      - run: terraform show -json tf.plan > tf-plan.json
      - uses: Cro22/CloudOracle@v2.0.0
        with:
          plan-file: tf-plan.json
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

For Action inputs, CLI flags, exit codes, LLM narrative behavior, and the list of supported resources, see docs/v2-guide.md.
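
To make the tf-plan.json file produced by the workflow above more concrete, here is a minimal Go sketch that reads the resource_changes array from terraform show -json output and tallies resources by action. It relies on Terraform's documented encoding, where a replacement appears as a two-element action list. The plan struct, classify, and summarize are hypothetical names for this illustration and are not the project's internal/iac implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// plan mirrors only the fields of `terraform show -json` that this sketch reads.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// classify collapses Terraform's action list into one label. A replacement is
// encoded as ["delete","create"] or ["create","delete"] (create-before-destroy).
func classify(actions []string) string {
	switch len(actions) {
	case 1:
		return actions[0] // e.g. "create", "update", "delete", "no-op", "read"
	case 2:
		a, b := actions[0], actions[1]
		if (a == "delete" && b == "create") || (a == "create" && b == "delete") {
			return "replace"
		}
	}
	return "unknown"
}

// summarize counts resources per classified action in a plan JSON document.
func summarize(data []byte) (map[string]int, error) {
	var p plan
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("parse plan JSON: %w", err)
	}
	counts := make(map[string]int)
	for _, rc := range p.ResourceChanges {
		counts[classify(rc.Change.Actions)]++
	}
	return counts, nil
}

func main() {
	path := "tf-plan.json" // file name used by the workflow above
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	counts, err := summarize(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// Print in a fixed order because Go map iteration order is random.
	for _, k := range []string{"create", "update", "replace", "delete", "no-op"} {
		fmt.Printf("%-8s %d\n", k, counts[k])
	}
}
```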

v1 — Quick start

docker compose up --build
docker compose exec app /app/cloudoracle seed --count 120
# → open http://localhost:8080

The synthetic provider needs no credentials. To run against AWS / GCP / Azure, see docs/cloud-providers.md. For the full walkthrough (PDF reports, dashboard, LLM setup, exports, trends), see docs/v1-guide.md.

Tech Stack

| Component | Technology |
| --- | --- |
| Language | Go 1.25 |
| Database | PostgreSQL 16 (Alpine) |
| DB Driver | pgx v5 (connection pool) |
| AWS SDK | aws-sdk-go-v2 (EC2, RDS, Lambda, STS) |
| GCP SDK | Google Cloud Go (Compute, SQL, Functions) |
| Azure SDK | Azure SDK for Go (Compute, SQL, App Service) |
| Concurrency | golang.org/x/sync/errgroup |
| Logging | log/slog (structured, text/JSON) |
| PDF | go-pdf/fpdf |
| LLM | Gemini / Claude / OpenAI |
| Testing | testing + httptest |
| Containers | Docker Compose + multi-stage Dockerfile |

Documentation

  • docs/v3-guide.md — Insights Agent: architecture (supervisor, RAG, guardrails), the /api/v1 contract + data_source semantics, real billing, and how to run the CLI/HTTP surface
  • docs/v2-guide.md — Terraform PR cost analysis (Action inputs, CLI flags, exit codes, supported resources)
  • docs/v1-guide.md — Cloud cost audit walkthrough (seed, analyze, PDF, dashboard, LLM setup, sample output)
  • docs/architecture.md — v1/v2 internal layout, analyzer + LLM provider design, architecture decisions, lessons learned
  • docs/cloud-providers.md — AWS, GCP, Azure setup (credentials, IAM scopes, region config)
  • docs/configuration.md — environment variables reference
  • docs/testing.md — unit and integration test strategy and coverage

Roadmap

v3 — Insights Agent

  • Milestone 8.0 — Authenticated /api/v1/cost-summary and /api/v1/cost-by-service Go endpoints (X-API-Key, snapshot-derived totals with explicit data_source disclaimer, machine-readable error codes)
  • Milestone 8.1 — Python insights-agent sibling: LangGraph create_react_agent graph with two CloudOracle tools, Gemini provider, pydantic-settings config, structlog matching the Go slog format, CLI with --verbose / --json flags, 92% test coverage with mocked LLM + mocked HTTP. See insights-agent/
  • Milestone 8.2 — Additional agent tools, each a new authenticated v1 endpoint: GET /api/v1/recommendations (rule-based savings, data_source: heuristic_rules), GET /api/v1/cost-trends (per-day series with precomputed change/direction), and GET /api/v1/inventory (resource counts + cost by provider/service, data_source: live_inventory) — wired as cloudoracle_recommendations / cloudoracle_cost_trends / cloudoracle_inventory tools. Agent now ships 5 tools
  • Milestone 8.3 — pgvector + RAG over a curated FinOps corpus: packaged markdown knowledge base, Gemini embeddings (mirroring the LLM-provider ABC), langchain-postgres PGVector store (compose image → pgvector/pgvector:pg16), insights-agent-ingest CLI, and a finops_knowledge_search tool the agent uses for conceptual/policy questions with source citations. Optional via DATABASE_URL; retrieval path unit-tested offline with an in-memory store
  • Milestone 8.4 — Hand-rolled supervisor multi-agent graph replacing create_react_agent: a StateGraph where a tool-call-routing supervisor delegates to three specialist workers (cost analyst, savings advisor, concept expert — each its own hand-rolled ReAct loop) and a synthesizer composes the answer, with a hop cap. Driveable end-to-end by the scripted fake model; create_react_agent kept as the simple graph
  • Milestone 8.5 — Production guardrails: per-run cost/usage caps (RunLimits); layered semantic answer validation (deterministic figure-grounding against tool observations, then an optional LLM judge); deterministic no-LLM fallback on run failure or failed validation; and a FastAPI HTTP surface (POST /ask, GET /health, optional X-API-Key) sharing one GeminiAgentRunner with the CLI
  • Milestone 8.7 — Real billing integration behind a billing.Source abstraction: the v1 cost endpoints now consume normalized cost records, with the snapshot approximation as the default source and an AWS Cost Explorer source (real unblended cost, data_source: billing_aws_cost_explorer) selectable via CLOUDORACLE_BILLING_PROVIDER=aws_cost_explorer. GCP (BigQuery export) and Azure (Cost Management) sources can plug into the same interface next

v2 — Terraform PR cost analysis

  • Terraform plan parser — internal/iac reads terraform show -json into a typed Plan model with action classification (create / update / replace / delete / no-op) and after_unknown handling
  • AWS Pricing API client + cache — internal/pricing.Client wraps AWS SDK v2 pricing:GetProducts; internal/pricing.Cache adds a 7-day disk cache keyed by service+filters
  • Per-resource estimators — EC2, EBS, RDS, Aurora cluster instance, Lambda, NAT gateway with breakdown line items and assumption notes
  • CostDiff aggregator — internal/diff.Analyze collapses per-resource estimates into a plan-wide picture with Created / Deleted / Updated / Replaced / Skipped slices, top movers, and aggregate confidence
  • Markdown renderer — internal/diff.RenderMarkdown produces the canonical PR comment (header / top movers table / full breakdown / caveats / marker footer), templated and golden-tested
  • LLM-narrated PR comment — RenderMarkdownWithLLM swaps the templated narrative for a 1–3 sentence LLM output with caveat grouping, sanity checks (length cap, preamble strip, paragraph-break warn), and silent fallback to the templated text on any failure
  • GitHub REST client — internal/github.PostOrUpdateComment lists, finds-by-marker, and PATCHes / POSTs; paginated with cap, body truncation guard at 60KB, multi-match resolution to most-recently-updated
  • oracle pr-check subcommand — orchestrates the whole pipeline, with differentiated exit codes (1 input / 2 pricing / 3 output / 4 github) and --no-llm / --post switches
  • GitHub Action packaging — Dockerfile.action, action.yml, POSIX entrypoint.sh that auto-extracts the PR number from GITHUB_REF on pull_request[_target] events; reference workflows under .github/examples/
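
The find-by-marker step in the GitHub REST client item above can be sketched as a small pure function: keep only comments whose body contains a hidden marker and, when several match, prefer the most recently updated one. This is an illustration under assumptions rather than the code in internal/github. The marker string, the comment struct, and pickExisting are hypothetical; only the id, body, and updated_at field names follow the GitHub REST API.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// comment holds the subset of a GitHub issue comment this sketch inspects.
type comment struct {
	ID        int64  `json:"id"`
	Body      string `json:"body"`
	UpdatedAt string `json:"updated_at"` // RFC 3339 timestamp, as the API returns it
}

// costMarker is a hypothetical hidden marker; the real footer text is not
// shown in this README.
const costMarker = "<!-- cloudoracle:pr-cost -->"

// pickExisting returns the most recently updated comment that contains the
// marker, or false when none does (the caller would then POST a new comment).
func pickExisting(comments []comment, marker string) (comment, bool) {
	var (
		best     comment
		bestTime time.Time
		found    bool
	)
	for _, c := range comments {
		if !strings.Contains(c.Body, marker) {
			continue
		}
		// An unparsable timestamp falls back to the zero time, so it only wins
		// when no other marked comment has a valid timestamp.
		updated, err := time.Parse(time.RFC3339, c.UpdatedAt)
		if err != nil {
			updated = time.Time{}
		}
		if !found || updated.After(bestTime) {
			best, bestTime, found = c, updated, true
		}
	}
	return best, found
}

func main() {
	comments := []comment{
		{ID: 101, Body: "LGTM", UpdatedAt: "2024-05-03T10:00:00Z"},
		{ID: 102, Body: "old report\n" + costMarker, UpdatedAt: "2024-05-01T10:00:00Z"},
		{ID: 103, Body: "new report\n" + costMarker, UpdatedAt: "2024-05-02T10:00:00Z"},
	}
	if c, ok := pickExisting(comments, costMarker); ok {
		fmt.Printf("would PATCH comment %d\n", c.ID)
	} else {
		fmt.Println("would POST a new comment")
	}
}
```

Keeping the selection separate from the HTTP calls makes the upsert decision easy to unit-test without a GitHub token.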

v1 — Cloud cost audit

  • LLM-powered analysis: executive summaries generated by Gemini / Claude / OpenAI
  • PDF report generation with executive summary and severity-coded tables
  • Real AWS integration via SDK (EC2, RDS, EBS, Lambda with STS validation and graceful degradation)
  • Multi-cloud support (GCP, Azure) with Compute, SQL, Disks, and Functions for each provider
  • Cost trend tracking over time (automatic snapshots on seed + trend command)
  • Parallel fetch with errgroup and per-service context.WithTimeout
  • Structured logging with log/slog (text or JSON output, level-configurable)
  • Centralized configuration loaded once and injected as typed structs
  • Export findings to JSON/CSV (stdout or file, RFC 4180 escaping, pipeline-friendly)
  • Web dashboard with cost visualizations (React + Recharts + Tailwind v4, embedded in the Go binary via go:embed, served by oracle serve)
  • SDK-client interfaces for real-provider unit tests — every provider fetcher (AWS / GCP / Azure) is exercised against fake SDK clients, covering pagination, per-service errors, and graceful degradation
  • Fail-fast configuration validation — config.Load() (Config, error) accumulates every invalid env var into a single readable error, with cross-field rules (provider=gcp without GOOGLE_CLOUD_PROJECT, LLM_PROVIDER=claude without ANTHROPIC_API_KEY, etc.)
  • Resilient LLM HTTP layer — shared RoundTripper retries 429/5xx/network errors with exponential-backoff-with-full-jitter, honors Retry-After, replays request bodies, cancellable via context
  • testcontainers-based integration tests — real Postgres 16 in Docker via testcontainers-go, gated by //go:build integration, with a full seed → analyze E2E test and a GitHub Actions workflow that runs both unit and integration tiers
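
The exponential backoff with full jitter mentioned in the resilient LLM HTTP layer item above can be illustrated with a short Go sketch: each retry sleeps for a random duration between zero and a ceiling that doubles per attempt up to a cap. The function names, the base and cap values, and the injected random source are assumptions for this example. The project's RoundTripper is not shown in this README and may differ, for instance in how it combines this delay with Retry-After.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns min(maxDelay, base * 2^attempt) without overflowing.
func backoffCeiling(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// fullJitter picks a uniformly random delay in [0, ceiling]. The random source
// is injected so the function can be tested deterministically; randInt63n must
// behave like rand.Int63n and return a value in [0, n).
func fullJitter(attempt int, base, maxDelay time.Duration, randInt63n func(int64) int64) time.Duration {
	ceiling := backoffCeiling(attempt, base, maxDelay)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(randInt63n(int64(ceiling) + 1))
}

func main() {
	// Illustrative values only; not taken from the project configuration.
	base, maxDelay := 500*time.Millisecond, 30*time.Second
	for attempt := 0; attempt < 6; attempt++ {
		delay := fullJitter(attempt, base, maxDelay, rand.Int63n)
		fmt.Printf("attempt %d: ceiling=%v sleep=%v\n",
			attempt, backoffCeiling(attempt, base, maxDelay), delay)
	}
}
```

Randomizing over the whole interval, rather than adding a small offset to a fixed delay, spreads concurrent clients out so they do not retry against a rate-limited provider in lockstep.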

License

Apache 2.0

