liu66-qing/CodeMap

CodeMap

CodeGraph - Navigate any codebase with AI agents

A multi-agent system that turns complex repositories into guided learning paths

English · 中文 (Chinese) · Live Demo

Multi-Agent · 4 Stage Agents · Graph-Aware RAG · React · FastAPI · Apache 2.0


The Problem

You want to learn from React, VS Code, or LangChain: understand how they're built and what makes their design good. You open the repository and find 2,000+ files. The README tells you how to install the project, not how to understand it.

You're stuck with these questions:

  • Where's the entry point?
  • How does the main execution flow work?
  • Which modules actually matter?
  • What design patterns are worth learning?
  • How do I go from "confused" to "I actually get it"?

Traditional code search gives you fragments. ChatGPT gives you plausible-sounding answers with no structure. You need a map.


The Solution

CodeGraph is a multi-agent orchestration system that analyzes repositories through four specialized agents, each with distinct responsibilities, tools, and structured outputs.

Instead of a single chatbot that tries to answer everything, CodeGraph coordinates four agents that work sequentially:

  1. OverviewAgent — Understands the repo's positioning, tech stack, and architecture
  2. MainFlowAgent — Traces the core execution path with call graph analysis
  3. ShowcaseAgent — Identifies design patterns and implementation highlights
  4. TakeawayAgent — Extracts reusable patterns you can apply to your own projects
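The sequential hand-off can be sketched as a loop over stages that share one growing context. This is a minimal sketch, assuming each stage is a plain callable that takes the context dict and returns its structured output; `Stage`, `run_pipeline`, and the toy stage functions are hypothetical and are not CodeGraph's actual orchestrator API.

```python
from typing import Any, Callable

# Hypothetical stage signature: takes the shared context, returns that stage's output.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(repo_url: str, stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Run stages in order; each one sees the repo URL plus every earlier output."""
    context: dict[str, Any] = {"repo_url": repo_url}
    for name, stage in stages:
        # Pass a shallow copy so a stage cannot add or replace keys in the shared context.
        context[name] = stage(dict(context))
    return context


# Toy stages standing in for the four agents (real agents would call tools and an LLM).
def overview(ctx):
    return {"summary": f"overview of {ctx['repo_url']}"}


def main_flow(ctx):
    return {"entry_point": "main.py", "based_on": ctx["overview"]["summary"]}


def showcase(ctx):
    return {"highlights": ["plugin registry"], "flow_entry": ctx["main_flow"]["entry_point"]}


def takeaway(ctx):
    return {"patterns": [f"reuse: {h}" for h in ctx["showcase"]["highlights"]]}


result = run_pipeline(
    "https://github.com/example/repo",
    [("overview", overview), ("main_flow", main_flow),
     ("showcase", showcase), ("takeaway", takeaway)],
)
print(result["takeaway"])  # {'patterns': ['reuse: plugin registry']}
```

Because every stage reads only from the context, a later agent can build on earlier findings without knowing how they were produced.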

Each agent uses deterministic tools (AST parsing, call tracing, dependency analysis) combined with LLM reasoning. Every tool call and LLM interaction is traced for full observability.

🎯 Try the Live Demo — The hosted demo showcases the frontend learning map interface. Full agent workflows with graph-aware retrieval run through the backend (requires local setup).


How It Works

| Agent | Responsibility | Tools Used | Output |
|---|---|---|---|
| OverviewAgent | Build mental model of the repo | github_fetcher, code_parser, readme_summarizer | Positioning, tech stack, architecture summary, reading order |
| MainFlowAgent | Trace main execution flow | call_graph_tracer, code_parser, github_fetcher | Execution flow diagram with clickable nodes and code evidence |
| ShowcaseAgent | Find design highlights | pattern_matcher, architecture_detector, code_parser | 3 design tricks with problem/solution/tradeoff/code links |
| TakeawayAgent | Extract reusable patterns | All previous outputs + pattern_matcher | 3 reusable patterns with implementation snippets and applicability guidance |
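To make the call-graph step concrete, here is a minimal sketch of what a tracer in the spirit of `call_graph_tracer` might do for a single Python file. It uses the standard-library `ast` module instead of Tree-sitter, and `build_call_graph` and `trace_from` are hypothetical helpers; a real tracer would also resolve imports, method calls, and cross-file references.

```python
import ast
from collections import deque


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect simple calls like foo(); attribute calls like obj.foo() are skipped.
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph


def trace_from(graph: dict[str, set[str]], entry: str) -> list[str]:
    """Breadth-first walk from the entry point, keeping only functions defined in the file."""
    order, seen, queue = [], {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in sorted(graph.get(fn, ())):
            if callee in graph and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order


SOURCE = '''
def main():
    config = load_config()
    run(config)

def load_config():
    return {}

def run(config):
    print(config)
    helper()

def helper():
    pass
'''

graph = build_call_graph(SOURCE)
print(trace_from(graph, "main"))  # ['main', 'load_config', 'run', 'helper']
```

Builtins such as `print` appear in the raw graph but are filtered out of the trace because they are not defined in the analyzed file.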

Each agent:

  • Receives context from the previous agent's output
  • Calls tools through a unified call_tool() interface (auto-traced)
  • Returns structured JSON validated against a schema
  • Records all tool calls, LLM requests, token costs, and latency
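The per-agent contract above can be sketched as a small base class that traces every tool call and checks the final output against a schema. This is a hypothetical illustration, assuming tools are plain Python callables and the "schema" is a mapping of required keys to Python types; `SketchAgent` and `ToolCallRecord` are made-up names and do not mirror the real `BaseAgent`, `AgentTrace`, or `ToolCall` in `base.py`. LLM requests and token costs are left out for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCallRecord:
    # Hypothetical trace entry: which tool ran, with what args, what it returned, how long.
    tool: str
    args: dict[str, Any]
    result: Any
    latency_ms: float


@dataclass
class SketchAgent:
    tools: dict[str, Callable[..., Any]]
    output_schema: dict[str, type]  # required key -> expected Python type
    trace: list[ToolCallRecord] = field(default_factory=list)

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        """Invoke a registered tool, then record args, result, and latency once it returns."""
        start = time.perf_counter()
        result = self.tools[name](**kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.trace.append(ToolCallRecord(name, kwargs, result, elapsed_ms))
        return result

    def validate_output(self, output: dict[str, Any]) -> dict[str, Any]:
        """Reject outputs that miss a required key or use the wrong type."""
        for key, expected in self.output_schema.items():
            if not isinstance(output.get(key), expected):
                raise ValueError(f"field {key!r} must be {expected.__name__}")
        return output


agent = SketchAgent(
    tools={"count_lines": lambda text: len(text.splitlines())},
    output_schema={"summary": str, "line_count": int},
)
n = agent.call_tool("count_lines", text="a\nb\nc")
out = agent.validate_output({"summary": "tiny file", "line_count": n})
print(out, [rec.tool for rec in agent.trace])  # {'summary': 'tiny file', 'line_count': 3} ['count_lines']
```

Putting tracing inside `call_tool` means individual agents cannot forget to log; they get observability by going through the one entry point.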

Architecture

flowchart TB
    subgraph "Input"
        A[GitHub Repository URL]
    end
    
    subgraph "Ingestion Layer"
        B[Code Parser<br/>AST + Metadata]
        C[Graph Builder<br/>Call Relations + Dependencies]
    end
    
    subgraph "Retrieval Layer"
        D[Hybrid Search<br/>Vector + Keyword]
        E[Graph-Aware Retrieval<br/>Structural Relations]
    end
    
    subgraph "Agent Orchestration"
        F[OverviewAgent]
        G[MainFlowAgent]
        H[ShowcaseAgent]
        I[TakeawayAgent]
    end
    
    subgraph "Tools"
        J[github_fetcher]
        K[code_parser]
        L[call_graph_tracer]
        M[pattern_matcher]
        N[architecture_detector]
    end
    
    subgraph "Observability"
        O[AgentTrace<br/>Tool calls + Token costs]
        P[Frontend Visualization<br/>Timeline + Metrics]
    end
    
    A --> B
    A --> C
    B --> D
    C --> E
    D --> F
    E --> F
    F --> G
    G --> H
    H --> I
    
    F -.calls.-> J
    F -.calls.-> K
    G -.calls.-> L
    G -.calls.-> K
    H -.calls.-> M
    H -.calls.-> N
    I -.calls.-> M
    
    F --> O
    G --> O
    H --> O
    I --> O
    O --> P
    
    I --> Q[Structured Learning Map]

Key architectural decisions:

  • Agent separation of concerns: Each agent owns one stage of understanding (overview → flow → highlights → takeaways), not a general-purpose "answer any question" interface.
  • Tool-based execution: Agents don't hardcode analysis logic. They compose tools (call_graph_tracer, pattern_matcher) registered at initialization.
  • Graph-aware retrieval: Combines semantic search with code structure (imports, calls, inheritance) — not just vector similarity.
  • Observable by default: Every call_tool() and call_llm() automatically records trace data (args, results, latency, tokens). The frontend visualizes agent execution timelines.
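The "tools registered at initialization" idea can be sketched with a decorator-based registry, so each agent only declares the tool names it needs. `TOOL_REGISTRY`, `register_tool`, and `tools_for` are hypothetical names, and the two tool bodies are stubs, not the real `pattern_matcher` or `call_graph_tracer`.

```python
from typing import Any, Callable

# Hypothetical module-level registry: tool name -> implementation.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds a function to the registry under the given name."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in TOOL_REGISTRY:
            raise ValueError(f"tool {name!r} is already registered")
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("pattern_matcher")
def pattern_matcher(source: str) -> list[str]:
    # Stub: flag a few well-known pattern keywords by plain substring match.
    return [p for p in ("Factory", "Observer", "Middleware") if p in source]


@register_tool("call_graph_tracer")
def call_graph_tracer(entry: str) -> list[str]:
    # Stub: a real tracer would walk parsed call relations from the entry point.
    return [entry]


def tools_for(agent_tools: list[str]) -> dict[str, Callable[..., Any]]:
    """Resolve the subset of tools an agent declares, failing fast on unknown names."""
    missing = [t for t in agent_tools if t not in TOOL_REGISTRY]
    if missing:
        raise KeyError(f"unknown tools: {missing}")
    return {t: TOOL_REGISTRY[t] for t in agent_tools}


showcase_tools = tools_for(["pattern_matcher"])
print(showcase_tools["pattern_matcher"]("class WidgetFactory: ..."))  # ['Factory']
```

Resolving tool names up front surfaces a misspelled tool when the agent is built rather than halfway through an analysis run.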

Why Multi-Agent Orchestration?

Most code understanding tools take one of two approaches:

Approach 1: Traditional RAG

Chunk code → Embed → Retrieve similar → Generate answer

Problem: Misses code structure. No understanding of call chains, module boundaries, or architectural patterns.

Approach 2: General chatbot

Paste repo context → Ask questions → Get answers

Problem: No systematic analysis. Answers are reactive, not structured. No staged progression from "what is this" to "how to use this."

CodeGraph's approach: Multi-Agent Orchestration

Overview → MainFlow → Showcase → Takeaway
(Each agent uses tools + prior context)
| Capability | Traditional RAG | Chatbot | CodeGraph |
|---|---|---|---|
| Systematic repo analysis | | | ✅ 4-stage workflow |
| Call graph tracing | | ⚠️ Depends on prompt | ✅ Dedicated tool + agent |
| Structured outputs | ⚠️ Schema possible | ❌ Free text | ✅ JSON schema enforced |
| Agent specialization | ❌ Single model | ❌ Single model | ✅ 4 specialized agents |
| Full observability | | | ✅ Tool trace + token metrics |
| Graph-aware retrieval | ❌ Vector only | ❌ Context only | ✅ Vector + code relations |
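One way to read "vector + code relations" is to rerank vector hits by boosting chunks that are structurally connected (through calls or imports) to other strong hits. The sketch below assumes precomputed similarity scores and a caller-to-callee adjacency map; `graph_aware_rank`, the scoring rule, the `boost` weight, and the chunk ids are illustrative assumptions, not CodeGraph's actual ranking.

```python
def graph_aware_rank(
    vector_scores: dict[str, float],
    edges: dict[str, set[str]],
    top_k: int = 3,
    boost: float = 0.5,
) -> list[tuple[str, float]]:
    """Hypothetical rerank: own vector score plus a share of the best neighbor's score."""
    neighbors: dict[str, set[str]] = {}
    # Treat call/import edges as undirected when measuring structural proximity.
    for src, dsts in edges.items():
        for dst in dsts:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)

    ranked: dict[str, float] = {}
    for chunk in set(vector_scores) | set(neighbors):
        own = vector_scores.get(chunk, 0.0)
        best_neighbor = max(
            (vector_scores.get(n, 0.0) for n in neighbors.get(chunk, ())), default=0.0
        )
        ranked[chunk] = own + boost * best_neighbor
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


scores = {"router.py:dispatch": 0.9, "utils.py:slugify": 0.6, "handlers.py:handle": 0.4}
edges = {"router.py:dispatch": {"handlers.py:handle", "middleware.py:auth"}}
ranking = graph_aware_rank(scores, edges)
print([name for name, _ in ranking])
# ['router.py:dispatch', 'handlers.py:handle', 'utils.py:slugify']
```

Here `handlers.py:handle` has a weaker vector score than `utils.py:slugify`, but it moves ahead because the top hit calls it, which is the kind of structural signal pure vector similarity misses.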

The key insight: Understanding a codebase is not a single-turn Q&A task. It's a multi-stage workflow where each stage builds on the previous one. Each agent receives context from prior agents and contributes structured knowledge to the next.


Tech Stack

| Layer | Technology |
|---|---|
| Frontend | React, TypeScript, Vite, Mantine UI, pixel-style design system |
| Backend | FastAPI, Python 3.11, async/await throughout |
| Agent System | Custom orchestrator, BaseAgent abstraction, tool registration |
| Retrieval | Hybrid search (vector + keyword), graph-aware ranking |
| Graph | Code relationship modeling (calls, imports, dependencies) |
| Parsing | Tree-sitter for multi-language AST parsing |
| LLM | OpenAI-compatible API (configurable: GPT-4, DeepSeek, etc.) |
| Observability | AgentTrace, ToolCall logging, frontend visualization |
| Deployment | Docker Compose (local infra), Vercel (frontend) |
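The Retrieval row mentions hybrid search over vector and keyword results. A common way to merge two ranked lists is reciprocal rank fusion (RRF); the README does not say which fusion method CodeGraph uses, so treat this as an assumption. `rrf_merge` is a hypothetical helper, the file ids are made-up samples, and `k=60` is the constant conventionally used with RRF.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to every doc it contains."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Sample result lists from a hypothetical vector index and keyword index.
vector_hits = ["agent/base.py", "agent/orchestrator.py", "retrieval/hybrid.py"]
keyword_hits = ["agent/orchestrator.py", "main.py", "agent/base.py"]
print(rrf_merge([vector_hits, keyword_hits]))
# ['agent/orchestrator.py', 'agent/base.py', 'main.py', 'retrieval/hybrid.py']
```

RRF only uses ranks, so it needs no calibration between cosine similarities and keyword scores, which live on different scales.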

Screenshots

Home

CodeGraph home page

Learning Map

4-stage learning map with pixel-game design

Stage Pages

① Overview — Positioning, architecture, mental model

② Main Flow — Execution trace with call graph

③ Showcase — Design highlights and patterns

④ Takeaway — Reusable patterns and code templates


Quick Start

Requirements

  • Python 3.11+
  • Node.js 18+
  • Docker and Docker Compose
  • OpenAI-compatible API key

1. Clone the repository

git clone https://github.com/liu66-qing/CodeGraph.git
cd CodeGraph

2. Configure environment

cp .env.example .env

Edit .env with your API keys and service configuration:

# LLM Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Or DeepSeek, etc.

# Database & Cache
NEO4J_URI=bolt://localhost:7687
REDIS_URL=redis://localhost:6379
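If you script against the same settings, a fail-fast loader catches a missing or placeholder key before anything starts. This is a hypothetical helper, not CodeGraph's actual settings module: it reads variables from the process environment (it does not parse `.env` itself), and its defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings object mirroring the .env keys shown above.
    openai_api_key: str
    openai_base_url: str
    neo4j_uri: str
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Require a real API key; fall back to the example defaults for everything else."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still the placeholder value")
    return Settings(
        openai_api_key=key,
        openai_base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-test", "NEO4J_URI": "bolt://neo4j:7687"})
print(settings.neo4j_uri, settings.redis_url)  # bolt://neo4j:7687 redis://localhost:6379
```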

3. Start infrastructure services

docker-compose up -d

This starts Neo4j (graph database) and Redis (cache).

4. Start backend

pip install -e ".[dev]"
uvicorn codegraph.main:app --reload --host 0.0.0.0 --port 8000

Backend runs at http://localhost:8000. API docs at http://localhost:8000/docs.

5. Start frontend (in a second terminal from the repository root, since the backend keeps running in the first)

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173.


Project Structure

.
├── frontend/                  # React + Vite frontend
│   ├── src/
│   │   ├── components/        # Reusable UI components
│   │   ├── pages/             # 4 stage pages + home + learning map
│   │   ├── i18n/              # EN/ZH language dictionaries
│   │   └── assets/pixel/      # Pixel-art design assets
├── src/codegraph/             # Python backend
│   ├── agent/
│   │   ├── base.py            # BaseAgent, AgentTrace, ToolCall
│   │   ├── stages/            # OverviewAgent, MainFlowAgent, etc.
│   │   ├── tools/             # github_fetcher, code_parser, etc.
│   │   └── orchestrator.py    # Agent coordination logic
│   ├── retrieval/             # Hybrid + graph-aware retrieval
│   ├── graph/                 # Code relationship modeling
│   ├── parsers/               # Tree-sitter AST parsing
│   └── main.py                # FastAPI application
├── tests/                     # Unit and integration tests
├── docs/                      # Design docs, PRD, screenshots
└── docker-compose.yml         # Local infrastructure

Roadmap

  • Enhanced call graph accuracy for large TypeScript/Python repos
  • Multi-file pattern detection (e.g., middleware chains across files)
  • GitHub issue/PR context integration
  • Export learning maps as Markdown or PDF
  • Public backend deployment for full end-to-end hosted demo
  • Example analyses for popular repos (React, FastAPI, LangChain)

Contributing

CodeGraph is early-stage and welcomes contributions.

How to contribute:

  • Star the repo if the multi-agent approach resonates with you
  • 🐛 Open issues for bugs or repos that don't analyze well
  • 💡 Suggest improvements to agent prompts, tools, or architectures
  • 🔧 Submit PRs for new language parsers, analysis tools, or UI improvements

Good first issues:

  • Add support for Rust/Go/Java AST parsing
  • Improve call flow extraction for async/await heavy codebases
  • Add a sample analysis for a popular repo (Next.js, Vue, etc.)
  • Export agent analysis results as structured Markdown

License

Apache-2.0. See LICENSE.


If CodeGraph helps you understand one complex repo faster, please leave a star.
Stars tell me this approach is worth building further.

About

Learn from any GitHub repo systematically with AI agents guiding you through architecture, flow, patterns & takeaways
