|
1 | | -# DumpCode |
| 1 | +# DumpCode: The Semantic Context Engine for LLM-Native Development |
2 | 2 |
|
3 | | -**DumpCode** is a semantic codebase dumper designed to prepare code for Large Language Models (LLMs). Unlike simple concatenation tools, DumpCode treats your codebase as a structured dataset, wrapping it in XML and sandwiching it between context-aware prompt templates. |
| 3 | +DumpCode is a professional-grade codebase dumper that transforms your project into a structured, LLM-ready dataset. Unlike simple concatenation scripts, DumpCode treats your code as a semantic hierarchy, wrapping it in XML and grounding it via a Sandwich Architecture to maximize the reasoning capabilities of Large Language Models. |
4 | 4 |
|
5 | | -## Key Features |
| 5 | +## 🧠 The Philosophy: The Prompt Sandwich |
6 | 6 |
|
7 | | -* **The "Prompt Sandwich" Architecture**: Automatically structures output as: `[Role Instructions]` + `[Code Context]` + `[Specific Task]`. |
8 | | -* **Semantic XML Output**: Wraps code in `<tree>`, `<file>`, and `<dump>` tags to help LLMs distinguish between file paths, directory structures, and file contents. |
9 | | -* **Native Git Integration**: |
10 | | - * **Exclusion**: Uses `pathspec` to parse `.gitignore` files exactly as Git does (handling negations and nested rules). |
11 | | - * **Detection**: Can limit dumps to only files that are modified or untracked (`--changed`). |
12 | | -* **Smart Content Processing**: |
13 | | - * **Binary Detection**: Scans file headers (shebangs/null bytes) to skip binaries like images or compiled code. |
14 | | - * **Data Truncation**: Automatically detects large data files (`.csv`, `.jsonl`, `.log`) and truncates them to the first 5 lines to preserve context window. |
15 | | - * **Encoding Heuristics**: Robustly handles UTF-8, UTF-8-SIG (BOM), Latin-1, and CP1252. |
16 | | -* **Dynamic Profiles**: Switch between different LLM personas (Architect, Technical Writer, QA) via CLI flags. |
17 | | -* **Meta-Configuration**: A self-modifying mode that generates prompts to help you update the DumpCode configuration itself. |
18 | | -* **OSC 52 Clipboard**: Pushes the generated dump directly to your local system clipboard, even over SSH. |
| 7 | +Large Language Models (LLMs) perform best when instructions are clearly separated from data. DumpCode enforces a "Sandwich Architecture" that structures every output into three logical layers to reduce context drift and hallucinations: |
19 | 8 |
|
20 | | ---- |
| 9 | +**The Instructions (`<instructions>`):** Sets the persona (e.g., Senior Architect) and the architectural rules before the model sees a single line of code. |
21 | 10 |
|
22 | | -## The "Sandwich" Architecture & Templating |
23 | | - |
24 | | -DumpCode does not use a traditional templating engine. Instead, the `DumpEngine` constructs a specific, logical flow designed to prime an LLM effectively. |
25 | | - |
26 | | -When you run `dumpcode --profile-name`, the engine assembles the output in three distinct layers: |
27 | | - |
28 | | -### 1. The Top Bun (Instructions) |
29 | | -**Source:** `profile["pre"]` in `.dump_config.json` |
30 | | -**Tag:** `<instructions>` |
31 | | -This section sets the persona and rules for the LLM *before* it sees any code. This prevents the model from hallucinating or answering before reading the context. |
32 | | - |
33 | | -### 2. The Meat (The Dump) |
34 | | -**Source:** The actual file system scan. |
35 | | -**Tag:** `<dump>` |
36 | | -This contains the directory tree and the file contents. |
37 | | - |
38 | | -### 3. The Bottom Bun (The Task) |
39 | | -**Source:** `profile["post"]` OR CLI argument `-q "Question"` |
40 | | -**Tag:** `<task>` |
41 | | -This is the trigger. After processing the context, what should the LLM *do*? |
42 | | -*Note: If you provide a question via `-q`, it overrides the profile's default post-prompt.* |
43 | | - |
44 | | -### Example Output |
45 | | -```xml |
46 | | -<instructions> |
47 | | -Act as a Senior Technical Writer... |
48 | | -</instructions> |
49 | | - |
50 | | -<dump version="4"> |
51 | | - <tree> |
52 | | - Project Root: /home/dev/project |
53 | | - project/ |
54 | | - ├── src/ |
55 | | - │ └── main.py |
56 | | - └── pyproject.toml |
57 | | - </tree> |
58 | | - <files> |
59 | | - <file path="src/main.py"> |
60 | | -def main(): print("Hello") |
61 | | - </file> |
62 | | - </files> |
63 | | -</dump> |
64 | | - |
65 | | -<task> |
66 | | -Output the result in raw Markdown... |
67 | | -</task> |
68 | | -``` |
| 11 | +**The Context (`<dump>`):** A semantic XML representation of your project, including a visual directory tree, file contents, and execution diagnostics (linter/test outputs). |
69 | 12 |
|
70 | | ---- |
| 13 | +**The Task (`<task>`):** The specific trigger or question. By placing the "Ask" at the very end, we ensure the LLM has parsed the entire context before attempting a response. |
71 | 14 |
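The three layers above can be sketched in a few lines of Python. This is an illustrative sketch, not DumpCode's actual engine: the function name `build_prompt` and its parameters are assumptions. Only the tag names and the rule that a `-q` question overrides the profile's task come from this README.

```python
from typing import Optional


def build_prompt(pre: str, dump_xml: str, post: str,
                 question: Optional[str] = None) -> str:
    """Assemble instructions, context, and task in sandwich order.

    `pre` and `post` stand in for a profile's instruction and task text;
    `question` stands in for the value passed via -q (hypothetical names).
    """
    # A question given on the command line replaces the profile's default task.
    task = question if question else post
    return (
        f"<instructions>\n{pre}\n</instructions>\n\n"
        f"{dump_xml}\n\n"
        f"<task>\n{task}\n</task>\n"
    )


if __name__ == "__main__":
    print(build_prompt(
        pre="Act as a Senior Architect.",
        dump_xml="<dump>...</dump>",
        post="Summarize the architecture.",
        question="Where is configuration loaded?",
    ))
```

Keeping the task last means the question is the final thing the model reads, after the full context.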
|
72 | | -## Configuration & Profiles |
| 15 | +--- |
73 | 16 |
|
74 | | -DumpCode relies on a `.dump_config.json` file in your project root. Run `dumpcode --init` to create it interactively. |
| 17 | +## 🤖 Included Profiles: Your Virtual Engineering Team |
75 | 18 |
|
76 | | -### Exclusion Logic (How it works) |
77 | | -DumpCode employs a **Union Strategy** for exclusions. A file is skipped if it matches ANY of the following: |
| 19 | +DumpCode comes with a suite of preconfigured "AI Agents" defined in `.dump_config.json`. Each profile swaps the "buns" of the sandwich (the instructions and the task) to set the LLM's persona and goals. |
78 | 20 |
|
79 | | -1. **Hardcoded Safety**: The system prevents scanning the config file itself or the output file. |
80 | | -2. **Config Patterns**: Matches found in the `ignore_patterns` JSON list. |
81 | | -3. **Gitignore**: The engine looks for a `.gitignore` file in the root and parses it using `pathspec` (native Git logic). |
| 21 | +| Profile Flag | Role | Primary Function | |
| 22 | +| :--- | :--- | :--- | |
| 23 | +| `--readme` | Technical Writer | Generates professional, architect-level documentation. | |
| 24 | +| `--architect` | System Designer | Analyzes code to generate a master `PLAN.md` specification. | |
| 25 | +| `--plan-next` | Project Manager | Syncs current code state with `PLAN.md` and defines the next task. | |
| 26 | +| `--cleanup` | Code Reviewer | Runs `ruff` and `mypy`, then asks the LLM to fix the reported errors. | |
| 27 | +| `--test-fixer` | QA Engineer | Runs `pytest`, ingests failures, and plans specific code repairs. | |
| 28 | +| `--refactor` | Senior Dev | Identifies SOLID violations and structural "code smells." | |
| 29 | +| `--optimize` | Perf Engineer | Locates algorithmic inefficiencies and I/O bottlenecks. | |
| 30 | +| `--coverage` | SDET | Runs coverage reports and identifies untested critical logic paths. | |
82 | 31 |
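Profiles such as `--cleanup` and `--test-fixer` run a tool first and hand its output to the LLM. One way this could work is sketched below, assuming the output is captured with `subprocess` and wrapped in an `<execution>` element. The helper name `run_diagnostic`, the `command` and `exit_code` attributes, and the XML escaping are assumptions for illustration, not DumpCode's documented format.

```python
import subprocess
import sys
from xml.sax.saxutils import escape, quoteattr


def run_diagnostic(command: list[str], timeout: float = 120.0) -> str:
    """Run a command and wrap its combined output in an <execution> element."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    output = (proc.stdout + proc.stderr).strip()
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    command_attr = quoteattr(" ".join(command))
    return (
        f'<execution command={command_attr} exit_code="{proc.returncode}">\n'
        f"{escape(output)}\n"
        "</execution>"
    )


if __name__ == "__main__":
    # A harmless stand-in for a linter or test runner.
    print(run_diagnostic([sys.executable, "--version"]))
```

A non-zero exit code is recorded rather than raised, since failing linters and tests are exactly what these profiles are meant to report.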
|
83 | 32 | --- |
84 | 33 |
|
85 | | -## Meta-Configuration (Profile Creator) |
86 | | - |
87 | | -Creating, changing, or adding rules to complex JSON profiles manually is tedious. DumpCode includes a **Meta Mode** (`--change-profile`) to automate this. |
88 | 34 |
|
89 | | -**Scenario:** You want to add a profile for security auditing. |
| 35 | +## 🔄 The Workflow: Spec-Driven Iteration |
90 | 36 |
|
91 | | -**Command:** |
92 | | -```bash |
93 | | -dumpcode --change-profile "Add a 'security' profile that looks for vulnerabilities and hardcoded secrets." |
94 | | -``` |
95 | | - |
96 | | -**Result:** |
97 | | -DumpCode generates a prompt containing your current config and your instruction, then copies it to the clipboard. You paste this into an LLM, and it returns the valid JSON update for your config. |
98 | | - |
99 | | ---- |
| 37 | +DumpCode is designed to facilitate a "Dump → Discuss → Plan → Implement" loop, keeping your project's `PLAN.md` as the single source of truth. |
100 | 38 |
|
101 | | -## Project Management Workflow (`PLAN.md`) |
| 39 | +### 1. The Blueprinting Phase |
| 40 | +Generate a comprehensive project roadmap by dumping your current state with the architect persona: |
102 | 41 |
|
103 | | -DumpCode includes a specific workflow for maintaining a `PLAN.md` file using the `--new-plan` argument. This allows you to pipe LLM output directly back into your project roadmap. |
104 | | - |
105 | | -**1. Generate the Plan:** |
106 | 42 | ```bash |
107 | | -# Dump code with the architect profile to your clipboard |
108 | | -dumpcode --architect -q |
| 43 | +dumpcode --architect -q "Create a master specification for a new plugin system." |
109 | 44 | ``` |
110 | 45 |
|
111 | | -**2. Update the Plan:** |
112 | | -Paste the dump into your LLM. Discuss with the LLM and produce a new Markdown plan. Copy that response. |
| 46 | +### 2. The Plan Sync (`--new-plan`) |
| 47 | +Once the LLM provides a roadmap, pipe it directly back into your repository. Use the `-` argument for a safe, interactive "Paste Mode": |
113 | 48 |
|
114 | | -**3. Save the Plan:** |
115 | | -Paste the content directly into `PLAN.md` using the paste mode: |
116 | 49 | ```bash |
117 | | -# Opens stdin, paste your content, then hit Ctrl+D |
| 50 | +# This opens a buffer; paste the LLM's Markdown and hit Ctrl+D |
118 | 51 | dumpcode --new-plan - |
119 | 52 | ``` |
120 | 53 |
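A minimal sketch of what a paste mode like this might do internally: read standard input until EOF (Ctrl+D) when the argument is `-`, otherwise read the named file, then write the result to `PLAN.md`. The function name `update_plan` and the refusal to write empty content are assumptions added for illustration.

```python
import sys
from pathlib import Path
from typing import TextIO


def update_plan(source: str, plan_path: Path, stdin: TextIO = sys.stdin) -> int:
    """Write new plan text to plan_path and return the number of characters written.

    source is either "-" (read from stdin until EOF) or a path to a Markdown file.
    """
    if source == "-":
        text = stdin.read()
    else:
        text = Path(source).read_text(encoding="utf-8")
    # Hypothetical safeguard: an accidental empty paste should not wipe the plan.
    if not text.strip():
        raise ValueError("refusing to overwrite the plan with empty content")
    plan_path.write_text(text, encoding="utf-8")
    return len(text)
```

Passing the input stream as a parameter keeps the function easy to test without a real terminal.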
|
121 | | -Or update from a file: |
| 54 | +### 3. Focused Implementation (`--changed`) |
| 55 | +Don't waste tokens. Once you start coding, dump only the files you've modified in Git to provide the LLM with the specific "delta" context it needs: |
| 56 | + |
122 | 57 | ```bash |
123 | | -dumpcode --new-plan /path/to/new_plan.md |
| 58 | +dumpcode --changed --plan-next |
124 | 59 | ``` |
125 | 60 |
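This README does not say how `--changed` finds modified and untracked files. One plausible approach, sketched here as an assumption, is to parse the output of `git status --porcelain`, where each line holds a two-character status, a space, and a path. The function name `changed_paths` is hypothetical, and the sketch ignores Git's quoting of unusual file names.

```python
def changed_paths(porcelain_output: str) -> list[str]:
    """Extract paths of modified, added, renamed, or untracked files.

    Expects the text produced by `git status --porcelain` (format v1).
    Deleted files are skipped because there is no content left to dump.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        if "D" in status:
            continue
        # Renames and copies are reported as "old -> new"; keep the new path.
        if status[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths
```

In practice the input would come from something like `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`.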
|
126 | 61 | --- |
127 | 62 |
|
128 | | -## Installation |
| 63 | +## 🛠 Technical Feature Highlights |
| 64 | + |
| 65 | +**Smart Content Handling:** |
| 66 | +- **Truncation:** High-volume files (`.csv`, `.jsonl`, `.log`) are automatically truncated (e.g., first 5-10 lines) to prevent context window saturation. |
| 67 | +- **Binary Detection:** Heuristic scanning (null-byte detection and extension checking) skips compiled objects, images, and non-text assets. |
| 68 | +- **Encoding Resilience:** Heuristic detection of UTF-8, UTF-16, and Latin-1. |
| 69 | + |
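The content-handling rules above can be illustrated with a short sketch. It assumes a null-byte check on the first kilobyte plus a small extension list for binary detection, and a fixed five-line cutoff for data files. The names `looks_binary` and `read_for_dump`, the extension sets, and the wording of the truncation marker are assumptions, not DumpCode's actual code.

```python
from pathlib import Path

# Illustrative lists only; the real tool may use different sets.
BINARY_EXTENSIONS = {".png", ".jpg", ".pyc", ".so", ".zip"}
DATA_EXTENSIONS = {".csv", ".jsonl", ".log"}


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Guess whether a file is binary from its extension or a null byte."""
    if path.suffix.lower() in BINARY_EXTENSIONS:
        return True
    with path.open("rb") as fh:
        return b"\x00" in fh.read(sample_size)


def read_for_dump(path: Path, max_lines: int = 5) -> str:
    """Return file text, keeping only the first max_lines of data files."""
    text = path.read_text(encoding="utf-8", errors="replace")
    if path.suffix.lower() not in DATA_EXTENSIONS:
        return text
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n... [{omitted} more lines truncated]\n"
```

One limitation of the null-byte heuristic: UTF-16 text contains null bytes, so it would need a separate encoding check before being treated as text.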
| 70 | +**Environment Awareness:** |
| 71 | +- **OSC52 Clipboard:** Pushes the dump directly to your local clipboard via terminal escape sequences. This works over SSH, inside Docker, or in remote dev containers, provided your terminal emulator supports OSC 52. |
| 72 | +- **Git-Native Logic:** Leverages `pathspec` to respect `.gitignore` rules exactly as Git does, including complex negations and nested patterns. |
| 73 | +- **Diagnostic Integration:** The `cleanup` and `test-fixer` profiles execute shell commands (linters/test suites) and wrap the results in `<execution>` tags so the LLM can "see" the errors. |
| 74 | +- **Meta-Configuration:** Use `--change-profile` to generate a prompt that helps you rewrite the tool's own `.dump_config.json` file. |
| 75 | +- **Comprehensive Testing:** Maintains 95%+ test coverage with a robust CI/CD pipeline. |
| 76 | + |
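For readers unfamiliar with OSC 52, the escape sequence is small enough to show in full: the text is Base64-encoded and wrapped as `ESC ] 52 ; c ; <payload> BEL`, which a supporting terminal interprets as "set the clipboard". The sketch below shows the general technique; the function names are illustrative and not taken from DumpCode.

```python
import base64
import sys
from typing import TextIO


def osc52_sequence(text: str) -> str:
    """Build the OSC 52 escape sequence that sets the clipboard to text."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # ESC ] 52 ; c ; <base64> BEL  ("c" selects the system clipboard)
    return f"\x1b]52;c;{payload}\x07"


def copy_to_clipboard(text: str, stream: TextIO = sys.stdout) -> None:
    """Write the sequence to the terminal; the terminal performs the copy."""
    stream.write(osc52_sequence(text))
    stream.flush()
```

Because the terminal emulator on your local machine handles the sequence, the copy reaches your local clipboard even when the command runs on a remote host. Some terminals and multiplexers disable OSC 52 by default or cap the payload size.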
| 77 | +## ⚙️ Configuration & Installation |
129 | 78 |
|
130 | | -### From Source |
| 79 | +### Requirements |
| 80 | +- **Python 3.9+** |
| 81 | +- `pathspec` (Included) |
| 82 | +- `tiktoken` (Optional, for precise OpenAI token counting) |
| 83 | + |
| 84 | +### Installation |
131 | 85 | ```bash |
132 | | -git clone https://github.com/FloLey/dumpcode.git |
133 | | -cd dumpcode |
134 | 86 | pip install . |
| 87 | +# Or with dev/token tools: |
| 88 | +pip install ".[token-counting,dev]" |
135 | 89 | ``` |
136 | 90 |
|
137 | | -### Requirements |
138 | | -* **Python 3.9+** |
139 | | -* `pathspec`: For gitignore parsing. |
140 | | -* `tiktoken` (Optional): For precise OpenAI token counting. (Install via `pip install .[token-counting]`) |
141 | | - |
142 | | -### Development |
143 | | -For development and testing, install with dev dependencies: |
| 91 | +### Setup |
| 92 | +Initialize your project-specific configuration: |
144 | 93 | ```bash |
145 | | -pip install -e ".[token-counting,dev]" |
| 94 | +dumpcode --init |
146 | 95 | ``` |
147 | 96 |
|
148 | 97 | --- |
@@ -181,27 +130,20 @@ dumpcode --structure-only |
181 | 130 |
|
182 | 131 | --- |
183 | 132 |
|
184 | | -## CLI Options |
185 | | - |
186 | | -| Flag | Function | |
187 | | -| :--- | :--- | |
188 | | -| **Scanning** | | |
189 | | -| `startpath` | The root directory to scan (default: current dir). | |
190 | | -| `-L`, `--level` | Max recursion depth for the directory tree. | |
191 | | -| `--changed` | Only include files modified/untracked in Git. | |
192 | | -| `-d`, `--dir-only` | Scan directories only (no files). | |
193 | | -| `--structure-only` | Show the tree, but omit file contents. | |
194 | | -| `--ignore-errors` | specific encoding errors are skipped (files logged as skipped). | |
195 | | -| **Output** | | |
196 | | -| `-o`, `--output-file` | Target file (default: `codebase_dump.txt`). | |
197 | | -| `--no-copy` | Disable OSC 52 clipboard copying. | |
198 | | -| `--no-xml` | Use plain text delimiters instead of XML tags. | |
199 | | -| `--reset-version` | Reset the config version counter to 1. | |
200 | | -| **Meta / Profiles** | | |
201 | | -| `--init` | Interactive wizard to generate `.dump_config.json`. | |
202 | | -| `--new-plan [file\|-]`| Update `PLAN.md` from a file or stdin (`-`). | |
203 | | -| `--change-profile` | Generate a prompt to modify the config file. | |
204 | | -| `-q`, `--question` | Override the profile's `post` instruction. | |
| 133 | +## ⌨️ CLI Reference |
| 134 | + |
| 135 | +| Flag | Category | Description | |
| 136 | +| :--- | :--- | :--- | |
| 137 | +| `startpath` | Scanning | Root directory to scan (default: `.`). | |
| 138 | +| `-L`, `--level` | Scanning | Max recursion depth for the directory tree. | |
| 139 | +| `--changed` | Scanning | Only include files modified/untracked in Git. | |
| 140 | +| `--structure-only` | Scanning | Output the visual tree, but omit file contents. | |
| 141 | +| `-o [file]` | Output | Target output file (default: `codebase_dump.txt`). | |
| 142 | +| `--no-copy` | Output | Disable the automatic OSC52 clipboard copy. | |
| 143 | +| `--no-xml` | Output | Use plain text delimiters instead of semantic XML. | |
| 144 | +| `-q [query]` | Meta | Override a profile's task with a specific question. | |
| 145 | +| `--new-plan [file\|-]` | Meta | Update `PLAN.md` from a file or stdin. | |
| 146 | +| `--change-profile` | Meta | Generate a prompt to modify your `.dump_config.json`. | |
205 | 147 |
|
206 | 148 | ## How DumpCode Was Built (Using DumpCode) |
207 | 149 |
|
|