Commit 97d051d

feat: add optional llm narrative summaries for repo onboarding
1 parent a007bc9 commit 97d051d

10 files changed, 467 additions and 9 deletions

README.md

Lines changed: 6 additions & 0 deletions
@@ -145,6 +145,7 @@ python scripts/analyze.py analyze \
 --mode standard \
 --audience nontech \
 --overview-length medium \
+--enable-llm-descriptions true \
 --enable-web-enrichment true
 ```

@@ -153,6 +154,11 @@ Useful optional controls:
 - `--include-glob "<pattern>"` (repeatable) to scope analysis to specific paths
 - `--exclude-glob "<pattern>"` (repeatable) to remove generated/irrelevant files

+For LLM-based narrative summaries:
+
+- Set `CODE_EXPLAINER_LLM_API_KEY` (or `OPENAI_API_KEY`)
+- Optional: `CODE_EXPLAINER_LLM_BASE_URL`, `CODE_EXPLAINER_LLM_MODEL`
+
 ## Install From GitHub (For Other Developers)

 Using Skills CLI:
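
The README hunk above names the environment variables but does not show how they are resolved. The following is a minimal sketch of one plausible lookup, not the commit's actual code: `resolve_llm_config` is a hypothetical helper, and the precedence (tool-specific key first, then `OPENAI_API_KEY`) is an assumption drawn from the wording "(or `OPENAI_API_KEY`)".

```python
import os
from typing import Dict, Mapping, Optional, Union


def resolve_llm_config(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[str, bool]]:
    """Hypothetical helper: read LLM settings from environment variables."""
    env = os.environ if env is None else env
    # Assumed precedence: the tool-specific key wins over OPENAI_API_KEY.
    api_key = env.get("CODE_EXPLAINER_LLM_API_KEY") or env.get("OPENAI_API_KEY") or ""
    return {
        "api_key": api_key,
        # Both settings are optional; an empty string means "not configured".
        "base_url": env.get("CODE_EXPLAINER_LLM_BASE_URL", ""),
        "model": env.get("CODE_EXPLAINER_LLM_MODEL", ""),
        # Without a key, the narrative step would presumably be skipped.
        "available": bool(api_key),
    }
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to exercise in tests without touching the real environment.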

code-explainer/SKILL.md

Lines changed: 11 additions & 6 deletions
@@ -37,6 +37,7 @@ python scripts/analyze.py analyze \
 --overview-length <short|medium|long> \
 --include-glob <pattern> \
 --exclude-glob <pattern> \
+--enable-llm-descriptions <true|false> \
 --enable-web-enrichment <true|false>
 ```

@@ -45,6 +46,7 @@ Defaults:
 - `mode=standard`
 - `audience=nontech`
 - `overview-length=medium`
+- `enable-llm-descriptions=true`
 - `enable-web-enrichment=true`

 ## Dependencies
@@ -77,18 +79,21 @@ bash ./scripts/install_runtime.sh
 2. Local index build (files/modules/symbol candidates).
 3. Stack/entrypoint/dependency/flow extraction.
 4. Documentation ingestion (`coverage_report.json`).
-5. Optional DeepWiki + web enrichment with attribution.
-6. Mermaid generation (Context + Container + flow set).
-7. Mermaid validation.
-8. SVG then PNG rendering.
-9. Overview + deep markdown generation.
-10. Quality gates and confidence report generation.
+5. Optional LLM narrative generation (`llm_summary.json`).
+6. Optional DeepWiki + web enrichment with attribution.
+7. Mermaid generation (Context + Container + flow set).
+8. Mermaid validation.
+9. SVG then PNG rendering.
+10. Overview + deep markdown generation.
+11. Quality gates and confidence report generation.

 ## Notes

 - For GitHub URLs, `git` must be available on PATH.
 - For high-fidelity diagram rendering, `mmdc` should be installed.
 - Without `mmdc`, fallback rendering is used and flagged in reports.
+- For LLM narrative summaries, set `CODE_EXPLAINER_LLM_API_KEY` (or `OPENAI_API_KEY`).
+- Optional: set `CODE_EXPLAINER_LLM_BASE_URL` and `CODE_EXPLAINER_LLM_MODEL`.
 - This skill does not mutate the analyzed target repository.

 ## Dependency Troubleshooting

code-explainer/assets/templates/deep_architecture.md.j2

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,10 @@ Architecture Pattern: **{{architecture_pattern}}**

 {{docs_coverage}}

+## Suggested Deep-Dive Starters
+
+{{llm_deep_dive_starters}}
+
 ## Where To Modify for Common Changes

 {{where_to_modify}}

code-explainer/assets/templates/overview.md.j2

Lines changed: 8 additions & 0 deletions
@@ -25,6 +25,10 @@ It appears to follow a **{{architecture_pattern}}** architecture with a stack ce

 {{building_blocks}}

+## Directory Map (Plain Language)
+
+{{directory_plain_summaries}}
+
 ## Documentation Coverage

 {{docs_coverage}}
@@ -48,6 +52,10 @@ Start with these docs:

 {{external_context_summary}}

+## Narrative Confidence Notes
+
+{{llm_confidence_notes}}
+
 ## Deep Dive Links

 - [Architecture Deep Explainer](../deep/ARCHITECTURE_DEEP.md)

code-explainer/references/mode-behavior.md

Lines changed: 9 additions & 0 deletions
@@ -32,9 +32,18 @@ Goal: Maximum fidelity and audit-ready onboarding.
 - `nontech`: plain-language phrasing first, minimal jargon.
 - `mixed`: business-and-technical balance.
 - `engineering`: technical detail and traceability emphasis.
+- If LLM narrative is enabled and available, wording is further adapted per audience.

 ## Overview Length

 - `short`: executive skim.
 - `medium`: balanced default.
 - `long`: expanded onboarding context and references.
+
+## LLM Narrative
+
+- Controlled with `--enable-llm-descriptions <true|false>`.
+- Reads API config from env vars:
+  - `CODE_EXPLAINER_LLM_API_KEY` (or `OPENAI_API_KEY`)
+  - `CODE_EXPLAINER_LLM_BASE_URL` (optional)
+  - `CODE_EXPLAINER_LLM_MODEL` (optional)

code-explainer/references/output-contract.md

Lines changed: 21 additions & 2 deletions
@@ -24,8 +24,9 @@
 20. `meta/render_report.json`
 21. `meta/enrichment.json`
 22. `meta/coverage_report.json`
-23. `meta/docs_generation.json`
-24. `meta/quality_report.json`
+23. `meta/llm_summary.json`
+24. `meta/docs_generation.json`
+25. `meta/quality_report.json`

 ## Manifest Schema

@@ -47,6 +48,9 @@
 - `exclude_globs[]`
 - `docs_discovered`
 - `docs_parsed`
+- `llm_descriptions_enabled`
+- `llm_descriptions_used`
+- `llm_model`

 ## Coverage Schema

@@ -61,6 +65,21 @@
 - `parsed_docs[]` with `path`, `title`, `summary`, `headings[]`, `line_count`, `size_bytes`, `keywords[]`
 - `skipped_docs[]` with `path`, `reason`

+## LLM Narrative Schema
+
+`llm_summary.json` contains:
+
+- `generated_at`
+- `enabled`
+- `used`
+- `provider`
+- `model`
+- `repo_summary_paragraph`
+- `directory_summaries[]` with `name`, `summary`
+- `deep_dive_starters[]`
+- `confidence_notes[]`
+- `error`
+
 ## Confidence Schema

 `confidence_report.json` contains:
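
The LLM Narrative Schema added in this file lists field names only. As a rough illustration of the shape, the sketch below builds an "LLM not used" payload with exactly those keys. `empty_llm_summary` is a hypothetical helper, and the value types (ISO timestamp string, empty strings, empty lists) are assumptions; the contract itself does not state them.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def empty_llm_summary(enabled: bool, error: str = "") -> Dict[str, Any]:
    """Hypothetical helper: payload for a run where no LLM narrative was produced."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),  # assumed format
        "enabled": enabled,
        "used": False,
        "provider": "",
        "model": "",
        "repo_summary_paragraph": "",
        "directory_summaries": [],  # items would carry `name` and `summary`
        "deep_dive_starters": [],
        "confidence_notes": [],
        "error": error,
    }


if __name__ == "__main__":
    print(json.dumps(empty_llm_summary(True, "Missing API key."), indent=2))
```

Emitting every key even on failure would let downstream readers such as the manifest writer rely on `.get()` defaults only as a safety net rather than as the normal path.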

code-explainer/scripts/analyze.py

Lines changed: 27 additions & 0 deletions
@@ -19,6 +19,7 @@
 import map_dependencies
 import map_flows
 import ingest_docs
+import llm_describe
 import build_diagrams
 import validate_mermaid
 import render_diagrams
@@ -85,6 +86,7 @@ def _write_manifest(
     stack_payload: Dict[str, Any],
     entry_payload: Dict[str, Any],
     docs_payload: Dict[str, Any],
+    llm_payload: Dict[str, Any],
     module_count: int,
     diagram_count: int,
     include_globs: List[str],
@@ -103,6 +105,9 @@ def _write_manifest(
         "entrypoints": entry_payload.get("entrypoints", []),
         "docs_discovered": docs_payload.get("discovered_count", 0),
         "docs_parsed": docs_payload.get("parsed_count", 0),
+        "llm_descriptions_enabled": llm_payload.get("enabled", False),
+        "llm_descriptions_used": llm_payload.get("used", False),
+        "llm_model": llm_payload.get("model", ""),
         "module_count": module_count,
         "diagram_count": diagram_count,
         "include_globs": include_globs,
@@ -118,6 +123,7 @@ def run_pipeline(
     audience: str,
     overview_length: str,
     enable_web_enrichment: bool,
+    enable_llm_descriptions: bool,
     include_globs: List[str] | None = None,
     exclude_globs: List[str] | None = None,
 ) -> Dict[str, Any]:
@@ -145,6 +151,20 @@ def run_pipeline(
     flow_payload = map_flows.map_flows(stack_payload, entry_payload, dep_payload, meta_dir, mode)
     coverage_payload = ingest_docs.ingest_docs(repo_root, index_payload, meta_dir, mode)
     enrichment_payload = enrich_external.enrich_external(source, meta_dir, enable_web_enrichment)
+    llm_payload = llm_describe.generate_llm_descriptions(
+        repo_root=repo_root,
+        source=source,
+        mode=mode,
+        audience=audience,
+        index_payload=index_payload,
+        stack_payload=stack_payload,
+        entry_payload=entry_payload,
+        dep_payload=dep_payload,
+        flow_payload=flow_payload,
+        docs_payload=coverage_payload,
+        out_dir=meta_dir,
+        enabled=enable_llm_descriptions,
+    )

     diagram_manifest = build_diagrams.build_diagrams(
         stack=stack_payload,
@@ -172,6 +192,7 @@ def run_pipeline(
         flow_payload=flow_payload,
         diagram_manifest=diagram_manifest,
         docs_payload=coverage_payload,
+        llm_payload=llm_payload,
         enrichment_payload=enrichment_payload,
     )
     _write_confidence_and_attribution(output_root, docs_gen_payload, enrichment_payload)
@@ -185,6 +206,7 @@ def run_pipeline(
         stack_payload=stack_payload,
         entry_payload=entry_payload,
         docs_payload=coverage_payload,
+        llm_payload=llm_payload,
         module_count=len(index_payload.get("modules", [])),
         diagram_count=diagram_manifest.get("count", 0),
         include_globs=include_globs,
@@ -202,6 +224,7 @@ def run_pipeline(
         "file_count": index_payload.get("file_count", 0),
         "docs_discovered": coverage_payload.get("discovered_count", 0),
         "docs_parsed": coverage_payload.get("parsed_count", 0),
+        "llm_descriptions_used": llm_payload.get("used", False),
         "diagram_count": diagram_manifest.get("count", 0),
         "validation_ok": validation_payload.get("overall_ok", False),
         "renderer": render_payload.get("renderer", ""),
@@ -235,6 +258,7 @@ def _parse_args() -> argparse.Namespace:
         help="Glob(s) to exclude from indexing.",
     )
     parser.add_argument("--enable-web-enrichment", default="true")
+    parser.add_argument("--enable-llm-descriptions", default="true")
     return parser.parse_args()


@@ -246,13 +270,15 @@ def main() -> int:

     mode = common.normalize_mode(args.mode)
     web_enabled = common.bool_from_string(args.enable_web_enrichment)
+    llm_enabled = common.bool_from_string(args.enable_llm_descriptions)
     summary = run_pipeline(
         source=args.source,
         output_root=Path(args.output).resolve(),
         mode=mode,
         audience=args.audience,
         overview_length=args.overview_length,
         enable_web_enrichment=web_enabled,
+        enable_llm_descriptions=llm_enabled,
         include_globs=args.include_glob,
         exclude_globs=args.exclude_glob,
     )
@@ -266,6 +292,7 @@ def main() -> int:
         "file_count",
         "docs_discovered",
         "docs_parsed",
+        "llm_descriptions_used",
         "diagram_count",
         "validation_ok",
         "renderer",
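
In the `analyze.py` hunks above, `--enable-llm-descriptions` is declared as a plain string defaulting to `"true"` and converted later by `common.bool_from_string`, whose body is not part of this diff. The sketch below shows one way such a conversion could behave. `parse_bool_flag` is a hypothetical stand-in, and both the accepted spellings and the choice to raise on unknown values are assumptions.

```python
import argparse

# Assumed spellings; the real helper may accept a different set.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool_flag(value: str) -> bool:
    """Hypothetical stand-in for common.bool_from_string."""
    text = value.strip().lower()
    if text in _TRUE_VALUES:
        return True
    if text in _FALSE_VALUES:
        return False
    raise ValueError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Mirrors the diff: a string-valued flag that defaults to "true".
    parser.add_argument("--enable-llm-descriptions", default="true")
    return parser
```

Keeping the flag string-valued matches the `<true|false>` form documented in `SKILL.md`, whereas `action="store_true"` would change the CLI surface.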

code-explainer/scripts/generate_docs.py

Lines changed: 65 additions & 1 deletion
@@ -77,6 +77,46 @@ def _where_to_modify(modules: List[Dict[str, Any]], limit: int) -> str:
     return "\n".join(suggestions)


+def _llm_directory_summaries(llm_payload: Dict[str, Any], fallback_modules: List[Dict[str, Any]], limit: int = 8) -> str:
+    items = llm_payload.get("directory_summaries", [])
+    lines: List[str] = []
+    if isinstance(items, list):
+        for item in items[:limit]:
+            if not isinstance(item, dict):
+                continue
+            name = str(item.get("name", "")).strip()
+            summary = str(item.get("summary", "")).strip()
+            if not name or not summary:
+                continue
+            lines.append(f"- **{name}**: {summary}")
+    if lines:
+        return "\n".join(lines)
+
+    for module in fallback_modules[:limit]:
+        name = module.get("name", "")
+        if not name:
+            continue
+        lines.append(f"- **{name}**: Module with {module.get('file_count', 0)} files.")
+    return "\n".join(lines) if lines else "- No directory-level summary available."
+
+
+def _llm_deep_dive_starters(llm_payload: Dict[str, Any]) -> str:
+    starters = llm_payload.get("deep_dive_starters", [])
+    if not isinstance(starters, list) or not starters:
+        return "- Start from entrypoints, then trace one request through dependencies."
+    return "\n".join([f"- {str(item)}" for item in starters[:6]])
+
+
+def _llm_confidence_notes(llm_payload: Dict[str, Any]) -> str:
+    notes = llm_payload.get("confidence_notes", [])
+    if not isinstance(notes, list) or not notes:
+        if llm_payload.get("enabled", False) and not llm_payload.get("used", False):
+            error = llm_payload.get("error", "LLM summary unavailable for this run.")
+            return f"- {error}"
+        return "- LLM summary disabled; deterministic analysis remains primary."
+    return "\n".join([f"- {str(item)}" for item in notes[:6]])
+
+
 def _glossary_terms(
     stack_payload: Dict[str, Any],
     dep_payload: Dict[str, Any],
@@ -216,8 +256,13 @@ def _plain_system_summary(
     repo_name: str,
     stack_payload: Dict[str, Any],
     doc_payload: Dict[str, Any],
+    llm_payload: Dict[str, Any],
     audience: str,
 ) -> str:
+    llm_summary = str(llm_payload.get("repo_summary_paragraph", "")).strip()
+    if llm_summary:
+        return llm_summary
+
     parsed_docs = doc_payload.get("parsed_docs", [])
     summary_doc = _pick_summary_doc(parsed_docs)
     if summary_doc:
@@ -297,6 +342,7 @@ def generate_docs(
     flow_payload: Dict[str, Any],
     diagram_manifest: Dict[str, Any],
     docs_payload: Dict[str, Any],
+    llm_payload: Dict[str, Any],
     enrichment_payload: Dict[str, Any],
 ) -> Dict[str, Any]:
     overview_dir = common.ensure_dir(output_root / "overview")
@@ -335,11 +381,16 @@ def generate_docs(
         "audience_note": _audience_note(audience),
         "mode_note": _mode_note(mode),
         "overview_length_note": _overview_length_note(overview_length),
-        "plain_summary": _plain_system_summary(repo_name, stack_payload, docs_payload, audience),
+        "plain_summary": _plain_system_summary(repo_name, stack_payload, docs_payload, llm_payload, audience),
         "docs_coverage": _docs_coverage_line(docs_payload),
         "docs_quick_links": _docs_summary(docs_payload, profile["doc_link_limit"]),
         "primary_user_flow_summary": _primary_flow_summary(flow_payload),
         "external_context_summary": _external_context_summary(enrichment_payload),
+        "directory_plain_summaries": _llm_directory_summaries(llm_payload, modules, limit=profile["module_limit"]),
+        "llm_deep_dive_starters": _llm_deep_dive_starters(llm_payload),
+        "llm_confidence_notes": _llm_confidence_notes(llm_payload),
+        "llm_enabled": "true" if llm_payload.get("enabled", False) else "false",
+        "llm_used": "true" if llm_payload.get("used", False) else "false",
     }

     overview_template = common.load_template(templates_root / "overview.md.j2")
@@ -422,6 +473,17 @@ def generate_docs(
         ),
     ]

+    if llm_payload.get("used", False):
+        claims.append(
+            common.collect_claim(
+                "claim_llm_narrative",
+                "An LLM-generated narrative summary was incorporated for repository and directory explainers.",
+                ["meta/llm_summary.json"],
+                0.7,
+                "Generated from deterministic context payload + model inference.",
+            )
+        )
+
     if enrichment_payload.get("records"):
         claims.append(
             common.collect_claim(
@@ -467,6 +529,7 @@ def main() -> int:
     parser.add_argument("--flows", required=True)
     parser.add_argument("--diagram-manifest", required=True)
     parser.add_argument("--coverage", required=True)
+    parser.add_argument("--llm-summary", required=True)
     parser.add_argument("--enrichment", required=True)
     args = parser.parse_args()

@@ -484,6 +547,7 @@ def main() -> int:
         flow_payload=common.read_json(Path(args.flows), default={}),
         diagram_manifest=common.read_json(Path(args.diagram_manifest), default={}),
         docs_payload=common.read_json(Path(args.coverage), default={}),
+        llm_payload=common.read_json(Path(args.llm_summary), default={}),
         enrichment_payload=common.read_json(Path(args.enrichment), default={}),
     )
     print(json.dumps({"overview": payload["overview_file"], "deep_count": len(payload["deep_files"])}, indent=2))
