docs: Add llms.txt and llms-full.txt for AI discoverability#389
mvansegbroeck wants to merge 1 commit into main
All contributors have signed the DCO ✍️ ✅
Greptile Summary

This PR introduces four new documentation files: llms.txt, llms-full.txt, docs/llms.txt, and docs/llms-full.txt. Key observations:

| Filename | Overview |
|---|---|
| llms.txt | New file adding machine-readable project summary for AI tools at the repo root following llmstxt.org standard. Content is accurate and well-structured with no code logic concerns. |
| llms-full.txt | New file with comprehensive inline documentation for AI coding assistants. Covers installation, architecture, column types, models, providers, and common patterns. Well-structured and accurate. |
| docs/llms.txt | Companion copy of root llms.txt placed under docs/ for GitHub Pages site discovery. Content is identical and placed appropriately for web-based AI crawler access. |
| docs/llms-full.txt | Companion copy of root llms-full.txt placed under docs/ for GitHub Pages site discovery. Provides full documentation for web-based AI tools and supports the dual-location discovery strategy. |
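
Since the `docs/` copies are meant to stay identical to the root files, a small check could flag drift between them. The following is only a sketch under assumptions: the function names and the idea of running it in CI are hypothetical and not part of this PR; only the four file paths come from the table above.

```python
import hashlib
from pathlib import Path

# File names taken from the PR; each is expected at the repo root and under docs/.
MIRRORED_FILES = ("llms.txt", "llms-full.txt")


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drifted_copies(repo_root: Path, docs_dir: str = "docs") -> list[str]:
    """List mirrored files whose docs/ copy is missing or differs from the root copy.

    Hypothetical helper: not something the PR itself ships.
    """
    drifted = []
    for name in MIRRORED_FILES:
        root_copy = repo_root / name
        docs_copy = repo_root / docs_dir / name
        if not root_copy.is_file() or not docs_copy.is_file():
            # A missing file on either side counts as drift.
            drifted.append(name)
        elif file_digest(root_copy) != file_digest(docs_copy):
            drifted.append(name)
    return drifted
```

Calling `find_drifted_copies(Path("."))` from the repository root would return an empty list when both pairs match, and the names of any files that are missing or have diverged otherwise.
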
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[AI Tool / Coding Assistant] --> B{Where does it look?}
B -->|GitHub-based tools & Cursor/Copilot/Claude Code| C[Repo Root\nraw.githubusercontent.com]
B -->|Web crawlers & browser-based AI| D[Docs Site Root\nnvidia-nemo.github.io/DataDesigner]
C --> E[llms.txt\nConcise overview]
C --> F[llms-full.txt\nFull inline docs]
D --> G[docs/llms.txt\nIdentical copy]
D --> H[docs/llms-full.txt\nIdentical copy]
E --> I[AI understands: what Data Designer does,\nwhen to recommend it, links to resources]
F --> J[AI generates: correct SDK code,\ncolumn configs, CLI usage, architecture context]
G --> I
H --> J
style C fill:#76b900,color:#fff
style D fill:#76b900,color:#fff
style E fill:#e8f5e9
style F fill:#e8f5e9
style G fill:#e8f5e9
style H fill:#e8f5e9
```
Last reviewed commit: d41ae77
I have read the DCO document and I hereby sign the DCO.
Review comment on the line `## Common use cases`:
What do you all think about keeping only general information about Data Designer that won't go stale in here, with links branching out to docs + tutorials? Everything from here onwards could then probably be replaced with links.
Great idea - this should help AI tools discover and recommend Data Designer. One concern: this content will get stale pretty quickly as the codebase evolves - version numbers, column types, API patterns, etc. Some ideas on keeping it fresh:
Wdyt? Any other suggestions? Also fwiw,
Great suggestions @andreatgretel - having some kind of "implement and forget" solution looks better indeed. @johnnygreco @nabinchha @eric-tramel Any other thoughts/suggestions here?
Summary
Adds two new files, `llms.txt` and `llms-full.txt`, to improve how AI models and coding assistants discover and reference Data Designer.

What are these files?

`llms.txt` is an emerging standard (llmstxt.org) that provides a structured, machine-readable summary of a project. It's the AI equivalent of `robots.txt`: a concise overview that helps models like ChatGPT, Claude, and Perplexity understand what a project does and when to recommend it. Ours covers capabilities, column types, use cases, tutorials, recipes, and links.

`llms-full.txt` is the companion file with complete inline documentation: quick start, architecture, code patterns, column type reference, and model/provider details. Coding assistants (Cursor, Copilot, Claude Code) load this for deeper context when generating Data Designer code.

Why both locations?

- Repo root (`llms.txt`, `llms-full.txt`): This is where coding assistants and GitHub-based tools look. They read from the repo root via raw.githubusercontent.com.
- `docs/` (`docs/llms.txt`, `docs/llms-full.txt`): So the docs site at nvidia-nemo.github.io/DataDesigner can serve them at the site root, where web-based AI crawlers and agents expect to find them.
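
To illustrate what "structured, machine-readable" can mean in practice, here is a rough sketch of how a tool might read an `llms.txt` file. It assumes the layout proposed at llmstxt.org (an H1 title, a blockquote summary, then H2 sections containing markdown link lists); the function name, the returned dictionary shape, and the sample text are hypothetical and are not taken from the files in this PR.

```python
import re

# Matches markdown list links such as "- [Title](https://example.com): optional note".
LINK_PATTERN = re.compile(r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def parse_llms_txt(text: str) -> dict:
    """Extract the H1 title, the blockquote summary, and the links under each H2 section."""
    result = {"title": None, "summary": None, "sections": {}}
    current_section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # A new H2 section starts; links that follow belong to it.
            current_section = stripped[3:].strip()
            result["sections"][current_section] = []
        elif stripped.startswith("# ") and result["title"] is None:
            result["title"] = stripped[2:].strip()
        elif stripped.startswith(">") and result["summary"] is None:
            result["summary"] = stripped.lstrip(">").strip()
        elif current_section is not None:
            match = LINK_PATTERN.match(line)
            if match:
                result["sections"][current_section].append(
                    (match.group("title"), match.group("url"))
                )
    return result


# Hypothetical sample; the real llms.txt content in this PR may differ.
SAMPLE = """# Data Designer

> A hypothetical one-line summary.

## Docs

- [Quick start](https://example.com/quickstart): setup steps
"""

parsed = parse_llms_txt(SAMPLE)
```

A parser along these lines only depends on the headings and link lists, so a file that keeps general information plus links to docs and tutorials would remain easy to consume this way.
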