docs: Add llms.txt and llms-full.txt for AI discoverability by mvansegbroeck · Pull Request #389 · NVIDIA-NeMo/DataDesigner

mvansegbroeck · 2026-03-10T01:17:24Z

Summary

Adds two new files — llms.txt and llms-full.txt — to improve how AI models and coding assistants discover and reference Data Designer.

What are these files?

llms.txt is an emerging standard (llmstxt.org) that provides a structured, machine-readable summary of a project. It's the AI equivalent of robots.txt: a concise overview that helps models like ChatGPT, Claude, and Perplexity understand what a project does and when to recommend it. Ours covers capabilities, column types, use cases, tutorials, recipes, and links.
llms-full.txt is the companion file with complete inline documentation: quick start, architecture, code patterns, column type reference, and model/provider details. Coding assistants (Cursor, Copilot, Claude Code) load this for deeper context when generating Data Designer code.
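
To make the structure concrete, here is a minimal sketch of how a tool might read an llms.txt file. It assumes the layout described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections containing markdown link lists); the function name `parse_llms_txt` and the sample text are hypothetical and are not taken from this PR's files.

```python
import re

# Matches a markdown list item of the form "- [name](url): optional notes".
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, and per-section links."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    (match["name"], match["url"], match["notes"] or "")
                )
    return result


# Hypothetical sample input, not the actual content of this PR's llms.txt.
SAMPLE = """# Example Project

> One-line summary of the project.

## Docs

- [Quick start](https://example.com/quickstart): install and first run
- [API](https://example.com/api)
"""

parsed = parse_llms_txt(SAMPLE)
```

In practice many assistants skip parsing entirely and load the file as plain context, so a parser like this is mainly useful for validating the file's shape.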

Why both locations?

Repo root (llms.txt, llms-full.txt): This is where coding assistants and GitHub-based tools look. They read from the repo root via raw.githubusercontent.com.
docs/ (docs/llms.txt, docs/llms-full.txt): So the docs site at nvidia-nemo.github.io/DataDesigner can serve them at the site root, where web-based AI crawlers and agents expect to find them.
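
The two locations map to two URL families. The sketch below builds the candidate URLs a client might try for each file. It assumes the default branch is `main` and that the docs build publishes the files in docs/ at the site root; neither the helper name `candidate_urls` nor the final site paths are confirmed by this PR.

```python
REPO = "NVIDIA-NeMo/DataDesigner"
DOCS_SITE = "https://nvidia-nemo.github.io/DataDesigner"
FILES = ("llms.txt", "llms-full.txt")


def candidate_urls(branch: str = "main") -> dict[str, list[str]]:
    """Return, per file, the repo-root raw URL followed by the docs-site URL."""
    return {
        name: [
            # Repo root, as read by GitHub-based tools.
            f"https://raw.githubusercontent.com/{REPO}/{branch}/{name}",
            # Docs site root (assumed path), as read by web crawlers.
            f"{DOCS_SITE}/{name}",
        ]
        for name in FILES
    }


urls = candidate_urls()
```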

github-actions · 2026-03-10T01:17:34Z

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

greptile-apps · 2026-03-10T01:20:33Z

Greptile Summary

This PR introduces four new documentation files — llms.txt and llms-full.txt at both the repo root and under docs/ — following the llmstxt.org emerging standard to improve how AI coding assistants and web-based AI crawlers discover and understand Data Designer. The dual-location strategy (repo root for GitHub-based tools, docs/ for the GitHub Pages site) is well-reasoned and explained in the PR description.

Key observations:

The content is accurate, well-structured, and covers installation, core concepts, code patterns, architecture, and model/provider details.
The files are purely static documentation with no code logic, making this a safe documentation-only change.
Both llms.txt and llms-full.txt are present at repo root and docs/, enabling discovery by both GitHub-based tools (via raw.githubusercontent.com) and web-based AI crawlers (via the GitHub Pages site root).

Confidence Score: 5/5

Documentation-only PR with no functional or runtime impact; purely static files following an emerging AI discoverability standard.
This PR adds four documentation files to improve AI tool discoverability. All files are static content with no code logic, no dependencies, and no runtime behavior. The dual-location strategy (repo root and docs/) is well-explained and intentional. No functional issues identified.
No files require special attention

Important Files Changed

Filename	Overview
llms.txt	New file adding machine-readable project summary for AI tools at the repo root following llmstxt.org standard. Content is accurate and well-structured with no code logic concerns.
llms-full.txt	New file with comprehensive inline documentation for AI coding assistants. Covers installation, architecture, column types, models, providers, and common patterns. Well-structured and accurate.
docs/llms.txt	Companion copy of root llms.txt placed under docs/ for GitHub Pages site discovery. Content is identical and placed appropriately for web-based AI crawler access.
docs/llms-full.txt	Companion copy of root llms-full.txt placed under docs/ for GitHub Pages site discovery. Provides full documentation for web-based AI tools and supports the dual-location discovery strategy.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[AI Tool / Coding Assistant] --> B{Where does it look?}
    B -->|GitHub-based tools & Cursor/Copilot/Claude Code| C[Repo Root\nraw.githubusercontent.com]
    B -->|Web crawlers & browser-based AI| D[Docs Site Root\nnvidia-nemo.github.io/DataDesigner]

    C --> E[llms.txt\nConcise overview]
    C --> F[llms-full.txt\nFull inline docs]

    D --> G[docs/llms.txt\nIdentical copy]
    D --> H[docs/llms-full.txt\nIdentical copy]

    E --> I[AI understands: what Data Designer does,\nwhen to recommend it, links to resources]
    F --> J[AI generates: correct SDK code,\ncolumn configs, CLI usage, architecture context]
    G --> I
    H --> J

    style C fill:#76b900,color:#fff
    style D fill:#76b900,color:#fff
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#e8f5e9

Last reviewed commit: d41ae77

mvansegbroeck · 2026-03-10T03:56:33Z

I have read the DCO document and I hereby sign the DCO.

nabinchha · 2026-03-11T02:52:29Z

docs/llms-full.txt

+
+---
+
+## Common use cases


What do you all think about keeping only general information about Data Designer in here, the kind that won't go stale, with links branching out to the docs and tutorials? Everything from here onwards could probably be replaced with links.

andreatgretel · 2026-03-11T16:02:27Z

Great idea - this should help AI tools discover and recommend Data Designer.

One concern: this content will get stale pretty quickly as the codebase evolves - version numbers, column types, API patterns, etc. Some ideas on keeping it fresh:

Claude Code skill - a /regenerate-llms-txt skill that reads the actual codebase and regenerates both files. Run it before releases or whenever the API surface changes.
Skill + CI gate - same thing but with a CI check that fails if the files are out of date.
CI-only generation - a GitHub Action that regenerates on release. Would need to be a template + script that pulls in version numbers, column types, etc. programmatically - simpler but the prose quality would probably be worse.
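
As a rough illustration of the CI-gate idea, the sketch below covers only the narrowest check: that the docs/ copies have not drifted from the repo-root files. It does not regenerate anything from the codebase, which a full gate would also need. The function names `out_of_sync` and `main` are hypothetical and no such script exists in this PR.

```python
import sys
from pathlib import Path

# Each repo-root file and the copy that is expected to be identical to it.
PAIRS = (
    ("llms.txt", "docs/llms.txt"),
    ("llms-full.txt", "docs/llms-full.txt"),
)


def out_of_sync(repo_root: Path) -> list[str]:
    """Return one message per pair that is missing or whose bytes differ."""
    problems = []
    for root_name, docs_name in PAIRS:
        root_file = repo_root / root_name
        docs_file = repo_root / docs_name
        if not root_file.is_file() or not docs_file.is_file():
            problems.append(f"missing: {root_name} or {docs_name}")
        elif root_file.read_bytes() != docs_file.read_bytes():
            problems.append(f"differs: {root_name} vs {docs_name}")
    return problems


def main(repo_root: Path) -> int:
    """Print any problems to stderr and return a CI-style exit code."""
    issues = out_of_sync(repo_root)
    for issue in issues:
        print(issue, file=sys.stderr)
    return 1 if issues else 0
```

A CI job could call `sys.exit(main(Path.cwd()))` from the repository root; a non-zero exit code would fail the check.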

Wdyt? Any other suggestions?

Also fwiw, llms.txt is still pretty early as a standard and how agents actually parse these files varies a lot. Most coding assistants just dump the content into context as-is, so what matters most is that it's accurate and concise rather than following a specific structure. Seems like a solid starting point we can refine over time.

mvansegbroeck · 2026-03-12T00:06:12Z

Great suggestions @andreatgretel - having some kind of "implement and forget" solution looks better indeed.

@johnnygreco @nabinchha @eric-tramel Any other thoughts/suggestions here?

Add llms.txt and llms-full.txt for AI discoverability

d41ae77

mvansegbroeck requested a review from a team as a code owner March 10, 2026 01:17

mvansegbroeck requested review from eric-tramel, johnnygreco and nabinchha March 10, 2026 01:17

mvansegbroeck changed the title ~~Add llms.txt and llms-full.txt for AI discoverability~~ Docs: Add llms.txt and llms-full.txt for AI discoverability Mar 10, 2026

mvansegbroeck changed the title ~~Docs: Add llms.txt and llms-full.txt for AI discoverability~~ docs: Add llms.txt and llms-full.txt for AI discoverability Mar 10, 2026

nabinchha reviewed Mar 11, 2026

mvansegbroeck wants to merge 1 commit into main from feat/maarten-llms-txt


3 participants
