Add llms.txt generation for AI-friendly article extraction#486

MaxGhenis merged 4 commits into main from feature/llms-txt on Feb 12, 2026

@MaxGhenis
Contributor

Summary

Implements the llms.txt standard to make PolicyEngine research articles more token-efficient for AI consumption. This follows the pattern used by Bun, Svelte, and other projects.
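For readers unfamiliar with the format, an llms.txt index is a small markdown file: an H1 title, a blockquote summary, then H2 sections containing link lists. The sketch below renders that shape. The `IndexLink` and `IndexSection` types and the `renderLlmsTxt` name are hypothetical and are not taken from scripts/generate-llms-txt.ts.

```typescript
// Hypothetical sketch of rendering an llms.txt index; the real script may differ.
interface IndexLink {
  title: string;
  url: string;
  note?: string; // optional short description shown after the link
}

interface IndexSection {
  heading: string;
  links: IndexLink[];
}

function renderLlmsTxt(
  title: string,
  summary: string,
  sections: IndexSection[],
): string {
  // H1 title followed by a blockquote summary, as in the llms.txt proposal.
  const lines: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Calling `renderLlmsTxt` with one section that links to /llms-research-us.txt and /llms-research-uk.txt would produce an index along the lines of the ~3KB file described under Changes.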

Changes

  • Add scripts/generate-llms-txt.ts that generates:

    • /llms.txt - Index with links to sections (~3KB)
    • /llms-full.txt - All articles combined (~2.5MB, down from ~10MB+ with raw Plotly JSON)
    • /llms-research-us.txt - US articles only (~1.3MB)
    • /llms-research-uk.txt - UK articles only (~1.1MB)
  • Replace verbose Plotly JSON charts with text summaries using figure captions

  • Transform iframes to [Interactive: description] placeholders

  • Support optional ai_summary field in posts.json for custom summaries

  • Integrate into build process (runs before vite build)

  • Generated files are gitignored (built on deploy)
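The chart and iframe replacements listed above might be sketched as follows. This rests on assumptions not stated in the PR: that charts are embedded as fenced `plotly` blocks holding JSON, that a bold caption directly follows the chart, and that an iframe's `title` attribute is a usable description. The function names are hypothetical.

```typescript
// Hypothetical sketch; scripts/generate-llms-txt.ts may work differently.

// Built programmatically so this example contains no literal fence markers.
const FENCE = "`".repeat(3);

// Assumption: a chart is a fenced "plotly" block, optionally followed by a
// bold caption such as **Figure 1: Winners by income decile**.
const PLOTLY_WITH_CAPTION = new RegExp(
  FENCE + "plotly\\s*\\n[\\s\\S]*?\\n" + FENCE + "(?:\\s*\\*\\*([^*\\n]+)\\*\\*)?",
  "g",
);

// Matches an opening iframe tag and, if present, everything up to its close.
const IFRAME = /<iframe\b([^>]*)>(?:[\s\S]*?<\/iframe>)?/gi;

function summarizeCharts(markdown: string): string {
  // Replace the whole JSON block (and its caption) with a one-line summary.
  return markdown.replace(PLOTLY_WITH_CAPTION, (_match, caption?: string) =>
    caption ? `[Chart: ${caption.trim()}]` : "[Chart]",
  );
}

function replaceIframes(markdown: string): string {
  return markdown.replace(IFRAME, (_match, attrs: string) => {
    // Assumption: the title attribute describes the embedded content.
    const title = /title="([^"]*)"/i.exec(attrs)?.[1];
    return `[Interactive: ${title ?? "embedded content"}]`;
  });
}
```

Regular expressions keep the sketch short, but a markdown or HTML parser would be more robust for nested or unusually formatted embeds.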

Token savings

| Before | After | Reduction |
| --- | --- | --- |
| ~40-100 lines per Plotly chart | 1-line summary | ~95% |
| Full iframe HTML | [Interactive: ...] | ~90% |

Workflow for PRs adding new articles

  1. No extra work required - the build script auto-generates llms.txt files
  2. Optional: Add ai_summary field to your post in posts.json for a custom AI-friendly summary
  3. For charts: Use descriptive captions (e.g., **Figure 1: Winners by income decile**) - the caption becomes the chart summary
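As a rough illustration of step 2, the generator could prefer `ai_summary` when it is present and fall back to other metadata otherwise. Only the `ai_summary` field name comes from this PR. The `Post` shape, the `description` fallback, and the `summaryFor` name are assumptions made for the example.

```typescript
// Hypothetical shape of a posts.json entry; only `ai_summary` is named in the PR.
interface Post {
  title: string;
  description?: string; // assumed existing field, used here as a fallback
  ai_summary?: string; // optional custom AI-friendly summary
}

function summaryFor(post: Post): string {
  // Prefer the custom summary when it is present and non-empty.
  const custom = post.ai_summary?.trim();
  if (custom) {
    return custom;
  }
  // Otherwise fall back to the description, then to the title.
  return post.description?.trim() || post.title;
}
```

Posts without the field keep working unchanged, which matches the "no extra work required" workflow above.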

Test plan

  • Run npm run generate-llms-txt - generates all 4 files
  • Verify chart JSON replaced with [Chart: caption]
  • Verify iframes replaced with [Interactive: ...]
  • Run npm run build - llms.txt generated before vite build
  • CI passes

Closes #485

🤖 Generated with Claude Code

@vercel

vercel bot commented Nov 29, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| policyengine-app-v2 | Ready | Preview, Comment | Feb 12, 2026 2:54pm |
| policyengine-calculator | Ready | Preview, Comment | Feb 12, 2026 2:54pm |

MaxGhenis and others added 3 commits February 12, 2026 09:32
Implements the llms.txt standard (https://llmstxt.org/) to make
PolicyEngine research articles more token-efficient for AI consumption.

- Add scripts/generate-llms-txt.ts that generates:
  - /llms.txt - Index with links to sections
  - /llms-full.txt - All articles combined
  - /llms-research-us.txt - US articles only
  - /llms-research-uk.txt - UK articles only

- Replace verbose Plotly JSON charts with text summaries using figure captions
- Transform iframes to [Interactive: description] placeholders
- Support optional ai_summary field in posts.json for custom summaries
- Integrate into build process (runs before vite build)
- Add tsx as dev dependency for running TypeScript scripts

Closes #485

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Build scripts appropriately use console.log for progress output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

… test

- Explicitly sort posts by date descending before slicing for Recent Research
- Add llms-recent.txt (last 50 articles) as a lighter alternative to the 2.6MB full archive
- Remove generated llms*.txt files from git tracking (already in .gitignore)
- Add smoke test verifying all 5 output files, expected sections, and sort order

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
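The explicit date sort and the 50-article cutoff described in this commit could look roughly like the sketch below. It assumes each post carries an ISO 8601 `date` string such as "2026-02-12". The `recentPosts` name and its signature are hypothetical.

```typescript
// Hypothetical sketch: newest-first ordering before slicing, so that
// "recent" does not depend on the order of entries in posts.json.
function recentPosts<T extends { date: string }>(posts: T[], limit = 50): T[] {
  // Copy first so the caller's array is not reordered in place.
  return [...posts]
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date))
    .slice(0, limit);
}
```

Sorting a copy means the same post list can still be reused, in its original order, for the full and per-country files.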

- Extract slugFromFilename helper to deduplicate regex
- Replace nested ternary with if/else for header selection
- Simplify single-expression arrow callback
- Consolidate stats printing into data-driven loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit 9f2bda0 into main Feb 12, 2026
8 checks passed
@MaxGhenis MaxGhenis deleted the feature/llms-txt branch February 12, 2026 14:59
@github-actions

🎉 This PR is included in version 0.3.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@policyengine

policyengine bot commented Feb 17, 2026

Sorry @github-actions[bot], only members of the PolicyEngine/core-developers team can invoke Claude Code.

Development

Successfully merging this pull request may close these issues.

Add llms.txt for AI-friendly article extraction
