feat: replace Algolia search with self-contained llms.txt search #28

Open
mattpodwysocki wants to merge 8 commits into main from feat/llms-txt-search
mattpodwysocki commented Apr 14, 2026

Summary

search_mapbox_docs_tool no longer depends on the Algolia third-party service. The hosted server at mcp-docs.mapbox.com shares a single Algolia free-tier quota across every user, making it prone to throttling as adoption grows. This replaces it with a self-contained keyword search over the llms.txt index files that now exist at every product level on docs.mapbox.com (following the docs restructure in subdomain-docs-root#576).

How it works

On first search, 12 product llms.txt files are fetched in parallel (~220KB total). Each file contains a structured list of documentation pages with titles, URLs, and one-line descriptions. These are cached via the existing docCache with its 1-hour TTL. Until those entries expire, all subsequent searches are pure in-memory keyword matching, with zero network calls after warm-up.

Products indexed (~220KB total, all cached):

| Product | File | Size |
| --- | --- | --- |
| API Reference | api/llms.txt | 5.7KB |
| Mapbox GL JS | mapbox-gl-js/llms.txt | 34KB |
| Help Center | help/llms.txt | 39KB |
| Style Specification | style-spec/llms.txt | 4.2KB |
| Studio Manual | studio-manual/llms.txt | 2.5KB |
| Mapbox Search JS | mapbox-search-js/llms.txt | 9.3KB |
| Maps SDK for iOS | ios/maps/llms.txt | 21KB |
| Maps SDK for Android | android/maps/llms.txt | 31KB |
| Navigation SDK for iOS | ios/navigation/llms.txt | 16KB |
| Navigation SDK for Android | android/navigation/llms.txt | 42KB |
| Mapbox Tiling Service | mapbox-tiling-service/llms.txt | 6.8KB |
| Tilesets | data/tilesets/llms.txt | 7.3KB |
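
For reviewers unfamiliar with the index files, here is a minimal sketch of how one llms.txt body could be turned into searchable entries. It assumes each page is listed as a Markdown link line of the form `- [Title](url): description`, which is the common llms.txt convention; the `DocEntry` type, `ENTRY_PATTERN`, and `parseLlmsTxt` are hypothetical names for illustration and may not match what docsSearchIndex.ts actually does.

```typescript
// Hypothetical entry shape; the real type in docsSearchIndex.ts may differ.
interface DocEntry {
  title: string;
  url: string;
  description: string;
}

// Assumption: pages are listed as "- [Title](https://...): description".
// Headings, blank lines, and free-form prose do not match and are ignored.
const ENTRY_PATTERN = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$/;

function parseLlmsTxt(text: string): DocEntry[] {
  const entries: DocEntry[] = [];
  for (const line of text.split(/\r?\n/)) {
    const match = ENTRY_PATTERN.exec(line);
    if (match) {
      entries.push({
        title: match[1].trim(),
        url: match[2],
        // The description is optional; a bare link yields an empty string.
        description: match[3].trim(),
      });
    }
  }
  return entries;
}
```

Parsing once per cached file and keeping the resulting array in memory is what would make warm queries cheap, since a search then only iterates over plain objects.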

Scoring: Title matches rank 3×, description and URL path matches 1× each. Results are deduplicated by URL across sources (same page can appear in multiple product indexes) and capped at limit.
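
The scoring rule can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: it splits the query on whitespace, counts a term once per field when it appears as a substring, and keeps the highest-scoring copy of each URL. `scoreEntry`, `searchEntries`, and the weight constants are hypothetical names.

```typescript
interface DocEntry {
  title: string;
  url: string;
  description: string;
}

// Weights taken from the description above: title 3x, description 1x, URL path 1x.
const TITLE_WEIGHT = 3;
const DESCRIPTION_WEIGHT = 1;
const URL_PATH_WEIGHT = 1;

function scoreEntry(entry: DocEntry, terms: string[]): number {
  const title = entry.title.toLowerCase();
  const description = entry.description.toLowerCase();
  // Strip scheme and host so only the path contributes to the score.
  const path = entry.url.toLowerCase().replace(/^[a-z]+:\/\/[^/]+/, "");
  let score = 0;
  for (const term of terms) {
    if (title.includes(term)) score += TITLE_WEIGHT;
    if (description.includes(term)) score += DESCRIPTION_WEIGHT;
    if (path.includes(term)) score += URL_PATH_WEIGHT;
  }
  return score;
}

function searchEntries(entries: DocEntry[], query: string, limit: number): DocEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Deduplicate by URL, keeping the best-scoring copy of each page.
  const best = new Map<string, { entry: DocEntry; score: number }>();
  for (const entry of entries) {
    const score = scoreEntry(entry, terms);
    if (score === 0) continue;
    const existing = best.get(entry.url);
    if (!existing || score > existing.score) best.set(entry.url, { entry, score });
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((result) => result.entry);
}
```

Under these assumptions, a page whose title and URL path both contain the query term scores 4 and outranks a page that only mentions it in the description (score 1).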

Reliability: Failed sources are silently skipped — if any single llms.txt is unreachable, the remaining 11 still return results. No single point of failure.
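
One way to get this skip-on-failure behavior is to settle each fetch individually before awaiting them together. The sketch below assumes an injected `fetchText` function and a hypothetical `loadSources` helper; the actual implementation may use a different mechanism (for example `Promise.allSettled`).

```typescript
// Hypothetical fetcher signature, injected so the sketch stays self-contained.
type FetchText = (url: string) => Promise<string>;

async function loadSources(
  urls: string[],
  fetchText: FetchText
): Promise<Map<string, string>> {
  // Every fetch starts immediately, so the sources load in parallel.
  const results = await Promise.all(
    urls.map((url) =>
      fetchText(url).then(
        (text): [string, string] | null => [url, text],
        // A failed source resolves to null instead of rejecting the whole batch.
        (): null => null
      )
    )
  );
  const loaded = new Map<string, string>();
  for (const result of results) {
    if (result !== null) loaded.set(result[0], result[1]);
  }
  return loaded;
}
```

Because a rejection is converted to `null` per source, one unreachable file cannot cause `Promise.all` to reject, and the search simply runs over whatever loaded.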

Trade-offs vs Algolia

| | Algolia | llms.txt search |
| --- | --- | --- |
| Relevance ranking | Full-text, weighted | Keyword frequency |
| Coverage | All indexed pages | 12 curated products |
| Third-party dependency | Yes (quota risk) | None |
| Cold start | ~100ms | ~500ms (12 parallel fetches) |
| Warm queries | ~100ms | <1ms (in-memory) |
| Rate limiting | Yes | None |

For typical developer queries ("directions API parameters", "add marker GL JS", "style spec fill color") keyword matching against structured titles and descriptions is effective. The output format is the same shape — numbered list with title, URL, and description — so get_document_tool chaining works identically.
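
As an illustration of that output shape, a formatter could look like the sketch below. The exact layout (indentation, blank lines between results) is an assumption, and `formatResults` is a hypothetical name, not the tool's real function.

```typescript
interface DocEntry {
  title: string;
  url: string;
  description: string;
}

// Renders a numbered list with title, URL, and (when present) description.
// The precise spacing here is assumed for illustration only.
function formatResults(entries: DocEntry[]): string {
  return entries
    .map((entry, index) => {
      const lines = [`${index + 1}. ${entry.title}`, `   ${entry.url}`];
      if (entry.description) lines.push(`   ${entry.description}`);
      return lines.join("\n");
    })
    .join("\n\n");
}
```

Keeping the URL on its own line makes it easy for a follow-up get_document_tool call to pick it out of the text.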

Also included

fetchCachedText(url, httpRequest) — new helper in docFetcher.ts for fetching and caching plain-text URLs. Shared between docsSearchIndex.ts and (in PR #27) the resource layer. Fixes a subtle bug where empty-string cache entries weren't recognized as hits due to a falsy check on '' — now uses !== null.
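
The cache-hit bug is easy to reproduce in isolation. The sketch below uses a hypothetical `TextCache` class and a simplified `fetchCachedTextSketch` signature (the real docCache API and the `httpRequest` parameter are not shown in this PR), but the `!== null` check is the same idea as the fix.

```typescript
// Hypothetical minimal cache that returns null on a miss.
class TextCache {
  private store = new Map<string, string>();

  get(key: string): string | null {
    const value = this.store.get(key);
    return value === undefined ? null : value;
  }

  set(key: string, value: string): void {
    this.store.set(key, value);
  }
}

type FetchText = (url: string) => Promise<string>;

async function fetchCachedTextSketch(
  url: string,
  fetchText: FetchText,
  cache: TextCache
): Promise<string> {
  const cached = cache.get(url);
  // `if (cached)` would treat "" as a miss and refetch on every call,
  // because the empty string is falsy. Comparing against null avoids that.
  if (cached !== null) return cached;
  const text = await fetchText(url);
  cache.set(url, text);
  return text;
}
```

With the explicit null comparison, a source that legitimately returns an empty body is fetched once and then served from the cache like any other entry.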

Test plan

  • npm test — 68 tests pass (including 8 new SearchDocsTool tests, 7 new docsSearchIndex tests)
  • CI green
  • Manual: search "directions" → returns Directions API page
  • Manual: search "add marker" → returns GL JS markers guide
  • Manual: search "style fill color" → returns Style Spec fill-color reference

"Search Mapbox docs for isochrone"

[Two screenshots: 2026-04-14 at 14:29:48 and 14:29:55]

🤖 Generated with Claude Code

mattpodwysocki and others added 4 commits April 1, 2026 14:16
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs.mapbox.com restructured so that llms.txt now exists at every product
level (e.g. /api/llms.txt, /help/llms.txt, /mapbox-gl-js/llms.txt) and
llms-full.txt contains full page content. The root llms.txt is now a pure
link index, so resources that filtered it by category keyword were returning
empty content.

- resource://mapbox-api-reference → docs.mapbox.com/api/llms.txt
  (structured index of all REST APIs by service category)
- resource://mapbox-guides → docs.mapbox.com/help/llms.txt
  (39KB Help Center index with actual guide content)
- resource://mapbox-sdk-docs → docs.mapbox.com/mapbox-gl-js/llms.txt
  (34KB GL JS documentation index with guides, API ref, examples)
- resource://mapbox-reference → root llms.txt without filtering
  (complete product catalog for discovering available docs)
- resource://mapbox-examples → continues filtering root for
  playground/demo sections

Add fetchCachedText() shared helper to docFetcher to consolidate the
repeated fetch+cache pattern across all resource implementations.

Fix toMarkdownUrl() to return null for URLs already ending in .txt/.md/.json
so get_document_tool fetches llms.txt files directly without a wasted
.md rewrite attempt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
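
The toMarkdownUrl() change described in the commit above could look roughly like this. Only the "return null for .txt/.md/.json" behavior comes from the commit message; the rewrite rule for other URLs (strip trailing slashes, append `.md`) is an assumption, and `toMarkdownUrlSketch` is a hypothetical name.

```typescript
// Extensions that are already plain text and should be fetched as-is.
const DIRECT_EXTENSIONS = [".txt", ".md", ".json"];

function toMarkdownUrlSketch(url: string): string | null {
  const lower = url.toLowerCase();
  // null tells the caller to skip the .md rewrite and fetch the URL directly.
  if (DIRECT_EXTENSIONS.some((extension) => lower.endsWith(extension))) {
    return null;
  }
  // Assumed rewrite rule: drop trailing slashes and append ".md".
  // Query strings and fragments are not handled in this sketch.
  return url.replace(/\/+$/, "") + ".md";
}
```
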
search_mapbox_docs_tool no longer depends on the Algolia third-party
service. The hosted server shares a single Algolia free-tier quota across
all users, making it prone to throttling as usage grows.

The new implementation fetches 12 product llms.txt files in parallel
on first search (~220KB total), caches them via docCache (1-hour TTL),
and does in-memory keyword matching on subsequent calls — no network
calls after warm-up, no quota, no third-party dependency.

Products indexed: API Reference, GL JS, Help Center, Style Spec, Studio
Manual, Search JS, Maps SDK iOS/Android, Navigation SDK iOS/Android,
Mapbox Tiling Service, Tilesets.

Scoring: title matches (3x) > description matches (1x) = URL path (1x).
Results are deduplicated by URL across sources and capped at limit.
Failed sources are skipped gracefully.

Also adds fetchCachedText() to docFetcher.ts as a shared helper for
fetching and caching plain-text URLs (used by both the search index
and the resource layer). Fixes a subtle bug where empty-string cache
entries were not recognized as hits due to falsy check on ''.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mattpodwysocki mattpodwysocki requested a review from a team as a code owner April 14, 2026 17:46
mattpodwysocki and others added 4 commits April 14, 2026 13:52
cspell flagged capitalized 'Isochrone' as an unknown word in two
resource description strings. Lowercased to fix CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cspell doesn't know 'isochrone' by default. Adding both cases to the
project word list so the CI spellcheck passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Brings in sublevel llms.txt resources, docs-index-tool, and spellcheck fixes
- Resolves conflict in fetchCachedText: keep !== null cache check and
  response.text() from HEAD; remove Accept header that caused CDN 403

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
.claude/worktrees/exciting-sutherland was accidentally staged during the
merge commit. Removed and added .claude/ to .gitignore to prevent recurrence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>