feat: replace Algolia search with self-contained llms.txt search #28
Open
mattpodwysocki wants to merge 8 commits into main
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs.mapbox.com restructured so that llms.txt now exists at every product level (e.g. /api/llms.txt, /help/llms.txt, /mapbox-gl-js/llms.txt) and llms-full.txt contains full page content. The root llms.txt is now a pure link index, so resources that filtered it by category keyword were returning empty content.

- resource://mapbox-api-reference → docs.mapbox.com/api/llms.txt (structured index of all REST APIs by service category)
- resource://mapbox-guides → docs.mapbox.com/help/llms.txt (39KB Help Center index with actual guide content)
- resource://mapbox-sdk-docs → docs.mapbox.com/mapbox-gl-js/llms.txt (34KB GL JS documentation index with guides, API ref, examples)
- resource://mapbox-reference → root llms.txt without filtering (complete product catalog for discovering available docs)
- resource://mapbox-examples → continues filtering root for playground/demo sections

Add fetchCachedText() shared helper to docFetcher to consolidate the repeated fetch+cache pattern across all resource implementations.

Fix toMarkdownUrl() to return null for URLs already ending in .txt/.md/.json so get_document_tool fetches llms.txt files directly without a wasted .md rewrite attempt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
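For reviewers, the toMarkdownUrl() fix in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the commit only states that the function returns null for URLs already ending in .txt/.md/.json, so the rewrite rule shown here (strip trailing slashes, append ".md") and the handling of query strings are assumptions.

```typescript
// Hypothetical sketch of toMarkdownUrl(); the real implementation may differ.
function toMarkdownUrl(url: string): string | null {
  // Ignore any query string or fragment when inspecting the extension
  // (assumption: the rewritten URL does not need them).
  const base = url.split(/[?#]/)[0];

  // Already a raw text resource: return null so the caller fetches it as-is
  // instead of attempting a pointless ".md" rewrite.
  if (/\.(txt|md|json)$/i.test(base)) {
    return null;
  }

  // Assumed rewrite rule: drop trailing slashes and append ".md".
  return base.replace(/\/+$/, '') + '.md';
}
```

With this shape, a caller such as get_document_tool can treat null as "fetch the original URL directly" and any string as "try the markdown variant first".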
search_mapbox_docs_tool no longer depends on the Algolia third-party service. The hosted server shares a single Algolia free-tier quota across all users, making it prone to throttling as usage grows.

The new implementation fetches 12 product llms.txt files in parallel on first search (~220KB total), caches them via docCache (1-hour TTL), and does in-memory keyword matching on subsequent calls: no network calls after warm-up, no quota, no third-party dependency.

Products indexed: API Reference, GL JS, Help Center, Style Spec, Studio Manual, Search JS, Maps SDK iOS/Android, Navigation SDK iOS/Android, Mapbox Tiling Service, Tilesets.

Scoring: title matches (3x) > description matches (1x) = URL path (1x). Results are deduplicated by URL across sources and capped at limit. Failed sources are skipped gracefully.

Also adds fetchCachedText() to docFetcher.ts as a shared helper for fetching and caching plain-text URLs (used by both the search index and the resource layer). Fixes a subtle bug where empty-string cache entries were not recognized as hits due to falsy check on ''.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
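The scoring and deduplication described in the commit above can be sketched as below. This is an illustrative approximation only: the DocEntry shape, the function names, substring matching per query term, and keeping the highest-scoring duplicate are all assumptions, not the contents of docsSearchIndex.ts.

```typescript
// Hypothetical entry shape parsed from an llms.txt index line.
interface DocEntry {
  title: string;
  url: string;
  description: string;
}

// Weighted keyword score: title hits count 3x, description and URL path 1x each.
function scoreEntry(entry: DocEntry, terms: string[]): number {
  const title = entry.title.toLowerCase();
  const description = entry.description.toLowerCase();
  // Strip scheme and host so only the path is matched.
  const path = entry.url.replace(/^https?:\/\/[^/]+/, '').toLowerCase();

  let score = 0;
  for (const term of terms) {
    if (title.includes(term)) score += 3;
    if (description.includes(term)) score += 1;
    if (path.includes(term)) score += 1;
  }
  return score;
}

// Score every entry, keep one result per URL, sort by score, cap at `limit`.
function searchEntries(entries: DocEntry[], query: string, limit: number): DocEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const bestByUrl = new Map<string, { entry: DocEntry; score: number }>();

  for (const entry of entries) {
    const score = scoreEntry(entry, terms);
    if (score === 0) continue; // no term matched anywhere
    const seen = bestByUrl.get(entry.url);
    // Assumption: when a URL appears in several indexes, keep the best-scoring copy.
    if (!seen || score > seen.score) bestByUrl.set(entry.url, { entry, score });
  }

  return Array.from(bestByUrl.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((result) => result.entry);
}
```

Because the same page can be listed in more than one product index, keying the intermediate map by URL is what keeps a single result per page before the limit is applied.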
cspell flagged capitalized 'Isochrone' as an unknown word in two resource description strings. Lowercased to fix CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cspell doesn't know 'isochrone' by default. Adding both cases to the project word list so the CI spellcheck passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Brings in sublevel llms.txt resources, docs-index-tool, and spellcheck fixes
- Resolves conflict in fetchCachedText: keep !== null cache check and response.text() from HEAD; remove Accept header that caused CDN 403

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
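The reason for keeping the !== null cache check can be illustrated with a small sketch. It is an assumption-laden stand-in, not the merged code: a plain Map replaces docCache (so there is no TTL), and the fetchText callback is a hypothetical substitute for the httpRequest parameter, whose real type is not shown in this PR.

```typescript
// Stand-in for docCache (assumption: the real cache also applies a 1-hour TTL).
const textCache = new Map<string, string>();

// Returns null only when the key is absent; '' is a legitimate cached value.
function getCached(url: string): string | null {
  const value = textCache.get(url);
  return value === undefined ? null : value;
}

// Hypothetical fetcher type standing in for the real httpRequest argument.
type FetchText = (url: string) => Promise<string>;

async function fetchCachedText(url: string, fetchText: FetchText): Promise<string> {
  const cached = getCached(url);
  // A truthiness test (`if (cached)`) would treat '' as a miss and refetch
  // every time; comparing against null recognizes empty bodies as hits.
  if (cached !== null) {
    return cached;
  }
  const text = await fetchText(url);
  textCache.set(url, text);
  return text;
}
```

With the truthiness check, a source that legitimately returns an empty body would be refetched on every search, which defeats the "no network calls after warm-up" goal.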
.claude/worktrees/exciting-sutherland was accidentally staged during the merge commit. Removed and added .claude/ to .gitignore to prevent recurrence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
search_mapbox_docs_tool no longer depends on the Algolia third-party service. The hosted server at mcp-docs.mapbox.com shares a single Algolia free-tier quota across every user, making it prone to throttling as adoption grows. This replaces it with a self-contained keyword search over the llms.txt index files that now exist at every product level on docs.mapbox.com (following the docs restructure in subdomain-docs-root#576).

How it works
On first search, 12 product llms.txt files are fetched in parallel (~220KB total). Each file contains a structured list of documentation pages with titles, URLs, and one-line descriptions. These are cached via the existing docCache for the 1-hour TTL. All subsequent searches are pure in-memory keyword matching, with zero network calls after warm-up.

Products indexed (~220KB total, all cached):
- api/llms.txt
- mapbox-gl-js/llms.txt
- help/llms.txt
- style-spec/llms.txt
- studio-manual/llms.txt
- mapbox-search-js/llms.txt
- ios/maps/llms.txt and android/maps/llms.txt
- ios/navigation/llms.txt and android/navigation/llms.txt
- mapbox-tiling-service/llms.txt
- data/tilesets/llms.txt

Scoring: Title matches rank 3×, description and URL path matches 1× each. Results are deduplicated by URL across sources (same page can appear in multiple product indexes) and capped at limit.

Reliability: Failed sources are silently skipped; if any single llms.txt is unreachable, the remaining 11 still return results. No single point of failure.

Trade-offs vs Algolia
For typical developer queries ("directions API parameters", "add marker GL JS", "style spec fill color") keyword matching against structured titles and descriptions is effective. The output format is the same shape (numbered list with title, URL, and description), so get_document_tool chaining works identically.

Also included
fetchCachedText(url, httpRequest): new helper in docFetcher.ts for fetching and caching plain-text URLs. Shared between docsSearchIndex.ts and (in PR #27) the resource layer. Fixes a subtle bug where empty-string cache entries weren't recognized as hits due to a falsy check on ''; it now uses !== null.

Test plan
- npm test: 68 tests pass (including 8 new SearchDocsTool tests, 7 new docsSearchIndex tests)
- "Search Mapbox docs for isochrone"
🤖 Generated with Claude Code