feat: routing overhaul #70
Open
vishalveerareddy123 wants to merge 5 commits into
Open
Conversation
Implements docs/routing-improvement-plan.md across one branch.
Phase 1 — Plug the open loops (default-on):
1.1 src/routing/tokenizer.js — js-tiktoken w/ chars/4 fallback;
replaces estimator in complexity-analyzer.js and api/router.js
1.2 cost-optimizer wired into routing/index.js; picks cheaper qualifying
model when ≥25% cheaper and risk!=high (LYNKR_COST_OPTIMIZE=false to
disable)
1.3 src/routing/context-validator.js — escalates to context-capable model
when estimated tokens exceed 85% of selected model's window
1.4 scripts/calibrate-thresholds.js — nightly job; ranges read from
data/calibrated-thresholds.json by model-tiers.js
1.5 latency-tracker keyed by provider:model with backward-compat wildcard;
databricks.js call sites pass model
Phase 2 — Pre-router primitives:
2.1 cache/semantic.js bumped to 10K entries; short-TTL keyword override
for time-sensitive queries
2.2 scripts/refresh-pricing.js — cron-friendly refresh + diff with >5%
threshold alerting
2.3 scripts/learn-output-ratios.js + routing/output-ratios.js — per-task
ratio table; cost-optimizer.estimateCost reads via ratioFor(taskType)
Phase 3 — Learned scoring:
3.1 routing/knn-router.js + embedding-cache.js (hnswlib-node backed);
scripts/build-knn-index.js with optional RouterBench bootstrap;
empty/sparse → null; caller falls back to heuristic
3.3 routing/cascade.js + confidence-scorer.js — small-first cascade;
off by default for streaming/tools, LYNKR_CASCADE_ENABLED=true
3.4 routing/risk-classifier.js + scripts/train-risk-classifier.js — LR
over TF features; never downgrades regex-flagged high risk
Phase 4 — Online adaptation:
4.1 routing/bandit.js (LinUCB) + reward-pipeline.js; state in
data/bandit-state.json
4.2 routing/regret-estimator.js + scripts/sample-regret.js; opt-in via
LYNKR_REGRET_ESTIMATOR=true (costs $ for Opus re-runs)
4.3 routing/drift-monitor.js — PSI over input/output distributions;
alerts to data/drift-alerts.json
4.4 routing/shadow-mode.js + scripts/compare-policies.js — A/B without
serving shadow decisions; LYNKR_SHADOW_POLICY=<name> to activate
Cross-cutting (Phase 6):
6.1 routing/tenant-policy.js + api/middleware/tenant.js; per-tenant
configs in data/tenants/<id>.json; LYNKR-Tenant-Id header
6.2 budget/hierarchical-budget.js + api/middleware/budget-enforcer.js;
virtual_key/team/customer/org levels via in-process Map (Redis stub)
6.3 routing/deadline.js — P95-aware filtering keyed off LYNKR-Deadline-Ms
Stubs (deferred per plan):
6.4 scripts/run-routerarena.js — entrypoint; CI integration not wired
Deps added: js-tiktoken ^1.0.20, hnswlib-node ^3.0.0
Tests: 8 new test files covering tokenizer, bandit, drift, budget, cascade,
tenant policy, deadline routing, and output ratios. Full unit suite passes
(756/756).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- routing/index.js: replace risk-analyzer with risk-classifier; add kNN query override (confidence > 0.7), LinUCB bandit intra-tier selection, deadline-aware chooseFastest filter, per-tenant applyTenantOverrides, and shadow-mode fire-and-forget compareAndLog - databricks.js invokeModel: add small-first cascade (LYNKR_CASCADE_ENABLED) with _cascadeInner guard to prevent recursion - orchestrator/index.js runAgentLoop: thread _deadlineMs from lynkr-deadline-ms header and _tenantPolicy from options onto cleanPayload - databricks.js invokeModel: pass tenantPolicy from body._tenantPolicy into determineProviderSmart options - router.js: pass res.locals.tenantPolicy to processMessage options (both streaming and buffered paths) - server.js: mount tenantMiddleware and budgetEnforcer on /v1/messages Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two NadirClaw-inspired safety features: 1. kNN ambiguous escalation (confidence 0.4-0.7): When kNN neighbors are split and no model has clear majority, bump tier one step up (SIMPLE→MEDIUM→COMPLEX→REASONING) to err on the side of quality over cost. REASONING tier is never escalated further. 2. Vision routing guard (Phase 1.4): When payload contains image content blocks and selected model lacks vision support, automatically upgrade to cheapest vision-capable model at or above current tier. Prevents silent upstream failures. Changes: - src/routing/index.js: add _payloadHasImages() helper, kNN ambiguous escalation block, and Phase 1.4 vision guard (slots after context validation, before kNN routing) - src/routing/model-tiers.js: add findVisionCapable() method (walks tier order from preferred upward, checks registry.getCost(model).vision) - test/knn-ambiguous-escalate.test.js: 8 tests covering boundary conditions, REASONING ceiling, missing config fallback - test/vision-routing-guard.test.js: 8 tests covering image/image_url detection, same-tier upgrade, cross-tier escalation, no-model warning All tests pass (16/16 new, 740/758 total). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added detailed documentation for MCP Code Mode across three key files:
1. documentation/token-optimization.md:
- New Phase 0: MCP Code Mode (96% reduction for MCP tools)
- Full workflow example (discover → inspect → execute)
- Token savings calculation (17,500 → 700 tokens)
- Trade-offs and configuration
- Updated phase numbering (6 → 7 optimization phases)
- Headroom becomes Phase 8
2. documentation/tools.md:
- New section: "MCP Code Mode (Token Optimization)"
- When to use vs skip Code Mode
- Integration with Smart Tool Selection
- Integration with Headroom compression pipeline
- Full workflow examples with JSON
3. README.md:
- Updated "Token Optimization (8 Phases)" section
- Enhanced "MCP Integration + Code Mode" section with:
* Token reduction details (17,500 → 700 tokens)
* Lazy tool discovery workflow
* Use cases and trade-offs
* Links to detailed documentation
All docs now explain:
- 96% token reduction for MCP-heavy setups
- 4 meta-tools (mcp_list_tools, mcp_tool_info, mcp_tool_docs, mcp_execute)
- Pipeline position: Code Mode → Smart Tool Selection → Headroom
- Trade-off: 3 sequential calls vs 1 direct (adds ~2-3s latency)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ures Added "Routing Safety Features" section to routing.md documenting: 1. Vision Capability Guard: - Automatic upgrade when images detected + model lacks vision - Tier escalation if no vision model at current tier - Example: ollama:llama3.2 → anthropic:claude-sonnet-4-6 - Method tag: +vision_guard 2. kNN Ambiguous Confidence Escalation: - When kNN confidence 0.4-0.7 (split neighbors) → escalate tier - Confidence >0.7 → use kNN model directly - Confidence ≤0.4 → ignore kNN - Example: MEDIUM → COMPLEX when neighbors split - Method tag: +knn_ambiguous_escalate Updated routing decision flow (12 → 19 steps) to include: - Step 13: Vision capability guard - Step 14: kNN routing with ambiguous escalation - Risk analysis, context escalation, LinUCB, deadline, tenant policy No external references, pure technical documentation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Complete routing overhaul spanning 6 phases plus two NadirClaw-inspired safety features:
Phase 1-2: Complexity analysis, tier mapping, cost optimization, context validation
Phase 3-6: kNN router, LinUCB bandit, cascade mode, deadline routing, tenant policy, budget enforcement, shadow A/B testing
NadirClaw safety:
Key Commits
3ba5c2fFull routing overhaul (phases 1-4 + cross-cutting)d5149adWire phases 3-6 into live request path64da51bkNN ambiguous escalation + vision guardTechnical Details
kNN Ambiguous Escalation
When kNN neighbors are split (confidence 0.4-0.7), bump tier one step up:
+knn_ambiguous_escalateVision Routing Guard (Phase 1.4)
Slots after context validation, before kNN routing:
_payloadHasImages()checks fortype: 'image'or'image_url'blocksvision: truein registry, callsselector.findVisionCapable(tier)+vision_guardIntegration Points
determineProviderSmartfindVisionCapable(preferredTier)walks tier order upwardTesting
Deployment Notes
LYNKR_KNN_ENABLED,LYNKR_CASCADE_ENABLED,LYNKR_SHADOW_POLICYTIER_SIMPLE,TIER_MEDIUM,TIER_COMPLEX,TIER_REASONINGvision: boolfield (already populated)🤖 Generated with Claude Code