Skip to content

feat: routing overhaul #70

Open
vishalveerareddy123 wants to merge 5 commits into
mainfrom
feat/routing-overhaul-v1
Open

feat: routing overhaul #70
vishalveerareddy123 wants to merge 5 commits into
mainfrom
feat/routing-overhaul-v1

Conversation

@vishalveerareddy123
Copy link
Copy Markdown
Collaborator

Summary

Complete routing overhaul spanning 6 phases plus two NadirClaw-inspired safety features:

Phase 1-2: Complexity analysis, tier mapping, cost optimization, context validation
Phase 3-6: kNN router, LinUCB bandit, cascade mode, deadline routing, tenant policy, budget enforcement, shadow A/B testing
NadirClaw safety:

  • kNN ambiguous escalation: confidence 0.4-0.7 → tier bump (quality over cost)
  • Vision routing guard: auto-upgrade to vision models when payload has images

Key Commits

  • 3ba5c2f Full routing overhaul (phases 1-4 + cross-cutting)
  • d5149ad Wire phases 3-6 into live request path
  • 64da51b kNN ambiguous escalation + vision guard

Technical Details

kNN Ambiguous Escalation

When kNN neighbors are split (confidence 0.4-0.7), bump tier one step up:

  • SIMPLE → MEDIUM → COMPLEX → REASONING
  • REASONING never escalates (ceiling)
  • Method tag: +knn_ambiguous_escalate

Vision Routing Guard (Phase 1.4)

Slots after context validation, before kNN routing:

  • _payloadHasImages() checks for type: 'image' or 'image_url' blocks
  • If selected model lacks vision: true in registry, calls selector.findVisionCapable(tier)
  • Upgrades to cheapest vision model at or above current tier
  • Method tag: +vision_guard

Integration Points

  • routing/index.js: both features in determineProviderSmart
  • model-tiers.js: findVisionCapable(preferredTier) walks tier order upward
  • Test coverage: 16 new tests (8 kNN, 8 vision), all passing

Testing

node --test test/knn-ambiguous-escalate.test.js   # 8/8 pass
node --test test/vision-routing-guard.test.js      # 8/8 pass
node --test test/*.test.js                         # 740/758 pass (18 pre-existing failures)

Deployment Notes

  • Feature flags: LYNKR_KNN_ENABLED, LYNKR_CASCADE_ENABLED, LYNKR_SHADOW_POLICY
  • Tier config: TIER_SIMPLE, TIER_MEDIUM, TIER_COMPLEX, TIER_REASONING
  • Requires model registry with vision: bool field (already populated)

🤖 Generated with Claude Code

vishal veerareddy and others added 3 commits May 20, 2026 17:25
Implements docs/routing-improvement-plan.md across one branch.

Phase 1 — Plug the open loops (default-on):
  1.1 src/routing/tokenizer.js — js-tiktoken w/ chars/4 fallback;
      replaces estimator in complexity-analyzer.js and api/router.js
  1.2 cost-optimizer wired into routing/index.js; picks cheaper qualifying
      model when ≥25% cheaper and risk!=high (LYNKR_COST_OPTIMIZE=false to
      disable)
  1.3 src/routing/context-validator.js — escalates to context-capable model
      when estimated tokens exceed 85% of selected model's window
  1.4 scripts/calibrate-thresholds.js — nightly job; ranges read from
      data/calibrated-thresholds.json by model-tiers.js
  1.5 latency-tracker keyed by provider:model with backward-compat wildcard;
      databricks.js call sites pass model

Phase 2 — Pre-router primitives:
  2.1 cache/semantic.js bumped to 10K entries; short-TTL keyword override
      for time-sensitive queries
  2.2 scripts/refresh-pricing.js — cron-friendly refresh + diff with >5%
      threshold alerting
  2.3 scripts/learn-output-ratios.js + routing/output-ratios.js — per-task
      ratio table; cost-optimizer.estimateCost reads via ratioFor(taskType)

Phase 3 — Learned scoring:
  3.1 routing/knn-router.js + embedding-cache.js (hnswlib-node backed);
      scripts/build-knn-index.js with optional RouterBench bootstrap;
      empty/sparse → null; caller falls back to heuristic
  3.3 routing/cascade.js + confidence-scorer.js — small-first cascade;
      off by default for streaming/tools, LYNKR_CASCADE_ENABLED=true
  3.4 routing/risk-classifier.js + scripts/train-risk-classifier.js — LR
      over TF features; never downgrades regex-flagged high risk

Phase 4 — Online adaptation:
  4.1 routing/bandit.js (LinUCB) + reward-pipeline.js; state in
      data/bandit-state.json
  4.2 routing/regret-estimator.js + scripts/sample-regret.js; opt-in via
      LYNKR_REGRET_ESTIMATOR=true (costs $ for Opus re-runs)
  4.3 routing/drift-monitor.js — PSI over input/output distributions;
      alerts to data/drift-alerts.json
  4.4 routing/shadow-mode.js + scripts/compare-policies.js — A/B without
      serving shadow decisions; LYNKR_SHADOW_POLICY=<name> to activate

Cross-cutting (Phase 6):
  6.1 routing/tenant-policy.js + api/middleware/tenant.js; per-tenant
      configs in data/tenants/<id>.json; LYNKR-Tenant-Id header
  6.2 budget/hierarchical-budget.js + api/middleware/budget-enforcer.js;
      virtual_key/team/customer/org levels via in-process Map (Redis stub)
  6.3 routing/deadline.js — P95-aware filtering keyed off LYNKR-Deadline-Ms

Stubs (deferred per plan):
  6.4 scripts/run-routerarena.js — entrypoint; CI integration not wired

Deps added: js-tiktoken ^1.0.20, hnswlib-node ^3.0.0

Tests: 8 new test files covering tokenizer, bandit, drift, budget, cascade,
tenant policy, deadline routing, and output ratios. Full unit suite passes
(756/756).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- routing/index.js: replace risk-analyzer with risk-classifier; add kNN
  query override (confidence > 0.7), LinUCB bandit intra-tier selection,
  deadline-aware chooseFastest filter, per-tenant applyTenantOverrides,
  and shadow-mode fire-and-forget compareAndLog
- databricks.js invokeModel: add small-first cascade (LYNKR_CASCADE_ENABLED)
  with _cascadeInner guard to prevent recursion
- orchestrator/index.js runAgentLoop: thread _deadlineMs from
  lynkr-deadline-ms header and _tenantPolicy from options onto cleanPayload
- databricks.js invokeModel: pass tenantPolicy from body._tenantPolicy into
  determineProviderSmart options
- router.js: pass res.locals.tenantPolicy to processMessage options (both
  streaming and buffered paths)
- server.js: mount tenantMiddleware and budgetEnforcer on /v1/messages

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two NadirClaw-inspired safety features:

1. kNN ambiguous escalation (confidence 0.4-0.7):
   When kNN neighbors are split and no model has clear majority, bump
   tier one step up (SIMPLE→MEDIUM→COMPLEX→REASONING) to err on the
   side of quality over cost. REASONING tier is never escalated further.

2. Vision routing guard (Phase 1.4):
   When payload contains image content blocks and selected model lacks
   vision support, automatically upgrade to cheapest vision-capable
   model at or above current tier. Prevents silent upstream failures.

Changes:
- src/routing/index.js: add _payloadHasImages() helper, kNN ambiguous
  escalation block, and Phase 1.4 vision guard (slots after context
  validation, before kNN routing)
- src/routing/model-tiers.js: add findVisionCapable() method (walks
  tier order from preferred upward, checks registry.getCost(model).vision)
- test/knn-ambiguous-escalate.test.js: 8 tests covering boundary
  conditions, REASONING ceiling, missing config fallback
- test/vision-routing-guard.test.js: 8 tests covering image/image_url
  detection, same-tier upgrade, cross-tier escalation, no-model warning

All tests pass (16/16 new, 740/758 total).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@veerareddyvishal144 veerareddyvishal144 changed the title feat: routing overhaul (phases 1-6 + NadirClaw-inspired safety) feat: routing overhaul May 28, 2026
vishal veerareddy and others added 2 commits May 27, 2026 17:41
Added detailed documentation for MCP Code Mode across three key files:

1. documentation/token-optimization.md:
   - New Phase 0: MCP Code Mode (96% reduction for MCP tools)
   - Full workflow example (discover → inspect → execute)
   - Token savings calculation (17,500 → 700 tokens)
   - Trade-offs and configuration
   - Updated phase numbering (6 → 7 optimization phases)
   - Headroom becomes Phase 8

2. documentation/tools.md:
   - New section: "MCP Code Mode (Token Optimization)"
   - When to use vs skip Code Mode
   - Integration with Smart Tool Selection
   - Integration with Headroom compression pipeline
   - Full workflow examples with JSON

3. README.md:
   - Updated "Token Optimization (8 Phases)" section
   - Enhanced "MCP Integration + Code Mode" section with:
     * Token reduction details (17,500 → 700 tokens)
     * Lazy tool discovery workflow
     * Use cases and trade-offs
     * Links to detailed documentation

All docs now explain:
- 96% token reduction for MCP-heavy setups
- 4 meta-tools (mcp_list_tools, mcp_tool_info, mcp_tool_docs, mcp_execute)
- Pipeline position: Code Mode → Smart Tool Selection → Headroom
- Trade-off: 3 sequential calls vs 1 direct (adds ~2-3s latency)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ures

Added "Routing Safety Features" section to routing.md documenting:

1. Vision Capability Guard:
   - Automatic upgrade when images detected + model lacks vision
   - Tier escalation if no vision model at current tier
   - Example: ollama:llama3.2 → anthropic:claude-sonnet-4-6
   - Method tag: +vision_guard

2. kNN Ambiguous Confidence Escalation:
   - When kNN confidence 0.4-0.7 (split neighbors) → escalate tier
   - Confidence >0.7 → use kNN model directly
   - Confidence ≤0.4 → ignore kNN
   - Example: MEDIUM → COMPLEX when neighbors split
   - Method tag: +knn_ambiguous_escalate

Updated routing decision flow (12 → 19 steps) to include:
- Step 13: Vision capability guard
- Step 14: kNN routing with ambiguous escalation
- Risk analysis, context escalation, LinUCB, deadline, tenant policy

No external references, pure technical documentation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant