Add $select filtering and ref-aware schemas for tool chaining#3056
Add $select filtering and ref-aware schemas for tool chaining#3056tim-inkeep merged 8 commits intomainfrom
Conversation
🦋 Changeset detectedLatest commit: a31de87 The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
TL;DR — Adds a Key changes
Summary | 18 files | 2 commits | base:
|
|
TL;DR — Adds Key changes
Summary | 22 files | 8 commits | base: JMESPath
|
There was a problem hiding this comment.
Solid feature — the $select JMESPath filtering, ref-aware schemas, and security validation are well-designed. Two issues warrant attention before merging: the dead template files that should be deleted, and the duplicated sanitizeJMESPathSelector + JMESPath usage in ArtifactService that the new select-filter.ts was explicitly extracted to replace. One minor runtime safety issue in the error path.
Claude Opus | 𝕏
| import dataComponentTemplate from '../../../../../../templates/v1/shared/data-component.xml?raw'; | ||
| import dataComponentsTemplate from '../../../../../../templates/v1/shared/data-components.xml?raw'; | ||
|
|
||
| const systemPromptTemplate = |
There was a problem hiding this comment.
The six XML template files under agents-api/templates/v1/ are now dead code — they're no longer imported from anywhere. The inlined constants here contain updated content (tool chaining guidance, $select references) that diverges from the on-disk files. Delete the templates/v1/ directory so there's no stale source of confusion.
| * Sanitize JMESPath selector to fix common LLM syntax issues. | ||
| * Extracted from ArtifactService for reuse in $select filtering. | ||
| */ | ||
| export function sanitizeJMESPathSelector(selector: string): string { |
There was a problem hiding this comment.
ArtifactService still has its own private sanitizeJMESPathSelector (line 854) with identical logic and a separate static selectorCache. The comment on line 31 here says "Extracted from ArtifactService for reuse" but the original was never removed. This means two independent caches, two copies of the same four regexes, and — critically — ArtifactService's direct jmespath.search() calls (lines 225, 1048) bypass the security validation (DANGEROUS_PATTERNS, MAX_EXPRESSION_LENGTH) that applySelector enforces.
Consolidate: have ArtifactService import sanitizeJMESPathSelector (and ideally applySelector) from this module, then remove its private method and static cache.
| if (!validation.success) { | ||
| const mismatchDetails = Object.entries(resolvedArgs as Record<string, unknown>) |
There was a problem hiding this comment.
Object.entries(resolvedArgs as Record<string, unknown>) is unsafe if resolvedArgs is a primitive. When the entire args object is a single $tool + $select ref (e.g., { "$tool": "call_x", "$select": "items[0].name" }), resolveArgs returns the selected value directly — which could be a string, number, or array. Calling Object.entries on a string iterates its character indices, producing misleading error output.
Guard with a type check:
const mismatchDetails = resolvedArgs && typeof resolvedArgs === 'object' && !Array.isArray(resolvedArgs)
? Object.entries(resolvedArgs as Record<string, unknown>)
.map(([key, val]) => { ... })
.join(', ')
: `resolved to ${Array.isArray(resolvedArgs) ? 'array' : typeof resolvedArgs}`;| baseInputSchema: ReturnType<typeof z.fromJSONSchema> | undefined; | ||
| } { | ||
| try { | ||
| const rawJson = z.toJSONSchema(inputSchema as z.ZodType) as Record<string, unknown>; |
There was a problem hiding this comment.
z.toJSONSchema(inputSchema as z.ZodType) assumes inputSchema is a Zod schema, but MCP tool schemas from the protocol are plain JSON Schema objects — they're only converted to Zod via z.fromJSONSchema at registration time. If inputSchema is already a Zod type this works, but if it's a raw JSON Schema object, z.toJSONSchema will fail or produce incorrect output.
The function-tools path (function-tools.ts:89-106) correctly uses functionData.inputSchema (raw JSON Schema) directly with makeRefAwareJsonSchema. Verify this path handles both Zod and raw JSON Schema inputs, or add a type guard.
| } | ||
| } | ||
| if (changed) { | ||
| (span as any).attributes['ai.toolCall.result'] = JSON.stringify(parsed); |
There was a problem hiding this comment.
Mutating span.attributes via (span as any).attributes[...] after onEnd relies on the span object being mutable at this point. ReadableSpan is — by design — read-only. This works today because the JS object is still writable, but it's fragile: future OTEL SDK versions could freeze the attributes or use a different internal representation. Consider whether this sanitization should happen earlier (e.g., in a custom SpanExporter wrapper or before the span is finalized) for long-term stability.
| description: TOOL_CHAINING_SCHEMA_DESCRIPTIONS.SELECT, | ||
| }, | ||
| }, | ||
| additionalProperties: true, |
There was a problem hiding this comment.
additionalProperties: true on both ref schema variants means the LLM can include arbitrary extra keys alongside $tool/$artifact/$select without validation failure. This is intentional (the LLM might include other tool params in the same object), but it also means typos in sentinel keys (e.g., $selec) will silently pass validation. Just confirming this is the desired tradeoff — if so, no action needed.
Ito Test Report ✅8 test cases ran. 1 additional finding, 7 passed. Overall, the unified QA run covered 8 verifiable test cases with 7 passes and 1 failure (plus one additional code-review-only check with no customer-facing pass/fail assertions). The strongest findings were that tool-chaining selector behavior is largely correct and robust (primitive extraction via $tool+$select, stripping internal underscore fields, expected ToolChainResolutionError handling for no-match/malformed/type-mismatch selectors, stable rapid re-submits with distinct chat IDs despite provider credential errors, and rejection of dangerous proto selectors), but a high-severity defect remains where POST /run/api/chat can throw a 500 “Context validation failed” when agent context is unresolved because validation dereferences a missing agent before selector sanitization. ✅ Passed (7)ℹ️ Additional Findings (1)
|
There was a problem hiding this comment.
PR Review Summary
(6) Total Issues | Risk: Medium
🟠⚠️ Major (4) 🟠⚠️
Inline Comments:
- 🟠 Major:
select-filter.ts:9-18JSDoc comment misplaced — describes stripInternalFields but precedes clearSelectorCache - 🟠 Major:
select-filter.ts:24-31DANGEROUS_PATTERNS duplicated from agents-core — should import from@inkeep/agents-core/utils/jmespath-utils - 🟠 Major:
function-tools.ts:89-94Silent schema building failure — no logging when baseInputSchema creation fails - 🟠 Major:
mcp-tools.ts:28-33Silent schema transformation failure — catch block swallows all errors with no logging
🟡 Minor (2) 🟡
Inline Comments:
- 🟡 Minor:
instrumentation.ts:100Empty catch block silently swallows JSON parsing errors - 🟡 Minor:
instrumentation.ts:97Mutating ReadableSpan attributes violates OpenTelemetry contract
💭 Consider (3) 💭
💭 1) select-filter.ts Missing unit tests for stripInternalFields
Issue: The stripInternalFields function (lines 13-18) has no unit tests despite being used in critical paths (tool-wrapper.ts, ArtifactService.ts).
Why: Regressions could cause internal fields like _structureHints to leak into stored tool results or traces.
Fix: Add tests to select-filter.test.ts:
describe('stripInternalFields', () => {
it('should remove underscore-prefixed keys', () => {
expect(stripInternalFields({ _internal: 1, visible: 2 })).toEqual({ visible: 2 });
});
it('should return non-objects unchanged', () => {
expect(stripInternalFields('string')).toBe('string');
expect(stripInternalFields([1, 2])).toEqual([1, 2]);
});
});💭 2) resolveArgs-select.test.ts Test mocks getToolResultRaw but implementation calls getToolResultFull for $select
Issue: Tests at line 46 mock getToolResultRaw for $select scenarios, but the implementation (ArtifactParser.ts:259-262) calls getToolResultFull when $select is present. The test passes only because the helper falls back.
Why: This masks potential bugs if the two methods diverge.
Fix: Explicitly mock getToolResultFull when testing $select scenarios.
💭 3) tool-wrapper.ts:207-228 Missing integration test for post-resolution validation error path
Issue: The enhanced error message for type mismatches after $select resolution has no test coverage.
Why: This is a key user-facing error path for LLM self-correction.
🧹 While You're Here (1) 🧹
🧹 1) agents-api/templates/v1/ Dead template files should be deleted
Issue: Six XML template files still exist in agents-api/templates/v1/ but are no longer imported anywhere — the templates were inlined as string constants in PromptConfig.ts.
Why: These are now dead code that could cause confusion. Grep confirms no TypeScript files import from templates/v1.
Fix: Delete the agents-api/templates/v1/ directory.
Refs: Pullfrog review comment
💡 APPROVE WITH SUGGESTIONS
Summary: This is a well-designed feature that adds $select JMESPath filtering and ref-aware schemas for tool chaining. The core implementation is solid with good security validation and test coverage. The main actionable items are: (1) add logging to the silent catch blocks in schema building to aid debugging, (2) import DANGEROUS_PATTERNS from agents-core instead of duplicating, and (3) clean up the dead template files. None of these are blocking, but addressing the logging gaps would significantly improve debuggability.
Discarded (5)
| Location | Issue | Reason Discarded |
|---|---|---|
mcp-tools.ts:24 |
z.toJSONSchema cast may fail for non-Zod schemas |
Handled correctly by try-catch fallback; the catch block returns original schema |
instrumentation.ts:81 |
onStart param should be Span not ReadableSpan | No runtime impact — method body is empty |
agent-types.ts:260 |
baseInputSchema type duplicates parameters type shape | Minor DRY violation but localized and simple |
select-filter.ts:133 |
Error message may leak data structure info | Intentional for LLM feedback; acceptable for use case |
ref-aware-schema.ts:56 |
additionalProperties: true allows typos in sentinel keys | Intentional tradeoff confirmed — allows other tool params in same object |
Reviewers (5)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-standards |
3 | 0 | 0 | 0 | 2 | 0 | 1 |
pr-review-appsec |
4 | 0 | 0 | 0 | 2 | 0 | 2 |
pr-review-tests |
6 | 0 | 3 | 0 | 0 | 0 | 0 |
pr-review-errors |
5 | 0 | 0 | 0 | 2 | 0 | 2 |
pr-review-types |
5 | 0 | 0 | 0 | 0 | 0 | 5 |
| Total | 23 | 0 | 3 | 1 | 6 | 0 | 10 |
Note: Many type-related findings were discarded as they are either handled by existing fallbacks or are minor expressiveness issues with no runtime impact.
| export function clearSelectorCache(): void { | ||
| selectorCache.clear(); | ||
| } | ||
|
|
||
| export function stripInternalFields<T>(data: T): T { | ||
| if (data && typeof data === 'object' && !Array.isArray(data)) { | ||
| return Object.fromEntries(Object.entries(data).filter(([key]) => !key.startsWith('_'))) as T; | ||
| } | ||
| return data; | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: JSDoc comment misplaced — describes wrong function
Issue: The JSDoc block at lines 4-8 describes "Strip internal fields (prefixed with _) from a tool result object" but immediately precedes clearSelectorCache(), not stripInternalFields().
Why: This creates confusing documentation — developers reading this code will think clearSelectorCache strips internal fields when it actually clears the selector cache. The actual stripInternalFields function at line 13 has no documentation.
Fix:
| export function clearSelectorCache(): void { | |
| selectorCache.clear(); | |
| } | |
| export function stripInternalFields<T>(data: T): T { | |
| if (data && typeof data === 'object' && !Array.isArray(data)) { | |
| return Object.fromEntries(Object.entries(data).filter(([key]) => !key.startsWith('_'))) as T; | |
| } | |
| return data; | |
| } | |
| export function clearSelectorCache(): void { | |
| selectorCache.clear(); | |
| } | |
| /** | |
| * Strip internal fields (prefixed with _) from a tool result object. | |
| * Removes _structureHints, _toolCallId, etc. that are added for LLM context | |
| * but should not appear in traces, stored results, or downstream data. | |
| */ | |
| export function stripInternalFields<T>(data: T): T { |
Refs:
| const DANGEROUS_PATTERNS: RegExp[] = [ | ||
| /\$\{.*\}/, | ||
| /eval\s*\(/, | ||
| /function\s*\(/, | ||
| /constructor/, | ||
| /prototype/, | ||
| /__proto__/, | ||
| ]; |
There was a problem hiding this comment.
🟠 MAJOR: DANGEROUS_PATTERNS duplicated from agents-core
Issue: This array duplicates the exact same patterns defined in packages/agents-core/src/utils/jmespath-utils.ts (lines 191-198), which is already exported.
Why: Maintaining two copies of security-critical patterns creates drift risk. If one is updated and the other isn't, security gaps could emerge. The core package also exports validateJMESPathSecure() which could be used directly.
Fix: Import from agents-core instead of redefining:
import { DANGEROUS_PATTERNS } from '@inkeep/agents-core/utils/jmespath-utils';Or consider using validateJMESPathSecure() from the same module which combines length + pattern + compile validation.
Refs:
| if (functionData.inputSchema) { | ||
| try { | ||
| baseInputSchema = makeBaseInputSchema(functionData.inputSchema); | ||
| } catch { | ||
| // baseInputSchema stays undefined — post-resolution validation will be skipped | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: Silent schema building failure — no logging when baseInputSchema creation fails
Issue: When makeBaseInputSchema() throws, the error is silently swallowed. Post-resolution validation is silently disabled for this tool with no indication to operators.
Why: If schema parsing fails for a tool, that tool will accept any arguments after reference resolution, potentially causing downstream failures that are harder to debug. Operators have no visibility into which tools have disabled validation.
Fix: Add a warning log so operators know which tools have disabled post-resolution validation:
| if (functionData.inputSchema) { | |
| try { | |
| baseInputSchema = makeBaseInputSchema(functionData.inputSchema); | |
| } catch { | |
| // baseInputSchema stays undefined — post-resolution validation will be skipped | |
| } | |
| try { | |
| baseInputSchema = makeBaseInputSchema(functionData.inputSchema); | |
| } catch (schemaError) { | |
| logger.warn( | |
| { | |
| functionToolName: functionToolDef.name, | |
| schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError), | |
| }, | |
| 'Failed to build base input schema; post-resolution validation will be skipped' | |
| ); | |
| } |
Refs:
| } catch { | ||
| return { | ||
| refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, | ||
| baseInputSchema: undefined, | ||
| }; | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: Silent schema transformation failure — catch block swallows all errors
Issue: When z.toJSONSchema(), makeBaseInputSchema(), or makeRefAwareJsonSchema() throws, the error is silently swallowed with no logging. The tool loses ref-aware schema capabilities and post-resolution validation with no indication.
Why: This could hide type mismatches, invalid schemas, or other issues. MCP tools failing schema transformation will behave differently than expected with no trace or log to diagnose.
Fix: Add logging to indicate when schema transformation fails:
| } catch { | |
| return { | |
| refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, | |
| baseInputSchema: undefined, | |
| }; | |
| } | |
| } catch (schemaError) { | |
| logger.warn( | |
| { | |
| schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError), | |
| }, | |
| 'Failed to build ref-aware schema for MCP tool; falling back to original schema' | |
| ); | |
| return { | |
| refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, | |
| baseInputSchema: undefined, | |
| }; | |
| } |
Refs:
| (span as any).attributes['ai.toolCall.result'] = JSON.stringify(parsed); | ||
| } | ||
| } | ||
| } catch {} |
There was a problem hiding this comment.
🟡 Minor: Empty catch block silently swallows JSON parsing errors
Issue: When span.attributes['ai.toolCall.result'] contains malformed JSON, the catch block silently swallows the error with no logging.
Why: Developers debugging trace issues have no visibility into parsing failures. The span processor appears to work correctly when it may be silently failing on certain spans.
Fix: Add a comment acknowledging this is expected behavior, or add debug-level logging:
| } catch {} | |
| } catch { | |
| // Expected for non-JSON tool results - leave unchanged | |
| } |
Refs:
| } | ||
| } | ||
| if (changed) { | ||
| (span as any).attributes['ai.toolCall.result'] = JSON.stringify(parsed); |
There was a problem hiding this comment.
🟡 Minor: Mutating ReadableSpan attributes violates OpenTelemetry contract
Issue: ReadableSpan is designed to be read-only by the OpenTelemetry interface, but this code mutates its attributes via an as any cast.
Why: While this works in practice with the current SDK, it relies on implementation details rather than the public API. This could break if OpenTelemetry changes how spans are frozen. The type annotation on onStart also uses ReadableSpan when it should be Span.
Fix: Consider adding a comment acknowledging the contract violation and why it's necessary:
// Note: Mutating ReadableSpan attributes is technically a contract violation,
// but necessary for sanitization before export. This is a common pattern.
// If OTEL SDK changes break this, consider using a custom SpanExporter instead.Alternatively, move sanitization to onStart where the span is still mutable, though this would require detecting which spans are tool calls before execution completes.
Refs:
Preview URLsUse these stable preview aliases for testing this PR:
These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find. Raw Vercel deployment URLs
|
Ito Test Report ✅14 test cases ran. 14 passed. The unified run passed all 14 of 14 test cases with zero failures, confirming stable non-production behavior after local environment/auth remediation and deterministic project-agent seeding. Most importantly, core run-domain chat streaming (including OpenAI-style follow-ups) and durable execution reconnect worked end-to-end, selector-chain functionality and hardening checks behaved as designed (nested and optional $select, reference/schema handling, controlled errors, and proto blocking), and resilience flows across rapid desktop actions, mobile viewport use, and refresh/back-forward interruptions remained recoverable and usable. ✅ Passed (14)Commit: Tell us how we did: Give Ito Feedback |
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
This is a delta review scoped to changes since the prior automated review (commit c929c408a5b1).
Prior Review Issues — All Addressed ✅
The PR author addressed all actionable items from the prior review in commits 3d2371718 and 4e20019ad:
| Prior Issue | Status | Resolution |
|---|---|---|
🟠 Major: JSDoc comment misplaced in select-filter.ts |
✅ Fixed | JSDoc now correctly precedes stripInternalFields |
🟠 Major: DANGEROUS_PATTERNS duplicated from agents-core |
✅ Fixed | Now imports from @inkeep/agents-core/utils/signature-validation |
🟠 Major: Silent schema building failure in function-tools.ts |
✅ Fixed | Added logger.warn with tool name and error details |
🟠 Major: Silent schema transformation failure in mcp-tools.ts |
✅ Fixed | Added logger.warn with error details |
🟡 Minor: Empty catch block in instrumentation.ts |
✅ Fixed | Added explanatory comment: "Expected for non-JSON tool results" |
| 🟡 Minor: Mutating ReadableSpan attributes | ⚪ Not addressed | Acceptable — this is a known OTEL pattern |
| 🧹 Dead template files should be deleted | ✅ Resolved | Template files are now USED — PromptConfig.ts imports them instead of inlining |
Merged Main Changes
The delta also includes several commits merged from main that are outside this PR's scope:
feat(artifacts): binary attachment artifacts and blob-backed artifact hydration(#2745)fix: key durable approval decisions by toolCallId instead of toolName(#3062)fix: handle SSE failure when streaming approval request(#3064)refactor(test): create reusable logger mock factory(#3071)refactor(agents-core): remove logger DI from agentFull and projectFull(#3072)
These are independent changes already reviewed in their respective PRs.
💭 Consider (Carried Forward)
The following items from the prior review remain as optional improvements:
💭 1) select-filter.ts Missing unit tests for stripInternalFields
The stripInternalFields function has no unit tests. This is a nice-to-have for completeness but not blocking.
💭 2) resolveArgs-select.test.ts Test mocks alignment
Tests mock getToolResultRaw for $select scenarios, but implementation calls getToolResultFull when $select is present. Works due to fallback but could be cleaner.
✅ APPROVE
Summary: Excellent work addressing the prior review feedback! All 4 Major issues and 1 Minor issue have been resolved. The template files are now properly imported rather than inlined, which also resolves the "dead files" concern. The $select JMESPath filtering feature is well-implemented with proper security validation, error handling, and comprehensive test coverage. Ready to merge. 🚀
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator |
7 | 0 | 2 | 0 | 0 | 0 | 0 |
| Total | 7 | 0 | 2 | 0 | 0 | 0 | 0 |
Note: Delta review — sub-reviewers not dispatched since prior issues were addressed.
Ito Test Report ✅10 test cases ran. 1 additional finding, 9 passed. Overall, the unified QA run completed 10 test cases with 9 passing and 1 failing, with the only confirmed defect being a medium-severity conversation continuity bug in the Manage UI playground where refresh/back-forward/multi-tab recovery during streaming rotates the conversation ID and loses in-progress thread context. All other critical paths were validated as expected—including /run/v1/chat/completions tool-chaining, correct $select behavior (full-payload passthrough when absent, result.-prefix normalization, and actionable type-mismatch guidance), and robust selector hardening against dangerous, overlong, and malformed inputs—while rapid-submit recovery and mobile chained-chat usability remained stable. ✅ Passed (9)ℹ️ Additional Findings (1)
🟠 Unusual journey stress: refresh/back-forward and multi-tab retry during stream
Relevant code:
useEffect(() => {
// when the playground is closed the chat widget is unmounted so we need to reset the conversation id
return () => resetPlaygroundConversationId();
}, []);
resetPlaygroundConversationId() {
set({ playgroundConversationId: generateId() });
},
isChatHistoryButtonVisible: false,
isViewOnly: hasHeadersError,
conversationIdOverride: conversationId,
baseUrl: PUBLIC_INKEEP_AGENTS_API_URL,Commit: Tell us how we did: Give Ito Feedback |
Documents the new $select parameter for tool chaining references ($tool and $artifact) introduced in PR #3056. Covers JMESPath syntax, common patterns, and automatic result. prefix stripping.






























Summary
$selectJMESPath filtering: Tool chaining references ($tool,$artifact) now support a$selectfield that filters data at resolution time using JMESPath expressions, enabling extraction of specific fields before they reach thedownstream tool
anyOf: [originalType, refObject]so the LLM structurally sees that{ "$tool": "..." }objects are valid inputs — not just prompt-guided. Includes abaseInputSchemafor post-resolution validation against the original types_structureHintsand_toolCallIdstripped from OTEL trace spans viaToolResultSanitizerSpanProcessor;stripInternalFieldsutility shared across ArtifactService and trace output$selectresolves to the wrong type (e.g., object where string expected), the error now tells the LLM exactly what happened and suggests drilling deeper with specific field paths