spec: $select filtering for tool refs + bash tool for data exploration by tim-inkeep · Pull Request #3036 · inkeep/agents

tim-inkeep · 2026-04-06T20:27:50Z

Summary

Spec for two-phase approach to data filtering in the tool chaining system
Phase 1 (In Scope): Add $select sentinel key to $tool/$artifact refs — inline JMESPath filtering at resolveArgs() resolution time, zero new dependencies, zero token cost
Phase 2 (Explored, Future Work): Built-in bash tool via just-bash for interactive data exploration (jq, grep, awk), executed via existing SandboxExecutorFactory

Problem

The tool chaining system ($tool/$artifact refs) lets agents pipe data between tools without it entering context. The prompt guidance teaches a pipeline pattern (tool_a → extract → tool_b), but the extract step doesn't exist as a real capability. When tool results are large, they either:

Enter context and trigger compression (extra LLM call + detail loss)
Get truncated at 100K chars (silent data loss)
Get blocked entirely as oversized artifacts (>30% of context window)

The system already provides _structureHints with ready-to-use JMESPath selectors showing the LLM what to extract — but no mechanism to do the extraction without pulling data into context.

Phase 1: `$select` in `$tool` refs

Add a $select key to the sentinel ref system. During resolveArgs() (which already runs before every tool execution), if $select is present alongside $tool, the resolved data is filtered through JMESPath before being passed to the tool.

Zero new dependencies — uses the existing jmespath library and sanitizeJMESPathSelector() already in the codebase. Consolidates with existing jmespath-utils.ts public utilities.

Zero token cost — no new tool schema in system prompts. The filtering happens transparently during argument resolution.

Example flow:

search_tool({ query: "auth" })           → 50K result (call_id: "call_search")

render_card({
  data: {
    "$tool": "call_search",
    "$select": "items[?score > `0.8`].{title: title, url: url}"
  }
})                                        → render_card receives 3K filtered subset

The 50K result never enters conversation context. The LLM uses _structureHints.exampleSelectors (which are already JMESPath) to write the $select expression.

Oversized artifacts (retrievalBlocked: true) can also be filtered via $select, bypassing the retrieval block since the filter produces a manageable subset.

Key design choices:

$select key is language-agnostic — can upgrade from JMESPath to jq later without changing the API contract
Reuses sanitizeJMESPathSelector() to auto-fix common LLM JMESPath errors
Non-JSON source data handled correctly (strings not double-quoted, MCP content arrays unwrapped)

Phase 2: Built-in `bash` tool (future — fully designed)

A sandboxed bash tool powered by just-bash for interactive data exploration — when the agent needs to see filtered results before deciding what to do next (vs $select which pipes directly to another tool).

Full design in spec includes: tool interface + schema, architecture (main thread → SandboxExecutorFactory → sandbox process), stdin data flow, resource limits, just-bash configuration, pipeline examples, error handling, prompt guidance (when to use bash vs $select), OpenTelemetry observability, integration points, and acceptance criteria.

Execution via SandboxExecutorFactory — reuses the existing dual-path sandbox infrastructure (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). No new process management infrastructure needed.

Before advancing to Phase 2

We will likely revisit and change conversation and generation compression triggers before implementing the bash tool. The current compression thresholds and mid-generation compression behavior should be re-evaluated in light of $select reducing context pressure — the existing triggers may fire too aggressively or need different strategies once agents can proactively filter data.

Phase 2 prerequisites:

Compression trigger re-evaluation (thresholds, mid-generation behavior)
just-bash dependency spike (install, build, WASM cold start timing)
SandboxExecutorFactory compatibility verification (both Native and Vercel paths)
Token cost measurement (~300-350 tokens per system prompt — always-on vs conditional?)
Phase 1 usage data (does $select cover enough cases?)

Spec artifacts

File	Purpose
`specs/2026-04-06-bash-tool-for-tool-chaining/SPEC.md`	Full spec (Phase 1 + Phase 2)
`specs/.../evidence/tool-chaining-current-state.md`	Current $tool/$artifact ref system analysis
`specs/.../evidence/compression-cost-analysis.md`	Compression triggers, thresholds, costs
`specs/.../evidence/just-bash-assessment.md`	Library evaluation (API, deps, security)
`specs/.../evidence/sandbox-executor-pattern.md`	NativeSandboxExecutor pattern analysis
`specs/.../meta/audit-findings.md`	Cold-reader audit (3 HIGH, 6 MEDIUM — all resolved)
`specs/.../meta/design-challenge.md`	Design challenges (7 challenges — led to phased approach)

Audit highlights

Spec was audited by two independent cold-reader agents. Key findings that shaped the design:

H1: Child process pools incompatible with Vercel serverless → switched to SandboxExecutorFactory
H2: wrapToolWithStreaming does NOT auto-cache results → bash tool must call recordToolResult() explicitly
Challenge 6: $select in resolveArgs proposed as lighter alternative → adopted as Phase 1
Challenge 4/5: Token cost + always-on premature → deferred to Phase 2 measurement

🤖 Generated with Claude Code

vercel · 2026-04-06T20:27:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Apr 7, 2026 8:51pm
agents-docs	Ready	Preview, Comment	Apr 7, 2026 8:51pm
agents-manage-ui	Ready	Preview, Comment	Apr 7, 2026 8:51pm

changeset-bot · 2026-04-06T20:28:00Z

🦋 Changeset detected

Latest commit: e223b3a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages

Name	Type
@inkeep/agents-api	Patch
@inkeep/agents-manage-ui	Patch
@inkeep/agents-cli	Patch
@inkeep/agents-core	Patch
@inkeep/agents-email	Patch
@inkeep/agents-mcp	Patch
@inkeep/agents-sdk	Patch
@inkeep/agents-work-apps	Patch
@inkeep/ai-sdk-provider	Patch
@inkeep/create-agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

pullfrog · 2026-04-06T20:29:54Z

TL;DR — Implements Phase 1 of the $select filtering spec: tool arguments containing { "$tool": "call_id", "$select": "jmespath" } now filter data at resolveArgs() resolution time, so large tool results never enter context. Also rewrites tool schemas to be ref-aware (every parameter advertises tool-chaining syntax to the LLM), overhauls prompt guidance to be shorter and $select-centric, and strips internal fields (_structureHints, _toolCallId) from traces and stored results. The spec for Phase 2 (bash tool via just-bash) remains as future work.

Key changes

Add $select JMESPath filtering in ArtifactParser.resolveArgs() — when a sentinel ref includes $select, the resolved data is filtered through JMESPath before reaching the downstream tool; works for both $tool and $artifact + $tool refs
New select-filter.ts utility — applySelector() with result. prefix auto-stripping, sanitization via sanitizeJMESPathSelector(), length/injection validation, and descriptive ToolChainResolutionError messages including data shape summaries
New ref-aware-schema.ts for tool schema transformation — makeRefAwareJsonSchema() wraps every property in a tool's JSON schema with an anyOf branch accepting $tool/$artifact/$select ref objects, so the LLM sees tool-chaining syntax directly in parameter descriptions
Apply ref-aware schemas to function tools and MCP tools — both function-tools.ts and mcp-tools.ts now build ref-aware input schemas with a baseInputSchema fallback for post-resolution validation
Rewrite PromptConfig.ts tool chaining guidance — replaces 70+ lines of verbose extract-chain instructions with a compact $select-centric reference (syntax, rules, examples, JMESPath patterns); inlines XML templates as string constants
Add toolChainingGuidance to _structureHints — every tool result now includes ready-to-use $tool/$select reference syntax, sequencing rules, and forbidden patterns directly in the hints
Strip internal fields from traces and tool results — new stripInternalFields() removes _-prefixed keys from stored results (ArtifactService.getToolResultFull()), streamed tool events, and a ToolResultSanitizer span processor cleans ai.toolCall.result trace attributes
Add $select to SENTINEL_KEY constants — extends artifact-syntax.ts with SELECT: '$select'
Improved post-resolution validation errors — tool-wrapper.ts now reports type mismatches with field-by-field details and suggests drilling deeper in $select paths
Update get_reference_artifact description — discourages using it for inter-tool data flow, pointing to tool-chaining instead
Comprehensive test coverage — new test suites for applySelector, sanitizeJMESPathSelector, resolveArgs with $select, makeRefAwareJsonSchema, and makeBaseInputSchema
Updated spec with implementation details — SPEC.md revised to reflect the shipped Phase 1 design and simplified Phase 2 architecture

_{Summary ｜ 26 files ｜ 7 commits ｜ base: main ← feature/just-bash-spec-v0}

`$select` filtering in `resolveArgs()`

Before: Tool chaining could only pass full tool results — no way to filter data between tools without pulling it into context, triggering compression or truncation.
After: { "$tool": "call_id", "$select": "items[?score > 0.8]" }filters data at resolution time via JMESPath. Works for both ephemeral$toolrefs and persisted$artifact` refs.

The implementation adds a $select branch in ArtifactParser.resolveArgs() for both the $artifact + $tool path and the $tool-only path. When $select is present, the $tool-only path uses getToolResultFull() (pre-MCP-unwrap) instead of getToolResultRaw() to ensure filtering operates on the complete structured data. The new applySelector() function handles result. prefix stripping (matching _structureHints.exampleSelectors format), sanitizes common LLM JMESPath errors, validates against injection patterns and length limits, and throws ToolChainResolutionError with a data shape summary on mismatch.

How does post-resolution type validation work?

Tools now carry a baseInputSchema (the original JSON schema without ref-aware wrappers) alongside the ref-aware parameters schema. After resolveArgs() resolves refs and applies $select, tool-wrapper.ts validates the resolved args against baseInputSchema. If the resolved type doesn't match (e.g., $select returns an object where a string is expected), the error message includes per-field type info and suggests drilling deeper in the $select path.

ArtifactParser.ts · select-filter.ts · tool-wrapper.ts · artifact-syntax.ts

Ref-aware tool schemas

Before: Tool parameters accepted only literal values in their JSON schema; the LLM had to learn tool-chaining syntax from prompt guidance alone.
After: Every tool parameter's schema includes an anyOf branch with $tool/$artifact/$select ref objects, with rich descriptions explaining when and how to use each.

makeRefAwareJsonSchema() recursively walks a tool's JSON schema and wraps each value-position node with withToolRef(), which adds an anyOf containing PARAM_REF_SCHEMA — two object variants (artifact-ref and tool-ref) with $select as an optional string property. The root schema gets a TOOL_CHAINING_SCHEMA_DESCRIPTIONS.ROOT description. Descriptions are defined as shared constants in TOOL_CHAINING_SCHEMA_DESCRIPTIONS for consistency across all tools.

ref-aware-schema.ts · function-tools.ts · mcp-tools.ts

Prompt guidance overhaul and internal field stripping

Before: Tool chaining prompt guidance was 70+ lines with verbose extract-chain examples; _structureHints and _toolCallId leaked into traces and stored results.
After: Compact $select-centric guidance (~30 lines) with syntax reference, JMESPath patterns, and concrete examples. Internal _-prefixed fields stripped from all output surfaces.

The PromptConfig.ts changes consolidate guidance around three primitives: full passthrough ($tool), filtered passthrough ($tool + $select), and artifact passthrough ($artifact + $tool). The XML templates are inlined as string constants (removing the ?raw template imports). _structureHints now includes a toolChainingGuidance block with ready-to-use reference syntax per tool result. A new ToolResultSanitizer span processor strips _-prefixed fields from ai.toolCall.result trace attributes, and stripInternalFields() cleans stored results via ArtifactService.getToolResultFull() and streamed tool events.

PromptConfig.ts · tool-result.ts · instrumentation.ts · ArtifactService.ts

Phase 2 spec: in-process bash tool (future)

Before: Original spec proposed bash commands through SandboxExecutorFactory with child process pools.
After: Spec revised to use just-bash as a pure TypeScript AST interpreter running in-process — no sandbox, no child processes, no WASM for core commands. Prerequisites reduced from 5 to 4.

The spec documents the bash tool interface, just-bash architecture (in-memory FS, disabled network, TypeScript-reimplemented commands), stdin data flow model, resource limits (AbortSignal.timeout(30_000), maxCallDepth: 50, 1MB stdout truncation), and explicit recordToolResult() caching. This remains future work pending Phase 1 usage data and compression trigger re-evaluation.

SPEC.md

^{｜ View workflow run ｜ Triggered by Pullfrog ｜ Using Claude Opus ｜ 𝕏}

pullfrog

Spec review — solid design, several stale jq references to clean up.

The Phase 1 $select design is well-reasoned: zero new dependencies, zero token cost, leverages existing jmespath + sanitizeJMESPathSelector + _structureHints.exampleSelectors. All codebase claims verified accurate (line numbers, file paths, dependency versions, method signatures). The phased approach with clear Phase 2 prerequisites is good discipline.

The main issue is incomplete find-and-replace from the jq → JMESPath pivot. Several sections still reference "jq" where they should say "JMESPath", and one file naming inconsistency (jq-filter.ts vs select-filter.ts). These are all in the spec text — not blocking, but should be cleaned up before this serves as the implementation guide.

One design note outside the diff: sanitizeJMESPathSelector() is currently a private method on ArtifactService (line 848). The spec correctly plans a new select-filter.ts utility to wrap it, but should explicitly call out that the private method needs to be extracted or made accessible — otherwise the implementer may miss this.

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ Using Claude Opus ｜ 𝕏}

pullfrog · 2026-04-06T20:31:17Z

+
+The `$select` filter operates on the resolved data. Handle different data types:
+
+| Source type | Serialization for jq | Notes |


Table header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. Should be "Serialization for JMESPath" (or just "Serialization").

pullfrog · 2026-04-06T20:31:17Z

+
+| File | Change |
+|------|--------|
+| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |


The modified files table says Add JQ: '$select' to SENTINEL_KEY but the code example in section 5.1 (line 111) correctly uses SELECT: '$select'. This should be Add SELECT: '$select' to SENTINEL_KEY.

pullfrog · 2026-04-06T20:31:17Z

+- `agents-api/src/domains/run/artifacts/ArtifactService.ts`
+- `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts`
+- `agents-api/src/domains/run/agents/generation/tool-result.ts`
+- `agents-api/src/domains/run/utils/` (new jq-filter.ts)


Agent Constraints lists the new file as jq-filter.ts but section 6 (New Files table, line 230) calls it select-filter.ts. These should match — select-filter.ts is the correct name given the $select key naming.

pullfrog · 2026-04-06T20:31:17Z

+- Child process infrastructure — Phase 2
+
+**STOP_IF:**
+- jq library spike shows no viable option for full jq syntax in JS


Both STOP_IF conditions reference jq:

"jq library spike shows no viable option" — Phase 1 uses the existing jmespath library, no spike needed

"jq adds >50ms" — should say JMESPath

These read like Phase 2 stop conditions that got left in the Phase 1 section.

pullfrog · 2026-04-06T20:31:17Z

+    }
+    return result;
+  } catch (error) {
+    throw new ToolChainResolutionError(


applySelector(data, selector) references toolCallId in the throw statement (line 172) but toolCallId is not a parameter of this function. Either the function signature should include it, or the error should be thrown by the caller (resolveArgs) which has access to toolCallId.

pullfrog · 2026-04-06T20:31:17Z

+This became the Phase 1 design. Smallest change, highest value, zero token cost, no new dependency beyond a jq library.
+
+## Challenge 7: JMESPath extract tool first — DISMISSED
+JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.


Challenge 7 says "JMESPath is demonstrably error-prone for LLMs" and "Using jq even in Phase 1" — this directly contradicts the final SPEC.md which chose JMESPath for Phase 1 precisely because it's already in the codebase. This finding was written before the JMESPath pivot but was never updated. Should note that this challenge was later superseded by the zero-dependency argument.

pullfrog · 2026-04-06T20:31:17Z

+- **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case.
+- **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement.
+- **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash.
+- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.


Changelog entry says "Challenge 7 (JMESPath): Dismissed — LLMs unreliable with JMESPath, jq better" but the final spec chose JMESPath for Phase 1. This entry reflects an intermediate decision that was later reversed. Should be updated to reflect the final outcome (JMESPath chosen for zero-dep advantage, with sanitizeJMESPathSelector mitigating LLM reliability issues).

claude

PR Review Summary

(7) Total Issues | Risk: Medium

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

🟠 Major: SPEC.md:221 SENTINEL_KEY uses JQ but should be SELECT per section 5.1
🟠 Major: SPEC.md:143 Table header says "jq" but Phase 1 uses JMESPath
🟠 Major: SPEC.md:161-176 Undefined variable toolCallId in code example

🟡 Minor (4) 🟡

🟡 1) SPEC.md Missing test case for acceptance criterion #8

Issue: Acceptance criterion #8 states "_structureHints reference $select as an option for filtering" but the verification plan (Section 14) has no explicit test case for this.

Why: Implementers may miss updating _structureHints to mention $select if it's not in the test plan.

Fix: Add under "Integration tests" in Section 14:

- [ ] `_structureHints` includes guidance about using `$select` for inline filtering

Refs:

Inline Comments:

🟡 Minor: SPEC.md:328 Filename jq-filter.ts should be select-filter.ts
🟡 Minor: design-challenge.md:21-22 Challenge 7 contradicts final JMESPath decision
🟡 Minor: _changelog.md:24 Stale "JMESPath dismissed" statement contradicts line 34

💭 Consider (3) 💭

💭 1) SPEC.md:136 JMESPath sanitization architecture
Issue: Spec proposes reusing sanitizeJMESPathSelector() from ArtifactService.ts, but this is a private method. Meanwhile, agents-core/src/utils/jmespath-utils.ts exports public utilities (validateJMESPathSecure(), searchJMESPath()) with security checks.
Why: Risk of divergent sanitization logic if a third implementation is created.
Fix: Clarify which JMESPath utilities to use. Consider consolidating to jmespath-utils.ts.

💭 2) SPEC.md:3-5 Metadata field naming
Issue: Uses Author: and Date: but some existing specs use Owner(s): and Last updated:.
Why: Minor convention inconsistency across specs directory.
Fix: No action required — both patterns exist in the codebase.

💭 3) SPEC.md:247 Decision log columns
Issue: Introduces Confidence column not present in other specs.
Why: The column adds value (documents certainty level) but is a new pattern.
Fix: Keep as-is — this is a reasonable evolution of the spec format.

💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured spec that thoughtfully phases the implementation and reuses existing infrastructure. The main issues are terminology inconsistencies from the jq→JMESPath pivot (3 Major naming issues that would confuse implementers) plus a code example bug (undefined variable). All are quick fixes — the spec design itself is sound. Recommend addressing the Major items before merging.

Discarded (2)

Location	Issue	Reason Discarded
`meta/audit-findings.md`	New meta file type with no precedent	Intentional spec evolution — audit trail is valuable
`meta/design-challenge.md`	New meta file type with no precedent	Intentional spec evolution — design rationale is valuable

Reviewers (4)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-architecture`	5	0	1	4	0
`pr-review-product`	5	0	0	4	1
`pr-review-standards`	5	1	0	3	1
`pr-review-consistency`	8	0	2	4	2
Total	23	1	3	6	4

Note: High overlap across reviewers on the jq→JMESPath terminology issues (all 4 flagged SENTINEL_KEY, table header, and filename).

claude · 2026-04-06T20:33:03Z

+
+| File | Change |
+|------|--------|
+| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |


🟠 MAJOR: Inconsistent SENTINEL_KEY naming

Issue: This line says JQ: '$select' but section 5.1 (line 111) correctly defines it as SELECT: '$select'. The existing pattern uses descriptive names matching the value (ARTIFACT: '$artifact', TOOL: '$tool').

Why: Implementers following this table would add the wrong constant name, causing a mismatch with the code example in section 5.1.

Fix: (1-click apply)

Suggested change

| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |

| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `SELECT: '$select'` to `SENTINEL_KEY` |

Refs:

SPEC.md:111 — correct definition

claude · 2026-04-06T20:33:07Z

+
+The `$select` filter operates on the resolved data. Handle different data types:
+
+| Source type | Serialization for jq | Notes |


🟠 MAJOR: Table header references jq but Phase 1 uses JMESPath

Issue: The column header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. This is leftover terminology from an earlier draft.

Why: Creates confusion about what query language Phase 1 actually uses. Additionally, JMESPath operates directly on JavaScript objects, not serialized JSON strings.

Fix: (1-click apply)

Suggested change

| Source type | Serialization for jq | Notes |

| Source type | Handling for JMESPath | Notes |

Refs:

Changelog line 34 — confirms JMESPath decision

claude · 2026-04-06T20:33:10Z

+- `agents-api/src/domains/run/artifacts/ArtifactService.ts`
+- `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts`
+- `agents-api/src/domains/run/agents/generation/tool-result.ts`
+- `agents-api/src/domains/run/utils/` (new jq-filter.ts)


🟡 Minor: Filename inconsistency — jq-filter.ts should be select-filter.ts

Issue: This line says jq-filter.ts but section 6 (line 230) correctly names it select-filter.ts.

Why: Implementers would create a file with the wrong name, inconsistent with the rest of the spec.

Fix: (1-click apply)

Suggested change

- `agents-api/src/domains/run/utils/` (new jq-filter.ts)

- `agents-api/src/domains/run/utils/` (new select-filter.ts)

Refs:

SPEC.md:230 — correct filename

claude · 2026-04-06T20:33:16Z

+function applySelector(data: any, selector: string): any {
+  try {
+    const sanitized = sanitizeJMESPathSelector(selector);
+    const result = jmespath.search(data, sanitized);
+    if (result === null || result === undefined) {
+      // Selector matched nothing — return empty rather than failing
+      return null;
+    }
+    return result;
+  } catch (error) {
+    throw new ToolChainResolutionError(
+      toolCallId,
+      `$select filter failed: ${error.message}. Expression: ${selector}`
+    );
+  }
+}


🟠 MAJOR: Undefined variable toolCallId in code example

Issue: The applySelector function signature is applySelector(data: any, selector: string) but the error handling references toolCallId which is not defined in scope.

Why: This code example would fail TypeScript compilation. Implementers would need to either add toolCallId as a parameter or restructure the error handling.

Fix: Add toolCallId as a third parameter:

function applySelector(data: any, selector: string, toolCallId: string): any { try { const sanitized = sanitizeJMESPathSelector(selector); const result = jmespath.search(data, sanitized); if (result === null || result === undefined) { return null; } return result; } catch (error) { throw new ToolChainResolutionError( toolCallId, `$select filter failed: ${error.message}. Expression: ${selector}` ); } }

Refs:

ToolChainResolutionError usage

claude · 2026-04-06T20:33:23Z

+## Challenge 7: JMESPath extract tool first — DISMISSED
+JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.


🟡 Minor: Challenge 7 contradicts final spec decision

Issue: This says "JMESPath is demonstrably error-prone for LLMs...Using jq even in Phase 1" but the final spec uses JMESPath (confirmed by changelog line 34 and SPEC.md D8).

Why: This meta file was not updated after the final Phase 1 refinement. Future readers would be confused about the actual rationale for choosing JMESPath.

Fix: Update to reflect the actual outcome:

Suggested change

## Challenge 7: JMESPath extract tool first — DISMISSED

JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.

## Challenge 7: JMESPath extract tool first — ACCEPTED AS PHASE 1

The existing `jmespath` library and `sanitizeJMESPathSelector()` provide sufficient capability with zero new dependencies. `_structureHints` already generates JMESPath selectors for LLMs to use, mitigating reliability concerns.

Refs:

Changelog line 34 — final JMESPath decision

claude · 2026-04-06T20:33:30Z

+- **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case.
+- **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement.
+- **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash.
+- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.


🟡 Minor: Stale statement contradicts final decision

Issue: This says "Challenge 7 (JMESPath): Dismissed" but line 34 confirms JMESPath was ultimately chosen for Phase 1.

Why: The changelog shows intermediate decisions but this creates a contradictory record since the final section (lines 32-39) supersedes it.

Fix: Either remove this line or update it to reflect the final outcome:

Suggested change

- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.

- **Challenge 7 (JMESPath)**: Initially dismissed, later ACCEPTED for Phase 1 — see "Phase 1 refinement" section below

Refs:

Line 34 — final decision

tim-inkeep · 2026-04-06T20:42:15Z

@pullfrog, please review the spec again

claude

PR Review Summary

(0) Total Issues | Risk: Low

✅ Prior Issues Addressed

All 7 issues from the previous review have been fixed in this revision:

#	Prior Issue	Status
🟠 1	`SPEC.md:221` SENTINEL_KEY uses `JQ` instead of `SELECT`	✅ Fixed (now line 439)
🟠 2	`SPEC.md:143` Table header says "jq" but Phase 1 uses JMESPath	✅ Fixed (now line 361 says "Handling for JMESPath")
🟠 3	`SPEC.md:161-176` Undefined variable `toolCallId` in `applySelector`	✅ Fixed (line 379 now includes `toolCallId: string` as parameter)
🟡 4	`SPEC.md:328` Filename `jq-filter.ts` should be `select-filter.ts`	✅ Fixed (line 548)
🟡 5	`design-challenge.md:21-22` Challenge 7 contradicts final JMESPath decision	✅ Fixed (now says "ACCEPTED AS PHASE 1")
🟡 6	`_changelog.md:24` Stale "JMESPath dismissed" statement	✅ Fixed (explains the pivot)
🟡 7	Missing test case for acceptance criterion #8	✅ Fixed (line 598 adds `_structureHints` test case)

Delta Analysis

The revision also added significant new content:

Full Phase 2 technical design (lines 80-312) — tool interface, architecture diagram, stdin data flow, SandboxExecutorFactory integration, resource limits, error handling, observability requirements, and acceptance criteria
Compression prerequisite — added as a Phase 2 blocker (line 88)
JMESPath consolidation note (line 355) — calls out that sanitizeJMESPathSelector() is private and should be consolidated with jmespath-utils.ts
Changelog entries (lines 41-48) — documents the Phase 2 restoration

The new content is well-structured and internally consistent. No new issues identified.

💭 Consider (1) 💭

💭 1) audit-findings.md:23 Historical reference to $jq
Issue: Line 23 says "Phase 1 ($jq)" but Phase 1 now uses $select.
Why: This is in a historical audit document, so it accurately reflects the state at audit time.
Fix: No action needed — historical accuracy is acceptable here. The main SPEC.md is correct.

✅ APPROVE

Summary: All 7 issues from the prior review have been addressed. The spec is now internally consistent — no stale jq references remain in the implementation guidance, terminology is aligned throughout, and the code example bug is fixed. The expanded Phase 2 documentation is comprehensive and well-structured. Ready to merge. 🎉

Discarded (0)

No discarded findings.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator-delta-review`	1	0	1	0	0	0	0
Total	1	0	1	0	0	0	0

Note: This was a delta review scoped to changes since the prior automated review at commit 705b252. All 7 prior issues were verified as fixed.

pullfrog

Re-review: all previous issues fixed. Remaining items are terminology cleanup in Phase 1 sections and one design accuracy issue.

All 7 issues from the first review (stale jq→JMESPath terminology, undefined toolCallId, filename mismatch, missing test case) have been addressed. All 14 codebase claims verified accurate — file paths, line numbers, method signatures, dependency versions, and behavioral descriptions all match the actual code.

Two new issues found: (1) the Phase 1 serialization table and surrounding sections still reference "jq" in 6 places where they should say "JMESPath" — this is specifically in Phase 1 sections where the query language is JMESPath, not jq; (2) the serialization table itself is misleading because JMESPath operates on native JS objects, not serialized JSON strings.

The meta files (_changelog.md, design-challenge.md, audit-findings.md) also have stale $jq / "jq library" references from the pre-pivot era, but these are historical records so the inline comments focus on the normative SPEC.md sections that implementers will follow.

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ Using Claude Opus ｜ 𝕏}

pullfrog · 2026-04-06T20:48:24Z

+The `$select` filter operates on the resolved data. Handle different data types:
+
+| Source type | Handling for JMESPath | Notes |
+|-------------|---------------------|-------|
+| Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON |


Phase 1 serialization table still references jq in 4 places. This table is in Section 5.3 (Phase 1 design), where the query language is JMESPath — not jq. Additionally, the table is somewhat misleading: JMESPath operates directly on JavaScript objects via jmespath.search(data, expression), not on serialized JSON strings. The JSON.stringify column makes sense for Phase 2 bash/stdin, but for Phase 1 $select, the resolved data is passed directly to jmespath.search() as a JS value.

Stale references:

Line 363: Normal case — jq operates on JSON → JMESPath

Line 366: Skip jq, return error | jq only works on text/JSON → JMESPath

Consider renaming the column header to just "Handling" (it's already correct as Handling for JMESPath) and updating the notes to reflect that JMESPath receives the native JS object, not a serialized string.

Suggested change

The `$select` filter operates on the resolved data. Handle different data types:

| Source type | Handling for JMESPath | Notes |

|-------------|---------------------|-------|

| Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON |

| Source type | Handling for JMESPath | Notes |

|-------------|---------------------|-------|

| Object/Array | Pass directly to `jmespath.search()` | Normal case — JMESPath operates on JS objects |

| String | Pass as-is (not re-stringified) | Avoid double-quoting: `"hello"` not `"\"hello\""` |

| MCP content array | Unwrap text parts, concatenate | `{ content: [{type: 'text', text: '...'}] }` → extract text |

| Buffer/binary | Skip filter, return error | JMESPath only works on structured/text data |

pullfrog · 2026-04-06T20:48:24Z

+When `$tool`+`$artifact` ref resolves to an oversized artifact (`retrievalBlocked: true`):
+- `$select` filtering bypasses the retrieval block
+- Artifact data fetched via `ArtifactService.getArtifactFull()` with new `{ allowOversized: true }` option
+- jq filter applied, producing a subset that fits in context


Stale jq reference in Phase 1 section. This line says "jq filter applied" but this is Section 5.4 (Phase 1 Oversized Artifact Processing), where the filter is JMESPath via $select.

Suggested change

- jq filter applied, producing a subset that fits in context

- JMESPath filter applied via `$select`, producing a subset that fits in context

pullfrog · 2026-04-06T20:48:24Z

+}
+```
+
+On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).


Stale jq reference in Phase 1 error handling section.

Suggested change

On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).

On JMESPath failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).

pullfrog · 2026-04-06T20:48:24Z

+| Component | How it participates |
+|-----------|---------------------|
+| `resolveArgs()` | Extended (not replaced) — existing `$tool`/`$artifact` logic unchanged |
+| `ToolChainResolutionError` | Reused for jq filter errors |


pullfrog · 2026-04-06T20:48:24Z

+|---|----------|--------|------|------------|-----------|
+| D1 | Phase 1: `$select` (JMESPath) in resolveArgs; Phase 2: bash tool | LOCKED | Cross-cutting | HIGH | `$select` is smallest viable change — no new tool, no token cost, zero new deps (uses existing `jmespath`). Bash tool for exploratory cases pending validation. |
+| D2 | Queryable sources = session cache + stored artifacts | DIRECTED | Cross-cutting | HIGH | Session cache for current-turn, artifacts for cross-turn |
+| D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — jq filter operates out-of-context |


pullfrog · 2026-04-06T20:48:24Z

+        → stdin receives 50K JSON, jq filters to 3K (call_id: "call_bash_1")
+        → executed in sandbox, event loop free
+        → 3K result cached in ToolSessionManager, enters context
+


exampleSelectors use a result. prefix that gets stripped. Worth noting: _structureHints.exampleSelectors generate JMESPath expressions prefixed with result. (e.g., result.items[?score > 0.8]). ArtifactService.createArtifact strips this prefix before applying (line 215-217). When the LLM uses these selectors in a $select expression, the implementer needs to decide whether $select should also handle the result. prefix, or whether _structureHints guidance should clarify that $select expressions don't need the prefix. This is worth calling out explicitly to avoid confusion during implementation.

github-actions · 2026-04-06T20:50:14Z

Preview URLs

Use these stable preview aliases for testing this PR:

UI: https://pr-3036-ui.preview.inkeep.com
API: https://pr-3036-api.preview.inkeep.com
API health: https://pr-3036-api.preview.inkeep.com/health

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

UI deployment: https://agents-manage-56do4y1jk.preview.inkeep.com
API deployment: https://agents-nsdwbhsss.preview.inkeep.com

tim-inkeep · 2026-04-06T20:54:08Z

+});
+```
+
+**Available commands (most relevant):**


Do we want to limit the amount of tools initially available?

- Serialization table: jq→JMESPath, JSON.stringify→pass directly to jmespath.search() - Oversized artifact processing: "jq filter" → "JMESPath filter via $select" - Error handling: "On jq failure" → "On JMESPath failure" - Integration points: "jq filter errors" → "$select filter errors" - Decision D3: "jq filter" → "$select filter" - Added result. prefix handling note for _structureHints.exampleSelectors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude

PR Review Summary

(0) Total Issues | Risk: Low

✅ Prior Issues Addressed

All 6 issues from the pullfrog review (20:48) have been fixed in commit 0eae157:

#	Prior Issue	Status
1	`SPEC.md:363` "jq operates on JSON" → JMESPath	✅ Fixed (line 365: "JMESPath operates on JS objects")
2	`SPEC.md:366` "jq only works on text/JSON" → JMESPath	✅ Fixed (line 368: "JMESPath only works on structured/text data")
3	`SPEC.md:375` "jq filter applied" → JMESPath	✅ Fixed (line 375: "JMESPath filter applied via `$select`")
4	`SPEC.md:399` "On jq failure" → JMESPath	✅ Fixed (line 399: "On JMESPath failure")
5	`SPEC.md:459` "jq filter errors" → $select	✅ Fixed (line 459: "`$select` filter errors")
6	`SPEC.md:471` D3 "jq filter" → $select	✅ Fixed (line 471: "`$select` filter operates out-of-context")

Delta Analysis

The commit 0eae157 ("Fix remaining stale jq references in Phase 1 sections") correctly addresses all terminology inconsistencies. Additionally:

Added result. prefix handling (line 357) — documents that _structureHints.exampleSelectors generate JMESPath expressions with a result. prefix that gets stripped by ArtifactService.createArtifact(), and recommends the $select implementation auto-strip this prefix for consistency. This addresses pullfrog's design accuracy note.
Serialization table corrected — now accurately describes that JMESPath operates on native JS objects (not serialized JSON strings), fixing the conceptual inaccuracy in the prior version.

🕐 Pending Recommendations (1)

💬 SPEC.md:198 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

✅ APPROVE

Summary: All terminology issues have been resolved. The spec is now internally consistent — no stale jq references remain in Phase 1 sections. The result. prefix handling documentation is a valuable addition that will help implementers. One human question remains open about limiting Phase 2 bash tool commands, but this doesn't block the spec for Phase 1 implementation. 🎉

Discarded (0)

No discarded findings.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator-delta-review`	6	0	0	0	0	1	0
Total	6	0	0	0	0	1	0

Note: This was a delta review scoped to changes since the pullfrog review at commit b48ff20. All 6 prior stale jq→JMESPath issues were verified as fixed.

claude

PR Review Summary

(1) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

🟠 Major: SPEC.md:29 Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model

💭 Consider (2) 💭

💭 1) SPEC.md:89,130 WASM bundling vs invocation distinction
Issue: The spec states "we disable the WASM-dependent optional runtimes" and "no WASM for core commands" but the evidence file (just-bash-assessment.md:47-51) notes that sql.js and quickjs-emscripten WASM modules are bundled with just-bash, even if not invoked.
Why: The distinction is between "not invoked" (what the spec means) vs "not bundled" (what readers might assume). The evidence file already acknowledges this ("We don't use python/JS/sqlite features but they're bundled").
Fix: Consider clarifying in the dependency spike prerequisite that verification should confirm WASM modules are lazy-loaded (not impacting cold start) if not invoked, or update the spec language from "no WASM" to "WASM runtimes not invoked."

💭 2) SPEC.md:172-179 Concurrency limits removed without explicit rationale
Issue: The delta removed acceptance criterion #6 "Concurrent bash calls bounded by semaphore" without documenting why concurrency limits are no longer needed for in-process execution.
Why: The prior SandboxExecutorFactory approach used semaphore-based concurrency limiting (NativeSandboxExecutor.ts:70-129). In-process execution doesn't need this pattern, but the rationale should be documented.
Fix: Add a brief note in the Resource Limits section explaining that concurrency is implicitly bounded by Node.js single-threaded execution model (one request at a time in serverless, event loop for others).

🕐 Pending Recommendations (1)

💬 SPEC.md:191 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

💡 APPROVE WITH SUGGESTIONS

Summary: The Phase 2 architecture simplification (SandboxExecutorFactory → in-process just-bash) is a well-reasoned change that removes unnecessary complexity. However, one reference at line 29 (in the Problem Statement section) wasn't updated and now contradicts the new execution model described in the rest of the spec. Quick fix — update line 29 to match the Architecture section. The design considerations (WASM bundling, concurrency model) are valid nuances to address before Phase 2 implementation but don't block this spec revision.

Discarded (2)

Location	Issue	Reason Discarded
`SPEC.md:130`	Security model assumes LLM-generated commands are safe	Valid design consideration but already gated behind Phase 2 prerequisites — the dependency spike + usage data will inform this decision before implementation
`evidence/just-bash-assessment.md`	WASM bundling risk	Already acknowledged in evidence file ("they're bundled") — not a new finding

Reviewers (2)

Reviewer	Returned	Consider	Inline Comments	Discarded
`pr-review-architecture`	3	2	0	1
`pr-review-standards`	1	0	1	0
Total	4	2	1	1

Note: Delta review scoped to changes since commit 0eae157. All prior issues (jq→JMESPath terminology, etc.) were verified as fixed in that commit.

claude · 2026-04-07T15:53:18Z

+
+**Phase 1 — `$select` selector in `$tool` refs:** Enhance `resolveArgs()` to support inline JMESPath filtering at resolution time. E.g., `{ "$tool": "call_id", "$select": "items[?score > `0.8`].{title: title, url: url}" }`. This filters data before it reaches the downstream tool — no extra tool call, no token cost in system prompts, **zero new dependencies** (uses the existing `jmespath` library already in the codebase). `_structureHints` already generates example JMESPath selectors that the LLM can use directly. Covers the pipeline case (filtering data between tools).
+
+**Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first.


🟠 MAJOR: Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model

Issue: This line still describes Phase 2 execution via SandboxExecutorFactory (NativeSandboxExecutor/VercelSandboxExecutor), but the rest of the Phase 2 section (lines 116-168) was updated in this commit to describe direct in-process execution via just-bash with no SandboxExecutorFactory.

Why: Implementers following the spec will see contradictory execution models — the overview says "SandboxExecutorFactory" while the Architecture section says "no child processes, no IPC, no sandbox wrappers needed." Decision D6 (line 465) and acceptance criterion #5 (line 298) both now specify in-process execution.

Fix: Update to match the new execution model:

Suggested change

**Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first.

**Phase 2 — Built-in `bash` tool:** A sandboxed bash tool powered by `just-bash` for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes directly in-process (no external sandbox needed — just-bash is a virtual bash environment with in-memory FS and no network). Covers the interactive case. Requires dependency validation spike first.

Refs:

Line 116-168 — updated Architecture section

Line 465 — Decision D6

claude

PR Review Summary

(6) Total Issues | Risk: Medium

This is a delta review scoped to changes since commit 51e7697bf201 — the implementation of $select filtering for tool chaining (18 files, +1040/-140).

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

🟠 Major: select-filter.ts:27 Duplicate DANGEROUS_PATTERNS creates security drift risk
🟠 Major: function-tools.ts:94 Silent schema conversion failure hides validation degradation
🟠 Major: mcp-tools.ts:33 Silent schema conversion failure in MCP tools

🟡 Minor (3) 🟡

🟡 1) instrumentation.ts:97 SpanProcessor mutates ReadableSpan attributes

Issue: ToolResultSanitizer.onEnd() mutates span.attributes directly via cast (span as any).attributes. ReadableSpan attributes are intended to be immutable after the span ends.

Why: This mutation may not persist depending on processor ordering relative to export. The code works because of the current processor order (BaggageSpanProcessor → ToolResultSanitizer → BatchSpanProcessor), but this is fragile.

Fix: Consider documenting this is a known workaround that depends on processor ordering, or strip internal fields earlier in the pipeline before recording the result.

Refs:

instrumentation.ts:83-101

Inline Comments:

🟡 Minor: select-filter.test.ts:29 Missing tests for stripInternalFields utility
🟡 Minor: resolveArgs-select.test.ts:101 Missing test for $select on retrievalBlocked artifacts

💭 Consider (4) 💭

💭 1) select-filter.ts:16-17 Cache eviction strategy
Issue: The selectorCache Map has a 1000 entry limit but no eviction policy — once full, new selectors get no caching benefit.
Why: Cache effectiveness degrades over time as early selectors are cached forever while later ones are not.
Fix: Consider LRU cache if this becomes a hot path. Current limit prevents unbounded growth, so acceptable for now.

💭 2) select-filter.ts:126-130 Data shape exposure in error messages
Issue: summarizeShape output in error messages could expose internal field names to LLM output.
Why: The function does filter _-prefixed keys, mitigating the primary concern. Consider whether exposing any field names is acceptable for your threat model.

💭 3) agent-types.ts:244-252 Duplicated validation result type
Issue: The safeParse return type is inlined twice (for parameters and baseInputSchema).
Fix: Extract to shared type alias for maintainability.

💭 4) ref-aware-schema.ts:36-75 PARAM_REF_SCHEMA typed as Record<string, unknown>
Issue: Loses structure validation at compile time.
Fix: Consider defining a JsonSchema interface, though this is acceptable since JSON Schema manipulation is inherently weakly typed.

🕐 Pending Recommendations (1)

💬 SPEC.md:191 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

💡 APPROVE WITH SUGGESTIONS

Summary: The $select implementation is well-designed — it reuses existing JMESPath infrastructure, has good security checks (dangerous patterns, length limits), comprehensive test coverage for the core filtering logic, and clear error messages. The main concerns are: (1) duplicate DANGEROUS_PATTERNS creating security drift risk with agents-core, and (2) silent schema conversion failures that hide validation degradation. Both are addressable with straightforward changes. The test coverage gaps (stripInternalFields, retrievalBlocked artifacts) are worth adding but not blocking.

Discarded (3)

Location	Issue	Reason Discarded
`instrumentation.ts:81`	SpanProcessor.onStart signature uses ReadableSpan	Technically correct concern but onStart is empty and parameter is unused — no practical impact
`ref-aware-schema.ts:1`	Minimal zod usage	Valid but extremely minor — import is used, just not extensively
`PromptConfig.ts:11-18`	Templates inlined vs file imports	This is the existing pattern in the file, not a regression from this PR

Reviewers (6)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-standards`	3	0	1	0	2
`pr-review-appsec`	3	1	2	0	0
`pr-review-errors`	3	2	0	0	1
`pr-review-tests`	7	0	0	2	5
`pr-review-consistency`	5	1	0	0	4
`pr-review-types`	2	0	2	0	0
Total	23	4	5	2	12

Note: High overlap across reviewers on the duplicate DANGEROUS_PATTERNS issue and silent schema conversion failures.

claude · 2026-04-07T20:52:32Z

+  /constructor/,
+  /prototype/,
+  /__proto__/,
+];


🟠 MAJOR: Duplicate DANGEROUS_PATTERNS creates security drift risk

Issue: This DANGEROUS_PATTERNS array duplicates the identical list in packages/agents-core/src/utils/jmespath-utils.ts (lines 191-198). The spec explicitly notes at line 346 to consolidate JMESPath utilities to avoid divergent implementations.

Why: If new dangerous patterns are identified, both locations must be updated. Security control drift is a real risk when validation logic is duplicated.

Fix: Import from the existing shared location:

import { DANGEROUS_PATTERNS, MAX_EXPRESSION_LENGTH } from '@inkeep/agents-core/utils/jmespath-utils';

Refs:

agents-core/jmespath-utils.ts:191-198

claude · 2026-04-07T20:52:39Z

+          baseInputSchema = makeBaseInputSchema(functionData.inputSchema);
+        } catch {
+          // baseInputSchema stays undefined — post-resolution validation will be skipped
+        }


🟠 MAJOR: Silent schema conversion failure hides validation degradation

Issue: When makeBaseInputSchema() fails, baseInputSchema stays undefined and post-resolution validation is silently skipped. If an LLM's $select resolves to the wrong type (object instead of string), the tool executes with wrong-typed data causing confusing errors deep in tool execution rather than a clear validation message.

Why: Operators have zero visibility into which tools are running with degraded type-safety. The comment "post-resolution validation will be skipped" documents the behavior but doesn't surface it operationally.

Fix: Log a warning when schema conversion fails:

try { baseInputSchema = makeBaseInputSchema(functionData.inputSchema); } catch (schemaError) { logger.warn( { functionToolName: functionToolDef.name, schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError), }, 'Failed to build base input schema; post-resolution validation will be skipped' ); }

Refs:

tool-wrapper.ts:175-193 — validation code that gets skipped

claude · 2026-04-07T20:52:45Z

+      refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>,
+      baseInputSchema: undefined,
+    };
+  }


🟠 MAJOR: Silent schema conversion failure in MCP tools

Issue: Same issue as function-tools.ts — when buildRefAwareInputSchema fails, baseInputSchema is undefined and validation is silently disabled. MCP tools often have complex schemas, and failures here mean agents lose type-safety for tool chaining without any operational visibility.

Why: A user who writes $select: 'items[0]' expecting a string but getting an object will see cryptic MCP tool errors rather than a clear "resolved to wrong type" message.

Fix: Add warning logging in the catch block:

} catch (error) { logger.warn( { schemaError: error instanceof Error ? error.message : String(error) }, 'Failed to build ref-aware schema for MCP tool; post-resolution validation disabled' ); return { refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, baseInputSchema: undefined, }; }

claude · 2026-04-07T20:52:52Z

+  it('should return valid expressions unchanged', () => {
+    expect(sanitizeJMESPathSelector('items[?score > `0.8`]')).toBe('items[?score > `0.8`]');
+  });
+});


🟡 Minor: Missing tests for stripInternalFields utility

Issue: The stripInternalFields function exported from select-filter.ts (lines 9-14) is not tested. This function filters out underscore-prefixed keys before exposing results to downstream tools and traces.

Why: If this function breaks, tool results would incorrectly include internal metadata (_structureHints, _toolCallId), potentially causing schema validation failures or exposing internal implementation details.

Fix: Add a test section:

describe('stripInternalFields', () => { it('should remove underscore-prefixed keys from object', () => { const data = { name: 'test', _structureHints: {}, _toolCallId: 'call_1' }; expect(stripInternalFields(data)).toEqual({ name: 'test' }); }); it('should return arrays unchanged', () => { expect(stripInternalFields([1, 2, 3])).toEqual([1, 2, 3]); }); it('should return primitives unchanged', () => { expect(stripInternalFields('hello')).toBe('hello'); expect(stripInternalFields(null)).toBe(null); }); });

claude · 2026-04-07T20:52:59Z

+
+      expect(result).toBe(3);
+    });
+  });


🟡 Minor: Missing test for $select on retrievalBlocked artifacts

Issue: The spec states that oversized artifacts with retrievalBlocked: true can be filtered via $select to bypass the retrieval block (SPEC.md line 357-360). No test verifies this critical behavior.

Why: If the system fails to allow $select filtering on blocked artifacts, large tool results would be completely inaccessible even when the agent only needs a small subset — defeating a primary use case.

Fix: Add a test case:

it('should allow $select to extract from retrievalBlocked artifact', async () => { const parser = createParser({ getArtifactFull: vi.fn().mockResolvedValue({ data: { items: [{ id: 1 }, { id: 2 }] }, metadata: { retrievalBlocked: true, isOversized: true } }), }); const result = await parser.resolveArgs({ $artifact: 'art_oversized', $tool: 'call_search', $select: 'items[0].id', }); expect(result).toBe(1); });

itoqa · 2026-04-07T22:06:04Z

Ito Test Report ✅

11 test cases ran. 1 additional finding, 10 passed.

Across 11 test cases, 10 passed and 1 failed, showing that most run-domain and Playground behaviors are functioning as expected, including route parity between /run/api/chat and /run/v1/chat/completions, correct $tool/$artifact/$select resolution (with nested and no-$select compatibility), type-mismatch rejection with actionable guidance, internal field sanitization, mobile viewport usability, and selector-injection guard enforcement. The key issue is one high-severity, reproducible concurrency defect where rapid repeated submits can generate colliding millisecond-based request IDs (chatcmpl-${Date.now()}), causing request-scoped stream/session state overwrites and resulting in cross-run corruption, dropped or misaligned streams, and assistant error states under burst traffic.

✅ Passed (10)

Category	Summary	Screenshot
Adversarial	Selector injection payloads are blocked by runtime guards; local security tests passed for `${evil}` and `eval()` patterns.
Edge	Recursive nested refs resolve independently as expected; prior runtime failure was not reproducible as a production-code defect.
Logic	Verified `$tool + $select` applies filtering before downstream tool execution.
Logic	Verified `$artifact + $tool + $select` resolves and filters saved artifact data correctly.
Logic	Verified backward-compatible passthrough when `$select` is omitted.
Logic	Type-mismatch handling for post-resolution `$select` inputs is implemented and validated, including actionable drill-deeper guidance.	N/A
Logic	Internal underscore-prefixed fields are stripped from surfaced tool_result payloads; prior blockage was environmental rather than a product sanitization defect.
Mobile	On 390x844 viewport, Playground controls stayed visible/usable and error messaging remained readable in-view during prompt submissions.
Happy-path	Verified `/run/api/chat` chaining behavior; prior blockage was fixture setup, not app logic.
Happy-path	Verified `/run/v1/chat/completions` chaining parity; no route-level defect identified.

ℹ️ Additional Findings (1)

These findings are unrelated to the current changes but were observed during testing.

Category	Summary	Screenshot
Edge	⚠️ Rapid submits can collide request IDs and corrupt run state

⚠️ Rapid repeated submits do not corrupt chaining resolution

What failed: Distinct rapid submits can share the same request ID, so stream/session registration can overwrite prior in-flight state instead of staying isolated per request.
Impact: Concurrent users can see cross-run corruption, dropped/misaligned stream state, or assistant error states during burst submissions. This undermines chaining reliability in high-interaction scenarios.
Steps to reproduce:
1. Open the agent playground chat for a configured tool-chaining agent.
2. Submit the same chaining prompt three times in rapid succession (double submit plus Enter spam).
3. Observe that request-scoped stream/session behavior can become corrupted instead of staying isolated per run.
Stub / mock context: No stubs, mocks, or bypasses were applied for this test in the recorded run.
Code analysis: I inspected run request ID creation and downstream request-scoped registries in the run domain. IDs are generated with millisecond time only, then reused as the key for stream helper and session registration, which creates a collision path when multiple submits land in the same millisecond.
Why this is likely a bug: Request IDs are not guaranteed unique under rapid concurrency, but they are treated as unique keys for mutable run state, so collisions can deterministically overwrite active request associations.

Relevant code:

agents-api/src/domains/run/routes/chat.ts (lines 332-337)

const requestId = `chatcmpl-${Date.now()}`;
const timestamp = Math.floor(Date.now() / 1000);

const lastUserMessage = body.messages
  .filter((msg: Message) => msg.role === 'user')
  .slice(-1)[0];

agents-api/src/domains/run/stream/stream-registry.ts (lines 19-23)

/**
 * Register a StreamHelper for a specific request ID
 */
export function registerStreamHelper(requestId: string, streamHelper: StreamHelper): void {
  getRegistry().set(requestId, streamHelper);
}

agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts (lines 474-478)

registerStreamHelper(requestId, sseHelper);
agentSessionManager.createSession(requestId, executionContext, conversationId);
if (emitOperations) {
  agentSessionManager.enableEmitOperations(requestId);
}

Commit: e223b3a

View Full Run

Tell us how we did: Give Ito Feedback

speced out bash tool for tool chaining and interaction

705b252

pullfrog bot reviewed Apr 6, 2026

View reviewed changes

claude bot reviewed Apr 6, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Apr 6, 2026

tim-inkeep and others added 2 commits April 6, 2026 16:41

updated tool chaining specs

38b5101

Merge branch 'main' into feature/just-bash-spec-v0

b48ff20

vercel bot deployed to Preview – agents-manage-ui April 6, 2026 20:44 View deployment

vercel bot deployed to Preview – agents-docs April 6, 2026 20:44 View deployment

claude bot reviewed Apr 6, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Apr 6, 2026

vercel bot deployed to Preview – agents-api April 6, 2026 20:45 View deployment

tim-inkeep changed the title ~~feat: add $select filtering to $tool/$artifact refs for tool chaining~~ spec: data query & transform for tool chaining ($select + bash tool) Apr 6, 2026

pullfrog bot reviewed Apr 6, 2026

View reviewed changes

tim-inkeep changed the title ~~spec: data query & transform for tool chaining ($select + bash tool)~~ spec: $select filtering for tool refs + bash tool for data exploration Apr 6, 2026

tim-inkeep commented Apr 6, 2026

View reviewed changes

vercel bot deployed to Preview – agents-manage-ui April 6, 2026 20:56 View deployment

vercel bot deployed to Preview – agents-docs April 6, 2026 20:57 View deployment

claude bot reviewed Apr 6, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Apr 6, 2026

vercel bot deployed to Preview – agents-api April 6, 2026 20:59 View deployment

updated phase 2 spec

51e7697

vercel bot deployed to Preview – agents-manage-ui April 7, 2026 15:50 View deployment

vercel bot deployed to Preview – agents-docs April 7, 2026 15:51 View deployment

vercel bot deployed to Preview – agents-api April 7, 2026 15:51 View deployment

claude bot reviewed Apr 7, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Apr 7, 2026

updated tool wrapping to allow a selector for tool chaining

e223b3a

vercel bot deployed to Preview – agents-manage-ui April 7, 2026 20:49 View deployment

vercel bot deployed to Preview – agents-api April 7, 2026 20:51 View deployment

vercel bot deployed to Preview – agents-docs April 7, 2026 20:51 View deployment

claude bot reviewed Apr 7, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Apr 7, 2026

tim-inkeep closed this Apr 14, 2026


		The `$select` filter operates on the resolved data. Handle different data types:

		\| Source type \| Serialization for jq \| Notes \|

	\| `agents-api/src/domains/run/constants/artifact-syntax.ts` \| Add `JQ: '$select'` to `SENTINEL_KEY` \|
	\| `agents-api/src/domains/run/constants/artifact-syntax.ts` \| Add `SELECT: '$select'` to `SENTINEL_KEY` \|

	\| Source type \| Serialization for jq \| Notes \|
	\| Source type \| Handling for JMESPath \| Notes \|

	- `agents-api/src/domains/run/utils/` (new jq-filter.ts)
	- `agents-api/src/domains/run/utils/` (new select-filter.ts)

		## Challenge 7: JMESPath extract tool first — DISMISSED
		JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.

	- Challenge 7 (JMESPath): Dismissed — LLMs unreliable with JMESPath, jq better.
	- Challenge 7 (JMESPath): Initially dismissed, later ACCEPTED for Phase 1 — see "Phase 1 refinement" section below

	- jq filter applied, producing a subset that fits in context
	- JMESPath filter applied via `$select`, producing a subset that fits in context

	On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).
	On JMESPath failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).

	\| `ToolChainResolutionError` \| Reused for jq filter errors \|
	\| `ToolChainResolutionError` \| Reused for `$select` filter errors \|

	\| D3 \| Allow processing oversized artifacts \| LOCKED \| Cross-cutting \| HIGH \| Key capability — jq filter operates out-of-context \|
	\| D3 \| Allow processing oversized artifacts \| LOCKED \| Cross-cutting \| HIGH \| Key capability — `$select` filter operates out-of-context \|


		Phase 1 — `$select` selector in `$tool` refs: Enhance `resolveArgs()` to support inline JMESPath filtering at resolution time. E.g., `{ "$tool": "call_id", "$select": "items[?score > `0.8`].{title: title, url: url}" }`. This filters data before it reaches the downstream tool — no extra tool call, no token cost in system prompts, zero new dependencies (uses the existing `jmespath` library already in the codebase). `_structureHints` already generates example JMESPath selectors that the LLM can use directly. Covers the pipeline case (filtering data between tools).

		Phase 2 — Built-in `bash` tool: A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first.

Conversation

tim-inkeep commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Phase 1: $select in $tool refs

Phase 2: Built-in bash tool (future — fully designed)

Before advancing to Phase 2

Spec artifacts

Audit highlights

Uh oh!

vercel bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

pullfrog bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key changes

$select filtering in resolveArgs()

Ref-aware tool schemas

Prompt guidance overhaul and internal field stripping

Phase 2 spec: in-process bash tool (future)

Uh oh!

pullfrog bot left a comment

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟠⚠️ Major (3) 🟠⚠️

🟡 Minor (4) 🟡

💭 Consider (3) 💭

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

tim-inkeep commented Apr 6, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

tim-inkeep commented Apr 6, 2026 •

edited

Loading

Phase 1: `$select` in `$tool` refs

Phase 2: Built-in `bash` tool (future — fully designed)

vercel bot commented Apr 6, 2026 •

edited

Loading

changeset-bot bot commented Apr 6, 2026 •

edited

Loading

pullfrog bot commented Apr 6, 2026 •

edited

Loading

`$select` filtering in `resolveArgs()`

github-actions bot commented Apr 6, 2026 •

edited

Loading