Skip to content

spec: $select filtering for tool refs + bash tool for data exploration#3036

Closed
tim-inkeep wants to merge 6 commits intomainfrom
feature/just-bash-spec-v0
Closed

spec: $select filtering for tool refs + bash tool for data exploration#3036
tim-inkeep wants to merge 6 commits intomainfrom
feature/just-bash-spec-v0

Conversation

@tim-inkeep
Copy link
Copy Markdown
Contributor

@tim-inkeep tim-inkeep commented Apr 6, 2026

Summary

  • Spec for two-phase approach to data filtering in the tool chaining system
  • Phase 1 (In Scope): Add $select sentinel key to $tool/$artifact refs — inline JMESPath filtering at resolveArgs() resolution time, zero new dependencies, zero token cost
  • Phase 2 (Explored, Future Work): Built-in bash tool via just-bash for interactive data exploration (jq, grep, awk), executed via existing SandboxExecutorFactory

Problem

The tool chaining system ($tool/$artifact refs) lets agents pipe data between tools without it entering context. The prompt guidance teaches a pipeline pattern (tool_a → extract → tool_b), but the extract step doesn't exist as a real capability. When tool results are large, they either:

  1. Enter context and trigger compression (extra LLM call + detail loss)
  2. Get truncated at 100K chars (silent data loss)
  3. Get blocked entirely as oversized artifacts (>30% of context window)

The system already provides _structureHints with ready-to-use JMESPath selectors showing the LLM what to extract — but no mechanism to do the extraction without pulling data into context.

Phase 1: $select in $tool refs

Add a $select key to the sentinel ref system. During resolveArgs() (which already runs before every tool execution), if $select is present alongside $tool, the resolved data is filtered through JMESPath before being passed to the tool.

Zero new dependencies — uses the existing jmespath library and sanitizeJMESPathSelector() already in the codebase. Consolidates with existing jmespath-utils.ts public utilities.

Zero token cost — no new tool schema in system prompts. The filtering happens transparently during argument resolution.

Example flow:

search_tool({ query: "auth" })           → 50K result (call_id: "call_search")

render_card({
  data: {
    "$tool": "call_search",
    "$select": "items[?score > `0.8`].{title: title, url: url}"
  }
})                                        → render_card receives 3K filtered subset

The 50K result never enters conversation context. The LLM uses _structureHints.exampleSelectors (which are already JMESPath) to write the $select expression.

Oversized artifacts (retrievalBlocked: true) can also be filtered via $select, bypassing the retrieval block since the filter produces a manageable subset.

Key design choices:

  • $select key is language-agnostic — can upgrade from JMESPath to jq later without changing the API contract
  • Reuses sanitizeJMESPathSelector() to auto-fix common LLM JMESPath errors
  • Non-JSON source data handled correctly (strings not double-quoted, MCP content arrays unwrapped)

Phase 2: Built-in bash tool (future — fully designed)

A sandboxed bash tool powered by just-bash for interactive data exploration — when the agent needs to see filtered results before deciding what to do next (vs $select which pipes directly to another tool).

Full design in spec includes: tool interface + schema, architecture (main thread → SandboxExecutorFactory → sandbox process), stdin data flow, resource limits, just-bash configuration, pipeline examples, error handling, prompt guidance (when to use bash vs $select), OpenTelemetry observability, integration points, and acceptance criteria.

Execution via SandboxExecutorFactory — reuses the existing dual-path sandbox infrastructure (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). No new process management infrastructure needed.

Before advancing to Phase 2

We will likely revisit and change conversation and generation compression triggers before implementing the bash tool. The current compression thresholds and mid-generation compression behavior should be re-evaluated in light of $select reducing context pressure — the existing triggers may fire too aggressively or need different strategies once agents can proactively filter data.

Phase 2 prerequisites:

  1. Compression trigger re-evaluation (thresholds, mid-generation behavior)
  2. just-bash dependency spike (install, build, WASM cold start timing)
  3. SandboxExecutorFactory compatibility verification (both Native and Vercel paths)
  4. Token cost measurement (~300-350 tokens per system prompt — always-on vs conditional?)
  5. Phase 1 usage data (does $select cover enough cases?)

Spec artifacts

File Purpose
specs/2026-04-06-bash-tool-for-tool-chaining/SPEC.md Full spec (Phase 1 + Phase 2)
specs/.../evidence/tool-chaining-current-state.md Current $tool/$artifact ref system analysis
specs/.../evidence/compression-cost-analysis.md Compression triggers, thresholds, costs
specs/.../evidence/just-bash-assessment.md Library evaluation (API, deps, security)
specs/.../evidence/sandbox-executor-pattern.md NativeSandboxExecutor pattern analysis
specs/.../meta/audit-findings.md Cold-reader audit (3 HIGH, 6 MEDIUM — all resolved)
specs/.../meta/design-challenge.md Design challenges (7 challenges — led to phased approach)

Audit highlights

Spec was audited by two independent cold-reader agents. Key findings that shaped the design:

  • H1: Child process pools incompatible with Vercel serverless → switched to SandboxExecutorFactory
  • H2: wrapToolWithStreaming does NOT auto-cache results → bash tool must call recordToolResult() explicitly
  • Challenge 6: $select in resolveArgs proposed as lighter alternative → adopted as Phase 1
  • Challenge 4/5: Token cost + always-on premature → deferred to Phase 2 measurement

🤖 Generated with Claude Code

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Apr 7, 2026 8:51pm
agents-docs Ready Ready Preview, Comment Apr 7, 2026 8:51pm
agents-manage-ui Ready Ready Preview, Comment Apr 7, 2026 8:51pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 6, 2026

🦋 Changeset detected

Latest commit: e223b3a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog bot commented Apr 6, 2026

TL;DR — Implements Phase 1 of the $select filtering spec: tool arguments containing { "$tool": "call_id", "$select": "jmespath" } now filter data at resolveArgs() resolution time, so large tool results never enter context. Also rewrites tool schemas to be ref-aware (every parameter advertises tool-chaining syntax to the LLM), overhauls prompt guidance to be shorter and $select-centric, and strips internal fields (_structureHints, _toolCallId) from traces and stored results. The spec for Phase 2 (bash tool via just-bash) remains as future work.

Key changes

  • Add $select JMESPath filtering in ArtifactParser.resolveArgs() — when a sentinel ref includes $select, the resolved data is filtered through JMESPath before reaching the downstream tool; works for both $tool and $artifact + $tool refs
  • New select-filter.ts utilityapplySelector() with result. prefix auto-stripping, sanitization via sanitizeJMESPathSelector(), length/injection validation, and descriptive ToolChainResolutionError messages including data shape summaries
  • New ref-aware-schema.ts for tool schema transformationmakeRefAwareJsonSchema() wraps every property in a tool's JSON schema with an anyOf branch accepting $tool/$artifact/$select ref objects, so the LLM sees tool-chaining syntax directly in parameter descriptions
  • Apply ref-aware schemas to function tools and MCP tools — both function-tools.ts and mcp-tools.ts now build ref-aware input schemas with a baseInputSchema fallback for post-resolution validation
  • Rewrite PromptConfig.ts tool chaining guidance — replaces 70+ lines of verbose extract-chain instructions with a compact $select-centric reference (syntax, rules, examples, JMESPath patterns); inlines XML templates as string constants
  • Add toolChainingGuidance to _structureHints — every tool result now includes ready-to-use $tool/$select reference syntax, sequencing rules, and forbidden patterns directly in the hints
  • Strip internal fields from traces and tool results — new stripInternalFields() removes _-prefixed keys from stored results (ArtifactService.getToolResultFull()), streamed tool events, and a ToolResultSanitizer span processor cleans ai.toolCall.result trace attributes
  • Add $select to SENTINEL_KEY constants — extends artifact-syntax.ts with SELECT: '$select'
  • Improved post-resolution validation errorstool-wrapper.ts now reports type mismatches with field-by-field details and suggests drilling deeper in $select paths
  • Update get_reference_artifact description — discourages using it for inter-tool data flow, pointing to tool-chaining instead
  • Comprehensive test coverage — new test suites for applySelector, sanitizeJMESPathSelector, resolveArgs with $select, makeRefAwareJsonSchema, and makeBaseInputSchema
  • Updated spec with implementation details — SPEC.md revised to reflect the shipped Phase 1 design and simplified Phase 2 architecture

Summary | 26 files | 7 commits | base: mainfeature/just-bash-spec-v0


$select filtering in resolveArgs()

Before: Tool chaining could only pass full tool results — no way to filter data between tools without pulling it into context, triggering compression or truncation.
After: { "$tool": "call_id", "$select": "items[?score > 0.8]" }filters data at resolution time via JMESPath. Works for both ephemeral$toolrefs and persisted$artifact` refs.

The implementation adds a $select branch in ArtifactParser.resolveArgs() for both the $artifact + $tool path and the $tool-only path. When $select is present, the $tool-only path uses getToolResultFull() (pre-MCP-unwrap) instead of getToolResultRaw() to ensure filtering operates on the complete structured data. The new applySelector() function handles result. prefix stripping (matching _structureHints.exampleSelectors format), sanitizes common LLM JMESPath errors, validates against injection patterns and length limits, and throws ToolChainResolutionError with a data shape summary on mismatch.

How does post-resolution type validation work?

Tools now carry a baseInputSchema (the original JSON schema without ref-aware wrappers) alongside the ref-aware parameters schema. After resolveArgs() resolves refs and applies $select, tool-wrapper.ts validates the resolved args against baseInputSchema. If the resolved type doesn't match (e.g., $select returns an object where a string is expected), the error message includes per-field type info and suggests drilling deeper in the $select path.

ArtifactParser.ts · select-filter.ts · tool-wrapper.ts · artifact-syntax.ts


Ref-aware tool schemas

Before: Tool parameters accepted only literal values in their JSON schema; the LLM had to learn tool-chaining syntax from prompt guidance alone.
After: Every tool parameter's schema includes an anyOf branch with $tool/$artifact/$select ref objects, with rich descriptions explaining when and how to use each.

makeRefAwareJsonSchema() recursively walks a tool's JSON schema and wraps each value-position node with withToolRef(), which adds an anyOf containing PARAM_REF_SCHEMA — two object variants (artifact-ref and tool-ref) with $select as an optional string property. The root schema gets a TOOL_CHAINING_SCHEMA_DESCRIPTIONS.ROOT description. Descriptions are defined as shared constants in TOOL_CHAINING_SCHEMA_DESCRIPTIONS for consistency across all tools.

ref-aware-schema.ts · function-tools.ts · mcp-tools.ts


Prompt guidance overhaul and internal field stripping

Before: Tool chaining prompt guidance was 70+ lines with verbose extract-chain examples; _structureHints and _toolCallId leaked into traces and stored results.
After: Compact $select-centric guidance (~30 lines) with syntax reference, JMESPath patterns, and concrete examples. Internal _-prefixed fields stripped from all output surfaces.

The PromptConfig.ts changes consolidate guidance around three primitives: full passthrough ($tool), filtered passthrough ($tool + $select), and artifact passthrough ($artifact + $tool). The XML templates are inlined as string constants (removing the ?raw template imports). _structureHints now includes a toolChainingGuidance block with ready-to-use reference syntax per tool result. A new ToolResultSanitizer span processor strips _-prefixed fields from ai.toolCall.result trace attributes, and stripInternalFields() cleans stored results via ArtifactService.getToolResultFull() and streamed tool events.

PromptConfig.ts · tool-result.ts · instrumentation.ts · ArtifactService.ts


Phase 2 spec: in-process bash tool (future)

Before: Original spec proposed bash commands through SandboxExecutorFactory with child process pools.
After: Spec revised to use just-bash as a pure TypeScript AST interpreter running in-process — no sandbox, no child processes, no WASM for core commands. Prerequisites reduced from 5 to 4.

The spec documents the bash tool interface, just-bash architecture (in-memory FS, disabled network, TypeScript-reimplemented commands), stdin data flow model, resource limits (AbortSignal.timeout(30_000), maxCallDepth: 50, 1MB stdout truncation), and explicit recordToolResult() caching. This remains future work pending Phase 1 usage data and compression trigger re-evaluation.

SPEC.md

Pullfrog  | View workflow run | Triggered by Pullfrog | Using Claude Opus𝕏

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec review — solid design, several stale jq references to clean up.

The Phase 1 $select design is well-reasoned: zero new dependencies, zero token cost, leverages existing jmespath + sanitizeJMESPathSelector + _structureHints.exampleSelectors. All codebase claims verified accurate (line numbers, file paths, dependency versions, method signatures). The phased approach with clear Phase 2 prerequisites is good discipline.

The main issue is incomplete find-and-replace from the jq → JMESPath pivot. Several sections still reference "jq" where they should say "JMESPath", and one file naming inconsistency (jq-filter.ts vs select-filter.ts). These are all in the spec text — not blocking, but should be cleaned up before this serves as the implementation guide.

One design note outside the diff: sanitizeJMESPathSelector() is currently a private method on ArtifactService (line 848). The spec correctly plans a new select-filter.ts utility to wrap it, but should explicitly call out that the private method needs to be extracted or made accessible — otherwise the implementer may miss this.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow run | Using Claude Opus𝕏


The `$select` filter operates on the resolved data. Handle different data types:

| Source type | Serialization for jq | Notes |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. Should be "Serialization for JMESPath" (or just "Serialization").


| File | Change |
|------|--------|
| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modified files table says Add JQ: '$select' to SENTINEL_KEY but the code example in section 5.1 (line 111) correctly uses SELECT: '$select'. This should be Add SELECT: '$select' to SENTINEL_KEY.

- `agents-api/src/domains/run/artifacts/ArtifactService.ts`
- `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts`
- `agents-api/src/domains/run/agents/generation/tool-result.ts`
- `agents-api/src/domains/run/utils/` (new jq-filter.ts)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent Constraints lists the new file as jq-filter.ts but section 6 (New Files table, line 230) calls it select-filter.ts. These should match — select-filter.ts is the correct name given the $select key naming.

- Child process infrastructure — Phase 2

**STOP_IF:**
- jq library spike shows no viable option for full jq syntax in JS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both STOP_IF conditions reference jq:

  • "jq library spike shows no viable option" — Phase 1 uses the existing jmespath library, no spike needed
  • "jq adds >50ms" — should say JMESPath

These read like Phase 2 stop conditions that got left in the Phase 1 section.

}
return result;
} catch (error) {
throw new ToolChainResolutionError(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applySelector(data, selector) references toolCallId in the throw statement (line 172) but toolCallId is not a parameter of this function. Either the function signature should include it, or the error should be thrown by the caller (resolveArgs) which has access to toolCallId.

This became the Phase 1 design. Smallest change, highest value, zero token cost, no new dependency beyond a jq library.

## Challenge 7: JMESPath extract tool first — DISMISSED
JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Challenge 7 says "JMESPath is demonstrably error-prone for LLMs" and "Using jq even in Phase 1" — this directly contradicts the final SPEC.md which chose JMESPath for Phase 1 precisely because it's already in the codebase. This finding was written before the JMESPath pivot but was never updated. Should note that this challenge was later superseded by the zero-dependency argument.

- **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case.
- **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement.
- **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash.
- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog entry says "Challenge 7 (JMESPath): Dismissed — LLMs unreliable with JMESPath, jq better" but the final spec chose JMESPath for Phase 1. This entry reflects an intermediate decision that was later reversed. Should be updated to reflect the final outcome (JMESPath chosen for zero-dep advantage, with sanitizeJMESPathSelector mitigating LLM reliability issues).

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(7) Total Issues | Risk: Medium

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: SPEC.md:221 SENTINEL_KEY uses JQ but should be SELECT per section 5.1
  • 🟠 Major: SPEC.md:143 Table header says "jq" but Phase 1 uses JMESPath
  • 🟠 Major: SPEC.md:161-176 Undefined variable toolCallId in code example

🟡 Minor (4) 🟡

🟡 1) SPEC.md Missing test case for acceptance criterion #8

Issue: Acceptance criterion #8 states "_structureHints reference $select as an option for filtering" but the verification plan (Section 14) has no explicit test case for this.

Why: Implementers may miss updating _structureHints to mention $select if it's not in the test plan.

Fix: Add under "Integration tests" in Section 14:

- [ ] `_structureHints` includes guidance about using `$select` for inline filtering

Refs:

Inline Comments:

  • 🟡 Minor: SPEC.md:328 Filename jq-filter.ts should be select-filter.ts
  • 🟡 Minor: design-challenge.md:21-22 Challenge 7 contradicts final JMESPath decision
  • 🟡 Minor: _changelog.md:24 Stale "JMESPath dismissed" statement contradicts line 34

💭 Consider (3) 💭

💭 1) SPEC.md:136 JMESPath sanitization architecture
Issue: Spec proposes reusing sanitizeJMESPathSelector() from ArtifactService.ts, but this is a private method. Meanwhile, agents-core/src/utils/jmespath-utils.ts exports public utilities (validateJMESPathSecure(), searchJMESPath()) with security checks.
Why: Risk of divergent sanitization logic if a third implementation is created.
Fix: Clarify which JMESPath utilities to use. Consider consolidating to jmespath-utils.ts.

💭 2) SPEC.md:3-5 Metadata field naming
Issue: Uses Author: and Date: but some existing specs use Owner(s): and Last updated:.
Why: Minor convention inconsistency across specs directory.
Fix: No action required — both patterns exist in the codebase.

💭 3) SPEC.md:247 Decision log columns
Issue: Introduces Confidence column not present in other specs.
Why: The column adds value (documents certainty level) but is a new pattern.
Fix: Keep as-is — this is a reasonable evolution of the spec format.


💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured spec that thoughtfully phases the implementation and reuses existing infrastructure. The main issues are terminology inconsistencies from the jq→JMESPath pivot (3 Major naming issues that would confuse implementers) plus a code example bug (undefined variable). All are quick fixes — the spec design itself is sound. Recommend addressing the Major items before merging.

Discarded (2)
Location Issue Reason Discarded
meta/audit-findings.md New meta file type with no precedent Intentional spec evolution — audit trail is valuable
meta/design-challenge.md New meta file type with no precedent Intentional spec evolution — design rationale is valuable
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-architecture 5 0 1 0 4 0 0
pr-review-product 5 0 0 0 4 0 1
pr-review-standards 5 1 0 0 3 0 1
pr-review-consistency 8 0 2 0 4 0 2
Total 23 1 3 0 6 0 4

Note: High overlap across reviewers on the jq→JMESPath terminology issues (all 4 flagged SENTINEL_KEY, table header, and filename).


| File | Change |
|------|--------|
| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Inconsistent SENTINEL_KEY naming

Issue: This line says JQ: '$select' but section 5.1 (line 111) correctly defines it as SELECT: '$select'. The existing pattern uses descriptive names matching the value (ARTIFACT: '$artifact', TOOL: '$tool').

Why: Implementers following this table would add the wrong constant name, causing a mismatch with the code example in section 5.1.

Fix: (1-click apply)

Suggested change
| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` |
| `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `SELECT: '$select'` to `SENTINEL_KEY` |

Refs:


The `$select` filter operates on the resolved data. Handle different data types:

| Source type | Serialization for jq | Notes |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Table header references jq but Phase 1 uses JMESPath

Issue: The column header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. This is leftover terminology from an earlier draft.

Why: Creates confusion about what query language Phase 1 actually uses. Additionally, JMESPath operates directly on JavaScript objects, not serialized JSON strings.

Fix: (1-click apply)

Suggested change
| Source type | Serialization for jq | Notes |
| Source type | Handling for JMESPath | Notes |

Refs:

- `agents-api/src/domains/run/artifacts/ArtifactService.ts`
- `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts`
- `agents-api/src/domains/run/agents/generation/tool-result.ts`
- `agents-api/src/domains/run/utils/` (new jq-filter.ts)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Filename inconsistency — jq-filter.ts should be select-filter.ts

Issue: This line says jq-filter.ts but section 6 (line 230) correctly names it select-filter.ts.

Why: Implementers would create a file with the wrong name, inconsistent with the rest of the spec.

Fix: (1-click apply)

Suggested change
- `agents-api/src/domains/run/utils/` (new jq-filter.ts)
- `agents-api/src/domains/run/utils/` (new select-filter.ts)

Refs:

Comment on lines +161 to +176
function applySelector(data: any, selector: string): any {
try {
const sanitized = sanitizeJMESPathSelector(selector);
const result = jmespath.search(data, sanitized);
if (result === null || result === undefined) {
// Selector matched nothing — return empty rather than failing
return null;
}
return result;
} catch (error) {
throw new ToolChainResolutionError(
toolCallId,
`$select filter failed: ${error.message}. Expression: ${selector}`
);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Undefined variable toolCallId in code example

Issue: The applySelector function signature is applySelector(data: any, selector: string) but the error handling references toolCallId which is not defined in scope.

Why: This code example would fail TypeScript compilation. Implementers would need to either add toolCallId as a parameter or restructure the error handling.

Fix: Add toolCallId as a third parameter:

function applySelector(data: any, selector: string, toolCallId: string): any {
  try {
    const sanitized = sanitizeJMESPathSelector(selector);
    const result = jmespath.search(data, sanitized);
    if (result === null || result === undefined) {
      return null;
    }
    return result;
  } catch (error) {
    throw new ToolChainResolutionError(
      toolCallId,
      `$select filter failed: ${error.message}. Expression: ${selector}`
    );
  }
}

Refs:

Comment on lines +21 to +22
## Challenge 7: JMESPath extract tool first — DISMISSED
JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Challenge 7 contradicts final spec decision

Issue: This says "JMESPath is demonstrably error-prone for LLMs...Using jq even in Phase 1" but the final spec uses JMESPath (confirmed by changelog line 34 and SPEC.md D8).

Why: This meta file was not updated after the final Phase 1 refinement. Future readers would be confused about the actual rationale for choosing JMESPath.

Fix: Update to reflect the actual outcome:

Suggested change
## Challenge 7: JMESPath extract tool first — DISMISSED
JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1.
## Challenge 7: JMESPath extract tool first — ACCEPTED AS PHASE 1
The existing `jmespath` library and `sanitizeJMESPathSelector()` provide sufficient capability with zero new dependencies. `_structureHints` already generates JMESPath selectors for LLMs to use, mitigating reliability concerns.

Refs:

- **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case.
- **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement.
- **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash.
- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Stale statement contradicts final decision

Issue: This says "Challenge 7 (JMESPath): Dismissed" but line 34 confirms JMESPath was ultimately chosen for Phase 1.

Why: The changelog shows intermediate decisions but this creates a contradictory record since the final section (lines 32-39) supersedes it.

Fix: Either remove this line or update it to reflect the final outcome:

Suggested change
- **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better.
- **Challenge 7 (JMESPath)**: Initially dismissed, later ACCEPTED for Phase 1 — see "Phase 1 refinement" section below

Refs:

@github-actions github-actions bot deleted a comment from claude bot Apr 6, 2026
@tim-inkeep
Copy link
Copy Markdown
Contributor Author

@pullfrog, please review the spec again

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

✅ Prior Issues Addressed

All 7 issues from the previous review have been fixed in this revision:

# Prior Issue Status
🟠 1 SPEC.md:221 SENTINEL_KEY uses JQ instead of SELECT ✅ Fixed (now line 439)
🟠 2 SPEC.md:143 Table header says "jq" but Phase 1 uses JMESPath ✅ Fixed (now line 361 says "Handling for JMESPath")
🟠 3 SPEC.md:161-176 Undefined variable toolCallId in applySelector ✅ Fixed (line 379 now includes toolCallId: string as parameter)
🟡 4 SPEC.md:328 Filename jq-filter.ts should be select-filter.ts ✅ Fixed (line 548)
🟡 5 design-challenge.md:21-22 Challenge 7 contradicts final JMESPath decision ✅ Fixed (now says "ACCEPTED AS PHASE 1")
🟡 6 _changelog.md:24 Stale "JMESPath dismissed" statement ✅ Fixed (explains the pivot)
🟡 7 Missing test case for acceptance criterion #8 ✅ Fixed (line 598 adds _structureHints test case)

Delta Analysis

The revision also added significant new content:

  1. Full Phase 2 technical design (lines 80-312) — tool interface, architecture diagram, stdin data flow, SandboxExecutorFactory integration, resource limits, error handling, observability requirements, and acceptance criteria
  2. Compression prerequisite — added as a Phase 2 blocker (line 88)
  3. JMESPath consolidation note (line 355) — calls out that sanitizeJMESPathSelector() is private and should be consolidated with jmespath-utils.ts
  4. Changelog entries (lines 41-48) — documents the Phase 2 restoration

The new content is well-structured and internally consistent. No new issues identified.

💭 Consider (1) 💭

💭 1) audit-findings.md:23 Historical reference to $jq
Issue: Line 23 says "Phase 1 ($jq)" but Phase 1 now uses $select.
Why: This is in a historical audit document, so it accurately reflects the state at audit time.
Fix: No action needed — historical accuracy is acceptable here. The main SPEC.md is correct.


✅ APPROVE

Summary: All 7 issues from the prior review have been addressed. The spec is now internally consistent — no stale jq references remain in the implementation guidance, terminology is aligned throughout, and the code example bug is fixed. The expanded Phase 2 documentation is comprehensive and well-structured. Ready to merge. 🎉

Discarded (0)

No discarded findings.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator-delta-review 1 0 1 0 0 0 0
Total 1 0 1 0 0 0 0

Note: This was a delta review scoped to changes since the prior automated review at commit 705b252. All 7 prior issues were verified as fixed.

@github-actions github-actions bot deleted a comment from claude bot Apr 6, 2026
@tim-inkeep tim-inkeep changed the title feat: add $select filtering to $tool/$artifact refs for tool chaining spec: data query & transform for tool chaining ($select + bash tool) Apr 6, 2026
Copy link
Copy Markdown
Contributor

@pullfrog pullfrog bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: all previous issues fixed. Remaining items are terminology cleanup in Phase 1 sections and one design accuracy issue.

All 7 issues from the first review (stale jq→JMESPath terminology, undefined toolCallId, filename mismatch, missing test case) have been addressed. All 14 codebase claims verified accurate — file paths, line numbers, method signatures, dependency versions, and behavioral descriptions all match the actual code.

Two new issues found: (1) the Phase 1 serialization table and surrounding sections still reference "jq" in 6 places where they should say "JMESPath" — this is specifically in Phase 1 sections where the query language is JMESPath, not jq; (2) the serialization table itself is misleading because JMESPath operates on native JS objects, not serialized JSON strings.

The meta files (_changelog.md, design-challenge.md, audit-findings.md) also have stale $jq / "jq library" references from the pre-pivot era, but these are historical records so the inline comments focus on the normative SPEC.md sections that implementers will follow.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow run | Using Claude Opus𝕏

Comment on lines +359 to +363
The `$select` filter operates on the resolved data. Handle different data types:

| Source type | Handling for JMESPath | Notes |
|-------------|---------------------|-------|
| Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 1 serialization table still references jq in 4 places. This table is in Section 5.3 (Phase 1 design), where the query language is JMESPath — not jq. Additionally, the table is somewhat misleading: JMESPath operates directly on JavaScript objects via jmespath.search(data, expression), not on serialized JSON strings. The JSON.stringify column makes sense for Phase 2 bash/stdin, but for Phase 1 $select, the resolved data is passed directly to jmespath.search() as a JS value.

Stale references:

  • Line 363: Normal case — jq operates on JSON → JMESPath
  • Line 366: Skip jq, return error | jq only works on text/JSON → JMESPath

Consider renaming the column header to just "Handling" (it's already correct as Handling for JMESPath) and updating the notes to reflect that JMESPath receives the native JS object, not a serialized string.

Suggested change
The `$select` filter operates on the resolved data. Handle different data types:
| Source type | Handling for JMESPath | Notes |
|-------------|---------------------|-------|
| Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON |
| Source type | Handling for JMESPath | Notes |
|-------------|---------------------|-------|
| Object/Array | Pass directly to `jmespath.search()` | Normal case — JMESPath operates on JS objects |
| String | Pass as-is (not re-stringified) | Avoid double-quoting: `"hello"` not `"\"hello\""` |
| MCP content array | Unwrap text parts, concatenate | `{ content: [{type: 'text', text: '...'}] }` → extract text |
| Buffer/binary | Skip filter, return error | JMESPath only works on structured/text data |

When `$tool`+`$artifact` ref resolves to an oversized artifact (`retrievalBlocked: true`):
- `$select` filtering bypasses the retrieval block
- Artifact data fetched via `ArtifactService.getArtifactFull()` with new `{ allowOversized: true }` option
- jq filter applied, producing a subset that fits in context
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale jq reference in Phase 1 section. This line says "jq filter applied" but this is Section 5.4 (Phase 1 Oversized Artifact Processing), where the filter is JMESPath via $select.

Suggested change
- jq filter applied, producing a subset that fits in context
- JMESPath filter applied via `$select`, producing a subset that fits in context

}
```

On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale jq reference in Phase 1 error handling section.

Suggested change
On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).
On JMESPath failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors).

| Component | How it participates |
|-----------|---------------------|
| `resolveArgs()` | Extended (not replaced) — existing `$tool`/`$artifact` logic unchanged |
| `ToolChainResolutionError` | Reused for jq filter errors |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale jq reference in Phase 1 integration points.

Suggested change
| `ToolChainResolutionError` | Reused for jq filter errors |
| `ToolChainResolutionError` | Reused for `$select` filter errors |

|---|----------|--------|------|------------|-----------|
| D1 | Phase 1: `$select` (JMESPath) in resolveArgs; Phase 2: bash tool | LOCKED | Cross-cutting | HIGH | `$select` is smallest viable change — no new tool, no token cost, zero new deps (uses existing `jmespath`). Bash tool for exploratory cases pending validation. |
| D2 | Queryable sources = session cache + stored artifacts | DIRECTED | Cross-cutting | HIGH | Session cache for current-turn, artifacts for cross-turn |
| D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — jq filter operates out-of-context |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale jq reference in decision log. D3's rationale says "jq filter" but the Phase 1 mechanism is JMESPath filtering via $select. jq is only relevant in Phase 2 (bash tool).

Suggested change
| D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — jq filter operates out-of-context |
| D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — `$select` filter operates out-of-context |

→ stdin receives 50K JSON, jq filters to 3K (call_id: "call_bash_1")
→ executed in sandbox, event loop free
→ 3K result cached in ToolSessionManager, enters context

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exampleSelectors use a result. prefix that gets stripped. Worth noting: _structureHints.exampleSelectors generate JMESPath expressions prefixed with result. (e.g., result.items[?score > 0.8]). ArtifactService.createArtifact strips this prefix before applying (line 215-217). When the LLM uses these selectors in a $select expression, the implementer needs to decide whether $select should also handle the result. prefix, or whether _structureHints guidance should clarify that $select expressions don't need the prefix. This is worth calling out explicitly to avoid confusion during implementation.

@tim-inkeep tim-inkeep changed the title spec: data query & transform for tool chaining ($select + bash tool) spec: $select filtering for tool refs + bash tool for data exploration Apr 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Preview URLs

Use these stable preview aliases for testing this PR:

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

});
```

**Available commands (most relevant):**
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to limit the amount of tools initially available?

- Serialization table: jq→JMESPath, JSON.stringify→pass directly to jmespath.search()
- Oversized artifact processing: "jq filter" → "JMESPath filter via $select"
- Error handling: "On jq failure" → "On JMESPath failure"
- Integration points: "jq filter errors" → "$select filter errors"
- Decision D3: "jq filter" → "$select filter"
- Added result. prefix handling note for _structureHints.exampleSelectors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

✅ Prior Issues Addressed

All 6 issues from the pullfrog review (20:48) have been fixed in commit 0eae157:

# Prior Issue Status
1 SPEC.md:363 "jq operates on JSON" → JMESPath ✅ Fixed (line 365: "JMESPath operates on JS objects")
2 SPEC.md:366 "jq only works on text/JSON" → JMESPath ✅ Fixed (line 368: "JMESPath only works on structured/text data")
3 SPEC.md:375 "jq filter applied" → JMESPath ✅ Fixed (line 375: "JMESPath filter applied via $select")
4 SPEC.md:399 "On jq failure" → JMESPath ✅ Fixed (line 399: "On JMESPath failure")
5 SPEC.md:459 "jq filter errors" → $select ✅ Fixed (line 459: "$select filter errors")
6 SPEC.md:471 D3 "jq filter" → $select ✅ Fixed (line 471: "$select filter operates out-of-context")

Delta Analysis

The commit 0eae157 ("Fix remaining stale jq references in Phase 1 sections") correctly addresses all terminology inconsistencies. Additionally:

  • Added result. prefix handling (line 357) — documents that _structureHints.exampleSelectors generate JMESPath expressions with a result. prefix that gets stripped by ArtifactService.createArtifact(), and recommends the $select implementation auto-strip this prefix for consistency. This addresses pullfrog's design accuracy note.

  • Serialization table corrected — now accurately describes that JMESPath operates on native JS objects (not serialized JSON strings), fixing the conceptual inaccuracy in the prior version.

🕐 Pending Recommendations (1)

  • 💬 SPEC.md:198 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

✅ APPROVE

Summary: All terminology issues have been resolved. The spec is now internally consistent — no stale jq references remain in Phase 1 sections. The result. prefix handling documentation is a valuable addition that will help implementers. One human question remains open about limiting Phase 2 bash tool commands, but this doesn't block the spec for Phase 1 implementation. 🎉

Discarded (0)

No discarded findings.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator-delta-review 6 0 0 0 0 1 0
Total 6 0 0 0 0 1 0

Note: This was a delta review scoped to changes since the pullfrog review at commit b48ff20. All 6 prior stale jq→JMESPath issues were verified as fixed.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(1) Total Issues | Risk: Low

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

  • 🟠 Major: SPEC.md:29 Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model

💭 Consider (2) 💭

💭 1) SPEC.md:89,130 WASM bundling vs invocation distinction
Issue: The spec states "we disable the WASM-dependent optional runtimes" and "no WASM for core commands" but the evidence file (just-bash-assessment.md:47-51) notes that sql.js and quickjs-emscripten WASM modules are bundled with just-bash, even if not invoked.
Why: The distinction is between "not invoked" (what the spec means) vs "not bundled" (what readers might assume). The evidence file already acknowledges this ("We don't use python/JS/sqlite features but they're bundled").
Fix: Consider clarifying in the dependency spike prerequisite that verification should confirm WASM modules are lazy-loaded (not impacting cold start) if not invoked, or update the spec language from "no WASM" to "WASM runtimes not invoked."

💭 2) SPEC.md:172-179 Concurrency limits removed without explicit rationale
Issue: The delta removed acceptance criterion #6 "Concurrent bash calls bounded by semaphore" without documenting why concurrency limits are no longer needed for in-process execution.
Why: The prior SandboxExecutorFactory approach used semaphore-based concurrency limiting (NativeSandboxExecutor.ts:70-129). In-process execution doesn't need this pattern, but the rationale should be documented.
Fix: Add a brief note in the Resource Limits section explaining that concurrency is implicitly bounded by Node.js single-threaded execution model (one request at a time in serverless, event loop for others).

🕐 Pending Recommendations (1)

  • 💬 SPEC.md:191 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

💡 APPROVE WITH SUGGESTIONS

Summary: The Phase 2 architecture simplification (SandboxExecutorFactory → in-process just-bash) is a well-reasoned change that removes unnecessary complexity. However, one reference at line 29 (in the Problem Statement section) wasn't updated and now contradicts the new execution model described in the rest of the spec. Quick fix — update line 29 to match the Architecture section. The design considerations (WASM bundling, concurrency model) are valid nuances to address before Phase 2 implementation but don't block this spec revision.

Discarded (2)
Location Issue Reason Discarded
SPEC.md:130 Security model assumes LLM-generated commands are safe Valid design consideration but already gated behind Phase 2 prerequisites — the dependency spike + usage data will inform this decision before implementation
evidence/just-bash-assessment.md WASM bundling risk Already acknowledged in evidence file ("they're bundled") — not a new finding
Reviewers (2)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-architecture 3 0 2 0 0 0 1
pr-review-standards 1 0 0 0 1 0 0
Total 4 0 2 0 1 0 1

Note: Delta review scoped to changes since commit 0eae157. All prior issues (jq→JMESPath terminology, etc.) were verified as fixed in that commit.


**Phase 1 — `$select` selector in `$tool` refs:** Enhance `resolveArgs()` to support inline JMESPath filtering at resolution time. E.g., `{ "$tool": "call_id", "$select": "items[?score > `0.8`].{title: title, url: url}" }`. This filters data before it reaches the downstream tool — no extra tool call, no token cost in system prompts, **zero new dependencies** (uses the existing `jmespath` library already in the codebase). `_structureHints` already generates example JMESPath selectors that the LLM can use directly. Covers the pipeline case (filtering data between tools).

**Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model

Issue: This line still describes Phase 2 execution via SandboxExecutorFactory (NativeSandboxExecutor/VercelSandboxExecutor), but the rest of the Phase 2 section (lines 116-168) was updated in this commit to describe direct in-process execution via just-bash with no SandboxExecutorFactory.

Why: Implementers following the spec will see contradictory execution models — the overview says "SandboxExecutorFactory" while the Architecture section says "no child processes, no IPC, no sandbox wrappers needed." Decision D6 (line 465) and acceptance criterion #5 (line 298) both now specify in-process execution.

Fix: Update to match the new execution model:

Suggested change
**Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first.
**Phase 2 — Built-in `bash` tool:** A sandboxed bash tool powered by `just-bash` for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes directly in-process (no external sandbox needed — just-bash is a virtual bash environment with in-memory FS and no network). Covers the interactive case. Requires dependency validation spike first.

Refs:

@github-actions github-actions bot deleted a comment from claude bot Apr 7, 2026
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(6) Total Issues | Risk: Medium

This is a delta review scoped to changes since commit 51e7697bf201 — the implementation of $select filtering for tool chaining (18 files, +1040/-140).

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: select-filter.ts:27 Duplicate DANGEROUS_PATTERNS creates security drift risk
  • 🟠 Major: function-tools.ts:94 Silent schema conversion failure hides validation degradation
  • 🟠 Major: mcp-tools.ts:33 Silent schema conversion failure in MCP tools

🟡 Minor (3) 🟡

🟡 1) instrumentation.ts:97 SpanProcessor mutates ReadableSpan attributes

Issue: ToolResultSanitizer.onEnd() mutates span.attributes directly via cast (span as any).attributes. ReadableSpan attributes are intended to be immutable after the span ends.

Why: This mutation may not persist depending on processor ordering relative to export. The code works because of the current processor order (BaggageSpanProcessor → ToolResultSanitizer → BatchSpanProcessor), but this is fragile.

Fix: Consider documenting this is a known workaround that depends on processor ordering, or strip internal fields earlier in the pipeline before recording the result.

Refs:

Inline Comments:

  • 🟡 Minor: select-filter.test.ts:29 Missing tests for stripInternalFields utility
  • 🟡 Minor: resolveArgs-select.test.ts:101 Missing test for $select on retrievalBlocked artifacts

💭 Consider (4) 💭

💭 1) select-filter.ts:16-17 Cache eviction strategy
Issue: The selectorCache Map has a 1000 entry limit but no eviction policy — once full, new selectors get no caching benefit.
Why: Cache effectiveness degrades over time as early selectors are cached forever while later ones are not.
Fix: Consider LRU cache if this becomes a hot path. Current limit prevents unbounded growth, so acceptable for now.

💭 2) select-filter.ts:126-130 Data shape exposure in error messages
Issue: summarizeShape output in error messages could expose internal field names to LLM output.
Why: The function does filter _-prefixed keys, mitigating the primary concern. Consider whether exposing any field names is acceptable for your threat model.

💭 3) agent-types.ts:244-252 Duplicated validation result type
Issue: The safeParse return type is inlined twice (for parameters and baseInputSchema).
Fix: Extract to shared type alias for maintainability.

💭 4) ref-aware-schema.ts:36-75 PARAM_REF_SCHEMA typed as Record<string, unknown>
Issue: Loses structure validation at compile time.
Fix: Consider defining a JsonSchema interface, though this is acceptable since JSON Schema manipulation is inherently weakly typed.

🕐 Pending Recommendations (1)

  • 💬 SPEC.md:191 tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)

💡 APPROVE WITH SUGGESTIONS

Summary: The $select implementation is well-designed — it reuses existing JMESPath infrastructure, has good security checks (dangerous patterns, length limits), comprehensive test coverage for the core filtering logic, and clear error messages. The main concerns are: (1) duplicate DANGEROUS_PATTERNS creating security drift risk with agents-core, and (2) silent schema conversion failures that hide validation degradation. Both are addressable with straightforward changes. The test coverage gaps (stripInternalFields, retrievalBlocked artifacts) are worth adding but not blocking.

Discarded (3)
Location Issue Reason Discarded
instrumentation.ts:81 SpanProcessor.onStart signature uses ReadableSpan Technically correct concern but onStart is empty and parameter is unused — no practical impact
ref-aware-schema.ts:1 Minimal zod usage Valid but extremely minor — import is used, just not extensively
PromptConfig.ts:11-18 Templates inlined vs file imports This is the existing pattern in the file, not a regression from this PR
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 3 0 1 0 0 0 2
pr-review-appsec 3 1 2 0 0 0 0
pr-review-errors 3 2 0 0 0 0 1
pr-review-tests 7 0 0 0 2 0 5
pr-review-consistency 5 1 0 0 0 0 4
pr-review-types 2 0 2 0 0 0 0
Total 23 4 5 0 2 0 12

Note: High overlap across reviewers on the duplicate DANGEROUS_PATTERNS issue and silent schema conversion failures.

/constructor/,
/prototype/,
/__proto__/,
];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Duplicate DANGEROUS_PATTERNS creates security drift risk

Issue: This DANGEROUS_PATTERNS array duplicates the identical list in packages/agents-core/src/utils/jmespath-utils.ts (lines 191-198). The spec explicitly notes at line 346 to consolidate JMESPath utilities to avoid divergent implementations.

Why: If new dangerous patterns are identified, both locations must be updated. Security control drift is a real risk when validation logic is duplicated.

Fix: Import from the existing shared location:

import { DANGEROUS_PATTERNS, MAX_EXPRESSION_LENGTH } from '@inkeep/agents-core/utils/jmespath-utils';

Refs:

baseInputSchema = makeBaseInputSchema(functionData.inputSchema);
} catch {
// baseInputSchema stays undefined — post-resolution validation will be skipped
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Silent schema conversion failure hides validation degradation

Issue: When makeBaseInputSchema() fails, baseInputSchema stays undefined and post-resolution validation is silently skipped. If an LLM's $select resolves to the wrong type (object instead of string), the tool executes with wrong-typed data causing confusing errors deep in tool execution rather than a clear validation message.

Why: Operators have zero visibility into which tools are running with degraded type-safety. The comment "post-resolution validation will be skipped" documents the behavior but doesn't surface it operationally.

Fix: Log a warning when schema conversion fails:

try {
  baseInputSchema = makeBaseInputSchema(functionData.inputSchema);
} catch (schemaError) {
  logger.warn(
    {
      functionToolName: functionToolDef.name,
      schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError),
    },
    'Failed to build base input schema; post-resolution validation will be skipped'
  );
}

Refs:

refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>,
baseInputSchema: undefined,
};
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Silent schema conversion failure in MCP tools

Issue: Same issue as function-tools.ts — when buildRefAwareInputSchema fails, baseInputSchema is undefined and validation is silently disabled. MCP tools often have complex schemas, and failures here mean agents lose type-safety for tool chaining without any operational visibility.

Why: A user who writes $select: 'items[0]' expecting a string but getting an object will see cryptic MCP tool errors rather than a clear "resolved to wrong type" message.

Fix: Add warning logging in the catch block:

} catch (error) {
  logger.warn(
    { schemaError: error instanceof Error ? error.message : String(error) },
    'Failed to build ref-aware schema for MCP tool; post-resolution validation disabled'
  );
  return {
    refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>,
    baseInputSchema: undefined,
  };
}

it('should return valid expressions unchanged', () => {
expect(sanitizeJMESPathSelector('items[?score > `0.8`]')).toBe('items[?score > `0.8`]');
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Missing tests for stripInternalFields utility

Issue: The stripInternalFields function exported from select-filter.ts (lines 9-14) is not tested. This function filters out underscore-prefixed keys before exposing results to downstream tools and traces.

Why: If this function breaks, tool results would incorrectly include internal metadata (_structureHints, _toolCallId), potentially causing schema validation failures or exposing internal implementation details.

Fix: Add a test section:

describe('stripInternalFields', () => {
  it('should remove underscore-prefixed keys from object', () => {
    const data = { name: 'test', _structureHints: {}, _toolCallId: 'call_1' };
    expect(stripInternalFields(data)).toEqual({ name: 'test' });
  });

  it('should return arrays unchanged', () => {
    expect(stripInternalFields([1, 2, 3])).toEqual([1, 2, 3]);
  });

  it('should return primitives unchanged', () => {
    expect(stripInternalFields('hello')).toBe('hello');
    expect(stripInternalFields(null)).toBe(null);
  });
});


expect(result).toBe(3);
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Missing test for $select on retrievalBlocked artifacts

Issue: The spec states that oversized artifacts with retrievalBlocked: true can be filtered via $select to bypass the retrieval block (SPEC.md line 357-360). No test verifies this critical behavior.

Why: If the system fails to allow $select filtering on blocked artifacts, large tool results would be completely inaccessible even when the agent only needs a small subset — defeating a primary use case.

Fix: Add a test case:

it('should allow $select to extract from retrievalBlocked artifact', async () => {
  const parser = createParser({
    getArtifactFull: vi.fn().mockResolvedValue({
      data: { items: [{ id: 1 }, { id: 2 }] },
      metadata: { retrievalBlocked: true, isOversized: true }
    }),
  });

  const result = await parser.resolveArgs({
    $artifact: 'art_oversized',
    $tool: 'call_search',
    $select: 'items[0].id',
  });

  expect(result).toBe(1);
});

@github-actions github-actions bot deleted a comment from claude bot Apr 7, 2026
@itoqa
Copy link
Copy Markdown

itoqa bot commented Apr 7, 2026

Ito Test Report ✅

11 test cases ran. 1 additional finding, 10 passed.

Across 11 test cases, 10 passed and 1 failed, showing that most run-domain and Playground behaviors are functioning as expected, including route parity between /run/api/chat and /run/v1/chat/completions, correct $tool/$artifact/$select resolution (with nested and no-$select compatibility), type-mismatch rejection with actionable guidance, internal field sanitization, mobile viewport usability, and selector-injection guard enforcement. The key issue is one high-severity, reproducible concurrency defect where rapid repeated submits can generate colliding millisecond-based request IDs (chatcmpl-${Date.now()}), causing request-scoped stream/session state overwrites and resulting in cross-run corruption, dropped or misaligned streams, and assistant error states under burst traffic.

✅ Passed (10)
Category Summary Screenshot
Adversarial Selector injection payloads are blocked by runtime guards; local security tests passed for ${evil} and eval() patterns. ADV-1
Edge Recursive nested refs resolve independently as expected; prior runtime failure was not reproducible as a production-code defect. EDGE-3
Logic Verified $tool + $select applies filtering before downstream tool execution. LOGIC-1
Logic Verified $artifact + $tool + $select resolves and filters saved artifact data correctly. LOGIC-2
Logic Verified backward-compatible passthrough when $select is omitted. LOGIC-3
Logic Type-mismatch handling for post-resolution $select inputs is implemented and validated, including actionable drill-deeper guidance. N/A
Logic Internal underscore-prefixed fields are stripped from surfaced tool_result payloads; prior blockage was environmental rather than a product sanitization defect. LOGIC-5
Mobile On 390x844 viewport, Playground controls stayed visible/usable and error messaging remained readable in-view during prompt submissions. MOBILE-1
Happy-path Verified /run/api/chat chaining behavior; prior blockage was fixture setup, not app logic. ROUTE-1
Happy-path Verified /run/v1/chat/completions chaining parity; no route-level defect identified. ROUTE-2
ℹ️ Additional Findings (1)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Edge ⚠️ Rapid submits can collide request IDs and corrupt run state EDGE-6
⚠️ Rapid repeated submits do not corrupt chaining resolution
  • What failed: Distinct rapid submits can share the same request ID, so stream/session registration can overwrite prior in-flight state instead of staying isolated per request.
  • Impact: Concurrent users can see cross-run corruption, dropped/misaligned stream state, or assistant error states during burst submissions. This undermines chaining reliability in high-interaction scenarios.
  • Steps to reproduce:
    1. Open the agent playground chat for a configured tool-chaining agent.
    2. Submit the same chaining prompt three times in rapid succession (double submit plus Enter spam).
    3. Observe that request-scoped stream/session behavior can become corrupted instead of staying isolated per run.
  • Stub / mock context: No stubs, mocks, or bypasses were applied for this test in the recorded run.
  • Code analysis: I inspected run request ID creation and downstream request-scoped registries in the run domain. IDs are generated with millisecond time only, then reused as the key for stream helper and session registration, which creates a collision path when multiple submits land in the same millisecond.
  • Why this is likely a bug: Request IDs are not guaranteed unique under rapid concurrency, but they are treated as unique keys for mutable run state, so collisions can deterministically overwrite active request associations.

Relevant code:

agents-api/src/domains/run/routes/chat.ts (lines 332-337)

const requestId = `chatcmpl-${Date.now()}`;
const timestamp = Math.floor(Date.now() / 1000);

const lastUserMessage = body.messages
  .filter((msg: Message) => msg.role === 'user')
  .slice(-1)[0];

agents-api/src/domains/run/stream/stream-registry.ts (lines 19-23)

/**
 * Register a StreamHelper for a specific request ID
 */
export function registerStreamHelper(requestId: string, streamHelper: StreamHelper): void {
  getRegistry().set(requestId, streamHelper);
}

agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts (lines 474-478)

registerStreamHelper(requestId, sseHelper);
agentSessionManager.createSession(requestId, executionContext, conversationId);
if (emitOperations) {
  agentSessionManager.enableEmitOperations(requestId);
}

Commit: e223b3a

View Full Run


Tell us how we did: Give Ito Feedback

@tim-inkeep tim-inkeep closed this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant