spec: $select filtering for tool refs + bash tool for data exploration#3036
spec: $select filtering for tool refs + bash tool for data exploration#3036tim-inkeep wants to merge 6 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🦋 Changeset detectedLatest commit: e223b3a The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
TL;DR — Implements Phase 1 of the Key changes
Summary | 26 files | 7 commits | base:
|
There was a problem hiding this comment.
Spec review — solid design, several stale jq references to clean up.
The Phase 1 $select design is well-reasoned: zero new dependencies, zero token cost, leverages existing jmespath + sanitizeJMESPathSelector + _structureHints.exampleSelectors. All codebase claims verified accurate (line numbers, file paths, dependency versions, method signatures). The phased approach with clear Phase 2 prerequisites is good discipline.
The main issue is incomplete find-and-replace from the jq → JMESPath pivot. Several sections still reference "jq" where they should say "JMESPath", and one file naming inconsistency (jq-filter.ts vs select-filter.ts). These are all in the spec text — not blocking, but should be cleaned up before this serves as the implementation guide.
One design note outside the diff: sanitizeJMESPathSelector() is currently a private method on ArtifactService (line 848). The spec correctly plans a new select-filter.ts utility to wrap it, but should explicitly call out that the private method needs to be extracted or made accessible — otherwise the implementer may miss this.
Claude Opus | 𝕏
|
|
||
| The `$select` filter operates on the resolved data. Handle different data types: | ||
|
|
||
| | Source type | Serialization for jq | Notes | |
There was a problem hiding this comment.
Table header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. Should be "Serialization for JMESPath" (or just "Serialization").
|
|
||
| | File | Change | | ||
| |------|--------| | ||
| | `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` | |
There was a problem hiding this comment.
The modified files table says Add JQ: '$select' to SENTINEL_KEY but the code example in section 5.1 (line 111) correctly uses SELECT: '$select'. This should be Add SELECT: '$select' to SENTINEL_KEY.
| - `agents-api/src/domains/run/artifacts/ArtifactService.ts` | ||
| - `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts` | ||
| - `agents-api/src/domains/run/agents/generation/tool-result.ts` | ||
| - `agents-api/src/domains/run/utils/` (new jq-filter.ts) |
There was a problem hiding this comment.
Agent Constraints lists the new file as jq-filter.ts but section 6 (New Files table, line 230) calls it select-filter.ts. These should match — select-filter.ts is the correct name given the $select key naming.
| - Child process infrastructure — Phase 2 | ||
|
|
||
| **STOP_IF:** | ||
| - jq library spike shows no viable option for full jq syntax in JS |
There was a problem hiding this comment.
Both STOP_IF conditions reference jq:
- "jq library spike shows no viable option" — Phase 1 uses the existing
jmespathlibrary, no spike needed - "jq adds >50ms" — should say JMESPath
These read like Phase 2 stop conditions that got left in the Phase 1 section.
| } | ||
| return result; | ||
| } catch (error) { | ||
| throw new ToolChainResolutionError( |
There was a problem hiding this comment.
applySelector(data, selector) references toolCallId in the throw statement (line 172) but toolCallId is not a parameter of this function. Either the function signature should include it, or the error should be thrown by the caller (resolveArgs) which has access to toolCallId.
| This became the Phase 1 design. Smallest change, highest value, zero token cost, no new dependency beyond a jq library. | ||
|
|
||
| ## Challenge 7: JMESPath extract tool first — DISMISSED | ||
| JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1. |
There was a problem hiding this comment.
Challenge 7 says "JMESPath is demonstrably error-prone for LLMs" and "Using jq even in Phase 1" — this directly contradicts the final SPEC.md which chose JMESPath for Phase 1 precisely because it's already in the codebase. This finding was written before the JMESPath pivot but was never updated. Should note that this challenge was later superseded by the zero-dependency argument.
| - **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case. | ||
| - **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement. | ||
| - **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash. | ||
| - **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better. |
There was a problem hiding this comment.
Changelog entry says "Challenge 7 (JMESPath): Dismissed — LLMs unreliable with JMESPath, jq better" but the final spec chose JMESPath for Phase 1. This entry reflects an intermediate decision that was later reversed. Should be updated to reflect the final outcome (JMESPath chosen for zero-dep advantage, with sanitizeJMESPathSelector mitigating LLM reliability issues).
There was a problem hiding this comment.
PR Review Summary
(7) Total Issues | Risk: Medium
🟠⚠️ Major (3) 🟠⚠️
Inline Comments:
- 🟠 Major:
SPEC.md:221SENTINEL_KEY usesJQbut should beSELECTper section 5.1 - 🟠 Major:
SPEC.md:143Table header says "jq" but Phase 1 uses JMESPath - 🟠 Major:
SPEC.md:161-176Undefined variabletoolCallIdin code example
🟡 Minor (4) 🟡
🟡 1) SPEC.md Missing test case for acceptance criterion #8
Issue: Acceptance criterion #8 states "_structureHints reference $select as an option for filtering" but the verification plan (Section 14) has no explicit test case for this.
Why: Implementers may miss updating _structureHints to mention $select if it's not in the test plan.
Fix: Add under "Integration tests" in Section 14:
- [ ] `_structureHints` includes guidance about using `$select` for inline filtering
Refs:
Inline Comments:
- 🟡 Minor:
SPEC.md:328Filenamejq-filter.tsshould beselect-filter.ts - 🟡 Minor:
design-challenge.md:21-22Challenge 7 contradicts final JMESPath decision - 🟡 Minor:
_changelog.md:24Stale "JMESPath dismissed" statement contradicts line 34
💭 Consider (3) 💭
💭 1) SPEC.md:136 JMESPath sanitization architecture
Issue: Spec proposes reusing sanitizeJMESPathSelector() from ArtifactService.ts, but this is a private method. Meanwhile, agents-core/src/utils/jmespath-utils.ts exports public utilities (validateJMESPathSecure(), searchJMESPath()) with security checks.
Why: Risk of divergent sanitization logic if a third implementation is created.
Fix: Clarify which JMESPath utilities to use. Consider consolidating to jmespath-utils.ts.
💭 2) SPEC.md:3-5 Metadata field naming
Issue: Uses Author: and Date: but some existing specs use Owner(s): and Last updated:.
Why: Minor convention inconsistency across specs directory.
Fix: No action required — both patterns exist in the codebase.
💭 3) SPEC.md:247 Decision log columns
Issue: Introduces Confidence column not present in other specs.
Why: The column adds value (documents certainty level) but is a new pattern.
Fix: Keep as-is — this is a reasonable evolution of the spec format.
💡 APPROVE WITH SUGGESTIONS
Summary: This is a well-structured spec that thoughtfully phases the implementation and reuses existing infrastructure. The main issues are terminology inconsistencies from the jq→JMESPath pivot (3 Major naming issues that would confuse implementers) plus a code example bug (undefined variable). All are quick fixes — the spec design itself is sound. Recommend addressing the Major items before merging.
Discarded (2)
| Location | Issue | Reason Discarded |
|---|---|---|
meta/audit-findings.md |
New meta file type with no precedent | Intentional spec evolution — audit trail is valuable |
meta/design-challenge.md |
New meta file type with no precedent | Intentional spec evolution — design rationale is valuable |
Reviewers (4)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-architecture |
5 | 0 | 1 | 0 | 4 | 0 | 0 |
pr-review-product |
5 | 0 | 0 | 0 | 4 | 0 | 1 |
pr-review-standards |
5 | 1 | 0 | 0 | 3 | 0 | 1 |
pr-review-consistency |
8 | 0 | 2 | 0 | 4 | 0 | 2 |
| Total | 23 | 1 | 3 | 0 | 6 | 0 | 4 |
Note: High overlap across reviewers on the jq→JMESPath terminology issues (all 4 flagged SENTINEL_KEY, table header, and filename).
|
|
||
| | File | Change | | ||
| |------|--------| | ||
| | `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` | |
There was a problem hiding this comment.
🟠 MAJOR: Inconsistent SENTINEL_KEY naming
Issue: This line says JQ: '$select' but section 5.1 (line 111) correctly defines it as SELECT: '$select'. The existing pattern uses descriptive names matching the value (ARTIFACT: '$artifact', TOOL: '$tool').
Why: Implementers following this table would add the wrong constant name, causing a mismatch with the code example in section 5.1.
Fix: (1-click apply)
| | `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `JQ: '$select'` to `SENTINEL_KEY` | | |
| | `agents-api/src/domains/run/constants/artifact-syntax.ts` | Add `SELECT: '$select'` to `SENTINEL_KEY` | |
Refs:
|
|
||
| The `$select` filter operates on the resolved data. Handle different data types: | ||
|
|
||
| | Source type | Serialization for jq | Notes | |
There was a problem hiding this comment.
🟠 MAJOR: Table header references jq but Phase 1 uses JMESPath
Issue: The column header says "Serialization for jq" but Phase 1 uses JMESPath, not jq. This is leftover terminology from an earlier draft.
Why: Creates confusion about what query language Phase 1 actually uses. Additionally, JMESPath operates directly on JavaScript objects, not serialized JSON strings.
Fix: (1-click apply)
| | Source type | Serialization for jq | Notes | | |
| | Source type | Handling for JMESPath | Notes | |
Refs:
| - `agents-api/src/domains/run/artifacts/ArtifactService.ts` | ||
| - `agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts` | ||
| - `agents-api/src/domains/run/agents/generation/tool-result.ts` | ||
| - `agents-api/src/domains/run/utils/` (new jq-filter.ts) |
There was a problem hiding this comment.
🟡 Minor: Filename inconsistency — jq-filter.ts should be select-filter.ts
Issue: This line says jq-filter.ts but section 6 (line 230) correctly names it select-filter.ts.
Why: Implementers would create a file with the wrong name, inconsistent with the rest of the spec.
Fix: (1-click apply)
| - `agents-api/src/domains/run/utils/` (new jq-filter.ts) | |
| - `agents-api/src/domains/run/utils/` (new select-filter.ts) |
Refs:
| function applySelector(data: any, selector: string): any { | ||
| try { | ||
| const sanitized = sanitizeJMESPathSelector(selector); | ||
| const result = jmespath.search(data, sanitized); | ||
| if (result === null || result === undefined) { | ||
| // Selector matched nothing — return empty rather than failing | ||
| return null; | ||
| } | ||
| return result; | ||
| } catch (error) { | ||
| throw new ToolChainResolutionError( | ||
| toolCallId, | ||
| `$select filter failed: ${error.message}. Expression: ${selector}` | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: Undefined variable toolCallId in code example
Issue: The applySelector function signature is applySelector(data: any, selector: string) but the error handling references toolCallId which is not defined in scope.
Why: This code example would fail TypeScript compilation. Implementers would need to either add toolCallId as a parameter or restructure the error handling.
Fix: Add toolCallId as a third parameter:
function applySelector(data: any, selector: string, toolCallId: string): any {
try {
const sanitized = sanitizeJMESPathSelector(selector);
const result = jmespath.search(data, sanitized);
if (result === null || result === undefined) {
return null;
}
return result;
} catch (error) {
throw new ToolChainResolutionError(
toolCallId,
`$select filter failed: ${error.message}. Expression: ${selector}`
);
}
}Refs:
| ## Challenge 7: JMESPath extract tool first — DISMISSED | ||
| JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1. |
There was a problem hiding this comment.
🟡 Minor: Challenge 7 contradicts final spec decision
Issue: This says "JMESPath is demonstrably error-prone for LLMs...Using jq even in Phase 1" but the final spec uses JMESPath (confirmed by changelog line 34 and SPEC.md D8).
Why: This meta file was not updated after the final Phase 1 refinement. Future readers would be confused about the actual rationale for choosing JMESPath.
Fix: Update to reflect the actual outcome:
| ## Challenge 7: JMESPath extract tool first — DISMISSED | |
| JMESPath is demonstrably error-prone for LLMs (sanitizeJMESPathSelector, forbidden-pattern guidance). jq has better LLM training data. Using jq even in Phase 1. | |
| ## Challenge 7: JMESPath extract tool first — ACCEPTED AS PHASE 1 | |
| The existing `jmespath` library and `sanitizeJMESPathSelector()` provide sufficient capability with zero new dependencies. `_structureHints` already generates JMESPath selectors for LLMs to use, mitigating reliability concerns. |
Refs:
| - **Challenge 6 ($jq in resolveArgs)**: Adopted as Phase 1. Smallest change, zero token cost, covers pipeline case. | ||
| - **Challenge 4/5 (token cost, always-on)**: Phase 1 has zero cost. Phase 2 injection TBD pending measurement. | ||
| - **Challenge 1 (dependency weight)**: Phase 1 needs only a jq library, not full just-bash. | ||
| - **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better. |
There was a problem hiding this comment.
🟡 Minor: Stale statement contradicts final decision
Issue: This says "Challenge 7 (JMESPath): Dismissed" but line 34 confirms JMESPath was ultimately chosen for Phase 1.
Why: The changelog shows intermediate decisions but this creates a contradictory record since the final section (lines 32-39) supersedes it.
Fix: Either remove this line or update it to reflect the final outcome:
| - **Challenge 7 (JMESPath)**: Dismissed — LLMs unreliable with JMESPath, jq better. | |
| - **Challenge 7 (JMESPath)**: Initially dismissed, later ACCEPTED for Phase 1 — see "Phase 1 refinement" section below |
Refs:
|
@pullfrog, please review the spec again |
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
✅ Prior Issues Addressed
All 7 issues from the previous review have been fixed in this revision:
| # | Prior Issue | Status |
|---|---|---|
| 🟠 1 | SPEC.md:221 SENTINEL_KEY uses JQ instead of SELECT |
✅ Fixed (now line 439) |
| 🟠 2 | SPEC.md:143 Table header says "jq" but Phase 1 uses JMESPath |
✅ Fixed (now line 361 says "Handling for JMESPath") |
| 🟠 3 | SPEC.md:161-176 Undefined variable toolCallId in applySelector |
✅ Fixed (line 379 now includes toolCallId: string as parameter) |
| 🟡 4 | SPEC.md:328 Filename jq-filter.ts should be select-filter.ts |
✅ Fixed (line 548) |
| 🟡 5 | design-challenge.md:21-22 Challenge 7 contradicts final JMESPath decision |
✅ Fixed (now says "ACCEPTED AS PHASE 1") |
| 🟡 6 | _changelog.md:24 Stale "JMESPath dismissed" statement |
✅ Fixed (explains the pivot) |
| 🟡 7 | Missing test case for acceptance criterion #8 | ✅ Fixed (line 598 adds _structureHints test case) |
Delta Analysis
The revision also added significant new content:
- Full Phase 2 technical design (lines 80-312) — tool interface, architecture diagram, stdin data flow, SandboxExecutorFactory integration, resource limits, error handling, observability requirements, and acceptance criteria
- Compression prerequisite — added as a Phase 2 blocker (line 88)
- JMESPath consolidation note (line 355) — calls out that
sanitizeJMESPathSelector()is private and should be consolidated withjmespath-utils.ts - Changelog entries (lines 41-48) — documents the Phase 2 restoration
The new content is well-structured and internally consistent. No new issues identified.
💭 Consider (1) 💭
💭 1) audit-findings.md:23 Historical reference to $jq
Issue: Line 23 says "Phase 1 ($jq)" but Phase 1 now uses $select.
Why: This is in a historical audit document, so it accurately reflects the state at audit time.
Fix: No action needed — historical accuracy is acceptable here. The main SPEC.md is correct.
✅ APPROVE
Summary: All 7 issues from the prior review have been addressed. The spec is now internally consistent — no stale jq references remain in the implementation guidance, terminology is aligned throughout, and the code example bug is fixed. The expanded Phase 2 documentation is comprehensive and well-structured. Ready to merge. 🎉
Discarded (0)
No discarded findings.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator-delta-review |
1 | 0 | 1 | 0 | 0 | 0 | 0 |
| Total | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Note: This was a delta review scoped to changes since the prior automated review at commit 705b252. All 7 prior issues were verified as fixed.
There was a problem hiding this comment.
Re-review: all previous issues fixed. Remaining items are terminology cleanup in Phase 1 sections and one design accuracy issue.
All 7 issues from the first review (stale jq→JMESPath terminology, undefined toolCallId, filename mismatch, missing test case) have been addressed. All 14 codebase claims verified accurate — file paths, line numbers, method signatures, dependency versions, and behavioral descriptions all match the actual code.
Two new issues found: (1) the Phase 1 serialization table and surrounding sections still reference "jq" in 6 places where they should say "JMESPath" — this is specifically in Phase 1 sections where the query language is JMESPath, not jq; (2) the serialization table itself is misleading because JMESPath operates on native JS objects, not serialized JSON strings.
The meta files (_changelog.md, design-challenge.md, audit-findings.md) also have stale $jq / "jq library" references from the pre-pivot era, but these are historical records so the inline comments focus on the normative SPEC.md sections that implementers will follow.
Claude Opus | 𝕏
| The `$select` filter operates on the resolved data. Handle different data types: | ||
|
|
||
| | Source type | Handling for JMESPath | Notes | | ||
| |-------------|---------------------|-------| | ||
| | Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON | |
There was a problem hiding this comment.
Phase 1 serialization table still references jq in 4 places. This table is in Section 5.3 (Phase 1 design), where the query language is JMESPath — not jq. Additionally, the table is somewhat misleading: JMESPath operates directly on JavaScript objects via jmespath.search(data, expression), not on serialized JSON strings. The JSON.stringify column makes sense for Phase 2 bash/stdin, but for Phase 1 $select, the resolved data is passed directly to jmespath.search() as a JS value.
Stale references:
- Line 363:
Normal case — jq operates on JSON→ JMESPath - Line 366:
Skip jq, return error | jq only works on text/JSON→ JMESPath
Consider renaming the column header to just "Handling" (it's already correct as Handling for JMESPath) and updating the notes to reflect that JMESPath receives the native JS object, not a serialized string.
| The `$select` filter operates on the resolved data. Handle different data types: | |
| | Source type | Handling for JMESPath | Notes | | |
| |-------------|---------------------|-------| | |
| | Object/Array | `JSON.stringify(data)` | Normal case — jq operates on JSON | | |
| | Source type | Handling for JMESPath | Notes | | |
| |-------------|---------------------|-------| | |
| | Object/Array | Pass directly to `jmespath.search()` | Normal case — JMESPath operates on JS objects | | |
| | String | Pass as-is (not re-stringified) | Avoid double-quoting: `"hello"` not `"\"hello\""` | | |
| | MCP content array | Unwrap text parts, concatenate | `{ content: [{type: 'text', text: '...'}] }` → extract text | | |
| | Buffer/binary | Skip filter, return error | JMESPath only works on structured/text data | |
| When `$tool`+`$artifact` ref resolves to an oversized artifact (`retrievalBlocked: true`): | ||
| - `$select` filtering bypasses the retrieval block | ||
| - Artifact data fetched via `ArtifactService.getArtifactFull()` with new `{ allowOversized: true }` option | ||
| - jq filter applied, producing a subset that fits in context |
There was a problem hiding this comment.
Stale jq reference in Phase 1 section. This line says "jq filter applied" but this is Section 5.4 (Phase 1 Oversized Artifact Processing), where the filter is JMESPath via $select.
| - jq filter applied, producing a subset that fits in context | |
| - JMESPath filter applied via `$select`, producing a subset that fits in context |
| } | ||
| ``` | ||
|
|
||
| On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors). |
There was a problem hiding this comment.
Stale jq reference in Phase 1 error handling section.
| On jq failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors). | |
| On JMESPath failure, the `ToolChainResolutionError` propagates to the LLM as a tool call error. The LLM can retry with corrected syntax (same pattern as existing artifact resolution errors). |
| | Component | How it participates | | ||
| |-----------|---------------------| | ||
| | `resolveArgs()` | Extended (not replaced) — existing `$tool`/`$artifact` logic unchanged | | ||
| | `ToolChainResolutionError` | Reused for jq filter errors | |
There was a problem hiding this comment.
Stale jq reference in Phase 1 integration points.
| | `ToolChainResolutionError` | Reused for jq filter errors | | |
| | `ToolChainResolutionError` | Reused for `$select` filter errors | |
| |---|----------|--------|------|------------|-----------| | ||
| | D1 | Phase 1: `$select` (JMESPath) in resolveArgs; Phase 2: bash tool | LOCKED | Cross-cutting | HIGH | `$select` is smallest viable change — no new tool, no token cost, zero new deps (uses existing `jmespath`). Bash tool for exploratory cases pending validation. | | ||
| | D2 | Queryable sources = session cache + stored artifacts | DIRECTED | Cross-cutting | HIGH | Session cache for current-turn, artifacts for cross-turn | | ||
| | D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — jq filter operates out-of-context | |
There was a problem hiding this comment.
Stale jq reference in decision log. D3's rationale says "jq filter" but the Phase 1 mechanism is JMESPath filtering via $select. jq is only relevant in Phase 2 (bash tool).
| | D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — jq filter operates out-of-context | | |
| | D3 | Allow processing oversized artifacts | LOCKED | Cross-cutting | HIGH | Key capability — `$select` filter operates out-of-context | |
| → stdin receives 50K JSON, jq filters to 3K (call_id: "call_bash_1") | ||
| → executed in sandbox, event loop free | ||
| → 3K result cached in ToolSessionManager, enters context | ||
|
|
There was a problem hiding this comment.
exampleSelectors use a result. prefix that gets stripped. Worth noting: _structureHints.exampleSelectors generate JMESPath expressions prefixed with result. (e.g., result.items[?score > 0.8]). ArtifactService.createArtifact strips this prefix before applying (line 215-217). When the LLM uses these selectors in a $select expression, the implementer needs to decide whether $select should also handle the result. prefix, or whether _structureHints guidance should clarify that $select expressions don't need the prefix. This is worth calling out explicitly to avoid confusion during implementation.
Preview URLsUse these stable preview aliases for testing this PR:
These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find. Raw Vercel deployment URLs
|
| }); | ||
| ``` | ||
|
|
||
| **Available commands (most relevant):** |
There was a problem hiding this comment.
Do we want to limit the amount of tools initially available?
- Serialization table: jq→JMESPath, JSON.stringify→pass directly to jmespath.search() - Oversized artifact processing: "jq filter" → "JMESPath filter via $select" - Error handling: "On jq failure" → "On JMESPath failure" - Integration points: "jq filter errors" → "$select filter errors" - Decision D3: "jq filter" → "$select filter" - Added result. prefix handling note for _structureHints.exampleSelectors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
✅ Prior Issues Addressed
All 6 issues from the pullfrog review (20:48) have been fixed in commit 0eae157:
| # | Prior Issue | Status |
|---|---|---|
| 1 | SPEC.md:363 "jq operates on JSON" → JMESPath |
✅ Fixed (line 365: "JMESPath operates on JS objects") |
| 2 | SPEC.md:366 "jq only works on text/JSON" → JMESPath |
✅ Fixed (line 368: "JMESPath only works on structured/text data") |
| 3 | SPEC.md:375 "jq filter applied" → JMESPath |
✅ Fixed (line 375: "JMESPath filter applied via $select") |
| 4 | SPEC.md:399 "On jq failure" → JMESPath |
✅ Fixed (line 399: "On JMESPath failure") |
| 5 | SPEC.md:459 "jq filter errors" → $select |
✅ Fixed (line 459: "$select filter errors") |
| 6 | SPEC.md:471 D3 "jq filter" → $select |
✅ Fixed (line 471: "$select filter operates out-of-context") |
Delta Analysis
The commit 0eae157 ("Fix remaining stale jq references in Phase 1 sections") correctly addresses all terminology inconsistencies. Additionally:
-
Added
result.prefix handling (line 357) — documents that_structureHints.exampleSelectorsgenerate JMESPath expressions with aresult.prefix that gets stripped byArtifactService.createArtifact(), and recommends the$selectimplementation auto-strip this prefix for consistency. This addresses pullfrog's design accuracy note. -
Serialization table corrected — now accurately describes that JMESPath operates on native JS objects (not serialized JSON strings), fixing the conceptual inaccuracy in the prior version.
🕐 Pending Recommendations (1)
- 💬
SPEC.md:198tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)
✅ APPROVE
Summary: All terminology issues have been resolved. The spec is now internally consistent — no stale jq references remain in Phase 1 sections. The result. prefix handling documentation is a valuable addition that will help implementers. One human question remains open about limiting Phase 2 bash tool commands, but this doesn't block the spec for Phase 1 implementation. 🎉
Discarded (0)
No discarded findings.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator-delta-review |
6 | 0 | 0 | 0 | 0 | 1 | 0 |
| Total | 6 | 0 | 0 | 0 | 0 | 1 | 0 |
Note: This was a delta review scoped to changes since the pullfrog review at commit b48ff20. All 6 prior stale jq→JMESPath issues were verified as fixed.
There was a problem hiding this comment.
PR Review Summary
(1) Total Issues | Risk: Low
🟠⚠️ Major (1) 🟠⚠️
Inline Comments:
- 🟠 Major:
SPEC.md:29Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model
💭 Consider (2) 💭
💭 1) SPEC.md:89,130 WASM bundling vs invocation distinction
Issue: The spec states "we disable the WASM-dependent optional runtimes" and "no WASM for core commands" but the evidence file (just-bash-assessment.md:47-51) notes that sql.js and quickjs-emscripten WASM modules are bundled with just-bash, even if not invoked.
Why: The distinction is between "not invoked" (what the spec means) vs "not bundled" (what readers might assume). The evidence file already acknowledges this ("We don't use python/JS/sqlite features but they're bundled").
Fix: Consider clarifying in the dependency spike prerequisite that verification should confirm WASM modules are lazy-loaded (not impacting cold start) if not invoked, or update the spec language from "no WASM" to "WASM runtimes not invoked."
💭 2) SPEC.md:172-179 Concurrency limits removed without explicit rationale
Issue: The delta removed acceptance criterion #6 "Concurrent bash calls bounded by semaphore" without documenting why concurrency limits are no longer needed for in-process execution.
Why: The prior SandboxExecutorFactory approach used semaphore-based concurrency limiting (NativeSandboxExecutor.ts:70-129). In-process execution doesn't need this pattern, but the rationale should be documented.
Fix: Add a brief note in the Resource Limits section explaining that concurrency is implicitly bounded by Node.js single-threaded execution model (one request at a time in serverless, event loop for others).
🕐 Pending Recommendations (1)
- 💬
SPEC.md:191tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)
💡 APPROVE WITH SUGGESTIONS
Summary: The Phase 2 architecture simplification (SandboxExecutorFactory → in-process just-bash) is a well-reasoned change that removes unnecessary complexity. However, one reference at line 29 (in the Problem Statement section) wasn't updated and now contradicts the new execution model described in the rest of the spec. Quick fix — update line 29 to match the Architecture section. The design considerations (WASM bundling, concurrency model) are valid nuances to address before Phase 2 implementation but don't block this spec revision.
Discarded (2)
| Location | Issue | Reason Discarded |
|---|---|---|
SPEC.md:130 |
Security model assumes LLM-generated commands are safe | Valid design consideration but already gated behind Phase 2 prerequisites — the dependency spike + usage data will inform this decision before implementation |
evidence/just-bash-assessment.md |
WASM bundling risk | Already acknowledged in evidence file ("they're bundled") — not a new finding |
Reviewers (2)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-architecture |
3 | 0 | 2 | 0 | 0 | 0 | 1 |
pr-review-standards |
1 | 0 | 0 | 0 | 1 | 0 | 0 |
| Total | 4 | 0 | 2 | 0 | 1 | 0 | 1 |
Note: Delta review scoped to changes since commit 0eae157. All prior issues (jq→JMESPath terminology, etc.) were verified as fixed in that commit.
|
|
||
| **Phase 1 — `$select` selector in `$tool` refs:** Enhance `resolveArgs()` to support inline JMESPath filtering at resolution time. E.g., `{ "$tool": "call_id", "$select": "items[?score > `0.8`].{title: title, url: url}" }`. This filters data before it reaches the downstream tool — no extra tool call, no token cost in system prompts, **zero new dependencies** (uses the existing `jmespath` library already in the codebase). `_structureHints` already generates example JMESPath selectors that the LLM can use directly. Covers the pipeline case (filtering data between tools). | ||
|
|
||
| **Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first. |
There was a problem hiding this comment.
🟠 MAJOR: Stale SandboxExecutorFactory reference contradicts updated Phase 2 execution model
Issue: This line still describes Phase 2 execution via SandboxExecutorFactory (NativeSandboxExecutor/VercelSandboxExecutor), but the rest of the Phase 2 section (lines 116-168) was updated in this commit to describe direct in-process execution via just-bash with no SandboxExecutorFactory.
Why: Implementers following the spec will see contradictory execution models — the overview says "SandboxExecutorFactory" while the Architecture section says "no child processes, no IPC, no sandbox wrappers needed." Decision D6 (line 465) and acceptance criterion #5 (line 298) both now specify in-process execution.
Fix: Update to match the new execution model:
| **Phase 2 — Built-in `bash` tool:** A sandboxed shell (via `just-bash`) for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes via the existing `SandboxExecutorFactory` (NativeSandboxExecutor for local dev, VercelSandboxExecutor for production). Covers the interactive case. Requires dependency validation spike first. | |
| **Phase 2 — Built-in `bash` tool:** A sandboxed bash tool powered by `just-bash` for exploratory querying — when the agent needs to see the filtered result before deciding what to do. Executes directly in-process (no external sandbox needed — just-bash is a virtual bash environment with in-memory FS and no network). Covers the interactive case. Requires dependency validation spike first. |
Refs:
There was a problem hiding this comment.
PR Review Summary
(6) Total Issues | Risk: Medium
This is a delta review scoped to changes since commit 51e7697bf201 — the implementation of $select filtering for tool chaining (18 files, +1040/-140).
🟠⚠️ Major (3) 🟠⚠️
Inline Comments:
- 🟠 Major:
select-filter.ts:27Duplicate DANGEROUS_PATTERNS creates security drift risk - 🟠 Major:
function-tools.ts:94Silent schema conversion failure hides validation degradation - 🟠 Major:
mcp-tools.ts:33Silent schema conversion failure in MCP tools
🟡 Minor (3) 🟡
🟡 1) instrumentation.ts:97 SpanProcessor mutates ReadableSpan attributes
Issue: ToolResultSanitizer.onEnd() mutates span.attributes directly via cast (span as any).attributes. ReadableSpan attributes are intended to be immutable after the span ends.
Why: This mutation may not persist depending on processor ordering relative to export. The code works because of the current processor order (BaggageSpanProcessor → ToolResultSanitizer → BatchSpanProcessor), but this is fragile.
Fix: Consider documenting this is a known workaround that depends on processor ordering, or strip internal fields earlier in the pipeline before recording the result.
Refs:
Inline Comments:
- 🟡 Minor:
select-filter.test.ts:29Missing tests forstripInternalFieldsutility - 🟡 Minor:
resolveArgs-select.test.ts:101Missing test for$selectonretrievalBlockedartifacts
💭 Consider (4) 💭
💭 1) select-filter.ts:16-17 Cache eviction strategy
Issue: The selectorCache Map has a 1000 entry limit but no eviction policy — once full, new selectors get no caching benefit.
Why: Cache effectiveness degrades over time as early selectors are cached forever while later ones are not.
Fix: Consider LRU cache if this becomes a hot path. Current limit prevents unbounded growth, so acceptable for now.
💭 2) select-filter.ts:126-130 Data shape exposure in error messages
Issue: summarizeShape output in error messages could expose internal field names to LLM output.
Why: The function does filter _-prefixed keys, mitigating the primary concern. Consider whether exposing any field names is acceptable for your threat model.
💭 3) agent-types.ts:244-252 Duplicated validation result type
Issue: The safeParse return type is inlined twice (for parameters and baseInputSchema).
Fix: Extract to shared type alias for maintainability.
💭 4) ref-aware-schema.ts:36-75 PARAM_REF_SCHEMA typed as Record<string, unknown>
Issue: Loses structure validation at compile time.
Fix: Consider defining a JsonSchema interface, though this is acceptable since JSON Schema manipulation is inherently weakly typed.
🕐 Pending Recommendations (1)
- 💬
SPEC.md:191tim-inkeep asks: "Do we want to limit the amount of tools initially available?" (Phase 2 bash tool commands)
💡 APPROVE WITH SUGGESTIONS
Summary: The $select implementation is well-designed — it reuses existing JMESPath infrastructure, has good security checks (dangerous patterns, length limits), comprehensive test coverage for the core filtering logic, and clear error messages. The main concerns are: (1) duplicate DANGEROUS_PATTERNS creating security drift risk with agents-core, and (2) silent schema conversion failures that hide validation degradation. Both are addressable with straightforward changes. The test coverage gaps (stripInternalFields, retrievalBlocked artifacts) are worth adding but not blocking.
Discarded (3)
| Location | Issue | Reason Discarded |
|---|---|---|
instrumentation.ts:81 |
SpanProcessor.onStart signature uses ReadableSpan | Technically correct concern but onStart is empty and parameter is unused — no practical impact |
ref-aware-schema.ts:1 |
Minimal zod usage | Valid but extremely minor — import is used, just not extensively |
PromptConfig.ts:11-18 |
Templates inlined vs file imports | This is the existing pattern in the file, not a regression from this PR |
Reviewers (6)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-standards |
3 | 0 | 1 | 0 | 0 | 0 | 2 |
pr-review-appsec |
3 | 1 | 2 | 0 | 0 | 0 | 0 |
pr-review-errors |
3 | 2 | 0 | 0 | 0 | 0 | 1 |
pr-review-tests |
7 | 0 | 0 | 0 | 2 | 0 | 5 |
pr-review-consistency |
5 | 1 | 0 | 0 | 0 | 0 | 4 |
pr-review-types |
2 | 0 | 2 | 0 | 0 | 0 | 0 |
| Total | 23 | 4 | 5 | 0 | 2 | 0 | 12 |
Note: High overlap across reviewers on the duplicate DANGEROUS_PATTERNS issue and silent schema conversion failures.
| /constructor/, | ||
| /prototype/, | ||
| /__proto__/, | ||
| ]; |
There was a problem hiding this comment.
🟠 MAJOR: Duplicate DANGEROUS_PATTERNS creates security drift risk
Issue: This DANGEROUS_PATTERNS array duplicates the identical list in packages/agents-core/src/utils/jmespath-utils.ts (lines 191-198). The spec explicitly notes at line 346 to consolidate JMESPath utilities to avoid divergent implementations.
Why: If new dangerous patterns are identified, both locations must be updated. Security control drift is a real risk when validation logic is duplicated.
Fix: Import from the existing shared location:
import { DANGEROUS_PATTERNS, MAX_EXPRESSION_LENGTH } from '@inkeep/agents-core/utils/jmespath-utils';Refs:
| baseInputSchema = makeBaseInputSchema(functionData.inputSchema); | ||
| } catch { | ||
| // baseInputSchema stays undefined — post-resolution validation will be skipped | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: Silent schema conversion failure hides validation degradation
Issue: When makeBaseInputSchema() fails, baseInputSchema stays undefined and post-resolution validation is silently skipped. If an LLM's $select resolves to the wrong type (object instead of string), the tool executes with wrong-typed data causing confusing errors deep in tool execution rather than a clear validation message.
Why: Operators have zero visibility into which tools are running with degraded type-safety. The comment "post-resolution validation will be skipped" documents the behavior but doesn't surface it operationally.
Fix: Log a warning when schema conversion fails:
try {
baseInputSchema = makeBaseInputSchema(functionData.inputSchema);
} catch (schemaError) {
logger.warn(
{
functionToolName: functionToolDef.name,
schemaError: schemaError instanceof Error ? schemaError.message : String(schemaError),
},
'Failed to build base input schema; post-resolution validation will be skipped'
);
}Refs:
| refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>, | ||
| baseInputSchema: undefined, | ||
| }; | ||
| } |
There was a problem hiding this comment.
🟠 MAJOR: Silent schema conversion failure in MCP tools
Issue: Same issue as function-tools.ts — when buildRefAwareInputSchema fails, baseInputSchema is undefined and validation is silently disabled. MCP tools often have complex schemas, and failures here mean agents lose type-safety for tool chaining without any operational visibility.
Why: A user who writes $select: 'items[0]' expecting a string but getting an object will see cryptic MCP tool errors rather than a clear "resolved to wrong type" message.
Fix: Add warning logging in the catch block:
} catch (error) {
logger.warn(
{ schemaError: error instanceof Error ? error.message : String(error) },
'Failed to build ref-aware schema for MCP tool; post-resolution validation disabled'
);
return {
refAwareInputSchema: inputSchema as ReturnType<typeof z.fromJSONSchema>,
baseInputSchema: undefined,
};
}| it('should return valid expressions unchanged', () => { | ||
| expect(sanitizeJMESPathSelector('items[?score > `0.8`]')).toBe('items[?score > `0.8`]'); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
🟡 Minor: Missing tests for stripInternalFields utility
Issue: The stripInternalFields function exported from select-filter.ts (lines 9-14) is not tested. This function filters out underscore-prefixed keys before exposing results to downstream tools and traces.
Why: If this function breaks, tool results would incorrectly include internal metadata (_structureHints, _toolCallId), potentially causing schema validation failures or exposing internal implementation details.
Fix: Add a test section:
describe('stripInternalFields', () => {
it('should remove underscore-prefixed keys from object', () => {
const data = { name: 'test', _structureHints: {}, _toolCallId: 'call_1' };
expect(stripInternalFields(data)).toEqual({ name: 'test' });
});
it('should return arrays unchanged', () => {
expect(stripInternalFields([1, 2, 3])).toEqual([1, 2, 3]);
});
it('should return primitives unchanged', () => {
expect(stripInternalFields('hello')).toBe('hello');
expect(stripInternalFields(null)).toBe(null);
});
});|
|
||
| expect(result).toBe(3); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
🟡 Minor: Missing test for $select on retrievalBlocked artifacts
Issue: The spec states that oversized artifacts with retrievalBlocked: true can be filtered via $select to bypass the retrieval block (SPEC.md line 357-360). No test verifies this critical behavior.
Why: If the system fails to allow $select filtering on blocked artifacts, large tool results would be completely inaccessible even when the agent only needs a small subset — defeating a primary use case.
Fix: Add a test case:
it('should allow $select to extract from retrievalBlocked artifact', async () => {
const parser = createParser({
getArtifactFull: vi.fn().mockResolvedValue({
data: { items: [{ id: 1 }, { id: 2 }] },
metadata: { retrievalBlocked: true, isOversized: true }
}),
});
const result = await parser.resolveArgs({
$artifact: 'art_oversized',
$tool: 'call_search',
$select: 'items[0].id',
});
expect(result).toBe(1);
});
Ito Test Report ✅11 test cases ran. 1 additional finding, 10 passed. Across 11 test cases, 10 passed and 1 failed, showing that most run-domain and Playground behaviors are functioning as expected, including route parity between /run/api/chat and /run/v1/chat/completions, correct $tool/$artifact/$select resolution (with nested and no-$select compatibility), type-mismatch rejection with actionable guidance, internal field sanitization, mobile viewport usability, and selector-injection guard enforcement. The key issue is one high-severity, reproducible concurrency defect where rapid repeated submits can generate colliding millisecond-based request IDs (chatcmpl-${Date.now()}), causing request-scoped stream/session state overwrites and resulting in cross-run corruption, dropped or misaligned streams, and assistant error states under burst traffic. ✅ Passed (10)ℹ️ Additional Findings (1)
|











Summary
$selectsentinel key to$tool/$artifactrefs — inline JMESPath filtering atresolveArgs()resolution time, zero new dependencies, zero token costbashtool viajust-bashfor interactive data exploration (jq, grep, awk), executed via existingSandboxExecutorFactoryProblem
The tool chaining system (
$tool/$artifactrefs) lets agents pipe data between tools without it entering context. The prompt guidance teaches a pipeline pattern (tool_a → extract → tool_b), but theextractstep doesn't exist as a real capability. When tool results are large, they either:The system already provides
_structureHintswith ready-to-use JMESPath selectors showing the LLM what to extract — but no mechanism to do the extraction without pulling data into context.Phase 1:
$selectin$toolrefsAdd a
$selectkey to the sentinel ref system. DuringresolveArgs()(which already runs before every tool execution), if$selectis present alongside$tool, the resolved data is filtered through JMESPath before being passed to the tool.Zero new dependencies — uses the existing
jmespathlibrary andsanitizeJMESPathSelector()already in the codebase. Consolidates with existingjmespath-utils.tspublic utilities.Zero token cost — no new tool schema in system prompts. The filtering happens transparently during argument resolution.
Example flow:
The 50K result never enters conversation context. The LLM uses
_structureHints.exampleSelectors(which are already JMESPath) to write the$selectexpression.Oversized artifacts (
retrievalBlocked: true) can also be filtered via$select, bypassing the retrieval block since the filter produces a manageable subset.Key design choices:
$selectkey is language-agnostic — can upgrade from JMESPath to jq later without changing the API contractsanitizeJMESPathSelector()to auto-fix common LLM JMESPath errorsPhase 2: Built-in
bashtool (future — fully designed)A sandboxed bash tool powered by
just-bashfor interactive data exploration — when the agent needs to see filtered results before deciding what to do next (vs$selectwhich pipes directly to another tool).Full design in spec includes: tool interface + schema, architecture (main thread → SandboxExecutorFactory → sandbox process), stdin data flow, resource limits, just-bash configuration, pipeline examples, error handling, prompt guidance (when to use bash vs
$select), OpenTelemetry observability, integration points, and acceptance criteria.Execution via
SandboxExecutorFactory— reuses the existing dual-path sandbox infrastructure (NativeSandboxExecutorfor local dev,VercelSandboxExecutorfor production). No new process management infrastructure needed.Before advancing to Phase 2
We will likely revisit and change conversation and generation compression triggers before implementing the bash tool. The current compression thresholds and mid-generation compression behavior should be re-evaluated in light of
$selectreducing context pressure — the existing triggers may fire too aggressively or need different strategies once agents can proactively filter data.Phase 2 prerequisites:
just-bashdependency spike (install, build, WASM cold start timing)$selectcover enough cases?)Spec artifacts
specs/2026-04-06-bash-tool-for-tool-chaining/SPEC.mdspecs/.../evidence/tool-chaining-current-state.mdspecs/.../evidence/compression-cost-analysis.mdspecs/.../evidence/just-bash-assessment.mdspecs/.../evidence/sandbox-executor-pattern.mdspecs/.../meta/audit-findings.mdspecs/.../meta/design-challenge.mdAudit highlights
Spec was audited by two independent cold-reader agents. Key findings that shaped the design:
wrapToolWithStreamingdoes NOT auto-cache results → bash tool must callrecordToolResult()explicitly$selectin resolveArgs proposed as lighter alternative → adopted as Phase 1🤖 Generated with Claude Code