refactor(tools): use docker/aijson for LLM-tolerant JSON unmarshal#2816
refactor(tools): use docker/aijson for LLM-tolerant JSON unmarshal#2816trungutt wants to merge 1 commit into
Conversation
Replaces the in-tree input-shape repair layer (PR docker#2635) with a call to the new github.com/docker/aijson library. aijson exposes a drop-in encoding/json.Unmarshal shape with optional OnRepair telemetry hook, so NewHandler[T] collapses to a single Unmarshal call. The four repairs and their invariants (validate-then-repair, ordering, strict-parse-first) are unchanged — they now live in github.com/docker/aijson and are unit-tested there. The integration tests in repair_test.go pin the docker-agent-specific plumbing (slog event, NewHandler error surface). Co-authored-by: rumpl <djordje.lukic@docker.com>
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
This PR correctly removes the local repair layer and delegates to github.com/docker/aijson. The compile-time surface is clean — no missing symbols, no orphaned callers. One medium observability concern and one stale comment were flagged.
| // original error so the model sees the schema's complaint, not | ||
| // the repair-layer's complaint about a synthesised payload. | ||
| var params T | ||
| err := aijson.Unmarshal([]byte(args), ¶ms, aijson.OnRepair(func(kinds []aijson.Kind) { |
There was a problem hiding this comment.
[MEDIUM] tool_input_repaired slog field type changed — telemetry representation may silently differ
The repairs slog field now logs []aijson.Kind (an external type from github.com/docker/aijson) instead of the previous []repairKind (local typed strings). Two concerns:
-
String representation:
aijson.Kindvalues may render differently in structured logs / alerting pipelines that pattern-match on the old repair-kind strings (e.g."json_stringified_array"). If any dashboards or alerts key on those strings, they will silently stop matching. -
Firing semantics: The callback is now driven entirely by
aijson.Unmarshal's internals. If the library's notion of "a repair was applied" differs from the old validate-then-repair loop (e.g., it fires for normalizations the old code did not count, or skips one of the four repair kinds in some edge case), the event rate will diverge without any compiler or test signal.
The PR description asserts "Behavior is unchanged" but this is not testable from the diff alone — the four repair kinds and their ordering invariant now live in an external v0.1.0 package. Consider adding a test that asserts the exact aijson.Kind string values emitted for each of the four repair cases, or explicitly documenting the expected Kind strings in the slog event.
| assert.JSONEq(t, `{"paths":["only.txt"],"tags":["go","ai"]}`, string(out)) | ||
| } | ||
| // These are integration tests for NewHandler's repair plumbing. Unit tests | ||
| // for the four repair kinds live in github.com/docker/json-tool-repair. |
There was a problem hiding this comment.
[LOW] Stale comment references wrong package name (json-tool-repair vs aijson)
The comment says unit tests for the four repair kinds live in github.com/docker/json-tool-repair, but the actual library added to go.mod is github.com/docker/aijson. A future contributor following this comment will not find the referenced repository.
Suggested fix:
// These are integration tests for NewHandler's repair plumbing. Unit tests
// for the four repair kinds live in github.com/docker/aijson.
The validate-then-repair layer added in #2635 has zero coupling to docker-agent. The same four shape repairs apply to every place we parse JSON produced by an LLM — tool-call arguments today, structured outputs and JSON-mode responses next. Keeping it in
pkg/tools/ties a reusable utility to a single import site and makes it harder to apply at other entry points (MCP/A2A bridges, future consumers).This PR moves the implementation to
github.com/docker/aijsonand replaces the local copy with a one-lineaijson.Unmarshalcall. The library's public API is a drop-in forencoding/json.Unmarshal:NewHandler[T]collapses to a single call that keeps thetool_input_repairedslog event for telemetry. The four repair kinds, their ordering invariant, and the validate-then-repair design all move with the implementation and are unit-tested in the new repo.Behavior is unchanged; valid input still pays nothing.