fix: reduce retained tool output memory by dgageot · Pull Request #2854 · docker/docker-agent

dgageot · 2026-05-21T12:13:13Z

wait for lifecycle supervisor watcher goroutines during shutdown
bound retained tool output in session history with a regression test
spool large MCP media payloads to disk instead of retaining inline base64
slim retained TUI tool results and avoid duplicating file contents in metadata

aheritier

The PR attacks four distinct memory sources and the implementation is mostly clean. The WithoutPayload slimming, the watchDone shutdown guarantee, and the ReadFileMeta.Content removal are all sound. One blocking issue must be resolved before merge.

Blocking

Disk leak: spooled MCP media files are never cleaned up.
writeMediaFile creates temp directories under os.TempDir() with prefix docker-agent-mcp-media-* but nothing removes them when the session ends or the ToolCallResult is GC'd. The PR itself calls defer os.RemoveAll(filepath.Dir(img.FilePath)) in the test, which highlights the production gap. Over a long session with many large media responses this trades the memory leak for a disk leak.

Fix options:

Register a cleanup func on the GatewayToolset lifecycle (cf. cleanUp pattern already used there), called on Stop.
Use a single session-scoped temp directory created once at Toolset construction and removed atomically on teardown — simpler than one dir per payload and easier to reason about.

Non-blocking

Double WithoutPayload() in reasoningblock.go — messages.go already calls msg.Result.WithoutPayload() before passing the result to UpdateToolResult; reasoningblock.go then calls result.WithoutPayload() again internally. Harmless (idempotent on a stripped result) but confusing. Choose one call site and remove the other.
Missing test: disk-write fallback in encodeMedia — the slog.Warn + inline fallback branch is untested. Worth covering with an injected error (e.g. a package-level writeMediaFile var that tests can replace).
Missing test: threshold boundary — TestProcessMCPContentSpoolsLargeMedia only checks maxInlineMediaBytes+1. Add a case for exactly maxInlineMediaBytes bytes → inline, and maxInlineMediaBytes+1 → spooled.
TestSupervisor_StopWaitsForWatcher is weak coverage for the concurrency fix — the test exercises the sequential path (Start → Stop), not the concurrent-Stop path that the s.stopping guard is specifically designed to handle. A test calling Stop twice concurrently from two goroutines while the watcher is live would be a stronger regression guard. The < time.Second timing assertion is also brittle under heavy CI load; goleak would be more idiomatic.
readImageFile in filesystem.go still inlines large local images — up to chat.MaxImageBytes (~4.5 MB) as base64. This is inconsistent with the new MCP spooling strategy. Out of scope for this PR, but worth a follow-up issue.

aheritier · 2026-05-23T10:47:40Z

+
+func writeMediaFile(data []byte, mimeType string) (string, error) {
+	dir, err := os.MkdirTemp("", "docker-agent-mcp-media-*")
+	if err != nil {


[BLOCKING] Disk leak — these temp directories are never removed.

os.MkdirTemp creates a directory in os.TempDir() on every call to writeMediaFile, but nothing cleans it up when the ToolCallResult is discarded or the session ends. Over a long session with many large media tool responses this will exhaust disk space.

Suggested fix: introduce a session-scoped temp directory (created once, removed on Stop) instead of a per-payload directory. Alternatively, register a cleanup closure on the Toolset lifecycle the same way GatewayToolset does today.

aheritier · 2026-05-23T10:47:40Z

+	assert.Equal(t, "image/png", img.MimeType)
+	require.NotEmpty(t, img.FilePath)
+	defer os.RemoveAll(filepath.Dir(img.FilePath))
+


The test correctly cleans up with defer os.RemoveAll(...), but this highlights that the production code does not. This defer will be unnecessary once a lifecycle cleanup hook is added on the production side.

aheritier · 2026-05-23T10:47:40Z

 		entry.msg.ToolStatus = status
-		entry.msg.ToolResult = result
+		entry.msg.ToolResult = result.WithoutPayload()



[Non-blocking] Double WithoutPayload() call.

The caller in messages.go already passes msg.Result.WithoutPayload() here, so result is already a stripped copy. Calling result.WithoutPayload() a second time is idempotent but redundant and may confuse future readers. Either remove WithoutPayload() from this line (since it's already stripped at the call site), or remove it from the call site in messages.go and let UpdateToolResult own the stripping — but not both.
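
As an illustration of the second option (let UpdateToolResult own the stripping), here is a sketch with a hypothetical `toolResult` type and `updateToolResult` function; neither matches the PR's real signatures. It also shows why the double call is harmless: stripping an already stripped value changes nothing.

```go
package main

// toolResult is a hypothetical stand-in for the PR's tool result type.
type toolResult struct {
	Summary string // small, always retained
	Payload []byte // potentially large, dropped before storing
}

// WithoutPayload returns a copy that keeps the summary but drops the payload.
// The receiver is a value, so the caller's original is left untouched.
func (r toolResult) WithoutPayload() toolResult {
	r.Payload = nil
	return r
}

// updateToolResult is the single place that strips the payload, so callers
// pass the result through unchanged.
func updateToolResult(stored *toolResult, result toolResult) {
	*stored = result.WithoutPayload()
}
```

With a single owner, a reader of the call site does not need to check whether the value was already stripped upstream.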

aheritier · 2026-05-23T10:47:40Z

+	start := time.Now()
+	assert.NilError(t, s.Stop(t.Context()))
+	assert.Check(t, time.Since(start) < time.Second)
+	assert.Check(t, is.Equal(s.State().State, lifecycle.StateStopped))


[Non-blocking] Brittle timing assertion and weak concurrency coverage.

This < time.Second bound will be flaky under heavy CI load. More importantly, the test only covers the sequential Start→Stop path, not the concurrent-Stop path that the s.stopping guard was added to fix. A test with two goroutines calling Stop concurrently while the watcher is alive (e.g. blocked in sess.Wait()) would be a stronger regression guard. Consider using goleak to assert no goroutines are left behind.
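
A sketch of what a timing-free concurrent-Stop check could look like. The `supervisor` below is a deliberately small model of the stopping guard and watchDone pattern, not the real lifecycle supervisor; every name in it is an assumption made for illustration. Each Stop caller waits on the same channel, and the helper verifies that the watcher has exited by the time every call returns:

```go
package main

import "sync"

// supervisor is a small model of the pattern under review, not the real type.
type supervisor struct {
	mu        sync.Mutex
	stopping  bool
	quit      chan struct{} // closed once to ask the watcher to exit
	watchDone chan struct{} // closed by the watcher when it has exited
}

// newSupervisor creates watchDone before the watcher goroutine starts, so no
// Stop caller can ever observe a nil channel.
func newSupervisor() *supervisor {
	s := &supervisor{quit: make(chan struct{}), watchDone: make(chan struct{})}
	go func() {
		defer close(s.watchDone)
		<-s.quit // stands in for the watcher's blocking wait on the session
	}()
	return s
}

// Stop signals the watcher exactly once. Every caller, first or concurrent,
// then blocks until the watcher has exited.
func (s *supervisor) Stop() {
	s.mu.Lock()
	if !s.stopping {
		s.stopping = true
		close(s.quit)
	}
	s.mu.Unlock()
	<-s.watchDone
}

// stopConcurrently calls Stop from n goroutines and reports whether the
// watcher had already exited when each call returned.
func stopConcurrently(s *supervisor, n int) bool {
	var wg sync.WaitGroup
	results := make([]bool, n)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Stop()
			select {
			case <-s.watchDone:
				results[i] = true
			default: // Stop returned before the watcher finished
			}
		}(i)
	}
	wg.Wait()
	for _, ok := range results {
		if !ok {
			return false
		}
	}
	return true
}
```

Because the assertion is on channel state rather than elapsed time, it does not depend on CI load. A goroutine leak checker such as goleak could complement it in the real test suite.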

docker-agent

Assessment: 🟢 APPROVE

Reviewed fix: reduce retained tool output memory (PR #2854).

Scope: 10 files changed — watcher shutdown sequencing (supervisor.go), metadata deduplication (filesystem.go), MCP media spooling to disk (mcp.go), TUI payload slimming (messages.go), and supporting tests.

Verification summary: The drafter raised 5 hypotheses (3 medium, 2 low). After code-level verification, all 5 were dismissed:

| # | File | Hypothesis | Verdict | Reason |
|---|------|------------|---------|--------|
| 1 | `supervisor.go` | `RestartAndWait` blocks indefinitely on ctx cancel during backoff | ✅ DISMISSED | `RestartAndWait` is not changed by this PR; the new `watchDone` mechanism applies to `Stop()` only. No regression introduced. |
| 2 | `filesystem.go` | TOCTOU between `resolveAndCheckPath` and `readFile` in WalkDir | ✅ DISMISSED | The diff only removes a duplicate metadata field and stale assignment. No I/O path changed; TOCTOU was already addressed in prior commit `eb7bb600`. |
| 3 | `messages.go` | Data race on `renderedItems` LRU cache between `View()`/`Update()` | ✅ DISMISSED | bubbletea runs `View()` and `Update()` on the same single-threaded loop goroutine, so there is no concurrency. The diff only swaps `msg.Result` → `msg.Result.WithoutPayload()` with no new cache access. |
| 4 | `supervisor.go` | Panic in `OnFailed`/`OnRestart` callbacks uncaught in watcher | ✅ DISMISSED | Callback invocations are unchanged; lack of panic recovery is pre-existing, not introduced by this PR. |
| 5 | `filesystem.go` | Repair loop capped at 3 passes may silently fail | ✅ DISMISSED | No such loop exists anywhere in the changed code. |

Positive observations:

The watchDone channel approach in supervisor.go is clean and correct: created before the watcher goroutine spawns, closed in the goroutine's defer, and waited on by both the first and concurrent Stop() callers. The stopping guard path now correctly blocks instead of silently returning, which is the right fix.
Spooling MCP media to disk via ensureMediaDir / cleanupMediaDir is well-structured, with proper fallback to inline base64 on spool failure and cleanup tied to Stop().
WithoutPayload() in messages.go correctly avoids retaining large payloads in the TUI state.
The new tests (TestSupervisor_StopWaitsForWatcher, TestSupervisor_StopConcurrent) are solid — waitParked eliminates the racy connect-then-stop path and the concurrent variant exercises the stopping guard properly.

dgageot requested a review from a team as a code owner May 21, 2026 12:13

rumpl reviewed May 21, 2026

Comment thread pkg/runtime/toolexec/dispatcher.go

rumpl reviewed May 21, 2026

Comment thread pkg/runtime/toolexec/dispatcher.go Outdated

dgageot marked this pull request as draft May 21, 2026 12:23

dgageot force-pushed the board/e34279a60727904e branch from 2104c15 to bb2f838 Compare May 21, 2026 13:44

dgageot marked this pull request as ready for review May 21, 2026 13:56

aheritier requested changes May 23, 2026

dgageot added 4 commits May 24, 2026 10:38

fix: wait for supervisor watcher shutdown

4f72de4

fix: spool large mcp media to disk

6dd0262

fix: slim retained tui tool results

d7660f0

fix: avoid retaining file contents in metadata

6bbc74b

dgageot force-pushed the board/e34279a60727904e branch from bb2f838 to 6bbc74b Compare May 24, 2026 08:42

dgageot requested a review from aheritier May 24, 2026 12:59

docker-agent reviewed May 24, 2026

dgageot wants to merge 4 commits into docker:main from dgageot:board/e34279a60727904e

4 participants