test: session compaction — observation mask and arg truncation by anandgupta42 · Pull Request #532 · AltimateAI/altimate-code

anandgupta42 · 2026-03-28T01:09:13Z

Summary

What does this PR do?

1. SessionCompaction.createObservationMask() — src/session/compaction.ts (10 new tests)

This function generates the replacement text that substitutes pruned tool outputs during session compaction. When a long session triggers pruning, old tool call results are replaced with a compact summary like [Tool output cleared — bash(command: "git status") returned 3 lines, 45 B — "On branch main"]. Zero tests existed for this function or its internal helpers (truncateArgs, formatBytes).

Why it matters: Every long-running session that triggers compaction relies on these masks to preserve context about what tools were previously called. A malformed mask means the model loses track of prior work — causing it to re-read files, re-run commands, or lose context about the task.
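To make the mask format concrete, here is a minimal sketch of how such a string could be assembled. It is not the code in src/session/compaction.ts: the `ToolCallSummary` type and the `sketchObservationMask` name are hypothetical, the argument text is assumed to be pre-formatted, and sizes are reported in plain bytes only.

```typescript
// Hypothetical sketch; the real createObservationMask() may differ in every detail.
interface ToolCallSummary {
  tool: string
  args: string // assumed to be already formatted, e.g. 'command: "git status"'
  output: string
}

function sketchObservationMask(call: ToolCallSummary): string {
  const outputLines = call.output.split("\n")
  // An empty string still splits into one (empty) line.
  const lineCount = outputLines.length
  // Count UTF-8 bytes, not UTF-16 code units, so CJK text is sized correctly.
  const byteCount = new TextEncoder().encode(call.output).length
  // Fingerprint: first output line, capped at 80 characters; omitted when empty.
  const [firstLine = ""] = outputLines
  const fingerprint = firstLine.slice(0, 80)
  const suffix = fingerprint ? ` — "${fingerprint}"` : ""
  return `[Tool output cleared — ${call.tool}(${call.args}) returned ${lineCount} lines, ${byteCount} B${suffix}]`
}

const mask = sketchObservationMask({
  tool: "bash",
  args: 'command: "git status"',
  output: "a\nb\nc",
})
console.log(mask)
```

Counting bytes with `TextEncoder` rather than `string.length` is what makes the multi-byte case work: three CJK characters are three UTF-16 code units but nine UTF-8 bytes.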

Scenarios covered:

Full mask format verification (tool name, args, line count, byte size, fingerprint)
Empty output produces no fingerprint suffix
Pending status falls through to empty args path (not completed/running/error)
Multi-line output counts lines correctly
Long args are truncated at 80 chars with "…" suffix
Circular/unserializable input gracefully returns [unserializable]
KB formatting for ~2KB outputs
MB formatting for ~1.5MB outputs
Multi-byte UTF-8 characters (CJK) counted correctly in bytes
Fingerprint capped at 80 characters from first output line
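The truncation, unserializable-input, and size-formatting scenarios above can be sketched as two small helpers. These are illustrative stand-ins, not the internal truncateArgs and formatBytes: the function names, the use of `JSON.stringify` for argument text, the 1024-based units, and the one-decimal formatting are all assumptions. Only the 80-character limit, the "…" suffix, and the `[unserializable]` fallback come from the list.

```typescript
// Hypothetical helpers; behavior beyond the listed scenarios is assumed.
const MAX_ARG_CHARS = 80

function sketchTruncateArgs(input: unknown): string {
  let text: string
  try {
    // JSON.stringify throws on circular structures and BigInt values,
    // and returns undefined for inputs such as undefined or functions.
    const json: string | undefined = JSON.stringify(input)
    text = json ?? ""
  } catch {
    return "[unserializable]"
  }
  if (text.length <= MAX_ARG_CHARS) return text
  let cut = text.slice(0, MAX_ARG_CHARS)
  // Do not end on a lone high surrogate (half of an emoji or rare CJK character).
  const last = cut.charCodeAt(cut.length - 1)
  if (last >= 0xd800 && last <= 0xdbff) cut = cut.slice(0, -1)
  return cut + "…"
}

function sketchFormatBytes(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`
  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}

// Usage: a self-referencing object cannot be serialized.
const circular: Record<string, unknown> = {}
circular.self = circular
console.log(sketchTruncateArgs(circular))
console.log(sketchFormatBytes(2048))
```

Catching the serialization error instead of letting it propagate keeps compaction from failing on a single odd tool input, which is the behavior the `[unserializable]` scenario checks for.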

Type of change

New feature (non-breaking change which adds functionality)

Issue for this PR

N/A — proactive test coverage discovered via test-discovery rotation (session area)

How did you verify your code works?

bun test test/session/compaction-mask.test.ts   # 10 pass, 26 expect() calls

Checklist

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Summary by CodeRabbit

Tests
- Added comprehensive test suite for observation mask generation functionality, covering various scenarios including tool execution handling, output formatting, size calculations, and edge cases.

Cover createObservationMask() which generates the replacement text when old tool outputs are pruned during session compaction. Tests verify format correctness, UTF-8 byte counting, arg truncation with surrogate pair safety, unserializable input handling, and fingerprint capping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01SHDrUNHjUpTwPvcjQcJ4ug

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.


coderabbitai · 2026-03-28T01:09:27Z

Walkthrough

A new Bun test suite for SessionCompaction.createObservationMask was added, covering mask generation for tool executions, fingerprint handling, output formatting, multi-line content, byte-size encoding, UTF-8 character counting, and unserializable input handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New test suite `packages/opencode/test/session/compaction-mask.test.ts` | Added comprehensive test coverage for `SessionCompaction.createObservationMask`, validating mask generation for completed and pending tool executions, fingerprint behavior with/without output, line/byte-size formatting, UTF-8 byte counting, and truncation handling. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

contributor

Poem

🐰 A test suite hops along with care,
Checking masks both fair and square,
Fingerprints and bytes align,
UTF-8 counts so fine,
Validations everywhere! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding tests for session compaction observation mask and argument truncation functionality. |
| Description check | ✅ Passed | The description comprehensively covers all template sections: Summary details what changed and why, Test Plan documents verification, and Checklist confirms tests were added and pass locally. |


coderabbitai

🧹 Nitpick comments (1)

packages/opencode/test/session/compaction-mask.test.ts (1)

78-87: Add running/error status assertions to close the remaining branch gap.

You validate pending’s empty-args path, but createObservationMask also has explicit running/error arg handling via part.state.input. A small table-driven test would lock this down.

✅ Suggested test addition

+  test.each(["running", "error"] as const)(
+    "uses input args for %s status",
+    (status) => {
+      const part = {
+        ...makeCompletedPart({ tool: "bash", input: { command: "pwd" }, output: "" }),
+        state:
+          status === "running"
+            ? {
+                status: "running",
+                input: { command: "pwd" },
+                output: "",
+                title: "test",
+                metadata: {},
+                time: { start: 1000 },
+              }
+            : {
+                status: "error",
+                input: { command: "pwd" },
+                error: "boom",
+                title: "test",
+                metadata: {},
+                time: { start: 1000, end: 2000 },
+              },
+      } as MessageV2.ToolPart
+
+      const mask = SessionCompaction.createObservationMask(part)
+      expect(mask).toContain('bash(command: "pwd")')
+      expect(mask).toContain("1 lines")
+      expect(mask).toContain("0 B")
+    },
+  )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/compaction-mask.test.ts` around lines 78 - 87,
Test coverage currently verifies the "pending" branch for createObservationMask
but misses the explicit "running" and "error" branches that take args from
part.state.input; add table-driven assertions in the same test file to create
parts with state "running" and "error" (use makePendingPart or a similar helper
to set part.state.input and part.state.status) and assert that
SessionCompaction.createObservationMask(part) contains the expected
"tool(args...)" string derived from part.state.input plus the usual "lines" and
"B" size outputs, so both running and error code paths are exercised.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16862d0d-f223-42eb-95de-d8ef900cc2cb

📥 Commits

Reviewing files that changed from the base of the PR and between 9ba2114 and 5cc85f7.

📒 Files selected for processing (1)

packages/opencode/test/session/compaction-mask.test.ts

… fixes

Consolidates PRs #515, #526, #527, #528, #530, #531, #532, #533, #534, #535, #536, #537, #538, #539, #540, #541, #542, #543 into a single PR.

Changes:
- 30 files changed, ~3000 lines of new test coverage
- Deduplicated redundant tests:
  - `copilot-compat.test.ts`: removed duplicate `mapOpenAICompatibleFinishReason` tests (already covered in `copilot/finish-reason.test.ts`)
  - `lazy.test.ts`: removed duplicate error-retry and `reset()` tests
  - `transform.test.ts`: kept most comprehensive version (#535) over subset PRs (#539, #541)
- Bug fixes from PR #528:
  - `extractEquivalenceErrors`: `null` entries in `validation_errors` crashed with TypeError (`null.message` throws before `??` evaluates). Fixed with optional chaining: `e?.message`
  - `extractSemanticsErrors`: same fix applied
  - Updated test from `expect(...).toThrow(TypeError)` to verify the fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
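The null-entry crash described in this commit message comes down to evaluation order: the property access runs before `??` gets a chance to supply a fallback. The sketch below illustrates that with hypothetical names; the `ValidationError` shape, both function names, and the fallback string are assumptions and not the actual `extractEquivalenceErrors` code.

```typescript
// Assumed shape of an entry in validation_errors; entries may be null.
type ValidationError = { message?: string } | null

// Before: `.message` is read on null, which throws a TypeError
// before the `??` fallback is ever evaluated.
function messagesUnsafe(errors: ValidationError[]): string[] {
  return errors.map((e) => (e as { message?: string }).message ?? "unknown error")
}

// After: optional chaining short-circuits to undefined for null entries,
// so `??` can substitute the fallback.
function messagesSafe(errors: ValidationError[]): string[] {
  return errors.map((e) => e?.message ?? "unknown error")
}

const sample: ValidationError[] = [{ message: "column mismatch" }, null]

console.log(messagesSafe(sample))

try {
  messagesUnsafe(sample)
} catch (err) {
  console.log(err instanceof TypeError) // true: reading a property of null throws
}
```

This is also why the accompanying test could change from asserting a thrown TypeError to asserting on the returned messages.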

anandgupta42 · 2026-03-28T15:25:22Z

Consolidated into #545

… fixes (#545)

* test: MCP auth — URL validation, token expiry, and client secret lifecycle
Cover security-critical McpAuth functions (getForUrl, isTokenExpired) and McpOAuthProvider.clientInformation() expiry detection that had zero test coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01CqcvvXp5hUVsNU441DFTwb

* test: copilot provider — finish reason mapping and tool preparation
Add 27 unit tests for three previously untested copilot SDK functions that are critical to the GitHub Copilot provider integration path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: log-buffer, RWLock concurrency, SSE chunk splitting — 13 new tests
Cover three untested risk areas: dbt ring buffer overflow (ties to #249 TUI corruption fix), reader-writer lock starvation ordering, and SSE event parsing across chunk boundaries and abort signals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01153R7Dh9BMKiarndEUraBk

* test: SQL tool formatters — check, equivalence, semantics (38 tests)
Export and test pure formatting functions across three SQL analysis tools that had zero test coverage. Discovered a real bug: null entries in validation_errors crash extractEquivalenceErrors (TypeError on null.message).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01Lz8zxrbwHXfsC2FbHxXZh9

* test: stats display + MCP OAuth XSS prevention — 26 new tests
Add first-ever test coverage for the `altimate-code stats` CLI output formatting and the MCP OAuth callback server's HTML escaping (XSS prevention boundary).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: util — proxy detection and lazy error recovery
Add tests for proxied() corporate proxy detection (6 tests) and lazy() error recovery + reset behavior (2 tests) to cover untested code paths that affect package installation and initialization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01EDCRjjHdb1dWvxyAfrLuhw

* test: session compaction — observation mask and arg truncation
Cover createObservationMask() which generates the replacement text when old tool outputs are pruned during session compaction. Tests verify format correctness, UTF-8 byte counting, arg truncation with surrogate pair safety, unserializable input handling, and fingerprint capping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01SHDrUNHjUpTwPvcjQcJ4ug

* test: bus — publish/subscribe/once/unsubscribe mechanics
Zero dedicated tests existed for the core event Bus that powers session updates, permission prompts, file watcher notifications, and SSE delivery. New coverage includes subscriber delivery, unsubscribe correctness, wildcard subscriptions, type isolation, and Bus.once auto-removal.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01GchE7rUZayV1ouLEseVndK

* test: lazy utility and credential-store — error retry, reset, sensitive field coverage
Cover untested behaviors in lazy() (error non-caching and reset) that power shell detection, plus complete isSensitiveField unit coverage for BigQuery/SSL/SSH fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01WoqeutgfwXNcktweCKoLwd

* test: provider/transform — temperature, topP, topK, smallOptions, maxOutputTokens
Add 35 tests for five previously untested ProviderTransform functions that control model-specific inference parameters for all users.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_014NGgCMNXEg4Nn3JCpzDg5w

* test: fingerprint + context — fill coverage gaps in core utilities
Add tests for Fingerprint.refresh() cache invalidation and dbt-packages tag detection (both untested code paths), plus first-ever unit tests for the Context utility (AsyncLocalStorage wrapper) used by every module.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01N8kgPYhXX7SrYnZKJLiTfC

* test: session todo — CRUD lifecycle with database persistence
Adds 6 tests for the Todo module (zero prior coverage). Covers insert/get round-trip, position ordering, empty-array clear, replacement semantics, bus event emission, and cross-session isolation. These guard the TUI todo panel against stale or phantom tasks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: finops recommendations + dbt manifest edge cases — 12 new tests
Cover untested recommendation logic in warehouse-advisor and credit-analyzer edge cases in dbt manifest parsing that affect real-world dbt projects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01XhZy7vaqdasKH5hQ6H9ee3

* test: provider — sampling parameter functions (temperature, topP, topK)
Add 28 tests for ProviderTransform.temperature(), topP(), and topK() which had zero direct test coverage. These pure functions control LLM sampling behavior per model family and wrong values cause degraded output quality.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_011NoVCnMW9Kw6eh92ayU7GB

* test: session utilities — isDefaultTitle, fromRow/toRow, createObservationMask
Add 17 tests covering two untested modules in the session subsystem: session identity helpers and compaction observation masks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: provider — temperature, topP, topK model parameter defaults
Add 30 unit tests for ProviderTransform.temperature(), topP(), and topK() which are pure functions that return model-specific sampling defaults. These functions are the sole source of per-model parameter configuration and were previously untested, risking silent regressions when adding or modifying model ID patterns (e.g., kimi-k2 sub-variants, minimax-m2 dot/hyphen variants).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01WZthZmQczd51XXSjhiABNH

* test: agent — .env read protection and analyst write denial
Verify security-relevant agent permission defaults: builder agent asks before reading .env files (preventing accidental secret exposure), and analyst agent denies file modification tools (edit/write/todowrite/todoread).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01Wp9YaEvw6jAAL73VVdXFxA

* test: docker discovery + copilot provider compatibility
Add 20 new tests covering two previously untested modules:
1. Docker container discovery (containerToConfig) — verifies correct ConnectionConfig shape generation from discovered containers
2. Copilot provider finish-reason mapping and response metadata — ensures OpenAI-compatible finish reasons are correctly translated and response timestamps are properly converted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
https://claude.ai/code/session_01J8xz7ijLjbzEe3mu7ajdWh

* test: consolidate 18 test PRs — 434 new tests, deduplicated, with bug fixes
Consolidates PRs #515, #526, #527, #528, #530, #531, #532, #533, #534, #535, #536, #537, #538, #539, #540, #541, #542, #543 into a single PR.
Changes:
- 30 files changed, ~3000 lines of new test coverage
- Deduplicated redundant tests:
  - `copilot-compat.test.ts`: removed duplicate `mapOpenAICompatibleFinishReason` tests (already covered in `copilot/finish-reason.test.ts`)
  - `lazy.test.ts`: removed duplicate error-retry and `reset()` tests
  - `transform.test.ts`: kept most comprehensive version (#535) over subset PRs (#539, #541)
- Bug fixes from PR #528:
  - `extractEquivalenceErrors`: `null` entries in `validation_errors` crashed with TypeError (`null.message` throws before `??` evaluates). Fixed with optional chaining: `e?.message`
  - `extractSemanticsErrors`: same fix applied
  - Updated test from `expect(...).toThrow(TypeError)` to verify the fix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve typecheck errors in test files
- `prepare-tools.test.ts`: use template literal type for provider tool `id`
- `compaction-mask.test.ts`: use `as unknown as` for branded type casts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove flaky `setTimeout` in todo bus event test
`Bus.publish` is synchronous — the event is delivered immediately, no 50ms delay needed. Removes resource contention risk in parallel CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review feedback
- `formatCheck`: harden validation error formatting against null entries using optional chaining and filter (CodeRabbit + GPT consensus)
- `extractEquivalenceErrors`: propagate extracted errors into `formatEquivalence` output to prevent title/output inconsistency
- `todo.test.ts`: use `tmpdir({ git: true })` + `await using` for proper test isolation instead of shared project root
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

claude bot reviewed Mar 28, 2026

github-actions bot added the contributor label Mar 28, 2026

coderabbitai bot reviewed Mar 28, 2026

This was referenced Mar 28, 2026

test: consolidate 18 test PRs into single PR with deduplication and bug fixes #544

Closed

test: consolidate 18 test PRs — 434 new tests, deduplicated, with bug fixes #545

Merged

anandgupta42 closed this Mar 28, 2026

anandgupta42 wants to merge 1 commit into main from test/hourly-20260328-0108
