tests: variable-length parametrize for 2KB-budget regression test by tony · Pull Request #42 · tmux-python/libtmux-mcp

tony · 2026-05-08T23:35:57Z

Summary

The existing 2KB-budget regression test was parametrized over 3 tiers × 2 tmux_pane states, but every case used the pane id %42 (3 chars) and the socket name default (7 chars). A user with a slightly longer custom socket name plus a multi-digit pane id from a long session pushes the readonly worst case very close to 2048 bytes. The static cases never exercised this, so future text additions could silently put realistic runtime injections over the budget.

This PR:

Collapses the two cross-product @pytest.mark.parametrize decorators into one explicit list, so the variable-length stress case is a peer entry instead of expanding the cross-product space.
Adds (TAG_READONLY, "%99", "/tmp/tmux-1000/dev-prod,12345,0") as the 7th case. It exercises both axes: a multi-digit pane id and a longer-than-default socket name. The margin is ~2 bytes from the 2048 ceiling.
Adds an inline comment naming the fallback path (a tighter compression form) to use if a future text addition trips this case.
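
The move from stacked decorators to one explicit list can be sketched as follows. This is a minimal illustration, not the PR's actual test: `build_instructions_stub`, `DEFAULT_TMUX`, `TAG_MUTATING`, and `TAG_DESTRUCTIVE` are hypothetical names, and only `TAG_READONLY`, the `%42` / `%99` pane ids, and the stress-case socket string come from the description above.

```python
from itertools import product

BUDGET_BYTES = 2048  # the 2KB ceiling named in the PR

# TAG_READONLY is named in the PR; the other two constant names are assumed.
TAG_READONLY, TAG_MUTATING, TAG_DESTRUCTIVE = "readonly", "mutating", "destructive"

# Assumed shape of the default third field; only the stress value is from the PR.
DEFAULT_TMUX = "/tmp/tmux-1000/default,12345,0"

# Before: two stacked parametrize decorators are equivalent to this cross-product.
CROSS_PRODUCT = list(
    product((TAG_READONLY, TAG_MUTATING, TAG_DESTRUCTIVE), (None, "%42"))
)

# After: one explicit list, with the stress case as a seventh peer entry.
CASES = [(tier, pane, DEFAULT_TMUX if pane else None) for tier, pane in CROSS_PRODUCT]
CASES.append((TAG_READONLY, "%99", "/tmp/tmux-1000/dev-prod,12345,0"))


def build_instructions_stub(tier, pane, tmux_env):
    """Hypothetical stand-in for the real builder; pads to a realistic size."""
    text = "x" * 1900 + f" Safety level: {tier}."
    if pane is not None:
        text += f" Agent pane {pane} on {tmux_env}."
    return text


def cases_over_budget(cases):
    """Return the cases whose UTF-8 encoded instructions exceed the budget."""
    return [
        case
        for case in cases
        if len(build_instructions_stub(*case).encode("utf-8")) > BUDGET_BYTES
    ]


print(len(CROSS_PRODUCT), len(CASES), cases_over_budget(CASES))
```

In a pytest suite the same list could be passed as `@pytest.mark.parametrize(("tier", "tmux_pane", "tmux_env"), CASES)`; the argument names there are illustrative. Measuring the encoded length rather than `len(text)` keeps the check honest if non-ASCII characters ever enter the instructions.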

Note on base branch

This PR targets pane-discoverability (PR #37), not main. The variable-length stress test cannot pass off main alone — the 2KB compression in pane-discoverability is what makes the budget fit in the first place. Merge after #37 lands.

Test plan

uv run ruff check . && uv run mypy && uv run py.test -q (443 passed; +1 stress case)
just build-docs

codecov-commenter · 2026-05-08T23:36:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.91%. Comparing base (a3ff362) to head (52c849e).

Additional details and impacted files

@@                  Coverage Diff                  @@
##           pane-discoverability      #42   +/-   ##
=====================================================
  Coverage                 84.91%   84.91%           
=====================================================
  Files                        40       40           
  Lines                      2294     2294           
  Branches                    294      294           
=====================================================
  Hits                       1948     1948           
  Misses                      261      261           
  Partials                     85       85


…B transmitted

why: _BASE_INSTRUCTIONS was already 2162 bytes; the dynamic safety-tier and $TMUX_PANE agent-context blocks added another ~1340 bytes, putting the transmitted instructions ~1500 bytes over Claude Code's documented 2KB truncation budget and silently severing the agent-context block that is the only server-side fix for "current window" anaphora. The new SCOPE segment documents activation triggers and anti-triggers (browser/editor/GUI WM/Jupyter) so the LLM has explicit boundaries when bare pane/window/session appears. Coupled with compressed safety and agent-context blocks, the full transmitted instructions now fit under 2048 across all three safety tiers and both TMUX_PANE configurations.

what:
- Trim 6 _INSTR_* segments; preserve every existing test substring at tests/test_server.py:132-247
- Add _INSTR_SCOPE with TRIGGERS / ANTI-TRIGGERS labels; place second in the join (after hierarchy as topic sentence)
- _build_instructions: compress safety-tier block (~672 -> ~165 bytes) and agent-context block (~671 -> ~225 bytes); add readonly-tier discoverability hint inside the function (only emitted on TAG_READONLY)
- Tests: parametrized 2KB budget assertion across (tier x tmux_pane); scope substrings; tier-conditional hint visibility
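
The byte accounting in the commit message above can be checked with a few lines of arithmetic. This is only a sketch over the approximate figures quoted there; the variable names are illustrative, and the readonly-tier hint is not included because its size is not stated.

```python
BUDGET_BYTES = 2048

# Figures quoted in the commit message (the block sizes are approximate).
base_before = 2162
safety_before, safety_after = 672, 165
agent_before, agent_after = 671, 225

# Before the change: how far over the budget the transmitted text was.
total_before = base_before + safety_before + agent_before
overshoot = total_before - BUDGET_BYTES

# After the change: what is left for the static segments once both
# compressed dynamic blocks are accounted for (before any readonly hint).
dynamic_after = safety_after + agent_after
base_ceiling = BUDGET_BYTES - dynamic_after

print(total_before, overshoot, dynamic_after, base_ceiling)
```

The overshoot lands near the "~1500 bytes" quoted above, and the ceiling shows why trimming the dynamic blocks alone would not have been enough: the static segments also had to shrink by roughly 500 bytes.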

…erverInfo.name

why: The pre-activation discovery surface in Claude Code is BM25 over tool name + description + parameter names + parameter descriptions (per Anthropic ToolSearch docs; cross-verified vs. fastmcp _extract_searchable_text at server/transforms/search/base.py:41-57). Re-writing the leading paragraph of six discovery-anchor docstrings to carry "tmux" plus a buried synonym (terminal, shell, scrollback, multiplexer, workspace) widens the indexed lexicon. Per-tool anthropic/alwaysLoad on three read-only anchors is a best-effort hint: opaque pass-through in FastMCP, with honoring delegated to Claude Code (documented at code.claude.com/docs/en/mcp v2.1.121+); ship as forward-compatible metadata. FastMCP(name="tmux") aligns serverInfo.name with the README registration slug; cosmetic but removes a cross-client papercut.

what:
- Six discovery-anchor docstring rewrites (list_panes, list_windows, list_sessions, snapshot_pane, search_panes, capture_pane); first paragraph carries "tmux" plus a buried user-vocabulary synonym + an inline anti-trigger ("not editor splits or browser panes"). Both sentences land in BM25's corpus via FastMCP's griffe-based parse_docstring (utilities/docstring_parsing.py:35-65).
- DISCOVERY_META = {"anthropic/alwaysLoad": True} in _utils.py; applied to list_panes, list_windows, snapshot_pane (Snapshot Pane title unchanged; verb-of-art carve-out preserved)
- FastMCP(name="libtmux") -> name="tmux"; add website_url
- docs/conf.py: monkey-patch sphinx_autodoc_fastmcp.ToolCollector to accept and ignore meta= kwarg. Upstream mock signature lacks **kwargs, so per-tool meta= raises TypeError inside the docs-build collector and silently drops the entire enclosing module's tools from the docs catalog (caught by a generic except Exception). The shim is the minimum-viable workaround; upstream fix is a **kwargs on ToolCollector.tool().
- Tests: server-name, anchor-description coverage, alwaysLoad presence
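
The docs/conf.py shim described above follows a common accept-and-ignore pattern. The sketch below shows that pattern against a hypothetical stand-in class; `ToolCollectorStandIn` and `accept_and_ignore_meta` are invented for illustration, and the real sphinx_autodoc_fastmcp collector may have a different signature.

```python
import functools


class ToolCollectorStandIn:
    """Hypothetical stand-in for the docs-build collector (no **kwargs)."""

    def __init__(self):
        self.tools = []

    def tool(self, name=None):
        def decorator(fn):
            self.tools.append(name or fn.__name__)
            return fn

        return decorator


def accept_and_ignore_meta(cls):
    """Wrap cls.tool so a meta= keyword is accepted and dropped."""
    original = cls.tool

    @functools.wraps(original)
    def tool(self, *args, meta=None, **kwargs):
        # meta is deliberately discarded; everything else passes through.
        return original(self, *args, **kwargs)

    cls.tool = tool
    return cls


DISCOVERY_META = {"anthropic/alwaysLoad": True}

# Without the shim, the extra keyword raises TypeError.
try:
    ToolCollectorStandIn().tool(meta=DISCOVERY_META)
    raised_before_patch = False
except TypeError:
    raised_before_patch = True

accept_and_ignore_meta(ToolCollectorStandIn)
collector = ToolCollectorStandIn()


@collector.tool(meta=DISCOVERY_META)
def list_panes():
    """List tmux panes."""


print(raised_before_patch, collector.tools)
```

Patching the class rather than an instance means every collector created during the docs build picks up the tolerant signature, which matters when the failure would otherwise be swallowed by a broad exception handler.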

…ragraph

why: the 2KB-budget compression in 4164758 dropped the load-bearing rationale phrase "survive process death" from _INSTR_HOOKS_GAP. Without it, agents read "Write-hooks belong in your tmux config file" as soft preference rather than a correctness boundary tied to a concrete tmux fact. Restoring the rationale costs 26 bytes; tightening the safety-tier paragraph (preserving the read / read+send / read+send+kill verb-pairings inline) banks 27 bytes back. Net -1 byte; readonly+TMUX_PANE worst case 2045 -> 2044 (margin 4 of 2048).

what:
- _INSTR_HOOKS_GAP now reads "Write-hooks survive process death; keep them in your tmux config file, not a transient MCP session." Substring "tmux config file" preserved verbatim (asserted at tests/test_server.py:173).
- _build_instructions safety-tier paragraph rewrites to: "Safety level: <tier> (readonly: read; mutating: read+send; destructive: read+send+kill). Set LIBTMUX_SAFETY; off-tier tools are hidden." Substrings preserved: "Safety level:", f"Safety level: {tier}", "LIBTMUX_SAFETY".
- New test_hooks_gap_keeps_process_death_rationale defensively pins both "survive process death" and "tmux config file" to the gap segment so a future refactor that moves the substring still fails the pin (line-173's existing test passes either way).

…tress case

why: the existing 2KB-budget regression test parametrized (3 tiers x 2 tmux_pane states) but every case used "%42" (3 chars) and "default" (7-char socket name). A user with a slightly longer custom socket name + a multi-digit pane id from a long session pushes the readonly worst case very close to 2048. The static cases never exercised this, so future text additions could silently put realistic runtime injections over the budget.

what:
- Collapse the two cross-product @pytest.mark.parametrize decorators into one explicit list so the variable-length stress case is a peer entry instead of expanding the cross-product space.
- Add (TAG_READONLY, "%99", "/tmp/tmux-1000/dev-prod,12345,0") as the 7th case. Exercises BOTH axes: multi-digit pane id and a longer-than-default socket name. Margin ~2 bytes from the 2048 ceiling.
- Inline comment names the fallback path (tighter compression form) if a future text addition trips this case.

note: this branch builds on pane-discoverability; merge after that one lands. Standalone off main, the test would not pass since the 2KB compression in pane-discoverability is what makes the budget fit in the first place.

tony · 2026-05-09T12:10:34Z

Folded into #37 per the multi-model weave-ask synthesis (2-of-3 votes for fold; the variable-length parametrize is intrinsic to #37's 2KB-budget contract, not an independent feature).

The variable-length stress case (%99 + dev-prod socket, margin ~2 bytes from 2048) now lives in the same commit that introduces _INSTR_SCOPE and the budget regression test on pane-discoverability — see 4d5744e mcp(refactor[server]): Compress instructions, add SCOPE, fit under 2KB transmitted.

Closing without merge.

tony force-pushed the pane-discoverability branch from 1a4eda5 to 11f9b76 on May 9, 2026 09:53

tony added 3 commits May 9, 2026 04:58

tony force-pushed the pane-discoverability branch from 11f9b76 to a3ff362 on May 9, 2026 09:58

tony force-pushed the feature/variable-length-budget-test branch from d1de08f to 52c849e on May 9, 2026 10:37

tony force-pushed the pane-discoverability branch from ca489e2 to 872248c on May 9, 2026 12:10

tony closed this May 9, 2026

tony wants to merge 4 commits into pane-discoverability from feature/variable-length-budget-test
