diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 929adb705..9340e2de8 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -473,6 +473,7 @@ {"id":"bd-xly8b","title":"Catalog/corpus drift: markdown Q-2-X assignments diverge between error_catalog.json, pampa/resources/error-corpus/, and production-emit sites","description":"While authoring docs/errors/markdown/ pages (bd-lgxdr), found multiple drift cases between the three sources of truth:\n\n1. **Q-2-5**: catalog title 'Unclosed Emphasis' (generic), but corpus Q-2-5.json and production code (json.rs:328) use 'Unclosed Underscore Emphasis'. Catalog says Q-2-14 is the underscore-specific one. Production behavior collapses to Q-2-5; Q-2-14 is never actually emitted with the *with_code* API.\n\n2. **Q-2-35**: catalog says 'Invalid List-Table Structure' (matches production emits at treesitter_utils/postprocess.rs:785,976), but corpus Q-2-35.json claims 'Indented code blocks are not supported'. The corpus appears stale.\n\n3. **Q-2-36, Q-2-37, Q-2-38**: present in pampa/resources/error-corpus/ as Q-2-36.json (Old-style knitr chunk options), Q-2-37.json (Line break in link destination), Q-2-38.json (Unclosed Attribute Specifier) — and Q-2-36 is emitted in production (treesitter.rs:1244) — but NONE are in error_catalog.json. The catalog jumps Q-2-35 → Q-2-39.\n\n4. **Q-2-6, Q-2-8**: in catalog but never emitted in production. Reserved for future strict-mode (Q-2-6) and upgraded-to-error-and-renumbered (Q-2-8 → Q-2-36).\n\nPlan: the catalog is authoritative for docs_url; the corpus needs to be reconciled to match. Add missing Q-2-36/37/38 to the catalog (or remove from corpus if they should not exist). Fix Q-2-5 and Q-2-35 corpus titles to match catalog. 
Decide whether Q-2-14 should be deprecated or whether production should be updated to emit it for the underscore variant specifically.\n\nDiscovered from: bd-lgxdr (markdown subsystem error-docs pages).","status":"open","priority":3,"issue_type":"bug","created_at":"2026-05-22T20:38:32.703765Z","created_by":"cscheid","updated_at":"2026-05-22T20:38:32.703765Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["documentation","error-reporting","markdown"],"dependencies":[{"issue_id":"bd-xly8b","depends_on_id":"bd-lgxdr","type":"discovered-from","created_at":"2026-05-22T20:38:32.703765Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-xm7l","title":"Audit non-cargo / vendored dependencies and expand upgrade skill","description":"Expand the upgrade-cargo-deps skill (or add a sibling audit-vendored-deps skill) so the bi-weekly dependency audit also covers non-Cargo vendored assets — Bootstrap SCSS/Icons, chicago-author-date CSL, tree-sitter highlight queries, quarto-cli built-in extensions, knitr R scripts, Pandoc HTML template, quarto-system-runtime JS bundles, reveal.js-menu CSS, etc. Discovery strategies and a per-asset inventory live at claude-notes/research/vendored-dependencies-inventory.md. Plan + phased work items at claude-notes/plans/2026-05-04-vendored-deps-audit.md.","status":"open","priority":2,"issue_type":"epic","created_at":"2026-05-04T21:14:02.442855Z","created_by":"cscheid","updated_at":"2026-05-04T21:14:02.442855Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["deps","vendored"]} {"id":"bd-xs2u","title":"Em-dash / en-dash in document titles breaks something in hub-client","description":"User reported during L3 testing (2026-05-06): when uploading a project containing em-dash characters (U+2014, '—') in document titles, hub-client exhibits some bug. 
The user worked around it by replacing em-dashes with regular dashes in the Automerge documents.\n\nI (Claude) did not reproduce the bug myself; I only observed that the workaround (replace em-dash with single dash) made hub-client behave correctly afterwards. The bug could be in any of:\n- pampa parser handling of unicode dashes in YAML strings\n- hub-client / Monaco display of titles with non-ASCII characters\n- The Automerge text sync round-trip (less likely; bytes round-tripped identically in my test)\n\nReproduction (potentially):\n1. Create a Q2 .qmd with 'title: \"Some — text\"' (em-dash) in the frontmatter.\n2. Upload via scripts/upload-project.mjs to wss://sync.automerge.org.\n3. Open in hub-client and observe whatever the user observed.\n\nTo investigate: have the user describe the exact symptom (render error? blank title? wrong rendering?), then bisect against the affected component.\n\nDiscovered while testing L3 (bd-ml8z) listings via hub-client; see claude-notes/plans/2026-05-06-listings-L3-resolve-transform.md §\"Hand-off summary\".","status":"open","priority":2,"issue_type":"bug","created_at":"2026-05-06T22:08:24.119375Z","created_by":"cscheid","updated_at":"2026-05-06T22:08:24.119375Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xs2u","depends_on_id":"bd-ml8z","type":"discovered-from","created_at":"2026-05-06T22:08:24.119375Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} +{"id":"bd-xvdop","title":"Experiment: consolidate integration tests to reduce target/ size","description":"Rust defaults give each tests/*.rs file its own test binary, fully linked against the crate + all transitive deps. We have 164 integration test files across 20 crates; target/debug is currently 251 GB while target/release is 2.7 GB, suggesting per-file debug binaries are the dominant bloat. 
The ark project applied matklad's tests/integration/ consolidation pattern (posit-dev/ark#1240) and saw a ~57% drop in fresh cargo clean size and a ~3x drop in CI runner footprint. This issue tracks an experiment to measure the same change on Q2 from a macOS dev machine, piloting pampa (57 files) first before deciding on a full rollout. Plan: claude-notes/plans/2026-05-28-integration-test-consolidation.md","status":"open","priority":3,"issue_type":"chore","created_at":"2026-05-28T16:16:18.200610Z","created_by":"cscheid","updated_at":"2026-05-28T16:16:18.200610Z","source_repo":".","compaction_level":0,"original_size":0} {"id":"bd-xwq8","title":"Suppress page-nav for custom-layout pages","description":"Q1 hides the prev/next strip when a page sets page-layout: custom. Phase 4 ships without this; defer until a real page hits the edge case. See Phase 4 plan non-goals.","status":"open","priority":3,"issue_type":"feature","created_at":"2026-04-24T22:47:57.195184Z","created_by":"cscheid","updated_at":"2026-04-24T22:47:57.195184Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xwq8","depends_on_id":"bd-nwun","type":"discovered-from","created_at":"2026-04-24T22:47:57.195184Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-xxul","title":"Non-.qmd input extensions in project discovery (.md, .ipynb, .Rmd)","description":"Phase 1 of the website epic explicitly discovers only .qmd files. The plan §File-list expansion defers the decision about which non-.qmd extensions are renderable documents (to include in project.files) vs. source artifacts (to treat like resources). Needs a user conversation about semantics for .md (literal markdown — render? copy?), .ipynb (converted at render time in Q1), .Rmd. 
Once settled, extend discovery and the pipeline's SourceType handling.","status":"open","priority":2,"issue_type":"task","created_at":"2026-04-24T01:05:32.255233Z","created_by":"cscheid","updated_at":"2026-04-24T01:05:32.255233Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xxul","depends_on_id":"bd-w5os","type":"discovered-from","created_at":"2026-04-24T01:05:32.255233Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-y1fs3","title":"q2 preview: CodeBlock DOM mismatches q2 render — classes go on instead of
, sourceCode class missing","description":"q2 render emits '
'; q2 preview emits '
'. Two divergences: (1) language/role classes are on in preview, but on
 in render (native writer's behavior, see crates/pampa/src/writers/html.rs:963-975); (2) the 'sourceCode' class — prepended to 
's class list whenever data-hl-spans is present (write_code_container_attr at line 487-495) — is entirely missing from preview. Both differences break Quarto theme rules that key off .sourceCode and pre.sourceCode, causing the visible spacing/indentation drift the user reported. Fix in React CodeBlock: move classes + data-* kvs to 
; bare ; prepend 'sourceCode' to 
's class list when data-hl-spans is non-empty.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","updated_at":"2026-05-18T17:44:55.722896Z","closed_at":"2026-05-18T17:44:55.722758Z","close_reason":"Implementation complete: React CodeBlock now mirrors the native HTML writer's DOM shape — classes + data-* kvs on 
, bare , sourceCode prepended when data-hl-spans is present. Verified end-to-end: pre.className matches between q2 render and q2 preview ('sourceCode r cell-code' on both), code.className empty on both. 2 tests rewritten, 0 new added; 150/150 SPA integration tests green; cargo xtask verify 12/12 green.","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-y1fs3","depends_on_id":"bd-kw93","type":"parent-child","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","metadata":"{}","thread_id":""},{"issue_id":"bd-y1fs3","depends_on_id":"bd-nxslt","type":"related","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","metadata":"{}","thread_id":""}]}
diff --git a/.claude/rules/integration-tests.md b/.claude/rules/integration-tests.md
new file mode 100644
index 000000000..03c9350b8
--- /dev/null
+++ b/.claude/rules/integration-tests.md
@@ -0,0 +1,94 @@
+# Integration test layout
+
+All integration tests in this workspace live in
+`tests/integration/<name>.rs` + a `tests/integration/main.rs` that
+declares each as `pub mod <name>;`. Cargo compiles **one
+`integration` binary per crate** instead of one binary per file.
+
+**Do not add new `tests/<name>.rs` files at the top level of any
+crate's `tests/` directory.** Cargo would treat each one as its
+own test binary that statically links the crate's full dependency
+closure — pampa's closure alone is ~130 MB. That was the bloat we
+removed in bd-xvdop (commits `1b420d65` through `1592e8cd`), worth
+~9 GB in `target/debug` and ~6.5 GB in `target/release` at the
+workspace level.
+
+## Adding a new integration test
+
+1. Create the test file: `crates/<crate>/tests/integration/<name>.rs`.
+2. Register it in `main.rs`:
+   ```rust
+   // crates/<crate>/tests/integration/main.rs
+   pub mod <name>;
+   ```
+   Keep the list alphabetized.
+
+That's it — nextest still runs each `#[test]` in its own process,
+so test isolation is preserved. Filter expressions use the new
+selector form `package(<crate>) & binary(integration) & test(<name>::)`.
+
+## If you move test files
+
+Any **source-file-relative path** in the moved files needs to be
+re-evaluated against the new location. The pampa pilot migration
+burned several iterations on insta `set_snapshot_path` calls that
+silently resolved to the wrong directory.
+
+Audit grep before declaring a move done — including **cross-language
+references** (the bd-xvdop PR shipped without this audit and broke
+a TypeScript test that hardcoded a Rust snapshot path):
+
+```bash
+# Rust side — source-file-relative paths inside the moved files
+grep -nE 'include_str!|include_bytes!|include_dir!|#\[path|set_snapshot_path|"\.\./' \
+  crates//tests/integration/*.rs
+
+# Cross-language side — anything in hub-client / ts-packages /
+# scripts / CI config that references the old paths
+grep -rn 'crates//tests/' \
+  --include='*.ts' --include='*.tsx' --include='*.js' --include='*.mjs' \
+  --include='*.json' --include='*.toml' --include='*.yml' --include='*.yaml' \
+  hub-client/ ts-packages/ scripts/ .github/ .config/
+```
+
+Each `../` needs to be re-checked: moving from `tests/foo.rs` to
+`tests/integration/foo.rs` adds one directory level, so any
+relative path inside the file usually needs one more `../`. And
+any hardcoded `crates//tests/` path elsewhere in the repo —
+typically a TS test that reads a Rust snapshot or fixture — needs
+the new `tests/integration/` location.
+
+Insta `.snap` files also live in a snapshot directory adjacent to
+the test source by default. If you move a test that uses default
+insta snapshot paths, the existing `.snap` files need to move too:
+
+- Directory: `tests/snapshots/` → `tests/integration/snapshots/`
+- Filename: gains an `integration__` prefix because insta's
+  `module_path!()`-based filenames now start with the binary's
+  name (`integration`)
+
+## Why this matters (the measurements)
+
+From bd-xvdop's Phase 6 measurement (controlled, back-to-back
+`cargo clean` + `cargo build --workspace --tests`, alternating
+between baseline and full rollout):
+
+|                            |  Before |    After |      Δ |
+| -------------------------- | ------: | -------: | -----: |
+| target/debug               |   21 GB |    12 GB | −43 % |
+| target/release             |   11 GB |   4.5 GB | −59 % |
+| Executables in deps/       |     220 |       76 | −65 % |
+| Sum of executable bytes    | 10.5 GiB |  2.5 GiB | −77 % |
+| Release-mode build wall    |   158 s |    120 s | −24 % |
+
+The one extra `../` in a relative path is easy to get wrong; the
+disk savings are hard to give up. Prefer the convention even when it
+feels redundant.
+
+## References
+
+- `claude-notes/plans/2026-05-28-integration-test-consolidation.md`
+- `claude-notes/research/2026-05-28-integration-test-bloat.md`
+- [matklad: "Delete Cargo Integration Tests"](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html)
+- [posit-dev/ark#1240](https://github.com/posit-dev/ark/pull/1240) —
+  the precedent that motivated this change
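The two path rules in `.claude/rules/integration-tests.md` (one more `../` per added directory level, and the `integration__` snapshot prefix) can be sketched in Rust as below. This is only an illustration under stated assumptions: `deepen_relative_path` and `relocated_snapshot` are hypothetical helper names that do not exist in the repository, and the snapshot rule simply restates the directory and filename change listed in the rules file rather than calling into insta.

```rust
/// Hypothetical helper (not part of the repo): rewrite a
/// source-file-relative path for a test file that moved `levels`
/// directories deeper, e.g. `tests/foo.rs` to
/// `tests/integration/foo.rs` is one level.
pub fn deepen_relative_path(path: &str, levels: usize) -> String {
    // Absolute paths do not depend on the source file's location.
    if path.starts_with('/') {
        return path.to_string();
    }
    // Each added directory level needs one more `../` in front.
    format!("{}{}", "../".repeat(levels), path)
}

/// Hypothetical helper (not part of the repo): where a default-path
/// `.snap` file is expected after the move, following the rule
/// "directory gains `integration/`, filename gains `integration__`".
/// Returns `None` for paths outside `tests/snapshots/`.
pub fn relocated_snapshot(old: &str) -> Option<String> {
    let file = old.strip_prefix("tests/snapshots/")?;
    Some(format!("tests/integration/snapshots/integration__{file}"))
}
```

For example, `deepen_relative_path("../snapshots/x", 1)` yields `../../snapshots/x`, which mirrors the `set_snapshot_path` fix described for the pampa pilot.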
diff --git a/.config/nextest.toml b/.config/nextest.toml
index ff0acf091..86cf325fa 100644
--- a/.config/nextest.toml
+++ b/.config/nextest.toml
@@ -18,8 +18,16 @@
 # at most, in exchange for deterministic results.
 #
 # If you're adding a new integration test under `crates/quarto-preview/
-# tests/` that calls `run_with_on_ready` or otherwise instantiates the
-# file watcher, add its binary name to the override filter below.
+# tests/integration/` that calls `run_with_on_ready` or otherwise
+# instantiates the file watcher, add its module name to the regex
+# in the override filter below.
+#
+# Layout note (2026-05-28, bd-xvdop): integration tests across the
+# workspace now live in `tests/integration/main.rs` + sibling modules
+# (matklad pattern), so each crate produces a single `integration`
+# test binary instead of one binary per file. Filters that used to
+# target per-file binary names now need `package(...) & test(...::)`
+# selectors instead.
 
 [test-groups]
 # Quarto-preview integration tests that arm a `notify-rs` FSEvents
@@ -27,5 +35,5 @@
 quarto-preview-fs-watcher = { max-threads = 1 }
 
 [[profile.default.overrides]]
-filter = "binary(staleness) | binary(eager_capture) | binary(boot)"
+filter = "package(quarto-preview) & binary(integration) & test(/^(staleness|eager_capture|boot)::/)"
 test-group = "quarto-preview-fs-watcher"
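As a rough illustration of what the new `test(/^(staleness|eager_capture|boot)::/)` predicate selects, the Rust sketch below reimplements the prefix check with the standard library only. It is not how nextest evaluates filters, it ignores the `package(...)` and `binary(...)` parts of the expression, and the function name `in_fs_watcher_group` and the test names in the usage note are made up for the example.

```rust
/// Module names taken from the override filter in `.config/nextest.toml`.
const FS_WATCHER_MODULES: [&str; 3] = ["staleness", "eager_capture", "boot"];

/// Hypothetical check (not nextest's implementation): true when the
/// test name starts with one of the listed module names followed
/// immediately by `::`, which is what the anchored regex requires.
pub fn in_fs_watcher_group(test_name: &str) -> bool {
    FS_WATCHER_MODULES.iter().any(|module| {
        test_name
            .strip_prefix(module)
            .map_or(false, |rest| rest.starts_with("::"))
    })
}
```

Under this sketch a name such as `staleness::detects_change` is selected, while `bootstrap::starts` or `helpers::boot::starts` is not, because the module name must be the first path segment.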
diff --git a/claude-notes/plans/2026-05-28-integration-test-consolidation.md b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
new file mode 100644
index 000000000..598adaccd
--- /dev/null
+++ b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
@@ -0,0 +1,316 @@
+# Experiment: consolidate integration tests into single binary per crate
+
+**Beads:** [bd-xvdop](../../.beads/issues.jsonl) — `br show bd-xvdop`
+**Branch:** `beads/bd-xvdop-integration-test-consolidation` (off `main`)
+**Status:** proposed (not yet started)
+
+## Overview
+
+Rust's default integration-test layout creates one test binary per
+`tests/*.rs` file. Each binary is fully linked against the host crate
+and all transitive dependencies. We have **164 integration test files
+across 20 crates** and `target/debug/` on this machine is currently
+**251 GB** while `target/release/` is only 2.7 GB — strongly
+suggesting the per-file debug test binaries are the dominant bloat,
+matching exactly the pattern the ark project diagnosed.
+
+The ark project hit the same problem on Linux CI (out-of-disk
+failures) and resolved it by moving `tests/*.rs` into
+`tests/integration/*.rs` with a single `tests/integration/main.rs`
+declaring each former file as a `pub mod`. Reported wins:
+
+- Fresh `cargo clean` size: **8.1 GiB → 3.5 GiB** (~57% reduction)
+- Test-suite compile time on macOS: **88s → 52s** (~40% faster)
+- Linux CI runner footprint: **15 GB → ~2 GB**
+
+This experiment measures the same change on Q2 from a macOS dev
+machine. We cannot directly measure Linux/Windows CI from here, but
+the ark numbers suggest the macOS delta will be a representative
+proxy for the platform delta.
+
+### Goals
+
+- Establish a clean baseline for debug + release test-build footprint.
+- Pilot the migration on `pampa` (57 files — the largest single signal).
+- If the pilot delta justifies it, roll out to the other 12 multi-file
+  crates and remeasure.
+- Capture all numbers in a research note so the Linux/Windows CI win
+  can be predicted before pushing.
+
+### Non-goals
+
+- Migrating crates that already have only one integration test file
+  (no payoff — each is already 1 binary).
+- WASM / hub-client changes (this is a Rust-only refactor).
+- Changing test execution semantics — nextest handles single-binary
+  test discovery fine.
+
+## References
+
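+- ark PR: <https://github.com/posit-dev/ark/pull/1240>
+- matklad post: <https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html>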
+- Cargo issue cited by matklad: 
+
+## Crates in scope
+
+13 crates have >1 integration test file (sorted by file count, since
+the payoff scales with file count):
+
+| Crate                  | tests/*.rs files |
+| ---------------------- | ---------------: |
+| pampa                  |               57 |
+| quarto-core            |               33 |
+| qmd-syntax-helper      |               20 |
+| quarto-preview         |                7 |
+| quarto-sass            |                7 |
+| quarto                 |                7 |
+| quarto-highlight       |                6 |
+| comrak-to-pandoc       |                5 |
+| quarto-yaml-validation |                5 |
+| quarto-brand           |                4 |
+| quarto-citeproc        |                2 |
+| quarto-csl             |                2 |
+| quarto-doctemplate     |                2 |
+
+Out of scope (single-file integration test crates, no benefit):
+`quarto-error-reporting`, `quarto-hub`, `quarto-lsp`,
+`quarto-lsp-core`, `quarto-publish`, `quarto-trace`,
+`wasm-qmd-parser`.
+
+## Work Items
+
+### Phase 0 — Setup
+
+- [x] Create worktree
+      `.worktrees/bd-xvdop-experiment-consolidate-integration-tests/`
+      on branch `beads/bd-xvdop-experiment-consolidate-integration-tests`
+      (off `main`); worktree starts with empty `target/`, so baseline
+      measurements aren't polluted by the 259 GB in the main checkout
+- [x] Write a measurement helper script `scripts/measure-test-build.sh`
+      that:
+  - times `cargo build --workspace --tests` (debug or release per arg)
+  - records `target//` size via `du -sh`
+  - lists the largest 25 binaries under `target//deps/`
+  - prints a paste-able summary block
+- [x] Create research note skeleton
+      `claude-notes/research/2026-05-28-integration-test-bloat.md`
+      and `claude-notes/research/measurements/` directory
+
+### Phase 1 — Baseline measurement
+
+- [x] `cargo clean` (no-op — fresh worktree)
+- [x] `scripts/measure-test-build.sh debug` →
+      **21 GB / 114 s / 220 deps executables / 10.5 GiB exec-bytes**
+- [x] `cargo clean` (freed 22.3 GiB / 36 689 files)
+- [x] `scripts/measure-test-build.sh release` →
+      **11 GB / 133 s / 220 deps executables / 9.0 GiB exec-bytes**
+- [x] Write baseline numbers into
+      `claude-notes/research/2026-05-28-integration-test-bloat.md`
+
+### Phase 2 — Pilot migration: pampa
+
+- [x] Create `crates/pampa/tests/integration/main.rs` with `pub mod
+      ;` lines for each of the 57 current `tests/*.rs` files
+- [x] `git mv` each `tests/*.rs` → `tests/integration/.rs`
+- [x] Audit for collisions:
+  - `test_location_health.rs` has one inline `mod tests {}` — verified
+    inline only, no file-resolution risk
+  - Zero `crate::` references in pampa tests (would change meaning
+    in consolidated layout)
+  - Exactly one `super::` reference in `test_location_health.rs`,
+    inside the inline `mod tests {}` — parent semantics preserved
+    (`super::*` still refers to the enclosing file)
+  - No `include_str!` / `include_bytes!` (source-file-relative
+    compile-time paths)
+- [x] Adjacent data dirs (`fixtures/`, `snapshots/`, `*.qmd`) stay
+      where they are — they're referenced by fixed paths from test
+      code; moving the .rs files doesn't change `CARGO_MANIFEST_DIR`
+- [x] **Discovered:** insta `set_snapshot_path("../snapshots/…")`
+      in `test.rs` and `test_error_corpus.rs` is resolved relative
+      to the test file's directory. After the move, all 3 occurrences
+      needed an extra `../` to keep pointing at `crates/pampa/snapshots/`.
+      Fixed in this pilot; cleaned up 3 stale `.snap.new` litter
+      files generated by the first broken run.
+- [x] `cargo nextest run -p pampa --test integration` → **941 passed,
+      2 skipped, 0 failed**
+- [ ] `cargo xtask verify --skip-hub-build` → expect green (in progress)
+
+### Phase 3 — Pilot measurement
+
+- [x] `cargo clean` between each measurement
+- [x] `scripts/measure-test-build.sh debug` (pampa-pilot, first run):
+      **18 GB / 173 s / 164 exes / 7.8 GiB exec-bytes**
+- [x] `scripts/measure-test-build.sh release` (pampa-pilot, first run):
+      **9.2 GB / 255 s / 164 exes / 7.2 GiB exec-bytes**
+- [x] Controlled back-to-back **debug** re-measurement to validate
+      the surprising wall-time delta: baseline **114 s** vs. pilot
+      **130 s** (+14 %, far smaller than the first run's apparent
+      +52 %). Disk numbers reproduced identically.
+- [x] Controlled **release** re-measurement: baseline **138 s** vs.
+      pilot **136 s** (−2 s, statistically a wash). The first pilot
+      release's 255 s was the same kind of system-noise artifact as
+      the first pilot debug.
+- [x] Compute pampa-pilot delta vs. baseline in research note
+
+### Phase 4 — Decision point
+
+- [ ] Review pilot numbers with user (awaiting input)
+- [ ] **If** the pilot delta is meaningful (e.g. >20% size drop) →
+      Phase 5
+- [ ] **If not** → revert pilot, close beads issue with findings,
+      stop
+
+**Pilot numbers (controlled, ready for decision):**
+
+|                            | Debug  Δ            | Release Δ           |
+| -------------------------- | -------------------:| -------------------:|
+| `target/` size    | −3 GB  (−14 %)      | −1.9 GB (−17 %)     |
+| Executables in `deps/`     | −56  (−25 %)        | −56  (−25 %)        |
+| Sum of executable bytes    | −2.7 GiB (−26 %)    | −2.3 GiB (−26 %)    |
+| Build wall time            | +16 s  (+14 %)      | −2 s   (−1 %)       |
+
+This is from pampa *alone* (57/164 ≈ 35 % of all integration test
+files). If the per-binary savings amortize roughly linearly across
+the remaining 12 candidate crates (100 more files → 12 binaries,
+i.e. saving 88 more binaries), Phase 6 should land near a ~50 %
+reduction in `target/debug` and `target/release` from the baseline.
+The wall-time cost stays small.
+
+Recommendation: proceed to Phase 5.
+
+### Phase 5 — Full rollout (conditional)
+
+For each crate below, migrate in its own commit (one commit per
+crate makes any individual revert cheap). After each migration, run
+`cargo nextest run -p ` before moving on.
+
+**Preflight checklist** (run on each crate before moving files —
+the pampa pilot showed how easy it is to miss one and then watch
+several seemingly-unrelated tests fail):
+
+```bash
+crate=
+# 1. File-root fn main / file-based mod / inline mod summary
+grep -nE "^(mod [a-zA-Z_]+;|mod [a-zA-Z_]+ \{|fn main)" \
+  crates/$crate/tests/*.rs
+
+# 2. Source-file-relative compile-time and runtime paths
+grep -nE 'include_str!|include_bytes!|include_dir!|#\[path' \
+  crates/$crate/tests/*.rs
+grep -nE 'set_snapshot_path|"\.\./' \
+  crates/$crate/tests/*.rs
+
+# 3. crate:: and super:: usage (would change meaning when each file
+#    becomes a sub-module rather than the binary's crate root)
+grep -nE 'crate::|super::' crates/$crate/tests/*.rs
+```
+
+Each non-zero match needs to be evaluated against the
+"`tests/integration/` is one level deeper" rule. The two known
+surgical edits below were caught by this checklist on the audit
+pass; new ones may surface per crate.
+
+Audit findings (pre-cached during Phase 0 / Phase 2 to make rollout
+fast): all remaining crates are **clean pure-rename migrations**
+*except* the specific files called out below.
+
+- [x] quarto-core (33 files, commit `0ad8a40d`) — 4 `include_str!`
+      `fixtures/...` paths got a `../` prefix
+- [x] qmd-syntax-helper (20 files, commit `0d345c7c`) — clean rename
+- [x] quarto-preview (7 files, commit `86dcd0c1`) — clean rename
+- [x] quarto-sass (7 files, commit `1ad8b116`) — clean rename
+- [x] quarto (7 files, commit `6651b1da`) — `#[path]` edit
+- [x] quarto-highlight (6 files, commit `8dc679ef`) — `include_str!`
+      prefix edit
+- [x] comrak-to-pandoc (5 files, commit `6f8b1507`) — surgical edits:
+      `proptest_roundtrip.rs` had `mod generators;` referring to the
+      sibling top-level file; replaced with `use super::generators::*;`
+      since `generators` is now a sibling module of
+      `proptest_roundtrip` in the same binary. Also dropped the
+      vestigial `fn main() {}` lines from `debug.rs` and `debug_comrak.rs`.
+- [x] quarto-yaml-validation (5 files, commit `ce214672`) — 7
+      `include_str!("../test-fixtures/...")` paths got an extra `../`
+- [x] quarto-brand (4 files, commit `91cd5b71`) — clean rename
+- [x] quarto-citeproc (2 files, commit `e8a9d075`) — clean rename
+- [x] quarto-csl (2 files, commit `1d03f3f9`) — clean rename
+- [x] quarto-doctemplate (2 files, commit `136a5c87`) — clean rename
+- [x] `cargo nextest run --workspace` → 9444 passed, 196 skipped
+      (same count as pre-migration baseline)
+- [x] `cargo xtask verify --skip-hub-build` → Rust steps 1-7 green
+      (warnings denied build pass, 9444/9444 tests). JS/TS legs
+      remain environmentally blocked in this fresh worktree.
+- [x] Snapshot relocations: 20 `.snap` files in quarto-core and
+      quarto-highlight needed to move from `tests/snapshots/` to
+      `tests/integration/snapshots/` and gain an `integration__`
+      prefix to match insta's new file resolution
+- [x] `.config/nextest.toml`: bd-u3ze override filter updated from
+      `binary(staleness) | binary(eager_capture) | binary(boot)`
+      to `package(quarto-preview) & binary(integration) &
+      test(/^(staleness|eager_capture|boot)::/)`
+
+### Phase 6 — Final measurement
+
+- [x] Controlled back-to-back comparison via `git checkout`
+      alternation between baseline (`8733ed67`) and branch tip:
+
+|                            | Baseline (debug) | Rollout (debug) |       Δ debug | Baseline (release) | Rollout (release) |     Δ release |
+| -------------------------- | ---------------: | --------------: | ------------: | -----------------: | ----------------: | ------------: |
+| `target/` size    |            21 GB |           12 GB |  **−9 GB (−43 %)** |              11 GB |           4.5 GB |  **−6.5 GB (−59 %)** |
+| Executables in `deps/`     |              220 |              76 | **−144 (−65 %)** |                220 |               76 | **−144 (−65 %)** |
+| Sum of executable bytes    |         10.5 GiB |          2.5 GiB | **−8.0 GiB (−77 %)** |           9.0 GiB |           2.1 GiB | **−6.9 GiB (−77 %)** |
+| Build wall time            |            118 s |           122 s |  **+4 s (+3 %)** |              158 s |             120 s | **−38 s (−24 %)** |
+
+- [x] Computed final delta vs. baseline; recorded in research note
+
+### Phase 7 — Report and decide
+
+- [x] Update research note with all numbers + extrapolated CI impact
+- [ ] Discuss findings with user; ask for push permission
+- [ ] If pushed: update `CLAUDE.md` if any developer-facing test
+      invocation conventions change (e.g. references to per-file
+      test binaries)
+
+## Risks & open questions
+
+- **nextest binary filtering.** Per-binary names change from
+  `` to `integration`. Current CI runs
+  `cargo nextest run --tests --cargo-profile ci` (no per-binary
+  filters), so CI is unaffected. Confirmed by grepping `.github/`.
+  Still worth a quick check during pilot that no `xtask` or
+  `scripts/` invocation depends on the old per-file binary names.
+
+- **Module name collisions inside a single binary.** Each former
+  `tests/foo.rs` becomes module `integration::foo`. If two former
+  files both had e.g. `mod helpers;` referring to sibling files,
+  they would now refer to the same `integration/helpers.rs`. None
+  of the pampa files declare a file-based `mod` at the top level
+  (only `test_location_health.rs` has any `mod` keyword — needs
+  audit). Risk is small; the audit step in Phase 2 covers it.
+
+- **`fn main()` collisions.** A `#[test]` integration file can
+  optionally declare its own `fn main()`. Generated `main.rs` will
+  declare one. Verified: zero pampa test files declare `fn main()`.
+
+- **macOS-only measurement.** We cannot directly measure the
+  Linux/Windows CI delta from this machine. The ark PR shows the
+  size ratio is roughly stable across platforms (linker bloat is
+  proportional to dependency closure, which is the same on all
+  platforms). The research note should call out that the
+  Linux/Windows wins are *predicted*, not measured.
+
+- **Existing 259 GB target/.** Not affected by this change — it
+  accumulates across all branches the user has built locally. The
+  experiment uses a fresh build for clean numbers and `cargo clean`
+  between each phase. Worth offering to clean it up at the end of
+  the experiment regardless of outcome (would free ~256 GB).
+
+## Decision log
+
+- 2026-05-28: Use `tests/integration/` (matches ark PR) over
+  `tests/it/` (matklad blog). Same mechanic; ark name is more
+  discoverable to new contributors.
+- 2026-05-28: Pilot pampa first rather than migrate all 13 crates
+  in one pass. pampa is 57/164 ≈ 35% of integration files; if the
+  per-file bloat hypothesis holds, the pilot should already produce
+  a measurable size drop in `target/debug/` and give us a confident
+  decision point.
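The executable counts in the plan can be cross-checked with a little arithmetic: each crate with n integration test files drops from n binaries to 1, so it saves n − 1. The Rust sketch below applies that to the per-crate file counts from the "Crates in scope" table and recomputes the percentage deltas from the Phase 6 table. The function names are illustrative only and do not exist in the repository.

```rust
/// File counts per crate, copied from the "Crates in scope" table
/// (pampa first, quarto-doctemplate last).
const FILES_PER_CRATE: [u32; 13] = [57, 33, 20, 7, 7, 7, 6, 5, 5, 4, 2, 2, 2];

/// Binaries saved when every crate with `n` test files ends up with
/// a single `integration` binary: the sum of `n - 1` over all crates.
pub fn binaries_saved(files_per_crate: &[u32]) -> u32 {
    files_per_crate.iter().map(|n| n.saturating_sub(1)).sum()
}

/// Relative change from `before` to `after`, in percent.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

With these inputs, `binaries_saved(&FILES_PER_CRATE)` is 144, consistent with the measured drop from 220 to 76 executables in `deps/`, and `percent_change(21.0, 12.0)` rounds to −43, matching the debug size delta.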
diff --git a/claude-notes/research/2026-05-28-integration-test-bloat.md b/claude-notes/research/2026-05-28-integration-test-bloat.md
new file mode 100644
index 000000000..0900075d5
--- /dev/null
+++ b/claude-notes/research/2026-05-28-integration-test-bloat.md
@@ -0,0 +1,264 @@
+# Integration test bloat — measurements
+
+**Beads:** bd-xvdop
+**Plan:** [2026-05-28-integration-test-consolidation.md](../plans/2026-05-28-integration-test-consolidation.md)
+**Host:** macOS dev machine (Apple Silicon)
+
+This note records the on-disk and wall-time cost of building all test
+binaries in the workspace, both before and after consolidating each
+crate's `tests/*.rs` files into `tests/integration/main.rs` + sibling
+modules.
+
+Each measurement is taken from a clean state (`cargo clean` between
+runs). The raw script logs live in `measurements/`.
+
+> Because each integration test file in Rust's default layout compiles
+> into its own fully-linked binary, we expect debug numbers to dominate
+> (no symbol stripping, no LTO), and release numbers to be much smaller.
+> Our starting `target/` confirms the asymmetry: `target/debug` = 251 GB
+> vs `target/release` = 2.7 GB on this machine.
+
+## Headline table
+
+| Stage                                | Profile | target/ | Wall-time | Executables in deps/ |
+| ------------------------------------ | ------- | ---------------: | --------: | -------------------: |
+| Baseline (first run)                 | debug   |       21 GB (1) |    114 s  |                  220 |
+| Baseline (first run)                 | release |           11 GB  |    133 s  |                  220 |
+| Baseline (controlled, back-to-back)  | debug   |           21 GB  |    114 s  |                  220 |
+| Baseline (controlled, back-to-back)  | release |           11 GB  |    138 s  |                  220 |
+| Pilot: pampa only (first run)        | debug   |           18 GB  |    173 s  |                  164 |
+| Pilot: pampa only (first run)        | release |          9.2 GB  |    255 s  |                  164 |
+| Pilot: pampa only (controlled)       | debug   |           18 GB  |    130 s  |                  164 |
+| Pilot: pampa only (controlled)       | release |          9.1 GB  |    136 s  |                  164 |
+| **Phase 6 baseline (re-measured)**   | debug   |           21 GB  |    118 s  |                  220 |
+| **Phase 6 baseline (re-measured)**   | release |           11 GB  |    158 s  |                  220 |
+| **Full rollout (13 crates)**         | debug   |           12 GB  |    122 s  |                   76 |
+| **Full rollout (13 crates)**         | release |          4.5 GB  |    120 s  |                   76 |
+
+The "first run" rows were taken with intervening work between samples
+(verify runs, file edits, disk pressure). The "controlled" and
+"Phase 6" rows were taken back-to-back with `cargo clean` between
+each and no other intervening work. These back-to-back rows are the
+apples-to-apples comparison; the Phase 4 decision should use the
+controlled pilot rows.
+
+(1) `cargo clean` after the build reported "Removed 36689 files, 22.3 GiB total"
+— a slight discrepancy with `du`'s 21 GB because `du` undercounts
+hardlinks and compressed metadata. We use `du -sh` as the headline
+figure to stay consistent with the script's output.
+
+(Filled in as each phase completes.)
+
+## Methodology
+
+```bash
+# For each measurement:
+cargo clean
+scripts/measure-test-build.sh