From 8733ed67e18bc0e14cad4e454d585c6c890f594f Mon Sep 17 00:00:00 2001
From: Carlos Scheidegger
Date: Thu, 28 May 2026 11:58:29 -0500
Subject: [PATCH 01/20] plan(bd-xvdop): experiment to consolidate integration tests per crate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

164 integration test files across 20 crates → 13 candidate test binaries
under the matklad/ark pattern. Plan pilots pampa (57 files), then decides
on rollout based on debug + release target/ deltas measured locally.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .beads/issues.jsonl                           |   1 +
 ...26-05-28-integration-test-consolidation.md | 215 ++++++++++++++++++
 2 files changed, 216 insertions(+)
 create mode 100644 claude-notes/plans/2026-05-28-integration-test-consolidation.md

diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
index 929adb70..9340e2de 100644
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -473,6 +473,7 @@
 {"id":"bd-xly8b","title":"Catalog/corpus drift: markdown Q-2-X assignments diverge between error_catalog.json, pampa/resources/error-corpus/, and production-emit sites","description":"While authoring docs/errors/markdown/ pages (bd-lgxdr), found multiple drift cases between the three sources of truth:\n\n1. **Q-2-5**: catalog title 'Unclosed Emphasis' (generic), but corpus Q-2-5.json and production code (json.rs:328) use 'Unclosed Underscore Emphasis'. Catalog says Q-2-14 is the underscore-specific one. Production behavior collapses to Q-2-5; Q-2-14 is never actually emitted with the *with_code* API.\n\n2. **Q-2-35**: catalog says 'Invalid List-Table Structure' (matches production emits at treesitter_utils/postprocess.rs:785,976), but corpus Q-2-35.json claims 'Indented code blocks are not supported'. The corpus appears stale.\n\n3. 
**Q-2-36, Q-2-37, Q-2-38**: present in pampa/resources/error-corpus/ as Q-2-36.json (Old-style knitr chunk options), Q-2-37.json (Line break in link destination), Q-2-38.json (Unclosed Attribute Specifier) — and Q-2-36 is emitted in production (treesitter.rs:1244) — but NONE are in error_catalog.json. The catalog jumps Q-2-35 → Q-2-39.\n\n4. **Q-2-6, Q-2-8**: in catalog but never emitted in production. Reserved for future strict-mode (Q-2-6) and upgraded-to-error-and-renumbered (Q-2-8 → Q-2-36).\n\nPlan: the catalog is authoritative for docs_url; the corpus needs to be reconciled to match. Add missing Q-2-36/37/38 to the catalog (or remove from corpus if they should not exist). Fix Q-2-5 and Q-2-35 corpus titles to match catalog. Decide whether Q-2-14 should be deprecated or whether production should be updated to emit it for the underscore variant specifically.\n\nDiscovered from: bd-lgxdr (markdown subsystem error-docs pages).","status":"open","priority":3,"issue_type":"bug","created_at":"2026-05-22T20:38:32.703765Z","created_by":"cscheid","updated_at":"2026-05-22T20:38:32.703765Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["documentation","error-reporting","markdown"],"dependencies":[{"issue_id":"bd-xly8b","depends_on_id":"bd-lgxdr","type":"discovered-from","created_at":"2026-05-22T20:38:32.703765Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-xm7l","title":"Audit non-cargo / vendored dependencies and expand upgrade skill","description":"Expand the upgrade-cargo-deps skill (or add a sibling audit-vendored-deps skill) so the bi-weekly dependency audit also covers non-Cargo vendored assets — Bootstrap SCSS/Icons, chicago-author-date CSL, tree-sitter highlight queries, quarto-cli built-in extensions, knitr R scripts, Pandoc HTML template, quarto-system-runtime JS bundles, reveal.js-menu CSS, etc. Discovery strategies and a per-asset inventory live at claude-notes/research/vendored-dependencies-inventory.md. 
Plan + phased work items at claude-notes/plans/2026-05-04-vendored-deps-audit.md.","status":"open","priority":2,"issue_type":"epic","created_at":"2026-05-04T21:14:02.442855Z","created_by":"cscheid","updated_at":"2026-05-04T21:14:02.442855Z","source_repo":".","compaction_level":0,"original_size":0,"labels":["deps","vendored"]} {"id":"bd-xs2u","title":"Em-dash / en-dash in document titles breaks something in hub-client","description":"User reported during L3 testing (2026-05-06): when uploading a project containing em-dash characters (U+2014, '—') in document titles, hub-client exhibits some bug. The user worked around it by replacing em-dashes with regular dashes in the Automerge documents.\n\nI (Claude) did not reproduce the bug myself; I only observed that the workaround (replace em-dash with single dash) made hub-client behave correctly afterwards. The bug could be in any of:\n- pampa parser handling of unicode dashes in YAML strings\n- hub-client / Monaco display of titles with non-ASCII characters\n- The Automerge text sync round-trip (less likely; bytes round-tripped identically in my test)\n\nReproduction (potentially):\n1. Create a Q2 .qmd with 'title: \"Some — text\"' (em-dash) in the frontmatter.\n2. Upload via scripts/upload-project.mjs to wss://sync.automerge.org.\n3. Open in hub-client and observe whatever the user observed.\n\nTo investigate: have the user describe the exact symptom (render error? blank title? 
wrong rendering?), then bisect against the affected component.\n\nDiscovered while testing L3 (bd-ml8z) listings via hub-client; see claude-notes/plans/2026-05-06-listings-L3-resolve-transform.md §\"Hand-off summary\".","status":"open","priority":2,"issue_type":"bug","created_at":"2026-05-06T22:08:24.119375Z","created_by":"cscheid","updated_at":"2026-05-06T22:08:24.119375Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xs2u","depends_on_id":"bd-ml8z","type":"discovered-from","created_at":"2026-05-06T22:08:24.119375Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} +{"id":"bd-xvdop","title":"Experiment: consolidate integration tests to reduce target/ size","description":"Rust defaults give each tests/*.rs file its own test binary, fully linked against the crate + all transitive deps. We have 164 integration test files across 20 crates; target/debug is currently 251 GB while target/release is 2.7 GB, suggesting per-file debug binaries are the dominant bloat. The ark project applied matklad's tests/integration/ consolidation pattern (posit-dev/ark#1240) and saw a ~57% drop in fresh cargo clean size and a ~3x drop in CI runner footprint. This issue tracks an experiment to measure the same change on Q2 from a macOS dev machine, piloting pampa (57 files) first before deciding on a full rollout. Plan: claude-notes/plans/2026-05-28-integration-test-consolidation.md","status":"open","priority":3,"issue_type":"chore","created_at":"2026-05-28T16:16:18.200610Z","created_by":"cscheid","updated_at":"2026-05-28T16:16:18.200610Z","source_repo":".","compaction_level":0,"original_size":0} {"id":"bd-xwq8","title":"Suppress page-nav for custom-layout pages","description":"Q1 hides the prev/next strip when a page sets page-layout: custom. Phase 4 ships without this; defer until a real page hits the edge case. 
See Phase 4 plan non-goals.","status":"open","priority":3,"issue_type":"feature","created_at":"2026-04-24T22:47:57.195184Z","created_by":"cscheid","updated_at":"2026-04-24T22:47:57.195184Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xwq8","depends_on_id":"bd-nwun","type":"discovered-from","created_at":"2026-04-24T22:47:57.195184Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-xxul","title":"Non-.qmd input extensions in project discovery (.md, .ipynb, .Rmd)","description":"Phase 1 of the website epic explicitly discovers only .qmd files. The plan §File-list expansion defers the decision about which non-.qmd extensions are renderable documents (to include in project.files) vs. source artifacts (to treat like resources). Needs a user conversation about semantics for .md (literal markdown — render? copy?), .ipynb (converted at render time in Q1), .Rmd. Once settled, extend discovery and the pipeline's SourceType handling.","status":"open","priority":2,"issue_type":"task","created_at":"2026-04-24T01:05:32.255233Z","created_by":"cscheid","updated_at":"2026-04-24T01:05:32.255233Z","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-xxul","depends_on_id":"bd-w5os","type":"discovered-from","created_at":"2026-04-24T01:05:32.255233Z","created_by":"cscheid","metadata":"{}","thread_id":""}]} {"id":"bd-y1fs3","title":"q2 preview: CodeBlock DOM mismatches q2 render — classes go on instead of
, sourceCode class missing","description":"q2 render emits '
'; q2 preview emits '
'. Two divergences: (1) language/role classes are on in preview, but on
 in render (native writer's behavior, see crates/pampa/src/writers/html.rs:963-975); (2) the 'sourceCode' class — prepended to 
's class list whenever data-hl-spans is present (write_code_container_attr at line 487-495) — is entirely missing from preview. Both differences break Quarto theme rules that key off .sourceCode and pre.sourceCode, causing the visible spacing/indentation drift the user reported. Fix in React CodeBlock: move classes + data-* kvs to 
; bare ; prepend 'sourceCode' to 
's class list when data-hl-spans is non-empty.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","updated_at":"2026-05-18T17:44:55.722896Z","closed_at":"2026-05-18T17:44:55.722758Z","close_reason":"Implementation complete: React CodeBlock now mirrors the native HTML writer's DOM shape — classes + data-* kvs on 
, bare , sourceCode prepended when data-hl-spans is present. Verified end-to-end: pre.className matches between q2 render and q2 preview ('sourceCode r cell-code' on both), code.className empty on both. 2 tests rewritten, 0 new added; 150/150 SPA integration tests green; cargo xtask verify 12/12 green.","source_repo":".","compaction_level":0,"original_size":0,"dependencies":[{"issue_id":"bd-y1fs3","depends_on_id":"bd-kw93","type":"parent-child","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","metadata":"{}","thread_id":""},{"issue_id":"bd-y1fs3","depends_on_id":"bd-nxslt","type":"related","created_at":"2026-05-18T17:39:17.866285Z","created_by":"cscheid","metadata":"{}","thread_id":""}]}
diff --git a/claude-notes/plans/2026-05-28-integration-test-consolidation.md b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
new file mode 100644
index 00000000..a4e5df60
--- /dev/null
+++ b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
@@ -0,0 +1,215 @@
+# Experiment: consolidate integration tests into single binary per crate
+
+**Beads:** [bd-xvdop](../../.beads/issues.jsonl) — `br show bd-xvdop`
+**Branch:** `beads/bd-xvdop-integration-test-consolidation` (off `main`)
+**Status:** proposed (not yet started)
+
+## Overview
+
+Rust's default integration-test layout creates one test binary per
+`tests/*.rs` file. Each binary is fully linked against the host crate
+and all transitive dependencies. We have **164 integration test files
+across 20 crates** and `target/debug/` on this machine is currently
+**251 GB** while `target/release/` is only 2.7 GB — strongly
+suggesting the per-file debug test binaries are the dominant bloat,
+matching exactly the pattern the ark project diagnosed.
+
+The ark project hit the same problem on Linux CI (out-of-disk
+failures) and resolved it by moving `tests/*.rs` into
+`tests/integration/*.rs` with a single `tests/integration/main.rs`
+declaring each former file as a `pub mod`. Reported wins:
+
+- Fresh `cargo clean` size: **8.1 GiB → 3.5 GiB** (~57% reduction)
+- Test-suite compile time on macOS: **88s → 52s** (~40% faster)
+- Linux CI runner footprint: **15 GB → ~2 GB**
+
+This experiment measures the same change on Q2 from a macOS dev
+machine. We cannot directly measure Linux/Windows CI from here, but
+the ark numbers suggest the macOS delta will be a representative
+proxy for the platform delta.
+
+### Goals
+
+- Establish a clean baseline for debug + release test-build footprint.
+- Pilot the migration on `pampa` (57 files — the largest single signal).
+- If the pilot delta justifies it, roll out to the other 12 multi-file
+  crates and remeasure.
+- Capture all numbers in a research note so the Linux/Windows CI win
+  can be predicted before pushing.
+
+### Non-goals
+
+- Migrating crates that already have only one integration test file
+  (no payoff — each is already 1 binary).
+- WASM / hub-client changes (this is a Rust-only refactor).
+- Changing test execution semantics — nextest handles single-binary
+  test discovery fine.
+
+## References
+
+- ark PR: posit-dev/ark#1240
+- matklad post: 
+- Cargo issue cited by matklad: 
+
+## Crates in scope
+
+13 crates have >1 integration test file (sorted by file count, since
+the payoff scales with file count):
+
+| Crate                  | tests/*.rs files |
+| ---------------------- | ---------------: |
+| pampa                  |               57 |
+| quarto-core            |               33 |
+| qmd-syntax-helper      |               20 |
+| quarto-preview         |                7 |
+| quarto-sass            |                7 |
+| quarto                 |                7 |
+| quarto-highlight       |                6 |
+| comrak-to-pandoc       |                5 |
+| quarto-yaml-validation |                5 |
+| quarto-brand           |                4 |
+| quarto-citeproc        |                2 |
+| quarto-csl             |                2 |
+| quarto-doctemplate     |                2 |
+
+Out of scope (single-file integration test crates, no benefit):
+`quarto-error-reporting`, `quarto-hub`, `quarto-lsp`,
+`quarto-lsp-core`, `quarto-publish`, `quarto-trace`,
+`wasm-qmd-parser`.
+
+## Work Items
+
+### Phase 0 — Setup
+
+- [ ] Create branch `beads/bd-xvdop-integration-test-consolidation`
+      off `main` (worktree optional; sharing the existing 259 GB
+      target/ defeats the experiment, so prefer a fresh checkout)
+- [ ] Write a measurement helper script `scripts/measure-test-build.sh`
+      that:
+  - times `cargo build --workspace --tests` (debug or release per arg)
+  - records `target/<profile>/` size via `du -sh`
+  - lists the largest 20 integration test binaries under
+    `target/<profile>/deps/` with sizes
+  - prints a summary block we can paste into the research note
+
+### Phase 1 — Baseline measurement
+
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh debug` (records debug baseline)
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh release` (records release baseline)
+- [ ] Write baseline numbers into
+      `claude-notes/research/2026-05-28-integration-test-bloat.md`
+
+### Phase 2 — Pilot migration: pampa
+
+- [ ] Create `crates/pampa/tests/integration/main.rs` with `pub mod
+      <name>;` lines for each of the 57 current `tests/*.rs` files
+- [ ] `git mv` each `tests/*.rs` → `tests/integration/<name>.rs`
+- [ ] Audit for collisions:
+  - any inline `mod X;` declarations in the migrated files
+    (`test_location_health.rs` has one — confirm inline-or-file)
+  - duplicate top-level identifier names across files (unlikely but
+    worth a `grep -E '^(fn|struct|enum|trait|const|static)'`)
+- [ ] Adjacent data dirs (`fixtures/`, `snapshots/`, `*.qmd`) stay
+      where they are — they're referenced by fixed paths from test
+      code; moving the .rs files doesn't change `CARGO_MANIFEST_DIR`
+- [ ] `cargo nextest run -p pampa` → expect green
+- [ ] `cargo xtask verify --skip-hub-build` → expect green
+
+### Phase 3 — Pilot measurement
+
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh debug` (pampa-pilot)
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh release` (pampa-pilot)
+- [ ] Compute pampa-pilot delta vs. baseline in research note
+
+### Phase 4 — Decision point
+
+- [ ] Review pilot numbers with user
+- [ ] **If** the pilot delta is meaningful (e.g. >20% size drop) →
+      Phase 5
+- [ ] **If not** → revert pilot, close beads issue with findings,
+      stop
+
+### Phase 5 — Full rollout (conditional)
+
+For each crate below, migrate in its own commit (one commit per
+crate makes any individual revert cheap). After each migration, run
+`cargo nextest run -p <crate>` before moving on.
+
+- [ ] quarto-core (33 files)
+- [ ] qmd-syntax-helper (20 files)
+- [ ] quarto-preview (7 files)
+- [ ] quarto-sass (7 files)
+- [ ] quarto (7 files)
+- [ ] quarto-highlight (6 files)
+- [ ] comrak-to-pandoc (5 files)
+- [ ] quarto-yaml-validation (5 files)
+- [ ] quarto-brand (4 files)
+- [ ] quarto-citeproc (2 files)
+- [ ] quarto-csl (2 files)
+- [ ] quarto-doctemplate (2 files)
+- [ ] `cargo xtask verify --skip-hub-build` → expect green
+
+### Phase 6 — Final measurement
+
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh debug` (full-rollout)
+- [ ] `cargo clean`
+- [ ] `scripts/measure-test-build.sh release` (full-rollout)
+- [ ] Compute final delta vs. baseline and vs. pilot
+
+### Phase 7 — Report and decide
+
+- [ ] Update research note with all numbers + extrapolated CI impact
+- [ ] Discuss findings with user; ask for push permission
+- [ ] If pushed: update `CLAUDE.md` if any developer-facing test
+      invocation conventions change (e.g. references to per-file
+      test binaries)
+
+## Risks & open questions
+
+- **nextest binary filtering.** Per-binary names change from
+  `<file-stem>` to `integration`. Current CI runs
+  `cargo nextest run --tests --cargo-profile ci` (no per-binary
+  filters), so CI is unaffected. Confirmed by grepping `.github/`.
+  Still worth a quick check during pilot that no `xtask` or
+  `scripts/` invocation depends on the old per-file binary names.
+
+- **`mod` path resolution inside a single binary.** Each former
+  `tests/foo.rs` becomes module `integration::foo`. A file-based
+  `mod helpers;` there would stop resolving to `tests/helpers.rs`
+  and look for `tests/integration/foo/helpers.rs` instead. None
+  of the pampa files declare a file-based `mod` at the top level
+  (only `test_location_health.rs` has any `mod` keyword, which
+  needs an audit). Risk is small; the Phase 2 audit covers it.
+
+- **`fn main()` in migrated files.** A test file may define its
+  own `fn main()`; as a submodule it would no longer be the entry
+  point. Verified: zero pampa test files declare `fn main()`.
+
+- **macOS-only measurement.** We cannot directly measure the
+  Linux/Windows CI delta from this machine. The ark PR shows the
+  size ratio is roughly stable across platforms (linker bloat is
+  proportional to dependency closure, which is the same on all
+  platforms). The research note should call out that the
+  Linux/Windows wins are *predicted*, not measured.
+
+- **Existing 259 GB target/.** Not affected by this change — it
+  accumulates across all branches the user has built locally. The
+  experiment uses a fresh build for clean numbers and `cargo clean`
+  between each phase. Worth offering to clean it up at the end of
+  the experiment regardless of outcome (would free ~256 GB).
+
+## Decision log
+
+- 2026-05-28: Use `tests/integration/` (matches ark PR) over
+  `tests/it/` (matklad blog). Same mechanic; ark name is more
+  discoverable to new contributors.
+- 2026-05-28: Pilot pampa first rather than migrate all 13 crates
+  in one pass. pampa is 57/164 ≈ 35% of integration files; if the
+  per-file bloat hypothesis holds, the pilot should already produce
+  a measurable size drop in `target/debug/` and give us a confident
+  decision point.

From 1b420d654ea86aed5d3a9cecc8a897d9663dc1d7 Mon Sep 17 00:00:00 2001
From: Carlos Scheidegger 
Date: Thu, 28 May 2026 12:19:27 -0500
Subject: [PATCH 02/20] pampa(bd-xvdop): consolidate integration tests into
 single binary
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Move all 57 `crates/pampa/tests/*.rs` files into `tests/integration/`
and add a generated `main.rs` declaring each as a `pub mod`. Cargo
now compiles one `integration` test binary instead of 57 per-file
binaries that each statically link pampa's full dependency closure
(matklad's pattern; see posit-dev/ark#1240 for the precedent).
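
For illustration only, a minimal sketch of how such a `main.rs` body
could be generated. `render_main_rs` is a hypothetical helper, not the
generator used for this commit; it assumes its input is the list of
file stems of the former `tests/*.rs` files.

```rust
/// Hypothetical helper: build the body of `tests/integration/main.rs`
/// from the file stems of the former `tests/*.rs` files.
fn render_main_rs(stems: &[&str]) -> String {
    // Sort and deduplicate so the generated file is stable across runs.
    let mut sorted: Vec<&str> = stems.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    // Emit one `pub mod <stem>;` declaration per former test file.
    let mut out = String::new();
    for stem in sorted {
        out.push_str("pub mod ");
        out.push_str(stem);
        out.push_str(";\n");
    }
    out
}
```

Called with the stems `test_meta` and `test`, this yields
`pub mod test;` followed by `pub mod test_meta;`, one per line.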

Two files needed a surgical edit beyond the rename:
- `test.rs` and `test_error_corpus.rs` use insta's
  `set_snapshot_path("../snapshots/...")`, resolved relative to the
  source file's directory. After moving down one level, each
  occurrence needed an extra `../`. All 3 occurrences updated.
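
A rough sketch of why the extra `../` is needed, assuming (as described
above) that the path is resolved against the directory of the source
file. `resolve_from_source` is a hypothetical stand-in using only
`std::path`, not insta's actual code, and `example` is a made-up
snapshot subdirectory.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical model of "resolve `relative` against the directory
/// that contains `source_file`", with `..` collapsed lexically.
fn resolve_from_source(source_file: &str, relative: &str) -> PathBuf {
    let base = Path::new(source_file).parent().unwrap_or(Path::new(""));
    let mut out = PathBuf::new();
    for part in base.join(relative).components() {
        match part {
            // `..` removes the last component collected so far.
            Component::ParentDir => {
                out.pop();
            }
            // `.` contributes nothing.
            Component::CurDir => {}
            // Everything else is appended unchanged.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Under this model, `crates/pampa/tests/test.rs` with
`../snapshots/example` and `crates/pampa/tests/integration/test.rs`
with `../../snapshots/example` both land on
`crates/pampa/snapshots/example`, while the moved file with the old
single `../` would point into `crates/pampa/tests/snapshots/`.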

Audit findings (all clean):
- No `crate::` references in pampa tests
- One `super::*` inside an inline `mod tests {}` — semantics preserved
- No `include_str!`/`include_bytes!` source-file-relative compile-time
  paths
- One inline `mod tests {}` in `test_location_health.rs` — no risk
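
As a small illustration of the `super::*` finding, the sketch below
uses inline modules to stand in for the consolidated layout. The item
names (`expected_depth`, `depth_via_super`) are hypothetical and do not
come from the pampa tests.

```rust
// Stand-in for the module that `main.rs` declares for the moved file.
pub mod test_location_health {
    // Hypothetical item at the top level of the former test file.
    pub fn expected_depth() -> usize {
        3
    }

    // Inline module, as in the original file.
    pub mod tests {
        // `super` is still the enclosing file's module after the move,
        // so the glob import keeps finding the same items.
        use super::*;

        pub fn depth_via_super() -> usize {
            expected_depth()
        }
    }
}
```

A `crate::` path would behave differently: after consolidation it
starts at the shared `main.rs` root rather than at the individual file,
which is why the absence of `crate::` references matters.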

Validation:
- `cargo nextest run -p pampa --test integration` → 941 passed,
  2 skipped (matches pre-migration count)
- `cargo xtask verify --skip-hub-build` → all 9444 workspace Rust
  tests pass; the JS/TS leg is environmentally blocked in this
  fresh worktree (no `npm install` / WASM build), independent of
  this change.

Also includes experiment scaffolding (measurement script, research
note with baseline numbers, plan checkpoints) so the before/after
framing is visible in one diff.

Advances bd-xvdop through Phase 2; Phase 3 (pilot measurement)
follows in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) 
---
 ...26-05-28-integration-test-consolidation.md | 127 +++-
 .../2026-05-28-integration-test-bloat.md      | 154 ++++
 .../research/measurements/baseline-debug.log  | 656 ++++++++++++++++++
 .../measurements/baseline-release.log         | 656 ++++++++++++++++++
 .../attribution_html_coalescing_test.rs       |   0
 .../attribution_json_wire_test.rs             |   0
 .../{ => integration}/error_node_analysis.rs  |   0
 .../incremental_writer_investigation.rs       |   0
 .../incremental_writer_tests.rs               |   0
 .../inline_span_investigation.rs              |   0
 .../inline_splice_integration_tests.rs        |   0
 .../inline_splice_property_tests.rs           |   0
 .../inline_splice_safety_tests.rs             |   0
 .../{ => integration}/json_location_test.rs   |   0
 .../json_reader_smoke_tests.rs                |   0
 crates/pampa/tests/integration/main.rs        |  71 ++
 .../qmd_writer_source_info.rs                 |   0
 crates/pampa/tests/{ => integration}/test.rs  |   4 +-
 .../{ => integration}/test_ansi_writer.rs     |   0
 .../test_attr_source_parsing.rs               |   0
 .../test_attr_source_structure.rs             |   0
 .../{ => integration}/test_bare_lt_str.rs     |   0
 .../test_blockquote_multiline_attrs.rs        |   0
 .../test_citeproc_integration.rs              |   0
 .../{ => integration}/test_cli_input_arg.rs   |   0
 .../test_code_block_attributes.rs             |   0
 .../tests/{ => integration}/test_code_span.rs |   0
 .../test_diagnostic_determinism.rs            |   0
 .../test_editorial_mark_spacing.rs            |   0
 .../{ => integration}/test_error_corpus.rs    |   4 +-
 .../test_grid_table_error.rs                  |   0
 .../{ => integration}/test_hard_soft_break.rs |   0
 .../test_html_attr_handling.rs                |   0
 .../test_inline_locations.rs                  |   0
 .../test_json_div_transforms.rs               |   0
 .../{ => integration}/test_json_errors.rs     |   0
 .../{ => integration}/test_json_roundtrip.rs  |   0
 .../test_link_destination_linebreak.rs        |   0
 .../{ => integration}/test_location_health.rs |   0
 .../test_lua_attr_mutation.rs                 |   0
 .../test_lua_constructors.rs                  |   0
 .../tests/{ => integration}/test_lua_list.rs  |   0
 .../tests/{ => integration}/test_lua_utils.rs |   0
 .../tests/{ => integration}/test_math_attr.rs |   0
 .../tests/{ => integration}/test_meta.rs      |   0
 .../test_metadata_source_tracking.rs          |   0
 .../test_nested_yaml_serialization.rs         |   0
 .../test_ordered_list_formatting.rs           |   0
 .../test_rawblock_to_config_value.rs          |   0
 .../{ => integration}/test_section_divs.rs    |   0
 .../tests/{ => integration}/test_shortcode.rs |   0
 .../test_template_integration.rs              |   0
 .../test_trailing_linebreak_commonmark.rs     |   0
 .../test_treesitter_coverage.rs               |   0
 .../test_treesitter_refactoring.rs            |   0
 .../test_unclosed_attr_specifier.rs           |   0
 .../test_unicode_error_offsets.rs             |   0
 .../test_unicode_whitespace.rs                |   0
 .../tests/{ => integration}/test_warnings.rs  |   0
 .../test_wasm_entrypoints.rs                  |   0
 .../test_yaml_tag_regression.rs               |   0
 .../test_yaml_to_config_value.rs              |   0
 scripts/measure-test-build.sh                 | 115 +++
 63 files changed, 1750 insertions(+), 37 deletions(-)
 create mode 100644 claude-notes/research/2026-05-28-integration-test-bloat.md
 create mode 100644 claude-notes/research/measurements/baseline-debug.log
 create mode 100644 claude-notes/research/measurements/baseline-release.log
 rename crates/pampa/tests/{ => integration}/attribution_html_coalescing_test.rs (100%)
 rename crates/pampa/tests/{ => integration}/attribution_json_wire_test.rs (100%)
 rename crates/pampa/tests/{ => integration}/error_node_analysis.rs (100%)
 rename crates/pampa/tests/{ => integration}/incremental_writer_investigation.rs (100%)
 rename crates/pampa/tests/{ => integration}/incremental_writer_tests.rs (100%)
 rename crates/pampa/tests/{ => integration}/inline_span_investigation.rs (100%)
 rename crates/pampa/tests/{ => integration}/inline_splice_integration_tests.rs (100%)
 rename crates/pampa/tests/{ => integration}/inline_splice_property_tests.rs (100%)
 rename crates/pampa/tests/{ => integration}/inline_splice_safety_tests.rs (100%)
 rename crates/pampa/tests/{ => integration}/json_location_test.rs (100%)
 rename crates/pampa/tests/{ => integration}/json_reader_smoke_tests.rs (100%)
 create mode 100644 crates/pampa/tests/integration/main.rs
 rename crates/pampa/tests/{ => integration}/qmd_writer_source_info.rs (100%)
 rename crates/pampa/tests/{ => integration}/test.rs (99%)
 rename crates/pampa/tests/{ => integration}/test_ansi_writer.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_attr_source_parsing.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_attr_source_structure.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_bare_lt_str.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_blockquote_multiline_attrs.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_citeproc_integration.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_cli_input_arg.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_code_block_attributes.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_code_span.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_diagnostic_determinism.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_editorial_mark_spacing.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_error_corpus.rs (99%)
 rename crates/pampa/tests/{ => integration}/test_grid_table_error.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_hard_soft_break.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_html_attr_handling.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_inline_locations.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_json_div_transforms.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_json_errors.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_json_roundtrip.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_link_destination_linebreak.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_location_health.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_lua_attr_mutation.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_lua_constructors.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_lua_list.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_lua_utils.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_math_attr.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_meta.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_metadata_source_tracking.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_nested_yaml_serialization.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_ordered_list_formatting.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_rawblock_to_config_value.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_section_divs.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_shortcode.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_template_integration.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_trailing_linebreak_commonmark.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_treesitter_coverage.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_treesitter_refactoring.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_unclosed_attr_specifier.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_unicode_error_offsets.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_unicode_whitespace.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_warnings.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_wasm_entrypoints.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_yaml_tag_regression.rs (100%)
 rename crates/pampa/tests/{ => integration}/test_yaml_to_config_value.rs (100%)
 create mode 100755 scripts/measure-test-build.sh

diff --git a/claude-notes/plans/2026-05-28-integration-test-consolidation.md b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
index a4e5df60..2a39a77a 100644
--- a/claude-notes/plans/2026-05-28-integration-test-consolidation.md
+++ b/claude-notes/plans/2026-05-28-integration-test-consolidation.md
@@ -81,41 +81,59 @@ Out of scope (single-file integration test crates, no benefit):
 
 ### Phase 0 — Setup
 
-- [ ] Create branch `beads/bd-xvdop-integration-test-consolidation`
-      off `main` (worktree optional; sharing the existing 259 GB
-      target/ defeats the experiment, so prefer a fresh checkout)
-- [ ] Write a measurement helper script `scripts/measure-test-build.sh`
+- [x] Create worktree
+      `.worktrees/bd-xvdop-experiment-consolidate-integration-tests/`
+      on branch `beads/bd-xvdop-experiment-consolidate-integration-tests`
+      (off `main`); worktree starts with empty `target/`, so baseline
+      measurements aren't polluted by the 259 GB in the main checkout
+- [x] Write a measurement helper script `scripts/measure-test-build.sh`
       that:
   - times `cargo build --workspace --tests` (debug or release per arg)
   - records `target/<profile>/` size via `du -sh`
-  - lists the largest 20 integration test binaries under
-    `target/<profile>/deps/` with sizes
-  - prints a summary block we can paste into the research note
+  - lists the largest 25 binaries under `target/<profile>/deps/`
+  - prints a paste-able summary block
+- [x] Create research note skeleton
+      `claude-notes/research/2026-05-28-integration-test-bloat.md`
+      and `claude-notes/research/measurements/` directory
 
 ### Phase 1 — Baseline measurement
 
-- [ ] `cargo clean`
-- [ ] `scripts/measure-test-build.sh debug` (records debug baseline)
-- [ ] `cargo clean`
-- [ ] `scripts/measure-test-build.sh release` (records release baseline)
-- [ ] Write baseline numbers into
+- [x] `cargo clean` (no-op — fresh worktree)
+- [x] `scripts/measure-test-build.sh debug` →
+      **21 GB / 114 s / 220 deps executables / 10.5 GiB exec-bytes**
+- [x] `cargo clean` (freed 22.3 GiB / 36 689 files)
+- [x] `scripts/measure-test-build.sh release` →
+      **11 GB / 133 s / 220 deps executables / 9.0 GiB exec-bytes**
+- [x] Write baseline numbers into
       `claude-notes/research/2026-05-28-integration-test-bloat.md`
 
 ### Phase 2 — Pilot migration: pampa
 
-- [ ] Create `crates/pampa/tests/integration/main.rs` with `pub mod
+- [x] Create `crates/pampa/tests/integration/main.rs` with `pub mod
       ;` lines for each of the 57 current `tests/*.rs` files
-- [ ] `git mv` each `tests/*.rs` → `tests/integration/.rs`
-- [ ] Audit for collisions:
-  - any inline `mod X;` declarations in the migrated files
-    (`test_location_health.rs` has one — confirm inline-or-file)
-  - duplicate top-level identifier names across files (unlikely but
-    worth a `grep -E '^(fn|struct|enum|trait|const|static)'`)
-- [ ] Adjacent data dirs (`fixtures/`, `snapshots/`, `*.qmd`) stay
+- [x] `git mv` each `tests/*.rs` → `tests/integration/.rs`
+- [x] Audit for collisions:
+  - `test_location_health.rs` has one inline `mod tests {}` — verified
+    inline only, no file-resolution risk
+  - Zero `crate::` references in pampa tests (would change meaning
+    in consolidated layout)
+  - Exactly one `super::` reference in `test_location_health.rs`,
+    inside the inline `mod tests {}` — parent semantics preserved
+    (`super::*` still refers to the enclosing file)
+  - No `include_str!` / `include_bytes!` (source-file-relative
+    compile-time paths)
+- [x] Adjacent data dirs (`fixtures/`, `snapshots/`, `*.qmd`) stay
       where they are — they're referenced by fixed paths from test
       code; moving the .rs files doesn't change `CARGO_MANIFEST_DIR`
-- [ ] `cargo nextest run -p pampa` → expect green
-- [ ] `cargo xtask verify --skip-hub-build` → expect green
+- [x] **Discovered:** insta `set_snapshot_path("../snapshots/…")`
+      in `test.rs` and `test_error_corpus.rs` is resolved relative
+      to the test file's directory. After the move, all 3 occurrences
+      needed an extra `../` to keep pointing at `crates/pampa/snapshots/`.
+      Fixed in this pilot; cleaned up 3 stale `.snap.new` litter
+      files generated by the first broken run.
+- [x] `cargo nextest run -p pampa --test integration` → **941 passed,
+      2 skipped, 0 failed**
+- [ ] `cargo xtask verify --skip-hub-build` → expect green (in progress)
 
 ### Phase 3 — Pilot measurement
 
@@ -139,18 +157,59 @@ For each crate below, migrate in its own commit (one commit per
 crate makes any individual revert cheap). After each migration, run
 `cargo nextest run -p ` before moving on.
 
-- [ ] quarto-core (33 files)
-- [ ] qmd-syntax-helper (20 files)
-- [ ] quarto-preview (7 files)
-- [ ] quarto-sass (7 files)
-- [ ] quarto (7 files)
-- [ ] quarto-highlight (6 files)
-- [ ] comrak-to-pandoc (5 files)
-- [ ] quarto-yaml-validation (5 files)
-- [ ] quarto-brand (4 files)
-- [ ] quarto-citeproc (2 files)
-- [ ] quarto-csl (2 files)
-- [ ] quarto-doctemplate (2 files)
+**Preflight checklist** (run on each crate before moving files —
+the pampa pilot showed how easy it is to miss one and then watch
+several seemingly-unrelated tests fail):
+
+```bash
+crate=
+# 1. File-root fn main / file-based mod / inline mod summary
+grep -nE "^(pub )?(mod [A-Za-z0-9_]+( \{|;)|fn main)" \
+  crates/$crate/tests/*.rs
+
+# 2. Source-file-relative compile-time and runtime paths
+grep -nE 'include_str!|include_bytes!|include_dir!|#\[path' \
+  crates/$crate/tests/*.rs
+grep -nE 'set_snapshot_path|"\.\./' \
+  crates/$crate/tests/*.rs
+
+# 3. crate:: and super:: usage (would change meaning when each file
+#    becomes a sub-module rather than the binary's crate root)
+grep -nE 'crate::|super::' crates/$crate/tests/*.rs
+```
+
+Every match needs to be evaluated against the
+"`tests/integration/` is one level deeper" rule. The two known
+surgical edits below were caught by this checklist on the audit
+pass; new ones may surface per crate.
+
+Audit findings (pre-cached during Phase 0 / Phase 2 to make rollout
+fast): all remaining crates are **clean pure-rename migrations**
+*except* the specific files called out below.
+
+- [ ] quarto-core (33 files) — clean rename; one inline `mod
+      orchestrator_engine_channel {…}` in `project_resources.rs`
+      stays inline, no path resolution
+- [ ] qmd-syntax-helper (20 files) — clean rename
+- [ ] quarto-preview (7 files) — clean rename
+- [ ] quarto-sass (7 files) — clean rename
+- [ ] quarto (7 files) — clean rename **plus** edit
+      `tests/integration/trace_cli.rs`: change
+      `#[path = "../src/commands/trace.rs"]` →
+      `#[path = "../../src/commands/trace.rs"]` to account for the
+      extra directory level
+- [ ] quarto-highlight (6 files) — clean rename
+- [ ] comrak-to-pandoc (5 files) — clean rename; `debug.rs` and
+      `debug_comrak.rs` have file-root `fn main() {}` that will
+      become unused inner functions inside their modules — add
+      `#[allow(dead_code)]` if rustc warns, or simply drop the
+      `fn main()` lines (they were only there to satisfy the
+      per-file harness in the old layout)
+- [ ] quarto-yaml-validation (5 files) — clean rename
+- [ ] quarto-brand (4 files) — clean rename
+- [ ] quarto-citeproc (2 files) — clean rename
+- [ ] quarto-csl (2 files) — clean rename
+- [ ] quarto-doctemplate (2 files) — clean rename
 - [ ] `cargo xtask verify --skip-hub-build` → expect green
 
 ### Phase 6 — Final measurement
diff --git a/claude-notes/research/2026-05-28-integration-test-bloat.md b/claude-notes/research/2026-05-28-integration-test-bloat.md
new file mode 100644
index 00000000..5e92752e
--- /dev/null
+++ b/claude-notes/research/2026-05-28-integration-test-bloat.md
@@ -0,0 +1,154 @@
+# Integration test bloat — measurements
+
+**Beads:** bd-xvdop
+**Plan:** [2026-05-28-integration-test-consolidation.md](../plans/2026-05-28-integration-test-consolidation.md)
+**Host:** macOS dev machine (Apple Silicon)
+
+This note records the on-disk and wall-time cost of building all test
+binaries in the workspace, both before and after consolidating each
+crate's `tests/*.rs` files into `tests/integration/main.rs` + sibling
+modules.
+
+Each measurement is taken from a clean state (`cargo clean` between
+runs). The raw script logs live in `measurements/`.
+
+> Because each integration test file in Rust's default layout compiles
+> into its own fully-linked binary, we expect debug numbers to dominate
+> (no symbol stripping, no LTO), and release numbers to be much smaller.
+> Our starting `target/` is consistent with that asymmetry:
+> `target/debug` = 251 GB vs `target/release` = 2.7 GB on this machine.
+
+## Headline table
+
+| Stage                | Profile |          target/ | Wall-time | Executables in deps/ |
+| -------------------- | ------- | ---------------: | --------: | -------------------: |
+| Baseline             | debug   |        21 GB (1) |     114 s |                  220 |
+| Baseline             | release |            11 GB |     133 s |                  220 |
+| Pilot: pampa only    | debug   |                — |         — |                    — |
+| Pilot: pampa only    | release |                — |         — |                    — |
+| Full rollout         | debug   |                — |         — |                    — |
+| Full rollout         | release |                — |         — |                    — |
+
+(1) `cargo clean` after the build reported "Removed 36689 files, 22.3 GiB total".
+This differs slightly from `du`'s 21 GB, most likely because `du` counts
+hardlinked files only once and rounds its output. We use `du -sh` as the
+headline figure to stay consistent with the script's output.
+
+(Filled in as each phase completes.)
+
+## Methodology
+
+```bash
+# For each measurement:
+cargo clean
+scripts/measure-test-build.sh