From 25366c9d7cec4a67857ed76a9b3c9de3f6e4b4b1 Mon Sep 17 00:00:00 2001
From: Brian Love <brian@liveloveapp.com>
Date: Mon, 18 May 2026 12:53:29 -0700
Subject: [PATCH] docs(research): close filter-text perf-fix investigation (PRs
 #142-146)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After the full diagnostic pipeline (CDP tracing → trigger gating →
sourcemaps → interaction-window slicing), single-trace data shows
in-window scripting for filter-text is ~5 ms, of which the leading
hotspot (`matchesFilters`) is ~0.7 ms. PR #134's n=20 p95 is
16.79 ms; the ~10 ms gap is in paint/composite/native browser work
that scripting fixes can't move. Even eliminating matchesFilters
entirely would only get filter-text to ~16.0-16.2 ms — still grazing.

Investigation closed. Infrastructure (analyzer, sourcemaps, slicing,
markers) is production-ready and reusable. Homepage prose (PR #141)
accurately reflects state: real-but-over-budget; pretable 2-3.5×
faster than every comparator.

Memory updated with the lesson that single-trace n=1 is not enough
for borderline budget misses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 ...5-16-filter-text-investigation-closeout.md | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 docs/research/2026-05-16-filter-text-investigation-closeout.md

diff --git a/docs/research/2026-05-16-filter-text-investigation-closeout.md b/docs/research/2026-05-16-filter-text-investigation-closeout.md
new file mode 100644
index 0000000..192681d
--- /dev/null
+++ b/docs/research/2026-05-16-filter-text-investigation-closeout.md
@@ -0,0 +1,63 @@
+# Filter-text perf investigation closeout — 2026-05-16
+
+## Summary
+
+Closing the wrapped-text filter perf-fix investigation thread that ran through PRs #142, #143, #144, #145, #146. After landing the full CDP + sourcemap + interaction-window-slicing pipeline, the data shows **the 1-2 ms budget miss on filter-text/sort/filter-metadata is not concentrated in any single scripting hotspot large enough to reliably close**. The leading interaction-window scripting hotspot is `matchesFilters` at ~0.6-0.8 ms self-time (~14% of in-window scripting); eliminating it would move filter-text from 16.79 ms p95 to ~16.0-16.2 ms — still grazing the 16 ms single-frame budget. PR #134's homepage reframe ("real-but-over-budget; pretable still 2-3.5× faster than every comparator") remains accurate. **No production code change shipped from this investigation chain.**
+
+## What this investigation produced
+
+| PR | What landed |
+|---|---|
+| #142 | First diagnostic memo; identified harness gap (Playwright action trace ≠ flame graph) |
+| #143 | Opt-in CDP tracing (`PLAYWRIGHT_PERF_TRACE=1`) |
+| #144 | `waitForTrigger=1` gate so CDP attaches before bench interaction runs |
+| #145 | First flame-graph diagnostic + `scripts/analyze-cdp.mjs` + bench sourcemaps; misattributed `getEstimatedRowHeightSignature` from full-trace view |
+| #146 | `performance.mark` window bounds + `--window=interaction` slicing; corrected #145; identified `matchesFilters` |
+| (closed) | This memo. |
+
+The infrastructure is permanent and reusable for any future bench-perf investigation. The lesson — always slice to the interaction window — is saved to repo-wide memory.
+
+## Why no fix shipped
+
+Single-trace n=1 interaction window: ~6-8 ms. PR #134's n=20 p95: 16.79 ms. The ~10 ms gap between single-sample and p95 is in **paint/composite/native browser work** that the V8 CPU profiler doesn't fully attribute. Even reducing the entire `matchesFilters` cost to zero would only save ~0.7 ms; the remaining budget miss lives outside scripting.
+
+A `WeakMap<row, Map<columnId, lowercaseStr>>` cache for `matchesFilters` would help real-world multi-character typing (where filter triggers repeat as the user types) but **does not help the bench scenario** (single trigger per run). Shipping it would have been a real-world improvement with zero bench-metric movement; arguably worth doing as a separate small PR, but not under the "fix filter-text budget miss" banner.
+
+## What the captured data actually shows
+
+Interaction window for `pretable / S2 / hypothesis / filter-text` (n=1, 6.77 ms):
+
+```
+  4822μs  (47.2%)  (program)                                       native overhead
+   773μs  ( 7.6%)  matchesFilters  packages/grid-core/src/derived-rows.ts:73
+   507μs  ( 5.0%)  (anon) packages/react/src/pretable-surface.tsx
+   253μs  ( 2.5%)  u8 packages/react/dist/index.mjs:17
+   220μs  ( 2.2%)  bench-runtime.ts:376  (probe-row read)
+```
+
+Top RasterTask: 908μs (single tile). Total raster: ~3 ms across many small tiles. Total Layout/UpdateLayoutTree: <1 ms.
+
+So in a ~7 ms interaction window:
+- ~5.4 ms attributable JS (incl. native overhead)
+- ~1-2 ms paint/raster/composite
+
+The bench p95 of 16.79 ms must reflect either:
+- Variance: some runs hit slow GC / cache-miss paths the trace didn't sample.
+- Aggregate paint cost across the multi-frame settle window (not just first-changed frame).
+- Mount-time work bleeding into the timer (though `startTimestamp` is post-rAF, so this should be excluded).
+
+Identifying which would require capturing 20+ traces and statistically attributing per-run timing — significantly more work than this investigation's scope.
+
+## Recommended next steps (if anyone returns to this)
+
+1. **Capture n≥20 CDP traces and aggregate.** The current analyzer works on a single trace; extending it to aggregate p95-per-frame across traces would let us attribute the actual long-tail.
+2. **Shift focus to settling/paint, not scripting.** The bench's `settle_duration_ms` is separate from `interaction_latency_ms`. If the budget miss is in paint, scripting-level optimizations are irrelevant.
+3. **WeakMap-cached `matchesFilters` as a real-world-only PR.** Doesn't move the bench but helps live filter UX. Optional, low-risk.
+
+## Verdict
+
+**Investigation closed. Homepage prose (PR #141) accurately reflects the state: filter-text/sort/filter-metadata are ~1 ms over budget, pretable remains 2-3.5× faster than every measured comparator. The interaction-window scripting is already well-optimized; remaining cost is in unattributed paint/composite/native work that needs a different kind of investigation.**
+
+## Lesson (also in memory)
+
+CDP scripting traces alone don't explain bench latency for fast operations dominated by paint/composite. When `analyze-cdp.mjs --window=interaction` shows scripting time well below the bench's reported `interaction_latency_ms`, the remaining cost lives outside scripting; chasing scripting-level fixes for that gap is unlikely to move the metric.