From 25366c9d7cec4a67857ed76a9b3c9de3f6e4b4b1 Mon Sep 17 00:00:00 2001
From: Brian Love
Date: Mon, 18 May 2026 12:53:29 -0700
Subject: [PATCH] docs(research): close filter-text perf-fix investigation (PRs #142-146)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After the full diagnostic pipeline (CDP tracing → trigger gating →
sourcemaps → interaction-window slicing), single-trace data shows
in-window scripting for filter-text is ~5 ms, of which the leading
hotspot (`matchesFilters`) is ~0.7 ms. PR #134's n=20 p95 is 16.79 ms;
the ~10 ms gap is in paint/composite/native browser work that scripting
fixes can't move. Even eliminating matchesFilters entirely would only
get filter-text to ~16.0-16.2 ms — still grazing.

Investigation closed. Infrastructure (analyzer, sourcemaps, slicing,
markers) is production-ready and reusable. Homepage prose (PR #141)
accurately reflects state: real-but-over-budget; pretable 2-3.5×
faster than every comparator.

Memory updated with the lesson that single-trace n=1 is not enough for
borderline budget misses.

Co-Authored-By: Claude Opus 4.7
---
 ...5-16-filter-text-investigation-closeout.md | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 docs/research/2026-05-16-filter-text-investigation-closeout.md

diff --git a/docs/research/2026-05-16-filter-text-investigation-closeout.md b/docs/research/2026-05-16-filter-text-investigation-closeout.md
new file mode 100644
index 0000000..192681d
--- /dev/null
+++ b/docs/research/2026-05-16-filter-text-investigation-closeout.md
@@ -0,0 +1,63 @@
# Filter-text perf investigation closeout — 2026-05-16

## Summary

Closing the wrapped-text filter perf-fix investigation thread that ran through PRs #142, #143, #144, #145, #146.
After landing the full CDP + sourcemap + interaction-window-slicing pipeline, the data shows that **the 1-2 ms budget miss on filter-text/sort/filter-metadata is not concentrated in any single scripting hotspot large enough to close it reliably**. The leading interaction-window scripting hotspot is `matchesFilters`, at ~0.6-0.8 ms self-time (~14% of in-window scripting). Eliminating it entirely would move filter-text from 16.79 ms p95 to ~16.0-16.2 ms, which still grazes the 16 ms single-frame budget. PR #141's homepage reframe ("real-but-over-budget; pretable still 2-3.5× faster than every comparator") remains accurate. **No production code change shipped from this investigation chain.**

## What this investigation produced

| PR | What landed |
|---|---|
| #142 | First diagnostic memo; identified a harness gap (Playwright action trace ≠ flame graph) |
| #143 | Opt-in CDP tracing (`PLAYWRIGHT_PERF_TRACE=1`) |
| #144 | `waitForTrigger=1` gate so CDP attaches before the bench interaction runs |
| #145 | First flame-graph diagnostic + `scripts/analyze-cdp.mjs` + bench sourcemaps; misattributed cost to `getEstimatedRowHeightSignature` by reading the full-trace view |
| #146 | `performance.mark` window bounds + `--window=interaction` slicing; corrected #145's attribution; identified `matchesFilters` |
| (closed) | This memo. |

The infrastructure is permanent and reusable for any future bench-perf investigation. The lesson (always slice to the interaction window) is saved to repo-wide memory.

## Why no fix shipped

The single-trace (n=1) interaction window is ~6-8 ms, while PR #134's n=20 p95 is 16.79 ms. The ~10 ms gap between the single sample and the p95 most likely sits in **paint/composite/native browser work** that the V8 CPU profiler doesn't fully attribute. Even reducing the entire `matchesFilters` cost to zero would save only ~0.7 ms. The remaining budget miss lives outside scripting.
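The arithmetic behind this conclusion fits in a small TypeScript helper. The helper uses figures quoted in this memo: the 16.79 ms p95 from PR #134, the 6.77 ms single-trace window, and 773 μs of `matchesFilters` self-time.

It also makes one deliberately optimistic assumption: removing a hotspot subtracts its full self-time from the p95. `BenchSnapshot` and `projectHotspotRemoval` are hypothetical names for illustration. They are not part of the bench harness.

```typescript
// Back-of-the-envelope projection using figures quoted in this memo.
// BenchSnapshot and projectHotspotRemoval are illustrative names, not bench-harness APIs.

const FRAME_BUDGET_MS = 16; // single-frame budget the memo measures against

interface BenchSnapshot {
  p95LatencyMs: number;  // n=20 interaction-latency p95 (PR #134)
  windowMs: number;      // single-trace (n=1) interaction window
  hotspotSelfMs: number; // self-time of the leading scripting hotspot in that window
}

interface Projection {
  projectedP95Ms: number;   // p95 if the hotspot's self-time vanished one-for-one
  unexplainedGapMs: number; // p95 minus the single-trace window
  clearsBudget: boolean;    // true when the projected p95 fits in one frame
}

function projectHotspotRemoval(s: BenchSnapshot): Projection {
  // Optimistic assumption: removing the hotspot subtracts its full self-time from the p95.
  const projectedP95Ms = s.p95LatencyMs - s.hotspotSelfMs;
  return {
    projectedP95Ms,
    unexplainedGapMs: s.p95LatencyMs - s.windowMs,
    clearsBudget: projectedP95Ms <= FRAME_BUDGET_MS,
  };
}

// filter-text: 16.79 ms p95, 6.77 ms traced window, 773 μs matchesFilters self-time.
const filterText = projectHotspotRemoval({
  p95LatencyMs: 16.79,
  windowMs: 6.77,
  hotspotSelfMs: 0.773,
});
console.log(filterText);
```

With these inputs the projection lands at roughly 16.02 ms, so `clearsBudget` is `false`. Roughly 10 ms of the p95 stays unexplained by the traced window. Because the one-for-one subtraction is a best case, the real outcome would be no better than this.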

A `WeakMap`-keyed cache for `matchesFilters` would help real-world multi-character typing, where filter triggers repeat as the user types. It **does not help the bench scenario**, which has a single trigger per run. Shipping it would be a real-world improvement with zero bench-metric movement. That makes it arguably worth doing as a separate small PR, but not under the "fix filter-text budget miss" banner.

## What the captured data actually shows

Interaction window for `pretable / S2 / hypothesis / filter-text` (n=1, 6.77 ms):

```
 4822μs (47.2%) (program) native overhead
  773μs ( 7.6%) matchesFilters packages/grid-core/src/derived-rows.ts:73
  507μs ( 5.0%) (anon) packages/react/src/pretable-surface.tsx
  253μs ( 2.5%) u8 packages/react/dist/index.mjs:17
  220μs ( 2.2%) bench-runtime.ts:376 (probe-row read)
```

Top RasterTask: 908μs (single tile). Total raster: ~3 ms across many small tiles. Total Layout/UpdateLayoutTree: <1 ms.

So in a ~7 ms interaction window:
- ~5.4 ms attributable JS (incl. native overhead)
- ~1-2 ms paint/raster/composite

The bench p95 of 16.79 ms must therefore reflect one or more of:
- Variance: some runs hit slow GC or cache-miss paths that this trace didn't sample.
- Aggregate paint cost across the multi-frame settle window, not just the first changed frame.
- Mount-time work bleeding into the timer. This is unlikely, since `startTimestamp` is taken post-rAF, which should exclude it.

Telling these apart would require capturing 20+ traces and statistically attributing per-run timing. That is well beyond this investigation's scope.

## Recommended next steps (if anyone returns to this)

1. **Capture n≥20 CDP traces and aggregate.** The current analyzer works on a single trace. Extending it to aggregate p95-per-frame across traces would let us attribute the actual long tail.
2. **Shift focus to settling/paint, not scripting.** The bench's `settle_duration_ms` is separate from `interaction_latency_ms`.
If the budget miss is in paint, scripting-level optimizations are irrelevant. +3. **WeakMap-cached `matchesFilters` as a real-world-only PR.** Doesn't move the bench but helps live filter UX. Optional, low-risk. + +## Verdict + +**Investigation closed. Homepage prose (PR #141) accurately reflects the state: filter-text/sort/filter-metadata are ~1 ms over budget, pretable remains 2-3.5× faster than every measured comparator. The interaction-window scripting is already well-optimized; remaining cost is in unattributed paint/composite/native work that needs a different kind of investigation.** + +## Lesson (also in memory) + +CDP scripting traces alone don't explain bench latency for fast operations dominated by paint/composite. When `analyze-cdp.mjs --window=interaction` shows scripting time well below the bench's reported `interaction_latency_ms`, the remaining cost lives outside scripting; chasing scripting-level fixes for that gap is unlikely to move the metric.
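The Lesson above, together with next step 1, suggests a simple check once n≥20 traces exist:

1. Pair each run's in-window scripting time with its bench `interaction_latency_ms`.
2. Take the runs at or above the latency p95.
3. See how much of their latency scripting actually explains.

This sketch assumes the per-run numbers have already been extracted. `RunSample`, `percentile`, and `attributeTail` are hypothetical helpers, and per this memo `analyze-cdp.mjs` currently handles only a single trace. Percentiles use the nearest-rank definition.

```typescript
// Hypothetical per-run record: in-window scripting time (for example from an
// `analyze-cdp.mjs --window=interaction` run) paired with the bench's
// reported interaction_latency_ms for the same run.
interface RunSample {
  scriptingMs: number;
  interactionLatencyMs: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// the samples are less than or equal to it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty set");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

interface TailAttribution {
  cutoffMs: number;               // latency p95 across all runs
  tailRuns: number;               // runs at or above the cutoff
  tailScriptingMs: number;        // mean in-window scripting of the tail runs
  tailOutsideScriptingMs: number; // mean tail latency not explained by scripting
  scriptingShare: number;         // fraction of tail latency that is scripting
}

function attributeTail(samples: readonly RunSample[], p = 95): TailAttribution {
  const cutoffMs = percentile(samples.map((s) => s.interactionLatencyMs), p);
  // Never empty: the cutoff is itself one of the sampled latencies.
  const tail = samples.filter((s) => s.interactionLatencyMs >= cutoffMs);
  const mean = (xs: readonly number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const tailScriptingMs = mean(tail.map((s) => s.scriptingMs));
  const tailLatencyMs = mean(tail.map((s) => s.interactionLatencyMs));
  return {
    cutoffMs,
    tailRuns: tail.length,
    tailScriptingMs,
    tailOutsideScriptingMs: tailLatencyMs - tailScriptingMs,
    scriptingShare: tailScriptingMs / tailLatencyMs,
  };
}
```

A low `scriptingShare` in the tail runs would confirm the memo's conclusion across many runs instead of one: the long tail lives outside scripting. A high share would reopen the scripting angle.

With n=20, the nearest-rank p95 is the 19th-smallest latency, so the "tail" is usually just the two slowest runs. Averaging such a small set is a simplification worth keeping in mind.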
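Next step 3's real-world-only cache could be shaped roughly as below. This memo doesn't show the real signature of `matchesFilters` or the cache's key and value types. So `Row`, `TextFilter`, `evaluateFilters`, and `makeCachedMatcher` are all hypothetical stand-ins, and the predicate is a placeholder substring match.

The sketch memoizes one boolean per (row object, filter state) pair. The outer `WeakMap` is keyed on the row, so cached entries are garbage-collected along with their rows.

```typescript
// Hypothetical shapes; the real grid-core row and filter types are not shown in this memo.
type Row = Record<string, unknown>;

interface TextFilter {
  columnId: string;
  query: string;
}

// Counts uncached evaluations so the cache's effect is observable.
const stats = { uncachedEvaluations: 0 };

// Placeholder for the real predicate in packages/grid-core/src/derived-rows.ts:
// a row matches when every filter's query is a case-insensitive substring of that cell.
function evaluateFilters(row: Row, filters: readonly TextFilter[]): boolean {
  stats.uncachedEvaluations += 1;
  return filters.every((f) =>
    String(row[f.columnId] ?? "").toLowerCase().includes(f.query.toLowerCase()),
  );
}

// Outer key: the row object (held weakly). Inner key: a serialized filter state.
const cache = new WeakMap<Row, Map<string, boolean>>();

function makeCachedMatcher(filters: readonly TextFilter[]): (row: Row) => boolean {
  // Serialize once per filter state, not once per row.
  const filterKey = JSON.stringify(filters.map((f) => [f.columnId, f.query]));
  return (row) => {
    let perRow = cache.get(row);
    if (perRow === undefined) {
      perRow = new Map<string, boolean>();
      cache.set(row, perRow);
    }
    const cached = perRow.get(filterKey);
    if (cached !== undefined) return cached;
    const result = evaluateFilters(row, filters);
    perRow.set(filterKey, result);
    return result;
  };
}
```

With this keying, a hit requires the same (row, filter state) pair to be evaluated again, for example on re-renders between keystrokes or when backspacing to an earlier query. The real design may key differently.

The bench triggers the filter once per run, so every lookup would miss there. That is consistent with the memo's expectation of zero bench movement. Whether the extra `Map` lookups cost less than the predicate itself would need measuring before shipping.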