@@ -24,9 +24,14 @@ The GroupBy Regression module provides high-performance grouped linear regressio
2424| ** 12.14b.GB** | ** BF Integration** | ** Jan 1, 2026** | ✅ Complete |
2525| ** 12.14b.GB-add** | ** Dual Timing + cProfile** | ** Jan 1, 2026** | ✅ Complete |
2626| ** 12.14c.GB** | ** ADF Visualization** | ** Jan 3-4, 2026** | ✅ Complete |
27+ | ** 13.7.GB** | ** Capability Matrix Infrastructure** | ** Feb 7-8, 2026** | ✅ Complete |
28+ | ** 13.8.GB** | ** SW API Refactor + Invariance Tests** | ** Feb 9-10, 2026** | ✅ Complete |
29+ | ** 13.9.GB** | ** V3 Incremental Algorithm** | ** Feb 10, 2026** | ✅ Complete |
30+ | ** 13.8.SW** | ** Parallel Sliding Window + Benchmarks** | ** Feb 12-14, 2026** | ✅ Complete |
2731| 12.15.GB | V4 Integration | — | 📋 Planned |
2832
29- ** Current Test Count:** 145 benchmark tests + existing kernel/module suites
33+ ** Current Test Count:** 338 passed, 3 failed (pre-existing), 102 features (25 verified)
34+ ** Capability Matrix:** Phase 13.8.SW — 0 broken, 1 planned (SW.weighted)
3035
3136---
3237
@@ -54,6 +59,143 @@ Phases 12.14.GB through 12.14c.GB address this with:
5459
5560---
5661
62+ ## February 2026 Phases
63+
64+ ### Phase 13.8.SW: Parallel Sliding Window + Benchmarks (Feb 12-14, 2026)
65+ **Tag:** `phase-13.8.SW`
66+ **Commits:** `b38fd93d`, `b45f69c5`, `7838885c`, `8785fd58`, `73de51f1`, `ef8c89bc`
67+ **Goal:** Parallel execution for V5 sliding window, benchmark consolidation, user-facing documentation
68+
69+ ** Deliverables:**
70+
71+ | # | Component | Description |
72+ | ---| -----------| -------------|
73+ | D1 | Parallel dispatch | ` make_sliding_window_fit_parallel() ` with fork() COW |
74+ | D2 | Zero-pickle fix | Module-level shared state, ~200B per task instead of 125MB |
75+ | D3 | Counting sort | O(N) ` _counting_sort_indices() ` , 44.8× vs mergesort |
76+ | D4 | Numba cache fix | ` conftest.py ` hash-based auto-invalidation |
77+ | D5 | Parallel tests | 8 tests in ` test_parallel_sliding_window.py ` |
78+ | D6 | Parametric benchmark | ` bench_slidingwindow_parametric.py ` (V1/V2/V3/V5, 11 cost models) |
79+ | D7 | Parallel benchmark | ` bench_slidingwindow_parallel.py ` (scaling curve, sort comparison) |
80+ | D8 | README | ` README_sliding_window_benchmark.md ` (algorithm guidance + cost formulas) |
81+ | D9 | Capability matrix | Updated to 102 features, Phase 13.8.SW |
82+
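The idea behind D3 can be sketched in plain NumPy. This is a minimal illustration, not the module's `_counting_sort_indices()`: the name `counting_sort_indices`, its signature, and the assumption that keys are integers in `[0, n_bins)` are all hypothetical. The scatter loop is the part a JIT compiler such as Numba would be expected to speed up.

```python
import numpy as np

def counting_sort_indices(keys, n_bins):
    """Return a stable ordering of row indices by integer key in O(N + n_bins).

    Assumes every key is an integer in [0, n_bins) (hypothetical contract).
    """
    counts = np.bincount(keys, minlength=n_bins)
    # offsets[k] is the first output slot for key k (exclusive prefix sum).
    offsets = np.zeros(n_bins, dtype=np.int64)
    offsets[1:] = np.cumsum(counts)[:-1]
    order = np.empty(len(keys), dtype=np.int64)
    # Scatter pass: rows are visited in input order, so ties keep that order.
    for i, k in enumerate(keys):
        order[offsets[k]] = i
        offsets[k] += 1
    return order
```

Because the sort is stable, the result should agree with `np.argsort(keys, kind="stable")`, which makes a convenient correctness check.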
83+ ** Performance Results:**
84+
85+ | Metric | Value |
86+ | --------| -------|
87+ | V5 speedup vs alternatives | 25–68× (wins 10/10 configs) |
88+ | Parallel scaling (16 workers, 112M rows) | 3.3× |
89+ | Counting sort vs mergesort | 44.8× |
90+ | Pickle overhead eliminated | 155GB total serialization → ~200 bytes/task |
91+ | Smoothing overhead at TPC scale | ~10–20% vs noSW |
92+
93+ ** TPC Predictions (extrapolated):**
94+
95+ | Scenario | Serial (36×3) | With 10-way parallel |
96+ | ----------| ---------------| ----------------------|
97+ | Standard (54K bins, 1K rpb) | ~3.4 min | ~20s |
98+ | High (54K bins, 2K rpb) | ~6.8 min | ~40s |
99+
100+ ** Bug Fixes:**
101+
102+ | Bug | Impact | Fix |
103+ | -----| --------| -----|
104+ | ` _build_bin_index_map ` on V5 path | 85% of V5tot wasted (70 min → 3.4 min) | Skip on V5 path |
105+ | ` executor.submit() ` pickled arrays | 125MB × 36 = 155GB serialization | Module globals + fork() COW |
106+ | Numba stale cache after restructuring | ` ModuleNotFoundError ` | Hash-based auto-clear |
107+
108+ ** Cost Models:** 11 linear models fitted (R² > 0.99 at production scale). User-facing prediction formulas in README with machine-scaling guidance.
109+
110+ ** Tests:** 338 passed, 3 failed (pre-existing), 0 errors
111+
112+ ** Reviewed by:** Claude14 (coder), cross-review pending
113+
114+ ---
115+
116+ ### Phase 13.9.GB: V3 Incremental Algorithm (Feb 10, 2026)
117+ **Commit:** `bda26ca3`
118+ **Goal:** Replace O(N_bins × N_nbr × RPB) recompute with O(N_rows + N_bins × N_nbr) incremental
119+
120+ ** Key Innovation:** Pre-compute per-bin sufficient statistics (XtX, XtY, n, sum_y, sum_y2) in one pass over raw data, then sum neighbor matrices instead of re-accumulating from rows for each window.
121+
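A minimal one-dimensional sketch of this idea follows, assuming unweighted OLS and a symmetric window truncated at the edges. The function names, the 1D binning, and the use of `np.linalg.lstsq` are illustrative assumptions, not the module's implementation. For brevity it tracks only XtX, XtY and n, not sum_y or sum_y2.

```python
import numpy as np

def per_bin_stats(bin_idx, X, y, n_bins):
    """One pass over rows: accumulate XtX, XtY and n for every bin."""
    p = X.shape[1]
    xtx = np.zeros((n_bins, p, p))
    xty = np.zeros((n_bins, p))
    n = np.zeros(n_bins, dtype=np.int64)
    # np.add.at accumulates unbuffered, so repeated bin indices add up.
    # The outer products cost O(N * p^2) memory, acceptable for a sketch.
    np.add.at(xtx, bin_idx, X[:, :, None] * X[:, None, :])
    np.add.at(xty, bin_idx, X * y[:, None])
    np.add.at(n, bin_idx, 1)
    return xtx, xty, n

def fit_windows(xtx, xty, n, half_width):
    """One OLS fit per bin from the summed statistics of neighbouring bins."""
    n_bins, p = xty.shape
    beta = np.full((n_bins, p), np.nan)
    for b in range(n_bins):
        lo, hi = max(0, b - half_width), min(n_bins, b + half_width + 1)
        if n[lo:hi].sum() < p:
            continue  # too few rows in the window to determine p parameters
        # Sum small p x p matrices; no raw rows are touched here.
        A = xtx[lo:hi].sum(axis=0)
        rhs = xty[lo:hi].sum(axis=0)
        beta[b] = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return beta
```

The raw rows are read once in `per_bin_stats`; every window afterwards costs only a sum over a few small matrices, which is where the O(N_rows + N_bins × N_nbr) scaling comes from.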
122+ ** Performance:**
123+
124+ | Config | V3/V1 speedup |
125+ | --------| ---------------|
126+ | 10³ grid, W=1 | 4.6× |
127+ | 15³ grid, W=2 | 5.6× |
128+
129+ ** Correctness:** V3 = V1 to machine precision (max diff < 3×10⁻¹⁴)
130+
131+ **Trade-off:** median is reported as NaN (a median cannot be computed from sufficient statistics)
132+
133+ ** Tests:** +10 new invariance tests (TestSWV3Parity). Results: 306 passed, 3 failed (pre-existing)
134+
135+ ---
136+
137+ ### Phase 13.8.GB: SW API Refactor + V3b + V3-Numba (Feb 9-10, 2026)
138+ **Commits:** `2a713f2c`, `6601dc9b`, `bd6f042e`
139+ **Goal:** Align sliding window API with v4 conventions, add boundary/kernel modes, add Numba incremental solver
140+
141+ ** Sub-phases:**
142+
143+ ** Action A+B (Feb 9):** 3 bug fixes + invariance tests
144+ - A.1: Remove duplicate validation block
145+ - A.2: Fix wrong arg to ` _get_neighbor_bins `
146+ - A.3: Add `res.bse` extraction — new `_err` columns (Bug #6)
147+ - 13 invariance tests with analytical checks (nsigma recovery, error consistency, pull distribution)
148+
149+ ** API Refactor (Feb 10):** v4-aligned ` make_sliding_window_fit() `
150+ - Keyword-only params, v4 naming (` gb_columns ` , ` linear_columns ` , ` weights ` , ` min_stat ` )
151+ - ` backend='auto' ` (Numba auto-detect), omitted window dims default to 0
152+ - Feature taxonomy: 96 → 100 features. Verified: 21 → 24
153+
154+ ** V3b Boundary + Kernel (Feb 10):**
155+ - ` boundary='full'|'symmetric'|'periodic' ` , per-dimension
156+ - ` kernel='uniform'|'gaussian'|'epanechnikov'|'linear'|callable `
157+ - 15 V3b invariance tests
158+
159+ ** V3-Numba (Feb 10):** Incremental + Cholesky JIT
160+ - ` _get_numba_incremental_kernel ` : Cholesky solve + SE + diagnostics
161+ - Shared neighbor table between accumulation and solve phases
162+ - Dispatch: ` algorithm='incremental', backend='numba' `
163+
164+ ** Tests:** 55 SW tests total (24 new), all passing
165+
166+ ** Approved by:** GPT10, GPT11, Claude, Claude-Opus, Claude12 (5/5)
167+
168+ ---
169+
170+ ### Phase 13.7.GB: Capability Matrix Infrastructure (Feb 7-8, 2026)
171+ **Commits:** `5279be73`, `692ffeab`, `3bfdcfc5`
172+ **Goal:** Two-tier quality classification for all features and tests
173+
174+ ** Deliverables:**
175+
176+ | Component | Description |
177+ | -----------| -------------|
178+ | ` feature_taxonomy.py ` | 96-feature taxonomy with proof-test references |
179+ | ` test_layer_classification.py ` | 291-test classification (invariance/integration/smoke/validation/performance) |
180+ | ` generate_capability_matrix.py ` | Auto-generated two-tier capability matrix |
181+ | ` run_tests.sh ` | Unified test runner with timestamped logging, reviewer.zip packaging |
182+ | ` conftest.py ` | Register feature/layer markers |
183+ | ` pytest.ini ` | Strict marker enforcement |
184+ | ` tests/README.md ` | Test infrastructure documentation |
185+
186+ ** Classification Rules:**
187+ - ✅ Verified: has invariance or integration test
188+ - ☑️ Smoke-only: smoke tests only — does not catch numerical regressions
189+ - 🧨 Broken: test failures
190+ - 📋 Planned: no tests yet
191+ - Fail-closed (§3.5): unclassified tests default to smoke
192+
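The rules above can be read as a small decision function. The sketch below is an interpretation, not the logic in `generate_capability_matrix.py`: the function name, the status strings, the precedence of "broken" over "verified", and the treatment of validation-only or performance-only features as smoke-only are all assumptions.

```python
KNOWN_LAYERS = {"invariance", "integration", "smoke", "validation", "performance"}

def classify_feature(test_layers, any_failed=False):
    """Map the layers of a feature's tests to a capability-matrix status."""
    # Fail-closed: a missing (None) or unrecognised layer counts as smoke.
    layers = {layer if layer in KNOWN_LAYERS else "smoke" for layer in test_layers}
    if not layers:
        return "planned"      # no tests yet
    if any_failed:
        return "broken"       # assumed to override every other status
    if layers & {"invariance", "integration"}:
        return "verified"
    return "smoke-only"       # assumption: anything else is not proof of correctness
```

For example, `classify_feature(["smoke", None])` yields `"smoke-only"`, so an unclassified test can never promote a feature to verified.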
193+ **Initial Matrix:** 96 features (21 verified, 75 smoke-only, 0 broken, 0 planned)
194+
195+ ** Tests:** 296 passed, 3 failed (pre-existing)
196+
197+ ---
198+
57199## Recent Phases (Jan 2026)
58200
59201### Phase 12.14c.GB: AliasDataFrame Visualization (Jan 3-4, 2026)
@@ -431,6 +573,15 @@ Two specifications govern Phase 12.14:
431573| Always-on cProfile | Historical profiles for bottleneck analysis | 12.14b.GB-add |
432574| Skip cProfile for n_jobs > 1 | cProfile only captures main process | 12.14c.GB D6 |
433575| wall_time_s optional fallback | Backward compat with pre-12.14b.GB data | 12.14c.GB D1 |
576+ | Two-tier capability matrix | Verified (inv/int test) vs smoke-only classification | 13.7.GB |
577+ | Fail-closed rule | Unclassified tests default to smoke, not verified | 13.7.GB |
578+ | V4-aligned SW API | Keyword-only, gb_columns/linear_columns naming | 13.8.GB |
579+ | Incremental algorithm (V3) | Pre-compute XtX/XtY per bin, sum neighbors | 13.9.GB |
580+ | V5 incremental+numba | Best of both: incremental algorithm + JIT kernels | 13.8.SW |
581+ | Fork() COW dispatch | Module globals + try/finally, not pickle | 13.8.SW |
582+ | O(N) counting sort | Numba JIT counting sort vs O(N log N) argsort | 13.8.SW |
583+ | Retain statsmodels | OLS/WLS/GLM/RLM diversity for TPC calibration | 13.8.GB |
584+ | Always-on benchmark validation | V1 vs V5 diff check catches bugs unit tests miss | 13.8.SW |
434585
435586---
436587
@@ -464,9 +615,20 @@ Each phase follows this workflow:
464615
465616| File | Purpose |
466617| ------| ---------|
618+ | ` groupby_regression_sliding_window.py ` | Sliding window regression (V1-V5 + parallel) |
467619| ` groupby_regression_kernels.py ` | Shared Numba kernel module |
468620| ` groupby_regression_optimized.py ` | V4/V5 implementations |
621+ | ` tests/test_parallel_sliding_window.py ` | 8 parallel correctness tests |
622+ | ` tests/test_invariance_sliding_window.py ` | 55 SW invariance tests |
469623| ` tests/test_groupby_regression_kernels.py ` | 28 kernel tests |
624+ | ` tests/feature_taxonomy.py ` | 102-feature taxonomy |
625+ | ` tests/test_layer_classification.py ` | 281-test layer classification |
626+ | ` scripts/generate_capability_matrix.py ` | Capability matrix generator |
627+ | ` tests/conftest.py ` | Numba cache invalidation + markers |
628+ | ` run_tests.sh ` | Unified test runner + reviewer.zip |
629+ | ` benchmarks/bench_slidingwindow_parametric.py ` | Serial V1/V2/V3/V5 benchmark |
630+ | ` benchmarks/bench_slidingwindow_parallel.py ` | Parallel scaling benchmark |
631+ | ` benchmarks/README_sliding_window_benchmark.md ` | User-facing performance guide |
470632| ` benchmarks/runner.py ` | BF runner with multi-source discovery |
471633| ` benchmarks/schema.py ` | JSON schema + NumpyEncoder |
472634| ` benchmarks/benchmark_adf.py ` | AliasDataFrame adapter |
@@ -488,6 +650,15 @@ Each phase follows this workflow:
488650- Update V5 to use shared kernel
489651- Handle heterogeneous ` linear_columns ` in wrapper
490652
653+ ### SW.weighted: Sliding Window Weighted Fits (WLS)
654+ - Planned feature (marked in capability matrix)
655+ - Requires extending sufficient statistics to weighted case
656+
657+ ### Asymmetric Windows (benchmark coverage)
658+ - Asymmetric window support exists in code
659+ - Missing from benchmark parametric sweep
660+ - Add to ` bench_slidingwindow_parametric.py `
661+
491662---
492663
493664## Performance Reference
@@ -519,6 +690,35 @@ Each phase follows this workflow:
519690| CLI commands | 3 (--history, --history-stats, --plot) |
520691| Exit codes | 4 (0/1/2/3) |
521692
693+ ### V5 Sliding Window (Phase 13.8.SW, Apple M1 Pro)
694+
695+ | Config | V1 | V2 | V3 | V5tot | V5 Speedup |
696+ | --------| ------| ------| ------| -------| ------------|
697+ | 10³ W=1 r=10 | 0.114s | 0.093s | 0.127s | 0.005s | 25× |
698+ | 25³ W=1 r=10 | 1.867s | 1.609s | 2.179s | 0.050s | 44× |
699+ | 25³ W=2 r=10 | 3.515s | 3.094s | 6.072s | 0.089s | 68× |
700+ | 25³ W=1 r=50 | 3.377s | 2.832s | 2.702s | 0.086s | 40× |
701+
702+ ### Parallel Scaling (Phase 13.8.SW, Linux aarch64, 112M rows)
703+
704+ | Workers | Time | Speedup |
705+ | ---------| ------| ---------|
706+ | 1 | 16.6s | 1.0× |
707+ | 8 | 5.2s | 3.2× |
708+ | 16 | 5.1s | 3.3× |
709+
710+ ### Capability Matrix (Phase 13.8.SW)
711+
712+ | Metric | Value |
713+ | --------| -------|
714+ | Total features | 102 |
715+ | Verified (✅) | 25 (24.5%) |
716+ | Smoke-only (☑️) | 76 (74.5%) |
717+ | Broken (🧨) | 0 (0.0%) |
718+ | Planned (📋) | 1 (1.0%) |
719+ | Total tests (unique) | 281 |
720+ | Invariance tests | 41 |
721+
522722---
523723
524724## Document History
@@ -528,3 +728,4 @@ Each phase follows this workflow:
528728| 1.0 | Dec 16, 2025 | Initial version |
529729| 2.0 | Dec 31, 2025 | Added Phases 12.14.GB, 12.14a.GB, incident analysis |
530730| 3.0 | Jan 4, 2026 | Added Phases 12.14b.GB, 12.14b.GB-addendum, 12.14c.GB |
731+ | 4.0 | Feb 14, 2026 | Added Phases 13.7.GB, 13.8.GB, 13.9.GB, 13.8.SW. Updated capability matrix (102 features), performance reference, key files, technical decisions |