
Commit 056bcd4

Phase 13.9.GB review fixes: matrix regen, phase label, classifications
P1.3: Regenerate CAPABILITY_MATRIX.md (34 verified, 0 unclassified)
P1.4: Update phase label to 13.9.GB in generator
P1.6: 9 parallel SW tests classified (0 unclassified remaining)
1 parent ce1361d commit 056bcd4

4 files changed

Lines changed: 366 additions & 44 deletions


UTILS/dfextensions/groupby_regression/docs/CAPABILITY_MATRIX.md

Lines changed: 29 additions & 28 deletions
@@ -1,32 +1,30 @@
 # Capability Matrix — groupby_regression

-**Generated:** 2026-02-14 10:29 UTC
-**Phase:** 13.8.SW: Parallel Sliding Window + Benchmarks
+**Generated:** 2026-02-16 08:00 UTC
+**Phase:** 13.9.GB: GroupByRegressionEvaluator + Unified Metadata
 **Generator:** `scripts/generate_capability_matrix.py`

 ## Summary

 | Status | Count | % |
 |--------|------:|--:|
-| ✅ Verified | 25 | 24.5% |
-| ☑️ Smoke-only | 76 | 74.5% |
+| ✅ Verified | 34 | 30.4% |
+| ☑️ Smoke-only | 77 | 68.8% |
 | 🧨 Broken | 0 | 0.0% |
 | ⚠️ Partial | 0 | 0.0% |
-| 📋 Planned | 1 | 1.0% |
-| **Total** | **102** | |
+| 📋 Planned | 1 | 0.9% |
+| **Total** | **112** | |

 ## Test Layer Distribution (excluding verbose duplicates)

 | Layer | Count |
 |-------|------:|
-| invariance | 41 |
-| integration | 10 |
-| performance | 5 |
-| smoke | 201 |
+| invariance | 88 |
+| integration | 12 |
+| performance | 6 |
+| smoke | 211 |
 | validation | 24 |
-| **Total unique** | **281** |
-
-⚠️ **UNCLASSIFIED tests: 9** — defaulted to smoke (fail-closed rule §3.5)
+| **Total unique** | **341** |

 ## groupby_regression

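The Summary table gives each status count as a share of all features, rounded to one decimal place. The sketch below shows that arithmetic on the new counts from this diff. `summarize` is a hypothetical helper and is not taken from `scripts/generate_capability_matrix.py`.

```python
from collections import Counter


def summarize(statuses):
    """Count statuses and express each as a percentage of the total.

    Hypothetical helper illustrating the Summary-table arithmetic.
    Returns ({status: (count, percent)}, total), percent rounded to 0.1.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    table = {s: (n, round(100.0 * n / total, 1)) for s, n in counts.items()}
    return table, total


# New matrix: 34 verified, 77 smoke-only, 1 planned (0 broken, 0 partial)
table, total = summarize(["verified"] * 34 + ["smoke-only"] * 77 + ["planned"])
print(total)  # 112
print(table)  # {'verified': (34, 30.4), 'smoke-only': (77, 68.8), 'planned': (1, 0.9)}
```

The same function applied to the old counts (25/76/1 of 102) reproduces the removed 24.5% / 74.5% / 1.0% row values.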
@@ -47,6 +45,21 @@
 | ☑️ | **XVAL.robust_v2** — Robust vs V2 structural agreement | 1 | 0 | | |
 | ✅ | **XVAL.robust_v4_parity** — Robust vs V4 numerical parity | 2 | 2 | | |

+## groupby_regression_evaluator
+
+| Status | Feature | Tests | Inv/Int | Bench | Tag |
+|--------|---------|------:|--------:|-------|-----|
+| ✅ | **EVAL.boundary** — Boundary handling (clamp/nan/extrapolate) | 7 | 7 | | |
+| ☑️ | **EVAL.construction** — Evaluator construction from dfGB | 7 | 0 | | |
+| ✅ | **EVAL.export_roundtrip** — JSON export/import roundtrip | 6 | 6 | | |
+| ✅ | **EVAL.ivar_weighting** — Inverse-variance weighted interpolation | 7 | 7 | | |
+| ☑️ | **EVAL.metadata_construction** — Evaluator construction from metadata | 7 | 0 | | |
+| ✅ | **EVAL.multi_target** — Multi-target evaluation | 3 | 3 | | |
+| ✅ | **EVAL.multilinear** — Multilinear interpolation | 8 | 8 | | |
+| ✅ | **EVAL.nearest** — Nearest-neighbor evaluation | 7 | 6 | | |
+| ✅ | **EVAL.sparse_grid** — Sparse grid handling (invalid bins) | 6 | 6 | | |
+| ✅ | **EVAL.sw_integration** — SW output integration | 2 | 2 | | |
+
 ## groupby_regression_kernels

 | Status | Feature | Tests | Inv/Int | Bench | Tag |
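The new `EVAL.boundary` and `EVAL.multilinear` rows name three boundary policies (clamp/nan/extrapolate). This diff does not show how the evaluator applies them. The sketch below is a one-dimensional illustration of what the policies could mean for linear interpolation between bin centers. `interp_1d` is hypothetical, and the behavior of each mode outside the grid is an assumption. Multilinear interpolation would apply the same idea along each grid dimension.

```python
import numpy as np


def interp_1d(x, centers, values, boundary="clamp"):
    """Linear interpolation between sorted bin centers with a boundary policy.

    Hypothetical sketch: mode names mirror EVAL.boundary, but what each mode
    does outside [centers[0], centers[-1]] is an assumption:
      clamp       -> hold the nearest edge value
      nan         -> return NaN
      extrapolate -> extend the first/last segment linearly
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    values = np.asarray(values, dtype=float)
    if boundary == "clamp":
        return np.interp(x, centers, values)  # np.interp clamps by default
    if boundary == "nan":
        return np.interp(x, centers, values, left=np.nan, right=np.nan)
    if boundary == "extrapolate":
        out = np.interp(x, centers, values)
        lo, hi = x < centers[0], x > centers[-1]
        slope_lo = (values[1] - values[0]) / (centers[1] - centers[0])
        slope_hi = (values[-1] - values[-2]) / (centers[-1] - centers[-2])
        out[lo] = values[0] + slope_lo * (x[lo] - centers[0])
        out[hi] = values[-1] + slope_hi * (x[hi] - centers[-1])
        return out
    raise ValueError(f"unknown boundary mode: {boundary!r}")


centers, values = [0.0, 1.0, 2.0], [0.0, 10.0, 20.0]
for mode in ("clamp", "nan", "extrapolate"):
    print(mode, interp_1d([-1.0, 0.5, 3.0], centers, values, mode))
```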
@@ -139,13 +152,13 @@
 | ✅ | **SW.multi_predictor** — Sliding window multi-predictor | 2 | 1 | | |
 | ☑️ | **SW.multi_target** — Sliding window multi-target | 1 | 0 | | |
 | ☑️ | **SW.omitted_dims** — Omitted window dims default to 0 | 1 | 0 | | |
-| ☑️ | **SW.parallel** — Parallel sliding window (split-column) | 8 | 0 | | NUMBA |
+| ✅ | **SW.parallel** — Parallel sliding window (split-column) | 8 | 3 | | NUMBA |
 | ☑️ | **SW.return_metadata** — Sliding window return_metadata | 1 | 0 | | |
 | ☑️ | **SW.selection** — Sliding window selection mask | 1 | 0 | | |
 | ☑️ | **SW.smoke_gate** — Realistic smoke normalised residuals | 1 | 0 | | |
 | ☑️ | **SW.suffix** — Sliding window output suffix | 1 | 0 | | |
 | ✅ | **SW.v4_parity** — SW window-zero parity with V4 | 2 | 2 | | |
-| ✅ | **SW.v5_dominance** — V5 algorithm dominance across all backends | 2 | 1 | | NUMBA |
+| ✅ | **SW.v5_dominance** — V5 algorithm dominance across all backends | 2 | 2 | | NUMBA |
 | ☑️ | **SW.validation** — Sliding window input validation | 5 | 0 | | |
 | 📋 | **SW.weighted** — Sliding window weighted fits (WLS) | 0 | 0 | | PLANNED |

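`SW.omitted_dims` says that window dimensions left out of the window specification default to 0. The sketch below expands a symmetric window spec into integer neighbor offsets under that rule. It assumes a `{dim: half_width}` mapping, and `neighbor_offsets` is hypothetical. A dimension with half-width 0 contributes only offset 0, so the neighbor count is the product of (2w + 1) over dimensions.

```python
from itertools import product


def neighbor_offsets(dims, window):
    """Expand a {dim: half_width} window into integer neighbor offsets.

    Hypothetical sketch of the SW.omitted_dims rule: any dimension missing
    from `window` gets half-width 0, i.e. no smoothing along that axis.
    """
    unknown = set(window) - set(dims)
    if unknown:
        raise ValueError(f"window names unknown dims: {sorted(unknown)}")
    half_widths = [int(window.get(d, 0)) for d in dims]
    return list(product(*(range(-w, w + 1) for w in half_widths)))


offsets = neighbor_offsets(("x", "y", "z"), {"x": 1, "y": 2})
print(len(offsets))  # 15 == (2*1 + 1) * (2*2 + 1) * (2*0 + 1)
```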
@@ -177,20 +190,8 @@
 | V5.performance | bench_v5.py::timing [MONITOR] | 📊 MONITOR |
 | XVAL.robust_v4_parity | bench_comparison.py::compute_agreement() [MONITOR] | 📊 MONITOR |

-## ⚠️ UNCLASSIFIED Tests (fail-closed → smoke)
-
-- `test_parallel_sliding_window.py::TestParallelCorrectness::test_parallel_matches_serial`
-- `test_parallel_sliding_window.py::TestParallelCorrectness::test_parallel_matches_serial`
-- `test_parallel_sliding_window.py::TestParallelCorrectness::test_parallel_multiple_targets`
-- `test_parallel_sliding_window.py::TestParallelCorrectness::test_parallel_output_columns`
-- `test_parallel_sliding_window.py::TestParallelCorrectness::test_parallel_single_worker`
-- `test_parallel_sliding_window.py::TestParallelEdgeCases::test_empty_dataframe`
-- `test_parallel_sliding_window.py::TestParallelErrorHandling::test_on_error_nan_fills`
-- `test_parallel_sliding_window.py::TestParallelErrorHandling::test_on_error_raise_raises`
-- `test_parallel_sliding_window.py::TestParallelPerformance::test_parallel_speedup`
-
 ---

-*Two-tier verification per Phase 13.8.SW v02 proposal.*
+*Two-tier verification per Phase 13.9.GB v02 proposal.*
 *✅ = invariance/integration test exists. ☑️ = smoke tests only — does not catch numerical regressions.*
 *Verbose SW duplicates (1 files) deduplicated per §3.6.*
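The removed UNCLASSIFIED block and the footer legend describe the fail-closed rule (§3.5):

- A test with no explicit layer counts as smoke.
- A feature is ✅ only if at least one of its tests is an invariance or integration test.

The sketch below implements that rule. It assumes a plain dict from pytest node IDs to layer names. `layer_of` and `feature_status` are hypothetical and do not reproduce `test_layer_classification.py`. The ⚠️ Partial status is not modeled.

```python
VERIFYING_LAYERS = {"invariance", "integration"}


def layer_of(test_id, layer_map):
    """Fail-closed lookup: a test missing from the map counts as 'smoke'."""
    return layer_map.get(test_id, "smoke")


def feature_status(test_ids, layer_map, failing=frozenset()):
    """Two-tier status for one feature from its proof tests (sketch)."""
    if not test_ids:
        return "planned"
    if any(t in failing for t in test_ids):
        return "broken"
    layers = {layer_of(t, layer_map) for t in test_ids}
    return "verified" if layers & VERIFYING_LAYERS else "smoke-only"


mod = "test_parallel_sliding_window.py"
tests = [
    f"{mod}::TestParallelCorrectness::test_parallel_matches_serial",
    f"{mod}::TestParallelPerformance::test_parallel_speedup",
]
print(feature_status(tests, {}))  # smoke-only (nothing classified yet)
print(feature_status(tests, {tests[0]: "invariance", tests[1]: "performance"}))  # verified
```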

UTILS/dfextensions/groupby_regression/docs/PHASE_HISTORY.md

Lines changed: 202 additions & 1 deletion
@@ -24,9 +24,14 @@ The GroupBy Regression module provides high-performance grouped linear regressio
 | **12.14b.GB** | **BF Integration** | **Jan 1, 2026** | ✅ Complete |
 | **12.14b.GB-add** | **Dual Timing + cProfile** | **Jan 1, 2026** | ✅ Complete |
 | **12.14c.GB** | **ADF Visualization** | **Jan 3-4, 2026** | ✅ Complete |
+| **13.7.GB** | **Capability Matrix Infrastructure** | **Feb 7-8, 2026** | ✅ Complete |
+| **13.8.GB** | **SW API Refactor + Invariance Tests** | **Feb 9-10, 2026** | ✅ Complete |
+| **13.9.GB** | **V3 Incremental Algorithm** | **Feb 10, 2026** | ✅ Complete |
+| **13.8.SW** | **Parallel Sliding Window + Benchmarks** | **Feb 12-14, 2026** | ✅ Complete |
 | 12.15.GB | V4 Integration | | 📋 Planned |

-**Current Test Count:** 145 benchmark tests + existing kernel/module suites
+**Current Test Count:** 338 passed, 3 failed (pre-existing), 102 features (25 verified)
+**Capability Matrix:** Phase 13.8.SW — 0 broken, 1 planned (SW.weighted)

 ---

@@ -54,6 +59,143 @@ Phases 12.14.GB through 12.14c.GB address this with:

 ---

+## February 2026 Phases
+
+### Phase 13.8.SW: Parallel Sliding Window + Benchmarks (Feb 12-14, 2026)
+**Tag:** `phase-13.8.SW`
+**Commits:** `b38fd93d`, `b45f69c5`, `7838885c`, `8785fd58`, `73de51f1`, `ef8c89bc`
+**Goal:** Parallel execution for V5 sliding window, benchmark consolidation, user-facing documentation
+
+**Deliverables:**
+
+| # | Component | Description |
+|---|-----------|-------------|
+| D1 | Parallel dispatch | `make_sliding_window_fit_parallel()` with fork() COW |
+| D2 | Zero-pickle fix | Module-level shared state, ~200B/task vs 125MB |
+| D3 | Counting sort | O(N) `_counting_sort_indices()`, 44.8× vs mergesort |
+| D4 | Numba cache fix | `conftest.py` hash-based auto-invalidation |
+| D5 | Parallel tests | 8 tests in `test_parallel_sliding_window.py` |
+| D6 | Parametric benchmark | `bench_slidingwindow_parametric.py` (V1/V2/V3/V5, 11 cost models) |
+| D7 | Parallel benchmark | `bench_slidingwindow_parallel.py` (scaling curve, sort comparison) |
+| D8 | README | `README_sliding_window_benchmark.md` (algorithm guidance + cost formulas) |
+| D9 | Capability matrix | Updated to 102 features, Phase 13.8.SW |
+
+**Performance Results:**
+
+| Metric | Value |
+|--------|-------|
+| V5 speedup vs alternatives | 25–68× (wins 10/10 configs) |
+| Parallel scaling (16 workers, 112M rows) | 3.3× |
+| Counting sort vs mergesort | 44.8× |
+| Pickle overhead eliminated | 155GB → 200 bytes/task |
+| Smoothing overhead at TPC scale | ~10–20% vs noSW |
+
+**TPC Predictions (extrapolated):**
+
+| Scenario | Serial (36×3) | With 10-way parallel |
+|----------|---------------|----------------------|
+| Standard (54K bins, 1K rpb) | ~3.4 min | ~20s |
+| High (54K bins, 2K rpb) | ~6.8 min | ~40s |
+
+**Bug Fixes:**
+
+| Bug | Impact | Fix |
+|-----|--------|-----|
+| `_build_bin_index_map` on V5 path | 85% of V5tot wasted (70 min → 3.4 min) | Skip on V5 path |
+| `executor.submit()` pickled arrays | 125MB × 36 = 155GB serialization | Module globals + fork() COW |
+| Numba stale cache after restructuring | `ModuleNotFoundError` | Hash-based auto-clear |
+
+**Cost Models:** 11 linear models fitted (R² > 0.99 at production scale). User-facing prediction formulas in README with machine-scaling guidance.
+
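The Cost Models line reports linear timing models with R² > 0.99 but does not list their predictors. The sketch below fits one such model by ordinary least squares and computes R². The predictors (N_rows and N_bins × N_nbr, taken from the 13.9.GB complexity goal) are illustrative assumptions. The timings are noise-free synthetic values, not benchmark data. Columns are rescaled before solving because raw row counts and a constant intercept differ by many orders of magnitude.

```python
import numpy as np


def fit_linear_cost(features, times):
    """OLS fit of times ~ intercept + features; returns (coefficients, R²)."""
    X = np.column_stack([np.ones(len(times)), features])
    scale = np.abs(X).max(axis=0)  # column scaling improves conditioning
    coef, *_ = np.linalg.lstsq(X / scale, times, rcond=None)
    beta = coef / scale            # undo the scaling
    residuals = times - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((times - times.mean()) ** 2))
    return beta, 1.0 - ss_res / ss_tot


# Synthetic, noise-free timings from a known formula (NOT measured data):
#   t = 0.01 s + 2e-8 s * N_rows + 5e-8 s * (N_bins * N_nbr)
n_rows = np.array([1e5, 5e5, 1e6, 5e6, 1e7])
bins_x_nbr = np.array([2.7e4, 1.0e5, 4.2e5, 1.4e6, 3.4e6])
t = 0.01 + 2e-8 * n_rows + 5e-8 * bins_x_nbr
beta, r2 = fit_linear_cost(np.column_stack([n_rows, bins_x_nbr]), t)
print(beta, r2)
```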
+**Tests:** 338 passed, 3 failed (pre-existing), 0 errors
+
+**Reviewed by:** Claude14 (coder), cross-review pending
+
+---
+
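Deliverable D3 replaces an O(N log N) sort with an O(N) counting sort over integer bin IDs (`_counting_sort_indices()`). The sketch below returns row indices grouped by bin, stable within each bin, plus per-bin start offsets. `counting_sort_indices` is hypothetical. The plain Python loop is written the way one would JIT it with Numba, not for speed in CPython.

```python
import numpy as np


def counting_sort_indices(bin_ids, n_bins):
    """Group row indices by bin in O(N + n_bins), stable within each bin.

    Returns (order, starts): order[starts[b]:starts[b + 1]] are the rows of
    bin b. The row loop is plain Python for clarity; in practice it is the
    part one would compile (e.g. with numba.njit).
    """
    bin_ids = np.asarray(bin_ids, dtype=np.int64)
    counts = np.bincount(bin_ids, minlength=n_bins)
    starts = np.zeros(n_bins + 1, dtype=np.int64)
    starts[1:] = np.cumsum(counts)       # prefix sums give each bin's slice
    cursor = starts[:-1].copy()          # next free slot in each bin's slice
    order = np.empty(len(bin_ids), dtype=np.int64)
    for row, b in enumerate(bin_ids):    # single pass over the rows
        order[cursor[b]] = row
        cursor[b] += 1
    return order, starts


ids = np.array([2, 0, 2, 1, 0, 2])
order, starts = counting_sort_indices(ids, n_bins=3)
print(order.tolist(), starts.tolist())  # [1, 4, 3, 0, 2, 5] [0, 2, 3, 6]
```

The result matches `np.argsort(ids, kind="stable")`. The counting version avoids comparisons entirely because the bin IDs are small bounded integers.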
+### Phase 13.9.GB: V3 Incremental Algorithm (Feb 10, 2026)
+**Commit:** `bda26ca3`
+**Goal:** Replace O(N_bins × N_nbr × RPB) recompute with O(N_rows + N_bins × N_nbr) incremental
+
+**Key Innovation:** Pre-compute per-bin sufficient statistics (XtX, XtY, n, sum_y, sum_y2) in one pass over raw data, then sum neighbor matrices instead of re-accumulating from rows for each window.
+
+**Performance:**
+
+| Config | V3/V1 speedup |
+|--------|---------------|
+| 10³ grid, W=1 | 4.6× |
+| 15³ grid, W=2 | 5.6× |
+
+**Correctness:** V3 = V1 to machine precision (max diff < 3×10⁻¹⁴)
+
+**Trade-off:** median=NaN (cannot compute from sufficient statistics)
+
+**Tests:** +10 new invariance tests (TestSWV3Parity). Results: 306 passed, 3 failed (pre-existing)
+
+---
+
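The Key Innovation entry describes computing per-bin sufficient statistics in one pass and then summing neighbor matrices for each window. The sketch below is a one-dimensional illustration of that idea. It assumes OLS with a symmetric window truncated at the grid edges, and `bin_stats` and `window_fit` are hypothetical. It keeps only XtX, XtY and n; the entry also lists sum_y and sum_y2 for diagnostics. The check confirms that summed statistics reproduce a direct fit on the pooled rows. The example also shows why the median is unavailable: it cannot be reconstructed from these sums.

```python
import numpy as np


def bin_stats(bin_ids, X, y, n_bins):
    """One pass over the rows: per-bin XtX (p x p), XtY (p) and row count n."""
    p = X.shape[1]
    XtX = np.zeros((n_bins, p, p))
    XtY = np.zeros((n_bins, p))
    # np.add.at accumulates correctly when the same bin index repeats
    np.add.at(XtX, bin_ids, X[:, :, None] * X[:, None, :])
    np.add.at(XtY, bin_ids, X * y[:, None])
    n = np.bincount(bin_ids, minlength=n_bins)
    return XtX, XtY, n


def window_fit(XtX, XtY, center, w):
    """OLS for bin `center` by summing neighbor statistics (1-D, edges truncated)."""
    lo, hi = max(center - w, 0), min(center + w, len(XtX) - 1)
    return np.linalg.solve(XtX[lo:hi + 1].sum(axis=0), XtY[lo:hi + 1].sum(axis=0))


rng = np.random.default_rng(0)
N, n_bins, w = 2000, 10, 1
bin_ids = rng.integers(0, n_bins, N)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = 1.5 + 0.3 * X[:, 1] + 0.01 * bin_ids + rng.normal(scale=0.1, size=N)
XtX, XtY, n = bin_stats(bin_ids, X, y, n_bins)

# Reference: re-accumulate from the raw rows of the window (recompute path)
c = 5
in_window = np.abs(bin_ids - c) <= w
ref, *_ = np.linalg.lstsq(X[in_window], y[in_window], rcond=None)
print(np.allclose(window_fit(XtX, XtY, c, w), ref))  # True
```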
+### Phase 13.8.GB: SW API Refactor + V3b + V3-Numba (Feb 9-10, 2026)
+**Commits:** `2a713f2c`, `6601dc9b`, `bd6f042e`
+**Goal:** Align sliding window API with v4 conventions, add boundary/kernel modes, add Numba incremental solver
+
+**Sub-phases:**
+
+**Action A+B (Feb 9):** 3 bug fixes + invariance tests
+- A.1: Remove duplicate validation block
+- A.2: Fix wrong arg to `_get_neighbor_bins`
+- A.3: Add `res.bse` extraction — new `_err` columns (Bug #6)
+- 13 invariance tests with analytical checks (nsigma recovery, error consistency, pull distribution)
+
+**API Refactor (Feb 10):** v4-aligned `make_sliding_window_fit()`
+- Keyword-only params, v4 naming (`gb_columns`, `linear_columns`, `weights`, `min_stat`)
+- `backend='auto'` (Numba auto-detect), omitted window dims default to 0
+- Feature taxonomy: 96 → 100 features. Verified: 21 → 24
+
+**V3b Boundary + Kernel (Feb 10):**
+- `boundary='full'|'symmetric'|'periodic'`, per-dimension
+- `kernel='uniform'|'gaussian'|'epanechnikov'|'linear'|callable`
+- 15 V3b invariance tests
+
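The V3b `kernel=` option lists uniform, gaussian, epanechnikov, linear and callable kernels but does not give their exact shapes. The sketch below computes per-offset weights under two assumptions: distances are normalized as u = |offset| / (W + 1), and the Gaussian has a width of 0.5 in u units. `kernel_weights` is hypothetical.

```python
import numpy as np


def kernel_weights(offsets, half_width, kernel="uniform"):
    """Per-neighbor weights from the normalized distance u.

    Assumptions (not from the source): u = |offset| / (half_width + 1), so
    the outermost neighbors keep a non-zero weight, and the Gaussian uses
    sigma = 0.5 in units of u. A callable receives u and returns weights.
    """
    u = np.abs(np.asarray(offsets, dtype=float)) / (half_width + 1)
    if callable(kernel):
        return np.asarray(kernel(u), dtype=float)
    if kernel == "uniform":
        return np.ones_like(u)
    if kernel == "gaussian":
        return np.exp(-0.5 * (u / 0.5) ** 2)
    if kernel == "epanechnikov":
        return np.clip(1.0 - u**2, 0.0, None)
    if kernel == "linear":
        return np.clip(1.0 - u, 0.0, None)
    raise ValueError(f"unknown kernel: {kernel!r}")


offsets = np.arange(-2, 3)  # one dimension with W=2
for name in ("uniform", "gaussian", "epanechnikov", "linear"):
    print(name, np.round(kernel_weights(offsets, 2, name), 3))
```

In an incremental scheme, one natural way to apply such weights is to scale each neighbor bin's XtX and XtY by its weight before summing. The text above does not state whether the project does exactly this.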
+**V3-Numba (Feb 10):** Incremental + Cholesky JIT
+- `_get_numba_incremental_kernel`: Cholesky solve + SE + diagnostics
+- Shared neighbor table between accumulation and solve phases
+- Dispatch: `algorithm='incremental', backend='numba'`
+
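The V3-Numba kernel is described as a Cholesky solve plus standard errors. The NumPy sketch below performs those steps using only sufficient statistics:

1. Factor A = XᵀX = LLᵀ and solve the two triangular systems for β.
2. Recover the residual sum of squares as Σy² − 2βᵀb + βᵀAβ.
3. Take SE_j = sqrt(σ² (A⁻¹)_jj), with σ² = RSS / (n − p).

`cholesky_ols` is hypothetical. `np.linalg.solve` does not exploit the triangular structure the way a JIT kernel would.

```python
import numpy as np


def cholesky_ols(A, b, syy, n):
    """Solve A @ beta = b by Cholesky and return (beta, standard errors).

    Inputs are sufficient statistics: A = X^T X, b = X^T y, syy = sum(y**2),
    n = number of rows.
    """
    p = len(b)
    L = np.linalg.cholesky(A)        # A = L @ L.T; LinAlgError if not positive definite
    z = np.linalg.solve(L, b)        # forward substitution: L z = b
    beta = np.linalg.solve(L.T, z)   # back substitution: L^T beta = z
    rss = syy - 2.0 * beta @ b + beta @ A @ beta
    sigma2 = rss / (n - p)
    L_inv = np.linalg.inv(L)
    diag_A_inv = np.sum(L_inv**2, axis=0)   # diag(A^-1) = diag(L^-T L^-1)
    return beta, np.sqrt(sigma2 * diag_A_inv)


rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=500)
beta, se = cholesky_ols(X.T @ X, X.T @ y, y @ y, len(y))
print(beta, se)
```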
+**Tests:** 55 SW tests total (24 new), all passing
+
+**Approved by:** GPT10, GPT11, Claude, Claude-Opus, Claude12 (5/5)
+
+---
+
+### Phase 13.7.GB: Capability Matrix Infrastructure (Feb 7-8, 2026)
+**Commits:** `5279be73`, `692ffeab`, `3bfdcfc5`
+**Goal:** Two-tier quality classification for all features and tests
+
+**Deliverables:**
+
+| Component | Description |
+|-----------|-------------|
+| `feature_taxonomy.py` | 96-feature taxonomy with proof-test references |
+| `test_layer_classification.py` | 291-test classification (invariance/integration/smoke/validation/performance) |
+| `generate_capability_matrix.py` | Auto-generated two-tier capability matrix |
+| `run_tests.sh` | Unified test runner with timestamped logging, reviewer.zip packaging |
+| `conftest.py` | Register feature/layer markers |
+| `pytest.ini` | Strict marker enforcement |
+| `tests/README.md` | Test infrastructure documentation |
+
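The 13.7.GB deliverables register feature/layer markers in `conftest.py` and enforce them strictly through `pytest.ini`. The sketch below uses pytest's `pytest_configure` hook and `config.addinivalue_line("markers", ...)`. The marker names `layer` and `feature` and their arguments are assumptions. With `--strict-markers` enabled (for example via `addopts` in `pytest.ini`), any marker not registered this way is reported as an error.

```python
# conftest.py -- sketch; marker names and arguments are assumptions
LAYERS = ("invariance", "integration", "smoke", "validation", "performance")


def pytest_configure(config):
    """Register custom markers so that --strict-markers accepts them."""
    config.addinivalue_line(
        "markers", "layer(name): test layer, one of " + ", ".join(LAYERS)
    )
    config.addinivalue_line(
        "markers", "feature(feature_id): capability-matrix feature, e.g. 'SW.parallel'"
    )


# Usage in a test module (requires `import pytest` there):
#
#   @pytest.mark.layer("invariance")
#   @pytest.mark.feature("SW.parallel")
#   def test_parallel_matches_serial(): ...
```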
+**Classification Rules:**
+- ✅ Verified: has invariance or integration test
+- ☑️ Smoke-only: smoke tests only — does not catch numerical regressions
+- 🧨 Broken: test failures
+- 📋 Planned: no tests yet
+- Fail-closed (§3.5): unclassified tests default to smoke
+
+**Initial Matrix:** 96 features (21 verified, 75 smoke-only, 0 broken, 0 partial)
+
+**Tests:** 296 passed, 3 failed (pre-existing)
+
+---
+
 ## Recent Phases (Jan 2026)

 ### Phase 12.14c.GB: AliasDataFrame Visualization (Jan 3-4, 2026)
@@ -431,6 +573,15 @@ Two specifications govern Phase 12.14:
 | Always-on cProfile | Historical profiles for bottleneck analysis | 12.14b.GB-add |
 | Skip cProfile for n_jobs > 1 | cProfile only captures main process | 12.14c.GB D6 |
 | wall_time_s optional fallback | Backward compat with pre-12.14b.GB data | 12.14c.GB D1 |
+| Two-tier capability matrix | Verified (inv/int test) vs smoke-only classification | 13.7.GB |
+| Fail-closed rule | Unclassified tests default to smoke, not verified | 13.7.GB |
+| V4-aligned SW API | Keyword-only, gb_columns/linear_columns naming | 13.8.GB |
+| Incremental algorithm (V3) | Pre-compute XtX/XtY per bin, sum neighbors | 13.9.GB |
+| V5 incremental+numba | Best of both: incremental algorithm + JIT kernels | 13.8.SW |
+| Fork() COW dispatch | Module globals + try/finally, not pickle | 13.8.SW |
+| O(N) counting sort | Numba JIT counting sort vs O(N log N) argsort | 13.8.SW |
+| Retain statsmodels | OLS/WLS/GLM/RLM diversity for TPC calibration | 13.8.GB |
+| Always-on benchmark validation | V1 vs V5 diff check catches bugs unit tests miss | 13.8.SW |

 ---

@@ -464,9 +615,20 @@ Each phase follows this workflow:

 | File | Purpose |
 |------|---------|
+| `groupby_regression_sliding_window.py` | Sliding window regression (V1-V5 + parallel) |
 | `groupby_regression_kernels.py` | Shared Numba kernel module |
 | `groupby_regression_optimized.py` | V4/V5 implementations |
+| `tests/test_parallel_sliding_window.py` | 8 parallel correctness tests |
+| `tests/test_invariance_sliding_window.py` | 55 SW invariance tests |
 | `tests/test_groupby_regression_kernels.py` | 28 kernel tests |
+| `tests/feature_taxonomy.py` | 102-feature taxonomy |
+| `tests/test_layer_classification.py` | 281-test layer classification |
+| `scripts/generate_capability_matrix.py` | Capability matrix generator |
+| `tests/conftest.py` | Numba cache invalidation + markers |
+| `run_tests.sh` | Unified test runner + reviewer.zip |
+| `benchmarks/bench_slidingwindow_parametric.py` | Serial V1/V2/V3/V5 benchmark |
+| `benchmarks/bench_slidingwindow_parallel.py` | Parallel scaling benchmark |
+| `benchmarks/README_sliding_window_benchmark.md` | User-facing performance guide |
 | `benchmarks/runner.py` | BF runner with multi-source discovery |
 | `benchmarks/schema.py` | JSON schema + NumpyEncoder |
 | `benchmarks/benchmark_adf.py` | AliasDataFrame adapter |
@@ -488,6 +650,15 @@ Each phase follows this workflow:
 - Update V5 to use shared kernel
 - Handle heterogeneous `linear_columns` in wrapper

+### SW.weighted: Sliding Window Weighted Fits (WLS)
+- Planned feature (marked in capability matrix)
+- Requires extending sufficient statistics to weighted case
+
+### Asymmetric Windows (benchmark coverage)
+- Asymmetric window support exists in code
+- Missing from benchmark parametric sweep
+- Add to `bench_slidingwindow_parametric.py`
+
 ---

 ## Performance Reference
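The SW.weighted entry notes that WLS requires extending the sufficient statistics to the weighted case. In the standard WLS formulation with per-row weights w_i, the per-bin quantities become XᵀWX and XᵀWy, plus Σw. These still add across bins, so neighbor summation carries over unchanged. The sketch below uses hypothetical names and checks the result against a direct WLS fit via sqrt-weight rescaling.

```python
import numpy as np


def weighted_bin_stats(bin_ids, X, y, wts, n_bins):
    """Per-bin X^T W X, X^T W y and sum of weights (sketch for SW.weighted)."""
    p = X.shape[1]
    XtWX = np.zeros((n_bins, p, p))
    XtWy = np.zeros((n_bins, p))
    Xw = X * wts[:, None]                 # each row scaled by its weight
    np.add.at(XtWX, bin_ids, Xw[:, :, None] * X[:, None, :])
    np.add.at(XtWy, bin_ids, Xw * y[:, None])
    sum_w = np.bincount(bin_ids, weights=wts, minlength=n_bins)
    return XtWX, XtWy, sum_w


rng = np.random.default_rng(2)
N, n_bins = 1500, 8
bin_ids = rng.integers(0, n_bins, N)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
wts = rng.uniform(0.5, 2.0, N)           # e.g. inverse variances
y = 0.7 - 1.2 * X[:, 1] + rng.normal(scale=1 / np.sqrt(wts))
A, b, sum_w = weighted_bin_stats(bin_ids, X, y, wts, n_bins)

# Window = bins 3..5: sum the statistics, then solve once
beta = np.linalg.solve(A[3:6].sum(axis=0), b[3:6].sum(axis=0))

# Reference WLS on the pooled rows via sqrt-weight rescaling
m = (bin_ids >= 3) & (bin_ids <= 5)
s = np.sqrt(wts[m])
ref, *_ = np.linalg.lstsq(X[m] * s[:, None], y[m] * s, rcond=None)
print(np.allclose(beta, ref))  # True
```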
@@ -519,6 +690,35 @@ Each phase follows this workflow:
 | CLI commands | 3 (--history, --history-stats, --plot) |
 | Exit codes | 4 (0/1/2/3) |

+### V5 Sliding Window (Phase 13.8.SW, Apple M1 Pro)
+
+| Config | V1 | V2 | V3 | V5tot | V5 Speedup |
+|--------|------|------|------|-------|------------|
+| 10³ W=1 r=10 | 0.114s | 0.093s | 0.127s | 0.005s | 25× |
+| 25³ W=1 r=10 | 1.867s | 1.609s | 2.179s | 0.050s | 44× |
+| 25³ W=2 r=10 | 3.515s | 3.094s | 6.072s | 0.089s | 68× |
+| 25³ W=1 r=50 | 3.377s | 2.832s | 2.702s | 0.086s | 40× |
+
+### Parallel Scaling (Phase 13.8.SW, Linux aarch64, 112M rows)
+
+| Workers | Time | Speedup |
+|---------|------|---------|
+| 1 | 16.6s | 1.0× |
+| 8 | 5.2s | 3.2× |
+| 16 | 5.1s | 3.3× |
+
+### Capability Matrix (Phase 13.8.SW)
+
+| Metric | Value |
+|--------|-------|
+| Total features | 102 |
+| Verified (✅) | 25 (24.5%) |
+| Smoke-only (☑️) | 76 (74.5%) |
+| Broken (🧨) | 0 (0.0%) |
+| Planned (📋) | 1 (1.0%) |
+| Total tests (unique) | 281 |
+| Invariance tests | 41 |
+
 ---

 ## Document History
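In the Parallel Scaling table, Speedup is the 1-worker time divided by the n-worker time. The check below recomputes those figures from the table's times and adds parallel efficiency (speedup / workers) as a derived quantity.

```python
def scaling(times):
    """Speedup T(1)/T(n) and efficiency speedup/n for each worker count."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in times.items()}


# Wall-clock seconds from the Parallel Scaling table (112M rows)
for n, (speedup, eff) in scaling({1: 16.6, 8: 5.2, 16: 5.1}).items():
    print(f"{n:>2} workers: {speedup:.1f}x speedup, {eff:.0%} efficiency")
```

This reproduces the table's 3.2× and 3.3× values and puts parallel efficiency at roughly 40% for 8 workers and 20% for 16 workers.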
@@ -528,3 +728,4 @@ Each phase follows this workflow:
 | 1.0 | Dec 16, 2025 | Initial version |
 | 2.0 | Dec 31, 2025 | Added Phases 12.14.GB, 12.14a.GB, incident analysis |
 | 3.0 | Jan 4, 2026 | Added Phases 12.14b.GB, 12.14b.GB-addendum, 12.14c.GB |
+| 4.0 | Feb 14, 2026 | Added Phases 13.7.GB, 13.8.GB, 13.9.GB, 13.8.SW. Updated capability matrix (102 features), performance reference, key files, technical decisions |
