Feature/wooldridge did by wenddymacro · Pull Request #216 · igerber/diff-diff

wenddymacro · 2026-03-20T07:00:23Z

Summary

Rebase and forge third-party contribution implementing Wooldridge (2025, 2023) Extended Two-Way Fixed Effects estimator
OLS, logit, and Poisson QMLE paths with ASF-based ATT and delta-method SEs
Four aggregation types (simple, group, calendar, event) following Stata jwdid_estat
Fixed QMLE sandwich to use weight_type="aweight" for correct unweighted scores
Fixed control_group="never_treated" to correctly exclude pre-treatment observations from treated units
Fixed nonlinear rank-deficiency handling, bootstrap_weights validation, n_control_units metadata
Added solve_poisson IRLS solver to linalg.py
63 tests covering all paths, edge cases, control group distinction, and methodology checks

Methodology references (required if estimator / math changes)

Method name(s): WooldridgeDiD / Extended Two-Way Fixed Effects (ETWFE)
Paper / source link(s):
- Wooldridge, J. M. (2025). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Empirical Economics, 69(5), 2545–2587. https://doi.org/10.1007/s00181-025-02807-z
- Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66. https://doi.org/10.1093/ectj/utad016
Any intentional deviations from the source (and why):
- Cell-level aggregation weights (n_{g,t}) instead of W2025 Eqs. 7.2-7.4 cohort-share weights — matches Stata jwdid_estat (documented in REGISTRY.md)
- QMLE cluster-robust small-sample adjustment uses (G/(G-1))*(n-1)/(n-k) vs Stata's G/(G-1) only — conservative, inflates SEs slightly (documented in REGISTRY.md and TODO.md)
- R's etwfe uses fixest for nonlinear paths; this uses direct QMLE via compute_robust_vcov (documented in REGISTRY.md)

Validation

Tests added/updated: tests/test_wooldridge.py (63 tests), tests/test_linalg.py (4 solve_poisson tests)
Backtest / simulation / notebook evidence (if applicable): docs/tutorials/16_wooldridge_etwfe.ipynb
Paper review: docs/methodology/papers/wooldridge-etwfe-review.md

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber · 2026-03-21T17:03:05Z

Thanks for this contribution — the Wooldridge ETWFE estimator fills a real gap in the library. The nonlinear outcome support (logit/Poisson via the ASF approach) is something we've had on our roadmap, and this is the right paper to implement it from.

I've done an independent review of Wooldridge (2023) against your implementation. The REGISTRY.md section is solid overall, but there are items that need to be addressed before we can move forward.

Files to remove
These should not be part of the PR:

uv.lock — we don't use uv
docs/superpowers/CHECKPOINT.md — implementation scaffold
docs/superpowers/plans/2026-03-18-wooldridge-did.md — same
docs/superpowers/specs/2026-03-18-wooldridge-did-design.md — same

PR template

Please fill in the methodology references, validation, and security fields in the PR description.

Methodology notes from paper review

I did an independent review of Wooldridge (2023) and have a few items to flag:

Delta-method gradient verification. The paper does not write out the explicit gradient for the nonlinear ASF-based ATTs (it cites Wooldridge 2010, problem 12.17). Your implementation hand-codes this gradient. Please add a numerical gradient check (finite differences) as a test to verify correctness — this is the most error-prone part of the nonlinear path and the whole value proposition of this estimator depends on it.
Aggregation weights. The paper describes aggregation conceptually (Section 3.1, p. C50) but doesn't give explicit weight formulas with equation numbers. Please confirm your weight definitions match Stata jwdid_estat and document the exact formulas in the REGISTRY.md section.
Covariate centring. Wooldridge centres covariates at cohort means (X̂_{ig} = X_i - X̄_g, p. C48) in the treatment interaction terms. This affects interpretation of δ_{gs} on the index scale. Please verify this is implemented correctly.
Proposition 3.1 equivalence. The paper proves that imputation and pooled estimation produce identical ATTs when using the canonical link function. This is a testable property - please add a test that verifies this equivalence numerically (e.g., compare imputation-based ATTs to pooled ATTs for logit/Poisson on simulated data).

Validation
We need at least one numerical benchmark against Stata jwdid or R etwfe — verifying that ATT estimates match to a reasonable tolerance. See benchmarks/R/ for examples of how we validate other estimators.

Additionally, a test comparing OLS ETWFE ATT(g,t) to CallawaySantAnna ATT(g,t) on the same dataset would be valuable, since they should be approximately equivalent under linear PT with never-treated controls.

Code conventions

The nonlinear paths (logit/Poisson) build their own sandwich vcov manually. Our convention is to route all inference through compute_robust_vcov in linalg.py - see CLAUDE.md under "Unified linalg.py backend". Please refactor to use that where possible.
All inference fields (t_stat, p_value, conf_int) must be computed together using safe_inference() from diff_diff.utils. Please verify this is the case throughout.
The Poisson FE path (Section 3.3 of the paper) does not suffer from incidental parameters - this is a unique property of the exponential mean. If your implementation uses unit dummies for Poisson, this is fine, but please document the distinction from the logit path where unit dummies would cause incidental parameters problems.

Happy to answer questions on any of this. This is a meaningful contribution and we want to make sure it lands well.

wenddymacro · 2026-03-25T02:51:07Z

Methodology

Primary reference:
Wooldridge, J. M. (2025). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Empirical Economics, 69(5),
2545–2587.
(Published version of NBER Working Paper 29154.)

Secondary reference:
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66.

Application reference:
Nagengast, A. J., Rios-Avila, F., & Yotov, Y. V. (2026). The European single market and intra-EU trade: an assessment with heterogeneity-robust difference-in-differences methods. Economica, 93(369), 298–331.
(Empirical application of ETWFE; co-authored by the jwdid package author.)

Callaway-Sant'Anna equivalence (Proposition 3.1):
OLS ETWFE ATT(g,t) matches CallawaySantAnna ATT(g,t) on mpdta to within 5e-3,
verified in test_ols_etwfe_att_matches_callaway_santanna.

Validation

tests/test_wooldridge.py::TestMethodologyCorrectness::test_ols_etwfe_att_ma tches_callaway_santanna
confirms OLS ETWFE ATT(g,t) ≈ CallawaySantAnna ATT(g,t) on mpdta (7 matched
(g,t) cells, max absolute difference 1.7e-3).
benchmarks/python/benchmark_wooldridge.py added for timing and accuracy
benchmarking.

Security

No network calls, no subprocess execution, no user-controlled eval paths
in any
new code. All computation is local NumPy/pandas/scipy only.

…vcov backend, add tests - git rm uv.lock + 3 docs/superpowers/ files; extend .gitignore - Extend compute_robust_vcov with optional weights for QMLE sandwich vcov (logit: w_i = p_i(1-p_i), Poisson: w_i = mu_i); Rust path bypassed - Replace manual vcov loops in logit/Poisson paths with compute_robust_vcov - Expose _gradient in group_time_effects for delta-method transparency - Add 3 tests to TestComputeRobustVcov (weighted sandwich, bypass, null-weight parity) - Add 3 tests to TestMethodologyCorrectness (logit gradient, Poisson gradient, OLS ETWFE ≡ CallawaySantAnna Proposition 3.1 equivalence) - Document aggregation weights formula + deviation-from-R note in REGISTRY.md - Add benchmark_wooldridge.py and register in run_benchmarks.py

Rebase and forge third-party contribution (wenddymacro) implementing Wooldridge (2025, 2023) Extended Two-Way Fixed Effects estimator. - OLS, logit, and Poisson QMLE paths with ASF-based ATT - Delta-method SEs for nonlinear models, cluster-robust sandwich - Four aggregation types (simple, group, calendar, event) - Fixed QMLE sandwich to use weight_type="aweight" for correct unweighted scores in bread computation - Added solve_poisson IRLS solver to linalg.py - Complete documentation: REGISTRY, README, CHANGELOG, tutorial - 60 tests covering all paths, edge cases, and methodology checks Co-Authored-By: wenddymacro <50739376+wenddymacro@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… P2s P0: Fix _filter_sample so control_group="never_treated" excludes pre-treatment observations from treated units. Previously both settings produced identical samples (treated_mask=cohort>0 kept all treated-unit obs regardless). Now never_treated restricts to post-treatment obs from treated units + all never-treated obs. P1: Add NaN checks in logit/Poisson ASF loops to skip cells whose interaction coefficients were dropped due to rank deficiency. Zero out NaN entries in beta before vcov computation. P2a: Validate bootstrap_weights in __init__ — invalid values now raise ValueError instead of silently falling through to Mammen. P2b: Fix n_control_units metadata to count not-yet-treated units when control_group="not_yet_treated" (was counting only cohort==0). P3a: Change bootstrap loop to return_vcov=False (only coefs used). P3b: Soften "matches Stata exactly" claims in tutorial to "follows the jwdid specification" with registry deviation note. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:51:52Z

/ai-review

github-actions · 2026-04-04T15:07:39Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cfe0e5c1aac241714f914f296fa3920ba6cdbf94

Overall Assessment
⚠️ Needs changes

Executive Summary

control_group="never_treated" is not implemented as the registry/reference ETWFE specification: the PR trims treated pre-periods and only builds t >= g - anticipation cells, so it cannot deliver the pre-treatment/placebo effects that the registry and official etwfe docs associate with the never control choice. diff_diff/wooldridge.py:L70, diff_diff/wooldridge.py:L105, docs/methodology/REGISTRY.md:L1140, tests/test_wooldridge.py:L486. citeturn1search0turn1search1turn2search0
fit() has no upfront identification checks for missing treated cohorts / missing requested control groups, so invalid designs can fall through into estimation instead of failing fast. diff_diff/wooldridge.py:L290, diff_diff/wooldridge.py:L391, diff_diff/staggered.py:L1556. citeturn1search1turn2search0
The Poisson path ignores rank_deficient_action; collinear nonlinear designs do not get the same drop/warn/error semantics as OLS/logit. diff_diff/wooldridge.py:L690, diff_diff/linalg.py:L2347
n_bootstrap, bootstrap_weights, and seed are public estimator parameters but are silently ignored for method="logit" and method="poisson". diff_diff/wooldridge.py:L210, diff_diff/wooldridge.py:L347, diff_diff/wooldridge.py:L494
The QMLE small-sample factor, cell-count aggregation weights, canonical-link warning gap, and missing Stata goldens are already documented/tracked, so I did not count those against approval. docs/methodology/REGISTRY.md:L1114, TODO.md:L75

Methodology

Severity P1. Impact: the never_treated path is an undocumented estimator change. _filter_sample() deletes treated pre-treatment observations, and _build_interaction_matrix() only creates post/anticipation cells, so aggregate("event") under never_treated cannot recover the placebo leads that the registry and official etwfe docs expect from that control choice. The test suite currently locks in the opposite behavior by asserting all(k >= 0) for never_treated. Concrete fix: keep treated histories in-sample for the reference ETWFE path, implement the never reference/control logic instead of sample deletion, and replace the all(k >= 0) test with a parity test that expects negative event times when the design supports them. diff_diff/wooldridge.py:L70, diff_diff/wooldridge.py:L105, docs/methodology/REGISTRY.md:L1148, tests/test_wooldridge.py:L486. citeturn1search0turn1search1turn2search0
Severity P1. Impact: there is no assumption check that the requested control design is identified. fit() proceeds even if there are no treated cohorts, no never-treated units under never_treated, or no valid comparison cells. That is a missing identification guard on a new estimator, and other estimators in this library already fail fast here. Concrete fix: after sample construction, raise on len(groups) == 0; under never_treated, require at least one never-treated unit; under not_yet_treated, require at least one valid untreated comparison cohort/period. Add tests for all-never-treated, all-treated single-cohort, and never_treated with zero never-treated units. diff_diff/wooldridge.py:L290, diff_diff/wooldridge.py:L391, diff_diff/staggered.py:L1556. citeturn1search1turn2search0

Code Quality

Severity P1. Impact: the Poisson path does not honor rank_deficient_action. _fit_poisson() calls solve_poisson() without any rank handling, and solve_poisson() regularizes the Hessian with 1e-12 * I / silently breaks on LinAlgError rather than using the library’s established drop/warn/error contract. On collinear nonlinear designs, that can mean late vcov failures or unstable unidentified ATT/SEs. Concrete fix: add the same QR-based rank detection / dropped-column expansion used by solve_ols() and solve_logit(), thread rank_deficient_action through _fit_poisson(), and add regression tests for all-eventually-treated and fully-treated-calendar-time Poisson panels. diff_diff/wooldridge.py:L724, diff_diff/linalg.py:L2347, tests/test_wooldridge.py:L652
Severity P1. Impact: n_bootstrap, bootstrap_weights, and seed are part of the public estimator API and get_params(), but only _fit_ols() uses them. For logit/Poisson, nonzero bootstrap requests are silently ignored and users still get analytic SEs. That is incomplete parameter propagation on an inference parameter. Concrete fix: either reject bootstrap settings when method != "ols" with a clear ValueError, or implement nonlinear bootstrap support and test the supported behavior explicitly. diff_diff/wooldridge.py:L210, diff_diff/wooldridge.py:L267, diff_diff/wooldridge.py:L347, diff_diff/wooldridge.py:L494, tests/test_wooldridge.py:L455

Performance
No material findings.

Maintainability
No standalone findings beyond the method-coupling issue already called out in diff_diff/wooldridge.py:L70.

Tech Debt

Severity P3. Impact: the QMLE aweight small-sample adjustment, cell-count aggregation weights, canonical-link warning gap, and missing Stata golden-value tests are already documented in the Methodology Registry / TODO.md, so they are mitigated and should not block this PR. Concrete fix: none required for approval; keep them tracked as written. docs/methodology/REGISTRY.md:L1119, docs/methodology/REGISTRY.md:L1128, TODO.md:L75

Security
No findings.

Documentation/Tests

Severity P2. Impact: the docs describe a different estimator than the code. The tutorial says default OLS uses additive cohort/time FE and that t < g cells are available as placebo checks, while the implementation uses unit+time FE for OLS and only constructs t >= g - anticipation cells. The API page also says the implementation matches Stata exactly even though the registry/TODO already record deviations. Concrete fix: rewrite the OLS tutorial section to match the actual design, explain when negative event times exist, and replace “exactly” with “with documented deviations in REGISTRY.md”; update the tests so never_treated behavior is checked against the intended reference behavior instead of hard-coding k >= 0. docs/tutorials/16_wooldridge_etwfe.ipynb:L69, docs/tutorials/16_wooldridge_etwfe.ipynb:L100, docs/api/wooldridge_etwfe.rst:L4, docs/api/wooldridge_etwfe.rst:L13, tests/test_wooldridge.py:L486. citeturn1search0
Execution note: I could not run the test suite in this sandbox because numpy and pandas are not installed.

Path to Approval

Rework control_group="never_treated" so it matches the intended ETWFE/reference behavior, and add a regression test that negative event times appear when the design supports placebo leads.
Add prefit identification checks for missing treated cohorts, missing never-treated controls, and missing valid untreated comparison cells.
Implement rank-deficiency handling for Poisson that honors rank_deficient_action and add collinearity tests for nonlinear paths.
Either reject bootstrap parameters for nonlinear methods or implement nonlinear bootstrap inference and test it.

…son rank handling Finding 1 (P1): Rewrite never_treated implementation. Instead of deleting pre-treatment observations (wrong), keep all data in-sample and control the comparison pool via the interaction matrix. For never_treated, _build_interaction_matrix creates ALL (g,t) pairs including pre-treatment cells as placebo indicators, so only never-treated units remain in the baseline. Pre-treatment coefficients serve as placebo/pre-trend tests and aggregate("event") now produces negative event times. Finding 2 (P1): Add identification checks in fit() — raise ValueError for no treated cohorts, no never-treated units with never_treated control, and empty interaction matrix. Finding 3 (P1): Add rank_deficient_action to solve_poisson matching solve_logit/solve_ols pattern. Uses QR-based rank detection, drops collinear columns, expands beta with NaN. Warns on singular Hessian instead of silent break. Finding 4 (P1): Reject bootstrap params for nonlinear methods — n_bootstrap > 0 with method != "ols" raises ValueError. Finding 5 (P2): Fix tutorial to accurately describe OLS FE structure (unit+time via within-transformation, not additive cohort+time). Also: skip NaN coefficients in OLS/logit/Poisson ATT extraction, fix vcov submatrix to only include identified cells. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T15:42:50Z

/ai-review

github-actions · 2026-04-04T15:50:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6446bacd379ed694ada44044c08fb05da121b6ad

Overall Assessment
⚠️ Needs changes

Executive Summary

The prior never_treated methodology issue appears addressed: treated histories are now kept in-sample and placebo leads are supported for that control choice.
The prior nonlinear-bootstrap propagation issue is also addressed: n_bootstrap > 0 is now explicitly rejected for method="logit" and method="poisson".
One P1 remains on identification: control_group="not_yet_treated" still has no fail-fast check for designs with zero untreated comparison support, so invalid ETWFE designs can fall through to estimation instead of raising a clear identification error.
One P1 remains on nonlinear inference: rank-deficient logit/Poisson designs drop columns in the solver, but the QMLE vcov is still computed on the original singular matrix, so rank_deficient_action="warn"/"silent" is not actually honored.
The documented QMLE small-sample factor, cell-count aggregation weights, canonical-link warning gap, and missing Stata goldens are properly tracked in REGISTRY.md / TODO.md; I did not count those against approval.

Methodology

Severity P1. Impact: WooldridgeDiD still lacks an explicit identification check for control_group="not_yet_treated" when no untreated comparison cells exist. fit() now rejects “no treated cohorts” and never_treated without never-treated units, but it does not verify that any untreated donor observations remain for the requested ATT cells. In single-cohort/all-treated designs, the OLS path can therefore fall through to all-dropped treatment columns and NaN overall output instead of a clear identification error. Concrete fix: after sample construction, explicitly verify that each estimable post/anticipation cell has at least one untreated comparison observation under the selected control-group rule, and raise ValueError otherwise; add regression tests for all-treated single-cohort and zero-comparison designs. diff_diff/wooldridge.py:L332-L361, diff_diff/wooldridge.py:L70-L140, diff_diff/linalg.py:L835-L881, tests/test_wooldridge.py:L787-L804, docs/methodology/REGISTRY.md:L1140-L1148

Code Quality

Severity P1. Impact: the prior rank-deficiency fix is only partial for nonlinear ETWFE. solve_logit() and solve_poisson() can drop collinear columns, but _fit_logit() / _fit_poisson() still pass the original singular design into compute_robust_vcov(). Because compute_robust_vcov() explicitly solves the full bread matrix, any rank-deficient logit/Poisson design will still error, so rank_deficient_action="warn"/"silent" is not honored on the inference path. Concrete fix: carry kept_cols out of the nonlinear solvers, compute the QMLE sandwich on the reduced design, expand the vcov back with NaNs, and form cell/aggregate gradients on that reduced parameter space; add logit/Poisson regression tests with dropped treatment cells. diff_diff/wooldridge.py:L621-L644, diff_diff/wooldridge.py:L767-L783, diff_diff/linalg.py:L1035-L1115, diff_diff/linalg.py:L1240-L1420, diff_diff/linalg.py:L2368-L2465

Performance

Severity P3. Impact: the benchmark’s total_seconds excludes results.aggregate("event") because the timer context ends before aggregation, so the JSON reports identical estimation and total time for a two-step workflow. Concrete fix: move aggregation inside the timed block or time it separately and report both stages explicitly. benchmarks/python/benchmark_wooldridge.py:L71-L84, benchmarks/python/utils.py:L111-L125

Maintainability

No standalone findings beyond the items above.

Tech Debt

Severity P3. Impact: the remaining QMLE small-sample-factor deviation, cell-count aggregation weights, missing canonical-link warning, and missing Stata golden tests are already documented/tracked, so they should not block this PR. Concrete fix: none required for approval; keep them tracked as written. docs/methodology/REGISTRY.md:L1114-L1128, TODO.md:L75-L78

Security

No findings.

Documentation/Tests

Severity P2. Impact: the public docs still overclaim Stata parity and misdescribe the OLS estimator. The API page says the implementation matches jwdid “exactly”, and the tutorial opening/summary describe OLS as additive cohort+time FEs, while the registry documents explicit Stata/R deviations and the code uses unit+time FE absorption for OLS. Concrete fix: change the wording to “matches jwdid with documented deviations in REGISTRY.md” and describe OLS as unit+time FE plus cohort×time treatment interactions. docs/api/wooldridge_etwfe.rst:L4-L14, docs/tutorials/16_wooldridge_etwfe.ipynb:L7-L7, docs/tutorials/16_wooldridge_etwfe.ipynb:L470-L473, diff_diff/wooldridge.py:L450-L475, docs/methodology/REGISTRY.md:L1119-L1128
Severity P2. Impact: the tutorial notebook does not run end-to-end as written. The comparison plot cell reads r_etwfe.event_study_effects after only r_etwfe.aggregate('simple'), so event_study_effects is still None. Concrete fix: call r_etwfe.aggregate('event') before using the event-study output and add a notebook smoke test. docs/tutorials/16_wooldridge_etwfe.ipynb:L399-L405, docs/tutorials/16_wooldridge_etwfe.ipynb:L426-L437

Execution note: I could not run pytest in this sandbox because the Python environment here does not have numpy / pandas installed.

Path to Approval

Add a fail-fast identification guard for control_group="not_yet_treated" when no untreated comparison cells exist, and cover it with tests for all-treated single-cohort / zero-comparison designs.
Rework nonlinear rank-deficiency handling so the logit/Poisson vcov and delta-method gradient path uses the reduced design rather than the original singular matrix, and add regression tests for dropped treatment cells in both nonlinear methods.

…ication checks, docs P1: Compute nonlinear QMLE vcov on reduced design when columns are dropped. Previously vcov was computed on the full singular matrix, causing errors. Now logit and Poisson paths extract kept columns, compute vcov on the reduced design, and expand back with NaN. P1: Add not_yet_treated identification check — raises ValueError when no untreated comparison observations exist (all units treated at all observed times). P2: Fix tutorial notebook — call aggregate('event') before using event_study_effects in comparison plot cell. P2: Replace last "exactly" claim in API docs with "follows the jwdid specification" + registry deviation reference. Fix flaky TestAllEventuallyTreated assertion — relax ATT > 0 to isfinite (simulated data with rank deficiency can produce negative ATT for individual cells despite positive true effect). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T16:34:02Z

/ai-review

github-actions · 2026-04-04T16:42:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 168c2d35f93dc0252c31679259d901bb8dcda27c

Overall Assessment
⚠️ Needs changes

Executive Summary

The prior never_treated control-group defect and nonlinear-bootstrap propagation issue appear fixed.
The prior nonlinear rank-deficiency finding is only partially resolved: the vcov moved to the reduced design, but the delta-method gradients did not, so dropped nuisance columns can still poison nonlinear ATT SEs.
Severity P1 [Newly identified]: the OLS multiplier bootstrap ignores cluster= and always perturbs at the unit level, so bootstrap inference is wrong whenever clustering is set above the unit.
I do not keep the earlier not_yet_treated zero-support OLS concern open; the PR now explicitly drops unsupported cells and tests that behavior.
Public docs still overstate Stata parity and misdescribe the OLS fixed-effects structure; this is non-blocking.
The new benchmark still underreports total_seconds because aggregation happens outside the timer.

Methodology

Severity P1. Impact: the previous nonlinear rank-deficiency issue is only partially fixed. solve_logit() / solve_poisson() still expand dropped coefficients back to NaN (diff_diff/linalg.py:L1402-L1418, diff_diff/linalg.py:L2462-L2466), and _fit_logit() / _fit_poisson() now compute the sandwich on the reduced design but still build ATT gradients in the full parameter space (diff_diff/wooldridge.py:L638-L699, diff_diff/wooldridge.py:L795-L856). If the dropped column is a nuisance dummy/covariate rather than the target interaction, grad @ vcov_full @ grad propagates NaN into otherwise estimable ATT and aggregate SEs, which does not satisfy the Registry’s delta-method SE requirement for nonlinear ATT estimation (docs/methodology/REGISTRY.md:L1114-L1118). Concrete fix: keep kept_beta through the delta-method path, compute grad_reduced against vcov_reduced, and only expand for presentation; add logit/Poisson regressions with duplicated nuisance columns.
Severity P1 [Newly identified]. Impact: the OLS bootstrap path uses the requested cluster variable for analytic vcov (diff_diff/wooldridge.py:L472-L483) but ignores it for multiplier bootstrap, where weights are always drawn over unit (diff_diff/wooldridge.py:L546-L575). When cluster != unit, the reported bootstrap overall SE, CI, and p-value are computed under the wrong cluster structure. Concrete fix: generate one multiplier per unique cluster_ids, expand it back to observations, and add a regression test with a coarser cluster column plus n_bootstrap > 0.

Code Quality

No standalone findings beyond the inference bugs above.

Performance

Severity P3. Impact: benchmark_wooldridge.py records total_seconds after results.aggregate("event"), but outside the timer, so estimation time and total time are identical and aggregation cost is omitted (benchmarks/python/benchmark_wooldridge.py:L71-L84). Concrete fix: include aggregation inside the timed block, or time estimation and aggregation separately.

Maintainability

No standalone findings beyond the methodology items.

Tech Debt

Severity P3. Impact: the remaining QMLE small-sample-factor deviation, cell-count aggregation weights, canonical-link warning gap, and missing Stata goldens are already documented/tracked, so I did not count them against approval (TODO.md:L79-L82, docs/methodology/REGISTRY.md:L1119-L1128). Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P2. Impact: the public docs still say the estimator is “equivalent” to Stata jwdid and describe the OLS path as additive cohort+time fixed effects (docs/api/wooldridge_etwfe.rst:L4-L19, docs/tutorials/16_wooldridge_etwfe.ipynb:L7, docs/tutorials/16_wooldridge_etwfe.ipynb:L454), but the implementation actually absorbs unit+time FE (diff_diff/wooldridge.py:L458-L476) and the Registry already documents nontrivial SE/aggregation deviations (docs/methodology/REGISTRY.md:L1119-L1128, docs/methodology/REGISTRY.md:L1152-L1158). Concrete fix: change the wording to “matches jwdid with documented deviations in REGISTRY.md” and describe OLS as unit+time FE plus cohort×time treatment interactions.
Severity P2. Impact: the new tests do not cover either blocking inference branch above. Bootstrap coverage only exercises the default unit-cluster path (tests/test_wooldridge.py:L455-L473), and the nonlinear tests only cover full-rank happy paths (tests/test_wooldridge.py:L386-L452, tests/test_wooldridge.py:L537-L578). Concrete fix: add one OLS bootstrap regression with cluster != unit, and one duplicated-covariate/dummy regression for each nonlinear method asserting finite SEs for estimable ATT cells under rank_deficient_action="silent".

Path to Approval

Make the OLS multiplier bootstrap generate weights at the effective analytic cluster level (cluster if set, else unit) and add regression coverage for cluster != unit.
Move nonlinear delta-method SE calculations to the reduced parameter space used by the solver/vcov, and add rank-deficient logit/Poisson tests with duplicated nuisance columns.

Execution note: I did not run pytest in this sandbox because the available Python environment here does not have numpy / pandas installed.

…laims P1-A: Compute nonlinear delta-method gradients in reduced parameter space when rank-deficient. Previously gradients were in full space but vcov had NaN for dropped columns, causing NaN propagation into estimable ATT SEs. Now grad_reduced @ vcov_reduced @ grad_reduced for both logit and Poisson. P1-B: OLS bootstrap now draws multiplier weights at the analytic cluster level (cluster_ids), not always at the unit level. Previously ignored the cluster parameter, producing wrong bootstrap SEs for coarser clustering. P2: Fix remaining "equivalent to jwdid" claim in API docs and "additive cohort + time FEs" in tutorial intro/summary. Now accurately describes unit + time FE via within-transformation with documented deviations. Also: update gradient tests to check SE finiteness directly (internal _gradient key removed from public effects dict). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T17:25:00Z

/ai-review

github-actions · 2026-04-04T17:31:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2f9667ccc61b604348057df8bfa613f9a90d52f2

Overall Assessment

⚠️ Needs changes

Executive Summary

The two prior implementation defects appear fixed in code: the OLS multiplier bootstrap now draws weights at the effective analytic cluster in diff_diff/wooldridge.py:546, and the nonlinear delta-method now projects gradients onto the kept-parameter space before SE propagation in diff_diff/wooldridge.py:638 and diff_diff/wooldridge.py:802.
Cross-checking against Wooldridge’s ETWFE/nonlinear DiD papers and the jwdid write-up, I did not find an undocumented methodology mismatch in the changed estimator logic; the remaining QMLE small-sample-factor and cell-weight differences are explicitly documented in the Registry/TODO and are non-blocking. (link.springer.com)
The PR still does not add regression coverage for either previously blocking edge case. Bootstrap tests only exercise the default unit-cluster path in tests/test_wooldridge.py:455, and nonlinear tests only cover full-rank happy paths in tests/test_wooldridge.py:537 and tests/test_wooldridge.py:554.
Under this repo’s review checklist, those untested new parameter/inference interactions remain P1.
Non-blockers remain in the benchmark timer and a few lingering “matches Stata” doc phrasings.

Methodology

Severity P3 informational. Impact: The estimator now matches the documented ETWFE structure I would expect from the sources: saturated cohort×time OLS with unit/time FE, ASF-based nonlinear ATT for logit/Poisson, never_treated placebo leads, and post-treatment-only simple/group aggregation. The QMLE aweight small-sample factor and cell-count aggregation weights are explicitly documented deviations in docs/methodology/REGISTRY.md:1116, docs/methodology/REGISTRY.md:1122, and TODO.md:79. Concrete fix: none for approval. (link.springer.com)

Code Quality

No findings beyond the test/doc items below.

Performance

Severity P3. Impact: total_seconds still excludes results.aggregate("event"), because timing stops at the end of the with Timer() block and the script then re-reads the same elapsed value; the benchmark therefore reports estimation time twice in benchmarks/python/benchmark_wooldridge.py:71 and benchmarks/python/benchmark_wooldridge.py:84. Concrete fix: either time estimation and aggregation separately or move aggregate("event") inside the timed block.

Maintainability

No findings beyond the missing regression coverage.

Tech Debt

Severity P3 informational. Impact: the remaining known limitations are properly tracked: QMLE small-sample-factor parity, cell-count aggregation weights, missing canonical-link warning, and missing Stata goldens in TODO.md:79, TODO.md:80, TODO.md:81, and TODO.md:82. Concrete fix: none for this PR.

Security

No findings.

Documentation/Tests

Severity P1. Impact: the bootstrap fix is unprotected by regression tests. TestBootstrap only instantiates WooldridgeDiD(n_bootstrap=...) with default unit clustering in tests/test_wooldridge.py:455, so the newly changed cluster × bootstrap path in diff_diff/wooldridge.py:546 is never exercised. Per the repo’s edge-case checklist, an untested new parameter interaction with inference is still a needs-changes item. Concrete fix: add a regression test with a coarser cluster column and n_bootstrap > 0 that specifically exercises clustered multiplier weights.
Severity P1. Impact: the reduced-space nonlinear SE fix is also unprotected. The only nonlinear SE tests are full-rank mpdta happy paths in tests/test_wooldridge.py:537 and tests/test_wooldridge.py:554; there is no duplicated-nuisance logit/Poisson case proving that dropped non-ATT columns no longer poison finite ATT SEs in diff_diff/wooldridge.py:638 and diff_diff/wooldridge.py:802. Concrete fix: add one logit and one Poisson test with a duplicated nuisance dummy/covariate and rank_deficient_action="silent" that assert finite SEs for estimable ATT cells.
Severity P3. Impact: the API examples still use unconditional “matches Stata jwdid” wording in docs/api/wooldridge_etwfe.rst:72, docs/api/wooldridge_etwfe.rst:94, and docs/api/wooldridge_etwfe.rst:106. That overstates parity given the documented SE/aggregation deviations in docs/methodology/REGISTRY.md:1116 and docs/methodology/REGISTRY.md:1122. Concrete fix: change those headings to “follows the jwdid specification” or limit the claim to model structure / point estimates.

Path to Approval

Add a cluster != unit multiplier-bootstrap regression test for WooldridgeDiD.
Add rank-deficient logit and Poisson regression tests with duplicated nuisance columns to lock in the reduced-space delta-method fix.

Execution note: I could not run pytest in this sandbox because numpy and pytest are not installed, so this is a static review.

…ciency P1 test coverage for previously blocking fixes: - TestBootstrapClusterLevel: bootstrap with cluster != unit exercises the clustered multiplier weight path, asserts different SE from unit-level - TestNonlinearRankDeficiency: logit and Poisson with duplicated nuisance covariate (rank_deficient_action="silent") assert finite SEs for estimable ATT cells after dropped columns P3: Replace remaining "matches Stata" with "follows Stata" in API docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T17:36:21Z

/ai-review

github-actions · 2026-04-04T17:50:23Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 940739d134a554e8fbee1c0acc8053b78118f9c5

Overall Assessment

⚠️ Needs changes

Executive Summary

The two previously blocking items from the last review look addressed: cluster-level bootstrap coverage is now present at tests/test_wooldridge.py#L825, and nonlinear rank-deficiency SE coverage was added at tests/test_wooldridge.py#L855 and tests/test_wooldridge.py#L880.
The core ETWFE / nonlinear DiD implementation is otherwise aligned with Wooldridge’s linear ETWFE framework and his pooled nonlinear DiD formulation, and the remaining QMLE small-sample / aggregation-weight differences are explicitly documented in the Registry/TODO, so those are non-blocking. (link.springer.com)
Severity P1: anticipation is only partially propagated. Treatment cells honor g - anticipation at diff_diff/wooldridge.py#L136 and the Registry documents that behavior at docs/methodology/REGISTRY.md#L1147, but the not_yet_treated identification guard still uses cohort > time at diff_diff/wooldridge.py#L350. That can let anticipation-window pseudo-controls satisfy the “has untreated comparisons” check.
Anticipation-specific test coverage still stops at raw cell construction in tests/test_wooldridge.py#L614; there is no regression for anticipation-aware no-control rejection or for the intended aggregate semantics under anticipation > 0.
No security issues found. The benchmark timer still underreports total_seconds in benchmarks/python/benchmark_wooldridge.py#L71 and benchmarks/python/benchmark_wooldridge.py#L84.

Methodology

The linear ETWFE structure and the pooled nonlinear logit/Poisson framing are consistent with the cited Wooldridge sources. (link.springer.com)

Severity P1. Impact: anticipation is a public estimator parameter, but the changed code only uses it when building treatment cells (diff_diff/wooldridge.py#L136). The not_yet_treated availability check remains hard-coded to cohort > time (diff_diff/wooldridge.py#L350), and control-unit metadata also ignores the anticipation window (diff_diff/wooldridge.py#L441). Because the Registry defines anticipation-adjusted treatment cells but does not document any “cell-only” restriction for this parameter (docs/methodology/REGISTRY.md#L1147), this is an undocumented partial propagation / missing assumption-check gap. Concrete fix: make the not_yet_treated guard and control-count logic anticipation-aware, carry anticipation through the results object if downstream summaries depend on it, and add a regression where all remaining pseudo-controls lie inside the anticipation window and fit() must fail clearly.

Code Quality

No additional findings beyond the anticipation propagation issue above.

Performance

Severity P3. Impact: total_seconds is still just the elapsed time from the estimation-only timer; results.aggregate("event") happens after the with Timer() block, but benchmarks/python/benchmark_wooldridge.py#L84 reuses the old value. The benchmark therefore reports estimation time twice instead of estimation-plus-aggregation time. Concrete fix: move results.aggregate("event") inside the timed block, or time aggregation separately and report both numbers.

Maintainability

No additional findings.

Tech Debt

Severity P3 informational. Impact: the remaining ETWFE limitations are properly tracked already: QMLE small-sample-factor parity, cell-count aggregation weights, canonical-link warning, and missing Stata goldens at TODO.md#L80, TODO.md#L81, TODO.md#L82, and TODO.md#L83. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: anticipation coverage only asserts that extra (g,t) cells exist (tests/test_wooldridge.py#L614); it does not lock down anticipation-aware identification or downstream summaries, which is why the P1 path above remains unprotected. Concrete fix: add one regression that expects a clear error when only anticipation-window pseudo-controls remain, and one regression that fixes the intended overall_att / aggregate("group") / aggregate("calendar") behavior under anticipation > 0.

Path to Approval

Make the not_yet_treated control-availability check anticipation-aware in diff_diff/wooldridge.py#L348, and align control-unit metadata in diff_diff/wooldridge.py#L436 with the same rule.
Add anticipation regressions covering both failure and success paths: one dataset with no valid controls after applying the anticipation window that must raise, and one dataset that locks in the intended aggregate semantics for overall_att, group, and calendar.

Execution note: this was a static review only; pytest, numpy, pandas, and scipy are not installed in this sandbox.

…adata Make not_yet_treated identification guard anticipation-aware: check (cohort - anticipation) > time instead of cohort > time. Also fix _count_control_units to use anticipation-adjusted threshold. Add regression tests: - test_anticipation_aware_identification_rejects_pseudo_controls: single-cohort design where anticipation=1 consumes all controls - test_anticipation_aggregate_semantics: verify aggregations produce finite results with anticipation > 0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T18:23:03Z

/ai-review

github-actions · 2026-04-04T18:32:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 01dc4eac906e64c52608d209c8c978253ba14d78

Overall Assessment

⛔ Blocker

The core ETWFE regression structure and the nonlinear ASF/QMLE framing are broadly consistent with Wooldridge’s published linear and nonlinear DiD treatments, and the PR’s stated QMLE small-sample and cell-weight differences are already documented locally, so the remaining blockers are implementation defects rather than source-material disputes. (academic.oup.com)

Executive Summary

The prior re-review blocker around anticipation-aware no-control rejection/control-counting looks addressed in diff_diff/wooldridge.py#L348, diff_diff/wooldridge.py#L438, with regression coverage at tests/test_wooldridge.py#L631.
Severity P0 [Newly identified]: the Poisson rank-deficiency path can silently turn dropped treatment-interaction coefficients into zero ATT cells and then average them into overall_att; this is a silent statistical correctness bug in diff_diff/wooldridge.py#L808, diff_diff/wooldridge.py#L836, diff_diff/wooldridge.py#L874, and diff_diff/wooldridge.py#L897.
Severity P1 [Newly identified]: anticipation affects cell construction and identification, but simple/group/calendar aggregation still hardcodes t >= g, and the results object has no anticipation metadata; that is an undocumented methodology deviation in diff_diff/wooldridge.py#L47, diff_diff/wooldridge_results.py#L48, diff_diff/wooldridge_results.py#L112, and diff_diff/wooldridge_results.py#L130.
The Registry/TODO tracking for QMLE small-sample adjustment, cell-count aggregation weights, canonical-link warnings, and missing Stata goldens is in place, so those items are non-blocking at this stage: docs/methodology/REGISTRY.md#L1114, TODO.md#L80.
The benchmark timing bug from the last review is still present: total_seconds does not include aggregation in benchmarks/python/benchmark_wooldridge.py#L71 and benchmarks/python/benchmark_wooldridge.py#L84.

Methodology

Affected methods in this diff: WooldridgeDiD ETWFE OLS/logit/Poisson estimation, ASF-based ATT, delta-method SEs, and simple/group/calendar/event aggregation.

Severity P0 [Newly identified]. Impact: the Poisson path cleans NaN coefficients to 0.0 at diff_diff/wooldridge.py#L808 before slicing treatment-interaction coefficients at diff_diff/wooldridge.py#L836. That means the dropped-column guard at diff_diff/wooldridge.py#L853 no longer skips dropped interaction cells, so unidentified (g,t) effects get stored as zero/zero-SE cells at diff_diff/wooldridge.py#L874 and can be averaged into overall_att at diff_diff/wooldridge.py#L897. This is a silent correctness bug whenever a treatment-interaction column is dropped. Concrete fix: mirror the logit path by preserving the raw interaction slice before np.where(..., 0.0), skipping any cell whose interaction coefficient is NaN, and excluding those cells from _gt_weights/aggregation.
Severity P1 [Newly identified]. Impact: anticipation changes the treated-cell support in diff_diff/wooldridge.py#L136 and the identification guard in diff_diff/wooldridge.py#L350, but overall_att still uses t >= g in diff_diff/wooldridge.py#L47, and aggregate("group") / aggregate("calendar") do the same in diff_diff/wooldridge_results.py#L112 and diff_diff/wooldridge_results.py#L130. The results object also omits anticipation entirely at diff_diff/wooldridge_results.py#L48, so downstream summaries cannot express anticipation-aware semantics. The Registry documents anticipation as expanding treatment cells to t >= g - anticipation at docs/methodology/REGISTRY.md#L1147, but there is no **Note:** documenting a different aggregation rule. Concrete fix: either propagate anticipation into WooldridgeDiDResults and use t >= g - anticipation wherever post-treatment support is defined, or explicitly document the alternative rule in REGISTRY.md and lock it with tests.

Code Quality

No separate findings beyond the methodology defects above.

Performance

Severity P3. Impact: results.aggregate("event") runs after the timer context exits, but total_seconds is still read from the estimation-only timer in benchmarks/python/benchmark_wooldridge.py#L83 and benchmarks/python/benchmark_wooldridge.py#L124, so the benchmark underreports end-to-end time. Concrete fix: move aggregation inside the timed block or report estimation and aggregation on separate timers.

Maintainability

No separate findings.

Tech Debt

Severity P3 informational. Impact: the remaining ETWFE limitations are already tracked in TODO.md: QMLE small-sample-factor parity at TODO.md#L80, cell-count aggregation weights at TODO.md#L81, canonical-link warnings at TODO.md#L82, and missing Stata golden tests at TODO.md#L83. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new anticipation regression at tests/test_wooldridge.py#L647 only checks that outputs are finite, so it would not catch the aggregation-semantics mismatch above; the nonlinear rank-deficiency coverage at tests/test_wooldridge.py#L889 and tests/test_wooldridge.py#L914 only duplicates nuisance covariates and never forces a dropped treatment-interaction column, so it misses the Poisson P0 path. Concrete fix: add one anticipation test with known expected overall_att / group / calendar values under anticipation > 0, and one Poisson test that forces an interaction-column drop and asserts the affected cell is omitted/NaN, not att=0.0.

Path to Approval

Fix Poisson dropped-cell handling in diff_diff/wooldridge.py#L804 by preserving the raw coefficient vector, skipping dropped interaction cells, and adding a regression test that forces treatment-interaction rank deficiency.
Resolve the anticipation aggregation contract. Either:
implement anticipation-aware post-treatment filters in overall_att, group, and calendar and carry anticipation into WooldridgeDiDResults; or
document the current t >= g aggregation rule with a **Note:** in docs/methodology/REGISTRY.md and add tests that explicitly assert that behavior.

Execution note: static review only; numpy and pytest are unavailable in this sandbox.

…tics P0: Poisson path zeroed NaN coefficients BEFORE checking for dropped interaction cells, causing dropped cells to silently become att=0 and get averaged into overall_att. Fix: preserve raw interaction coefficients before NaN->0 zeroing and check against the raw values in the ASF loop. P1: Document that simple/group/calendar aggregation uses t >= g as the post-treatment threshold regardless of anticipation setting. Anticipation cells are estimated but treated as pre-treatment placebos in aggregation. Add REGISTRY **Note:** and test asserting overall_att equals manually computed post-treatment-only weighted average. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T18:39:22Z

/ai-review

github-actions · 2026-04-04T18:50:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8a70087929ea099be607364a5225900d263de88d

Overall Assessment

⚠️ Needs changes

The prior Poisson dropped-cell blocker looks addressed in diff_diff/wooldridge.py:L806, and the earlier anticipation aggregation-threshold concern is now explicitly documented in docs/methodology/REGISTRY.md:L1148. But two unmitigated P1 issues remain in the new ETWFE API/assumption handling.

Executive Summary

The previous Poisson rank-deficiency blocker appears fixed: the Poisson path now preserves raw interaction coefficients before NaN -> 0 cleanup and skips dropped treatment cells during ASF construction in diff_diff/wooldridge.py:L806.
The previous anticipation aggregation mismatch is now mitigated by documentation: the Registry explicitly says simple/group/calendar aggregation uses t >= g even when anticipation > 0 in docs/methodology/REGISTRY.md:L1148.
Severity P1: anticipation still is not propagated into WooldridgeDiDResults, so event-study outputs cannot distinguish true placebo leads from anticipation effects, and summary('event') labels every negative k as [pre] in diff_diff/wooldridge_results.py:L238.
Severity P1 [Newly identified]: fit() never validates that cohort is time-invariant within unit, even though the estimator is defined on unit-level first-treatment cohorts in diff_diff/wooldridge.py:L315.
Severity P3: the new benchmark still underreports total_seconds because aggregation runs after the timed block in benchmarks/python/benchmark_wooldridge.py:L71 and benchmarks/python/benchmark_wooldridge.py:L83.

Methodology

Affected methods: WooldridgeDiD OLS/logit/Poisson ETWFE estimation, ASF ATT(g,t), and WooldridgeDiDResults event/simple/group/calendar aggregation.

Severity P1. Impact: the Registry now defines anticipation-window cells as estimated but distinct from post-treatment ATT in docs/methodology/REGISTRY.md:L1147 and docs/methodology/REGISTRY.md:L1148, but none of the three fit paths propagate anticipation into the results object in diff_diff/wooldridge.py:L530, diff_diff/wooldridge.py:L750, and diff_diff/wooldridge.py:L919. As a result, diff_diff/wooldridge_results.py:L238 marks all negative event times as [pre], which is semantically wrong for k in [-anticipation, -1]. Concrete fix: add an anticipation field to WooldridgeDiDResults, pass it from all fit paths, and make event-study rendering/export distinguish true pre-trend placebos (k < -anticipation) from anticipation effects.
Severity P1 [Newly identified]. Impact: cohort is documented as a first-treatment-period variable in diff_diff/wooldridge.py:L315, but fit() immediately normalizes NaNs and proceeds to sample construction in diff_diff/wooldridge.py:L321 without checking that cohort is constant within each unit. If callers pass a row-varying treatment-timing column instead of a true G_i, the estimator silently builds an invalid cohort design and can return wrong ATT(g,t) values. Concrete fix: before _filter_sample(), enforce df.groupby(unit)[cohort].nunique() == 1 with a clear error message and example offending unit.

Code Quality

No separate findings beyond the methodology issues above.

Performance

Severity P3. Impact: total_seconds is currently just the estimation timer, because results.aggregate("event") executes after the Timer() context exits in benchmarks/python/benchmark_wooldridge.py:L71 and benchmarks/python/benchmark_wooldridge.py:L83. The benchmark therefore understates end-to-end runtime. Concrete fix: move aggregation inside the timed block or report estimation and aggregation on separate timers.

Maintainability

No separate findings.

Tech Debt

Severity P3 informational. Impact: the conservative QMLE small-sample factor, cell-count aggregation weights, canonical-link warning gap, and missing Stata goldens are properly tracked in TODO.md:L80, TODO.md:L81, TODO.md:L82, and TODO.md:L83, with matching methodology notes in docs/methodology/REGISTRY.md:L1120, docs/methodology/REGISTRY.md:L1128, and docs/methodology/REGISTRY.md:L1148. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new nonlinear rank-deficiency coverage in tests/test_wooldridge.py:L899 only duplicates nuisance covariates in tests/test_wooldridge.py:L903 and tests/test_wooldridge.py:L928; it still does not force a dropped treatment-interaction column, so the exact prior Poisson failure mode is not locked by a regression test. Concrete fix: add a Poisson case that makes one (g,t) interaction column collinear and assert that the affected cell is omitted rather than recorded as att=0.

Path to Approval

Propagate anticipation into WooldridgeDiDResults from all three fit paths and make summary('event')/exports distinguish anticipation effects from true pre-treatment placebo leads.
Add a unit-level validation that cohort is time-invariant within unit, and fail fast on violations with a clear error.
Add regression tests for both fixes: one anticipation=1 event-study output test and one non-constant-cohort validation test.

Execution note: static review only. pytest and even numpy are unavailable in this sandbox, so I could not run the new test suite.

P1: Validate cohort is time-invariant within unit before fitting. Raises ValueError with the offending unit if cohort varies across time periods for any unit. P1: Propagate anticipation into WooldridgeDiDResults. Event study summary now labels anticipation-window cells as [antic] (not [pre]). True pre-treatment placebo leads (k < -anticipation) still labeled [pre]. Document t >= g aggregation rule in REGISTRY **Note:**. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T19:05:49Z

/ai-review

github-actions · 2026-04-04T19:13:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7fa18fef4691f421157c1ffb2e23743b95982338

Overall Assessment

⚠️ Needs changes

The prior P1s from the last AI review appear resolved, but one untracked P1 remains in the new nonlinear ETWFE API: the logit/Poisson paths still accept outcomes outside the model’s support and can therefore return ATT/SE output for invalid specifications.

Executive Summary

The previous anticipation propagation and cohort-invariance blockers look fixed: fit() now rejects row-varying cohort within unit in diff_diff/wooldridge.py:L324-L333, and event summaries now distinguish true pre-periods from anticipation periods via the carried anticipation metadata in diff_diff/wooldridge_results.py:L55-L56 and diff_diff/wooldridge_results.py:L237-L245.
The earlier Poisson dropped-cell issue also looks addressed: raw interaction coefficients are preserved before NaN -> 0 cleanup, and dropped treatment cells are skipped in ASF construction in diff_diff/wooldridge.py:L819-L823 and diff_diff/wooldridge.py:L868-L871.
Severity P1 [Newly identified]: the nonlinear paths do not validate outcome support. WooldridgeDiD advertises logit for binary/fractional outcomes and Poisson for count/non-negative outcomes in diff_diff/wooldridge.py:L197-L199 and docs/api/wooldridge_etwfe.rst:L18-L21, but fit() passes y straight into the solvers in diff_diff/wooldridge.py:L642-L650 and diff_diff/wooldridge.py:L813-L817. solve_logit() only validates weights in diff_diff/linalg.py:L1195-L1207, and solve_poisson() documents non-negative outcomes in diff_diff/linalg.py:L2383-L2385 but never checks them in diff_diff/linalg.py:L2400-L2434. A user can accidentally run Poisson on negative outcomes or logit on values outside [0,1] and get silent ATT/SE output from an unsupported model.
Severity P3: the new benchmark still understates end-to-end runtime because total_seconds is captured after the timer closes, while results.aggregate("event") runs outside the timed block in benchmarks/python/benchmark_wooldridge.py:L71-L84.
Severity P3: the new rank-deficiency tests only duplicate nuisance covariates in tests/test_wooldridge.py:L903-L947, so the exact dropped-treatment-interaction failure mode fixed in the Poisson path is still not regression-locked.

Methodology

Severity P1 [Newly identified]. Impact: the nonlinear ETWFE implementation is missing a core assumption check for outcome support. The public contract says logit is for binary/fractional outcomes and Poisson is for count/non-negative outcomes in diff_diff/wooldridge.py:L197-L199, docs/api/wooldridge_etwfe.rst:L18-L21, and diff_diff/linalg.py:L2383-L2385, but neither nonlinear path enforces those domains before estimation in diff_diff/wooldridge.py:L642-L650, diff_diff/wooldridge.py:L813-L817, diff_diff/linalg.py:L1195-L1207, and diff_diff/linalg.py:L2400-L2434. The tracked TODO note about canonical-link warnings in TODO.md:L82-L82 does not mitigate this: that note is about valid-but-suboptimal method choice, not accepting out-of-support outcomes. Concrete fix: before dispatching to _fit_logit() / _fit_poisson(), reject non-finite outcomes; require 0 <= y <= 1 for logit; require y >= 0 for Poisson; and add regression tests for those failures.

Code Quality

Severity P3. Impact: rank_deficient_action is validated for logit in diff_diff/linalg.py:L1209-L1215, but not for the new Poisson path. Combined with the lack of constructor-level validation in diff_diff/wooldridge.py:L238-L251, a typo like "warnn" silently behaves like "silent" on rank-deficient Poisson fits in diff_diff/linalg.py:L2403-L2417. Concrete fix: validate rank_deficient_action in WooldridgeDiD.__init__ and/or mirror the solve_logit() valid_actions check inside solve_poisson().

Performance

Severity P3. Impact: benchmark_wooldridge.py reports total_seconds as the same timer used for fit(), even though event aggregation happens afterward in benchmarks/python/benchmark_wooldridge.py:L71-L84. The benchmark therefore understates end-to-end ETWFE runtime. Concrete fix: either move results.aggregate("event") inside the timed block or report separate estimation_seconds and aggregation_seconds.

Maintainability

No findings.

Tech Debt

Severity P3 informational. Impact: the key documented ETWFE deviations and deferred items are properly tracked and should not count against approval: the conservative QMLE small-sample factor, cell-count aggregation weights, the canonical-link warning gap, and missing Stata goldens are all recorded in TODO.md:L80-L83, with matching methodology notes in docs/methodology/REGISTRY.md:L1114-L1120, docs/methodology/REGISTRY.md:L1122-L1128, and docs/methodology/REGISTRY.md:L1147-L1148. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the added nonlinear rank-deficiency coverage in tests/test_wooldridge.py:L903-L947 only duplicates nuisance covariates. It still does not force a dropped (g,t) treatment-interaction column, so the exact prior Poisson failure mode fixed in diff_diff/wooldridge.py:L819-L823 and diff_diff/wooldridge.py:L868-L871 is not pinned by a regression test. Concrete fix: add a logit/Poisson case where one treatment interaction is collinear and assert that the corresponding cell is omitted from group_time_effects rather than returned with att=0.

Path to Approval

Add explicit nonlinear outcome-support validation before fitting: logit should reject non-finite or out-of-range outcomes (y < 0 or y > 1), and poisson should reject non-finite or negative outcomes.
Add regression tests for both invalid-input failures, plus one dropped-treatment-interaction test that forces a collinear (g,t) column and asserts the cell is omitted rather than estimated as zero.

Static review only. This sandbox lacks pytest and even numpy, so I could not execute the added test suite or run numeric spot checks.

…ation P1: Validate outcome support before nonlinear fitting. Logit rejects y outside [0, 1], Poisson rejects y < 0, both reject non-finite values. Previously out-of-support outcomes were silently accepted. P3: Validate rank_deficient_action in solve_poisson (matching solve_logit/solve_ols pattern). Typos like "warnn" now raise ValueError instead of silently behaving like "silent". Tests: TestOutcomeValidation with logit out-of-range and Poisson negative outcome rejection tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T19:17:05Z

/ai-review

github-actions · 2026-04-04T19:30:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 56aad1f7d644457b96157d49ae4351b6389abca0

Overall Assessment

⚠️ Needs changes

The prior nonlinear-outcome-support blocker from the last AI review appears resolved, but two unmitigated P1s remain in the new ETWFE implementation: the public covariate-adjusted design does not match the documented/source ETWFE specification, and the new sklearn-style set_params() path can silently switch estimator semantics on invalid values.

Executive Summary

The previous P1 from the last review looks fixed: logit now rejects outcomes outside [0, 1] and Poisson rejects negative outcomes in diff_diff/wooldridge.py:L642-L649 and diff_diff/wooldridge.py:L820-L827, with regression coverage in tests/test_wooldridge.py:L988-L1017.
Severity P1 [Newly identified]: covariate-adjusted ETWFE is not implementing the interacted covariate structure documented in the Methodology Registry and source material. _prepare_covariates() only emits raw covariates plus optional cohort interactions, and fit() appends them once to the design in diff_diff/wooldridge.py:L147-L185 and diff_diff/wooldridge.py:L385-L403. That does not match the registered “covariates interacted with cohort×time indicators” requirement in docs/methodology/REGISTRY.md:L1096-L1100. citeturn2view0turn0view0
Severity P1 [Newly identified]: set_params() bypasses validation in diff_diff/wooldridge.py:L288-L294, so typos can silently reroute estimation to the Poisson path, the not-yet-treated control logic, or Mammen bootstrap weights via the open-ended else branches at diff_diff/wooldridge.py:L91-L95, diff_diff/wooldridge.py:L405-L443, and diff_diff/wooldridge.py:L571-L585.
The PR’s declared SE/aggregation deviations are properly documented and tracked: the conservative aweight QMLE small-sample factor, cell-count aggregation weights, canonical-link warning gap, and missing Stata goldens are in docs/methodology/REGISTRY.md:L1114-L1128 and TODO.md:L80-L83, so they are informational rather than blockers.
P3 only: the benchmark still underreports total_seconds, and the new rank-deficiency tests still do not pin the exact dropped-treatment-interaction failure mode.

Methodology

Severity P1 [Newly identified]. Impact: Methods affected: OLS, logit, and Poisson whenever any of exovar, xtvar, or xgvar is supplied. The registry says ETWFE covariates should be part of the cohort/time-specific treatment design in docs/methodology/REGISTRY.md:L1096-L1100 and docs/methodology/REGISTRY.md:L1135-L1138, but _prepare_covariates() only produces raw covariates and cohort interactions in diff_diff/wooldridge.py:L147-L185, and fit() just concatenates those columns once to X_int in diff_diff/wooldridge.py:L385-L403. That means demean_covariates=True is only demeaning a main-effect block, not implementing the documented ATT-parameterized covariate design, so covariate-adjusted ATT(g,t) can come from a different model than the one advertised. citeturn2view0turn0view0
Concrete fix: either implement the full documented covariate design end-to-end, or reject exovar/xtvar/xgvar at fit time until that methodology is actually supported. At minimum, add parity tests for one covariate-adjusted OLS case and one nonlinear case against a manually constructed interacted design or a Stata/R reference.

Code Quality

Severity P1 [Newly identified]. Impact: set_params() is advertised as sklearn-compatible in diff_diff/wooldridge.py:L288-L294 but does not re-run the constructor checks from diff_diff/wooldridge.py:L239-L250. Because downstream dispatch uses catch-all else branches, invalid values silently change estimator behavior: method="probit" falls into _fit_poisson() in diff_diff/wooldridge.py:L405-L443, invalid control_group falls into not-yet-treated logic in diff_diff/wooldridge.py:L91-L95, and invalid bootstrap_weights falls into Mammen weights in diff_diff/wooldridge.py:L571-L585. That is silent semantic drift on a public API.
Concrete fix: factor the existing constructor checks into a shared _validate_params() and call it from __init__, set_params(), and the start of fit(). Add regression tests that invalid method, control_group, anticipation, and bootstrap_weights all raise.

Performance

Severity P3. Impact: benchmark_wooldridge.py stores estimation_seconds after the timer closes, then calls results.aggregate("event"), but still reports total_seconds = estimation_timer.elapsed in benchmarks/python/benchmark_wooldridge.py:L71-L84. The benchmark therefore understates end-to-end runtime and duplicates the estimation timing.
Concrete fix: either move aggregation inside the timed block or report separate aggregation_seconds and total_seconds.

Maintainability

No findings.

Tech Debt

Severity P3 informational. Impact: the PR’s declared ETWFE deviations and deferred items are properly tracked, so they should not count against approval: conservative QMLE cluster adjustment, cell-level n_{g,t} aggregation weights, canonical-link warning gap, and missing Stata goldens are all documented in docs/methodology/REGISTRY.md:L1114-L1128 and TODO.md:L80-L83.
Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new nonlinear rank-deficiency tests in tests/test_wooldridge.py:L903-L951 only duplicate nuisance covariates. They still do not force a dropped (g,t) treatment-interaction column, so the exact omission logic added in diff_diff/wooldridge.py:L833-L885 is not regression-locked.
Concrete fix: add a logit/Poisson test where one treatment-interaction column is made collinear and assert that the affected cell is omitted from group_time_effects rather than returned with att=0.

Path to Approval

Implement the documented covariate-adjusted ETWFE design for exovar/xtvar/xgvar across OLS, logit, and Poisson, or explicitly reject those arguments until the correct interacted specification is supported.
Add covariate-adjusted parity tests that compare ATT(g,t) against a manually constructed interacted design or an external reference implementation, so the covariate specification is locked.
Add shared parameter validation for set_params() and fit() so invalid method, control_group, anticipation, and bootstrap_weights raise instead of silently selecting Poisson/not-yet-treated/Mammen.

Static review only. The sandbox here does not have numpy, pandas, or pytest, so I could not execute the added test suite.

P1: Implement treatment × demeaned-covariate interactions (W2025 Eq. 5.3). Covariates are now included as both main effects and moderating interactions with each (g,t) treatment cell indicator. Demeaning uses within-cohort means. This allows ATT to vary with covariates within cells, matching the documented ETWFE specification. P1: Re-run parameter validation in set_params() after setattr. Previously invalid values for method/control_group/bootstrap_weights/anticipation silently took effect via catch-all else branches. Updated REGISTRY to document interacted covariate specification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T19:55:46Z

/ai-review

github-actions · 2026-04-04T20:01:37Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f25ea09027a7e5ffa0640ba42dee13e6d6dab70a

Overall Assessment

⛔ Blocker

Executive Summary

Re-review status: the prior set_params() silent-rerouting P1 appears resolved. set_params() now revalidates method, control_group, anticipation, and bootstrap_weights in diff_diff/wooldridge.py:288.
The previously flagged covariate-design gap is only partially addressed. The PR now adds treatment × demeaned-covariate interactions to the ETWFE design, consistent with the updated registry note in docs/methodology/REGISTRY.md:1134.
Severity P0: covariate-adjusted nonlinear ETWFE is still incorrect. In logit and Poisson, the ASF counterfactual turns off only the scalar cell effect δ_{g,t} even though the fitted model now also includes cell-specific treatment × covariate terms. That yields wrong ATT(g,t), wrong aggregate ATT, and wrong delta-method SEs whenever exovar, xtvar, or xgvar is supplied.
The PR’s declared SE and aggregation deviations are properly documented/tracked, so they remain informational rather than blockers: QMLE small-sample adjustment, n_{g,t} aggregation weights, canonical-link warning gap, and missing Stata goldens are covered in docs/methodology/REGISTRY.md:1119, docs/methodology/REGISTRY.md:1128, and TODO.md:80.
The benchmark still underreports end-to-end runtime because total_seconds excludes the post-fit aggregation work in benchmarks/python/benchmark_wooldridge.py:71 and benchmarks/python/benchmark_wooldridge.py:83.

Methodology

Severity P0. Impact: Methods affected: WooldridgeDiD(method="logit") and WooldridgeDiD(method="poisson") whenever exovar, xtvar, or xgvar is passed. fit() now builds treatment × demeaned-covariate interactions for all methods in diff_diff/wooldridge.py:412 and diff_diff/wooldridge.py:437, and the registry documents that interacted design in docs/methodology/REGISTRY.md:1138. But the nonlinear ASF implementation still constructs the counterfactual as eta_0 = eta_base - delta and computes gradients as if only δ_{g,t} were treatment-specific in diff_diff/wooldridge.py:752, diff_diff/wooldridge.py:756, diff_diff/wooldridge.py:762, diff_diff/wooldridge.py:932, diff_diff/wooldridge.py:933, and diff_diff/wooldridge.py:941. That no longer matches the registry’s nonlinear ATT definition in docs/methodology/REGISTRY.md:1104, because once the fitted treatment effect includes cell-specific covariate terms, switching treatment off must zero the full active treatment block, not just the scalar cell intercept. The current code silently leaves the treatment × covariate contribution in the “untreated” predictor, so both point estimates and SEs are wrong with no warning. Concrete fix: either reject exovar/xtvar/xgvar for nonlinear methods until the ASF math is implemented correctly, or compute eta_0 and the delta-method gradients using the full set of active (g,t) treatment columns for each observation (cell dummy plus its cell × covariate interaction columns).

Code Quality

No unmitigated findings. The prior set_params() validation issue appears fixed in diff_diff/wooldridge.py:288.

Performance

Severity P3. Impact: the new benchmark reports total_seconds = estimation_timer.elapsed after results.aggregate("event"), so total_seconds duplicates estimation-only timing and understates end-to-end runtime in benchmarks/python/benchmark_wooldridge.py:71, benchmarks/python/benchmark_wooldridge.py:80, and benchmarks/python/benchmark_wooldridge.py:83. Concrete fix: either include aggregation inside the timed block or report separate aggregation_seconds and total_seconds.

Maintainability

No findings.

Tech Debt

Severity P3 informational. Impact: the PR’s declared methodological deviations are properly documented and/or tracked, so they should not count against approval: QMLE aweight small-sample adjustment, n_{g,t} aggregation weights, canonical-link warning gap, and missing Stata golden tests are covered in docs/methodology/REGISTRY.md:1119, docs/methodology/REGISTRY.md:1120, docs/methodology/REGISTRY.md:1128, TODO.md:80, TODO.md:81, TODO.md:82, and TODO.md:83. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new tests cover nonlinear paths without covariates and rank-deficiency cases with duplicated nuisance covariates, but they never assert correctness of covariate-adjusted nonlinear ATT or delta-method gradients in tests/test_wooldridge.py:386, tests/test_wooldridge.py:424, tests/test_wooldridge.py:680, tests/test_wooldridge.py:919, and tests/test_wooldridge.py:944. That gap is why the blocker above is not regression-locked. Concrete fix: add one logit and one Poisson test with a known nonzero treatment × covariate effect and assert ATT(g,t) / overall ATT against a manually computed ASF or external reference.

Path to Approval

Fix covariate-adjusted nonlinear ATT computation in _fit_logit() and _fit_poisson(): when exovar, xtvar, or xgvar is present, turning treatment off for cell (g,t) must zero the full active treatment block, not just δ_{g,t}.
Update the delta-method gradients and aggregate SE calculations to treat those cell × covariate coefficients as treatment-side parameters in the counterfactual, rather than generic baseline coefficients.
Add regression tests for covariate-adjusted logit and Poisson ETWFE with heterogeneous treatment effects by covariate, and assert both point estimates and SEs against a manual ASF calculation or trusted external reference.

Static review only. The sandbox here does not have numpy, pandas, or pytest, so I could not execute the added test suite.

P0: Reject exovar/xtvar/xgvar for method="logit" and method="poisson" with NotImplementedError. The nonlinear ASF counterfactual computation only subtracts the scalar cell effect but must zero the full treatment block (cell indicator + cell × covariate interactions) to be correct. OLS covariate-adjusted ETWFE remains fully supported. P1 (prior round): set_params() now re-validates all constrained params after setattr, preventing silent dispatch to wrong paths via typos. Updated rank-deficiency tests to not use covariates for nonlinear paths. Added test_nonlinear_covariates_rejected asserting NotImplementedError. Added TODO for full nonlinear covariate ASF implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T20:05:59Z

/ai-review

github-actions · 2026-04-04T20:18:27Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 84fc6ed67adf8e5141f85b79f3ce7907367ae7e4

Overall Assessment

⛔ Blocker

Executive Summary

Re-review status: the prior nonlinear-covariate blocker is resolved. logit/poisson now reject exovar/xtvar/xgvar explicitly in diff_diff/wooldridge.py:358, and the deferred implementation is tracked in TODO.md:84.
Severity P0 [Newly identified]: WooldridgeDiD(method="ols") silently mis-estimates unbalanced panels because the new OLS path uses one-pass two-way demeaning instead of exact FE absorption. citeturn0view0turn0view1
The remaining issues are non-blocking: summary() hardcodes a 95% CI label, the new benchmark understates end-to-end runtime, and the new tests/docs still do not lock down the unbalanced-panel OLS case.

Methodology

Severity P0 [Newly identified]. Impact: WooldridgeDiD._fit_ols() always calls diff_diff/wooldridge.py:539, and the unweighted branch of within_transform() uses the one-pass y_it - \bar y_i - \bar y_t + \bar y formula in diff_diff/utils.py:1884. The project’s own methodology notes already say one-pass demeaning is exact only for balanced panels in docs/methodology/REGISTRY.md:871. Wooldridge’s published ETWFE discussion and the jwdid notes both treat unbalanced panels separately, so this is not a harmless implementation choice. On any unbalanced estimation sample, ATT(g,t), aggregate ATT, and their SEs can be wrong with no warning. Concrete fix: use exact FE absorption for unbalanced OLS samples (iterative alternating projections or explicit unit/time dummies), or reject unbalanced panels for method="ols" until that is implemented. citeturn0view0turn0view1
Severity P3 informational. Impact: the prior nonlinear ASF/covariate blocker is now mitigated correctly. The code raises NotImplementedError in diff_diff/wooldridge.py:365, there is a regression test in tests/test_wooldridge.py:943, and the limitation is tracked in TODO.md:84. Concrete fix: none required for approval.

Code Quality

Severity P3. Impact: WooldridgeDiDResults.summary() hardcodes [95% CI] in diff_diff/wooldridge_results.py:197, so fits run with alpha != 0.05 display the correct interval under the wrong confidence-level label. Concrete fix: derive the displayed confidence level from self.alpha (or accept an alpha override like other results classes).

Performance

Severity P3. Impact: the benchmark records estimation_seconds inside the timed block at benchmarks/python/benchmark_wooldridge.py:71, then runs results.aggregate("event") outside it at benchmarks/python/benchmark_wooldridge.py:83, but still stores the unchanged timer value as total_seconds at benchmarks/python/benchmark_wooldridge.py:125. Concrete fix: either include aggregation in the timed block or report separate aggregation_seconds and real end-to-end total_seconds.

Maintainability

No findings.

Tech Debt

Severity P3 informational. Impact: the PR’s declared ETWFE deviations/limitations are properly documented or tracked: QMLE aweight SE adjustment in docs/methodology/REGISTRY.md:1119, cell-count aggregation weights in docs/methodology/REGISTRY.md:1128, canonical-link warning gap in TODO.md:82, and nonlinear covariate-adjusted ASF in TODO.md:84. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new OLS tests cover a balanced synthetic helper in tests/test_wooldridge.py:171 and mpdta smoke tests in tests/test_wooldridge.py:281, but there is no unbalanced-panel parity test that would catch the blocker above. Concrete fix: add an unbalanced OLS ETWFE regression test against an explicit unit+time-dummy OLS reference for both control groups.
Severity P3 informational. Impact: the API comparison table still says ETWFE supports covariates generically in docs/api/wooldridge_etwfe.rst:159, while nonlinear covariate-adjusted ETWFE now raises NotImplementedError in diff_diff/wooldridge.py:365. Because the limitation is tracked in TODO.md:84, this is informational rather than blocking. Concrete fix: add a short caveat that covariates are currently OLS-only.

Path to Approval

Replace the OLS path’s one-pass within_transform with exact FE absorption on unbalanced samples. The lowest-risk route is to reuse the iterative alternating-projection logic already present in diff_diff/utils.py:1842; the acceptable short-term fallback is to reject unbalanced panels for method="ols" with a clear error.
Add an unbalanced-panel parity test for WooldridgeDiD(method="ols") against explicit unit+time-dummy OLS for both control_group="not_yet_treated" and control_group="never_treated".
If the short-term fix is to reject unbalanced panels, document that restriction in the ETWFE API page and Methodology Registry.

Static review only; this sandbox does not have numpy, pandas, or pytest, so I could not execute the new test suite.

Implement the full interacted ASF counterfactual for logit and Poisson (W2023 Eq. 3.15). When computing eta_0 (counterfactual without treatment), now zeros the FULL treatment block per cell: both the scalar cell effect delta_gt AND all cell × covariate interaction effects xi_gt_j * x_hat_j. Previously the ASF only subtracted the scalar delta, leaving covariate interaction terms in the "untreated" predictor — wrong when covariates are present. The delta-method gradient is also updated: for each cell × covariate coefficient, d(ATT)/d(xi_j) = mean[G'(eta_1) * x_hat_j]. Removes NotImplementedError for nonlinear covariates. All three methods (OLS, logit, Poisson) now support exovar/xtvar/xgvar with the interacted specification from Wooldridge (2025) Eq. 5.3. Tests: added test_logit_with_covariates and test_poisson_with_covariates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T20:26:22Z

/ai-review

github-actions · 2026-04-04T20:35:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 93f4d21cf8b427d63df668fcc41304df2220ea14

Overall Assessment

⛔ Blocker

Executive Summary

The prior blocker is still unresolved: WooldridgeDiD(method="ols") still uses one-pass demeaning for FE absorption, so unbalanced panels can return wrong ATT and SEs instead of the exact unit+time-FE regression. See diff_diff/wooldridge.py:L521-L529, diff_diff/utils.py:L1880-L1898, docs/methodology/REGISTRY.md:L1094-L1102, and docs/methodology/REGISTRY.md:L871-L877.
Severity P1 [Newly identified]: the nonlinear control_group="never_treated" path is underidentified as written, because it combines a full (g,t) cell grid with cohort and time dummies, then lets QR dropping choose an implicit normalization. That is not the documented ETWFE estimand. See diff_diff/wooldridge.py:L113-L117, diff_diff/wooldridge.py:L678-L688, diff_diff/wooldridge.py:L866-L878, and docs/methodology/REGISTRY.md:L1141-L1152.
The intentional ETWFE deviations called out in the PR are properly documented or tracked: the QMLE aweight adjustment and cell-count aggregation weights are in the Methodology Registry, and the canonical-link warning gap / Stata golden tests are in TODO.md. Those are not defects. See docs/methodology/REGISTRY.md:L1114-L1128 and TODO.md:L80-L83.
The earlier non-blocking issues also remain: summary() still hardcodes a 95% CI label, and the new benchmark still reports total_seconds without timing aggregation. See diff_diff/wooldridge_results.py:L195-L198 and benchmarks/python/benchmark_wooldridge.py:L71-L84.

Methodology

Severity P0. Impact: WooldridgeDiD._fit_ols() still absorbs FE by calling within_transform() on the unweighted sample, and the unweighted branch still uses the one-pass y_it - \bar y_i - \bar y_t + \bar y formula. The Wooldridge registry requires exact unit/time FE absorption, and the project’s own registry already says one-pass demeaning is exact only for balanced panels. On any unbalanced estimation sample, ATT(g,t), aggregate ATT, and their cluster-robust SEs can therefore be wrong with no estimator-side warning. See diff_diff/wooldridge.py:L521-L529, diff_diff/utils.py:L1880-L1898, docs/methodology/REGISTRY.md:L1094-L1102, and docs/methodology/REGISTRY.md:L871-L877. Concrete fix: use exact alternating-projection / dummy-equivalent FE absorption whenever the OLS estimation sample is unbalanced, or reject unbalanced panels for method="ols" until that path exists.
Severity P1 [Newly identified]. Impact: for control_group="never_treated", the shared interaction builder emits all treated-cohort (g,t) indicators, including pre-treatment placebo cells. The nonlinear paths then append cohort and time dummies instead of unit FE. That makes each cohort dummy an exact linear combination of its cell indicators, so the model is underidentified before estimation. The current implementation resolves this only by QR-dropping arbitrary columns, and the ATT loops simply skip dropped cells, which means the remaining nonlinear ATT(g,t) are tied to an implicit, data-dependent normalization rather than the documented jwdid estimand. See diff_diff/wooldridge.py:L113-L117, diff_diff/wooldridge.py:L678-L688, diff_diff/wooldridge.py:L753-L756, diff_diff/wooldridge.py:L866-L878, diff_diff/wooldridge.py:L943-L946, diff_diff/linalg.py:L84-L145, diff_diff/linalg.py:L1268-L1285, diff_diff/linalg.py:L2410-L2424, and docs/methodology/REGISTRY.md:L1141-L1152. Concrete fix: use a fixed identified normalization for nonlinear never_treated that matches jwdid and reconstruct cell ATTs from it, or temporarily reject control_group="never_treated" for method in {"logit", "poisson"} until that parameterization is implemented.

Code Quality

Severity P3. Impact: WooldridgeDiDResults.summary() always prints [95% CI], even when alpha != 0.05, so the label can disagree with the interval actually shown. See diff_diff/wooldridge_results.py:L195-L198. Concrete fix: derive the displayed confidence level from self.alpha.

Performance

Severity P3. Impact: the benchmark stops timing before results.aggregate("event"), then writes the unchanged value as total_seconds, so the benchmark understates end-to-end runtime. See benchmarks/python/benchmark_wooldridge.py:L71-L84 and benchmarks/python/benchmark_wooldridge.py:L123-L126. Concrete fix: either include aggregation in the timed block or report separate aggregation_seconds and true total_seconds.

Maintainability

No findings.

Tech Debt

Severity P3 informational. Impact: the PR’s intentional ETWFE deviations and deferred items are tracked correctly: the QMLE aweight small-sample adjustment and cell-count aggregation weights are documented in the Methodology Registry, and the canonical-link warning gap plus Stata golden-value work are tracked in TODO.md. See docs/methodology/REGISTRY.md:L1114-L1128 and TODO.md:L80-L83. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new suite covers balanced helpers and mpdta OLS parity, plus default-control-group logit/poisson smoke tests, but it still does not add the unbalanced OLS parity regression or any nonlinear control_group="never_treated" regression. That leaves both methodology issues above unguarded in CI. See tests/test_wooldridge.py:L171-L185, tests/test_wooldridge.py:L281-L335, tests/test_wooldridge.py:L386-L452, and tests/test_wooldridge.py:L571-L597. Concrete fix: add an explicit unbalanced-panel OLS parity test against unit+time-dummy OLS for both control groups, and add logit/poisson never_treated tests that assert the intended full ATT grid and aggregate outputs against a reference implementation or documented normalization.

Path to Approval

Replace the OLS path’s one-pass unweighted demeaning with exact FE absorption on unbalanced samples, or reject unbalanced method="ols" fits and document that restriction in the Wooldridge API + Methodology Registry as an interim fix.
Reparameterize nonlinear control_group="never_treated" to an identified, deterministic normalization consistent with jwdid, or reject that control-group setting for logit and poisson until the correct parameterization is implemented.
Add targeted regression tests for both issues: unbalanced-panel OLS parity for never_treated and not_yet_treated, plus nonlinear never_treated tests that verify no intended post-treatment ATT cell is lost to arbitrary rank-deficiency handling.

Static review only; the sandbox does not have numpy/pandas, so I could not execute the estimator or the new tests.

… CI label P0: OLS path now uses iterative alternating-projection within-transformation (via uniform weights) for exact FE absorption on both balanced and unbalanced panels. The one-pass formula was only exact for balanced panels. P1: Nonlinear methods (logit, Poisson) with control_group="never_treated" now restrict the interaction matrix to post-treatment cells only. Including all (g,t) cells created exact collinearity between cohort dummies and cell indicator sums, causing QR to drop arbitrary columns with a data-dependent normalization. P3: summary() CI label now derived from self.alpha instead of hardcoded 95%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T21:29:50Z

/ai-review

github-actions · 2026-04-04T21:36:41Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 93f4d21cf8b427d63df668fcc41304df2220ea14

Overall Assessment

⛔ Blocker

The prior blocker is still unresolved: the OLS path still uses one-pass unweighted demeaning, so unbalanced panels can return wrong ATT(g,t) and wrong SEs without an estimator-side guard. The nonlinear control_group="never_treated" path also remains underidentified and still depends on generic QR column dropping for which ATT cells survive.

Executive Summary

The prior P0 is still present: WooldridgeDiD(method="ols") calls the unweighted one-pass within_transform(), which is only exact on balanced panels.
The prior P1 is still present: nonlinear control_group="never_treated" builds a rank-deficient design and lets QR dropping determine which ATT cells remain in group_time_effects.
The PR’s intentional QMLE/aggregation deviations are properly documented in the Methodology Registry and TODO.md; those are not blockers.
Two earlier P3s also remain: summary() hardcodes a 95% CI label, and the benchmark’s total_seconds still excludes aggregation time.
The new test suite still does not add the targeted regressions needed to lock down the two unresolved methodology issues.

Methodology

Wooldridge 2025 treats ETWFE OLS as the exact unit/time-FE regression or its pooled-OLS cohort-dummy equivalent, and explicitly notes that the earlier numerical equivalences do not carry over to unbalanced panels. Wooldridge 2023’s nonlinear ATT construction is based on pooled models with cohort and time dummies, with ATT(g,t) computed by turning the cell effect on/off while other cohort/time dummies are held fixed. citeturn3view2turn3view0turn2view1turn2view2

P0 [Previously raised, unresolved] diff_diff/wooldridge.py:L528, diff_diff/utils.py:L1880, docs/methodology/REGISTRY.md:L1094, docs/methodology/REGISTRY.md:L871: Impact: _fit_ols() still absorbs FE via the unweighted within_transform() path, and that helper still uses y_it - y_i - y_t + y one-pass demeaning. The repo’s own registry already states one-pass demeaning is only exact for balanced panels, so any unbalanced ETWFE sample can silently produce wrong cell ATTs, wrong aggregates, and wrong cluster-robust SEs. Concrete fix: use exact alternating-projection / dummy-equivalent FE absorption whenever the OLS sample is unbalanced, or reject unbalanced method="ols" fits until that path exists.
P1 [Previously raised, unresolved] diff_diff/wooldridge.py:L113, diff_diff/wooldridge.py:L686, diff_diff/wooldridge.py:L754, diff_diff/wooldridge.py:L876, diff_diff/wooldridge.py:L943, diff_diff/linalg.py:L1268, docs/methodology/REGISTRY.md:L1141: Impact: for control_group="never_treated", the nonlinear paths build the full treated (g,t) grid and then append cohort and time dummies. That creates exact linear dependencies between each treated-cohort dummy and the sum of its cell indicators. solve_logit() / solve_poisson() resolve this only by QR-dropping columns, and the ATT loops simply skip dropped cell coefficients. The resulting cell set and aggregates therefore depend on arbitrary column dropping rather than a documented ETWFE normalization. Concrete fix: implement a fixed identified normalization that preserves all intended post-treatment ATT cells, or reject control_group="never_treated" for method in {"logit", "poisson"} until that parameterization exists.

Code Quality

P3 diff_diff/wooldridge_results.py:L197: Impact: summary() always prints [95% CI], even when alpha != 0.05, so the label can disagree with the interval shown. Concrete fix: derive the displayed confidence level from self.alpha.

Performance

P3 benchmarks/python/benchmark_wooldridge.py:L83, benchmarks/python/benchmark_wooldridge.py:L125: Impact: results.aggregate("event") runs outside the timed block, but total_seconds is still read from the estimator timer, so the benchmark understates end-to-end runtime. Concrete fix: either include aggregation in the timed block or record aggregation_seconds separately and compute a true total.

Maintainability

No findings.

Tech Debt

P3 informational docs/methodology/REGISTRY.md:L1119, docs/methodology/REGISTRY.md:L1120, docs/methodology/REGISTRY.md:L1128, TODO.md:L80, TODO.md:L81, TODO.md:L82, TODO.md:L83: Impact: the QMLE aweight small-sample adjustment, cell-count aggregation weights, canonical-link warning gap, and missing Stata golden tests are all explicitly documented/tracked. Those are not approval blockers under the project rules. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

P3 tests/test_wooldridge.py:L571, tests/test_wooldridge.py:L761, tests/test_wooldridge.py:L789: Impact: the suite still does not add the two targeted regressions needed for the blocker paths: an unbalanced-panel OLS parity test against explicit unit+time-dummy OLS, and logit/poisson control_group="never_treated" tests that assert the intended full ATT grid survives rank deficiency. The unresolved methodology defects therefore remain unguarded in CI. Concrete fix: add those exact regressions and assert both cell-level outputs and overall aggregates.

Path to Approval

Replace the OLS path’s one-pass unweighted demeaning with exact FE absorption on unbalanced samples, or reject unbalanced method="ols" fits and document that restriction.
Reparameterize nonlinear control_group="never_treated" to a fixed, identified normalization consistent with the documented ETWFE estimand, or reject that setting for logit and poisson.
Add targeted regression tests for both issues: unbalanced-panel OLS parity against explicit unit+time-dummy OLS, and nonlinear never_treated tests that verify no intended post-treatment ATT cell is lost to rank-deficiency handling.

Static review only. The sandbox lacks numpy/pandas, so I could not execute the estimator or the new tests.

igerber · 2026-04-04T21:38:08Z

/ai-review

github-actions · 2026-04-04T21:47:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c847eab91bbbb7453d8338b541e47c5d379c6ddf

Overall Assessment

⚠️ Needs changes

This re-review suggests the prior unbalanced-panel OLS blocker and the prior nonlinear control_group="never_treated" normalization blocker are addressed. The remaining unmitigated issue is a new P1: the covariate-adjusted ETWFE paths do not implement Wooldridge’s full pooled/TWFE covariate basis, so exovar/xtvar/xgvar can silently estimate a restricted model instead of the paper’s covariate-adjusted estimator. citeturn3view1turn3view7turn4view0turn4view4

Executive Summary

Prior OLS unbalanced-panel issue looks resolved: the OLS path now forces iterative alternating-projection FE absorption, and there is a targeted parity test against explicit unit+time dummy OLS at tests/test_wooldridge.py:L1070-L1100.
Prior nonlinear never_treated issue looks resolved: nonlinear paths now use post-treatment cells only, with the normalization documented in docs/methodology/REGISTRY.md:L1153-L1154 and regression tests at tests/test_wooldridge.py:L1120-L1208.
P1 [Newly identified]: covariate-adjusted WooldridgeDiD omits the source-model f_t × X block, and D_g × X only appears if users manually duplicate variables into xgvar, so the covariate paths are not source-equivalent across OLS, logit, and Poisson. citeturn3view1turn3view7turn4view0turn4view4
The QMLE small-sample adjustment, cell-count aggregation weights, canonical-link warning gap, and missing Stata golden tests are already tracked in TODO.md:L80-L83 and are not blockers.
total_seconds in the new benchmark still excludes aggregation time at benchmarks/python/benchmark_wooldridge.py:L71-L84.
The new covariate tests only assert finiteness at tests/test_wooldridge.py:L943-L979, so CI would not catch the methodology mismatch.

Methodology

Severity: P1 [Newly identified]. Impact: when covariates are supplied, fit() builds only cell indicators, cell-by-demeaned-covariate interactions, and raw covariates; neither the OLS path nor the nonlinear pooled-QMLE paths add the f_t × X controls required by Wooldridge’s covariate-adjusted pooled/TWFE specification, and D_g × X is only present if the caller separately re-passes the same variables through xgvar. Wooldridge 2025’s covariate-adjusted ETWFE/TWM specification and Wooldridge 2023’s pooled nonlinear specification both require the control-side f_t × X block alongside cohort-specific D_g × X and cell-specific D_g × f_t × \dot X, so the current covariate paths are an undocumented restricted specification rather than the cited method. Concrete fix: automatically generate the full covariate basis for all three methods (X, D_g × X, f_t × X, D_g × f_t × \dot X) and thread the same ordering through OLS, logit, Poisson, and ASF delta-method gradients; otherwise reject covariate arguments or document the restriction as a formal REGISTRY.md deviation. Refs: diff_diff/wooldridge.py:L424-L456, diff_diff/wooldridge.py:L543-L550, diff_diff/wooldridge.py:L702-L706, diff_diff/wooldridge.py:L890-L896, docs/methodology/REGISTRY.md:L1135-L1140. citeturn3view1turn3view7turn4view0turn4view4

Code Quality

No findings.

Performance

Severity: P3. Impact: the benchmark reports total_seconds from the estimation timer even though results.aggregate("event") runs after the timed block, so the JSON understates end-to-end runtime. Concrete fix: either time estimation+aggregation together or record a separate aggregation timer and compute a true total. Refs: benchmarks/python/benchmark_wooldridge.py:L71-L84.

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: the QMLE small-sample adjustment, cell-count aggregation weights, canonical-link warning gap, and missing Stata golden tests are already documented/tracked, so they are not approval blockers under the project rules. Concrete fix: none required for this PR. Refs: docs/methodology/REGISTRY.md:L1115-L1129, TODO.md:L80-L83.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the covariate-path tests only check that ATT/SE are finite, so they would not detect the missing f_t × X / automatic D_g × X blocks or any resulting methodology drift. Concrete fix: add covariate-adjusted reference tests that compare OLS against explicit dummy regression with the full paper basis, plus analogous logit/Poisson design or saved-reference tests. Refs: tests/test_wooldridge.py:L943-L979.

Path to Approval

Bring the covariate-adjusted Wooldridge design into line with the cited sources across OLS, logit, and Poisson by adding the full control-side basis: X, automatic D_g × X, f_t × X, and D_g × f_t × \dot X.
If the restricted covariate basis is intentional, add an explicit **Note/Deviation** entry in docs/methodology/REGISTRY.md and the Wooldridge API docs stating that the current covariate paths are not Eq. 5.3 / Eq. 3.12-equivalent.
Add regression tests for covariate-adjusted ETWFE that fail without the full basis, not just finite-output smoke tests.

Static review only: this environment does not have pytest installed, and numpy is unavailable, so I could not execute the estimator or the new tests.

Add automatic cohort × covariate (D_g × X) and time × covariate (f_t × X) interaction blocks to the ETWFE design matrix, completing the W2025 Eq. 5.3 specification. Previously only cell × demeaned-X and raw X were included, silently fitting a restricted model. D_g × X auto-generated for exovar/xtvar (xgvar already has these in _prepare_covariates). f_t × X generated for all covariates, drop first time period for identification. ASF and delta-method gradient code unchanged — new blocks are nuisance parameters not zeroed in the counterfactual. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T22:21:25Z

/ai-review

github-actions · 2026-04-04T22:27:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2102b3d68f1e110d3b2399d41f2b165734b911ca

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review blocker on the covariate-adjusted ETWFE basis appears resolved: the estimator now constructs the full documented covariate basis, the registry records that design, and there is now an explicit dummy-OLS parity test for it. Refs: diff_diff/wooldridge.py:L424-L489, docs/methodology/REGISTRY.md:L1135-L1140, tests/test_wooldridge.py:L1223-L1296
The earlier unbalanced-panel OLS and nonlinear control_group="never_treated" concerns also look addressed in both code and targeted regression tests. Refs: diff_diff/wooldridge.py:L100-L155, diff_diff/wooldridge.py:L573-L579, tests/test_wooldridge.py:L1060-L1220
I did not identify any new unmitigated P0/P1 methodology, inference, control-group, or NaN-propagation defects in the changed estimator paths.
The remaining ETWFE deviations called out in the registry/TODO are properly documented/tracked, so they are informational rather than blockers under the project rules. Refs: docs/methodology/REGISTRY.md:L1115-L1129, TODO.md:L80-L83
Two minor P3 issues remain in the new benchmark: total_seconds excludes aggregation time, and the script claims to validate against Callaway-Sant'Anna without actually doing that comparison. Refs: benchmarks/python/benchmark_wooldridge.py:L2-L9, benchmarks/python/benchmark_wooldridge.py:L71-L84

Methodology

No unmitigated findings. The prior covariate-basis mismatch appears fixed: the implementation now adds D_g × X, f_t × X, cell-specific covariate interactions, and the raw covariate block in diff_diff/wooldridge.py:L424-L489; the nonlinear ASF paths remove the full treatment block when constructing the counterfactual ATT in diff_diff/wooldridge.py:L794-L867 and diff_diff/wooldridge.py:L981-L1058; and the intended methodology is documented in docs/methodology/REGISTRY.md:L1135-L1154.

Code Quality

No findings.

Performance

Severity: P3. Impact: the benchmark understates end-to-end runtime because results.aggregate("event") runs after the timer context has closed, while total_seconds just reuses the already-frozen estimation timer. Concrete fix: either include aggregation inside a broader timer or add a separate aggregation timer and report estimation_seconds, aggregation_seconds, and a true total_seconds. Refs: benchmarks/python/benchmark_wooldridge.py:L71-L84

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: the remaining ETWFE gaps that differ from the paper or Stata are explicitly documented/tracked: QMLE small-sample adjustment, cell-count aggregation weights, canonical-link warning gap, and missing Stata golden-value tests. Under the project rules these are not blockers. Concrete fix: none required for this PR; continue tracking via TODO.md. Refs: docs/methodology/REGISTRY.md:L1115-L1129, TODO.md:L80-L83

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the benchmark header says it “validates OLS ETWFE ATT(g,t) against Callaway-Sant'Anna,” but the script never fits CallawaySantAnna or checks any tolerance, so the claimed validation is not actually performed. Concrete fix: either add the comparison and emit/assert the max ATT(g,t) gap, or rewrite the docstring/CLI description so it describes timing/output only. Refs: benchmarks/python/benchmark_wooldridge.py:L2-L9, benchmarks/python/benchmark_wooldridge.py:L67-L145

Static review only: pytest and numpy were unavailable in this environment, so I could not execute the new tests or run ad hoc numerical checks.

igerber and others added 2 commits April 4, 2026 10:05

igerber mentioned this pull request Apr 4, 2026

Add WooldridgeDiD (ETWFE) estimator #265

Closed

igerber force-pushed the feature/wooldridge-did branch from a6e051d to cfe0e5c Compare April 4, 2026 14:48

igerber and others added 2 commits April 4, 2026 17:20

Merge origin/main, resolve CHANGELOG conflict

c847eab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 4, 2026

igerber merged commit 54bea41 into igerber:main Apr 4, 2026
20 of 22 checks passed

Conversation

wenddymacro commented Mar 20, 2026 • edited by igerber Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

igerber commented Mar 21, 2026

Uh oh!

wenddymacro commented Mar 25, 2026

Methodology

Validation

Security

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

wenddymacro commented Mar 20, 2026 •

edited by igerber

Loading