Conversation
|
Thanks for this contribution — the Wooldridge ETWFE estimator fills a real gap in the library. The nonlinear outcome support (logit/Poisson via the ASF approach) is something we've had on our roadmap, and this is the right paper to implement it from. I've done an independent review of Wooldridge (2023) against your implementation. The REGISTRY.md section is solid overall, but there are items that need to be addressed before we can move forward. Files to remove
PR template Please fill in the methodology references, validation, and security fields in the PR description. Methodology notes from paper review I did an independent review of Wooldridge (2023) and have a few items to flag:
Validation Additionally, a test comparing OLS ETWFE ATT(g,t) to CallawaySantAnna ATT(g,t) on the same dataset would be valuable, since they should be approximately equivalent under linear PT with never-treated controls. Code conventions
Happy to answer questions on any of this. This is a meaningful contribution and we want to make sure it lands well. |
MethodologyPrimary reference: Secondary reference: Application reference: Callaway-Sant'Anna equivalence (Proposition 3.1): Validation
Security
|
…vcov backend, add tests - git rm uv.lock + 3 docs/superpowers/ files; extend .gitignore - Extend compute_robust_vcov with optional weights for QMLE sandwich vcov (logit: w_i = p_i(1-p_i), Poisson: w_i = mu_i); Rust path bypassed - Replace manual vcov loops in logit/Poisson paths with compute_robust_vcov - Expose _gradient in group_time_effects for delta-method transparency - Add 3 tests to TestComputeRobustVcov (weighted sandwich, bypass, null-weight parity) - Add 3 tests to TestMethodologyCorrectness (logit gradient, Poisson gradient, OLS ETWFE ≡ CallawaySantAnna Proposition 3.1 equivalence) - Document aggregation weights formula + deviation-from-R note in REGISTRY.md - Add benchmark_wooldridge.py and register in run_benchmarks.py
Rebase and forge third-party contribution (wenddymacro) implementing Wooldridge (2025, 2023) Extended Two-Way Fixed Effects estimator. - OLS, logit, and Poisson QMLE paths with ASF-based ATT - Delta-method SEs for nonlinear models, cluster-robust sandwich - Four aggregation types (simple, group, calendar, event) - Fixed QMLE sandwich to use weight_type="aweight" for correct unweighted scores in bread computation - Added solve_poisson IRLS solver to linalg.py - Complete documentation: REGISTRY, README, CHANGELOG, tutorial - 60 tests covering all paths, edge cases, and methodology checks Co-Authored-By: wenddymacro <50739376+wenddymacro@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… P2s P0: Fix _filter_sample so control_group="never_treated" excludes pre-treatment observations from treated units. Previously both settings produced identical samples (treated_mask=cohort>0 kept all treated-unit obs regardless). Now never_treated restricts to post-treatment obs from treated units + all never-treated obs. P1: Add NaN checks in logit/Poisson ASF loops to skip cells whose interaction coefficients were dropped due to rank deficiency. Zero out NaN entries in beta before vcov computation. P2a: Validate bootstrap_weights in __init__ — invalid values now raise ValueError instead of silently falling through to Mammen. P2b: Fix n_control_units metadata to count not-yet-treated units when control_group="not_yet_treated" (was counting only cohort==0). P3a: Change bootstrap loop to return_vcov=False (only coefs used). P3b: Soften "matches Stata exactly" claims in tutorial to "follows the jwdid specification" with registry deviation note. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a6e051d to
cfe0e5c
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance Maintainability Tech Debt
Security Documentation/Tests
Path to Approval
|
…son rank handling
Finding 1 (P1): Rewrite never_treated implementation. Instead of deleting
pre-treatment observations (wrong), keep all data in-sample and control
the comparison pool via the interaction matrix. For never_treated,
_build_interaction_matrix creates ALL (g,t) pairs including pre-treatment
cells as placebo indicators, so only never-treated units remain in the
baseline. Pre-treatment coefficients serve as placebo/pre-trend tests
and aggregate("event") now produces negative event times.
Finding 2 (P1): Add identification checks in fit() — raise ValueError
for no treated cohorts, no never-treated units with never_treated control,
and empty interaction matrix.
Finding 3 (P1): Add rank_deficient_action to solve_poisson matching
solve_logit/solve_ols pattern. Uses QR-based rank detection, drops
collinear columns, expands beta with NaN. Warns on singular Hessian
instead of silent break.
Finding 4 (P1): Reject bootstrap params for nonlinear methods —
n_bootstrap > 0 with method != "ols" raises ValueError.
Finding 5 (P2): Fix tutorial to accurately describe OLS FE structure
(unit+time via within-transformation, not additive cohort+time).
Also: skip NaN coefficients in OLS/logit/Poisson ATT extraction, fix
vcov submatrix to only include identified cells.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Execution note: I could not run Path to Approval
|
…ication checks, docs
P1: Compute nonlinear QMLE vcov on reduced design when columns are
dropped. Previously vcov was computed on the full singular matrix,
causing errors. Now logit and Poisson paths extract kept columns,
compute vcov on the reduced design, and expand back with NaN.
P1: Add not_yet_treated identification check — raises ValueError when
no untreated comparison observations exist (all units treated at all
observed times).
P2: Fix tutorial notebook — call aggregate('event') before using
event_study_effects in comparison plot cell.
P2: Replace last "exactly" claim in API docs with "follows the jwdid
specification" + registry deviation reference.
Fix flaky TestAllEventuallyTreated assertion — relax ATT > 0 to
isfinite (simulated data with rank deficiency can produce negative ATT
for individual cells despite positive true effect).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Execution note: I did not run |
…laims P1-A: Compute nonlinear delta-method gradients in reduced parameter space when rank-deficient. Previously gradients were in full space but vcov had NaN for dropped columns, causing NaN propagation into estimable ATT SEs. Now grad_reduced @ vcov_reduced @ grad_reduced for both logit and Poisson. P1-B: OLS bootstrap now draws multiplier weights at the analytic cluster level (cluster_ids), not always at the unit level. Previously ignored the cluster parameter, producing wrong bootstrap SEs for coarser clustering. P2: Fix remaining "equivalent to jwdid" claim in API docs and "additive cohort + time FEs" in tutorial intro/summary. Now accurately describes unit + time FE via within-transformation with documented deviations. Also: update gradient tests to check SE finiteness directly (internal _gradient key removed from public effects dict). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No findings beyond the test/doc items below. Performance
Maintainability No findings beyond the missing regression coverage. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Execution note: I could not run |
…ciency P1 test coverage for previously blocking fixes: - TestBootstrapClusterLevel: bootstrap with cluster != unit exercises the clustered multiplier weight path, asserts different SE from unit-level - TestNonlinearRankDeficiency: logit and Poisson with duplicated nuisance covariate (rank_deficient_action="silent") assert finite SEs for estimable ATT cells after dropped columns P3: Replace remaining "matches Stata" with "follows Stata" in API docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology The linear ETWFE structure and the pooled nonlinear logit/Poisson framing are consistent with the cited Wooldridge sources. (link.springer.com)
Code Quality No additional findings beyond the anticipation propagation issue above. Performance
Maintainability No additional findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Execution note: this was a static review only; |
…adata Make not_yet_treated identification guard anticipation-aware: check (cohort - anticipation) > time instead of cohort > time. Also fix _count_control_units to use anticipation-adjusted threshold. Add regression tests: - test_anticipation_aware_identification_rejects_pseudo_controls: single-cohort design where anticipation=1 consumes all controls - test_anticipation_aggregate_semantics: verify aggregations produce finite results with anticipation > 0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker The core ETWFE regression structure and the nonlinear ASF/QMLE framing are broadly consistent with Wooldridge’s published linear and nonlinear DiD treatments, and the PR’s stated QMLE small-sample and cell-weight differences are already documented locally, so the remaining blockers are implementation defects rather than source-material disputes. (academic.oup.com) Executive Summary
Methodology Affected methods in this diff:
Code Quality No separate findings beyond the methodology defects above. Performance
Maintainability No separate findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Execution note: static review only; |
…tics P0: Poisson path zeroed NaN coefficients BEFORE checking for dropped interaction cells, causing dropped cells to silently become att=0 and get averaged into overall_att. Fix: preserve raw interaction coefficients before NaN->0 zeroing and check against the raw values in the ASF loop. P1: Document that simple/group/calendar aggregation uses t >= g as the post-treatment threshold regardless of anticipation setting. Anticipation cells are estimated but treated as pre-treatment placebos in aggregation. Add REGISTRY **Note:** and test asserting overall_att equals manually computed post-treatment-only weighted average. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment The prior Poisson dropped-cell blocker looks addressed in diff_diff/wooldridge.py:L806, and the earlier Executive Summary
Methodology Affected methods:
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Execution note: static review only. |
P1: Validate cohort is time-invariant within unit before fitting. Raises ValueError with the offending unit if cohort varies across time periods for any unit. P1: Propagate anticipation into WooldridgeDiDResults. Event study summary now labels anticipation-window cells as [antic] (not [pre]). True pre-treatment placebo leads (k < -anticipation) still labeled [pre]. Document t >= g aggregation rule in REGISTRY **Note:**. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment The prior P1s from the last AI review appear resolved, but one untracked P1 remains in the new nonlinear ETWFE API: the logit/Poisson paths still accept outcomes outside the model’s support and can therefore return ATT/SE output for invalid specifications. Executive Summary
Methodology
Code Quality
Performance
Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Static review only. This sandbox lacks |
…ation P1: Validate outcome support before nonlinear fitting. Logit rejects y outside [0, 1], Poisson rejects y < 0, both reject non-finite values. Previously out-of-support outcomes were silently accepted. P3: Validate rank_deficient_action in solve_poisson (matching solve_logit/solve_ols pattern). Typos like "warnn" now raise ValueError instead of silently behaving like "silent". Tests: TestOutcomeValidation with logit out-of-range and Poisson negative outcome rejection tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment The prior nonlinear-outcome-support blocker from the last AI review appears resolved, but two unmitigated P1s remain in the new ETWFE implementation: the public covariate-adjusted design does not match the documented/source ETWFE specification, and the new sklearn-style Executive Summary
Methodology
Code Quality
Performance
Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
Static review only. The sandbox here does not have |
P1: Implement treatment × demeaned-covariate interactions (W2025 Eq. 5.3). Covariates are now included as both main effects and moderating interactions with each (g,t) treatment cell indicator. Demeaning uses within-cohort means. This allows ATT to vary with covariates within cells, matching the documented ETWFE specification. P1: Re-run parameter validation in set_params() after setattr. Previously invalid values for method/control_group/bootstrap_weights/anticipation silently took effect via catch-all else branches. Updated REGISTRY to document interacted covariate specification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only. The sandbox here does not have |
P0: Reject exovar/xtvar/xgvar for method="logit" and method="poisson" with NotImplementedError. The nonlinear ASF counterfactual computation only subtracts the scalar cell effect but must zero the full treatment block (cell indicator + cell × covariate interactions) to be correct. OLS covariate-adjusted ETWFE remains fully supported. P1 (prior round): set_params() now re-validates all constrained params after setattr, preventing silent dispatch to wrong paths via typos. Updated rank-deficiency tests to not use covariates for nonlinear paths. Added test_nonlinear_covariates_rejected asserting NotImplementedError. Added TODO for full nonlinear covariate ASF implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only; this sandbox does not have |
Implement the full interacted ASF counterfactual for logit and Poisson (W2023 Eq. 3.15). When computing eta_0 (counterfactual without treatment), now zeros the FULL treatment block per cell: both the scalar cell effect delta_gt AND all cell × covariate interaction effects xi_gt_j * x_hat_j. Previously the ASF only subtracted the scalar delta, leaving covariate interaction terms in the "untreated" predictor — wrong when covariates are present. The delta-method gradient is also updated: for each cell × covariate coefficient, d(ATT)/d(xi_j) = mean[G'(eta_1) * x_hat_j]. Removes NotImplementedError for nonlinear covariates. All three methods (OLS, logit, Poisson) now support exovar/xtvar/xgvar with the interacted specification from Wooldridge (2025) Eq. 5.3. Tests: added test_logit_with_covariates and test_poisson_with_covariates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only; the sandbox does not have |
… CI label P0: OLS path now uses iterative alternating-projection within-transformation (via uniform weights) for exact FE absorption on both balanced and unbalanced panels. The one-pass formula was only exact for balanced panels. P1: Nonlinear methods (logit, Poisson) with control_group="never_treated" now restrict the interaction matrix to post-treatment cells only. Including all (g,t) cells created exact collinearity between cohort dummies and cell indicator sums, causing QR to drop arbitrary columns with a data-dependent normalization. P3: summary() CI label now derived from self.alpha instead of hardcoded 95%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker The prior blocker is still unresolved: the OLS path still uses one-pass unweighted demeaning, so unbalanced panels can return wrong Executive Summary
Methodology Wooldridge 2025 treats ETWFE OLS as the exact unit/time-FE regression or its pooled-OLS cohort-dummy equivalent, and explicitly notes that the earlier numerical equivalences do not carry over to unbalanced panels. Wooldridge 2023’s nonlinear ATT construction is based on pooled models with cohort and time dummies, with
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only. The sandbox lacks |
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment This re-review suggests the prior unbalanced-panel OLS blocker and the prior nonlinear Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only: this environment does not have |
Add automatic cohort × covariate (D_g × X) and time × covariate (f_t × X) interaction blocks to the ETWFE design matrix, completing the W2025 Eq. 5.3 specification. Previously only cell × demeaned-X and raw X were included, silently fitting a restricted model. D_g × X auto-generated for exovar/xtvar (xgvar already has these in _prepare_covariates). f_t × X generated for all covariates, drop first time period for identification. ASF and delta-method gradient code unchanged — new blocks are nuisance parameters not zeroed in the counterfactual. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No unmitigated findings. The prior covariate-basis mismatch appears fixed: the implementation now adds Code Quality No findings. Performance
Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
Static review only: |
Summary
jwdid_estatweight_type="aweight"for correct unweighted scorescontrol_group="never_treated"to correctly exclude pre-treatment observations from treated unitssolve_poissonIRLS solver tolinalg.pyMethodology references (required if estimator / math changes)
jwdid_estat(documented in REGISTRY.md)(G/(G-1))*(n-1)/(n-k)vs Stata'sG/(G-1)only — conservative, inflates SEs slightly (documented in REGISTRY.md and TODO.md)etwfeusesfixestfor nonlinear paths; this uses direct QMLE viacompute_robust_vcov(documented in REGISTRY.md)Validation
tests/test_wooldridge.py(63 tests),tests/test_linalg.py(4 solve_poisson tests)docs/tutorials/16_wooldridge_etwfe.ipynbdocs/methodology/papers/wooldridge-etwfe-review.mdSecurity / privacy
Generated with Claude Code