Merge pull request #257 from igerber/release/v2.8.3

igerber · web-flow · commit dc277e36f346 · 2026-04-02T17:27:28.000-04:00
Bump version to 2.8.3, add Phase 8 survey maturity roadmap
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [2.8.3] - 2026-04-02
+
+### Added
+- **Silent operation warnings** — 8 operations that previously altered analysis results without informing the user now emit `UserWarning`:
+  - TROP lstsq → pseudo-inverse numerical fallback
+  - TwoStageDiD NaN masking of unidentified fixed effects (zeroed out with treatment indicator)
+  - TwoStageDiD always-treated unit removal (sample size change)
+  - CallawaySantAnna silent (g,t) pair skipping (zero treated or control observations)
+  - TROP missing treatment indicator fill with 0 (control)
+  - Rust → Python backend fallback (previously debug log only)
+  - Survey weight normalization (pweights/aweights rescaled to mean=1)
+  - `np.inf` → 0 never-treated convention conversion
+- **ImputationDiD pre-period event study coefficients** — pre-treatment "effects" (should be ~0 under parallel trends) for visual pre-trends assessment, following BJS (2024) Test 1
+- **TwoStageDiD pre-period event study coefficients** — same pre-trends extension
+- **Replicate weight expansion** to 7 additional estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD (coverage: 4/13 → 11/13)
+
+### Changed
+- ImputationDiD pre-period coefficients use BJS Test 1 (impute Y(0) for treated units in pre-treatment periods)
+- SunAbraham replicate weights use full interaction-weighted refit per replicate with cohort-level SEs
+
+### Fixed
+- Fix zero-weight demeaning safety in replicate weight paths
+- Fix `df_survey` writeback for rank-deficient replicate designs (df=0)
+- Fix ImputationDiD `balance_e` zero-qualifying-cohort fallback in pretrends path
+- Fix survey zero-mass (g,t) skip warning gap
+- Fix SunAbraham positional assignment in replicate loop
+
 ## [2.8.2] - 2026-04-02
 
 ### Added
@@ -1085,6 +1112,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - `to_dict()` and `to_dataframe()` export methods
   - `is_significant` and `significance_stars` properties
 
+[2.8.3]: https://github.com/igerber/diff-diff/compare/v2.8.2...v2.8.3
 [2.8.2]: https://github.com/igerber/diff-diff/compare/v2.8.1...v2.8.2
 [2.8.1]: https://github.com/igerber/diff-diff/compare/v2.8.0...v2.8.1
 [2.8.0]: https://github.com/igerber/diff-diff/compare/v2.7.6...v2.8.0
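
The 2.8.3 changelog entry above turns eight previously silent operations into `UserWarning` emissions. A minimal sketch of how calling code might collect those warnings or escalate them to errors, using only the standard `warnings` module. `fit_with_fallback` is a hypothetical stand-in for an estimator call, and the message text is invented for illustration; neither comes from diff-diff.

```python
import warnings


def fit_with_fallback():
    """Hypothetical stand-in for an estimator call that emits a UserWarning."""
    warnings.warn("Rust backend unavailable; falling back to Python", UserWarning)
    return 1.0


def run_and_collect(func):
    """Run func, returning its result plus the text of every UserWarning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func()
    messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
    return result, messages


def run_strict(func):
    """Run func with UserWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", UserWarning)
        return func()
```

`run_and_collect` suits logging pipelines that want an audit trail of fallbacks, while `run_strict` suits test suites where any silent change to the analysis should fail the run.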
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
@@ -210,7 +210,7 @@
 Bacon = BaconDecomposition
 EDiD = EfficientDiD
 
-__version__ = "2.8.2"
+__version__ = "2.8.3"
 __all__ = [
     # Estimators
     "DifferenceInDifferences",
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
@@ -2,7 +2,7 @@
 
 > A Python library for Difference-in-Differences causal inference analysis. Provides sklearn-like estimators with statsmodels-style output for econometric analysis.
 
-- Version: 2.8.2
+- Version: 2.8.3
 - Repository: https://github.com/igerber/diff-diff
 - License: MIT
 - Dependencies: numpy, pandas, scipy (no statsmodels dependency)
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
@@ -1,7 +1,7 @@
 # Survey Data Support Roadmap
 
 This document captures the survey data support roadmap for diff-diff.
-All phases (1-6) are implemented.
+Phases 1-7 are implemented. Phase 8 (maturity refinements) is planned.
 
 ## Implemented (Phases 1-2)
 
@@ -202,3 +202,101 @@ variance estimation for staggered triple differences.
 
 **Reference:** Ortiz-Villavicencio, M. & Sant'Anna, P.H.C. (2025).
 "Better Understanding Triple Differences Estimators." arXiv:2505.09942.
+
+---
+
+## Phase 8: Survey Maturity
+
+Refinements to close remaining gaps versus R's `survey` package and improve
+practitioner experience. Prioritized by user impact.
+
+### 8a. Successive Difference Replication (SDR)
+
+**Priority: High.** ACS PUMS — the most common US survey dataset for DiD
+policy evaluation — provides 80 SDR replicate weight columns. Without SDR
+support, these users can't use their provided replicate weights directly.
+
+**What's needed:**
+- Add `"SDR"` to `valid_rep_methods` in `SurveyDesign`
+- Variance formula: `V = 4/R * sum((theta_r - theta)^2)` — a scaling
+  difference from BRR, not a new algorithm
+- Wire through `compute_replicate_vcov()` and `compute_replicate_if_variance()`
+
+**Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and
+Model-Based Postcensal Estimation of Income and Poverty Characteristics
+for States and Counties." ASA Proceedings.
+
+### 8b. FPC in ImputationDiD and TwoStageDiD
+
+**Priority: High.** Both estimators now support replicate weights and TSL
+with strata/PSU, but reject FPC outright (`NotImplementedError`). Adding
+FPC is incremental — thread `fpc` through the existing TSL variance path.
+This matters for finite-population surveys (common in state-level sampling).
+
+**Current gate:** `imputation.py:280`, `two_stage.py:268`
+
+### 8c. Silent Operation Warnings
+
+**Priority: High.** Add `UserWarning` emissions for operations that
+silently alter analysis results:
+- TROP lstsq → pseudo-inverse numerical fallback
+- TwoStageDiD NaN masking of unidentified fixed effects
+- TwoStageDiD always-treated unit removal
+- CallawaySantAnna silent (g,t) pair skipping
+- TROP missing treatment indicator fill with 0
+- Rust → Python backend fallback (currently debug log only)
+- Survey weight normalization (pweights rescaled to mean=1)
+- `np.inf` → 0 never-treated conversion
+
+### 8d. Lonely PSU "adjust" in Bootstrap
+
+**Priority: Medium.** `lonely_psu="adjust"` works for analytical (TSL)
+variance but raises `NotImplementedError` for survey-aware bootstrap
+(two `raise` sites in `bootstrap_utils.py`). Real survey data regularly
+has singleton strata, so users who need bootstrap inference on such
+data currently have no supported path.
+
+**Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation
+for Complex Surveys Using Replication Techniques." Statistical Methods
+in Medical Research 5(3).
+
+### 8e. Survey Diagnostics and Utilities
+
+**Priority: Medium.** Small additions that signal maturity to survey
+statisticians:
+- **CV on estimates**: coefficient of variation (SE/estimate) on results
+  objects — trivial to add, used by federal agencies for publication
+  standards (NCHS requires CV < 30% for releasable estimates)
+- **Weight trimming**: `trim_weights(data, weight_col, upper=None,
+  quantile=None)` utility in `prep.py` for capping extreme weights
+- **ImputationDiD pretrends + survey**: pre-trends F-test currently
+  ignores survey variance (`NotImplementedError` at `imputation.py:240`)
+
+### 8f. Survey Compatibility Matrix
+
+**Priority: Medium.** Users discover survey support limits by hitting
+`NotImplementedError` at runtime. Add a table to the survey tutorial
+or `choosing_estimator.rst` showing which estimator × survey feature
+combinations are supported (weights, strata/PSU, FPC, replicate weights,
+bootstrap + survey).
+
+### 8g. Documentation-Only Items
+
+**Priority: Low.** No code changes required:
+- **Multi-stage design**: document that single-stage (strata + PSU)
+  is sufficient for variance estimation per Lumley (2004) Section 2.2.
+  Don't implement multi-stage — it adds complexity without changing
+  results for DiD applications.
+- **Post-stratification / calibration**: document that `SurveyDesign`
+  expects pre-calibrated weights. Point users to `samplics` or R's
+  `survey::calibrate()` for weight calibration. This is data prep,
+  not DiD estimation — out of scope.
+
+### Deferred
+
+| Estimator | Capability | Reason |
+|-----------|-----------|--------|
+| SyntheticDiD | Replicate weights | No published theory on replicate weights + unit weight optimization |
+| TROP | Replicate weights | No published theory on replicate weights + nuclear norm regularization |
+| BaconDecomposition | Replicate weights | Diagnostic tool with no inference — replicate weights don't apply |
+| EfficientDiD | Covariates + survey, cluster + survey, bootstrap + survey | Lower demand, newer estimator; 3 `NotImplementedError` paths |
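
Section 8a of the roadmap above gives the SDR variance as `V = 4/R * sum((theta_r - theta)^2)` and calls it a scaling difference from BRR. Section 8e proposes a coefficient of variation (SE/estimate). A minimal sketch of both, in pure Python. All function names here are hypothetical and are not the library's `compute_replicate_vcov()` path. The BRR scale of 1 assumes plain BRR without a Fay adjustment, and dividing by the absolute value of the estimate is an assumption made so that the CV stays non-negative.

```python
import math
from typing import Sequence


def replicate_variance(theta: float, theta_reps: Sequence[float], scale: float) -> float:
    """Replicate variance: scale / R * sum((theta_r - theta)^2)."""
    r = len(theta_reps)
    if r == 0:
        raise ValueError("need at least one replicate estimate")
    return scale / r * sum((t - theta) ** 2 for t in theta_reps)


def sdr_variance(theta: float, theta_reps: Sequence[float]) -> float:
    """SDR variance as written in the roadmap: factor 4/R."""
    return replicate_variance(theta, theta_reps, scale=4.0)


def brr_variance(theta: float, theta_reps: Sequence[float]) -> float:
    """Plain BRR variance (assumed: no Fay adjustment): factor 1/R."""
    return replicate_variance(theta, theta_reps, scale=1.0)


def coefficient_of_variation(estimate: float, se: float) -> float:
    """CV = SE / |estimate|; undefined (NaN) when the estimate is zero."""
    if estimate == 0:
        return math.nan
    return se / abs(estimate)


# Toy numbers, not real survey output.
theta_hat = 2.0
replicates = [1.5, 2.5, 2.0, 2.0]

v_sdr = sdr_variance(theta_hat, replicates)  # 0.5
v_brr = brr_variance(theta_hat, replicates)  # 0.125
cv = coefficient_of_variation(theta_hat, math.sqrt(v_sdr))
```

With the same replicate estimates the SDR variance is exactly four times the BRR value, which is why the roadmap treats SDR as a change of scale factor rather than a new algorithm.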
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "diff-diff"
-version = "2.8.2"
+version = "2.8.3"
 description = "Difference-in-Differences causal inference with sklearn-like API. Callaway-Sant'Anna, Synthetic DiD, Honest DiD, event studies, parallel trends."
 readme = "README.md"
 license = "MIT"
diff --git a/rust/Cargo.toml b/rust/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "diff_diff_rust"
-version = "2.8.2"
+version = "2.8.3"
 edition = "2021"
 description = "Rust backend for diff-diff DiD library"
 license = "MIT"
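
This release bumps the version string in four files: `diff_diff/__init__.py`, `docs/llms-full.txt`, `pyproject.toml`, and `rust/Cargo.toml`. A minimal sketch of a consistency check that could guard against a missed bump. The helper names and regular expressions are assumptions written for this example and are not part of the repository; the patterns match only the line formats visible in the diff above.

```python
import re

# One pattern per file, matching the line formats shown in the diff.
VERSION_PATTERNS = {
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
    "rust/Cargo.toml": r'^version\s*=\s*"([^"]+)"',
    "diff_diff/__init__.py": r'^__version__\s*=\s*"([^"]+)"',
    "docs/llms-full.txt": r"^- Version:\s*(\S+)",
}


def extract_versions(file_texts):
    """Map each known path to the first version string found in its text."""
    versions = {}
    for path, text in file_texts.items():
        # re.search returns the first match, assumed to be the package version.
        match = re.search(VERSION_PATTERNS[path], text, flags=re.MULTILINE)
        versions[path] = match.group(1) if match else None
    return versions


def versions_agree(file_texts):
    """True when every file yields a version and all of them are identical."""
    found = set(extract_versions(file_texts).values())
    return len(found) == 1 and None not in found


sample = {
    "pyproject.toml": '[project]\nname = "diff-diff"\nversion = "2.8.3"\n',
    "rust/Cargo.toml": '[package]\nname = "diff_diff_rust"\nversion = "2.8.3"\n',
    "diff_diff/__init__.py": 'EDiD = EfficientDiD\n\n__version__ = "2.8.3"\n',
    "docs/llms-full.txt": "- Version: 2.8.3\n- License: MIT\n",
}
```

In practice the texts would be read from disk, for example with `pathlib.Path(path).read_text()`, and `versions_agree` would run as a pre-release test.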