Skip to content

Commit d82e0fb

Browse files
authored
Merge pull request #139 from igerber/twfe-method-review
Fix TWFE within-transformation bug and add methodology review
2 parents 8e6cf40 + 9884140 commit d82e0fb

8 files changed

Lines changed: 1642 additions & 23 deletions

File tree

METHODOLOGY_REVIEW.md

Lines changed: 59 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -153,14 +153,69 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
153153
| Module | `twfe.py` |
154154
| Primary Reference | Wooldridge (2010), Ch. 10 |
155155
| R Reference | `fixest::feols()` |
156-
| Status | Not Started |
157-
| Last Review | - |
156+
| Status | **Complete** |
157+
| Last Review | 2026-02-08 |
158+
159+
**Verified Components:**
160+
- [x] Within-transformation algebra: `y_it - ȳ_i - ȳ_t + ȳ` matches hand calculation (rtol=1e-12)
161+
- [x] ATT matches manual demeaned OLS (rtol=1e-10)
162+
- [x] ATT matches `DifferenceInDifferences` on 2-period data (rtol=1e-10)
163+
- [x] Covariates are also within-transformed (sum to zero within unit/time groups)
164+
- [x] R comparison: ATT matches `fixest::feols(y ~ treated:post | unit + post, cluster=~unit)` (rtol<0.1%)
165+
- [x] R comparison: Cluster-robust SE match (rtol<1%)
166+
- [x] R comparison: P-value match (atol<0.01)
167+
- [x] R comparison: CI bounds match (rtol<1%)
168+
- [x] R comparison: ATT and SE match with covariate (same tolerances)
169+
- [x] Edge case: Staggered treatment triggers `UserWarning`
170+
- [x] Edge case: Auto-clusters at unit level (SE matches explicit `cluster="unit"`)
171+
- [x] Edge case: DF adjustment for absorbed FE matches manual `solve_ols()` with `df_adjustment`
172+
- [x] Edge case: Covariate collinear with interaction raises `ValueError` ("cannot be identified")
173+
- [x] Edge case: Covariate collinearity warns but ATT remains finite
174+
- [x] Edge case: `rank_deficient_action="error"` raises `ValueError`
175+
- [x] Edge case: `rank_deficient_action="silent"` emits no warnings
176+
- [x] Edge case: Unbalanced panel produces valid results (finite ATT, positive SE)
177+
- [x] Edge case: Missing unit column raises `ValueError`
178+
- [x] Integration: `decompose()` returns `BaconDecompositionResults`
179+
- [x] SE: Cluster-robust SE >= HC1 SE
180+
- [x] SE: VCoV positive semi-definite
181+
- [x] Wild bootstrap: Valid inference (finite SE, p-value in [0,1])
182+
- [x] Wild bootstrap: All weight types (rademacher, mammen, webb) produce valid inference
183+
- [x] Wild bootstrap: `inference="wild_bootstrap"` routes correctly
184+
- [x] Params: `get_params()` returns all inherited parameters
185+
- [x] Params: `set_params()` modifies attributes
186+
- [x] Results: `summary()` contains "ATT"
187+
- [x] Results: `to_dict()` contains att, se, t_stat, p_value, n_obs
188+
- [x] Results: residuals + fitted = demeaned outcome (not raw)
189+
- [x] Edge case: Multi-period time emits UserWarning advising binary post indicator
190+
- [x] Edge case: Non-{0,1} binary time emits UserWarning (ATT still correct)
191+
- [x] Edge case: ATT invariant to time encoding ({0,1} vs {2020,2021} produces identical results)
192+
193+
**Key Implementation Detail:**
194+
The interaction term `D_i × Post_t` must be within-transformed (demeaned) alongside the outcome,
195+
consistent with the Frisch-Waugh-Lovell (FWL) theorem: all regressors and the outcome must be
196+
projected out of the fixed effects space. R's `fixest::feols()` does this automatically when
197+
variables appear to the left of the `|` separator.
158198

159199
**Corrections Made:**
160-
- (None yet)
200+
- **Bug fix: interaction term must be within-transformed** (found during review). The previous
201+
implementation used raw (un-demeaned) `D_i × Post_t` in the demeaned regression. This gave
202+
correct results only for 2-period panels where `post == period`. For multi-period panels
203+
(e.g., 4 periods with binary `post`), the raw interaction had incorrect correlation with
204+
demeaned Y, producing ATT approximately 1/3 of the true value. Fixed by applying the same
205+
within-transformation to the interaction term before regression. This matches R's
206+
`fixest::feols()` behavior. (`twfe.py` lines 99-113)
161207

162208
**Outstanding Concerns:**
163-
- (None yet)
209+
- **Multi-period `time` parameter**: Multi-period time values (e.g., 1,2,3,4) produce
210+
`treated × period_number` instead of `treated × post_indicator`, which is not the standard
211+
D_it treatment indicator. A `UserWarning` is emitted when `time` has >2 unique values.
212+
For binary time with non-{0,1} values (e.g., {2020, 2021}), the ATT is mathematically
213+
correct (the within-transformation absorbs the scaling), but a warning recommends 0/1
214+
encoding for clarity. Users with multi-period data should create a binary `post` column.
215+
- **Staggered treatment warning**: The warning only fires when `time` has >2 unique values
216+
(i.e., actual period numbers). With binary `time="post"`, all treated units appear to start
217+
treatment at `time=1`, making staggering undetectable. Users with staggered designs should
218+
use `decompose()` or `CallawaySantAnna` directly for proper diagnostics.
164219

165220
---
166221

benchmarks/R/benchmark_twfe.R

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
#!/usr/bin/env Rscript
2+
# Benchmark: Two-Way Fixed Effects (R `fixest` package with absorbed FE)
3+
#
4+
# This uses fixest::feols() with absorbed unit + post FE and unit-level clustering,
5+
# matching the Python TwoWayFixedEffects estimator's approach.
6+
#
7+
# Usage:
8+
# Rscript benchmark_twfe.R --data path/to/data.csv --output path/to/results.json
9+
10+
library(fixest)
11+
library(jsonlite)
12+
library(data.table)
13+
14+
# Parse command line arguments
15+
args <- commandArgs(trailingOnly = TRUE)
16+
17+
parse_args <- function(args) {
18+
result <- list(
19+
data = NULL,
20+
output = NULL
21+
)
22+
23+
i <- 1
24+
while (i <= length(args)) {
25+
if (args[i] == "--data") {
26+
result$data <- args[i + 1]
27+
i <- i + 2
28+
} else if (args[i] == "--output") {
29+
result$output <- args[i + 1]
30+
i <- i + 2
31+
} else {
32+
i <- i + 1
33+
}
34+
}
35+
36+
if (is.null(result$data) || is.null(result$output)) {
37+
stop("Usage: Rscript benchmark_twfe.R --data <path> --output <path>")
38+
}
39+
40+
return(result)
41+
}
42+
43+
config <- parse_args(args)
44+
45+
# Load data
46+
message(sprintf("Loading data from: %s", config$data))
47+
data <- fread(config$data)
48+
49+
# Run benchmark
50+
message("Running TWFE estimation with absorbed FE...")
51+
start_time <- Sys.time()
52+
53+
# TWFE with absorbed unit + post fixed effects, clustered at unit level
54+
# This matches Python's TwoWayFixedEffects:
55+
# - Within-transformation removes unit and time (post) FE
56+
# - Cluster-robust SE at unit level (automatic)
57+
model <- feols(
58+
outcome ~ treated:post | unit + post,
59+
data = data,
60+
cluster = ~unit
61+
)
62+
63+
estimation_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
64+
65+
# Extract results
66+
coef_name <- "treated:post"
67+
coefs <- coef(model)
68+
ses <- se(model)
69+
pvals <- pvalue(model)
70+
ci <- confint(model)
71+
72+
# Find the treatment effect coefficient
73+
if (coef_name %in% names(coefs)) {
74+
att <- coefs[coef_name]
75+
att_se <- ses[coef_name]
76+
att_pval <- pvals[coef_name]
77+
att_ci <- ci[coef_name, ]
78+
} else {
79+
# Try alternative name formats
80+
idx <- grep("treated.*post|post.*treated", names(coefs))
81+
if (length(idx) > 0) {
82+
att <- coefs[idx[1]]
83+
att_se <- ses[idx[1]]
84+
att_pval <- pvals[idx[1]]
85+
att_ci <- ci[idx[1], ]
86+
coef_name <- names(coefs)[idx[1]]
87+
} else {
88+
stop("Could not find treatment effect coefficient")
89+
}
90+
}
91+
92+
# Format output
93+
results <- list(
94+
estimator = "fixest::feols (absorbed FE)",
95+
cluster = "unit",
96+
97+
# Treatment effect
98+
att = unname(att),
99+
se = unname(att_se),
100+
pvalue = unname(att_pval),
101+
ci_lower = unname(att_ci[1]),
102+
ci_upper = unname(att_ci[2]),
103+
coef_name = coef_name,
104+
105+
# Model statistics
106+
model_stats = list(
107+
r_squared = summary(model)$r2,
108+
adj_r_squared = summary(model)$adj.r2,
109+
n_obs = model$nobs
110+
),
111+
112+
# Timing
113+
timing = list(
114+
estimation_seconds = estimation_time,
115+
total_seconds = estimation_time
116+
),
117+
118+
# Metadata
119+
metadata = list(
120+
r_version = R.version.string,
121+
fixest_version = as.character(packageVersion("fixest")),
122+
n_units = length(unique(data$unit)),
123+
n_periods = length(unique(data$post)),
124+
n_obs = nrow(data)
125+
)
126+
)
127+
128+
# Write output
129+
message(sprintf("Writing results to: %s", config$output))
130+
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 15)
131+
132+
message(sprintf("Completed in %.3f seconds", estimation_time))
Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
#!/usr/bin/env python3
2+
"""
3+
Benchmark: TwoWayFixedEffects (diff-diff TwoWayFixedEffects class).
4+
5+
This benchmarks the actual TwoWayFixedEffects class with within-transformation,
6+
as opposed to benchmark_basic.py which uses DifferenceInDifferences with formula.
7+
8+
Usage:
9+
python benchmark_twfe.py --data path/to/data.csv --output path/to/results.json
10+
"""
11+
12+
import argparse
13+
import json
14+
import os
15+
import sys
16+
from pathlib import Path
17+
18+
# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
19+
def _get_backend_from_args():
20+
"""Parse --backend argument without importing diff_diff."""
21+
parser = argparse.ArgumentParser(add_help=False)
22+
parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
23+
args, _ = parser.parse_known_args()
24+
return args.backend
25+
26+
_requested_backend = _get_backend_from_args()
27+
if _requested_backend in ("python", "rust"):
28+
os.environ["DIFF_DIFF_BACKEND"] = _requested_backend
29+
30+
# NOW import diff_diff and other dependencies (will see the env var)
31+
import pandas as pd
32+
33+
# Add parent to path for imports
34+
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
35+
36+
from diff_diff import TwoWayFixedEffects, HAS_RUST_BACKEND
37+
from benchmarks.python.utils import Timer
38+
39+
40+
def parse_args():
41+
parser = argparse.ArgumentParser(description="Benchmark TwoWayFixedEffects estimator")
42+
parser.add_argument("--data", required=True, help="Path to input CSV data")
43+
parser.add_argument("--output", required=True, help="Path to output JSON results")
44+
parser.add_argument(
45+
"--backend", default="auto", choices=["auto", "python", "rust"],
46+
help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
47+
)
48+
return parser.parse_args()
49+
50+
51+
def get_actual_backend() -> str:
52+
"""Return the actual backend being used based on HAS_RUST_BACKEND."""
53+
return "rust" if HAS_RUST_BACKEND else "python"
54+
55+
56+
def main():
57+
args = parse_args()
58+
59+
actual_backend = get_actual_backend()
60+
print(f"Using backend: {actual_backend}")
61+
62+
# Load data
63+
print(f"Loading data from: {args.data}")
64+
data = pd.read_csv(args.data)
65+
66+
# Run benchmark using TwoWayFixedEffects (within-transformation approach)
67+
print("Running TWFE estimation...")
68+
69+
twfe = TwoWayFixedEffects(robust=True) # auto-clusters at unit level
70+
71+
with Timer() as timer:
72+
results = twfe.fit(
73+
data,
74+
outcome="outcome",
75+
treatment="treated",
76+
time="post",
77+
unit="unit",
78+
)
79+
80+
att = results.att
81+
se = results.se
82+
pvalue = results.p_value
83+
ci = results.conf_int
84+
85+
total_time = timer.elapsed
86+
87+
# Build output
88+
output = {
89+
"estimator": "diff_diff.TwoWayFixedEffects",
90+
"backend": actual_backend,
91+
"cluster": "unit",
92+
# Treatment effect
93+
"att": float(att),
94+
"se": float(se),
95+
"pvalue": float(pvalue),
96+
"ci_lower": float(ci[0]),
97+
"ci_upper": float(ci[1]),
98+
# Model statistics
99+
"model_stats": {
100+
"n_obs": len(data),
101+
"n_units": len(data["unit"].unique()),
102+
"n_periods": len(data["post"].unique()),
103+
},
104+
# Timing
105+
"timing": {
106+
"estimation_seconds": total_time,
107+
"total_seconds": total_time,
108+
},
109+
# Metadata
110+
"metadata": {
111+
"n_units": len(data["unit"].unique()),
112+
"n_periods": len(data["post"].unique()),
113+
"n_obs": len(data),
114+
},
115+
}
116+
117+
# Write output
118+
print(f"Writing results to: {args.output}")
119+
output_path = Path(args.output)
120+
output_path.parent.mkdir(parents=True, exist_ok=True)
121+
with open(output_path, "w") as f:
122+
json.dump(output, f, indent=2)
123+
124+
print(f"Completed in {total_time:.3f} seconds")
125+
return output
126+
127+
128+
if __name__ == "__main__":
129+
main()

0 commit comments

Comments
 (0)