- `TripleDifference` - Ortiz-Villavicencio & Sant'Anna (2025) estimator for DDD designs
- `TripleDifferenceResults` - Results with ATT, SEs, cell means, diagnostics
```
├── TwoStageDiD
├── TripleDifference
├── TROP
├── StackedDiD
├── SyntheticDiD
└── BaconDecomposition
```
Tests mirror the source modules:

- `tests/test_sun_abraham.py` - Tests for SunAbraham interaction-weighted estimator
- `tests/test_imputation.py` - Tests for ImputationDiD (Borusyak et al. 2024) estimator
- `tests/test_two_stage.py` - Tests for TwoStageDiD (Gardner 2022) estimator, including equivalence tests with ImputationDiD
- `tests/test_stacked_did.py` - Tests for Stacked DiD (Wing et al. 2024) estimator
- `tests/test_triple_diff.py` - Tests for Triple Difference (DDD) estimator
- `tests/test_trop.py` - Tests for Triply Robust Panel (TROP) estimator
- `tests/test_bacon.py` - Tests for Goodman-Bacon decomposition
Session-scoped `ci_params` fixture in `conftest.py` scales bootstrap iterations and TROP grid sizes in pure Python mode — use `ci_params.bootstrap(n)` and `ci_params.grid(values)` in new tests with `n_bootstrap >= 20`. For SE convergence tests (analytical vs bootstrap comparison), use `ci_params.bootstrap(n, min_n=199)` with a conditional tolerance: `threshold = 0.40 if n_boot < 100 else 0.15`. The `min_n` parameter is capped at 49 in pure Python mode to keep CI fast, so convergence tests use wider tolerances when running with fewer bootstrap iterations.
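The conditional-tolerance rule above can be sketched as a small helper (the name `boot_tolerance` is illustrative, not part of the test suite):

```python
def boot_tolerance(n_boot: int) -> float:
    """Relative tolerance for analytical-vs-bootstrap SE comparisons.

    With few bootstrap iterations (pure Python mode caps min_n at 49),
    bootstrap SEs are noisy, so the comparison uses a wider tolerance.
    """
    return 0.40 if n_boot < 100 else 0.15
```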
**Slow test suites:** `tests/test_trop.py` is very time-consuming. Only run TROP tests when changes could affect the TROP estimator (e.g., `diff_diff/trop.py`, `diff_diff/trop_results.py`, `diff_diff/linalg.py`, `diff_diff/_backend.py`, or `rust/src/trop.rs`). For unrelated changes, exclude it with `pytest --ignore=tests/test_trop.py`.
- **Synthetic DiD**: Combines DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
Under homogeneous treatment effects, both estimators are efficient, producing shorter confidence intervals than Callaway-Sant'Anna or Sun-Abraham.

### Stacked DiD (Wing, Freedman & Hollingsworth 2024)

Stacked DiD addresses TWFE bias in staggered adoption settings by constructing a "clean" comparison dataset for each treatment cohort and stacking them together. Each cohort's sub-experiment compares units treated at that cohort's timing against units that are not yet treated (or never treated) within a symmetric event-study window. This avoids the "bad comparisons" problem in TWFE while retaining a regression-based framework that practitioners familiar with event studies will find intuitive.

```python
from diff_diff import StackedDiD, generate_staggered_data

# Generate sample data
data = generate_staggered_data(n_units=200, n_periods=12)

# Fit with symmetric event-study windows (column names are illustrative)
model = StackedDiD(kappa_pre=3, kappa_post=3)
results = model.fit(data, outcome='outcome', unit='unit',
                    time='time', first_treat='first_treat')
results.print_summary()
```
Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
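For intuition, the naive DDD point estimate (ignoring covariates, which is exactly what `TripleDifference` improves on) is the difference between two 2x2 DiDs: one within the eligible partition and one within the ineligible partition. A minimal numpy sketch, with the hypothetical helper name `naive_ddd`:

```python
import numpy as np

def naive_ddd(m):
    """Naive triple difference from the eight cell means.

    m[g, p, t]: mean outcome for group g (0=comparison, 1=treated),
    partition p (0=ineligible, 1=eligible), period t (0=pre, 1=post).
    """
    did_eligible = (m[1, 1, 1] - m[1, 1, 0]) - (m[0, 1, 1] - m[0, 1, 0])
    did_ineligible = (m[1, 0, 1] - m[1, 0, 0]) - (m[0, 0, 1] - m[0, 0, 0])
    # The ineligible-partition DiD differences out group-specific trends
    return did_eligible - did_ineligible
```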
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |

### StackedDiD

```python
StackedDiD(
    kappa_pre=1,                      # Pre-treatment event-study periods
    kappa_post=1,                     # Post-treatment event-study periods
    weighting='aggregate',            # 'aggregate', 'population', or 'sample_share'
    clean_control='not_yet_treated',  # 'not_yet_treated', 'strict', or 'never_treated'
    cluster='unit',                   # 'unit' or 'unit_subexp'
    alpha=0.05,                       # Significance level
    anticipation=0,                   # Anticipation periods
    rank_deficient_action='warn',     # 'warn', 'error', or 'silent'
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Panel data |
| `outcome` | str | Outcome variable column name |
| `unit` | str | Unit identifier column |
| `time` | str | Time period column |
| `first_treat` | str | First treatment period column (0 for never-treated) |
| `population` | str, optional | Population column (required if `weighting='population'`) |
| `aggregate` | str | Aggregation: None, `"simple"`, or `"event_study"` |

### StackedDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `overall_att` | Overall average treatment effect on the treated |
| `overall_se` | Standard error |
| `overall_t_stat` | T-statistic |
| `overall_p_value` | P-value for H0: ATT = 0 |
| `overall_conf_int` | Confidence interval |
| `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'`) |
| `stacked_data` | The stacked dataset used for estimation |
| `n_treated_obs` | Number of treated observations |
| `n_untreated_obs` | Number of untreated (clean control) observations |
| `n_cohorts` | Number of treatment cohorts |
| `kappa_pre` | Pre-treatment window used |
| `kappa_post` | Post-treatment window used |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('event_study') |
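As a concrete illustration of the stacking step described earlier, here is a minimal pandas sketch of building one sub-experiment per cohort with not-yet-treated controls. This is a simplified illustration, not the library's implementation, and `build_stack` is a hypothetical helper:

```python
import pandas as pd

def build_stack(df, first_treat='first_treat', time='time',
                kappa_pre=1, kappa_post=1):
    """Stack one sub-experiment per cohort; controls are clean
    (never treated, or not yet treated within the event window)."""
    cohorts = sorted(c for c in df[first_treat].unique() if c > 0)
    subs = []
    for g in cohorts:
        window = df[(df[time] >= g - kappa_pre) & (df[time] <= g + kappa_post)]
        clean = window[(window[first_treat] == g) |
                       (window[first_treat] == 0) |
                       (window[first_treat] > g + kappa_post)].copy()
        clean['sub_exp'] = g          # cohort identifier for fixed effects
        clean['treat'] = (clean[first_treat] == g).astype(int)
        subs.append(clean)
    return pd.concat(subs, ignore_index=True)
```

A unit can appear in several sub-experiments (as a clean control before its own treatment), which is why clustering at the unit level across the stack is an option alongside unit-by-sub-experiment clustering.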
### TripleDifference
- **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
- **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. [https://www.nber.org/papers/w32054](https://www.nber.org/papers/w32054)

### Power Analysis

- **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)