75-year Projections based on calibration to SSA Trustees data #443

baogorek · 2025-10-17T20:55:40Z

Summary

This PR implements a 75-year projection capability for federal income tax revenue (2025-2100) by integrating PolicyEngine's economic microsimulation with Social Security Administration demographic forecasts. This enables quantifying the fiscal impact of population aging while preserving the full complexity of the tax code.

Key Features

Two-Stage Projection Methodology

Stage 1: Economic Uprating

Uprates 17 distinct income categories according to economic fundamentals
Complete tax code implementation with all brackets, credits, deductions, and interactions
Calibrated to match IRS Statistics of Income aggregates
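As a rough illustration of what uprating means here, the sketch below scales each income category by its own compound growth factor between a base year and a target year. The category names, growth rates, and the `uprate` helper are made up for illustration; they are not the 17 categories or the factors used in this PR.

```python
def uprate(incomes, annual_growth, base_year, target_year):
    """Scale each income category by compound growth from base_year to target_year."""
    years = target_year - base_year
    return {
        category: amount * (1.0 + annual_growth[category]) ** years
        for category, amount in incomes.items()
    }


# Hypothetical household record and growth assumptions (illustrative only).
household = {"employment_income": 50_000.0, "interest_income": 1_000.0}
growth = {"employment_income": 0.04, "interest_income": 0.02}

uprated = uprate(household, growth, base_year=2025, target_year=2027)
```

Each category grows independently, so the composition of a household's income can shift over the projection window even before any reweighting happens.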

Stage 2: Demographic Reweighting

Two calibration methods available:
- IPF (Iterative Proportional Fitting): Traditional raking approach, which minimizes KL divergence from the base weights
- GREG (Generalized Regression): Regression-based calibration that also supports continuous variables
Age-specific targets from SSA Trustees Report (single-year groups 0-85+)
Optional calibration to Social Security benefit projections (GREG only)
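For readers unfamiliar with raking, here is a minimal sketch of IPF on record-level weights. It is not the code in `run_household_projection.py`: the `rake` function, the two margins, and the targets are hypothetical, and the PR itself calibrates to single-year age targets rather than the coarse groups used here.

```python
def rake(weights, margins, targets, max_iter=100, tol=1e-9):
    """Iterative proportional fitting on record-level weights.

    weights: starting weights, one per record
    margins: margin name -> list of category labels, one per record
    targets: margin name -> {category: target weighted total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for name, labels in margins.items():
            # Current weighted total in each category of this margin.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Track how far this margin was from its targets before scaling.
            for label, total in totals.items():
                max_gap = max(max_gap, abs(total - targets[name][label]))
            # Scale records so this margin matches its targets exactly.
            for i, label in enumerate(labels):
                w[i] *= targets[name][label] / totals[label]
        if max_gap < tol:
            break
    return w


# Hypothetical person records with two margins whose targets share a total of 8.
margins = {
    "age_group": ["young", "young", "old", "old"],
    "sex": ["f", "m", "f", "m"],
}
targets = {
    "age_group": {"young": 3.0, "old": 5.0},
    "sex": {"f": 4.5, "m": 3.5},
}
raked = rake([1.0, 2.0, 3.0, 4.0], margins, targets)
```

Because every update is multiplicative, the raked weights stay positive, which is one practical reason raking remains a common default.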

What's Included

New Scripts

run_household_projection.py: Main projection engine with IPF/GREG support
create_reweighting_matrix.py: Builds demographic transition matrices
age_projection.py: Processes SSA age-specific population forecasts
extract_ssa_costs.py: Extracts Social Security benefit targets
transition_matrix_demo.py: Validation and methodology demonstration

Data Files

SSPopJul_TR2024.csv: SSA Trustees Report demographic projections (2025-2100)
social_security_aux.csv: OASDI cost projections from SSA Table VI.G9

Documentation

Comprehensive README explaining methodology and theoretical foundation
Technical notes on avoiding multicollinearity in GREG calibration
Validation metrics showing IPF/GREG agreement within 0.2%
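The GREG idea can be sketched with linear calibration, where each new weight is the base weight times `1 + x_i . lam` and `lam` solves a small linear system. This is an illustrative sketch under assumptions, not the PR's implementation: `solve`, `greg_weights`, the household data, and the targets are all hypothetical. On multicollinearity, one common source is a set of constraint columns that sum to another constraint (for example, every age group plus a total-population column); that makes the system singular, so the sketch uses age-group counts without a separate total.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: constraints are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def greg_weights(base_weights, x, targets):
    """Linear (chi-square distance) calibration: w_i = d_i * (1 + x_i . lam)."""
    p = len(targets)
    pairs = list(zip(base_weights, x))
    current = [sum(d * xi[j] for d, xi in pairs) for j in range(p)]
    gram = [[sum(d * xi[j] * xi[k] for d, xi in pairs) for k in range(p)]
            for j in range(p)]
    lam = solve(gram, [t - c for t, c in zip(targets, current)])
    return [d * (1.0 + sum(lj * v for lj, v in zip(lam, xi))) for d, xi in pairs]


# Hypothetical households: [people under 65, people 65+, benefits in $ thousands].
x = [[2, 0, 0.0], [1, 1, 20.0], [0, 2, 45.0], [3, 0, 0.0], [0, 1, 18.0]]
base = [100.0] * 5
targets = [620.0, 450.0, 9500.0]  # two head counts and one continuous total
calibrated = greg_weights(base, x, targets)
```

Unlike raking, this form handles the continuous benefit column directly, but it does not guarantee positive weights, so production implementations often add bounds.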

Impact

This enables analysis of:

Fiscal sustainability: Decompose revenue changes into economic vs demographic components
Policy design: Evaluate reforms in context of future demographics
Distributional analysis: Track tax burden shifts between age cohorts
Scenario planning: Model alternative demographic and economic scenarios
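One way to read the economic versus demographic decomposition is as a sequential comparison: hold weights fixed while incomes are uprated, then hold incomes fixed while weights change. The sketch below assumes that reading. The `toy_tax` schedule, incomes, and weights are invented for illustration and stand in for the full PolicyEngine simulation.

```python
def toy_tax(income):
    """Hypothetical two-bracket tax schedule, purely for illustration."""
    return 0.10 * min(income, 40_000.0) + 0.25 * max(income - 40_000.0, 0.0)


def revenue(weights, incomes):
    """Weighted sum of tax liabilities across records."""
    return sum(w * toy_tax(y) for w, y in zip(weights, incomes))


base_incomes = [30_000.0, 80_000.0]
uprated_incomes = [36_000.0, 96_000.0]   # after economic uprating
base_weights = [100.0, 100.0]
reweighted = [120.0, 90.0]               # after demographic reweighting

r_base = revenue(base_weights, base_incomes)
r_uprated = revenue(base_weights, uprated_incomes)
r_projected = revenue(reweighted, uprated_incomes)

economic_effect = r_uprated - r_base          # incomes change, weights fixed
demographic_effect = r_projected - r_uprated  # weights change, incomes fixed
```

The two effects sum to the total change by construction, but the split depends on the order in which the steps are applied.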

Usage

# Traditional IPF approach (default)
python run_household_projection.py 2050

# GREG calibration with demographics only
python run_household_projection.py 2050 --greg

# GREG with demographics + Social Security benefits
python run_household_projection.py 2050 --greg --use-ss

Validation

Population totals exactly match SSA projections
Age distributions preserved through reweighting
Social Security benefits match SSA Trustees Report (when using --use-ss)
IPF and GREG produce equivalent results (within 0.2%) for identical constraints
Validated against R's survey package
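Checks of this kind can be expressed in a few lines. The sketch below is a hypothetical helper, not part of the PR: it confirms that weighted group totals hit their targets and that two revenue estimates agree within a relative tolerance, using the 0.2% figure quoted above as the default. The sample numbers are invented.

```python
import math


def relative_gap(a, b):
    """Absolute difference between a and b, relative to b."""
    return abs(a - b) / abs(b)


def check_projection(weights, groups, targets, ipf_value, greg_value, tol=0.002):
    """Return (targets_met, methods_agree) for one projection year."""
    totals = {}
    for w, g in zip(weights, groups):
        totals[g] = totals.get(g, 0.0) + w
    targets_met = all(
        math.isclose(totals.get(g, 0.0), t, rel_tol=1e-9)
        for g, t in targets.items()
    )
    methods_agree = relative_gap(ipf_value, greg_value) <= tol
    return targets_met, methods_agree


# Invented numbers: four records in two age groups, two revenue estimates.
targets_met, methods_agree = check_projection(
    weights=[1.5, 1.5, 3.0, 2.0],
    groups=["young", "young", "old", "old"],
    targets={"young": 3.0, "old": 5.0},
    ipf_value=1000.0,
    greg_value=1001.5,
)
```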

Future Extensions Needed

Update population calibration from CBO (ending 2055) to SSA (through 2100)
Incorporate CBO Long-Term Budget Outlook inflation projections
Extend income category projections beyond current CBO horizon using consistent methodology

This provides a comprehensive foundation for evidence-based fiscal policy analysis in an era of demographic transformation.

The script depends on social_security_aux.csv for benefit projections and SSPopJul_TR2024.csv for population demographics. Without these files, the script cannot run from a fresh clone of the repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Move SSA data files to centralized storage directory
- Add comprehensive Jupyter notebook with methodology and analysis
- Update README to reference notebook for detailed documentation
- Add notebook to MyST documentation structure
- Ignore Jupyter checkpoint files

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

…nto long-term

The myst build command creates output in docs/_build/site, but the deployment was looking for docs/_build/html. This fixes the path in both the workflow and Makefile targets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

This allows manually deploying documentation and running the full test suite without waiting for a version update push to main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Corrected imputed variable count from 72 to 67
- Corrected calibration target count from 7,000+ to 2,813
- Removed inaccurate "two-stage" terminology
- Added SSA data source documentation in storage README
- Renamed notebook to clarify PWBM comparison scope (2025-2100)
- Added taxable payroll calibration target

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

policyengine_us_data/datasets/cps/long_term/ssa_data.py

…projections

- Add start_year parameter with default 2025 for flexible projection windows
- Replace hardcoded 85 with MAX_SINGLE_AGE constant for clarity
- Remove csv_path parameter to match codebase conventions (hardcoded instead)

Addresses PR #443 review comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
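To make the `start_year` default and the `MAX_SINGLE_AGE` constant concrete, here is a hedged sketch of how a loader might use them. Only those two names come from the commit message. The `load_population` function, the `end_year` parameter, and the column names `year`, `age`, and `population` are assumptions, and the sketch reads CSV text from a string so it stays self-contained, whereas the commit hardcodes the file path.

```python
import csv
import io

MAX_SINGLE_AGE = 85  # ages at or above this are pooled into one "85+" group


def load_population(csv_text, start_year=2025, end_year=2100):
    """Return {year: [population for ages 0..MAX_SINGLE_AGE]} from CSV text."""
    population = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year, age = int(row["year"]), int(row["age"])
        if not start_year <= year <= end_year:
            continue  # outside the requested projection window
        by_age = population.setdefault(year, [0.0] * (MAX_SINGLE_AGE + 1))
        # Top-code: anything older than MAX_SINGLE_AGE lands in the last slot.
        by_age[min(age, MAX_SINGLE_AGE)] += float(row["population"])
    return population


# Tiny invented sample with assumed column names.
sample = """year,age,population
2024,0,10
2025,0,11
2025,85,3
2025,90,2
2026,84,4
"""
pop = load_population(sample)
```

Naming the constant makes the top-coding rule visible at every place it is applied, which is the clarity gain the commit describes.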

baogorek and others added 6 commits October 17, 2025 16:55

Scaffolding for 75-year forecasting

95c76ee

working projection with SSA demographics

9549da2

long run first file set creation

1b6dbfc

friday

f11606f

formating

78e0a24

baogorek changed the title ~~Scaffolding for 75-year forecasting~~ 75-year Projections based on calibration to SSA Trustees data Nov 10, 2025

cleanup

100b246

baogorek marked this pull request as ready for review November 10, 2025 21:55

baogorek requested a review from MaxGhenis November 10, 2025 21:55

baogorek and others added 8 commits November 11, 2025 09:35

bringing back a file I deleted

c0c00ff

Merge branch 'main' of github.com:PolicyEngine/policyengine-us-data i…

d407ae2

…nto long-term

Update changelog with documentation deployment fix

839e847

changes to abstract

700374c

MaxGhenis requested changes Nov 19, 2025

policyengine_us_data/datasets/cps/long_term/ssa_data.py (outdated)

policyengine_us_data/datasets/cps/long_term/ssa_data.py (outdated)

baogorek requested a review from MaxGhenis November 20, 2025 00:02

MaxGhenis approved these changes Nov 20, 2025


baogorek merged commit e213759 into main Nov 20, 2025
6 checks passed
