
Commit ed6a0b4

Authored by baogorek, claude, and MaxGhenis
Fix GitHub Pages deployment and add H6 Social Security reform calibration (#448)
* Fix documentation build timeout in CI

Remove 10-second timeout and || true from make documentation target. The timeout was causing builds to fail silently, resulting in missing docs/_build/site directory and GitHub Pages deployment failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add Node.js setup to workflow for MyST documentation builds

MyST requires Node.js to build documentation. Added setup-node@v4 step to install Node.js 20 before package installation.

* Add index.html bootstrap for MyST GitHub Pages deployment

* Fix GitHub Pages deployment to use _build/html instead of _build/site

MyST generates static HTML files in _build/html for static hosting, while _build/site contains dynamic content for the MyST server. GitHub Pages requires the static HTML files.

Changes:
- Deploy docs/_build/html instead of docs/_build/site
- Update Makefile to touch .nojekyll in correct directory
- Remove manual index.html (MyST generates this automatically)

* Add docs/README.md documenting MyST build outputs pitfall

Documents critical distinction between _build/html/ (for static hosting) and _build/site/ (for development server) to prevent future deployment mistakes.

* Add changelog entry for documentation deployment fixes

* Add H6 Social Security reform calibration to long-term projections

- Add load_h6_income_rate_change() to ssa_data.py to load reform ratio targets
- Extend GREG calibration to support H6 revenue impact constraints
- Add --use-h6-reform flag to run_household_projection.py
- Implement H6 reform that phases out SS benefit taxation (2045-2054)
- Use absolute revenue targets (ratio × payroll) for linear GREG constraints
- Skip H6 computation for years with zero reform effect (2025-2044)

The H6 reform calibration ensures microsimulation results match SSA Trustees Report projections for the revenue impact of phasing out Social Security benefit taxation.

* Add configurable start year and improve H6 reform implementation

- Add START_YEAR parameter to run_household_projection.py CLI
- Enhance H6 reform with threshold crossover handling for OASDI/HI tiers
- Implement min/max swapping logic to prevent engine errors when OASDI thresholds exceed HI thresholds
- Update usage documentation with clearer examples
- Add additional SSA data source URL to storage README

* linting

* Improve PR: Node.js 22 LTS, H6 tests, better changelog

- Update Node.js from 20 to 22 (current Active LTS)
- Add 18 unit tests for H6 reform threshold crossover logic
- Improve changelog to document H6 Social Security reform additions

* Format test_h6_reform.py with black

* Use Node.js 24 LTS (Active LTS) instead of 22

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>
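The commit message says the H6 ratio targets are turned into absolute revenue targets (ratio × payroll) so they can enter GREG as linear constraints. A ratio of two weighted totals is not linear in the weights, but once the payroll total is fixed from the projection, `ratio × payroll` is an ordinary weighted total that linear calibration can reproduce exactly. The pure-Python sketch below illustrates that idea with textbook linear (chi-square distance) GREG. It is not the code in `ssa_data.py` or `run_household_projection.py`; the helper names `solve` and `greg_weights`, the four-record data, the ratio, and the payroll figure are all hypothetical.

```python
def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def greg_weights(design_weights, x, targets):
    """Linear (chi-square distance) GREG calibration.

    design_weights[i]: starting weight d_i of record i
    x[i][k]: value of calibration variable k for record i
    targets[k]: absolute total T_k the calibrated weights must reproduce
    Returns w_i = d_i * (1 + x_i . lam), where lam solves
    (sum_i d_i x_i x_i^T) lam = T - sum_i d_i x_i.
    """
    k = len(targets)
    current = [sum(d * xi[j] for d, xi in zip(design_weights, x)) for j in range(k)]
    xtdx = [
        [sum(d * xi[a] * xi[b] for d, xi in zip(design_weights, x)) for b in range(k)]
        for a in range(k)
    ]
    lam = solve(xtdx, [t - c for t, c in zip(targets, current)])
    return [
        d * (1.0 + sum(xi[j] * lam[j] for j in range(k)))
        for d, xi in zip(design_weights, x)
    ]


# Hypothetical toy data: columns are [people in record, H6 revenue change in $].
x = [[2, 0.0], [1, -50.0], [3, -120.0], [2, 0.0]]
d = [100.0, 100.0, 100.0, 100.0]

# The ratio target becomes an absolute, linear target once payroll is fixed.
h6_ratio = -0.01         # hypothetical revenue change as a share of payroll
taxable_payroll = 2.0e6  # hypothetical projected payroll total
targets = [820.0, h6_ratio * taxable_payroll]

w = greg_weights(d, x, targets)
```

Linear GREG weights can turn negative or extreme when targets sit far from the starting totals, so a production calibration typically adds bounds or uses a different distance function; this sketch does neither.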
1 parent f99beff commit ed6a0b4

18 files changed (+806, -133 lines)


.github/workflows/reusable_test.yaml

Lines changed: 6 additions & 1 deletion
```diff
@@ -45,6 +45,11 @@ jobs:
 with:
 python-version: '3.13'
 
+- name: Set up Node.js
+uses: actions/setup-node@v4
+with:
+node-version: '24'
+
 - uses: "google-github-actions/auth@v2"
 if: inputs.upload_data
 with:
@@ -94,5 +99,5 @@ jobs:
 uses: JamesIves/github-pages-deploy-action@v4
 with:
 branch: gh-pages
-folder: docs/_build/site
+folder: docs/_build/html
 clean: true
```

Makefile

Lines changed: 2 additions & 2 deletions
```diff
@@ -33,8 +33,8 @@ documentation:
 rm -rf _build .jupyter_cache && \
 rm -f _toc.yml && \
 myst clean && \
-timeout 10 myst build --html || true
-cd docs && test -d _build/site && touch _build/site/.nojekyll || true
+myst build --html
+cd docs && test -d _build/html && touch _build/html/.nojekyll || true
 
 documentation-build:
 cd docs && \
```
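The Makefile change drops `timeout 10 ... || true`. With that wrapper, `timeout` kills a build that runs longer than 10 seconds, and `|| true` turns the resulting non-zero exit status into success, so make carries on as if the docs had been built. A minimal Python sketch of the same contrast, written as a hypothetical wrapper rather than anything in this repository: `subprocess.run(..., check=True)` raises on failure, while discarding the return code mimics `|| true`.

```python
import subprocess
import sys


def run_build(cmd):
    """Run a build step and let failures propagate.

    check=True raises CalledProcessError on a non-zero exit status, and no
    timeout is set, so a slow build is allowed to finish.
    """
    return subprocess.run(cmd, check=True)


def run_build_masked(cmd):
    """Rough analogue of `cmd || true`: the exit status is thrown away."""
    subprocess.run(cmd, check=False)
    return 0


# Stand-in for a failing build step: a Python process that exits with status 3.
failing_step = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

In this sketch, `run_build(["myst", "build", "--html"])` would stop the caller with `CalledProcessError` if MyST failed, which is the behavior the updated Makefile target now gets from make itself.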

changelog_entry.yaml

Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+- bump: patch
+changes:
+fixed:
+- GitHub Pages documentation deployment (was deploying wrong directory causing blank pages)
+- Removed timeout and error suppression from documentation build
+added:
+- Node.js 24 LTS setup to CI workflow for MyST builds
+- H6 Social Security reform calibration for long-term projections (phases out OASDI taxation 2045-2054)
+- H6 threshold crossover handling when OASDI thresholds exceed HI thresholds
+- start_year parameter to run_household_projection.py CLI
+- docs/README.md documenting MyST build output pitfall
```
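The changelog mentions handling the case where OASDI thresholds exceed HI thresholds, and the commit message describes the fix as min/max swapping. The reform code itself is not shown on this page, so the sketch below only illustrates the general pattern: put the two thresholds in ascending order before applying a tiered schedule. The function names are hypothetical, and `two_tier_inclusion` is a deliberately simplified schedule using 50% and 85% rates. It is not the statutory taxable-benefits computation, which works from provisional income and has additional caps, and it is not PolicyEngine's implementation.

```python
def order_thresholds(oasdi_threshold, hi_threshold):
    """Return (lower, upper) so a tiered schedule always sees ascending thresholds."""
    return min(oasdi_threshold, hi_threshold), max(oasdi_threshold, hi_threshold)


def two_tier_inclusion(income, benefits, oasdi_threshold, hi_threshold,
                       rate_1=0.5, rate_2=0.85):
    """Simplified two-tier inclusion of benefits (illustration only).

    Income between the two thresholds counts at rate_1, income above the
    upper threshold counts at rate_2, and the included amount is capped at
    rate_2 * benefits.
    """
    lower, upper = order_thresholds(oasdi_threshold, hi_threshold)
    tier_1 = rate_1 * max(0.0, min(income, upper) - lower)
    tier_2 = rate_2 * max(0.0, income - upper)
    return min(tier_1 + tier_2, rate_2 * benefits)
```

Because the thresholds are ordered first, passing the OASDI and HI values in either order gives the same result, which is the property the swap is meant to guarantee.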

docs/README.md

Lines changed: 46 additions & 0 deletions
```diff
@@ -0,0 +1,46 @@
+# Documentation
+
+This project uses [MyST Markdown](https://mystmd.org/) for documentation.
+
+## Building Locally
+
+### Requirements
+- Python 3.13+ with dev dependencies: `uv pip install -e .[dev] --system`
+- Node.js 20+ (required by MyST)
+
+### Commands
+```bash
+make documentation # Build static HTML files
+make documentation-serve # Serve locally on http://localhost:8080
+```
+
+## Important: MyST Build Outputs
+
+**MyST creates two different outputs - DO NOT confuse them:**
+
+- `_build/html/` - **Static HTML files (use for GitHub Pages deployment)**
+- `_build/site/` - Dynamic content for `myst start` development server only
+
+**GitHub Pages must deploy `_build/html/`**, not `_build/site/`. The `_build/site/` directory contains JSON files for MyST's development server and will result in a blank page on GitHub Pages.
+
+## GitHub Pages Deployment
+
+- Site URL: https://policyengine.github.io/policyengine-us-data/
+- Deployed from: `docs/_build/html/` directory
+- Propagation time: 5-10 minutes after push to gh-pages branch
+- Workflow: `.github/workflows/code_changes.yaml` (on main branch only)
+
+## Troubleshooting
+
+**Blank page after deployment:**
+- Check that workflow deploys `folder: docs/_build/html` (not `_build/site`)
+- Wait 5-10 minutes for GitHub Pages propagation
+- Hard refresh browser (Ctrl+Shift+R / Cmd+Shift+R)
+
+**Build fails in CI:**
+- Ensure Node.js setup step exists in workflow (MyST requires Node.js)
+- Never add timeouts or `|| true` to build commands - they mask failures
+
+**Missing index.html:**
+- MyST auto-generates index.html in `_build/html/`
+- Do not create manual index.html in docs/
```
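The README's troubleshooting notes reduce to a few checks on the folder being deployed. A small Python sketch of such a pre-deploy check follows. The function name `check_pages_folder` is hypothetical, and its tests are heuristics taken from this README (static output has `index.html`; `_build/site/` holds JSON for the development server) plus the `.nojekyll` marker that the Makefile creates.

```python
from pathlib import Path


def check_pages_folder(folder):
    """Return a list of likely reasons GitHub Pages would show a blank site."""
    root = Path(folder)
    if not root.is_dir():
        return [f"{root} does not exist; did the build fail?"]
    problems = []
    if not (root / "index.html").is_file():
        problems.append("no index.html at the top level")
    has_html = next(root.rglob("*.html"), None) is not None
    has_json = next(root.rglob("*.json"), None) is not None
    if has_json and not has_html:
        # Matches the README's description of _build/site (dev-server JSON).
        problems.append("only JSON content found; this looks like _build/site")
    if not (root / ".nojekyll").exists():
        # The Makefile touches .nojekyll so GitHub Pages skips Jekyll processing.
        problems.append("missing .nojekyll")
    return problems
```

After a successful `make documentation`, `check_pages_folder("docs/_build/html")` would be expected to return an empty list, while a folder holding only development-server JSON would be flagged.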

docs/abstract.md

Lines changed: 1 addition & 10 deletions
```diff
@@ -6,15 +6,6 @@ quantile regression forests to impute 67 tax variables from the PUF onto CPS rec
 preserving distributional characteristics while maintaining household composition and member
 relationships. The imputation process alone does not guarantee consistency with official
 statistics, necessitating a reweighting step to align the combined dataset with known
-population totals and administrative benchmarks. We apply a reweighting algorithm that
-calibrates the dataset to 2,813 targets from
-the IRS Statistics of Income, Census population projections, Congressional Budget
-Office benefit program estimates, Treasury
-expenditure data, Joint Committee on Taxation tax expenditure estimates, healthcare
-spending patterns, and other benefit program costs. The reweighting employs dropout-regularized
-gradient descent optimization
-to ensure consistency with administrative benchmarks. Validation shows the enhanced dataset
-reduces error in key tax components by [TO BE CALCULATED]% relative to the baseline CPS.
-The dataset maintains the CPS's demographic detail and geographic granularity while
+population totals and administrative benchmarks. We apply a reweighting algorithm that calibrates the dataset to 2,813 targets from the IRS Statistics of Income, Census population projections, Congressional Budget Office benefit program estimates, Treasury expenditure data, Joint Committee on Taxation tax expenditure estimates, healthcare spending patterns, and other benefit program costs. The reweighting employs dropout-regularized gradient descent optimization to ensure consistency with administrative benchmarks. The dataset maintains the CPS's demographic detail and geographic granularity while
 incorporating tax reporting data from administrative sources. We release the enhanced
 dataset, source code, and documentation to support policy analysis.
```
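The abstract describes calibrating to 2,813 targets with dropout-regularized gradient descent. Below is a toy pure-Python sketch of that technique under stated assumptions: the loss is the squared relative error per target, weights are optimized in log space so they stay positive, and on each step a random subset of records is dropped while the rest are rescaled by `1 / (1 - dropout)`. These choices, the function names, and the hyperparameters are illustrative assumptions, not the settings used to build the Enhanced CPS.

```python
import math
import random


def loss(matrix, targets, weights):
    """Sum of squared relative errors between weighted estimates and targets."""
    return sum(
        ((sum(a * w for a, w in zip(row, weights)) - t) / t) ** 2
        for row, t in zip(matrix, targets)
    )


def reweight(matrix, targets, n_iter=2000, lr=0.05, dropout=0.05, seed=0):
    """Fit positive record weights to aggregate targets by gradient descent.

    matrix[j][i] is record i's contribution to target j.
    """
    rng = random.Random(seed)
    n = len(matrix[0])
    log_w = [0.0] * n  # every record starts at weight 1
    keep_scale = 1.0 / (1.0 - dropout)
    for _ in range(n_iter):
        # Dropout: zero some records this step and rescale the rest.
        mask = [0.0 if rng.random() < dropout else keep_scale for _ in range(n)]
        w = [math.exp(v) * m for v, m in zip(log_w, mask)]
        grad = [0.0] * n
        for row, t in zip(matrix, targets):
            est = sum(a * wi for a, wi in zip(row, w))
            coef = 2.0 * (est - t) / t**2
            for i in range(n):
                # d(est)/d(log_w[i]) = row[i] * w[i], since w[i] = exp(log_w[i]) * mask[i]
                grad[i] += coef * row[i] * w[i]
        log_w = [v - lr * g for v, g in zip(log_w, grad)]
    return [math.exp(v) for v in log_w]


matrix = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]  # [all records, records with a trait]
targets = [10.0, 4.0]
weights = reweight(matrix, targets)
```

With two targets and four records the fit is underdetermined, since many weight vectors hit both totals. Randomly dropping records discourages solutions that depend heavily on any single one, which is the usual motivation for dropout as a regularizer.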

docs/appendix.md

Lines changed: 92 additions & 1 deletion
```diff
@@ -46,4 +46,95 @@ for iteration in range(5000):
 
 ### Table A1: Complete List of Imputed Variables
 
-[TO BE GENERATED - Complete list of 72 imputed variables from PUF organized by category]
+#### Variables Imputed from IRS Public Use File (67 variables)
+
+**Income Variables:**
+- employment_income
+- partnership_s_corp_income
+- social_security
+- taxable_pension_income
+- tax_exempt_pension_income
+- long_term_capital_gains
+- short_term_capital_gains
+- taxable_ira_distributions
+- self_employment_income
+- qualified_dividend_income
+- non_qualified_dividend_income
+- rental_income
+- taxable_unemployment_compensation
+- taxable_interest_income
+- tax_exempt_interest_income
+- estate_income
+- miscellaneous_income
+- farm_income
+- alimony_income
+- farm_rent_income
+- non_sch_d_capital_gains
+- long_term_capital_gains_on_collectibles
+- unrecaptured_section_1250_gain
+- salt_refund_income
+
+**Deductions and Adjustments:**
+- interest_deduction
+- unreimbursed_business_employee_expenses
+- pre_tax_contributions
+- charitable_cash_donations
+- self_employed_pension_contribution_ald
+- domestic_production_ald
+- self_employed_health_insurance_ald
+- charitable_non_cash_donations
+- alimony_expense
+- health_savings_account_ald
+- student_loan_interest
+- investment_income_elected_form_4952
+- early_withdrawal_penalty
+- educator_expense
+- deductible_mortgage_interest
+
+**Tax Credits:**
+- cdcc_relevant_expenses
+- foreign_tax_credit
+- american_opportunity_credit
+- general_business_credit
+- energy_efficient_home_improvement_credit
+- amt_foreign_tax_credit
+- excess_withheld_payroll_tax
+- savers_credit
+- prior_year_minimum_tax_credit
+- other_credits
+
+**Qualified Business Income Variables:**
+- w2_wages_from_qualified_business
+- unadjusted_basis_qualified_property
+- business_is_sstb
+- qualified_reit_and_ptp_income
+- qualified_bdc_income
+- farm_operations_income
+- estate_income_would_be_qualified
+- farm_operations_income_would_be_qualified
+- farm_rent_income_would_be_qualified
+- partnership_s_corp_income_would_be_qualified
+- rental_income_would_be_qualified
+- self_employment_income_would_be_qualified
+
+**Other Tax Variables:**
+- traditional_ira_contributions
+- qualified_tuition_expenses
+- casualty_loss
+- unreported_payroll_tax
+- recapture_of_investment_credit
+
+#### Variables Imputed from Survey of Income and Program Participation (1 variable)
+
+- tip_income
+
+#### Variables Imputed from Survey of Consumer Finances (3 variables)
+
+- networth
+- auto_loan_balance
+- auto_loan_interest
+
+#### Variables Imputed from American Community Survey (2 variables)
+
+- rent
+- real_estate_taxes
```

docs/conclusion.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ Our work makes several key contributions:
 
 The validation results demonstrate that combining survey and administrative data through principled statistical methods can achieve:
 - Improved income distribution representation
-- Better alignment with program participation totals
+- Better alignment with program participation totals
 - Maintained demographic and geographic detail
 - Suitable accuracy for policy simulation
 
```

docs/discussion.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -8,7 +8,7 @@ We examine the strengths, limitations, and potential applications of the Enhance
 
 The Enhanced CPS uniquely combines:
 - Demographic detail from the CPS including state identifiers
-- Tax precision from IRS administrative data
+- Tax precision from IRS administrative data
 - Calibration to contemporary official statistics
 - Open-source availability for research use
 
@@ -26,7 +26,7 @@ The large-scale calibration to 2,813 targets ensures consistency with administra
 
 ### Practical Advantages
 
-For policy analysis, the dataset offers state-level geographic detail enabling subnational analysis, household structure for distributional studies, tax detail for revenue estimation, program participation for benefit analysis, and recent data calibrated to current totals.
+For policy analysis, the dataset offers several key features: state-level geographic detail for subnational analysis, household structure for distributional studies, tax detail for revenue estimation, program participation for benefit analysis, and calibration to current administrative totals.
 
 ## Limitations
 
```

docs/introduction.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,10 +1,10 @@
 # Introduction
 
-Microsimulation models require high-quality microdata that accurately represents both demographic characteristics and economic outcomes. The ideal dataset would combine the demographic richness and household structure of surveys with the income precision of administrative tax records. However, publicly available datasets typically excel in one dimension while lacking in the other.
+Microsimulation models require high-quality microdata that accurately represent demographic characteristics and economic outcomes. The ideal dataset would combine the demographic richness and household structure of surveys with the income precision of administrative tax records. However, publicly available datasets typically excel in one dimension while lacking in the other.
 
 The Current Population Survey (CPS) Annual Social and Economic Supplement provides detailed household demographics, family relationships, and program participation data for a representative sample of US households. However, it suffers from well-documented income underreporting, particularly at the top of the distribution. The IRS Public Use File (PUF) contains accurate tax return information but lacks household structure, demographic detail, and state identifiers needed for comprehensive policy analysis.
 
-This paper presents a methodology for creating an Enhanced CPS dataset that combines the strengths of both sources. Through an enhancement processimputation followed by reweightingwe create a dataset suitable for analyzing both tax and transfer policies at federal and state levels.
+This paper presents a methodology for creating an Enhanced CPS dataset that combines the strengths of both sources. Through an enhancement process: imputation followed by reweighting, we create a dataset suitable for analyzing both tax and transfer policies at federal and state levels.
 
 ## Related Work
 
@@ -24,4 +24,4 @@ Our empirical contribution involves creating and validating a publicly available
 
 From a practical perspective, we provide open-source tools and comprehensive documentation that enable researchers to apply these methods, modify the approach, or build upon our work. This transparency contrasts with existing proprietary models and supports reproducible research. Government agencies could use our framework to enhance their own microsimulation capabilities, while academic researchers gain access to data suitable for analyzing distributional impacts of tax and transfer policies. The modular design allows incremental improvements as new data sources become available.
 
-We organize the remainder of this paper as follows. Section 2 describes our data sources including the primary datasets and calibration targets. Section 3 details the enhancement methodology including both the imputation and reweighting stages. Section 4 presents validation results comparing performance across datasets. Section 5 discusses limitations, applications, and future directions. Section 6 concludes with implications for policy analysis.
+We organize the remainder of this paper as follows. Section 2 describes our data sources including the primary datasets and calibration targets. Section 3 details the enhancement methodology including both the imputation and reweighting stages. Section 4 presents validation results comparing performance across datasets. Section 5 discusses limitations, applications, and future directions. Section 6 concludes with implications for policy analysis.
```

Lines changed: 12 additions & 23 deletions
```diff
@@ -3,10 +3,17 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"# Comparison to Penn Wharton Budget Model: Eliminating Tax on Social Security 2025-2100\n",
-"## Integrating Economic Uprating with Demographic Reweighting"
-]
+"source": "# Long Term Projections\n## Integrating Economic Uprating with Demographic Reweighting"
+},
+{
+"cell_type": "markdown",
+"source": "## Executive Summary\n\nThis document outlines an innovative approach for projecting federal income tax revenue through 2100 that uniquely combines sophisticated economic microsimulation with demographic reweighting. By harmonizing PolicyEngine's state-of-the-art tax modeling with Social Security Administration demographic projections, we can isolate and quantify the fiscal impact of population aging while preserving the full complexity of the tax code.",
+"metadata": {}
+},
+{
+"cell_type": "markdown",
+"source": "## The Challenge\n\nProjecting tax revenue over a 75-year horizon requires simultaneously modeling two distinct but interrelated dynamics:\n\n**Economic Evolution**: How incomes, prices, and tax parameters change over time\n- Wage growth and income distribution shifts\n- Inflation affecting brackets and deductions\n- Legislative changes and indexing rules\n- Behavioral responses to tax policy\n\n**Demographic Transformation**: How the population structure evolves\n- Baby boom generation aging through retirement\n- Declining birth rates reducing working-age population\n- Increasing longevity extending retirement duration\n- Shifting household composition patterns\n\nTraditional approaches typically sacrifice either economic sophistication (using simplified tax calculations) or demographic realism (holding age distributions constant). Our methodology preserves both.",
+"metadata": {}
 },
 {
 "cell_type": "markdown",
@@ -176,17 +183,6 @@
 "- `--save-h5`: Save year-specific .h5 files to `./projected_datasets/` directory"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"---\n",
-"\n",
-"## Executive Summary\n",
-"\n",
-"This document outlines an innovative approach for projecting federal income tax revenue through 2100 that uniquely combines sophisticated economic microsimulation with demographic reweighting. By harmonizing PolicyEngine's state-of-the-art tax modeling with Social Security Administration demographic projections, we can isolate and quantify the fiscal impact of population aging while preserving the full complexity of the tax code."
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -210,13 +206,6 @@
 "Traditional approaches typically sacrifice either economic sophistication (using simplified tax calculations) or demographic realism (holding age distributions constant). Our methodology preserves both."
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Loading and Exploring the Data"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -1023,4 +1012,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
```
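The notebook's subtitle, "Integrating Economic Uprating with Demographic Reweighting", describes two separate adjustments: growing incomes over time and reshaping weights to the projected population. The sketch below shows how keeping the two steps separate lets the demographic effect be isolated, in the spirit of the notebook's stated goal of quantifying the fiscal impact of population aging. It assumes a single uniform uprating factor and two coarse age groups, and it uses simple post-stratification (scale each group's weights to its target count) as a stand-in for the GREG calibration named in the commit message. All names and numbers are hypothetical.

```python
def age_group(age):
    """Two coarse groups, chosen only for illustration."""
    return "65+" if age >= 65 else "0-64"


def project(records, uprating_factor, target_counts):
    """Uprate incomes and post-stratify weights to projected age-group counts."""
    current = {}
    for r in records:
        g = age_group(r["age"])
        current[g] = current.get(g, 0.0) + r["weight"]
    projected = []
    for r in records:
        g = age_group(r["age"])
        projected.append({
            "age": r["age"],
            # Demographic step: each group's weights are scaled to its target count.
            "weight": r["weight"] * target_counts[g] / current[g],
            # Economic step: every income grows by the same factor (simplifying assumption).
            "income": r["income"] * uprating_factor,
        })
    return projected


def weighted_total(records, key):
    return sum(r["weight"] * r[key] for r in records)


records = [
    {"age": 40, "weight": 100.0, "income": 50000.0},
    {"age": 30, "weight": 100.0, "income": 30000.0},
    {"age": 70, "weight": 100.0, "income": 20000.0},
]
projected = project(records, 1.1, {"0-64": 180.0, "65+": 150.0})

# Uprated incomes with the original weights: the economic effect alone.
economic_only = weighted_total(
    [dict(r, income=r["income"] * 1.1) for r in records], "income"
)
# Uprated incomes with reweighted population: economic plus demographic effect.
combined = weighted_total(projected, "income")
```

Because `combined` and `economic_only` use the same uprated incomes, their difference is attributable to the change in age structure alone.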
