Add ACA Premium Tax Credit targets from IRS SOI data #508

Open
daphnehanse11 wants to merge 4 commits into main from add-ptc-soi-targets

Conversation

@daphnehanse11
Collaborator

daphnehanse11 commented Feb 6, 2026

Fixes #509

Summary

  • Ingests the PTC count (N85530) and amount (A85530) columns from the IRS SOI CD-level file (22incd.csv); both were already present in the raw data but were not being loaded
  • Adds aca_ptc to the TARGETS list in etl_irs_soi.py, creating conditional strata (aca_ptc > 0) with both tax_unit_count and aca_ptc dollar targets at national, state, and CD levels (stratum_group_id 119)
  • Adds aca_ptc variable metadata and aca_ptc_recipients variable group
  • Adds aca_ptc to the calibration weight fitting target filter
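
As a rough illustration of the ingestion step, the sketch below reads the two PTC columns from a small synthetic CSV and emits one count target and one dollar target per geography under the `aca_ptc > 0` constraint. Only `N85530`, `A85530`, and the `aca_ptc` / `tax_unit_count` names come from this PR; the `STATE` and `CONG_DISTRICT` column names, the sample rows, the `load_ptc_targets` helper, and the thousands-of-dollars scaling are assumptions made for the example, not the actual code in etl_irs_soi.py.

```python
import csv
import io

# Synthetic stand-in for 22incd.csv. The geography columns and all values
# are invented for illustration; only N85530/A85530 are named in the PR.
SAMPLE = """STATE,CONG_DISTRICT,N85530,A85530
US,0,1000,5000
AL,0,300,1500
AL,1,120,600
AL,2,180,900
"""


def load_ptc_targets(handle, amount_scale=1000):
    """Return one dict per geography with a count and a dollar target.

    amount_scale=1000 assumes amounts are reported in thousands of dollars.
    """
    targets = []
    for row in csv.DictReader(handle):
        targets.append(
            {
                "state": row["STATE"],
                "district": int(row["CONG_DISTRICT"]),
                # Conditional stratum: only tax units receiving the credit.
                "constraint": "aca_ptc > 0",
                "tax_unit_count": float(row["N85530"]),
                "aca_ptc": float(row["A85530"]) * amount_scale,
            }
        )
    return targets


targets = load_ptc_targets(io.StringIO(SAMPLE))
```

Each returned dict carries both targets for one stratum, which mirrors the pairing of a `tax_unit_count` target with an `aca_ptc` dollar target described above.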

Test plan

  • Rebuild database with make database and verify PTC strata are created at all three geographic levels
  • Verify aca_ptc targets appear in the targets table with reasonable values (national PTC total in roughly the $50-60B range)
  • Run fit_calibration_weights.py and confirm PTC targets appear in the matrix
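
The first two checks could be automated along the following lines. This is a minimal sketch, assuming target rows have already been pulled out of the database as `(geo_level, variable, value)` tuples; that tuple shape, the level labels, and the `check_ptc_targets` helper are hypothetical, and the $50-60B bounds are simply the range quoted in the test plan.

```python
def check_ptc_targets(rows, low=50e9, high=60e9):
    """Return a list of problems found in (geo_level, variable, value) rows."""
    ptc_rows = [r for r in rows if r[1] == "aca_ptc"]
    problems = []

    # PTC targets should exist at all three geographic levels.
    levels = {level for level, _, _ in ptc_rows}
    missing = {"national", "state", "district"} - levels
    if missing:
        problems.append(f"missing levels: {sorted(missing)}")

    # Exactly one national dollar target, inside the plausible range.
    national = [value for level, _, value in ptc_rows if level == "national"]
    if len(national) != 1:
        problems.append(f"expected 1 national target, found {len(national)}")
    elif not low <= national[0] <= high:
        problems.append(f"national total {national[0]:.3g} outside range")

    return problems


good = [
    ("national", "aca_ptc", 55e9),
    ("state", "aca_ptc", 1e9),
    ("district", "aca_ptc", 1e8),
]
bad = [("national", "aca_ptc", 5e9)]

good_problems = check_ptc_targets(good)
bad_problems = check_ptc_targets(bad)
```

An empty list means the targets pass; anything else names the level or value that needs a closer look.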

🤖 Generated with Claude Code

daphnehanse11 and others added 3 commits February 6, 2026 09:35
The SOI 22incd.csv file contains PTC count (N85530) and amount (A85530)
columns at national, state, and congressional district levels, but they
were not being ingested. This adds PTC to the existing IRS conditional
strata pipeline so calibration can target PTC dollar amounts by geography.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@daphnehanse11 daphnehanse11 requested review from baogorek and juaristi22 and removed request for juaristi22 February 6, 2026 17:21
…yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@baogorek
Collaborator

baogorek commented Feb 6, 2026

We need to think about a custom uprating here to get to 2024.

baogorek added a commit that referenced this pull request Feb 7, 2026
Adds aca_ptc ingestion from IRS SOI data (code 85530) to etl_irs_soi.py
and updates DATABASE_GUIDE.md to reflect stratum_group_id 119.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis pushed a commit that referenced this pull request Feb 8, 2026
)

* Fix stale calibration targets by deriving time_period from dataset

- Remove hardcoded CBO_YEAR and TREASURY_YEAR constants
- Add --dataset CLI argument to etl_national_targets.py
- Derive time_period from sim.default_calculation_period
- Default to HuggingFace production dataset

The dataset itself is now the single source of truth for the
calibration year, preventing future drift when updating to new
base years.

Closes #503

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use income_tax_positive for CBO calibration in loss.py

The CBO income_tax parameter represents positive-only receipts (refundable
credit payments in excess of liability are classified as outlays, not
negative receipts). Using income_tax_positive matches this definition.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add --dataset argument to all database ETL scripts

All ETL scripts now derive their target year from the dataset's
default_calculation_period instead of hardcoding years. This ensures
all calibration targets stay synchronized when updating to a new
base year annually.

Updated scripts:
- create_initial_strata.py
- etl_age.py
- etl_irs_soi.py (with configurable --lag for IRS data delay)
- etl_medicaid.py
- etl_snap.py
- etl_state_income_tax.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
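
One way to picture the `--dataset` / `--lag` arrangement is sketched below. The flag names come from the commit message; the default lag of 2, the `build_parser` and `irs_data_year` helpers, and the placeholder dataset name are assumptions for illustration, and the dataset year is passed in directly rather than read from `default_calculation_period`.

```python
import argparse


def build_parser():
    """Hypothetical CLI for an ETL script; not the project's actual parser."""
    parser = argparse.ArgumentParser(description="Load IRS SOI targets")
    parser.add_argument(
        "--dataset",
        default=None,
        help="Dataset whose calculation period sets the target year",
    )
    parser.add_argument(
        "--lag",
        type=int,
        default=2,  # assumed default; IRS data trails the dataset year
        help="Years by which the IRS data year trails the dataset year",
    )
    return parser


def irs_data_year(dataset_year, lag):
    """Year of IRS data to load for a dataset calibrated to dataset_year."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return dataset_year - lag


args = build_parser().parse_args(["--dataset", "example.h5", "--lag", "2"])
year = irs_data_year(2024, args.lag)
```

With a 2024 dataset and a lag of 2, this resolves to the 2022 SOI file, which is consistent with the 22incd.csv source used in this PR.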

* Add 119th Congress district code support for 2024 ACS data

- Update parse_ucgid to recognize both 5001800US (118th) and 5001900US (119th Congress)
- Expand Puerto Rico and territory filters to handle both Congress code formats
- Update TERRITORY_UCGIDS and NON_VOTING_GEO_IDS with 119th Congress codes

This ensures consistent redistricting alignment: 2024 ACS data uses 119th Congress
codes natively, and IRS SOI data is converted via the 116th→119th mapping matrix.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
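
A minimal sketch of accepting both prefixes follows. The two prefixes and their Congress numbers are taken from the commit message; the `parse_cd_ucgid` name, the returned dict shape, and the assumption that the prefix is followed by a two-digit state FIPS code and a two-digit district number are illustrative and may differ from the real parse_ucgid.

```python
# Prefix -> Congress number, as listed in the commit message.
CD_PREFIXES = {"5001800US": 118, "5001900US": 119}


def parse_cd_ucgid(ucgid):
    """Split a congressional-district UCGID into its parts (hypothetical)."""
    for prefix, congress in CD_PREFIXES.items():
        if ucgid.startswith(prefix):
            suffix = ucgid[len(prefix):]
            # Assumed layout: 2-digit state FIPS + 2-digit district number.
            if len(suffix) != 4 or not suffix.isdigit():
                raise ValueError(f"unexpected district suffix: {suffix!r}")
            return {
                "congress": congress,
                "state_fips": int(suffix[:2]),
                "district": int(suffix[2:]),
            }
    raise ValueError(f"not a congressional-district UCGID: {ucgid!r}")


old_style = parse_cd_ucgid("5001800US0602")
new_style = parse_cd_ucgid("5001900US0602")
```

Both calls resolve to the same state and district and differ only in the `congress` field, so downstream filters can treat the two code formats uniformly.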

* Remove seed-related changes to reduce PR scope

Revert deterministic hash-based Medicaid/SSI seed logic in cps.py,
update Makefile seed to 3526.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Upgrade policyengine-us to 1.550.1 in uv.lock

Needed for income_tax_positive variable used in loss.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Cherry-pick ACA PTC targets from PR #508 and update changelog

Adds aca_ptc ingestion from IRS SOI data (code 85530) to etl_irs_soi.py
and updates DATABASE_GUIDE.md to reflect stratum_group_id 119.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Split local area publish into build+stage and promote phases

Prevents silent no-op promotes by detecting when HF commits don't
change HEAD. Adds separate promote workflow for manual gate before
pushing staging files to production. Also bumps calibration epochs
from 200 to 250.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Ingest ACA Premium Tax Credit targets from IRS SOI CD-level data

2 participants