Add ACA Premium Tax Credit targets from IRS SOI data #508
Open
daphnehanse11 wants to merge 4 commits into main
The SOI 22incd.csv file contains PTC count (N85530) and amount (A85530) columns at national, state, and congressional district levels, but they were not being ingested. This adds PTC to the existing IRS conditional strata pipeline so calibration can target PTC dollar amounts by geography.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
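As a rough illustration of the ingestion described in the commit message above, the sketch below sums the N85530 (count) and A85530 (amount) columns by state from a tiny in-memory CSV. It is not the code in `etl_irs_soi.py`: the `STATE` and `CONG_DISTRICT` column names, the sample values, the helper name `ptc_totals_by_state`, and the assumption that amounts are stored in thousands of dollars (`amount_scale=1000`) are all illustrative and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; the real 22incd.csv has
# many more rows and columns, and its layout may differ.
SAMPLE_CSV = """STATE,CONG_DISTRICT,N85530,A85530
AL,1,1000,5000
AL,2,2000,9000
AK,0,500,3000
"""


def ptc_totals_by_state(csv_text, amount_scale=1000):
    """Sum PTC return counts and dollar amounts per state.

    amount_scale converts the A85530 column to dollars; 1000 assumes the
    file reports money amounts in thousands (an assumption, not verified).
    """
    totals = defaultdict(lambda: {"tax_unit_count": 0.0, "aca_ptc": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        state = row["STATE"]
        totals[state]["tax_unit_count"] += float(row["N85530"])
        totals[state]["aca_ptc"] += float(row["A85530"]) * amount_scale
    return dict(totals)


totals = ptc_totals_by_state(SAMPLE_CSV)
for state, values in sorted(totals.items()):
    print(state, values["tax_unit_count"], values["aca_ptc"])
```

Keeping the count and the dollar amount side by side mirrors the two target types named in the PR (`tax_unit_count` and `aca_ptc`).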
…yaml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator
We need to think about a custom uprating here to get to 2024.
baogorek added a commit that referenced this pull request on Feb 7, 2026
Adds aca_ptc ingestion from IRS SOI data (code 85530) to etl_irs_soi.py and updates DATABASE_GUIDE.md to reflect stratum_group_id 119. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis pushed a commit that referenced this pull request on Feb 8, 2026
)

* Fix stale calibration targets by deriving time_period from dataset

  - Remove hardcoded CBO_YEAR and TREASURY_YEAR constants
  - Add --dataset CLI argument to etl_national_targets.py
  - Derive time_period from sim.default_calculation_period
  - Default to HuggingFace production dataset

  The dataset itself is now the single source of truth for the calibration year, preventing future drift when updating to new base years.

  Closes #503

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use income_tax_positive for CBO calibration in loss.py

  The CBO income_tax parameter represents positive-only receipts (refundable credit payments in excess of liability are classified as outlays, not negative receipts). Using income_tax_positive matches this definition.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add --dataset argument to all database ETL scripts

  All ETL scripts now derive their target year from the dataset's default_calculation_period instead of hardcoding years. This ensures all calibration targets stay synchronized when updating to a new base year annually.

  Updated scripts:
  - create_initial_strata.py
  - etl_age.py
  - etl_irs_soi.py (with configurable --lag for IRS data delay)
  - etl_medicaid.py
  - etl_snap.py
  - etl_state_income_tax.py

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add 119th Congress district code support for 2024 ACS data

  - Update parse_ucgid to recognize both 5001800US (118th) and 5001900US (119th Congress)
  - Expand Puerto Rico and territory filters to handle both Congress code formats
  - Update TERRITORY_UCGIDS and NON_VOTING_GEO_IDS with 119th Congress codes

  This ensures consistent redistricting alignment: 2024 ACS data uses 119th Congress codes natively, and IRS SOI data is converted via the 116th→119th mapping matrix.

  Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* Remove seed-related changes to reduce PR scope

  Revert deterministic hash-based medicaid/SSI seed logic in cps.py, update Makefile seed to 3526.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Upgrade policyengine-us to 1.550.1 in uv.lock

  Needed for income_tax_positive variable used in loss.py.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Cherry-pick ACA PTC targets from PR #508 and update changelog

  Adds aca_ptc ingestion from IRS SOI data (code 85530) to etl_irs_soi.py and updates DATABASE_GUIDE.md to reflect stratum_group_id 119.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Split local area publish into build+stage and promote phases

  Prevents silent no-op promotes by detecting when HF commits don't change HEAD. Adds separate promote workflow for manual gate before pushing staging files to production. Also bumps calibration epochs from 200 to 250.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #509
Summary
- Ingest the PTC count (N85530) and amount (A85530) columns from the IRS SOI file (`22incd.csv`), which were already present in the raw data but not being loaded
- Add `aca_ptc` to the `TARGETS` list in `etl_irs_soi.py`, creating conditional strata (`aca_ptc > 0`) with both `tax_unit_count` and `aca_ptc` dollar targets at national, state, and CD levels (stratum_group_id 119)
- Add `aca_ptc` variable metadata and the `aca_ptc_recipients` variable group
- Add `aca_ptc` to the calibration weight fitting target filter

Test plan
- Run `make database` and verify PTC strata are created at all three geographic levels
- Check that `aca_ptc` targets appear in the targets table with reasonable values (national PTC roughly in the $50-60B range)
- Run `fit_calibration_weights.py` and confirm PTC targets appear in the matrix

🤖 Generated with Claude Code
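The conditional stratum and the range check in the test plan can be sketched as follows. This is a minimal illustration and not the repository's implementation: the helper names `conditional_stratum_totals` and `in_expected_range`, the sample records, and the weights are hypothetical. Only the `aca_ptc > 0` condition and the $50-60B national range come from the description above.

```python
def conditional_stratum_totals(aca_ptc, weights):
    """Weighted count and dollar total for tax units with aca_ptc > 0."""
    count = 0.0
    amount = 0.0
    for value, weight in zip(aca_ptc, weights):
        if value > 0:  # stratum condition from the PR description
            count += weight
            amount += value * weight
    return count, amount


def in_expected_range(total, low=50e9, high=60e9):
    """Check a national PTC total against the $50-60B sanity range."""
    return low <= total <= high


# Hypothetical tax units: PTC dollars and survey weights (invented values).
aca_ptc = [0.0, 6000.0, 4500.0, 0.0]
weights = [1000.0, 2000.0, 1000.0, 3000.0]

count, amount = conditional_stratum_totals(aca_ptc, weights)
print(count, amount, in_expected_range(amount))
```

With only four toy records the total falls far below the national range, so the check returns False here. Against a full dataset the same check would flag a unit-scaling mistake such as treating thousands of dollars as dollars.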