Make country package purely deterministic - read stochastic variables from dataset #6635

MaxGhenis · 2025-10-05T19:14:37Z

Summary

This PR removes all random number generation from policyengine-us. All stochastic take-up variables are now generated in policyengine-us-data and read from the dataset. The country package is now a purely deterministic rules engine.

⚠️ MERGE ORDER: The companion PolicyEngine/policyengine-us-data#442 must be merged FIRST, then this PR

Changes

Removed

All take-up seed variables (snap_take_up_seed, aca_take_up_seed, medicaid_take_up_seed)
All take-up rate parameters (moved to policyengine-us-data)

Simplified

All takes_up_* variables now use dataset values with deterministic fallbacks:

takes_up_snap_if_eligible (default: True)
takes_up_aca_if_eligible (default: True)
takes_up_medicaid_if_eligible (default: True)

These variables have no formula. When a variable is present in the dataset, OpenFisca uses the dataset value. In the policy calculator (non-microsimulation), where no dataset value exists, it defaults to True (a full take-up assumption).
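
The lookup order described above can be sketched in a few lines of plain Python. This is only an illustration, not the OpenFisca implementation: `resolve_take_up`, the `dataset` dict, and `DEFAULTS` are hypothetical names introduced for the example.

```python
# Hypothetical sketch: use the dataset value if present, else the deterministic default.
DEFAULTS = {
    "takes_up_snap_if_eligible": True,
    "takes_up_aca_if_eligible": True,
    "takes_up_medicaid_if_eligible": True,
}


def resolve_take_up(name, dataset, n_units):
    """Return one boolean per unit for a formula-less take-up variable."""
    if name in dataset:
        values = list(dataset[name])
        if len(values) != n_units:
            raise ValueError(f"{name}: expected {n_units} values, got {len(values)}")
        return values
    # No dataset value (e.g. the policy calculator): full take-up assumption.
    return [DEFAULTS[name]] * n_units


# Microsimulation-like case: the dataset carries a SNAP column only.
microdata = {"takes_up_snap_if_eligible": [True, False, True]}
snap = resolve_take_up("takes_up_snap_if_eligible", microdata, 3)
medicaid = resolve_take_up("takes_up_medicaid_if_eligible", microdata, 3)

# Calculator-like case: no dataset at all, so every variable falls back to True.
calculator = resolve_take_up("takes_up_aca_if_eligible", {}, 1)
```

Nothing in this path draws a random number, which is the property the PR is after: the same inputs always give the same take-up flags.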

Trade-offs

IMPORTANT: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic.

To adjust take-up rates for analysis, the microdata must be regenerated with updated parameter values in policyengine-us-data.
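
As a rough illustration of what regenerating the microdata involves, the sketch below draws take-up flags once from a seeded generator so they can be stored as a dataset column. It is not the policyengine-us-data code: `draw_take_up`, the rates, and the seed are hypothetical values chosen for the example.

```python
import random


def draw_take_up(n_units, rate, seed):
    """Draw one take-up flag per unit; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(n_units)]


# Hypothetical rates and seed, for illustration only.
column_a = draw_take_up(n_units=1000, rate=0.8, seed=0)
column_b = draw_take_up(n_units=1000, rate=0.8, seed=0)  # same inputs, same column
column_c = draw_take_up(n_units=1000, rate=0.5, seed=0)  # lower rate, fewer take-ups
```

Changing a rate means rerunning a step like this and rebuilding the dataset. The rules engine only ever reads the stored column.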

Test Plan

Package imports successfully
All existing tests pass
Microsimulations produce correct results
Policy calculator (non-microsim) still works

Related PRs

policyengine-us-data: PolicyEngine/policyengine-us-data#442 (must be merged FIRST)
Follows the same pattern as the UK packages: policyengine-uk-data#203 (Move all randomness to data package for deterministic country package) and policyengine-uk#1355 (Make country package purely deterministic - read stochastic variables from dataset)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

PavelMakarchuk · 2025-10-06T18:13:18Z

I think the Mass branch got merged into this PR @MaxGhenis

- Create takes_up_head_start_if_eligible and takes_up_early_head_start_if_eligible
- Update head_start and early_head_start to use takeup in microsimulation
- Add unit=USD and simplify labels to match conventions
- Takeup is generated stochastically in dataset, defaults to True in policy calculator

codecov · 2025-11-10T11:56:51Z

Codecov Report

❌ Patch coverage is 71.42857% with 10 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@61702e1). Learn more about missing BASE report.
⚠️ Report is 26 commits behind head on master.

Files with missing lines	Patch %	Lines
...s/variables/gov/hhs/head_start/early_head_start.py	0.00%	5 Missing ⚠️
...gine_us/variables/gov/hhs/head_start/head_start.py	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master    #6635   +/-   ##
=========================================
  Coverage          ?   71.42%           
=========================================
  Files             ?        7           
  Lines             ?       84           
  Branches          ?        2           
=========================================
  Hits              ?       60           
  Misses            ?       24           
  Partials          ?        0

Flag	Coverage Δ
unittests	`71.42% <71.42%> (?)`


Changed np.any(programs) to programs > 0 to preserve array structure. The np.any() call was collapsing the entire array into a single boolean, causing all people to be categorically eligible if ANY tax unit qualified. This manifested when using axes - eligibility showed True at all income levels even when income_eligible was correctly False at high incomes. Fixes the issue where Early Head Start benefits were incorrectly given to high-income households (e.g., $200k) in vectorized calculations.
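
The difference between the two expressions is easy to reproduce in isolation. The sketch below assumes `programs` is a per-person array of amounts received from qualifying programs (an assumption, since the actual variable is not shown in this thread) and uses made-up values.

```python
import numpy as np

# Assumed per-person amounts from qualifying programs (made-up values).
programs = np.array([0.0, 150.0, 0.0])

# np.any reduces the whole array to a single scalar; used as a mask,
# that scalar broadcasts to every person.
collapsed = np.any(programs)

# An elementwise comparison keeps one result per person.
per_person = programs > 0
```

Here `collapsed` is a single truthy value, so every person would be treated as categorically eligible, while `per_person` has the same shape as `programs` and flags only the second person.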

These tests covered the old formula-based takeup, which used seed variables. In the new design, takeup is generated in the dataset (policyengine-us-data) and the variables have no formula (just default_value = True).

Removed:
- takes_up_snap_if_eligible.yaml
- takes_up_medicaid_if_eligible.yaml
- takes_up_aca_if_eligible.yaml

The stochastic behavior is now tested in the data package, not the rules engine.

MaxGhenis added 2 commits November 10, 2025 05:50

Merge upstream master

5820f11

MaxGhenis added 5 commits November 10, 2025 06:10

Revert vectorization fix - moved to separate PR PolicyEngine#6804

bc5b9c2

The vectorization fix is now in its own PR (PolicyEngine#6804) to keep the takeup migration PR focused on moving randomness to the data package.

Merge branch 'master' of https://github.com/policyengine/policyengine-us into migrate-random-to-data

bac0e26

Add changelog entry for Head Start takeup variables

f7b2b90

MaxGhenis mentioned this pull request Nov 10, 2025

Update Head Start and Early Head Start spending/enrollment data to FY 2024 #6808

Open
