Overview
After local area calibration produces weight vectors, we need to generate H5 dataset files that can be used for microsimulations at the congressional district level.
Requirements
- Create stacked H5 datasets from calibrated weights
- Support generating datasets for:
  - All 436 CDs in a single file
  - Per-state files (e.g., NY.h5 with all NY districts)
  - Per-CD files (e.g., NY-10.h5)
- Properly handle county assignment so that county-dependent variables (like `in_nyc`) work correctly
- Reindex entity IDs to prevent overflow when stacking many districts
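
A minimal sketch of the ID reindexing requirement, assuming that overflow would come from building stacked IDs by adding a large per-district offset to the original ID. The helper `reindex_stacked_ids` and its arguments are hypothetical names for illustration, not part of any existing API. It assigns fresh consecutive household IDs to each stacked copy and remaps the person-to-household links to match:

```python
from typing import Dict, List, Tuple


def reindex_stacked_ids(
    household_ids: List[int],
    person_household_ids: List[int],
    n_copies: int,
) -> Tuple[List[int], List[int]]:
    """Stack n_copies of a dataset and assign dense, consecutive household IDs.

    Hypothetical helper: every stacked household receives the next unused
    integer, so the largest ID equals (number of stacked households - 1)
    rather than growing with a per-district offset.
    """
    new_household_ids: List[int] = []
    new_person_household_ids: List[int] = []
    next_id = 0
    for _ in range(n_copies):
        # Map this copy's original household IDs to fresh consecutive IDs.
        mapping: Dict[int, int] = {}
        for old_id in household_ids:
            mapping[old_id] = next_id
            new_household_ids.append(next_id)
            next_id += 1
        # Remap the person -> household links with the same mapping.
        new_person_household_ids.extend(mapping[h] for h in person_household_ids)
    return new_household_ids, new_person_household_ids


# Two households, three people, stacked for three districts.
households = [1001, 1002]
person_households = [1001, 1001, 1002]
hh_ids, person_hh_ids = reindex_stacked_ids(households, person_households, n_copies=3)
```

The same pattern would apply to any other entity IDs (for example tax units or families), each with its own counter.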
Technical Details
- Input: Calibrated weight vector `w` of length `n_households × n_cds`
- Output: H5 files compatible with PolicyEngine microsimulation
- County assignment should use population-weighted P(county|CD) distributions from Census block data
- State variables must be updated and caches cleared so that dependent variables are recalculated correctly
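
The input layout and county assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes `w` is ordered CD-major (the first `n_households` entries belong to the first CD, and so on), which the issue does not specify, and the function names, county labels, and probabilities are hypothetical placeholders rather than Census figures.

```python
import random
from typing import Dict, List


def weights_for_cd(w: List[float], n_households: int, cd_index: int) -> List[float]:
    """Return the slice of the stacked weight vector belonging to one CD.

    Assumption: CD-major ordering, i.e. entry cd_index * n_households + i
    is the weight of household i in district cd_index.
    """
    start = cd_index * n_households
    return w[start:start + n_households]


def assign_counties(
    p_county_given_cd: Dict[str, float],
    n_households: int,
    seed: int = 0,
) -> List[str]:
    """Draw one county per household from P(county | CD).

    A seeded generator keeps the assignment reproducible across runs.
    """
    rng = random.Random(seed)
    counties = list(p_county_given_cd)
    probs = [p_county_given_cd[c] for c in counties]
    return rng.choices(counties, weights=probs, k=n_households)


# Toy example: 3 households stacked across 2 CDs.
n_households = 3
w = [1.0, 0.0, 2.0, 0.5, 0.0, 3.0]
second_cd_weights = weights_for_cd(w, n_households, cd_index=1)

# Placeholder distribution for a single CD (values are illustrative only).
p_county = {"county_a": 0.6, "county_b": 0.4}
assigned = assign_counties(p_county, n_households, seed=42)
```

Once counties are assigned, any county-dependent variable would need to be recomputed from the new values rather than read from a cache, which is the purpose of the cache-clearing step.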
Acceptance Criteria
- Stacked dataset builder creates valid H5 files
- County-dependent variables (e.g., `in_nyc`) return correct values
- Can generate per-state and per-CD subsets
- Unit tests pass
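
As a rough sketch of how the per-state and per-CD subsets could be planned, the hypothetical helper below groups CD codes by their state prefix and maps each output filename to the districts it should contain. The `STATE-NN` code format follows the `NY-10.h5` example above; the name `all_cds.h5` for the combined file is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def plan_output_files(cd_codes: List[str]) -> Dict[str, List[str]]:
    """Map each output filename to the list of CDs it should include."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for cd in cd_codes:
        state = cd.split("-")[0]          # "NY-10" -> "NY"
        plan["all_cds.h5"].append(cd)     # combined file (hypothetical name)
        plan[f"{state}.h5"].append(cd)    # per-state file
        plan[f"{cd}.h5"].append(cd)       # per-CD file
    return dict(plan)


plan = plan_output_files(["NY-10", "NY-12", "CA-01"])
```

A unit test could then check that each generated file contains exactly the districts listed in the plan.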