Create H5 files for local area microsimulations from calibrated weights #458

@baogorek

Overview

After local area calibration produces weight vectors, we need to generate H5 dataset files that can be used for microsimulations at the congressional district level.

Requirements

  • Create stacked H5 datasets from calibrated weights
  • Support generating datasets for:
    • All 436 CDs in a single file
    • Per-state files (e.g., NY.h5 with all NY districts)
    • Per-CD files (e.g., NY-10.h5)
  • Properly handle county assignment so that county-dependent variables (like in_nyc) work correctly
  • Reindex entity IDs to prevent overflow when stacking many districts
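
One possible way to reindex IDs when stacking is sketched below. It is only an illustration under stated assumptions: the function name `reindex_stacked_ids` and the list-based inputs are hypothetical, and the real builder may organize entities differently. The idea is to assign fresh contiguous IDs to every stacked copy instead of encoding the district into the ID with a large multiplier, which is what can push IDs past the range of a 32-bit integer.

```python
from typing import Dict, List, Tuple


def reindex_stacked_ids(
    household_ids: List[int],
    person_household_ids: List[int],
    n_copies: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Assign contiguous IDs to households and persons across stacked copies.

    Hypothetical helper: `household_ids` are the original household IDs and
    `person_household_ids` gives, for each person, the original ID of the
    household they belong to. Each district is treated as one full copy.
    """
    # Map (copy index, original household ID) -> new contiguous household ID.
    hh_map: Dict[Tuple[int, int], int] = {}
    new_household_ids: List[int] = []
    for copy in range(n_copies):
        for hh in household_ids:
            new_id = len(new_household_ids)
            hh_map[(copy, hh)] = new_id
            new_household_ids.append(new_id)

    # Persons get fresh IDs too, and their household link follows the map.
    new_person_ids: List[int] = []
    new_person_household_ids: List[int] = []
    for copy in range(n_copies):
        for hh in person_household_ids:
            new_person_ids.append(len(new_person_ids))
            new_person_household_ids.append(hh_map[(copy, hh)])

    return new_household_ids, new_person_ids, new_person_household_ids
```

With contiguous IDs the largest value equals the number of stacked entities minus one, so the ID range grows only as fast as the data itself.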

Technical Details

  • Input: Calibrated weight vector w of length n_households × n_cds
  • Output: H5 files compatible with PolicyEngine microsimulation
  • County assignment should use population-weighted P(county|CD) distributions from Census block data
  • State variables must be updated and caches cleared so that dependent variables are recalculated correctly
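
The county assignment step could look roughly like the sketch below. Everything in it is an assumption for illustration: the table `P_COUNTY_GIVEN_CD`, its probabilities (made-up placeholders, not Census figures), the helper names, and the FIPS-based `in_nyc` check, which is not taken from the PolicyEngine implementation. It only shows the technique of drawing a county for each household from P(county|CD).

```python
import random
from typing import Dict, List

# FIPS codes of the five New York City counties.
NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}

# Hypothetical P(county | CD) table. The shares are placeholders, not
# values derived from Census block data.
P_COUNTY_GIVEN_CD: Dict[str, Dict[str, float]] = {
    "NY-10": {"36061": 0.6, "36047": 0.4},
}


def assign_counties(
    cd: str,
    n_households: int,
    p_county_given_cd: Dict[str, Dict[str, float]],
    seed: int = 0,
) -> List[str]:
    """Draw one county per household from the CD's county distribution."""
    rng = random.Random(seed)  # seeded so repeated builds give the same draw
    dist = p_county_given_cd[cd]
    counties = list(dist)
    weights = [dist[c] for c in counties]
    return rng.choices(counties, weights=weights, k=n_households)


def in_nyc(county_fips: str) -> bool:
    """Illustrative county-dependent variable."""
    return county_fips in NYC_COUNTY_FIPS


counties = assign_counties("NY-10", 5, P_COUNTY_GIVEN_CD)
flags = [in_nyc(c) for c in counties]
```

Because every county in the illustrative NY-10 distribution is an NYC county, every household in that district should end up with `in_nyc` true, which is the kind of check the acceptance criteria describe.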

Acceptance Criteria

  • Stacked dataset builder creates valid H5 files
  • County-dependent variables (e.g., in_nyc) return correct values
  • Can generate per-state and per-CD subsets
  • Unit tests pass
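
Per-state and per-CD subsetting can be sketched as slicing the flat weight vector. This assumes a CD-major layout, where the weights for district `i` occupy positions `i * n_households` to `(i + 1) * n_households - 1`. The issue does not state the ordering, so treat the layout, the function name `subset_weights`, and the `"NY-10"`-style code list as assumptions.

```python
from typing import Dict, List, Optional


def subset_weights(
    w: List[float],
    cd_codes: List[str],
    n_households: int,
    state: Optional[str] = None,
    cd: Optional[str] = None,
) -> Dict[str, List[float]]:
    """Return {cd_code: weights} for the requested state or district.

    Assumes `w` is laid out CD-major, in the same order as `cd_codes`.
    """
    if len(w) != len(cd_codes) * n_households:
        raise ValueError("w must have length n_households * n_cds")

    selected: Dict[str, List[float]] = {}
    for i, code in enumerate(cd_codes):
        # Match on the state prefix before the hyphen, e.g. "NY" in "NY-10".
        if state is not None and code.split("-")[0] != state:
            continue
        # District selection is an exact match, so "NY-1" never picks "NY-10".
        if cd is not None and code != cd:
            continue
        selected[code] = w[i * n_households:(i + 1) * n_households]
    return selected
```

Calling it with `state="NY"` would gather the districts for a file like NY.h5, while `cd="NY-10"` would gather the single district for NY-10.h5.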
