Add sparse matrix builder for local area calibration - SNAP targets #456
Core components:
- sparse_matrix_builder.py: Database-driven approach for building calibration matrices
- calibration_utils.py: Shared utilities (cache clearing, constraints, geo helpers)
- matrix_tracer.py: Debugging utility for tracing through sparse matrices
- create_stratified_cps.py: Create stratified sample preserving high-income households
- test_sparse_matrix_builder.py: 6 verification tests for matrix correctness

Data pipeline changes:
- Add GEO_STACKING env var to cps.py and puf.py for geo-stacking data generation
- Add GEO_STACKING_MODE env var to extended_cps.py
- Add CPS_2024_Full, PUF_2023, ExtendedCPS_2023 classes
- Add policy_data.db download to prerequisites
- Add 'make data-geo' target for geo-stacking data pipeline

CI/CD:
- Add geo-stacking dataset build step to workflow
- Add sparse matrix builder test step after geo data generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Move sparse matrix tests to tests/test_local_area_calibration/
- Split large test file into focused modules (column indexing, same-state, cross-state, geo masking)
- Fix small_enhanced_cps.py enum encoding (decode_to_str before astype)
- Fix create_stratified_cps.py to use local storage instead of HuggingFace
- Remove CPS_2024_Full to keep PR minimal
- Revert ExtendedCPS_2024 to use CPS_2024

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…tionality
- Rename GEO_STACKING to LOCAL_AREA_CALIBRATION in cps.py, puf.py, extended_cps.py
- Rename data-geo to data-local-area in Makefile and workflow
- Add create_target_groups function to calibration_utils.py
- Enhance MatrixTracer with get_group_rows method and variable_desc in row catalog
- Add TARGET GROUPS section to print_matrix_structure output
- Add local_area_calibration_setup.ipynb documentation notebook

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…format
- Replace silent exception catch with debug logging for constraint evaluation
- Add comment explaining CD GEOID format (SSCCC where SS=state FIPS)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
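The GEOID layout named in this commit can be illustrated with a small helper. This is a sketch only: it assumes the GEOID is handled as a zero-padded five-character string in the SSCCC layout, that the trailing CCC part is the district code, and that integer GEOIDs have simply lost their leading zero. The function name `split_cd_geoid` and the sample value are hypothetical and do not come from the PR.

```python
def split_cd_geoid(geoid):
    """Split a congressional district GEOID in SSCCC form.

    Returns (state_fips, district_code) as strings.
    Assumption: integer inputs are re-padded to five characters,
    since a leading zero in the state FIPS is lost when stored as int.
    """
    text = str(geoid).zfill(5)
    if len(text) != 5 or not text.isdigit():
        raise ValueError(f"expected a GEOID of at most 5 digits, got {geoid!r}")
    # First two characters: state FIPS. Remaining three: district code.
    return text[:2], text[2:]


# Illustrative value only, not a GEOID taken from the PR's database.
state_fips, district = split_cd_geoid(6012)
print(state_fips, district)
```

Returning strings rather than integers keeps the leading zeros, which matters when the state part is later matched against FIPS codes stored as text.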
Code Review Summary
Overall this is a well-designed PR with clean architecture and comprehensive test coverage. I've pushed a small commit with minor improvements:
Changes Made (commit 7f6ea43)
Notes
Waiting for CI to pass before merging.
MaxGhenis
left a comment
All CI checks passing. Code review complete with minor improvements pushed (logging for constraint failures, documentation of CD GEOID format). Approving.
Summary
Adds infrastructure for local area (congressional district) calibration: the components related to the sparse X matrix (aka the "loss matrix") and the target vector, starting with SNAP targets only.
The Jupyter Notebook added to docs is a great way to get comfortable with the functionality.
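To make the roles of the sparse X matrix and the target vector concrete, here is a minimal sketch, not the PR's implementation. It assumes that rows of X are calibration targets, that columns are households, and that calibration compares the weighted row sums X·w against the targets. All numbers are made up, and the plain triplet storage stands in for whatever sparse library the builder actually uses.

```python
# Sparse X in coordinate (COO) form: (target_row, household_column, value).
# Only nonzero entries are stored; all values are illustrative.
entries = [
    (0, 0, 200.0),  # household 0 contributes 200 to target 0
    (0, 2, 150.0),  # household 2 contributes 150 to target 0
    (1, 1, 300.0),  # household 1 contributes 300 to target 1
]
weights = [10.0, 5.0, 20.0]  # one weight per household column
targets = [5000.0, 2000.0]   # one target value per matrix row


def estimates(entries, weights, n_rows):
    """Compute the matrix-vector product X @ w from COO triplets."""
    out = [0.0] * n_rows
    for row, col, value in entries:
        out[row] += value * weights[col]
    return out


def relative_errors(estimated, targets):
    """Signed relative error of each estimate against its target."""
    return [(e - t) / t for e, t in zip(estimated, targets)]


est = estimates(entries, weights, len(targets))
errs = relative_errors(est, targets)
print(est)   # [5000.0, 1500.0]
print(errs)  # [0.0, -0.25]
```

In this toy case the first target is matched exactly and the second is 25% too low, so a calibration routine would raise the weight on household 1.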
Core components
- sparse_matrix_builder.py: Database-driven approach for building calibration matrices
- calibration_utils.py: Shared utilities (cache clearing, constraints, target grouping)
- matrix_tracer.py: Debugging utility for tracing through sparse matrices
- create_stratified_cps.py: Create stratified sample preserving high-income households
Test plan
- tests/test_local_area_calibration/:
  - test_column_indexing.py: Verify column structure
  - test_same_state.py: Same-state household placement
  - test_cross_state.py: Cross-state benefit recalculation
  - test_geo_masking.py: Geographic masking for state targets
Data pipeline changes
- Add LOCAL_AREA_CALIBRATION env var to cps.py and puf.py
- Add LOCAL_AREA_CALIBRATION_MODE env var to extended_cps.py
- Add make data-local-area target
Documentation
- docs/local_area_calibration_setup.ipynb notebook demonstrating matrix construction
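The idea behind create_stratified_cps.py, a stratified sample that preserves high-income households, can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual logic: the income threshold, the sampling rate, the `income` and `weight` field names, and the weight rescaling are all hypothetical choices made for the example.

```python
import random


def stratified_sample(households, income_threshold, sample_rate, seed=0):
    """Keep every household at or above the threshold; sample the rest.

    households: list of dicts with 'income' and 'weight' keys
    (a hypothetical schema, not the CPS dataset layout).
    Sampled lower-income households have their weight divided by
    sample_rate so the weighted total is preserved in expectation.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducible samples
    kept = []
    for hh in households:
        if hh["income"] >= income_threshold:
            # High-income stratum: always retained, weight unchanged.
            kept.append(dict(hh))
        elif rng.random() < sample_rate:
            # Lower-income stratum: retained with probability sample_rate.
            kept.append({**hh, "weight": hh["weight"] / sample_rate})
    return kept


households = [
    {"income": 1_000_000, "weight": 1.0},
    {"income": 20_000, "weight": 2.0},
    {"income": 30_000, "weight": 2.0},
]
sample = stratified_sample(households, income_threshold=500_000, sample_rate=0.5)
print(len(sample), sample)
```

Keeping the top stratum whole avoids losing the few records that dominate income-related totals, while the reweighting keeps the smaller sample representative of the lower stratum.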