Skip to content

Address PR #505 review feedback#519

Merged
MaxGhenis merged 2 commits intomainfrom
address-pr-505-review-feedback
Feb 9, 2026
Merged

Address PR #505 review feedback#519
MaxGhenis merged 2 commits intomainfrom
address-pr-505-review-feedback

Conversation

@baogorek
Copy link
Collaborator

@baogorek baogorek commented Feb 9, 2026

Summary

Addresses actionable items from @MaxGhenis's review of PR #505. Remaining deferred items tracked in #518.

  • Extract etl_argparser() into utils/db.py — eliminates ~15 lines of repeated argparse + Microsimulation boilerplate across 7 ETL scripts
  • Fix hardcoded target year labels — dollar targets now use HARDCODED_YEAR = 2024 instead of dynamic time_period; warnings.warn fires when dataset year differs
  • Delete get_pseudo_input_variables() — was a no-op return set(); callers updated
  • Switch DEFAULT_DATASET to local storage — local-first workflow; add promote-dataset Makefile target
  • SOI Congress-session constants — extract SOI_CONGRESS_PREFIX / SOI_DISTRICT_TAX_YEAR with RuntimeError guard for future tax-year bumps
  • Makefile improvementsHF_CLONE_DIR variable, updated stratified CPS comments

Test plan

  • black -l 79 --check passes on all modified files
  • DEFAULT_DATASET import prints local path
  • pytest — 120 passed, 5 failed (all pre-existing), 7 skipped

Closes #518

🤖 Generated with Claude Code

- Extract shared etl_argparser() into utils/db.py to eliminate repeated
  boilerplate across 7 ETL scripts
- Label hardcoded dollar targets with HARDCODED_YEAR = 2024 instead of
  dynamic time_period; add warnings.warn when dataset year differs
- Delete dead get_pseudo_input_variables() and update callers
- Switch DEFAULT_DATASET to local storage path for local-first workflow
- Add promote-dataset Makefile target and HF_CLONE_DIR variable
- Add SOI Congress-session constants with RuntimeError guard for
  future tax-year bumps
- Update Makefile comments for stratified CPS parameters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@baogorek baogorek requested a review from MaxGhenis February 9, 2026 15:56
Copy link
Contributor

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped PR that directly addresses the review feedback. A few notes:

  • etl_argparser() extraction is nicely done — the extra_args_fn callback for SOI's --lag is a good pattern
  • HARDCODED_YEAR = 2024 with warnings.warn is exactly right — those dollar targets were silently mislabeled before
  • SOI Congress RuntimeError guard is smart defensive coding
  • promote-dataset Makefile target will be useful

One minor thought: the DEFAULT_DATASET switch to local storage means fresh checkouts without the .h5 file will fail on python -m policyengine_us_data.db.etl_* with a missing file error instead of auto-downloading from HF. That's probably fine for the intended local-first workflow, but worth noting in case someone hits it.

LGTM!

On a fresh checkout without `make data`, the local DEFAULT_DATASET
won't exist. Give a clear FileNotFoundError suggesting `make data`
or `--dataset hf://...` instead of a cryptic load failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit 73428fe into main Feb 9, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remaining review feedback from PR #505

2 participants