@DTrim99
Collaborator

@DTrim99 DTrim99 commented Dec 4, 2025

Summary

  • Add spm-calculator as a dependency (from GitHub)
  • Create policyengine_us_data/utils/spm.py with SPM threshold calculation utilities
  • Update CPS datasets to calculate SPM thresholds using spm-calculator with Census-provided geographic adjustments (SPM_GEOADJ)
  • Update ACS datasets to calculate SPM thresholds using spm-calculator with national-level thresholds

Details

For CPS: Uses the SPM_GEOADJ values already computed by the Census Bureau, combined with spm-calculator's base thresholds and equivalence scale formula. This provides geographically-adjusted thresholds without requiring a Census API key.

For ACS: Uses national-level thresholds (geoadj=1.0) since ACS doesn't include pre-computed geographic adjustment factors.
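As a rough illustration of how these pieces combine, the sketch below multiplies a base threshold by an equivalence scale and a geographic adjustment. It is not spm-calculator's code: the function names and the base value are hypothetical, and the scale is the three-parameter form commonly described for the SPM, which is assumed here rather than confirmed by this PR.

```python
def equivalence_scale(adults: int, children: int) -> float:
    """Assumed three-parameter scale, relative to a 2-adult, 2-child unit."""
    if adults < 0 or children < 0 or adults + children == 0:
        raise ValueError("unit must contain at least one person")
    if children == 0 and adults <= 2:
        # One or two adults without children.
        raw = adults ** 0.5
    elif adults == 1:
        # Single parent: the first child counts 0.8, each further child 0.5.
        raw = (1 + 0.8 + 0.5 * (children - 1)) ** 0.7
    else:
        # All other units.
        raw = (adults + 0.5 * children) ** 0.7
    reference = (2 + 0.5 * 2) ** 0.7  # the 2-adult, 2-child reference unit
    return raw / reference


def spm_threshold(base: float, adults: int, children: int, geoadj: float = 1.0) -> float:
    """Threshold = base threshold x equivalence scale x geographic adjustment."""
    return base * equivalence_scale(adults, children) * geoadj


BASE = 30_000.0  # hypothetical base threshold for one tenure type, not a Census figure

cps_style = spm_threshold(BASE, adults=2, children=2, geoadj=1.27)  # per-unit SPM_GEOADJ
acs_style = spm_threshold(BASE, adults=2, children=2)               # national, geoadj=1.0
```

With geoadj left at 1.0, the reference unit's threshold equals the base value, which is the national-level ACS case described above.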

Test plan

  • Verify CPS dataset generation works with new SPM threshold calculation
  • Verify ACS dataset generation works with new SPM threshold calculation
  • Compare calculated thresholds against Census-provided values for sanity check
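The sanity check in the last item could look something like the sketch below. The helper name, the tolerance, and the sample values are all hypothetical; in practice the two sequences would be the recalculated thresholds and the Census-provided SPM_POVTHRESHOLD values for the same SPM units.

```python
def max_relative_difference(calculated, reported):
    """Largest relative gap between recalculated and Census-reported thresholds."""
    if len(calculated) != len(reported):
        raise ValueError("inputs must have the same length")
    # Skip non-positive reported values to avoid dividing by zero.
    return max(
        (abs(c - r) / r for c, r in zip(calculated, reported) if r > 0),
        default=0.0,
    )


# Hypothetical values for three SPM units.
calculated = [30_100.0, 25_000.0, 41_200.0]
reported = [30_000.0, 25_000.0, 41_000.0]

gap = max_relative_difference(calculated, reported)
within_tolerance = gap < 0.01  # a 1% tolerance is an arbitrary choice for illustration
```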

🤖 Generated with Claude Code

DTrim99 and others added 5 commits December 4, 2025 16:08
- Add spm-calculator as a dependency (from GitHub)
- Create policyengine_us_data/utils/spm.py with threshold calculation utilities
- Update CPS dataset to use spm-calculator with Census-provided GEOADJ factors
- Update ACS dataset to use spm-calculator with national-level thresholds

For CPS: Uses the SPM_GEOADJ values from Census data combined with
spm-calculator's base thresholds and equivalence scale, providing
geographically-adjusted thresholds.

For ACS: Uses national-level thresholds (no geographic adjustment)
since ACS doesn't include pre-computed GEOADJ values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The long_term_projections.ipynb notebook outputs were causing a
"useOutputsContext must be used within a OutputsContextProvider"
error in MyST's book-theme during documentation build.

Clearing the outputs resolves this issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The Jupyter notebook was causing "useOutputsContext must be used within
a OutputsContextProvider" errors in the MyST book-theme. Converting to
markdown preserves all content while avoiding the rendering bug.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@DTrim99 DTrim99 requested review from MaxGhenis and baogorek December 5, 2025 19:02
Previously, ACS SPM thresholds used national-level values only (GEOADJ=1.0).
This update uses spm-calculator's PUMA-level geographic adjustment lookup
to compute thresholds with local housing cost adjustments.

Changes:
- Add calculate_spm_thresholds_by_puma() function to spm.py
- Update ACS dataset to pass state FIPS and PUMA codes to the new function
- PUMA identifiers are constructed as 7-digit codes (2-digit state + 5-digit PUMA)

Note: Requires CENSUS_API_KEY environment variable for PUMA lookups.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DTrim99
Collaborator Author

DTrim99 commented Dec 9, 2025

Update: PUMA-level geographic adjustments for ACS

This commit adds PUMA-level geographic adjustments for ACS SPM thresholds.

Changes

Previously: ACS used national-level thresholds only (GEOADJ=1.0 for all households).

Now: ACS uses PUMA-level geographic adjustments via spm-calculator:

  • Combines state FIPS (2-digit) + PUMA (5-digit) into 7-digit PUMA identifiers
  • Calls SPMCalculator.calculate_thresholds(geography_type="puma", geography_ids=puma_ids)
  • This provides local housing cost adjustments based on ACS 5-year median rent data
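A minimal sketch of the identifier construction described in the first item, assuming the state and PUMA codes may arrive as integers or strings with their leading zeros dropped. The helper name build_puma_id is hypothetical and not part of spm-calculator.

```python
def build_puma_id(state_fips, puma) -> str:
    """Combine a state FIPS code and a PUMA code into a 7-digit identifier."""
    state = int(state_fips)
    code = int(puma)
    if not 0 < state <= 99 or not 0 <= code <= 99_999:
        raise ValueError(f"invalid state/PUMA pair: {state_fips!r}, {puma!r}")
    # Zero-pad to 2 + 5 digits so that state 6, PUMA 3701 becomes "0603701".
    return f"{state:02d}{code:05d}"


puma_ids = [build_puma_id(st, puma) for st, puma in [(6, 3701), ("15", "00100")]]
```

The resulting list is the kind of value the geography_ids argument quoted above would receive.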

Geographic identifier format

The spm-calculator supports these geography levels for SPM thresholds:

| Geography | Identifier | Coverage |
|---|---|---|
| Nation | "us" | 1 (US average) |
| State | 2-digit FIPS | 51 (states + DC) |
| County | 5-digit FIPS | ~3,200 |
| Metro Area (MSA/μSA) | 5-digit MSA code | ~400 |
| Congressional District | 4-digit code | 435 |
| PUMA | 7-digit code | ~2,300 |
| Census Tract | 11-digit FIPS | ~84,000 |

For ACS, we use PUMA since:

  • ACS provides ST (state) and PUMA columns directly
  • PUMAs have ~100k-200k population, providing good geographic granularity
  • GEOADJ ranges from ~0.84 (low-cost areas) to ~1.27 (high-cost areas like Hawaii)

Requirements

Generating ACS data now requires the CENSUS_API_KEY environment variable to be set, because the PUMA-level rent lookups call the Census API. Get a free key at: https://api.census.gov/data/key_signup.html
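One way to surface a missing key early, before any dataset generation starts, is a small guard like the sketch below. The function name is hypothetical; only the CENSUS_API_KEY variable name and the signup URL come from this comment.

```python
import os


def require_census_api_key(env=None) -> str:
    """Return the Census API key, or fail with a clear message if it is unset."""
    env = os.environ if env is None else env
    key = env.get("CENSUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CENSUS_API_KEY is not set; get a free key at "
            "https://api.census.gov/data/key_signup.html"
        )
    return key
```

Passing a plain dict as env makes the guard easy to exercise in tests without touching the real environment.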

@DTrim99 DTrim99 requested a review from daphnehanse11 December 9, 2025 18:20
@MaxGhenis
Contributor

Per our discussion, we need to discard SPM_GEOADJ and instead add it based on the geographic context of the record (which we will change by stacking/reweighting). Let's put it in the part of the pipeline where we assign counties etc. based on district. @baogorek can advise.
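A hedged sketch of what swapping the adjustment could look like, assuming the threshold is a plain product in which SPM_GEOADJ enters as a single multiplicative factor (as the PR summary describes). The function name and values are hypothetical, and where this belongs in the pipeline is still the open question raised above.

```python
def rebase_threshold(threshold: float, old_geoadj: float, new_geoadj: float) -> float:
    """Divide out the original geographic adjustment and apply a new one."""
    if old_geoadj <= 0:
        raise ValueError("old_geoadj must be positive")
    return threshold / old_geoadj * new_geoadj


# Hypothetical record: its threshold was built with geoadj 1.10, and the
# household is then assigned to an area whose geoadj is 0.90.
rebased = rebase_threshold(33_000.0, old_geoadj=1.10, new_geoadj=0.90)
```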

Collaborator

@baogorek baogorek left a comment

Hi @DTrim99, I think I understand what's going on here. One of my comments is actually a question for my own understanding of MyST documentation.

I understand this PR will lead to a new CPS h5 file and I believe you can merge it independently of anything I'm working on. If you agree, let's focus on getting your tests to pass.

My one ask is that you think about how you might use your SPM calculator without CENSUS_API_KEY, since the missing key is what causes the tests to fail. If you cannot, we can add the key to the environment variables of this repo.

- Switch from calculate_spm_thresholds_by_puma to calculate_spm_thresholds_national
  to avoid Census API dependency that was causing CI failures
- Geographic adjustments will be applied later in the pipeline when
  households are assigned to specific areas (per discussion with Max)
- Remove obsolete long_term_projections.ipynb reference from myst.yml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DTrim99
Collaborator Author

DTrim99 commented Dec 10, 2025

Addressed the review feedback:

  1. Switched ACS to use national-level SPM thresholds (no geographic adjustment) - This avoids the Census API dependency that was causing CI test failures.

  2. Removed the obsolete long_term_projections.ipynb reference from myst.yml - Since it was converted to markdown.

Note for follow-up: Geographic adjustments (SPM_GEOADJ) will need to be applied later in the pipeline when households are assigned to specific geographic areas during stacking/reweighting. @baogorek - would appreciate guidance on where in the county/district assignment process this should be integrated.

Collaborator

@baogorek baogorek left a comment

Hi @DTrim99, I hope I don't sound too grumpy here. There's just a lot going on with additional dependencies and new responsibilities and I want to make sure we get this right.

So the good news is that, if you're okay with me bringing my own geoadj values for congressional districts, I can run with calculate_spm_thresholds_with_geoadj(), which does not need the Census API key, and that way I'm still using the calculator.

We do need to discuss the following points:

  • Why not use spm_unit.SPM_POVTHRESHOLD when it is available to us? Do we think that our calculator has a superior formula? It seems like we're using the same inputs that it would use. So why take on the work and add more lines of code?
  • Why mess with acs.py at all, especially since moving to the national level feels like a regression from .SPM_POVTHRESHOLD?
  • You'll see that I gripe about the time_period argument a few times. I need to be convinced that having a free time argument in a method is needed, since the class has time_period as a property. There's the potential for getting out of sync, and extra arguments are more complexity.
  • If we're going to use the calculator, I think we should prioritize getting it on PyPI so we don't have to use the GitHub pattern in pyproject.toml.

- Revert all ACS changes - no longer touching ACS since SPM_POVTHRESHOLD
  is already available and we don't need to calculate thresholds there
- Remove time_period argument from add_spm_variables, use self.time_period
  instead to match pattern of other functions like add_rent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@DTrim99
Collaborator Author

DTrim99 commented Dec 12, 2025

Hi @baogorek, thanks for the thorough review! I've addressed your feedback:

Changes made:

  1. Reverted all ACS changes - You're right that we shouldn't mess with ACS. The original code had SPM_POVTHRESHOLD available (even though add_spm_variables was never called), and moving to national thresholds would be a regression. ACS is now untouched.

  2. Removed time_period argument - Changed add_spm_variables(cps, spm_unit, time_period) to add_spm_variables(self, cps, spm_unit) to match the pattern of other functions like add_rent. Now uses self.time_period internally, eliminating the potential for desync.
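The signature change in item 2 can be illustrated with the toy class below. Only add_spm_variables, time_period, and SPM_GEOADJ come from this PR; the class, the base-threshold table, and the output key are hypothetical, and the equivalence scale is left out to keep the sketch short.

```python
# Hypothetical base thresholds by year, for illustration only.
BASE_THRESHOLD_BY_YEAR = {2023: 29_000.0, 2024: 30_000.0}


class ExampleCPS:
    """Toy stand-in for a dataset class that carries its own time_period."""

    time_period = 2024

    def add_spm_variables(self, cps: dict, spm_unit: dict) -> None:
        # The year is read from the instance, so callers cannot pass a
        # time_period that disagrees with the dataset being generated.
        base = BASE_THRESHOLD_BY_YEAR[self.time_period]
        cps["example_spm_threshold"] = [base * g for g in spm_unit["SPM_GEOADJ"]]


cps = {}
ExampleCPS().add_spm_variables(cps, {"SPM_GEOADJ": [1.0, 1.1]})
```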

Regarding your other points:

  • Why use the calculator instead of SPM_POVTHRESHOLD? - We want the calculator available for your congressional district work where you'll bring your own geoadj values. For CPS, using calculate_spm_thresholds_with_geoadj() with the Census-provided SPM_GEOADJ should produce equivalent results to SPM_POVTHRESHOLD.

  • PyPI for spm-calculator - Agreed this should be prioritized. The git+https pattern works but isn't ideal for production.

  • Redundant comments - Will clean those up.

Let me know if you have any other concerns!

DTrim99 and others added 2 commits December 12, 2025 12:08
- Remove redundant comment above TENURE_CODE_MAP (self-explanatory)
- Remove calculate_spm_thresholds_national (unused after ACS revert)
- Remove map_tenure_acs_to_spm (unused after ACS revert)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@MaxGhenis
Contributor

SPM_POVTHRESHOLD from the CPS uses SPM_GEOADJ - that's why we need to recalculate it

Collaborator

@baogorek baogorek left a comment

LGTM!

@baogorek baogorek merged commit 95af449 into main Dec 19, 2025
6 checks passed