CI workflow takes 2.5+ hours due to sequential data builds #477

@baogorek

Problem

The PR code changes workflow is taking 2.5+ hours to complete, far longer than the expected ~45 minutes. The bottleneck is in the Modal data build step (modal_app/data_build.py).

Root Cause

The Modal data build runs every step one after another, in two passes followed by the tests:

First pass (with TEST_LITE):

  1. Download prerequisites
  2. uprating.py
  3. acs.py
  4. cps.py
  5. irs_puf.py
  6. puf.py
  7. extended_cps.py
  8. enhanced_cps.py ← Likely the slowest (calibration/reweighting)
  9. small_enhanced_cps.py

Second pass (LOCAL_AREA_CALIBRATION - runs full, not lite):

  1. cps.py again
  2. puf.py again
  3. extended_cps.py again
  4. create_stratified_cps.py

Then tests:

  1. Local area calibration tests
  2. Main test suite

Issues

  1. No parallelization - All scripts run sequentially despite having 8 CPUs available
  2. Duplicate builds - Several datasets are built twice (once for main, once for local area calibration)
  3. Expensive calibration - enhanced_cps.py performs costly optimization/calibration (reweighting) and is likely the slowest single step
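
To make issue 1 concrete, here is a rough sketch of how the first-pass scripts could be scheduled so that each one starts as soon as its prerequisites finish. The dependency edges in `ASSUMED_DEPS` are guesses for illustration only and are not taken from modal_app/data_build.py; the real edges have to be read off each script's inputs and outputs. `run_parallel` and `run_script` are hypothetical helper names. Threads are enough here because each worker only waits on a subprocess.

```python
import subprocess
import sys
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# ASSUMED dependency graph (step -> steps that must finish first).
# These edges are illustrative guesses, not the project's real graph.
ASSUMED_DEPS = {
    "uprating.py": set(),
    "acs.py": set(),
    "irs_puf.py": set(),
    "cps.py": {"uprating.py"},
    "puf.py": {"irs_puf.py", "uprating.py"},
    "extended_cps.py": {"cps.py", "puf.py"},
    "enhanced_cps.py": {"extended_cps.py"},
    "small_enhanced_cps.py": {"enhanced_cps.py"},
}


def run_script(script):
    # Placeholder invocation: the real script paths and working
    # directory are not stated in this issue.
    subprocess.run([sys.executable, script], check=True)


def run_parallel(deps, run_step, max_workers=8):
    """Run every step once all of its prerequisites have finished.

    Returns the steps in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a cyclic graph
    finished = []
    pending = {}  # future -> step name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose prerequisites are all done.
            for step in sorter.get_ready():
                pending[pool.submit(run_step, step)] = step
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise a failure from the step
                step = pending.pop(future)
                sorter.done(step)  # unblocks dependent steps
                finished.append(step)
    return finished
```

Calling `run_parallel(ASSUMED_DEPS, run_script)` would run the real scripts. Under the assumed edges, uprating.py, acs.py and irs_puf.py start together, while enhanced_cps.py still waits for extended_cps.py, so the gain depends on how much of the total time sits outside that chain.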

Observed

  • Run #21254423149 took 2+ hours
  • The job appeared to hang/stall at times during the Modal step
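
One way to tell a slow step from a stalled one is to wrap each script in a timer with a hard timeout. This is a minimal sketch, not what the workflow does today; `timed_run` is a hypothetical helper and the timeout value is something the caller would have to choose per step.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout_s):
    """Run cmd and return (status, seconds).

    status is "ok", "failed" (non-zero exit) or "timeout".
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        status = "timeout"
    return status, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command; a real run would pass a build script instead.
    status, seconds = timed_run([sys.executable, "-c", "pass"], timeout_s=30)
    print(f"{status} after {seconds:.1f}s")
```

Logging one such line per script would show where the 2+ hours go and whether any single step exceeds its budget.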

Potential Solutions

  • Parallelize independent data build steps
  • Cache intermediate results between the two build passes
  • Profile individual scripts to identify specific bottlenecks
  • Consider whether both build passes are necessary for every PR
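
For the caching idea, one caveat: the first pass runs with TEST_LITE and the second runs full, so an output from one pass can only be reused by the other if the build mode matches. A sketch of a skip-if-built check that puts the mode into the cache key follows. `cache_key`, `build_if_needed`, the `.done` marker files and the mode strings are all hypothetical; nothing here reflects how data_build.py stores its outputs.

```python
import hashlib
from pathlib import Path


def cache_key(script, mode, input_paths):
    """Hash the script name, the build mode and the input file contents."""
    digest = hashlib.sha256()
    digest.update(f"{script}|{mode}".encode())
    for path in sorted(input_paths):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def build_if_needed(script, mode, input_paths, cache_dir, build):
    """Call build() unless a marker for this exact key already exists.

    Returns True if build() ran, False if the step was skipped.
    """
    key = cache_key(script, mode, input_paths)
    marker = Path(cache_dir) / f"{key}.done"
    if marker.exists():
        return False
    build()
    # Write the marker only after a successful build.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

With this key, a lite build of cps.py never satisfies a full build of cps.py, so the second pass would only skip work if the first pass had produced the same mode from the same inputs. Whether the two passes can share a mode is the open question behind the last bullet.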
