Add parallel year wrapper for long-run H5 builds#687

Draft
MaxGhenis wants to merge 3 commits into main from codex/parallel-long-run-wrapper-upstream

@MaxGhenis
Contributor

Summary

  • add a parallel wrapper for long-run household projection runs
  • isolate each year in its own output directory to avoid manifest races
  • merge per-year H5 artifacts and rebuild a single manifest afterward
  • document the new wrapper in the long-run README

Why

The underlying long-run year builds are independent by year, but the current runner writes shared manifest state into one output directory. This wrapper makes it safe to fan out year builds in parallel without corrupting calibration_manifest.json or colliding on intermediate artifacts.

Scope

  • add policyengine_us_data/datasets/cps/long_term/run_household_projection_parallel.py
  • update policyengine_us_data/datasets/cps/long_term/README.md

Approach

  • spawn one subprocess per year using the existing single-year runner
  • give each year a private temporary output directory
  • copy YYYY.h5, YYYY.h5.metadata.json, and support reports into the final output directory
  • rebuild the manifest once at the end from the merged artifacts
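
The approach above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names (`build_year`, `rebuild_manifest`, `run_parallel`), the runner's command-line shape (`<runner...> <year> <output_dir>`), and the manifest schema (`{"years": [...]}`) are all assumptions made for the example, and the support reports are left out because their file names are not given here.

```python
import json
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_year(year: int, final_dir: Path, runner_cmd: list[str]) -> int:
    """Run one year in a private temp directory, then copy its artifacts out."""
    with tempfile.TemporaryDirectory(prefix=f"year_{year}_") as tmp:
        # Hypothetical CLI shape: the real single-year runner's flags may differ.
        subprocess.run([*runner_cmd, str(year), tmp], check=True)
        # Matches YYYY.h5 and YYYY.h5.metadata.json.
        for artifact in Path(tmp).glob(f"{year}.h5*"):
            shutil.copy2(artifact, final_dir / artifact.name)
    return year


def rebuild_manifest(final_dir: Path) -> dict:
    """Write one manifest from the merged artifacts (schema is an assumption)."""
    years = sorted(int(p.stem) for p in final_dir.glob("*.h5"))
    manifest = {"years": years}
    manifest_path = final_dir / "calibration_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


def run_parallel(years, final_dir: Path, runner_cmd: list[str], jobs: int = 2) -> dict:
    """Fan out year builds, then rebuild the manifest once at the end."""
    final_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces completion and re-raises any subprocess failure.
        list(pool.map(lambda y: build_year(y, final_dir, runner_cmd), years))
    return rebuild_manifest(final_dir)
```

Threads are enough for the fan-out in this sketch because the heavy work happens in the child processes. No worker touches the manifest, so it is written exactly once, after every year has been copied into the final directory.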

Validation

  • python3 -m py_compile policyengine_us_data/datasets/cps/long_term/run_household_projection_parallel.py
  • live smoke test launched for 2045,2049 with --jobs 2 against the post-OBBBA OACT / core-threshold setup
