Add parallel year wrapper for long-run H5 builds #686

Closed
MaxGhenis wants to merge 3 commits into PolicyEngine:main from MaxGhenis:codex/parallel-long-run-wrapper

Conversation

@MaxGhenis
Contributor

Summary

  • add a parallel wrapper for long-run household projection runs
  • isolate each year in its own output directory to avoid manifest races
  • merge per-year H5 artifacts and rebuild a single manifest afterward
  • document the new wrapper in the long-run README

Why

The long-run builds for each year are independent of one another, but the current runner writes shared manifest state into a single output directory. This wrapper makes it safe to fan out year builds in parallel without corrupting calibration_manifest.json or colliding on intermediate artifacts.

Scope

  • add policyengine_us_data/datasets/cps/long_term/run_household_projection_parallel.py
  • update policyengine_us_data/datasets/cps/long_term/README.md

Approach

  • spawn one subprocess per year using the existing single-year runner
  • give each year a private temporary output directory
  • copy YYYY.h5, YYYY.h5.metadata.json, and support reports into the final output directory
  • rebuild the manifest once at the end from the merged artifacts
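
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in run_household_projection_parallel.py: the function names, the `make_command` callable that builds the single-year runner's command line, and the `{"years": [...]}` manifest layout are all hypothetical, and copying of support reports is left out.

```python
import json
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_year(year, make_command, scratch_root):
    """Build one year inside its own private directory (hypothetical helper)."""
    year_dir = Path(tempfile.mkdtemp(prefix=f"{year}_", dir=scratch_root))
    # make_command is an assumption: it returns the argv list for the
    # existing single-year runner, pointed at year_dir as its output directory.
    subprocess.run(make_command(year, year_dir), check=True)
    return year, year_dir


def run_parallel(years, make_command, final_dir, jobs=2):
    """Fan out one subprocess per year, then merge artifacts and write one manifest."""
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as scratch_root:
        # Threads only wait on child processes, so they are cheap here.
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            results = list(
                pool.map(lambda y: run_year(y, make_command, scratch_root), years)
            )

        # Merge step: copy YYYY.h5 and YYYY.h5.metadata.json into final_dir.
        for year, year_dir in results:
            for artifact in year_dir.glob(f"{year}.h5*"):
                shutil.copy2(artifact, final_dir / artifact.name)

    # Rebuild the manifest once, from the merged artifacts only.
    # The schema below is invented for this sketch.
    manifest = {"years": sorted(int(p.stem) for p in final_dir.glob("*.h5"))}
    (final_dir / "calibration_manifest.json").write_text(
        json.dumps(manifest, indent=2)
    )
    return manifest
```

Because no two subprocesses share an output directory and the manifest is written only after every year has finished, there is a single writer for calibration_manifest.json and no window in which parallel builds can overwrite each other's entries.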

Validation

  • python3 -m py_compile policyengine_us_data/datasets/cps/long_term/run_household_projection_parallel.py
  • live smoke test launched for years 2045 and 2049 with --jobs 2 against the post-OBBBA OACT / core-threshold setup

@MaxGhenis force-pushed the codex/parallel-long-run-wrapper branch from 3a847e7 to dff42a8 on April 4, 2026 01:29
@MaxGhenis changed the base branch from codex/us-data-calibration-contract to main on April 4, 2026 12:49
@MaxGhenis force-pushed the codex/parallel-long-run-wrapper branch from dff42a8 to cfa4997 on April 4, 2026 12:49
@MaxGhenis
Contributor Author

Superseded by #687. I moved the branch into PolicyEngine/policyengine-us-data because the repo's PR workflow fails fork-based branches at the check-fork step, so the new same-repo PR is the one that will get real CI.

@MaxGhenis closed this on April 4, 2026