Fix stale calibration targets by deriving time_period from dataset #505

baogorek · 2026-02-02T15:36:42Z

Summary

- Fixes state/CD calibration, which was using stale 2022-2023 targets instead of the correct 2024 values
- Removes the hardcoded CBO_YEAR and TREASURY_YEAR constants from etl_national_targets.py
- Adds a --dataset CLI argument to specify the source dataset
- Derives time_period from sim.default_calculation_period, so the dataset itself is now the single source of truth

Root Cause

The ETL had hardcoded year constants:

CBO_YEAR = 2023  # was pulling 2023 CBO values
TREASURY_YEAR = 2023  # was pulling 2023 Treasury values

But the calibration runs at time_period=2024, so the targets lagged the simulation by a year. For income tax alone this left an 18% gap: the stale target was $2,051B, against $2,426B for 2024.
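The size of the gap follows directly from the two income tax figures quoted above. A quick check of the arithmetic, taking the stale 2023 value as the base (the variable names are illustrative and do not come from the ETL):

```python
# Income tax targets quoted in this PR, in billions of dollars.
stale_income_tax = 2_051    # 2023 value pulled by the hardcoded constants
correct_income_tax = 2_426  # 2024 value matching the calibration year

# Gap expressed relative to the stale target.
relative_gap = (correct_income_tax - stale_income_tax) / stale_income_tax
print(f"{relative_gap:.1%}")  # prints 18.3%
```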

The Fix

Instead of hardcoding years, we now derive the time period from the dataset:

sim = Microsimulation(dataset=args.dataset)
time_period = int(sim.default_calculation_period)  # e.g., 2024

This ensures CBO/Treasury targets always match the dataset's year, so they cannot drift out of date when the base year is updated annually.
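A minimal sketch of how the --dataset argument and the derived time_period might fit together. It is not the actual etl_national_targets.py code: DEFAULT_DATASET is a hypothetical placeholder (the real default URI is not given here), build_parser and derive_time_period are illustrative helper names, and a SimpleNamespace stands in for Microsimulation so the example runs without policyengine-us installed.

```python
import argparse
from types import SimpleNamespace

# Hypothetical placeholder; the real default points at the HuggingFace
# production dataset, whose exact URI is not stated in this description.
DEFAULT_DATASET = "hf://<production-dataset>"


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser exposing a single --dataset option."""
    parser = argparse.ArgumentParser(description="ETL for national targets")
    parser.add_argument(
        "--dataset",
        default=DEFAULT_DATASET,
        help="Source dataset; its default period sets the target year",
    )
    return parser


def derive_time_period(sim) -> int:
    """Return the calibration year implied by the simulation's dataset."""
    return int(sim.default_calculation_period)


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(
    ["--dataset", "/path/to/stratified_extended_cps.h5"]
)

# Stand-in for Microsimulation(dataset=args.dataset).
fake_sim = SimpleNamespace(default_calculation_period="2024")
time_period = derive_time_period(fake_sim)
print(args.dataset, time_period)
```

Keeping the year lookup in one small function means every downstream target query can take time_period as an argument rather than reading a module-level constant.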

Usage

# Default: uses HuggingFace production dataset
python policyengine_us_data/db/etl_national_targets.py

# Or specify a local dataset
python policyengine_us_data/db/etl_national_targets.py \
  --dataset /path/to/stratified_extended_cps.h5

Test plan

- Run make database to regenerate policy_data.db
- Verify CBO/Treasury targets now show 2024 values
- Verify income_tax target is ~$2,426B (not $2,051B)

Closes #503

🤖 Generated with Claude Code

- Remove hardcoded CBO_YEAR and TREASURY_YEAR constants
- Add --dataset CLI argument to etl_national_targets.py
- Derive time_period from sim.default_calculation_period
- Default to HuggingFace production dataset

The dataset itself is now the single source of truth for the calibration year, preventing future drift when updating to new base years.

Closes #503

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The CBO income_tax parameter represents positive-only receipts (refundable credit payments in excess of liability are classified as outlays, not negative receipts). Using income_tax_positive matches this definition.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
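To illustrate why the two aggregates differ, here is a toy sketch. It assumes income_tax_positive floors each tax unit's liability at zero before aggregation, which is an assumption about the variable's definition rather than something stated in this PR, and the tax units and weights are invented.

```python
# Toy tax units as (income_tax, weight). Negative values stand for
# refundable credits that exceed liability. All figures are made up.
tax_units = [(5_000.0, 2.0), (-1_200.0, 1.0), (300.0, 3.0), (-400.0, 1.0)]

# Net aggregate: refunds offset receipts.
net_total = sum(tax * weight for tax, weight in tax_units)

# Positive-only aggregate: refunds are treated as outlays and excluded.
positive_total = sum(max(tax, 0.0) * weight for tax, weight in tax_units)

print(net_total, positive_total)  # prints 9300.0 10900.0
```

Calibrating the net total against a positive-only target would push simulated receipts too high, which is why the two definitions need to match.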

All ETL scripts now derive their target year from the dataset's default_calculation_period instead of hardcoding years. This ensures all calibration targets stay synchronized when updating to a new base year annually.

Updated scripts:
- create_initial_strata.py
- etl_age.py
- etl_irs_soi.py (with configurable --lag for IRS data delay)
- etl_medicaid.py
- etl_snap.py
- etl_state_income_tax.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
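One plausible reading of the --lag option is that it is subtracted from the dataset year to pick the most recent IRS SOI year available. The sketch below encodes that reading only. The function name irs_target_year, the subtraction semantics, and the lag value used in the example are all assumptions, not taken from etl_irs_soi.py.

```python
def irs_target_year(dataset_year: int, lag: int) -> int:
    """Return the IRS data year for a dataset year, given a publication lag.

    Assumes the lag is a whole number of years subtracted from the
    dataset year (hypothetical semantics).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return dataset_year - lag


# Illustrative values: a 2024 dataset with a two-year lag.
print(irs_target_year(2024, 2))  # prints 2022
```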

- Update parse_ucgid to recognize both 5001800US (118th) and 5001900US (119th Congress)
- Expand Puerto Rico and territory filters to handle both Congress code formats
- Update TERRITORY_UCGIDS and NON_VOTING_GEO_IDS with 119th Congress codes

This ensures consistent redistricting alignment: 2024 ACS data uses 119th Congress codes natively, and IRS SOI data is converted via the 116th→119th mapping matrix.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
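A rough sketch of accepting both congressional-district prefixes. It is not the repository's parse_ucgid: the helper name parse_cd_ucgid, its return shape, and the assumption that the prefix is followed by a two-digit state FIPS code and a two-digit district code are illustrative. Non-numeric district codes are deliberately rejected here for simplicity.

```python
# Congressional-district prefixes named in the commit message above.
CD_PREFIXES = {"5001800US": 118, "5001900US": 119}


def parse_cd_ucgid(ucgid: str):
    """Split a district UCGID into (congress, state_fips, district).

    Returns None when the string is not a recognized district code.
    """
    for prefix, congress in CD_PREFIXES.items():
        if ucgid.startswith(prefix):
            geo = ucgid[len(prefix):]
            # Expect two digits of state FIPS plus two digits of district.
            if len(geo) == 4 and geo.isdigit():
                return congress, geo[:2], geo[2:]
    return None


print(parse_cd_ucgid("5001900US0601"))  # prints (119, '06', '01')
```

Driving the match from a single prefix table means a future Congress can be supported by adding one entry rather than editing each filter.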

baogorek and others added 2 commits February 2, 2026 10:36

baogorek force-pushed the fix-stale-calibration-targets-503 branch from ee54587 to 69406d6 Compare February 2, 2026 18:04

baogorek and others added 3 commits February 2, 2026 13:29

Use deterministic hash for medicaid_take_up_seed

634a75d
