Skip to content

Replace ad-hoc targets with structured registry and source modules#269

Open
nikhilwoodruff wants to merge 6 commits intomainfrom
targets-registry
Open

Replace ad-hoc targets with structured registry and source modules#269
nikhilwoodruff wants to merge 6 commits intomainfrom
targets-registry

Conversation

@nikhilwoodruff
Copy link
Contributor

Summary

The old calibration system had ~700 lines of hardcoded targets in utils/loss.py with no source URLs or provenance, and local area loss functions loaded 8+ opaque xlsx/csv files inline. This replaces the whole thing with a structured targets registry.

National targets: pydantic Target schema, YAML registry (sources.yaml), per-source Python scrapers (OBR, ONS, HMRC, DWP, VOA, etc.), and build_loss_matrix.py that computes household-level columns from the registry. 570 targets total, zero skipped.

Local area targets: 4 source modules (local_age, local_income, local_uc, local_la_extras) replace inline file loading in constituencies/loss.py and local_authorities/loss.py. Each module has reference URLs and docstrings for provenance.

Both calibrate.py files updated to import create_national_target_matrix from the new location. Full pipeline runs end-to-end: constituency calibration loss 0.056, LA loss 0.077. All 68 tests pass.

Test plan

  • All imports resolve
  • All 68 tests pass
  • Full TESTING=1 pipeline runs end-to-end with zero warnings/skipped targets
  • Constituency and LA calibration completes successfully

nikhilwoodruff and others added 4 commits February 15, 2026 13:40
The old system had ~700 lines of hardcoded targets scattered across loss.py with no provenance. This replaces it with a targets registry (sources.yaml + per-source Python scrapers) that produces 570 national targets with source URLs, and factors local area data loading into dedicated source modules.

National: pydantic Target schema, YAML registry, build_loss_matrix.py computes household columns from registry. Local: 4 source modules (local_age, local_income, local_uc, local_la_extras) replace inline file reads in constituencies/loss.py and local_authorities/loss.py. Both calibrate.py files updated to import create_national_target_matrix from the new location.

Full pipeline runs end-to-end with zero skipped targets.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Copy link
Collaborator

@vahid-ahmadi vahid-ahmadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Critical

  1. No caching on get_all_targets() — triggers HTTP downloads in 6+ modules on every call, and build_loss_matrix.py calls it three times. If any download fails, targets silently disappear. The old local-CSV approach was deterministic.

  2. Savings interest income conflicthmrc_spi.py now emits savings targets that the old code deliberately excluded (SPI underestimates ~3bn vs ~98bn actual). These conflict with the correct ons_savings.py targets.

  3. SPI band name format change — old names included row index (_band_3_50_000_to_70_000), new ones don't. Breaking change for downstream consumers.

  4. Tests require live network — no mocks, so CI will be flaky.

Duplication

_HEADERS, _STORAGE, _SOURCES_YAML/_load_config(), and _to_float() are each copy-pasted across 2–7 modules. Should be shared constants/utilities.

build_loss_matrix.py (827 lines)

The 200+ line if/elif dispatch chain on target name strings is the same problem the PR claims to solve, just relocated. The custom_compute field was designed for this but most targets don't use it. Skipped targets are logged at DEBUG — should be WARNING.

Undocumented behavioural changes

  • hmrc/salary_sacrifice_contributions target dropped
  • obr/esa no longer combines esa_income + esa_contrib
  • VOA council tax population uprating removed
  • scottish_government.py extrapolation anchors on 2025, not 2026

What's done well

  • Pydantic schema, registry auto-discovery, _SimContext lazy caching
  • Local area refactoring is a clear improvement
  • Source URLs on most targets

@nwoodruff-co
Copy link
Collaborator

Thanks!

pls disapply no 3, and yes undocumented behavioural changes should all be scrapped from the pr.

vahid-ahmadi and others added 2 commits February 16, 2026 10:39
- Restore hmrc/salary_sacrifice_contributions target (24bn base, 3%/yr)
- Fix obr/esa to combine esa_income + esa_contrib
- Restore VOA council tax population uprating for non-base years
- Extract shared HEADERS/STORAGE/load_config/to_float into _common.py
- Decompose build_loss_matrix.py (828→402 lines) into targets/compute/
  subpackage with domain modules: demographics, households, income,
  benefits, council_tax, other

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflict in utils/loss.py (keep registry delegation).
Incorporate salary sacrifice headcount targets from PR #268 into
the structured registry (obr.py, compute/income.py, build_loss_matrix.py).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants