Update pipeline documentation, both public facing and internal #644
Open
juaristi22 wants to merge 5 commits into main from
…ll diagnostics to HF
- docs/methodology.md and docs/data.md updated to match current pipeline
- pipeline.py now uploads validation diagnostics after H5 builds complete, in addition to the existing calibration diagnostics upload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move docs/calibration_internals.ipynb → docs/internals/calibration_package_internals.ipynb
- Add docs/internals/data_build_internals.ipynb: Stage 1 coverage, with clone creation using the real assign_random_geography() on 20 records, a source imputation concept demo, and a PUF cloning toy walkthrough
- Add docs/internals/local_dataset_assembly_internals.ipynb: Stages 3–4, with Hard Concrete L0 math, λ preset comparison, weight expansion reference, and diagnostics column guide
- Add docs/internals/README.md: navigation index + §9 pipeline orchestration (run ID format, Modal volumes, step dependency graph, resume logic, HuggingFace artifact paths, meta.json structure)
- Extend calibration_package_internals with Part 4 (matrix assembly per-state, domain constraints) and Part 5 (takeup randomization cross-stage demo)
- All notebooks execute with zero errors under --allow-errors; toy inputs complete in <30s
- Add changelog fragment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
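For readers unfamiliar with the "Hard Concrete L0 math" covered by the Stages 3–4 notebook, the following is a minimal sketch of a Hard Concrete gate and its expected-L0 term, following the standard formulation of Louizos et al. (2018). The constants `GAMMA`, `ZETA`, and `BETA` and all function names are assumptions chosen for illustration; they are not taken from the pipeline code, which may parameterize the gate differently.

```python
import math
import random

# Stretch interval and temperature: commonly cited defaults, assumed here,
# not necessarily the values the pipeline uses.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Draw one Hard Concrete gate value in [0, 1] (hypothetical helper)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)  # keep u away from 0 and 1 to avoid log(0)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA  # stretch (0, 1) to (GAMMA, ZETA)
    return min(1.0, max(0.0, stretched))  # hard clip gives exact 0s and 1s


def prob_nonzero(log_alpha: float) -> float:
    """P(gate != 0): the per-weight term of the expected L0 penalty."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def expected_l0(log_alphas: list[float]) -> float:
    """Expected number of non-zero gates across all weights."""
    return sum(prob_nonzero(a) for a in log_alphas)
```

Under this formulation the sparsity penalty added to the loss would be λ times `expected_l0(...)`, so a larger λ pushes more `log_alpha` values down and more gates to exactly zero.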
Summary
Comprehensive update to pipeline documentation, covering both the public-facing docs and the internal developer reference. Fixes #643.
Internal developer reference (docs/internals/)
Three new notebooks providing thorough explanations of the calibration pipeline for developers:
- data_build_internals.ipynb (Stage 1): PUF cloning, geography assignment (including AGI-conditional routing and the no-collision constraint), and source imputation. Corrected pipeline ordering to match implementation (PUF clone → geography → source imputation). Documents that geography is rederived per-run, not persisted.
- calibration_package_internals.ipynb (Stage 2): Matrix construction internals including per-state simulation, clone loop, domain constraints (corrected: constraints come from stratum_constraints in policy_data.db, not target_config.yaml), takeup re-randomization (state precomputation + clone-loop draws), county-dependent variables, COO assembly, target config filtering (clarified: applied post-matrix-build, not during construction), hierarchical uprating, and calibration package serialization with initial weight computation.
- optimization_and_local_dataset_assembly_internals.ipynb (Stages 3–4): L0 optimization (fixed sparsity demo from 20→200 records so lambda effect is visible), H5 assembly pipeline (expanded from 11→16 steps matching actual implementation), SPM threshold recalculation, takeup consistency invariant, and diagnostics including validation_results.csv.
- README.md: Pipeline orchestration reference with run ID format, step dependency graph, Modal volumes, HuggingFace artifact paths, resume logic. Added file reference tables for calibration/ and modal_app/ with per-file descriptions and notes on legacy/standalone status.

Public-facing documentation
- docs/methodology.md: Minor updates to reflect current implementation.
- docs/data.md: Updated data source descriptions.

Dead code removed
- save_geography() and load_geography() from clone_and_assign.py: defined but never called by any pipeline code. Geography is rederived each run via deterministic seeding, making serialization unnecessary.

Test plan
- ruff format --check . passes
- unified_matrix_builder.py, unified_calibration.py, publish_local_area.py, clone_and_assign.py)

🤖 Generated with Claude Code
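The rationale for removing save_geography() and load_geography() (geography is rederived each run via deterministic seeding) can be illustrated with a toy sketch. The function `assign_geography_sketch`, the `AREAS` list, and the seed value are hypothetical and stand in for the real assign_random_geography(), whose signature is not shown in this PR.

```python
import random

AREAS = ["area_a", "area_b", "area_c"]  # hypothetical geography codes


def assign_geography_sketch(n_records: int, seed: int) -> list[str]:
    """Assign each record a random area using a privately seeded generator."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return [rng.choice(AREAS) for _ in range(n_records)]


# Two independent "runs" with the same seed produce the same assignment,
# so nothing needs to be written to disk between runs.
first_run = assign_geography_sketch(20, seed=42)
second_run = assign_geography_sketch(20, seed=42)
assert first_run == second_run
```

Because the assignment is a pure function of the record count and the seed, persisting it would only add a file that can go stale relative to the code that generates it.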