forked from kmccleary3301/nested_learning
Labels
- enhancement: New feature or request
- execution-board: Execution board ticket set for paper alignment
- phase-0: Phase 0: baseline lock and instrumentation
- quality-gate: Has explicit acceptance criteria and test gates
Description
Purpose
Create a reproducible paper-faithful baseline package that all downstream tickets must compare against.
Mandatory Reading (blocking)
Before writing code, the agent must read and summarize in a first issue comment:
- reports/NL_IMPLEMENTATION_ORACLE.md (sections: 3, 4.1-4.7, 5, 6.1-6.4)
- reports/paper/NL-print.extracted.clean.txt (Eq. 21-24, 28-31 and Table 1 text)
- docs/PAPER_COMPLIANCE.md
The first comment must include:
- 5-10 bullet summary of the current implementation state.
- Exact list of code paths used for baseline runs.
- Confirmed gaps that this ticket does not change.
Required Code Anchors
- src/nested_learning/training.py
- src/nested_learning/model.py
- configs/pilot_paper_faithful.yaml
- configs/pilot_selfmod_paper_faithful.yaml
- scripts/eval/zeroshot.py
- scripts/eval/niah.py
- scripts/eval/continual.py
- scripts/eval/passkey.py
- scripts/eval/pg19_perplexity.py
Scope
- Define canonical baseline run commands for:
  - hope_selfmod (paper-faithful)
  - hope_attention (comparison)
- Produce frozen baseline artifact bundle with checksums + evals.
- Add reports/ baseline table (short + explicit command provenance).
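The "frozen baseline artifact bundle with checksums" item could be approached along the following lines. This is only a sketch under stated assumptions: the manifest file name `SHA256SUMS.json` and the helper names are hypothetical and do not come from the repository, and the repository's own scripts/checkpoint/verify.py may expect a different layout.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical manifest name; not taken from the repository.
MANIFEST_NAME = "SHA256SUMS.json"


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's bundle-relative POSIX path to its SHA-256 hex digest."""
    return {
        path.relative_to(bundle_dir).as_posix(): sha256_file(path)
        for path in sorted(bundle_dir.rglob("*"))
        # Skip the manifest itself so re-running does not change the result.
        if path.is_file() and path.name != MANIFEST_NAME
    }


def write_manifest(bundle_dir: Path) -> Path:
    """Write the manifest as sorted JSON next to the bundled files."""
    manifest_path = bundle_dir / MANIFEST_NAME
    payload = json.dumps(build_manifest(bundle_dir), indent=2, sort_keys=True)
    manifest_path.write_text(payload + "\n", encoding="utf-8")
    return manifest_path
```

Calling `write_manifest(Path("artifacts"))` after both runs finish would record one digest per file; the same digests can then be quoted in the report for provenance.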
Runbook
- uv run python train.py --config-name pilot_selfmod_paper_faithful train.steps=2000
- uv run python train.py --config-name pilot_paper_faithful model.block_variant=hope_attention train.steps=2000
- Run full eval suite for both checkpoints.
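One way to keep the two training commands canonical is to generate them from a single table rather than retyping them. The sketch below only assembles the exact command lines listed in this runbook; the `BASELINES` mapping and function names are hypothetical, and the eval suite is deliberately left out because its command-line arguments are not specified in this ticket.

```python
import shlex
import subprocess

# Arguments copied from the runbook; the dict keys are the baseline names from Scope.
BASELINES = {
    "hope_selfmod": ["--config-name", "pilot_selfmod_paper_faithful"],
    "hope_attention": [
        "--config-name",
        "pilot_paper_faithful",
        "model.block_variant=hope_attention",
    ],
}


def train_command(baseline: str, steps: int = 2000) -> list[str]:
    """Return the argv list for one baseline training run."""
    return ["uv", "run", "python", "train.py", *BASELINES[baseline], f"train.steps={steps}"]


def run_baselines(dry_run: bool = True) -> list[str]:
    """Render every command as a shell string; execute them unless dry_run is set."""
    rendered = []
    for name in BASELINES:
        cmd = train_command(name)
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True aborts on the first failing run instead of continuing.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_baselines(dry_run=True):
        print(line)
```

The rendered strings can be pasted verbatim into the report, so the documented commands and the executed commands cannot drift apart.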
Deliverables
- Baseline artifact bundle under artifacts/.
- Eval JSON set under eval/ for both baselines.
- Report file documenting exact commands + hashes + metrics.
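The report deliverable might be rendered from structured rows, so that commands, hashes, and metrics always appear together. The following is a sketch that assumes a Markdown report; the `BaselineRow` type, its fields, and the column layout are hypothetical, and no metric names or values are implied by this ticket.

```python
from dataclasses import dataclass, field


@dataclass
class BaselineRow:
    """One report row; all field names here are illustrative assumptions."""
    name: str
    command: str
    checkpoint_sha256: str
    metrics: dict[str, float] = field(default_factory=dict)


def render_table(rows: list[BaselineRow]) -> str:
    """Render rows as a Markdown table with one column per metric seen in any row."""
    metric_names = sorted({key for row in rows for key in row.metrics})
    header = ["baseline", "command", "sha256", *metric_names]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join([" --- "] * len(header)) + "|",
    ]
    for row in rows:
        cells = [row.name, f"`{row.command}`", row.checkpoint_sha256]
        for metric in metric_names:
            # A metric missing for this baseline is shown as n/a, never guessed.
            value = row.metrics.get(metric)
            cells.append("n/a" if value is None else f"{value:.4f}")
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Keeping the full digest in the table (rather than a truncated prefix) lets a reader compare it directly against the checksum manifest.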
Acceptance Criteria
- No NaN/Inf in training logs.
- All eval outputs exist for both baselines.
- scripts/checkpoint/verify.py passes on all published checkpoints.
- First issue comment includes mandatory reading summary.
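The first two acceptance criteria lend themselves to an automated gate. The sketch below rests on assumptions that this ticket does not state: training logs are plain text, non-finite values are printed as `nan`, `inf`, or `infinity`, and eval outputs live at `eval/<baseline>/<eval_name>.json`, with eval names taken from the script names under Required Code Anchors. All function names are hypothetical.

```python
import re
from pathlib import Path

# Matches standalone nan/inf/infinity tokens (any case, optional sign),
# but not words that merely contain them, such as the log level "INFO".
NONFINITE = re.compile(r"(?<!\w)[-+]?(?:infinity|inf|nan)(?!\w)", re.IGNORECASE)

# Eval names derived from the scripts listed in this ticket.
EVAL_NAMES = ["zeroshot", "niah", "continual", "passkey", "pg19_perplexity"]


def nonfinite_lines(log_text: str) -> list[int]:
    """Return 1-based line numbers of log lines containing a NaN/Inf token."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if NONFINITE.search(line)
    ]


def missing_eval_outputs(eval_dir: Path, baselines: list[str]) -> list[Path]:
    """List expected eval JSON files that are absent (assumed layout, see above)."""
    expected = [
        eval_dir / baseline / f"{name}.json"
        for baseline in baselines
        for name in EVAL_NAMES
    ]
    return [path for path in expected if not path.is_file()]
```

An empty result from both functions corresponds to the first two criteria being met; the verify.py and first-comment criteria still need their own checks.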