Phase 5: Run RunPod pilot ablation matrix (5k/25k) for paper-fidelity paths #11

@georgepullen


Purpose

Execute a controlled pilot ablation matrix on RunPod to choose the default fidelity path before scaling up.

Mandatory Reading (blocking)

The first comment on this issue must summarize:

  • reports/NL_IMPLEMENTATION_ORACLE.md, section 6.4 (summary matrix)
  • reports/ablations.md
  • docs/zeroshot_eval.md
  • docs/continual_eval.md

Scope

Run the following matrix with fixed seeds, data, and tokenizer:

  • baseline legacy path
  • delta_rule
  • dmgd_mlp
  • muon_ns
  • CMS projection off/on
  • stopgrad vs differentiable online writes (if available)
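A minimal sketch of how the matrix could be enumerated so that every run shares the same seed. It assumes a full cross of the four update paths with the two toggles and the two pilot sizes; the issue does not say whether the matrix is factorial or one-factor-at-a-time. All identifiers (`RunSpec`, `build_matrix`, the path strings, the seed value) are hypothetical, and the "5k"/"25k" labels are kept as plain strings because the issue does not state their units.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical identifiers: the issue names these variants but not their config keys.
PATHS = ["baseline_legacy", "delta_rule", "dmgd_mlp", "muon_ns"]
BUDGETS = ["5k", "25k"]  # pilot sizes from the issue title; units unspecified
SEED = 0  # hypothetical fixed seed shared by every run


@dataclass(frozen=True)
class RunSpec:
    path: str
    cms_projection: bool
    online_writes: str  # "stopgrad" or "differentiable"
    budget: str
    seed: int = SEED

    @property
    def run_id(self) -> str:
        cms = "cms_on" if self.cms_projection else "cms_off"
        return f"{self.path}-{cms}-{self.online_writes}-{self.budget}"


def build_matrix(differentiable_available: bool) -> list[RunSpec]:
    # Differentiable online writes are only included when the codebase supports them.
    write_modes = ["stopgrad"] + (["differentiable"] if differentiable_available else [])
    return [
        RunSpec(path, cms, writes, budget)
        for path, cms, writes, budget in product(PATHS, [False, True], write_modes, BUDGETS)
    ]
```

Under this assumption the matrix has 16 runs when differentiable writes are unavailable and 32 when they are; a one-factor-at-a-time design against the baseline would be considerably smaller.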

Required Eval Bundle per run

  • zeroshot
  • niah
  • continual
  • passkey
  • pg19
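The per-run eval bundle can be checked mechanically before a run is marked complete. This sketch assumes, hypothetically, that each eval writes one `<eval>.json` file into the run's eval directory; the real output layout is defined by the eval scripts described in docs/zeroshot_eval.md and docs/continual_eval.md, not here.

```python
from pathlib import Path

# The five evals required for every run, as listed in the issue.
REQUIRED_EVALS = ("zeroshot", "niah", "continual", "passkey", "pg19")


def missing_evals(present: set[str]) -> list[str]:
    """Return required evals with no result file among the given file names.

    Assumes a hypothetical naming scheme of one <eval>.json per eval.
    """
    return [name for name in REQUIRED_EVALS if f"{name}.json" not in present]


def missing_evals_in_dir(eval_dir: Path) -> list[str]:
    # A missing directory counts as every eval being missing.
    present = {p.name for p in eval_dir.glob("*.json")} if eval_dir.is_dir() else set()
    return missing_evals(present)
```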

Deliverables

  • Run manifest table listing the command, checkpoint path, log path, and eval output paths for each run.
  • Comparative summary with a recommended default path.
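One way to produce the run manifest is to render it as a GitHub-flavored Markdown table that can be pasted into an issue comment. This is a sketch only: `ManifestRow` and `render_manifest` are hypothetical names, and the columns simply mirror the fields the deliverable asks for.

```python
from dataclasses import dataclass


@dataclass
class ManifestRow:
    run_id: str
    command: str
    checkpoint: str
    log: str
    eval_dir: str


def render_manifest(rows: list[ManifestRow]) -> str:
    lines = [
        "| run | command | checkpoint | log | evals |",
        "|---|---|---|---|---|",
    ]
    for r in rows:
        cells = [r.run_id, f"`{r.command}`", r.checkpoint, r.log, r.eval_dir]
        # Escape pipes so a shell command containing `|` does not split the cell.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)
```

Escaping `|` as `\|` keeps piped shell commands intact, since GitHub-flavored Markdown honors that escape even inside code spans within table cells.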

Acceptance Criteria

  • 5k and 25k variants completed for all enabled matrix entries.
  • No untriaged failed runs.
  • Recommendation justified by metrics and stability evidence.
  • First issue comment contains mandatory reading summary.
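The first two acceptance criteria can be expressed as a simple gate over run statuses. The sketch below is illustrative only: `RunStatus`, `gate_failures`, and the `state`/`triage_note` fields are hypothetical, and it treats a failure as triaged once a non-empty note is attached. The metric-based recommendation and the mandatory-reading comment still need human review.

```python
from dataclasses import dataclass

BUDGETS = ("5k", "25k")  # both pilot sizes must complete for each enabled entry


@dataclass
class RunStatus:
    entry: str              # matrix entry name (hypothetical naming)
    budget: str             # "5k" or "25k"
    state: str              # "completed" or "failed"
    triage_note: str = ""   # non-empty once a failure has been triaged


def gate_failures(enabled_entries: list[str], statuses: list[RunStatus]) -> list[str]:
    problems = []
    done = {(s.entry, s.budget) for s in statuses if s.state == "completed"}
    # Criterion 1: every enabled entry has a completed run at both sizes.
    for entry in enabled_entries:
        for budget in BUDGETS:
            if (entry, budget) not in done:
                problems.append(f"missing completed {budget} run for {entry}")
    # Criterion 2: no failed run is left without a triage note.
    for s in statuses:
        if s.state == "failed" and not s.triage_note.strip():
            problems.append(f"untriaged failure: {s.entry} ({s.budget})")
    return problems
```

An empty list means both mechanical criteria pass.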

Metadata
Labels

  • enhancement: New feature or request
  • execution-board: Execution board ticket set for paper alignment
  • phase-5: Phase 5: scaled training and benchmark reproduction
  • quality-gate: Has explicit acceptance criteria and test gates
  • runpod: RunPod infra and training execution tasks