Hi @krishnakalyan3 ,
Thanks for making this, do we need this file to be outside the ipynb? I can understand the yaml, but curious about this one.
Thanks
The invoice_dataset.py is intentionally kept as an external file to demonstrate how you can bring your own custom dataset without relying on one that's already registered in datasets.py.
HuiyingLi left a comment:
The launchable works fantastically, TYSM!
/ok to test 6c6072d
/claude review
"source": [
"ckpt_dirs = sorted(glob.glob(f\"{WORK_DIR}/invoice_checkpoints/epoch_*_step_*\"))\n",
"assert ckpt_dirs, f\"No checkpoints found in {WORK_DIR}/invoice_checkpoints/\"\n",
"WORK_DIR = str(Path.home())\n",
Bug: WORK_DIR is hardcoded to Path.home(), but checkpoint_dir in the YAML config is nemotron_parse_checkpoints/ (a relative path resolved against CWD at training time). If the user's CWD during torchrun isn't their home directory, this glob will find nothing and the assert will fail.
Consider deriving it from the actual CWD (or using Path.cwd()) to stay consistent with how the YAML relative path resolves:
WORK_DIR = str(Path.cwd())
Or, alternatively, make the YAML checkpoint_dir an absolute path under Path.home() so both sides agree.
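To illustrate the first option, here is a minimal sketch of a lookup that takes the base directory as an argument instead of assuming Path.home(). The helper name find_checkpoints and its checkpoint_dir default are hypothetical (not part of the notebook or NeMo AutoModel); the glob pattern is the one from the cell above. The demo builds a throwaway directory so it runs without a real training output.

```python
import glob
import tempfile
from pathlib import Path


def find_checkpoints(base_dir, checkpoint_dir="invoice_checkpoints"):
    """Return sorted checkpoint directories under base_dir/checkpoint_dir.

    Hypothetical helper: base_dir should be the directory that the YAML's
    relative checkpoint_dir was resolved against (the CWD at training time).
    """
    pattern = str(Path(base_dir) / checkpoint_dir / "epoch_*_step_*")
    return sorted(glob.glob(pattern))


# Demo with a temporary directory standing in for the training CWD.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("epoch_0_step_10", "epoch_0_step_20"):
        (Path(tmp) / "invoice_checkpoints" / name).mkdir(parents=True)

    found = find_checkpoints(tmp)
    demo_names = [Path(p).name for p in found]

    # Looking in the wrong base directory silently yields an empty list,
    # which is what makes the notebook's assert fire.
    missing = find_checkpoints(Path(tmp) / "elsewhere")
```

In the notebook this would be called as find_checkpoints(Path.cwd()), provided the notebook kernel and torchrun share the same working directory. Note that sorted() orders the paths lexicographically, so step_100 would sort before step_20; picking the latest checkpoint by step number would need a numeric sort key.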
Changes made :)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Krishna Kalyan <krkalyan@nvidia.com>
/ok to test 665da0d
@krishnakalyan3, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test 1bf05a7
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Krishna Kalyan <krkalyan@nvidia.com>
/ok to test 8b1eb20
What does this PR do?
Adds a Nemotron Parse fine-tuning example: an end-to-end Jupyter notebook, a matching YAML config, and a small dataset helper, so users can reproduce fine-tuning with NeMo AutoModel. The notebooks README lists the notebook with Open in Colab and Launch on Brev links for quick starts.