[Fix](nvbug6304585): specdec README online base-model example should use Instruct model#1755
[Fix](nvbug6304585): specdec README online base-model example should use Instruct model#1755h-guo18 wants to merge 1 commit into
Conversation
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughTwo small coordinated fixes: the speculative decoding README replaces the base ChangesInstruct Model Alignment
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
/claude review |
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 0
Reviewed both changed files end-to-end:
-
examples/speculative_decoding/README.md— Online and offline example commands now usemeta-llama/Llama-3.2-1B-Instruct, consistent with the other 3 references to the model already in the README (lines 64, 276, 307). Correctly addresses theNo valid chat template!startup failure since the baseLlama-3.2-1Bcheckpoint ships no chat template. -
modelopt/torch/utils/plugins/transformers_dataset.py— The expanded error message is actionable and, importantly, both remediation paths it cites are real:data.chat_template=<path>is threaded into the collator viaeagle_utils.py(data_args.chat_template→LanguageDataCollator(chat_template=...)), andchat_templateis a genuineLanguageDataCollator.__init__parameter. No misleading guidance. The string change is non-breaking (no callers pattern-match on the message).
No algorithm, mode/state, export, backward-compat, or performance concerns — the change touches only documentation text and an error string. Risk level: minimal.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1755 +/- ##
===========================================
+ Coverage 58.45% 76.53% +18.07%
===========================================
Files 510 511 +1
Lines 56274 56342 +68
===========================================
+ Hits 32896 43120 +10224
+ Misses 23378 13222 -10156
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
What does this PR do?
Type of change: Bug fix
The Training Draft Model with Online/Offline Base Model examples in
examples/speculative_decoding/README.mdusedmeta-llama/Llama-3.2-1B, abase / pretrained checkpoint that ships no chat template. The online EAGLE3
flow tokenizes conversations through
tokenizer.apply_chat_template(...), sothe data collator fails fast at startup:
This PR:
meta-llama/Llama-3.2-1B-Instruct, which carries a chat template.modelopt/torch/utils/plugins/transformers_dataset.pyactionable — it nowexplains the cause (base checkpoints have no chat template) and points users
at an Instruct model or a custom
chat_template.Usage
./launch_train.sh \ --config ../../modelopt_recipes/general/speculative_decoding/eagle3.yaml \ model.model_name_or_path=meta-llama/Llama-3.2-1B-Instruct \ data.data_path=input_conversations/train.jsonl \ training.output_dir=ckpts/llama-3.2-1b-onlineTesting
No valid chat template!failure with the baseLlama-3.2-1Band confirmed the Instruct variant carries a chat template.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
Fixes nvbug 6304585: https://nvbugspro.nvidia.com/bug/6304585
Summary by CodeRabbit
Release Notes
Documentation
Bug Fixes