Feat: disable rope scaling for training, add yarn during export #917
Conversation
📝 Walkthrough

Adds `ModelArguments.trust_remote_code` and threads it into model/tokenizer loading; introduces `eagle_train_length` into `EagleConfig` and propagates it into `EagleModel` and the HF export plugin (affecting `rope_scaling`/`rope_theta`); updates default Eagle configs; and revises `launch_train.sh` GPU/FSDP and `trust_remote_code` handling.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

❌ Pre-merge checks failed (1 error, 1 warning). Please resolve all errors before merging; addressing warnings is optional.
✅ Passed checks (2 passed)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/export/plugins/hf_spec_export.py (1)
186-194: ⚠️ Potential issue | 🟠 Major — Keep exported RoPE schema decoder-aware and Transformers-version consistent.

Line 193 sets `rope_theta` at top level, while related defaults now nest it under `rope_scaling`; and line 189 hardcodes `rope_type`, which may not match decoder schemas that use `type`. This can cause config parse/behavior drift at load time.

Suggested patch:

```diff
- template_config["rope_scaling"] = {
-     "rope_type": "yarn",
-     "factor": 32.0,
-     "original_max_position_embeddings": getattr(self.model, "eagle_train_length", 4096),
- }
- template_config["rope_theta"] = 10000
+ rope_scaling = {
+     "factor": 32.0,
+     "original_max_position_embeddings": getattr(self.model, "eagle_train_length", 4096),
+     "rope_theta": 10000,
+ }
+ if self.model.eagle_config.eagle_decoder_type == "kimik2":
+     rope_scaling["type"] = "yarn"
+ else:
+     rope_scaling["rope_type"] = "yarn"
+ template_config["rope_scaling"] = rope_scaling
+ template_config.pop("rope_theta", None)
```

In Transformers 5.x config schemas, for Llama and Kimi-K2: 1) Should `rope_theta` be top-level or nested inside `rope_scaling`? 2) For YaRN, is the discriminator key `rope_type` or `type` for each model family? Please include links to the exact source lines in `modeling_rope_utils.py` / model config code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 186 - 194, The exported RoPE config currently writes a top-level "rope_theta" and hardcodes "rope_type", which can break decoder-aware schemas; update the export in template_config so that "rope_theta" is placed inside the "rope_scaling" dict and use the schema key "type" (not "rope_type"), and make the values derive from the model's existing config/attributes (e.g., check getattr(self.model, "rope_theta", None) or getattr(self.model.config, "rope_scaling", None) and fall back to defaults) so the export mirrors the model's actual decoder/schema (symbols to update: template_config, rope_scaling, rope_theta, and usage of self.model/self.model.config).
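The decoder-aware export discussed above can be sketched as a small helper. This is only an illustration: the `kimik2` discriminator and the choice between `type` and `rope_type` follow the suggested patch in this thread and are not verified against transformers 5.x.

```python
# Hedged sketch of a decoder-aware rope_scaling export. The "kimik2" case and
# the "type" vs "rope_type" split are assumptions from this review thread.
def build_rope_scaling(decoder_type: str, train_length: int) -> dict:
    rope_scaling = {
        "factor": 32.0,
        "original_max_position_embeddings": train_length,
        "rope_theta": 10000,
    }
    # Some decoder schemas use "type" as the discriminator key, others "rope_type".
    key = "type" if decoder_type == "kimik2" else "rope_type"
    rope_scaling[key] = "yarn"
    return rope_scaling

print(build_rope_scaling("llama", 4096))
```

A real fix would read the discriminator from the model's own config rather than branching on a hardcoded decoder name.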
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/config.py`:
- Around line 114-119: The eagle_train_length ModeloptField currently allows
non-positive values which can break RoPE export; add a lower-bound validation so
eagle_train_length must be >= 1. Update the declaration for eagle_train_length
in config.py (the ModeloptField for eagle_train_length) to include a minimum
constraint or attach a validator (or add a simple check in the SpeculativeConfig
post-init/validator) that raises a clear error if eagle_train_length <= 0, so
invalid values are rejected early and cannot propagate to rope_scaling.
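The suggested lower bound can be sketched with a plain dataclass. The real field is a `ModeloptField`, so this is only an illustration of the constraint, not the project's API:

```python
from dataclasses import dataclass

@dataclass
class EagleConfigSketch:
    # Illustrative stand-in for the real ModeloptField declaration.
    eagle_train_length: int = 2048

    def __post_init__(self):
        # Reject non-positive lengths early so they cannot propagate to rope_scaling.
        if self.eagle_train_length < 1:
            raise ValueError(
                f"eagle_train_length must be >= 1, got {self.eagle_train_length}"
            )

EagleConfigSketch()  # default passes
try:
    EagleConfigSketch(eagle_train_length=0)
except ValueError as e:
    print("rejected:", e)
```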
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9ace503d-96da-41ba-96dd-20058692ea3a
📒 Files selected for processing (5)
- examples/speculative_decoding/main.py
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/config.py
- modelopt/torch/speculative/eagle/default_config.py
- modelopt/torch/speculative/eagle/eagle_model.py
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/speculative_decoding/launch_train.sh (1)
257-257: Minor: arithmetic expansion syntax. The arithmetic expansion `$(( ... ))` doesn't require a `$` prefix for variables inside it. This is a minor style point and works correctly either way.

Optional cleanup:

```diff
-echo "Total time taken: $(( $(date +%s) - $start_time )) seconds"
+echo "Total time taken: $(( $(date +%s) - start_time )) seconds"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` at line 257, the echo using arithmetic expansion has an unnecessary `$` prefix inside `$(( ... ))`; update the echo statement that prints total time to drop the inner `$` before `start_time` while keeping the `$(date +%s)` command substitution intact, i.e. `$(( $(date +%s) - start_time ))`.

modelopt/torch/export/plugins/hf_spec_export.py (1)
186-193: Consider documenting the hardcoded yarn parameters. The hardcoded `factor: 32.0` and fallback `original_max_position_embeddings: 4096` may not be optimal for all use cases. Consider adding a brief inline comment explaining the rationale for these values, or exposing them as configurable parameters if they're expected to vary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 186 - 193, The hardcoded yarn parameters in the export path need explanation or configurability: add a brief inline comment near the rope-scaling block (where self.model.eagle_config.rope_parameters["rope_type"] == "default" and template_config["rope_scaling"] is set) documenting why "factor": 32.0 and the fallback original_max_position_embeddings default of 4096 were chosen and their expected impact, and/or expose these values as configurable options (e.g., read from self.model.eagle_config or method parameters like yarn_factor and original_max_position_embeddings) so callers can override them instead of using hardcoded literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/export/plugins/hf_spec_export.py`:
- Around line 196-199: Replace the direct index access on template_config for
"rope_theta" with a safe .get usage: check template_config.get("rope_theta")
instead of template_config["rope_theta"] and only set
template_config["rope_theta"] from
self.model.eagle_config.rope_parameters.get("rope_theta") when the retrieved
value is falsy/None; update the conditional in the block that references
template_config and keep using self.model.eagle_config.rope_parameters.get to
avoid KeyError if that mapping is missing the key.
- Line 195: Update the comment that currently reads "rope_thea" to the correct
spelling "rope_theta" in the hf_spec_export.py export plugin comment; locate the
string "rope_thea" and replace it with "rope_theta" so the comment correctly
states that in transformer 5.x, rope_theta is under rope_parameters rather than
the main config.
📒 Files selected for processing (3)
- examples/speculative_decoding/launch_train.sh
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/eagle/default_config.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/speculative_decoding/launch_train.sh`:
- Around line 134-140: The script currently computes TOTAL_GPU via python -c
"import torch; print(torch.cuda.device_count())" and then divides 8192 by
TOTAL_GPU to set DEFAULT_SAVE_STEPS, which will crash on division-by-zero when
TOTAL_GPU is 0; add a guard after computing TOTAL_GPU (the TOTAL_GPU variable in
launch_train.sh) to check for a zero value and either exit with a clear error
message about no GPUs being available (non-zero exit code) or set a safe
fallback (e.g., 1) before computing DEFAULT_SAVE_STEPS, and ensure the error
message references the GPU count so it's clear why the script stopped (modify
the block that computes TOTAL_GPU and DEFAULT_SAVE_STEPS accordingly).
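A minimal sketch of that guard, assuming the `TOTAL_GPU` and `DEFAULT_SAVE_STEPS` names from `launch_train.sh` and choosing the safe-fallback variant rather than a hard exit:

```shell
# Guard TOTAL_GPU before the save-step division (sketch; variable names assumed).
TOTAL_GPU=${TOTAL_GPU:-0}   # normally set from torch.cuda.device_count() or nvidia-smi
if [ "$TOTAL_GPU" -le 0 ]; then
  echo "Error: expected at least one visible GPU, got TOTAL_GPU=$TOTAL_GPU" >&2
  TOTAL_GPU=1               # safe fallback; alternatively exit 1 here
fi
DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))
echo "DEFAULT_SAVE_STEPS=$DEFAULT_SAVE_STEPS"
```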
📒 Files selected for processing (1)
examples/speculative_decoding/launch_train.sh
🧹 Nitpick comments (1)
examples/speculative_decoding/launch_train.sh (1)
229-229: Consider making `trust_remote_code` configurable. Hardcoding `--trust_remote_code True` removes user control over this security-sensitive flag; it allows execution of arbitrary code from model repositories. Consider exposing this as a script parameter with a default, similar to other configurable options.

♻️ Proposed refactor

Add a parameter handler in the argument parsing section:

```diff
+    --trust_remote_code*)
+        if [[ "$1" != *=* ]]; then shift; fi
+        TRUST_REMOTE_CODE="${1#*=}"
+        ;;
```

Add a default value:

```diff
+TRUST_REMOTE_CODE=${TRUST_REMOTE_CODE:-True}
```

Then use the variable in the command:

```diff
-    --trust_remote_code True \
+    --trust_remote_code $TRUST_REMOTE_CODE \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` at line 229, The script hardcodes the security-sensitive flag `--trust_remote_code True`; instead add a configurable parameter (e.g., `TRUST_REMOTE_CODE`) in the argument parsing/flags section of launch_train.sh, give it a sensible default (False or True per project policy), and replace the literal `--trust_remote_code True` in the command invocation with the variable (e.g., `--trust_remote_code ${TRUST_REMOTE_CODE}`) so users can override it when running the script.
📒 Files selected for processing (3)
- examples/speculative_decoding/launch_train.sh
- examples/speculative_decoding/main.py
- modelopt/torch/export/plugins/hf_spec_export.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/export/plugins/hf_spec_export.py
- examples/speculative_decoding/main.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/speculative_decoding/launch_train.sh (1)
133-142: ⚠️ Potential issue | 🟠 Major — Guard `TOTAL_GPU` before the save-step division.

Line 144 still divides by `TOTAL_GPU` without checking that it is positive. That will abort the script when no GPUs are visible, and the new multi-node branch can hit the same path if `GPU_PER_NODE` resolves to 0.

🛡️ Proposed fix

```diff
 fi
+
+if [[ "$TOTAL_GPU" -le 0 ]]; then
+    echo "Error: expected at least one visible GPU, got TOTAL_GPU=$TOTAL_GPU."
+    exit 1
+fi
+
 # Calculate save_steps
 DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` around lines 133 - 142, The script computes TOTAL_GPU but later divides by it without validation; update the launch_train.sh logic around TOTAL_GPU (and GPU_PER_NODE/NUM_NODES) to validate TOTAL_GPU>0 before any division: if TOTAL_GPU is zero or unset, either exit with a clear error message suggesting to set CUDA_VISIBLE_DEVICES or specify GPU_PER_NODE/NUM_NODES, or set a safe default (e.g., 1) before performing divisions; ensure the checks cover both the multi-node branch (GPU_PER_NODE result) and the single-node torch.cuda.device_count() path so no division by zero occurs when TOTAL_GPU==0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/speculative_decoding/launch_train.sh`:
- Around line 29-32: The --trust_remote_code* branch currently sets
TRUST_REMOTE_CODE to unchecked text (defaulting to True) and later interpolates
it into sh -c, creating injection risk; change the logic around the case pattern
handling to default TRUST_REMOTE_CODE to "false", accept only validated
boolean-like inputs (e.g., true/false, 1/0, yes/no — normalize them to "true" or
"false") and reject/exit on anything else, and stop interpolating raw values
into sh -c by using safe conditional branches or passing a fixed flag/parameter
instead of expansion; update the handling at the --trust_remote_code* case and
any locations that execute sh -c with TRUST_REMOTE_CODE (the same pattern at the
other occurrences) to use the validated value.
- Around line 133-136: The multi-node branch computes GPU_PER_NODE using
nvidia-smi which counts physical GPUs and ignores CUDA_VISIBLE_DEVICES, causing
TOTAL_GPU to be inconsistent with the single-node path; change the calculation
to mirror the single-node behavior by using the environment-aware device count
(e.g., use torch.cuda.device_count() or an equivalent helper) when computing
GPU_PER_NODE so TOTAL_GPU reflects allocated devices; update references to
GPU_PER_NODE and TOTAL_GPU (used for DEFAULT_SAVE_STEPS, DP_SHARD_SIZE, and
--num_processes) to rely on that corrected count.
📒 Files selected for processing (1)
examples/speculative_decoding/launch_train.sh
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main     #917    +/-  ##
========================================
+ Coverage   70.07%   70.47%   +0.39%
========================================
  Files         221      221
  Lines       25531    26154     +623
========================================
+ Hits        17892    18433     +541
- Misses       7639     7721      +82
```

☔ View full report in Codecov by Sentry.
ChenhanYu
left a comment
This PR makes several changes to the EAGLE speculative decoding pipeline: disables rope scaling during training (switches to "default"), applies YaRN at export, adds eagle_train_length config, makes trust_remote_code configurable, fixes single-node GPU detection, and adds conditional FSDP2. The changes look reasonable but there are some issues with hardcoded values and inconsistent defaults (see inline comments). This PR also needs tests — at minimum for the new export rope scaling logic (hf_spec_export.py) and the eagle_train_length propagation path, since incorrect rope configs would silently degrade inference quality.
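One of the requested tests could be sketched as below. `export_rope_scaling` is a hypothetical stand-in for the export path in `hf_spec_export.py`, since the real function signature is not shown in this thread:

```python
# Hypothetical test for eagle_train_length propagation into the exported config.
def export_rope_scaling(train_length: int = 4096) -> dict:
    # Stand-in mirroring the export behavior described in this PR.
    return {
        "rope_type": "yarn",
        "factor": 32.0,
        "original_max_position_embeddings": train_length,
    }

def test_train_length_propagates():
    cfg = export_rope_scaling(train_length=2048)
    assert cfg["original_max_position_embeddings"] == 2048
    assert cfg["rope_type"] == "yarn"

test_train_length_propagates()
print("ok")
```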
New unit tests added.
benchislett
left a comment
"factor" should be equal to the ratio of the target model's max_position_embeddings and the training max_position_embeddings. 32 is not a reasonable default in all cases, and may cause a mismatch.
Also: I can't find where you are actually setting max_position_embeddings in the training config. This will need to be set to the training max seq len, and then reset to the target model's max seq len during export.
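The point about `factor` can be written out explicitly; the names below are illustrative, and the 131072/4096 pair is just one example where the ratio happens to be 32:

```python
# factor = target model max_position_embeddings / training max_position_embeddings.
def yarn_factor(target_max_position_embeddings: int, train_length: int) -> float:
    if train_length < 1:
        raise ValueError("train_length must be >= 1")
    return target_max_position_embeddings / train_length

print(yarn_factor(131072, 4096))  # 32.0 for this pair; not a universal default
```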
It would also be helpful to see some actual validation of this scaling in effect, such as the work I did for the SPEED-Bench paper (see figure on final page).

moved to PR #1238
What does this PR do?
Type of change: New feature, bug fix
Overview:
- Use `rope_type: "default"` (no scaling) during training instead of llama3-style rope scaling.
- Apply `rope_type: "yarn"` with `factor=32.0` during export.
- Add an `eagle_train_length` config field to `EagleConfig`, which propagates `training_seq_len` as `original_max_position_embeddings` in the exported YaRN rope config.
- Copy `rope_theta` from the eagle config's `rope_parameters` into the exported HF config (required for transformers 5.x compatibility, where `rope_theta` lives inside `rope_scaling`).
- Add a `trust_remote_code` flag to `main.py` and `launch_train.sh` (was previously hardcoded to `True`).
- Change single-node GPU detection in `launch_train.sh` to use `torch.cuda.device_count()` (respects `CUDA_VISIBLE_DEVICES`) instead of `nvidia-smi` (counts physical GPUs). The multi-node path keeps `nvidia-smi`.
- Use `fsdp_config.json` (FSDP2) when transformers >= 5.0; fall back to FSDP1 otherwise.

Usage
Testing
Validated on single-node and multi-node training runs.
Before your PR is "Ready for review"
`eagle_train_length` defaults to 2048; `trust_remote_code` defaults to `True` in the script.