Feat: disable rope scaling for training, add yarn during export #917
Conversation
📝 Walkthrough

Adds `ModelArguments.trust_remote_code` and threads it into model/tokenizer loading; introduces `eagle_train_length` into `EagleConfig` and propagates it into `EagleModel` and the HF export plugin (affecting `rope_scaling`/`rope_theta`); updates default Eagle configs; and revises `launch_train.sh` GPU/FSDP and `trust_remote_code` handling.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

❌ Pre-merge checks failed (1 error, 1 warning). Please resolve all errors before merging; addressing warnings is optional.
✅ Passed checks (2 passed)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/export/plugins/hf_spec_export.py (1)
186-194: ⚠️ Potential issue | 🟠 Major — Keep exported RoPE schema decoder-aware and Transformers-version consistent.

Line 193 sets `rope_theta` at top level, while related defaults now nest it under `rope_scaling`; and line 189 hardcodes `rope_type`, which may not match decoder schemas that use `type`. This can cause config parse/behavior drift at load time.

Suggested patch:

```diff
- template_config["rope_scaling"] = {
-     "rope_type": "yarn",
-     "factor": 32.0,
-     "original_max_position_embeddings": getattr(self.model, "eagle_train_length", 4096),
- }
- template_config["rope_theta"] = 10000
+ rope_scaling = {
+     "factor": 32.0,
+     "original_max_position_embeddings": getattr(self.model, "eagle_train_length", 4096),
+     "rope_theta": 10000,
+ }
+ if self.model.eagle_config.eagle_decoder_type == "kimik2":
+     rope_scaling["type"] = "yarn"
+ else:
+     rope_scaling["rope_type"] = "yarn"
+ template_config["rope_scaling"] = rope_scaling
+ template_config.pop("rope_theta", None)
```

In Transformers 5.x config schemas, for Llama and Kimi-K2: 1) Should `rope_theta` be top-level or nested inside `rope_scaling`? 2) For YaRN, is the discriminator key `rope_type` or `type` for each model family? Please include links to the exact source lines in `modeling_rope_utils.py` / model config code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 186 - 194, The exported RoPE config currently writes a top-level "rope_theta" and hardcodes "rope_type", which can break decoder-aware schemas; update the export in template_config so that "rope_theta" is placed inside the "rope_scaling" dict and use the schema key "type" (not "rope_type"), and make the values derive from the model's existing config/attributes (e.g., check getattr(self.model, "rope_theta", None) or getattr(self.model.config, "rope_scaling", None) and fall back to defaults) so the export mirrors the model's actual decoder/schema (symbols to update: template_config, rope_scaling, rope_theta, and usage of self.model/self.model.config).
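The decoder-aware export discussed above can be sketched as a small helper. This is only an illustration: the `kimik2` discriminator and the choice between `type` and `rope_type` follow the suggested patch in this thread and are not verified against transformers 5.x.

```python
# Hedged sketch of a decoder-aware rope_scaling export. The "kimik2" case and
# the "type" vs "rope_type" split are assumptions from this review thread.
def build_rope_scaling(decoder_type: str, train_length: int) -> dict:
    rope_scaling = {
        "factor": 32.0,
        "original_max_position_embeddings": train_length,
        "rope_theta": 10000,
    }
    # Some decoder schemas use "type" as the discriminator key, others "rope_type".
    key = "type" if decoder_type == "kimik2" else "rope_type"
    rope_scaling[key] = "yarn"
    return rope_scaling

print(build_rope_scaling("llama", 4096))
```

A real fix would read the discriminator from the model's own config rather than branching on a hardcoded decoder name.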
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/config.py`:
- Around line 114-119: The eagle_train_length ModeloptField currently allows
non-positive values which can break RoPE export; add a lower-bound validation so
eagle_train_length must be >= 1. Update the declaration for eagle_train_length
in config.py (the ModeloptField for eagle_train_length) to include a minimum
constraint or attach a validator (or add a simple check in the SpeculativeConfig
post-init/validator) that raises a clear error if eagle_train_length <= 0, so
invalid values are rejected early and cannot propagate to rope_scaling.
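The suggested lower bound can be sketched with a plain dataclass. The real field is a `ModeloptField`, so this is only an illustration of the constraint, not the project's API:

```python
from dataclasses import dataclass

@dataclass
class EagleConfigSketch:
    # Illustrative stand-in for the real ModeloptField declaration.
    eagle_train_length: int = 2048

    def __post_init__(self):
        # Reject non-positive lengths early so they cannot propagate to rope_scaling.
        if self.eagle_train_length < 1:
            raise ValueError(
                f"eagle_train_length must be >= 1, got {self.eagle_train_length}"
            )

EagleConfigSketch()  # default passes
try:
    EagleConfigSketch(eagle_train_length=0)
except ValueError as e:
    print("rejected:", e)
```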
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9ace503d-96da-41ba-96dd-20058692ea3a
📒 Files selected for processing (5)
- examples/speculative_decoding/main.py
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/config.py
- modelopt/torch/speculative/eagle/default_config.py
- modelopt/torch/speculative/eagle/eagle_model.py
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/speculative_decoding/launch_train.sh (1)
257-257: Minor: arithmetic expansion syntax. The arithmetic expansion `$(( ... ))` doesn't require a `$` prefix for variables inside it. This is a minor style point and works correctly either way.

Optional cleanup:

```diff
-echo "Total time taken: $(( $(date +%s) - $start_time )) seconds"
+echo "Total time taken: $(( $(date +%s) - start_time )) seconds"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` at line 257, the echo using arithmetic expansion has an unnecessary `$` prefix inside `$(( ... ))`; update the echo statement that prints total time to drop the inner `$` before `start_time` while keeping the `$(date +%s)` command substitution intact, i.e. `$(( $(date +%s) - start_time ))`.

modelopt/torch/export/plugins/hf_spec_export.py (1)
186-193: Consider documenting the hardcoded yarn parameters. The hardcoded `factor: 32.0` and fallback `original_max_position_embeddings: 4096` may not be optimal for all use cases. Consider adding a brief inline comment explaining the rationale for these values, or exposing them as configurable parameters if they're expected to vary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 186 - 193, The hardcoded yarn parameters in the export path need explanation or configurability: add a brief inline comment near the rope-scaling block (where self.model.eagle_config.rope_parameters["rope_type"] == "default" and template_config["rope_scaling"] is set) documenting why "factor": 32.0 and the fallback original_max_position_embeddings default of 4096 were chosen and their expected impact, and/or expose these values as configurable options (e.g., read from self.model.eagle_config or method parameters like yarn_factor and original_max_position_embeddings) so callers can override them instead of using hardcoded literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/export/plugins/hf_spec_export.py`:
- Around line 196-199: Replace the direct index access on template_config for
"rope_theta" with a safe .get usage: check template_config.get("rope_theta")
instead of template_config["rope_theta"] and only set
template_config["rope_theta"] from
self.model.eagle_config.rope_parameters.get("rope_theta") when the retrieved
value is falsy/None; update the conditional in the block that references
template_config and keep using self.model.eagle_config.rope_parameters.get to
avoid KeyError if that mapping is missing the key.
- Line 195: Update the comment that currently reads "rope_thea" to the correct
spelling "rope_theta" in the hf_spec_export.py export plugin comment; locate the
string "rope_thea" and replace it with "rope_theta" so the comment correctly
states that in transformer 5.x, rope_theta is under rope_parameters rather than
the main config.
📒 Files selected for processing (3)
- examples/speculative_decoding/launch_train.sh
- modelopt/torch/export/plugins/hf_spec_export.py
- modelopt/torch/speculative/eagle/default_config.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/speculative_decoding/launch_train.sh`:
- Around line 134-140: The script currently computes TOTAL_GPU via python -c
"import torch; print(torch.cuda.device_count())" and then divides 8192 by
TOTAL_GPU to set DEFAULT_SAVE_STEPS, which will crash on division-by-zero when
TOTAL_GPU is 0; add a guard after computing TOTAL_GPU (the TOTAL_GPU variable in
launch_train.sh) to check for a zero value and either exit with a clear error
message about no GPUs being available (non-zero exit code) or set a safe
fallback (e.g., 1) before computing DEFAULT_SAVE_STEPS, and ensure the error
message references the GPU count so it's clear why the script stopped (modify
the block that computes TOTAL_GPU and DEFAULT_SAVE_STEPS accordingly).
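A minimal sketch of that guard, assuming the `TOTAL_GPU` and `DEFAULT_SAVE_STEPS` names from `launch_train.sh` and choosing the safe-fallback variant rather than a hard exit:

```shell
# Guard TOTAL_GPU before the save-step division (sketch; variable names assumed).
TOTAL_GPU=${TOTAL_GPU:-0}   # normally set from torch.cuda.device_count() or nvidia-smi
if [ "$TOTAL_GPU" -le 0 ]; then
  echo "Error: expected at least one visible GPU, got TOTAL_GPU=$TOTAL_GPU" >&2
  TOTAL_GPU=1               # safe fallback; alternatively exit 1 here
fi
DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))
echo "DEFAULT_SAVE_STEPS=$DEFAULT_SAVE_STEPS"
```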
📒 Files selected for processing (1)
examples/speculative_decoding/launch_train.sh
🧹 Nitpick comments (1)
examples/speculative_decoding/launch_train.sh (1)
229-229: Consider making `trust_remote_code` configurable. Hardcoding `--trust_remote_code True` removes user control over this security-sensitive flag; it allows execution of arbitrary code from model repositories. Consider exposing this as a script parameter with a default, similar to other configurable options.

♻️ Proposed refactor

Add a parameter handler in the argument parsing section:

```diff
+    --trust_remote_code*)
+        if [[ "$1" != *=* ]]; then shift; fi
+        TRUST_REMOTE_CODE="${1#*=}"
+        ;;
```

Add a default value:

```diff
+TRUST_REMOTE_CODE=${TRUST_REMOTE_CODE:-True}
```

Then use the variable in the command:

```diff
-    --trust_remote_code True \
+    --trust_remote_code $TRUST_REMOTE_CODE \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` at line 229, The script hardcodes the security-sensitive flag `--trust_remote_code True`; instead add a configurable parameter (e.g., `TRUST_REMOTE_CODE`) in the argument parsing/flags section of launch_train.sh, give it a sensible default (False or True per project policy), and replace the literal `--trust_remote_code True` in the command invocation with the variable (e.g., `--trust_remote_code ${TRUST_REMOTE_CODE}`) so users can override it when running the script.
📒 Files selected for processing (3)
- examples/speculative_decoding/launch_train.sh
- examples/speculative_decoding/main.py
- modelopt/torch/export/plugins/hf_spec_export.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/export/plugins/hf_spec_export.py
- examples/speculative_decoding/main.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/speculative_decoding/launch_train.sh (1)
133-142: ⚠️ Potential issue | 🟠 Major — Guard `TOTAL_GPU` before the save-step division.

Line 144 still divides by `TOTAL_GPU` without checking that it is positive. That will abort the script when no GPUs are visible, and the new multi-node branch can hit the same path if `GPU_PER_NODE` resolves to 0.

🛡️ Proposed fix

```diff
 fi
+
+if [[ "$TOTAL_GPU" -le 0 ]]; then
+    echo "Error: expected at least one visible GPU, got TOTAL_GPU=$TOTAL_GPU."
+    exit 1
+fi
+
 # Calculate save_steps
 DEFAULT_SAVE_STEPS=$((8192 / TOTAL_GPU))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/speculative_decoding/launch_train.sh` around lines 133 - 142, The script computes TOTAL_GPU but later divides by it without validation; update the launch_train.sh logic around TOTAL_GPU (and GPU_PER_NODE/NUM_NODES) to validate TOTAL_GPU>0 before any division: if TOTAL_GPU is zero or unset, either exit with a clear error message suggesting to set CUDA_VISIBLE_DEVICES or specify GPU_PER_NODE/NUM_NODES, or set a safe default (e.g., 1) before performing divisions; ensure the checks cover both the multi-node branch (GPU_PER_NODE result) and the single-node torch.cuda.device_count() path so no division by zero occurs when TOTAL_GPU==0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/speculative_decoding/launch_train.sh`:
- Around line 29-32: The --trust_remote_code* branch currently sets
TRUST_REMOTE_CODE to unchecked text (defaulting to True) and later interpolates
it into sh -c, creating injection risk; change the logic around the case pattern
handling to default TRUST_REMOTE_CODE to "false", accept only validated
boolean-like inputs (e.g., true/false, 1/0, yes/no — normalize them to "true" or
"false") and reject/exit on anything else, and stop interpolating raw values
into sh -c by using safe conditional branches or passing a fixed flag/parameter
instead of expansion; update the handling at the --trust_remote_code* case and
any locations that execute sh -c with TRUST_REMOTE_CODE (the same pattern at the
other occurrences) to use the validated value.
- Around line 133-136: The multi-node branch computes GPU_PER_NODE using
nvidia-smi which counts physical GPUs and ignores CUDA_VISIBLE_DEVICES, causing
TOTAL_GPU to be inconsistent with the single-node path; change the calculation
to mirror the single-node behavior by using the environment-aware device count
(e.g., use torch.cuda.device_count() or an equivalent helper) when computing
GPU_PER_NODE so TOTAL_GPU reflects allocated devices; update references to
GPU_PER_NODE and TOTAL_GPU (used for DEFAULT_SAVE_STEPS, DP_SHARD_SIZE, and
--num_processes) to rely on that corrected count.
📒 Files selected for processing (1)
examples/speculative_decoding/launch_train.sh
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main     #917    +/-  ##
========================================
+ Coverage   70.07%   70.47%   +0.39%
========================================
  Files         221      221
  Lines       25531    26154     +623
========================================
+ Hits        17892    18433     +541
- Misses       7639     7721      +82
```

☔ View full report in Codecov by Sentry.
ChenhanYu
left a comment
This PR makes several changes to the EAGLE speculative decoding pipeline: disables rope scaling during training (switches to "default"), applies YaRN at export, adds eagle_train_length config, makes trust_remote_code configurable, fixes single-node GPU detection, and adds conditional FSDP2. The changes look reasonable but there are some issues with hardcoded values and inconsistent defaults (see inline comments). This PR also needs tests — at minimum for the new export rope scaling logic (hf_spec_export.py) and the eagle_train_length propagation path, since incorrect rope configs would silently degrade inference quality.
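One of the requested tests could be sketched as below. `export_rope_scaling` is a hypothetical stand-in for the export path in `hf_spec_export.py`, since the real function signature is not shown in this thread:

```python
# Hypothetical test for eagle_train_length propagation into the exported config.
def export_rope_scaling(train_length: int = 4096) -> dict:
    # Stand-in mirroring the export behavior described in this PR.
    return {
        "rope_type": "yarn",
        "factor": 32.0,
        "original_max_position_embeddings": train_length,
    }

def test_train_length_propagates():
    cfg = export_rope_scaling(train_length=2048)
    assert cfg["original_max_position_embeddings"] == 2048
    assert cfg["rope_type"] == "yarn"

test_train_length_propagates()
print("ok")
```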
New unit tests added.
benchislett
left a comment
"factor" should be equal to the ratio of the target model's max_position_embeddings and the training max_position_embeddings. 32 is not a reasonable default in all cases, and may cause a mismatch.
Also: I can't find where you are actually setting max_position_embeddings in the training config. This will need to be set to the training max seq len, and then reset to the target model's max seq len during export.
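The point about `factor` can be written out explicitly; the names below are illustrative, and the 131072/4096 pair is just one example where the ratio happens to be 32:

```python
# factor = target model max_position_embeddings / training max_position_embeddings.
def yarn_factor(target_max_position_embeddings: int, train_length: int) -> float:
    if train_length < 1:
        raise ValueError("train_length must be >= 1")
    return target_max_position_embeddings / train_length

print(yarn_factor(131072, 4096))  # 32.0 for this pair; not a universal default
```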
It would also be helpful to see some actual validation of this scaling in effect, such as the work I did for the SPEED-Bench paper (see figure on final page).

moved to PR #1238
What does this PR do?
Type of change: New feature, bug fix
Overview:
- Use `rope_type: "default"` (no scaling) during training instead of llama3-style rope scaling.
- Apply `rope_type: "yarn"` with `factor=32.0` during export.
- Add an `eagle_train_length` config field to `EagleConfig`, which propagates `training_seq_len` as `original_max_position_embeddings` in the exported YaRN rope config.
- Copy `rope_theta` from the eagle config's `rope_parameters` into the exported HF config (required for transformers 5.x compatibility, where `rope_theta` lives inside `rope_scaling`).
- Add a `trust_remote_code` flag to `main.py` and `launch_train.sh` (was previously hardcoded to `True`).
- Change single-node GPU detection in `launch_train.sh` to use `torch.cuda.device_count()` (respects `CUDA_VISIBLE_DEVICES`) instead of `nvidia-smi` (counts physical GPUs). The multi-node path keeps `nvidia-smi`.
- Use `fsdp_config.json` (FSDP2) when transformers >= 5.0; fall back to FSDP1 otherwise.

Usage
Testing
Validated on single-node and multi-node training runs.
Before your PR is "Ready for review"
`eagle_train_length` defaults to 2048; `trust_remote_code` defaults to `True` in the script.