
consolidate mbridge distillation: merge distill_hf.py into distill.py #1220

Open
j-rausch wants to merge 6 commits into feature/puzzletron from jrausch/distillation-consolidation

Conversation

@j-rausch
Contributor

@j-rausch j-rausch commented Apr 9, 2026

Summary

  • Unified examples/puzzletron/mbridge_distillation/distill_hf.py (AnyModel-specific) into examples/megatron_bridge/distill.py (general)
    • The single script now handles both standard HF and Puzzletron AnyModel checkpoints.
  • Added --hf_export_path / --student_hf_model args for inline HF export after distillation.
  • Merged AnyModel integration test into tests/examples/megatron_bridge/test_distill.py
    • test models use vocab_size=128 (instead of the default 102) so the vocabulary is divisible by all tested tensor-parallel sizes, including 8.
  • Moved MMLU distillation results into megatron_bridge/README.md
    • puzzletron README now redirects to the consolidated docs.
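The two new flags from the summary above might be declared along these lines. This is a hedged sketch: the flag names `--hf_export_path` and `--student_hf_model` come from this PR, but the help text, types, and parser wiring are illustrative assumptions, not the actual distill.py code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser exposing the two new optional export flags."""
    parser = argparse.ArgumentParser(description="Distillation script (sketch)")
    parser.add_argument(
        "--hf_export_path",
        type=str,
        default=None,
        help="Optional: export the final checkpoint to HuggingFace format at this path.",
    )
    parser.add_argument(
        "--student_hf_model",
        type=str,
        default=None,
        help="Optional: HF model path or ID used for the student config during export.",
    )
    return parser
```

Both flags default to `None`, so the export step is opt-in and existing invocations are unaffected.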

Limitation discovered during consolidation:
HF export via --hf_export_path does not currently appear to work for Puzzletron AnyModel (heterogeneous) checkpoints. Megatron-Bridge's export_ckpt cannot reload heterogeneous model configs from saved checkpoints (heterogeneous_layers_config_encoded_json is None during __post_init__ in heterogeneous_config.py). This affects both the inline --hf_export_path flow and the separate convert_checkpoints.py export script.

The original distill_hf.py README documented this as supported, but it may have been broken there too (on the Megatron-Bridge side). The consolidated README now documents this as a known limitation. HF export for standard models works fine via both methods.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Puzzletron AnyModel checkpoints in the distillation pipeline.
    • Introduced inline HuggingFace export during the distillation process.
  • Documentation

    • Updated distillation guide with clearer conversion workflows and optional HuggingFace export instructions.
    • Added distillation benchmarks and performance recommendations.
  • Bug Fixes & Improvements

    • Streamlined test infrastructure and workflow configuration.

…still.py

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
@j-rausch j-rausch requested review from a team as code owners April 9, 2026 12:29
@j-rausch j-rausch requested review from ChenhanYu and jenchen13 and removed request for a team April 9, 2026 12:29
@copy-pr-bot

copy-pr-bot bot commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

The PR consolidates distillation documentation into a single megatron_bridge location, adds optional HuggingFace export capabilities to the megatron_bridge distillation script with new CLI flags and rank-conditional export logic, removes the redundant puzzletron-specific distillation script and documentation, extends test coverage to support AnyModel distillation, and simplifies GitHub Actions workflows by removing unused test jobs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Distillation Documentation & Configuration**<br>`examples/megatron_bridge/README.md`, `examples/puzzletron/README.md` | Updated megatron_bridge distillation docs with the inline HF export workflow, conversion options, AnyModel limitations, and MMLU results tables. Updated the link from the local mbridge_distillation path to the centralized megatron_bridge docs. |
| **Megatron Bridge Distillation Script**<br>`examples/megatron_bridge/distill.py` | Added optional CLI arguments `--hf_export_path` and `--student_hf_model` for HuggingFace export control. Extended the end-of-distillation flow to conditionally export the final checkpoint via `AutoBridge.export_ckpt` on rank 0 and copy the student `config.json`. Added a guarded import of the puzzletron export module. |
| **Megatron Bridge Test Suite**<br>`tests/examples/megatron_bridge/test_distill.py` | Added a helper function to generate and convert tiny Qwen3 student/teacher models into AnyModel format. Introduced `test_distill_puzzletron_anymodel` to validate distillation with an AnyModel-formatted student checkpoint. Updated the existing test with a `pp_size=1` parameter. |
| **Test Utilities**<br>`tests/_test_utils/torch/puzzletron/utils.py` | Docstring formatting adjustment for `create_and_save_small_hf_model` (no functional changes). |
| **Removed Components**<br>`examples/puzzletron/mbridge_distillation/README.md`, `examples/puzzletron/mbridge_distillation/distill_hf.py`, `tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` | Deleted the puzzletron-specific distillation documentation, `distill_hf.py` script, and associated test file to consolidate functionality into the megatron_bridge location. |
| **CI/CD Workflow Simplification**<br>`.github/workflows/example_tests.yml`, `.github/workflows/_example_tests_runner.yml` | Removed the wait-checks job and all torch/trtllm/onnx test jobs. Updated the nemo-pr matrix to run only megatron_bridge. Updated required checks to depend only on check-file-changes and nemo-pr. Added the `-s` flag to the pytest invocation for stdout capture during test runs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Script as Distillation Script
    participant Train as Distributed Training
    participant Export as AutoBridge Export
    participant FS as File System

    Script->>Train: Run distillation loop
    Train->>Train: Training complete on all ranks
    Script->>Train: dist.cleanup() all ranks
    Train->>Script: Cleanup done

    rect rgba(100, 200, 100, 0.5)
    Note over Script,FS: HF Export Phase (rank 0 only)
    Script->>Export: Call export_ckpt(megatron_ckpt)
    Export->>FS: Write HF format checkpoint
    FS->>Script: Export complete
    Script->>FS: Copy student config.json to HF dir
    FS->>Script: Copy complete
    end
```
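The rank-0-only export phase in the sequence above can be sketched as follows. This is a minimal stand-in: the real script uses Megatron-Bridge's `AutoBridge.export_ckpt`, while here both the rank lookup (via the `RANK` environment variable) and the `export_checkpoint` stub are assumptions so the control flow runs without a distributed setup.

```python
import os
from typing import Optional


def export_checkpoint(megatron_ckpt: str, hf_export_path: str) -> str:
    """Stub standing in for AutoBridge.export_ckpt; reports what would be written."""
    return f"exported {megatron_ckpt} -> {hf_export_path}"


def maybe_export(megatron_ckpt: str, hf_export_path: Optional[str]) -> Optional[str]:
    """Export only when --hf_export_path was given, and only on global rank 0."""
    rank = int(os.environ.get("RANK", "0"))
    if hf_export_path is None or rank != 0:
        # Non-zero ranks, and runs without an export path, skip the export phase.
        return None
    return export_checkpoint(megatron_ckpt, hf_export_path)
```

Gating on rank 0 after cleanup matches the diagram: all ranks finish training and tear down the distributed state, then a single process writes the HF checkpoint so ranks do not race on the same output files.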

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of consolidating the distill_hf.py script into distill.py, the primary change across the changeset. |
| Security Anti-Patterns | ✅ Passed | The pull request contains no security anti-pattern violations. All uses of trust_remote_code are properly exposed as CLI arguments defaulting to False. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@kevalmorabia97 kevalmorabia97 requested review from AAnoosheh, danielkorzekwa and kevalmorabia97 and removed request for ChenhanYu and jenchen13 April 9, 2026 12:30
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1220/

Built to branch gh-pages at 2026-04-13 13:17 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/megatron_bridge/distill.py`:
- Line 304: The shutil.copy call copying config.json from args.student_hf_path
will fail for remote HuggingFace model IDs; update the logic in distill.py
(around the shutil.copy usage) to detect if args.student_hf_path is a remote
model ID and in that case use the HuggingFace API (e.g., hf_hub_download or
transformers.AutoConfig.from_pretrained / AutoConfig.save_pretrained) to fetch
the config and write it to args.hf_export_path/config.json, otherwise keep the
local shutil.copy behavior; reference args.student_hf_path and
args.hf_export_path so the code handles both local paths and remote model IDs.
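The fix CodeRabbit suggests above might look like the following sketch: copy `config.json` directly when the student path is a local directory, otherwise treat it as a Hub model ID and download the file. The local branch is plain `shutil`; the remote branch assumes `huggingface_hub.hf_hub_download` is available and imports it lazily so local-only use needs no extra dependency.

```python
import os
import shutil


def copy_student_config(student_hf_path: str, hf_export_path: str) -> str:
    """Place the student config.json into the HF export directory."""
    dest = os.path.join(hf_export_path, "config.json")
    if os.path.isdir(student_hf_path):
        # Local checkout: config.json sits next to the weights.
        shutil.copy(os.path.join(student_hf_path, "config.json"), dest)
    else:
        # Remote model ID (e.g. "Qwen/Qwen3-0.6B"): fetch config.json from the Hub.
        from huggingface_hub import hf_hub_download  # assumed dependency

        shutil.copy(hf_hub_download(student_hf_path, "config.json"), dest)
    return dest
```

`os.path.isdir` is a reasonable discriminator here because Hub model IDs (`org/name`) are not existing directories on the training host.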

In `@tests/_test_utils/torch/puzzletron/utils.py`:
- Line 89: The hardcoded trust_remote_code=True must be made
caller-configurable: add a boolean parameter trust_remote_code: bool = False to
the function that loads the HF model (the function around lines 66-72 in
tests/_test_utils/torch/puzzletron/utils.py), then pass that parameter into
AutoConfig.from_pretrained(...) and any other pretrained loaders (e.g., the call
at line ~152) instead of the hardcoded True; ensure the new parameter defaults
to False and is threaded through to all transformer loading calls
(AutoConfig.from_pretrained and tokenizer/model loading sites) so callers can
opt in when necessary.
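The trust_remote_code threading asked for above follows a common pattern: accept the flag with a safe default and pass it explicitly to every loader call. In this sketch, `fake_from_pretrained` is a stand-in for transformers' `AutoConfig.from_pretrained` so the pattern runs without transformers installed; the function name `create_and_save_small_hf_model` comes from the test utility under review, but its body here is illustrative.

```python
def fake_from_pretrained(path: str, trust_remote_code: bool = False) -> dict:
    """Stub loader; records whether remote code would be trusted."""
    return {"path": path, "trust_remote_code": trust_remote_code}


def create_and_save_small_hf_model(path: str, trust_remote_code: bool = False) -> dict:
    # Thread the caller's choice through to every pretrained-loader call,
    # instead of hardcoding trust_remote_code=True inside the helper.
    config = fake_from_pretrained(path, trust_remote_code=trust_remote_code)
    return config
```

Defaulting to `False` means callers must opt in to executing model-repository code, which is the security posture the review comment (and the passed Security Anti-Patterns check) is after.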

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e8fb78d-7646-49ef-ad4c-8b49d491c83b

📥 Commits

Reviewing files that changed from the base of the PR and between 25266b8 and 6abc8ab.

📒 Files selected for processing (8)
  • examples/megatron_bridge/README.md
  • examples/megatron_bridge/distill.py
  • examples/puzzletron/README.md
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/examples/megatron_bridge/test_distill.py
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (3)
  • examples/puzzletron/mbridge_distillation/README.md
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
  • examples/puzzletron/mbridge_distillation/distill_hf.py

For more details, you can refer to the checkpoint conversion scripts in the [Megatron-Bridge README](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/conversion).
For more details, see the [Megatron-Bridge conversion README](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/conversion).

> **Known limitation:** HF export does not yet work for Puzzletron AnyModel (heterogeneous) checkpoints -- Megatron-Bridge cannot reload heterogeneous configs from saved checkpoints. Standard models export correctly with both methods.
Collaborator


Did you test this in nemo:26.02.00 or nemo:26.02.01? It was fixed in 26.02.01. Please give it another try.


> **Known limitation:** HF export does not yet work for Puzzletron AnyModel (heterogeneous) checkpoints -- Megatron-Bridge cannot reload heterogeneous configs from saved checkpoints. Standard models export correctly with both methods.

### Distillation Results
Collaborator


Can you create a results/puzzletron.md file, move the results there, and add a reference to it here? I plan to add minitron distillation results as well, so this keeps the doc clean.

Collaborator


Alternatively, we can keep it in examples/puzzletron/README.md, since that's where the actual pruning happens. Either is fine.

@kevalmorabia97 kevalmorabia97 force-pushed the jrausch/distillation-consolidation branch from 4ed9996 to 8c2fa10 Compare April 10, 2026 18:54
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 10, 2026 18:54
@kevalmorabia97 kevalmorabia97 requested review from kevalmorabia97 and removed request for a team April 10, 2026 18:54
@codecov

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.80%. Comparing base (38d9522) to head (151b03d).

Additional details and impacted files
```
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1220      +/-   ##
======================================================
+ Coverage               59.90%   61.80%   +1.89%
======================================================
  Files                     453      463      +10
  Lines                   47957    48440     +483
======================================================
+ Hits                    28728    29937    +1209
+ Misses                  19229    18503     -726
```

| Flag | Coverage Δ |
|---|---|
| unit | 51.83% <ø> (+0.04%) ⬆️ |


@kevalmorabia97
Collaborator

/ok to test 8c2fa10

@kevalmorabia97 kevalmorabia97 requested review from a team as code owners April 10, 2026 23:38
@kevalmorabia97 kevalmorabia97 requested review from realAsma and removed request for a team April 10, 2026 23:38
@kevalmorabia97 kevalmorabia97 force-pushed the jrausch/distillation-consolidation branch from 3c0fa5a to 0c2a2ee Compare April 10, 2026 23:39
@kevalmorabia97
Collaborator

/ok to test 0c2a2ee

@kevalmorabia97
Collaborator

/ok to test 286916a

@kevalmorabia97
Collaborator

/ok to test 494cf38

…tion

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the jrausch/distillation-consolidation branch from 494cf38 to 3e39174 Compare April 11, 2026 14:25
@kevalmorabia97
Collaborator

/ok to test 3e39174

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>