Skip to content

fastgen DMD2: make the Qwen-Image example self-contained on stock nemo_automodel#1688

Open
jingyu-ml wants to merge 16 commits into
mainfrom
jingyux/fastgen-vendor-automodel-patches
Open

fastgen DMD2: make the Qwen-Image example self-contained on stock nemo_automodel#1688
jingyu-ml wants to merge 16 commits into
mainfrom
jingyux/fastgen-vendor-automodel-patches

Conversation

@jingyu-ml

@jingyu-ml jingyu-ml commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: new example / refactor (example self-containment)

The published examples/diffusers/fastgen DMD2 Qwen-Image distillation example previously relied on
local, unpublished modifications to the sibling nemo_automodel package (data path, collate, a
partial-load checkpointer, and Qwen-Image preprocessing). An external user running the published
Model-Optimizer example against official stock nemo_automodel would hit import/attribute errors.

This PR makes the example self-contained on stock nemo_automodel>=0.4.0 — with the changes kept
as small as possible: the team's actual AutoModel delta is only ~430 lines, so wherever the
upstream module is importable, the delta is expressed as a thin subclass/wrapper rather than a
full-file copy.

  • fastgen_data/ — the DMD2 data path:
    • collate_fns.pythin wrappers over stock: import the unpatched upstream collate +
      bucket sampler and re-add only the DMD2 delta (text_embeddings_mask, stacked from the
      per-item mask the stock collate omits; and an optional broadcast negative_text_embeddings
      for CFG). The builder loads an optional negative_prompt_embedding_path.
    • text_to_image_dataset.py — a faithful vendored copy of the upstream reader (its
      prompt_embeds_mask emission is interleaved with cache loading, so wrapping it would force a
      redundant per-item torch.load; carried verbatim instead).
  • fastgen_checkpoint.pyPartialLoadCheckpointer(Checkpointer) that overrides only
    load_optimizer (FSDP2 DefaultLoadPlanner(allow_partial_load=True)) so optimizer resume works
    without patching upstream. Injected via an in-place re-bless of self.checkpointer in the
    recipe's load_checkpoint (model-state load stays strict).
  • preprocess/ — Qwen-Image preprocessing (preprocessing_multiprocess.py + processors/),
    trimmed to the image path (drops the flux/wan/hunyuan processors and the video base class).
    It lives in AutoModel's top-level tools/ tree, which is not shipped in the pip package, so
    it cannot be wrapped and is vendored; MultiTierBucketCalculator is imported from stock upstream.
  • make_negative_prompt_embedding.py — generates the optional CFG negative-prompt embedding.
  • All configs/*.yaml target fastgen_data.build_* (a test enumerates every config).
  • Licensing/provenance per CONTRIBUTING.md (source link + commit, original Apache-2.0 header, then
    NVIDIA SPDX header on verbatim-vendored files; a NeMo-AutoModel note in LICENSE; insert-license
    excludes for the verbatim files; nemo_automodel[diffusion] version bound in requirements.txt).

The DMD2 math in modelopt/torch/fastgen/ is unchanged — only example/training-time glue moved.

Usage

# Install example deps (stock nemo_automodel) from a source checkout
pip install -r examples/diffusers/fastgen/requirements.txt

# Build the training cache from raw images (Qwen-Image VAE latents + text embeddings)
python examples/diffusers/fastgen/preprocess_qwen_image.py image \
    --image_dir <raw images> --output_dir <cache dir> --processor qwen_image \
    --caption_format meta_json

# Generate the CFG negative-prompt embedding once
python examples/diffusers/fastgen/make_negative_prompt_embedding.py \
    --output <cache dir>/negative_prompt_embedding.pt

# Point the config's data.dataloader.cache_dir + negative_prompt_embedding_path at the cache, then train.

See examples/diffusers/fastgen/README.md → "Requirements & self-contained data path".

Testing

  • New tests/examples/diffusers/fastgen/test_vendored_migration.py: environment-independent
    invariants (every config targets a vendored builder; no tools.* imports; each former AutoModel
    patch is vendored / wrapped / a documented exclusion; provenance + SPDX headers present) plus
    dependency-guarded structural tests (the collate wrapper emits the batch contract + broadcasts the
    negative embedding; the builder accepts negative_prompt_embedding_path; the checkpointer
    overrides only load_optimizer; the Qwen-Image processor self-registers).
  • Validated 9/9 against a pure stock nemo_automodel 0.4.0 worktree (none of the local patches
    present) via SLURM — re-run after this slim-down.
  • The migrated code path is exercised by a live multi-GPU DMD2 run that resumed from a checkpoint
    through the vendored PartialLoadCheckpointer.
  • ruff check + ruff format --check clean on all changed files.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (additive; bundled configs target the vendored builders)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ (NeMo-AutoModel @ e42584e3, Apache-2.0 — provenance + original license retained, NVIDIA SPDX header added after; LICENSE third-party note; nemo_automodel was already a dependency)
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A (example-only change)
  • Did you get Claude approval on this PR?: ❌ (pending — opened as draft)

Additional Information

Opened as a draft pending: OSRB review of the vendored NeMo-AutoModel (Apache-2.0) code, CI green,
and (optional) a multi-GPU smoke-train + resume on stock upstream. Vendored from
NVIDIA-NeMo/Automodel at commit e42584e3.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Vendored Qwen‑Image preprocessing and processors; multi‑process dataset builder; fastgen_data dataloader package, TextToImageDataset, collate/batching with negative‑prompt support; negative‑prompt embedding generator; checkpoint loader tolerant of partial FSDP2 optimizer shards; CLI entrypoints and updated training script/configs (removed smoke config).
  • Documentation

    • README updated for real‑data training, install instructions, config references, and troubleshooting.
  • Tests

    • New tests validating vendored migration, dataloader/collate behavior, processor registry, and checkpoint upgrade.
  • Chores

    • License attribution added, example requirements pinned, mypy override for vendored code, pre‑commit exclusion and CODEOWNERS updated.

jingyu-ml and others added 11 commits June 10, 2026 17:41
…n-time self-contained)

Make the published DMD2 Qwen-Image example run against STOCK upstream nemo_automodel by
removing its dependence on local AutoModel modifications (staged patches #1-5):

- Vendor the patched data files into examples/diffusers/fastgen/fastgen_data/
  (collate_fns, text_to_image_dataset, mock_dataloader), importing the UNPATCHED upstream
  helpers (sampler, base_dataset, text_to_video_dataset) from the stock nemo_automodel
  package. Provenance headers per CONTRIBUTING (source + commit e42584e3, Apache-2.0).
- Repoint the published configs' dataloader _target_ to fastgen_data.build_* and add an
  explicit sys.path seam + startup resolution logging in dmd2_finetune.py (source-checkout).
- Add fastgen_checkpoint.PartialLoadCheckpointer (overrides only load_optimizer with
  allow_partial_load=True) and inject it in dmd2_recipe.load_checkpoint, so both the student
  and fake-score optimizer restores tolerate FSDP2 partial shards without a patched AutoModel.

No change to modelopt/torch/fastgen/ (DMD2 math). Preprocessing vendoring + tests + the
removability audit follow in subsequent commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Bound nemo_automodel[diffusion] to the tested range (>=0.4.0,<1.0; 0.4.0 == the validated
commit e42584e3) and add a runtime soft-guard in fastgen_data/__init__.py that converts a
missing unpatched-upstream-helper ImportError into an actionable message naming the
supported range. Supports AC-7 (bounded version) and AC-1 (clear failure on drift).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…ined, no tools/ tree)

Vendor the AutoModel preprocessing stack needed to build the VAE+text-embed cache from raw
images, trimmed to the Qwen-Image path:
- preprocess/processors/{base,base_video,registry,caption_loaders,qwen_image}.py + a trimmed
  processors/__init__.py that drops flux/wan/hunyuan (and their deps, e.g.
  nemo_automodel.shared.transformers_patches). The Qwen-Image processor self-registers.
- preprocess/preprocessing_multiprocess.py (driver): the `from tools.diffusion.processors`
  import is rewritten to the vendored `.processors` package; MultiTierBucketCalculator is
  imported from the stock nemo_automodel package (it IS packaged). Image subcommand
  --processor defaults to qwen_image.
- preprocess_qwen_image.py launcher mirroring dmd2_finetune.py's sys.path seam.

Removes the example's dependence on AutoModel staged patches #7-8 and on the un-packaged
AutoModel tools/ tree. Provenance headers per CONTRIBUTING. No tools.* imports remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…per CONTRIBUTING)

The 10 vendored files (fastgen_data/{collate_fns,text_to_image_dataset,mock_dataloader}.py and
preprocess/{preprocessing_multiprocess,processors/*}.py) carry the original NeMo-AutoModel
Apache-2.0 header plus a "Vendored from ... @ e42584e3" provenance note. Add them to the
insert-license-py exclude so the hook does not prepend a second NVIDIA header on top.

LICENSE Third-Party Software Notices is intentionally NOT modified: every vendored file is
NVIDIA-copyright Apache-2.0 (same entity and license as this repo, from the sibling
NVIDIA-NeMo/Automodel repo), so there is no external third-party copyright holder to list;
the per-file provenance headers are the applicable record. OSRB sign-off remains a maintainer
release gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Add examples/diffusers/fastgen/tests/test_vendored_migration.py with two tiers:
- Environment-independent (run anywhere): published configs repointed to fastgen_data.*,
  no tools.* imports in vendored code, the AC-8 removability coverage audit (all nine staged
  AutoModel files vendored-or-excluded), and provenance headers on the 10 vendored files.
- Dependency-guarded (importorskip nemo_automodel/torch, run in the container): data builders
  accept negative_prompt_embedding_path, collate emits the batch contract + broadcasts the
  negative embed, PartialLoadCheckpointer overrides only load_optimizer (model load stays
  strict) + the recipe injects it, and the qwen_image processor self-registers.

The four environment-independent tests pass on CPU; the GPU/container end-to-end positive
checks (smoke train, FSDP2 resume, preprocessing on real images) run via SLURM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Add a "Requirements & self-contained data path" section: runs on stock nemo_automodel
(>=0.4.0,<1.0) from a source checkout; data loading (fastgen_data/) and preprocessing
(preprocess/) are vendored so no nemo_automodel modifications are needed; how to build the
cache via preprocess_qwen_image.py. Completes the train-time + preprocessing migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…style

Round 1 closes the login-node-actionable AC-7 gaps Codex flagged:
- LICENSE: record the NeMo-AutoModel vendoring under Third-Party Software Notices (Apache-2.0).
- Vendored files: add the NVIDIA SPDX header after the original AutoModel license, per
  CONTRIBUTING "Copying code from other sources" (matches modelopt/.../calib_utils.py) — all 10.
- Runtime guard: fastgen_data/__init__.py soft-warns when installed nemo_automodel is outside the
  tested >=0.4.0,<1.0 range; preprocess/__init__.py guards the multi_tier_bucketing import with an
  actionable message.
- Tests: drop plan workflow markers (AC-/Phase/Step) from comments per the plan's code-style note.

Not changed (deliberate): the collate prompt_embeds_mask -> text_embeddings_mask mapping is
byte-identical to the patched AutoModel (faithful vendor; changing it would alter current training
behavior). GPU/SLURM end-to-end validation (smoke/resume/preprocessing/revert-all-nine) remains
deferred to the container environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
… tests

Address the Round 1 review's actionable gaps:
- Add make_negative_prompt_embedding.py: generates the CFG negative_prompt_embedding.pt the
  canonical config requires, reusing the vendored QwenImageProcessor's text encoder (default
  negative prompt ""), saved in the dataloader's payload format {"embed", "mask"}. Closes the
  AC-5/AC-8 self-containment gap (preprocessing previously emitted only per-image caches, so a
  from-scratch user had no negative-prompt embedding to point the config at). Documented in README.
- Move migration tests to tests/examples/diffusers/fastgen/ (the repo's example CI lane) and fix
  the example-dir resolution from the new location.
- Strengthen the provenance test to assert the NVIDIA SPDX header presence + ordering (after the
  original license), not just the "Vendored from" note.
- Update stale "python -m tools.diffusion..." invocation strings in the vendored driver docs to the
  preprocess_qwen_image.py launcher.

GPU/SLURM end-to-end execution remains environment-deferred. No modelopt/torch/fastgen changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…ig test

Address the Round 2 review:
- Every real-data config on disk now targets fastgen_data.build_text_to_image_multiresolution_dataloader.
  The 2 committed configs (dmd2_qwen_image.yaml, smoke) were already repointed; the 6 local-only
  experiment configs (1M_shift3, df0.7, 320k, 320k_shift3, 8steps, 8steps_320k — git-excluded, with
  hardcoded /lustre data paths) are repointed on disk so the user's own runs survive deleting the
  AutoModel patches. (They remain excluded from the published diff.)
- The config test now enumerates configs/*.yaml and fails on ANY upstream dataloader target, so a
  new config cannot silently reintroduce the upstream dependence. Robust to whatever set ships
  (2 in a fresh clone, 8 on the user's disk).
- Assert the builder's negative_prompt_embedding_path defaults to None (CFG-less is optional).

GPU/SLURM end-to-end execution remains environment-deferred. No modelopt/torch/fastgen changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
…ntract

The stock-upstream SLURM validation (8/9 pass) surfaced a test-fixture bug, not a migration bug:
test_collate_emits_contract_keys_and_broadcasts_negative_prompt built a fake sample with only
{latent, prompt_embeds, prompt_embeds_mask}, but collate_fn_production also requires
crop_resolution / original_resolution / crop_offset / prompt / image_path / bucket_id /
aspect_ratio (the keys TextToImageDataset emits). Add them so the test exercises the real
collate path. Vendored code unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Apply the repo ruff config to the vendored data path and preprocessing files: ruff-format + autofixes (pyupgrade py310, isort) plus manual fixes for E402 (logger placement), E501 (provenance comment wrap), N806 (lowercase dims in the video-only 5D shape validator), SIM103, RUF022 (sorted __all__), C401, PERF401. No behavioral change: correctness-critical imports (recipe checkpointer injection, re-exports, # noqa: F401 side-effect imports, MultiTierBucketCalculator, DefaultLoadPlanner) and all provenance/SPDX headers preserved. ruff check + format --check clean; env-independent migration tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 60318e73-5a45-4262-b995-bde11fbd270f

📥 Commits

Reviewing files that changed from the base of the PR and between 0d46f48 and 85ae40c.

📒 Files selected for processing (4)
  • examples/diffusers/fastgen/fastgen_data/collate_fns.py
  • examples/diffusers/fastgen/preprocess/processors/caption_loaders.py
  • examples/diffusers/fastgen/preprocess/processors/qwen_image.py
  • tests/examples/diffusers/fastgen/test_vendored_migration.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • examples/diffusers/fastgen/fastgen_data/collate_fns.py
  • examples/diffusers/fastgen/preprocess/processors/qwen_image.py
  • tests/examples/diffusers/fastgen/test_vendored_migration.py
  • examples/diffusers/fastgen/preprocess/processors/caption_loaders.py

📝 Walkthrough

Walkthrough

Vendored NeMo-AutoModel preprocessing and data loaders into the fastgen example, rewired configs/scripts to use local builders, added FSDP2-tolerant optimizer restore, implemented preprocessing and negative-prompt tooling, and added tests validating the migration.

Changes

FastGen DMD2 NeMo-AutoModel Vendoring & Integration

Layer / File(s) Summary
License, configuration, and documentation updates
.pre-commit-config.yaml, LICENSE, examples/diffusers/fastgen/README.md, examples/diffusers/fastgen/requirements.txt, pyproject.toml, .github/CODEOWNERS
Updated Apache-2 attribution for vendored NVIDIA code, excluded fastgen files from automatic license insertion, extended README with vendored preprocessing instructions, pinned nemo_automodel[diffusion] to >=0.4.0,<1.0, added MyPy overrides for vendored preprocess code, and added CODEOWNERS entry.
Processor contracts, registry, and Qwen-Image processor
examples/diffusers/fastgen/preprocess/processors/{base,caption_loaders,qwen_image,registry}.py, examples/diffusers/fastgen/preprocess/processors/__init__.py
Adds BaseModelProcessor abstraction, caption loader implementations (sidecar/meta/jsonl), ProcessorRegistry, and QwenImageProcessor implementing model loading, image/text encoding, latent verification, and cache payload construction.
Multiprocess preprocessing orchestration and launcher
examples/diffusers/fastgen/preprocess/preprocessing_multiprocess.py, examples/diffusers/fastgen/preprocess/__init__.py, examples/diffusers/fastgen/preprocess_qwen_image.py
Implements multiprocess image/video preprocessing with bucketing, sharded metadata output, CLI dispatch, and a dedicated Qwen-image launcher entrypoint; includes import guards.
Vendored data package, dataset, collate, and CLI tools
examples/diffusers/fastgen/fastgen_data/{__init__,collate_fns,text_to_image_dataset}.py, examples/diffusers/fastgen/make_negative_prompt_embedding.py, examples/diffusers/fastgen/configs/dmd2_qwen_image.yaml, examples/diffusers/fastgen/dmd2_finetune.py
Adds fastgen_data package with import guards and upstream-version warnings, TextToImageDataset for cached .pt samples, collate_fn_text_to_image with negative-prompt broadcasting and mask reconstruction, dataloader builder, negative-prompt embedding generator script, and rewires config/script defaults to use vendored builders.
FSDP2-tolerant checkpoint optimizer loading
examples/diffusers/fastgen/fastgen_checkpoint.py, examples/diffusers/fastgen/dmd2_recipe.py
Introduced PartialLoadCheckpointer that loads optimizer state via torch.distributed.checkpoint with DefaultLoadPlanner(allow_partial_load=True), added make_optimizer_partial_load_tolerant() to upgrade instances in-place, and injected the upgrade call into recipe checkpoint restoration.
Validation tests
tests/examples/diffusers/fastgen/test_vendored_migration.py
Adds tests validating YAMLs target vendored builders, staged AutoModel file dispositions, provenance headers, dataloader/collate contracts (including negative-prompt broadcasting), checkpointer subclassing and idempotent upgrade, recipe injection, and processor registration.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error)

Check name Status Explanation Resolution
Security Anti-Patterns ❌ Error Found torch.load(..., weights_only=False) without any inline safety comment in examples/diffusers/fastgen/dmd2_recipe.py and inference_dmd2_qwen_image.py (ema/state/discriminator checkpoints). Change these loads to weights_only=True where possible, or add an explicit inline comment justifying why the checkpoint files are internally-generated/trusted.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main objective: making the fastgen DMD2 Qwen-Image example self-contained and compatible with stock nemo_automodel.
Docstring Coverage ✅ Passed Docstring coverage is 93.42% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jingyux/fastgen-vendor-automodel-patches

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.57%. Comparing base (d3acf45) to head (85ae40c).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1688       +/-   ##
===========================================
+ Coverage   56.59%   75.57%   +18.97%     
===========================================
  Files         507      511        +4     
  Lines       55794    57337     +1543     
===========================================
+ Hits        31579    43333    +11754     
+ Misses      24215    14004    -10211     
Flag Coverage Δ
examples 40.87% <ø> (+22.31%) ⬆️
gpu 57.85% <ø> (+37.31%) ⬆️
regression 14.69% <ø> (-0.16%) ⬇️
unit 54.33% <ø> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…collate)

Reduce the migration's footprint per review: (1) delete the vendored mock_dataloader.py and the mock-data smoke config + its README/recipe references (the published quick-start was broken on stock anyway; real training does not use it); (2) delete the vendored base_video.py (video code, dead in an image-only example) and make the --list_processors display image-only; (3) rewrite collate_fns.py as thin wrappers over stock nemo_automodel (import the unpatched collate + sampler; re-add only the DMD2 delta: text_embeddings_mask + broadcast negative-prompt embedding) instead of a full ~440-line vendored copy, also dropping the dead video collate/builder. text_to_image_dataset.py stays a faithful vendored copy (its prompt_embeds_mask change is interleaved with cache loading). Default config + tests/licensing updated; ruff + env-independent checks pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 11, 2026
@jingyu-ml jingyu-ml self-assigned this Jun 11, 2026
…quality)

The preprocess/ tree is vendored verbatim from NVIDIA-NeMo/Automodel (Apache-2.0) and uses the upstream worker-global pattern (module-level 'X | None' set lazily in _init_worker), which trips mypy's strict union-attr/arg-type under examples.*. Add a scoped mypy override (ignore_errors, like tests.*) for examples.diffusers.fastgen.preprocess.* so the faithful copy is not annotated/diverged from upstream. The authored fastgen_data/ wrapper is unaffected (already mypy-clean). 36 errors in 2 files -> 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml marked this pull request as ready for review June 11, 2026 22:48
@jingyu-ml jingyu-ml requested review from a team as code owners June 11, 2026 22:48
@jingyu-ml jingyu-ml requested a review from kevalmorabia97 June 11, 2026 22:48
Assign modelopt/torch/fastgen (DMD2 library) and /examples/diffusers/fastgen (this example) to @NVIDIA/modelopt-torch-fastgen-codeowners. Both are more-specific entries placed after their parent rules (modelopt/torch and /examples/diffusers), so GitHub's last-match-wins routes fastgen review to the fastgen team, matching the repo convention (e.g. modelopt/torch/quantization carving out of modelopt/torch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/diffusers/fastgen/dmd2_recipe.py (1)

671-671: ⚠️ Potential issue | 🔴 Critical

Fix unsafe checkpoint deserialization (weights_only=False) in diffusers fastgen restore paths

examples/diffusers/fastgen/dmd2_recipe.py restores sidecar checkpoint payloads using torch.load(..., weights_only=False) at lines 671, 676, 689, and 698 with no inline justification. Update these loads to weights_only=True (or add an inline comment explaining why weights_only=False is safe).

Suggested patch
-            ema_state = torch.load(ema_path, map_location="cpu", weights_only=False)
+            ema_state = torch.load(ema_path, map_location="cpu", weights_only=True)
...
-            state = torch.load(state_path, map_location="cpu", weights_only=False)
+            state = torch.load(state_path, map_location="cpu", weights_only=True)
...
-                disc_state = torch.load(disc_path, map_location="cpu", weights_only=False)
+                disc_state = torch.load(disc_path, map_location="cpu", weights_only=True)
...
-                disc_opt_state = torch.load(disc_opt_path, map_location="cpu", weights_only=False)
+                disc_opt_state = torch.load(disc_opt_path, map_location="cpu", weights_only=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/fastgen/dmd2_recipe.py` at line 671, Summary: Unsafe
checkpoint deserialization uses torch.load(..., weights_only=False); change to
weights_only=True or justify why full deserialization is required. Locate the
torch.load calls that load sidecar checkpoints (e.g., the ema_state load
referenced by the symbol ema_state and variable ema_path in dmd2_recipe.py) and
update the argument weights_only=False to weights_only=True for all similar
restore paths (the other torch.load calls flagged in the review). If full object
deserialization truly is required, instead leave weights_only=False but add an
inline comment explaining why it is safe and unavoidable; otherwise prefer
weights_only=True to avoid executing arbitrary pickle payloads.

Source: Coding guidelines

🧹 Nitpick comments (1)
examples/diffusers/fastgen/preprocess/processors/base.py (1)

187-190: ⚡ Quick win

Move the NumPy import to module scope (or document why it must be deferred).

Line 187 imports numpy inside preprocess_image without an explicit circular/optional/heavy-import justification, so import errors surface at runtime instead of module load.

As per coding guidelines, imports should stay at file top unless there is a documented exception.

Proposed change
 from abc import ABC, abstractmethod
 from typing import Any
 
+import numpy as np
 import torch
 from PIL import Image
@@
     def preprocess_image(self, image: Image.Image) -> torch.Tensor:
@@
-        import numpy as np
-
         image_tensor = torch.from_numpy(np.array(image)).float() / 255.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/fastgen/preprocess/processors/base.py` around lines 187 -
190, The numpy import is done inside the preprocessing code (where image_tensor
is created in preprocess_image), which causes import errors to appear at
runtime; move "import numpy as np" to module scope near the other imports at the
top of examples/diffusers/fastgen/preprocess/processors/base.py (or add a short
comment explaining a deliberate deferred import) so that preprocess_image (and
the image_tensor creation) uses a top-level np import; update any references in
preprocess_image and related functions to use the module-scoped np.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/diffusers/fastgen/fastgen_data/collate_fns.py`:
- Line 103: The code uses torch.load(path, map_location="cpu",
weights_only=False) to load a user-configurable negative_prompt_embedding_path,
which allows unsafe pickle loading; change the load to a safe mode and validate
the input: ensure the loader uses weights_only=True (i.e., torch.load(path,
map_location="cpu", weights_only=True)) or otherwise deserialize with a safe
format (e.g., torch.jit.load or a JSON/npz reader) and validate
negative_prompt_embedding_path exists and is from a trusted source before
loading; update the line creating payload (payload = torch.load(...)) and any
code that assumes pickled objects accordingly.

In `@examples/diffusers/fastgen/preprocess/processors/base.py`:
- Around line 208-210: The code currently dereferences
models["vae"].config.scaling_factor without ensuring the attribute exists;
update the getter so that after confirming "vae" is in models and models["vae"]
has a config, you also check that the config has a scaling_factor (e.g.,
hasattr(models["vae"].config, "scaling_factor") or getattr with default) and
only return it when present, otherwise return the fallback 0.18215; reference
the models dict, the "vae" entry, its config attribute, and the scaling_factor
property when implementing this guard.

In `@examples/diffusers/fastgen/preprocess/processors/caption_loaders.py`:
- Line 425: The import "from collections import defaultdict" is placed inside a
function in caption_loaders.py—move this import to the module top-level and
remove the in-function import; search for the in-function occurrence of "from
collections import defaultdict" (and any use of defaultdict) and add a single
top-of-file import statement instead so import errors surface at module load and
the function body no longer contains the import.

In `@examples/diffusers/fastgen/preprocess/processors/qwen_image.py`:
- Around line 184-191: The second value returned by pipeline.encode_prompt (the
prompt_embeds_mask) is being ignored in encode_prompt calls and therefore not
serialized; update the encode_prompt usage in qwen_image.py to capture both
outputs (e.g., prompt_embeds, prompt_embeds_mask = pipeline.encode_prompt(...))
and include prompt_embeds_mask in the returned/persisted dict alongside
prompt_embeds (apply the same .detach().cpu().to(torch.bfloat16) or appropriate
dtype/shape handling as done for prompt_embeds) so downstream TextToImageDataset
receives the real mask instead of falling back to an all-ones mask.
- Line 217: The autocast call inside verify_latent is using an invalid dtype for
CUDA (torch.float32); update the autocast usage in verify_latent (qwen_image.py)
to use a valid reduced-precision dtype such as torch.float16 or torch.bfloat16,
or remove the dtype argument so autocast picks the default, ensuring the with
torch.no_grad(), autocast(...) context no longer raises and causes false
verification failures.

In `@LICENSE`:
- Around line 226-228: Add a top-of-file vendored header to each Python file
listed (dmd2_finetune.py, dmd2_recipe.py, export_diffusers_qwen_image.py,
fastgen_checkpoint.py, fastgen_data/__init__.py, fastgen_data/collate_fns.py,
inference_dmd2_qwen_image.py, make_negative_prompt_embedding.py,
preprocess_qwen_image.py) that matches the LICENSE expectation: a single-line or
block comment beginning with "Vendored from" and including the original
NVIDIA-NeMo/Automodel source path and commit SHA (e.g., "Vendored from
NVIDIA-NeMo/Automodel: <original/path> @ <commit>"); place it at the very top of
each file before imports and include any short attribution or license snippet
required by the original file so each file contains the "Vendored from" string
referenced in LICENSE.

In `@tests/examples/diffusers/fastgen/test_vendored_migration.py`:
- Around line 157-163: Move the local import of inspect out of the test function
to a top-level import; update the module imports to include "import inspect" at
the module scope and remove the "import inspect" line from inside
test_data_builders_importable_and_accept_negative_prompt_path so the test uses
the module-level inspect symbol.

---

Outside diff comments:
In `@examples/diffusers/fastgen/dmd2_recipe.py`:
- Line 671: Summary: Unsafe checkpoint deserialization uses torch.load(...,
weights_only=False); change to weights_only=True or justify why full
deserialization is required. Locate the torch.load calls that load sidecar
checkpoints (e.g., the ema_state load referenced by the symbol ema_state and
variable ema_path in dmd2_recipe.py) and update the argument weights_only=False
to weights_only=True for all similar restore paths (the other torch.load calls
flagged in the review). If full object deserialization truly is required,
instead leave weights_only=False but add an inline comment explaining why it is
safe and unavoidable; otherwise prefer weights_only=True to avoid executing
arbitrary pickle payloads.

---

Nitpick comments:
In `@examples/diffusers/fastgen/preprocess/processors/base.py`:
- Around line 187-190: The numpy import is done inside the preprocessing code
(where image_tensor is created in preprocess_image), which causes import errors
to appear at runtime; move "import numpy as np" to module scope near the other
imports at the top of examples/diffusers/fastgen/preprocess/processors/base.py
(or add a short comment explaining a deliberate deferred import) so that
preprocess_image (and the image_tensor creation) uses a top-level np import;
update any references in preprocess_image and related functions to use the
module-scoped np.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5a5a2507-bada-45a1-9617-9f3d3b0b7710

📥 Commits

Reviewing files that changed from the base of the PR and between dd49a46 and c7fc9bf.

📒 Files selected for processing (23)
  • .pre-commit-config.yaml
  • LICENSE
  • examples/diffusers/fastgen/README.md
  • examples/diffusers/fastgen/configs/dmd2_qwen_image.yaml
  • examples/diffusers/fastgen/configs/dmd2_qwen_image_smoke.yaml
  • examples/diffusers/fastgen/dmd2_finetune.py
  • examples/diffusers/fastgen/dmd2_recipe.py
  • examples/diffusers/fastgen/fastgen_checkpoint.py
  • examples/diffusers/fastgen/fastgen_data/__init__.py
  • examples/diffusers/fastgen/fastgen_data/collate_fns.py
  • examples/diffusers/fastgen/fastgen_data/text_to_image_dataset.py
  • examples/diffusers/fastgen/make_negative_prompt_embedding.py
  • examples/diffusers/fastgen/preprocess/__init__.py
  • examples/diffusers/fastgen/preprocess/preprocessing_multiprocess.py
  • examples/diffusers/fastgen/preprocess/processors/__init__.py
  • examples/diffusers/fastgen/preprocess/processors/base.py
  • examples/diffusers/fastgen/preprocess/processors/caption_loaders.py
  • examples/diffusers/fastgen/preprocess/processors/qwen_image.py
  • examples/diffusers/fastgen/preprocess/processors/registry.py
  • examples/diffusers/fastgen/preprocess_qwen_image.py
  • examples/diffusers/fastgen/requirements.txt
  • pyproject.toml
  • tests/examples/diffusers/fastgen/test_vendored_migration.py
💤 Files with no reviewable changes (1)
  • examples/diffusers/fastgen/configs/dmd2_qwen_image_smoke.yaml

Comment thread examples/diffusers/fastgen/fastgen_data/collate_fns.py Outdated
Comment on lines +208 to +210
if "vae" in models and hasattr(models["vae"], "config"):
return models["vae"].config.scaling_factor
return 0.18215 # Default for most models

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard scaling_factor existence before dereference.

Line 209 assumes models["vae"].config.scaling_factor always exists once config exists. If scaling_factor is missing, this raises instead of using the default fallback.

Proposed change
     def get_vae_scaling_factor(self, models: dict[str, Any]) -> float:
@@
-        if "vae" in models and hasattr(models["vae"], "config"):
-            return models["vae"].config.scaling_factor
+        if "vae" in models and hasattr(models["vae"], "config"):
+            scaling_factor = getattr(models["vae"].config, "scaling_factor", None)
+            if scaling_factor is not None:
+                return scaling_factor
         return 0.18215  # Default for most models
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/fastgen/preprocess/processors/base.py` around lines 208 -
210, The code currently dereferences models["vae"].config.scaling_factor without
ensuring the attribute exists; update the getter so that after confirming "vae"
is in models and models["vae"] has a config, you also check that the config has
a scaling_factor (e.g., hasattr(models["vae"].config, "scaling_factor") or
getattr with default) and only return it when present, otherwise return the
fallback 0.18215; reference the models dict, the "vae" entry, its config
attribute, and the scaling_factor property when implementing this guard.

Comment thread examples/diffusers/fastgen/preprocess/processors/caption_loaders.py Outdated
Comment on lines +184 to +191
prompt_embeds, _ = pipeline.encode_prompt(
prompt=prompt,
device=device,
)

return {
"prompt_embeds": prompt_embeds.detach().cpu().to(torch.bfloat16),
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

prompt_embeds_mask is dropped before cache serialization.

Line 184 receives a second output from encode_prompt but discards it, and Lines 261-266 persist only prompt_embeds. Downstream, TextToImageDataset falls back to an all-ones mask when the mask key is absent, so the true sequence mask contract is lost.

Proposed change
         with torch.no_grad():
-            prompt_embeds, _ = pipeline.encode_prompt(
+            prompt_embeds, prompt_embeds_mask = pipeline.encode_prompt(
                 prompt=prompt,
                 device=device,
             )
 
-        return {
+        result = {
             "prompt_embeds": prompt_embeds.detach().cpu().to(torch.bfloat16),
         }
+        if prompt_embeds_mask is not None:
+            result["prompt_embeds_mask"] = prompt_embeds_mask.detach().cpu().to(torch.long)
+        return result
@@
         return {
             # Image latent
             "latent": latent,
             # Text embeddings
             "prompt_embeds": text_encodings["prompt_embeds"],
+            "prompt_embeds_mask": text_encodings.get("prompt_embeds_mask"),
             # Metadata
             "original_resolution": metadata["original_resolution"],

Also applies to: 261-266

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/diffusers/fastgen/preprocess/processors/qwen_image.py` around lines
184 - 191, The second value returned by pipeline.encode_prompt (the
prompt_embeds_mask) is being ignored in encode_prompt calls and therefore not
serialized; update the encode_prompt usage in qwen_image.py to capture both
outputs (e.g., prompt_embeds, prompt_embeds_mask = pipeline.encode_prompt(...))
and include prompt_embeds_mask in the returned/persisted dict alongside
prompt_embeds (apply the same .detach().cpu().to(torch.bfloat16) or appropriate
dtype/shape handling as done for prompt_embeds) so downstream TextToImageDataset
receives the real mask instead of falling back to an all-ones mask.

Comment thread examples/diffusers/fastgen/preprocess/processors/qwen_image.py Outdated
Comment thread LICENSE
Comment on lines +226 to +228
Copyright (c) 2025 NVIDIA CORPORATION — portions of examples/diffusers/fastgen/ are
vendored from NVIDIA-NeMo/Automodel (https://github.com/NVIDIA-NeMo/Automodel); see the
per-file "Vendored from" headers for the exact source paths and commit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check that vendored files have "Vendored from" headers.
# Expected: Each file starts with a comment containing "Vendored from" + source URL.

vendored_files=(
  "examples/diffusers/fastgen/fastgen_data/text_to_image_dataset.py"
  "examples/diffusers/fastgen/preprocess/preprocessing_multiprocess.py"
  "examples/diffusers/fastgen/preprocess/processors/__init__.py"
  "examples/diffusers/fastgen/preprocess/processors/base.py"
  "examples/diffusers/fastgen/preprocess/processors/registry.py"
  "examples/diffusers/fastgen/preprocess/processors/caption_loaders.py"
  "examples/diffusers/fastgen/preprocess/processors/qwen_image.py"
)

echo "Checking for 'Vendored from' headers in vendored files..."
for file in "${vendored_files[@]}"; do
  if [ -f "$file" ]; then
    if head -20 "$file" | grep -q "Vendored from"; then
      echo "$file: has Vendored from header"
    else
      echo "$file: MISSING Vendored from header"
    fi
  else
    echo "$file: file not found"
  fi
done

Repository: NVIDIA/Model-Optimizer

Length of output: 763


🏁 Script executed:

#!/bin/bash
set -euo pipefail

dir="examples/diffusers/fastgen"
echo "Scanning for Python files under: $dir"
mapfile -t files < <(find "$dir" -type f -name "*.py" | sort)

echo "Found ${`#files`[@]} .py files."
missing=()

for f in "${files[@]}"; do
  if head -30 "$f" | rg -q "Vendored from"; then
    echo "$f"
  else
    echo "$f (missing 'Vendored from' in first 30 lines)"
    missing+=("$f")
  fi
done

if [ ${`#missing`[@]} -gt 0 ]; then
  echo
  echo "Missing count: ${`#missing`[@]}"
  printf '%s\n' "${missing[@]}"
  exit 1
else
  echo
  echo "All scanned Python files have 'Vendored from' headers."
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 201


🏁 Script executed:

#!/bin/bash
set -euo pipefail

dir="examples/diffusers/fastgen"
echo "Scanning for Python files under: $dir"

tmp="$(mktemp)"
find "$dir" -type f -name "*.py" | sort > "$tmp"

mapfile -t files < "$tmp"
rm -f "$tmp"

echo "Found ${`#files`[@]} .py files."

missing=()
for f in "${files[@]}"; do
  if head -40 "$f" | rg -q "Vendored from"; then
    echo "$f"
  else
    echo "$f (missing 'Vendored from' in first 40 lines)"
    missing+=("$f")
  fi
done

if [ ${`#missing`[@]} -gt 0 ]; then
  echo
  echo "Missing count: ${`#missing`[@]}"
  printf '%s\n' "${missing[@]}"
  exit 1
else
  echo
  echo "All scanned Python files have 'Vendored from' headers."
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 2025


🏁 Script executed:

#!/bin/bash
set -euo pipefail

dir="examples/diffusers/fastgen"
tmp="$(mktemp)"
find "$dir" -type f -name "*.py" | sort > "$tmp"
mapfile -t files < "$tmp"
rm -f "$tmp"

echo "Checking where 'Vendored from' appears (first 200 lines + anywhere in file):"

for f in "${files[@]}"; do
  first200_has=0
  any_has=0
  if head -200 "$f" | rg -q "Vendored from"; then
    first200_has=1
  fi
  if rg -q "Vendored from" "$f"; then
    any_has=1
  fi

  if [ "$any_has" -eq 1 ]; then
    if [ "$first200_has" -eq 1 ]; then
      echo "$f (Vendored from in first 200 lines)"
    else
      echo "$f (Vendored from exists but after first 200 lines)"
    fi
  else
    echo "$f (no 'Vendored from' anywhere)"
  fi
done

Repository: NVIDIA/Model-Optimizer

Length of output: 1689


Add missing # Vendored from ... headers to vendored fastgen Python files

LICENSE points to per-file “Vendored from” headers, but these vendored files under examples/diffusers/fastgen/ contain no “Vendored from” string anywhere:

  • examples/diffusers/fastgen/dmd2_finetune.py
  • examples/diffusers/fastgen/dmd2_recipe.py
  • examples/diffusers/fastgen/export_diffusers_qwen_image.py
  • examples/diffusers/fastgen/fastgen_checkpoint.py
  • examples/diffusers/fastgen/fastgen_data/__init__.py
  • examples/diffusers/fastgen/fastgen_data/collate_fns.py
  • examples/diffusers/fastgen/inference_dmd2_qwen_image.py
  • examples/diffusers/fastgen/make_negative_prompt_embedding.py
  • examples/diffusers/fastgen/preprocess_qwen_image.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@LICENSE` around lines 226 - 228, Add a top-of-file vendored header to each
Python file listed (dmd2_finetune.py, dmd2_recipe.py,
export_diffusers_qwen_image.py, fastgen_checkpoint.py, fastgen_data/__init__.py,
fastgen_data/collate_fns.py, inference_dmd2_qwen_image.py,
make_negative_prompt_embedding.py, preprocess_qwen_image.py) that matches the
LICENSE expectation: a single-line or block comment beginning with "Vendored
from" and including the original NVIDIA-NeMo/Automodel source path and commit
SHA (e.g., "Vendored from NVIDIA-NeMo/Automodel: <original/path> @ <commit>");
place it at the very top of each file before imports and include any short
attribution or license snippet required by the original file so each file
contains the "Vendored from" string referenced in LICENSE.

Comment thread tests/examples/diffusers/fastgen/test_vendored_migration.py
jingyu-ml and others added 2 commits June 11, 2026 16:09
…e diffusers team

Drop the /examples/diffusers/fastgen carve-out; @NVIDIA/modelopt-torch-fastgen-codeowners owns modelopt/torch/fastgen (the DMD2 library). The example stays under /examples/diffusers (@NVIDIA/modelopt-examples-diffusers-codeowners).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Apply the safe, sensible CodeRabbit findings: (1) collate_fns.py loads the negative-prompt embedding with weights_only=True (it is a dict of plain tensors saved by make_negative_prompt_embedding.py, so it loads cleanly) — removes the pickle/RCE vector per SECURITY.md; (2) move 'from collections import defaultdict' (caption_loaders) and 'import inspect' (test) to module top; (3) drop the no-op autocast(dtype=torch.float32) in QwenImageProcessor.verify_latent — the latent is already .float(), so this is behavior-identical and removes the invalid-CUDA-autocast-dtype warning. Preprocessing/data only; no change to DMD2 training math.

Intentionally not changed (would alter validated behavior or are false positives): not persisting prompt_embeds_mask in the cache (would switch the text-attention mask from the all-ones fallback the existing caches + running jobs use to a real mask — changes training math); the speculative scaling_factor guard (Qwen-Image VAE always defines it; vendored verbatim); 'Vendored from' headers on authored files (those correctly carry the standard NVIDIA SPDX header — LICENSE states only 'portions' are vendored).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml

Copy link
Copy Markdown
Contributor Author

/ok to test 85ae40c

Comment thread LICENSE

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also nvidia authored and apache 2 license files so wr dont need to exclude in pre commit or have duplicate nvidia license. Let license hook auto add correct license everywhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants