
feat: add Gemma4 support #2224

Draft

sharonyu-115 wants to merge 19 commits into NVIDIA-NeMo:main from sharonyu-115:gemma4-support

Conversation


@sharonyu-115 sharonyu-115 commented Apr 7, 2026

What does this PR do?

Adds Gemma4 model support to NeMo-RL (addresses #2212).

Issues

Addresses #2212.

Usage

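A usage sketch for running one of the Gemma4 recipes added in this PR. The config filename comes from the recipe list below; the entrypoint script and flag are assumptions based on NeMo-RL's usual uv-based launch pattern and may differ.

```shell
# Hypothetical launch command -- entrypoint and flag names are assumptions;
# the recipe YAML name matches the one added in this PR.
uv run examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/grpo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml
```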

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.


@copy-pr-bot

copy-pr-bot bot commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sharonyu-115
Contributor Author

/ok to test b3b4d3c

@sharonyu-115 sharonyu-115 added the CI:L1 Run doctests, unit tests, and functional tests label Apr 8, 2026
@zpqiu zpqiu changed the title Gemma4 support feat: add Gemma4 support Apr 8, 2026
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@sharonyu-115 sharonyu-115 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as ready for review April 8, 2026 05:36
@zpqiu zpqiu requested review from a team as code owners April 8, 2026 05:36
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as draft April 8, 2026 05:37
Contributor

zpqiu commented Apr 8, 2026

/ok to test 360cb8a

@sharonyu-115
Contributor Author

/ok to test 7353904

@sharonyu-115
Contributor Author

/ok to test e90e80c

@sharonyu-115
Contributor Author

/ok to test 04fc41c

@sharonyu-115
Contributor Author

/ok to test 9d9fd36

sharonyu-115 and others added 18 commits April 18, 2026 06:43
- Register gemma4 in AUTOMODEL_FACTORY (utils.py)
- Add KV sharing support: use_cache=True for models with num_kv_shared_layers > 0 (train.py)
- Freeze visual/audio encoders for text-only training to fix checkpoint resume (setup.py)
- Inject mm_token_type_ids for Gemma4 text-only inputs (train.py, dtensor_policy_worker.py)
- Extend skip_tokenizer_init workaround to Gemma4ForConditionalGeneration (vllm_worker.py)
- Bump transformers 5.3.0 -> 5.5.0, vllm 0.17.1 -> 0.19.0 (pyproject.toml)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
- grpo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml: E2B-it on 1 node
- dapo-gemma4-31b-it-4n8g-fsdp2.yaml: 31B-it DAPO with dynamic sampling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
- Apply ruff-format line wrapping to setup.py, train.py, dtensor_policy_worker.py
- Minimize recipe YAMLs (remove redundant defaults matching base config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Regenerated lockfile to match pyproject.toml dependency bumps.
Required for CI build container (uv sync --locked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Regenerated with pinned Automodel submodule (92635e74) in
CI base image (cuda-dl-base:25.05) to match CI's uv sync --locked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Update Automodel submodule from 92635e74 to 3a3f6858 (latest main).
Fixes CI test_automodel_types.py TypeError caused by check_model_inputs
API change in transformers 5.5.0. Regenerate uv.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
vLLM 0.19.0 refactored chat preprocessing from OpenAIServingChat into a
new OpenAIServingRender service class. This broke the NeMo-RL HTTP server
in two ways: (1) OpenAIServingChat/Tokenization now require an
openai_serving_render constructor arg, and (2) the _preprocess_chat method
override was silently dead since it moved to OpenAIServingRender.

Move the prefix-token replacement logic into a NeMoRLOpenAIServingRender
subclass that overrides preprocess_chat, and pass it to both serving classes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
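The refactor described above can be illustrated structurally. These are stand-in stubs, not vLLM's real classes: the actual constructors take many more arguments, and only the wiring (override moves to the render subclass, which is then passed to the serving class) is shown.

```python
class OpenAIServingRender:
    """Stub for vLLM 0.19.0's render service, which now owns chat preprocessing."""
    def preprocess_chat(self, messages):
        return messages


class NeMoRLOpenAIServingRender(OpenAIServingRender):
    """Carries the prefix-token replacement that used to live in a
    _preprocess_chat override on OpenAIServingChat (now silently dead there)."""
    def preprocess_chat(self, messages):
        messages = super().preprocess_chat(messages)
        # prefix-token replacement logic would run here
        return messages


class OpenAIServingChat:
    """Stub: in 0.19.0 the serving class takes the render service as an arg."""
    def __init__(self, openai_serving_render):
        self.render = openai_serving_render
```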
Add missing test suite scripts for the two new Gemma 4 recipe configs
to fix the test_all_recipe_yamls_accounted_for_in_test_suites CI check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Temporarily remove --exitfirst (-x) from pytest addopts so CI runs
all tests instead of stopping at the first failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
…sion)

Pin Automodel submodule to sharonyu-115/Automodel gemma4-support branch
at 6d5971c3, which reverts the 2013a4dd FSDP2 prefetch commit that causes
Gemma4 26B-A4B MoE expert weights to be float32 (RuntimeError in
grouped_gemm). See automodel_gemma4_moe_dtype_bug.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Picks up 7804b703 which passes unnormalized residual to the MoE gate,
fixing incorrect routing that caused gen_kl_error=0.116 on 26B-A4B.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Align naming with other Gemma4 recipes (E2B, E4B, 26B-A4B) that use
the -automodel suffix. Renames DAPO config, test script, and updates
disabled.txt reference.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Use on-policy training (batch_size=512), activation checkpointing off,
and gpu_memory_utilization=0.5. This config showed clear convergence:
val accuracy 13.8% → 21.0% over 224 steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
The Automodel submodule now tracks the fix/gemma4-moe-gate-double-norm
branch on the shuangy fork, which is rebased on upstream main
(bd942f20) and carries only the single MoE-gate double-norm fix plus
its regression tests. This drops the three transformers 5.5 compat
patches that have since landed upstream (NVIDIA-NeMo#1734, NVIDIA-NeMo#1769, NVIDIA-NeMo#1764) and
collapses our carry-stack from four patches down to one.

gemma4-support is preserved on the fork as an A/B fallback — flip
.gitmodules branch + re-checkout the submodule to swap.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
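The A/B fallback swap mentioned above could look roughly like this. The submodule path (`3rdparty/Automodel`) is an assumption; the git subcommands themselves are standard.

```shell
# Hypothetical fallback swap -- submodule path is an assumption.
git submodule set-branch --branch gemma4-support -- 3rdparty/Automodel
git submodule update --init --remote -- 3rdparty/Automodel
```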
The Gemma4 MoE gate double-norm fix (PR NVIDIA-NeMo#1895) and the transformers
5.5 compat patches have all landed on NVIDIA-NeMo/Automodel main. Drop
the sharonyu-115 fork / fix branch and track upstream main directly.

New submodule SHA: fb62eb48 (upstream/main tip).

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Refresh uv.lock after bumping Automodel submodule to NVIDIA-NeMo main
(fb62eb48). Generated with the 2026-04-17 nemo-rl nightly container.
No pyproject.toml changes — transformers==5.5.0 and vllm==0.19.0 pins
are preserved.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Commit 684dc56 renamed the DAPO Gemma4 31B recipe to add the
-automodel suffix but left the original files behind. Remove them to
complete the rename.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
DAPO has converged on DAPOMath17K + AIME2024 eval for Gemma4 E2B-it on
1N8G, delivering better alignment and stability than GRPO on the same
setup. Retire the GRPO E2B recipe and enable DAPO E2B in the nightly
suite.

- Add tests/test_suites/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.sh
  (1 node, 20 steps, ~90 min). Metric thresholds derived from wandb run
  kouzgkf3 (dapo-gemma4-e2b-it-1n8g-fsdp2-automodel-2k-onpolicy-noactckpt)
  under ys_fishcool-nvidia/nemorl-gemma4:
    median(train/token_mult_prob_error) < 1.1   (baseline 1.0093)
    train/token_mult_prob_error[20]     < 1.05  (baseline 1.0094)
    train/reward[20]                    > -1.15 (baseline -1.087)
    train/filtered_reward[20]           > -1.10 (baseline -1.028)
    train/gen_kl_error[20]              < 0.001 (baseline 5.4e-4)
- Register the new script in tests/test_suites/nightly.txt under a new
  DAPO section.
- Remove tests/test_suites/disabled.txt entry for the deleted GRPO E2B
  script; keep the 31B DAPO entry (no convergence baseline yet).
- Delete the GRPO E2B recipe and test script.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
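The convergence gate listed above can be sketched as a single predicate. The threshold values are taken straight from the commit message; the real test script's implementation (shell plus wandb metric export) is assumed to differ, and the function name is invented.

```python
from statistics import median

def passes_e2b_dapo_gates(m: dict) -> bool:
    """Check the five metric thresholds from the nightly DAPO E2B suite
    (indices [-1] stand in for the commit message's step-20 values)."""
    return (
        median(m["train/token_mult_prob_error"]) < 1.1
        and m["train/token_mult_prob_error"][-1] < 1.05
        and m["train/reward"][-1] > -1.15
        and m["train/filtered_reward"][-1] > -1.10
        and m["train/gen_kl_error"][-1] < 0.001
    )
```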
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: e0e686d (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115
Contributor Author

/ok to test e0e686d

- Minimize DAPO Gemma4 E2B-it recipe to satisfy configs-minimize-check hook.
- Guard `_needs_kv_cache_for_shared_layers` against non-int
  `num_kv_shared_layers` (fixes `TypeError` when tests pass a bare
  MagicMock as the model). Noted in a TODO that this workaround can be
  removed once transformers>=5.5.2 (huggingface/transformers#45312) lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 16f3747 (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115
Contributor Author

/ok to test 16f3747
