- Register gemma4 in AUTOMODEL_FACTORY (utils.py)
- Add KV sharing support: use_cache=True for models with num_kv_shared_layers > 0 (train.py)
- Freeze visual/audio encoders for text-only training to fix checkpoint resume (setup.py)
- Inject mm_token_type_ids for Gemma4 text-only inputs (train.py, dtensor_policy_worker.py)
- Extend skip_tokenizer_init workaround to Gemma4ForConditionalGeneration (vllm_worker.py)
- Bump transformers 5.3.0 -> 5.5.0, vllm 0.17.1 -> 0.19.0 (pyproject.toml)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
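The KV-sharing condition above amounts to a small predicate on the model config. A minimal sketch, assuming num_kv_shared_layers lives directly on the config object; the helper name is hypothetical, not the actual train.py code:

```python
def should_enable_kv_cache(model_config) -> bool:
    """Models with cross-layer KV sharing (num_kv_shared_layers > 0)
    must run with use_cache=True, so that consumer layers can read the
    producer layer's cached keys/values."""
    return getattr(model_config, "num_kv_shared_layers", 0) > 0
```

Configs without the attribute (non-Gemma4 models) fall through to the default of 0 and keep their existing use_cache setting.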
- grpo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml: E2B-it on 1 node
- dapo-gemma4-31b-it-4n8g-fsdp2.yaml: 31B-it DAPO with dynamic sampling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>

- Apply ruff-format line wrapping to setup.py, train.py, dtensor_policy_worker.py
- Minimize recipe YAMLs (remove redundant defaults matching base config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Regenerated lockfile to match the pyproject.toml dependency bumps. Required for the CI build container (uv sync --locked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Regenerated with the pinned Automodel submodule (92635e74) in the CI base image (cuda-dl-base:25.05) to match CI's uv sync --locked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Update Automodel submodule from 92635e74 to 3a3f6858 (latest main). Fixes the CI test_automodel_types.py TypeError caused by the check_model_inputs API change in transformers 5.5.0. Regenerate uv.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
vLLM 0.19.0 refactored chat preprocessing out of OpenAIServingChat into a new OpenAIServingRender service class. This broke the NeMo-RL HTTP server in two ways: (1) OpenAIServingChat/Tokenization now require an openai_serving_render constructor arg, and (2) the _preprocess_chat method override became silently dead code, since the logic moved to OpenAIServingRender. Move the prefix-token replacement logic into a NeMoRLOpenAIServingRender subclass that overrides preprocess_chat, and pass it to both serving classes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
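The fix described above follows a subclass-and-inject pattern. A sketch, with a stub standing in for vLLM's OpenAIServingRender (the real class and constructor signatures differ); the prefix-token map and method bodies are illustrative assumptions, not NeMo-RL's actual replacement logic:

```python
class OpenAIServingRender:                  # stub for vLLM's render service
    def preprocess_chat(self, prompt_token_ids):
        # the real class applies chat templates and tokenization here
        return list(prompt_token_ids)

class NeMoRLOpenAIServingRender(OpenAIServingRender):
    """Carries the prefix-token replacement that used to live in the
    (now dead) OpenAIServingChat._preprocess_chat override."""

    PREFIX_TOKEN_MAP = {100: 200}           # hypothetical token remapping

    def preprocess_chat(self, prompt_token_ids):
        ids = super().preprocess_chat(prompt_token_ids)
        # rewrite prefix tokens after rendering, since the old chat-side
        # hook no longer runs in vLLM 0.19.0
        return [self.PREFIX_TOKEN_MAP.get(t, t) for t in ids]
```

The same render instance is then passed to both serving classes, e.g. as the openai_serving_render constructor arg the new API requires.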
Add missing test-suite scripts for the two new Gemma4 recipe configs to fix the test_all_recipe_yamls_accounted_for_in_test_suites CI check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Temporarily remove --exitfirst (-x) from pytest addopts so CI runs all tests instead of stopping at the first failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
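The pytest change is a one-flag edit to the addopts line in pyproject.toml. A sketch; the surrounding option (`--durations`) is illustrative, only the `-x` removal comes from the commit:

```toml
[tool.pytest.ini_options]
# was: addopts = "-x --durations=10"    # -x / --exitfirst stops on first failure
addopts = "--durations=10"              # temporarily run the full suite in CI
```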
…sion)

Pin Automodel submodule to the sharonyu-115/Automodel gemma4-support branch at 6d5971c3, which reverts the 2013a4dd FSDP2 prefetch commit that causes Gemma4 26B-A4B MoE expert weights to be float32 (RuntimeError in grouped_gemm). See automodel_gemma4_moe_dtype_bug.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Picks up 7804b703, which passes the unnormalized residual to the MoE gate, fixing incorrect routing that caused gen_kl_error=0.116 on 26B-A4B.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Align naming with the other Gemma4 recipes (E2B, E4B, 26B-A4B) that use the -automodel suffix. Rename the DAPO config and test script, and update the disabled.txt reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
Use on-policy training (batch_size=512), activation checkpointing off, and gpu_memory_utilization=0.5. This config showed clear convergence: val accuracy 13.8% → 21.0% over 224 steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
The Automodel submodule now tracks the fix/gemma4-moe-gate-double-norm branch on the shuangy fork, which is rebased on upstream main (bd942f20) and carries only the single MoE-gate double-norm fix plus its regression tests. This drops the three transformers 5.5 compat patches that have since landed upstream (NVIDIA-NeMo#1734, NVIDIA-NeMo#1769, NVIDIA-NeMo#1764) and collapses our carry stack from four patches down to one.

gemma4-support is preserved on the fork as an A/B fallback: flip the .gitmodules branch and re-checkout the submodule to swap.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
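The A/B swap mentioned above touches the submodule's branch entry in .gitmodules. A sketch, assuming the submodule lives at 3rdparty/Automodel; the path and URL are illustrative, only the branch names come from the commit text:

```ini
[submodule "3rdparty/Automodel"]
    path = 3rdparty/Automodel
    url = https://github.com/sharonyu-115/Automodel.git
    # swap between gemma4-support and fix/gemma4-moe-gate-double-norm here
    branch = gemma4-support
```

followed by `git submodule sync` and `git submodule update --init --remote` to re-checkout the submodule at the other branch's tip.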
The Gemma4 MoE gate double-norm fix (PR NVIDIA-NeMo#1895) and the transformers 5.5 compat patches have all landed on NVIDIA-NeMo/Automodel main. Drop the sharonyu-115 fork / fix branch and track upstream main directly. New submodule SHA: fb62eb48 (upstream/main tip).

Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Refresh uv.lock after bumping the Automodel submodule to NVIDIA-NeMo main (fb62eb48). Generated with the 2026-04-17 nemo-rl nightly container. No pyproject.toml changes: the transformers==5.5.0 and vllm==0.19.0 pins are preserved.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>

Commit 684dc56 renamed the DAPO Gemma4 31B recipe to add the -automodel suffix but left the original files behind. Remove them to complete the rename.

Signed-off-by: Shuang Yu <shuangy@nvidia.com>
DAPO has converged on DAPOMath17K + AIME2024 eval for Gemma4 E2B-it on
1N8G, delivering better alignment and stability than GRPO on the same
setup. Retire the GRPO E2B recipe and enable DAPO E2B in the nightly
suite.
- Add tests/test_suites/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.sh
(1 node, 20 steps, ~90 min). Metric thresholds derived from wandb run
kouzgkf3 (dapo-gemma4-e2b-it-1n8g-fsdp2-automodel-2k-onpolicy-noactckpt)
under ys_fishcool-nvidia/nemorl-gemma4:
median(train/token_mult_prob_error) < 1.1 (baseline 1.0093)
train/token_mult_prob_error[20] < 1.05 (baseline 1.0094)
train/reward[20] > -1.15 (baseline -1.087)
train/filtered_reward[20] > -1.10 (baseline -1.028)
train/gen_kl_error[20] < 0.001 (baseline 5.4e-4)
- Register the new script in tests/test_suites/nightly.txt under a new
DAPO section.
- Remove tests/test_suites/disabled.txt entry for the deleted GRPO E2B
script; keep the 31B DAPO entry (no convergence baseline yet).
- Delete the GRPO E2B recipe and test script.
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
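The five thresholds above amount to checks like the following. A hypothetical helper, not the actual test-suite harness; each metric series is taken as a list of per-step values, with the final entry standing in for step 20:

```python
import statistics

# Thresholds from the recipe's convergence baseline (wandb run kouzgkf3).
def check_dapo_e2b_thresholds(metrics: dict[str, list[float]]) -> list[str]:
    """Return the names of failed checks; an empty list means the run passes."""
    failures = []
    if not statistics.median(metrics["train/token_mult_prob_error"]) < 1.1:
        failures.append("median(token_mult_prob_error) < 1.1")
    if not metrics["train/token_mult_prob_error"][-1] < 1.05:
        failures.append("token_mult_prob_error[20] < 1.05")
    if not metrics["train/reward"][-1] > -1.15:
        failures.append("reward[20] > -1.15")
    if not metrics["train/filtered_reward"][-1] > -1.10:
        failures.append("filtered_reward[20] > -1.10")
    if not metrics["train/gen_kl_error"][-1] < 0.001:
        failures.append("gen_kl_error[20] < 0.001")
    return failures
```

Feeding in the baseline values quoted above (1.0093, 1.0094, -1.087, -1.028, 5.4e-4) yields no failures, so the thresholds leave headroom over the reference run.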
- Minimize DAPO Gemma4 E2B-it recipe to satisfy the configs-minimize-check hook.
- Guard `_needs_kv_cache_for_shared_layers` against non-int `num_kv_shared_layers` (fixes `TypeError` when tests pass a bare MagicMock as the model). Noted in a TODO that this workaround can be removed once transformers>=5.5.2 (huggingface/transformers#45312) lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
What does this PR do?
To address #2212