
Fixes non-catching weight init regexes as torch.compile changes the FQNs#437

Merged
le1nux merged 1 commit into main from fix_compile_weight_init_bug on Mar 7, 2026

Conversation


@le1nux (Member) commented Mar 7, 2026

What does this PR do?

torch.compile changes the FQNs (fully qualified names) of parameters by prepending "_orig_mod." to the original FQN. This causes the regexes used to match parameter names to fail. The fix is to strip the "_orig_mod." prefix from parameter names before matching them against the regexes. This change applies to both llama3_like_initialization.py and initialization_routines.py, wherever parameter names are matched against regexes.
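The fix can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the helper names (`strip_compile_prefix`, `matches_init_regex`) and the example regex are hypothetical, but the core idea, removing the "_orig_mod." prefix that torch.compile adds before matching, is the one described above.

```python
import re

# torch.compile wraps a module in an OptimizedModule whose parameters are
# reachable via an "_orig_mod" attribute, so every parameter FQN gains an
# "_orig_mod." prefix (possibly more than once with nested compilation).
COMPILE_PREFIX = "_orig_mod."


def strip_compile_prefix(fqn: str) -> str:
    """Remove all "_orig_mod." segments that torch.compile adds to a parameter FQN."""
    return fqn.replace(COMPILE_PREFIX, "")


def matches_init_regex(fqn: str, pattern: str) -> bool:
    """Match a parameter FQN against a weight-init regex, ignoring the compile prefix."""
    return re.fullmatch(pattern, strip_compile_prefix(fqn)) is not None
```

For example, a regex written for the uncompiled model, such as `r"transformer\.h\.\d+\.attn\.weight"`, would fail against `"_orig_mod.transformer.h.0.attn.weight"` but matches again once the prefix is stripped.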

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

…d." as a prefix to the original FQN. This causes the regexes for matching parameter names to fail. To fix this, we need to remove the "_orig_mod." prefix from the parameter names before matching them against the regexes. This change needs to be made in both the llama3_like_initialization.py and initialization_routines.py files, wherever we are matching parameter names against regexes.
@le1nux merged commit 8f84b2d into main on Mar 7, 2026
3 checks passed
@le1nux deleted the fix_compile_weight_init_bug branch on Mar 7, 2026 09:25


2 participants