
Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Jan 13, 2026

What does this PR do?

  • As per Slack discussion, rename modelopt.torch._compress to modelopt.torch.puzzletron
  • Auto-fix some formatting (previously skipped because the folder was private: _compress)
  • Fix some doc building errors

Summary by CodeRabbit

  • New Features

    • Introduced Puzzletron as the primary optimization algorithm with updated module structure and comprehensive documentation.
  • Bug Fixes & Improvements

    • Modernized type annotations and improved code quality.
  • Documentation

    • Updated examples and tutorials to reflect Puzzletron functionality.
    • Added new Puzzletron configuration examples and step-by-step guides.
  • Chores

    • Updated dependencies in setup.py to support Puzzletron.
    • Reorganized test utilities and resources.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Contributor

coderabbitai bot commented Jan 13, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Large-scale refactoring migrating the modelopt.torch module structure from _compress namespace to puzzletron namespace. Updates include module path renamings, public API function and class renaming (CompressModel → PuzzletronModel, compress → puzzletron), type annotation modernization (Optional[T] → T | None), and configuration updates across examples, tools, and tests.
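
To make the type-annotation modernization concrete, here is a small before/after sketch (hypothetical function names, illustrative only, not code from this PR):

```python
from functools import cache  # replaces functools.lru_cache(maxsize=None)

# Before (old style):
#   from typing import List, Optional
#   def load_scores(path: str, limit: Optional[int] = None) -> List[float]: ...
#
# After (PEP 585 / PEP 604 style adopted in this PR):
def load_scores(path: str, limit: int | None = None) -> list[float]:
    """Hypothetical helper, shown only to illustrate the annotation changes."""
    return []

@cache  # equivalent to @functools.lru_cache(maxsize=None)
def default_sample_limit() -> int:
    return 100
```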

Changes

Cohort / File(s) Summary
Configuration & Build System
.github/CODEOWNERS, .pre-commit-config.yaml, pyproject.toml, setup.py
Added CODEOWNERS entry for puzzletron; updated pre-commit exclusion paths from _compress to puzzletron; added puzzletron optional dependency group with 8 packages (lru-dict, mip, omegaconf, pandas, typeguard, fire, hydra-core, immutabledict).
Documentation & Examples
examples/pruning/README.md, examples/puzzletron/README.md, examples/puzzletron/configs/*, examples/puzzletron/main.py
Updated tutorial from "Compress Algorithm" to "Puzzletron Algorithm"; updated example code paths from _compress to puzzletron; renamed function run_full_compress to run_full_puzzletron; updated class usage from CompressModel to PuzzletronModel; changed mode strings from "compress" to "puzzletron".
NAS Plugin & Core API
modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py, modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py
Renamed public classes: CompressModel → PuzzletronModel, CompressConfig → PuzzletronConfig, CompressDescriptor → PuzzletronDescriptor, CompressSearcher → PuzzletronSearcher; renamed public functions: convert_compress_model → convert_puzzletron_model, restore_compress_model → restore_puzzletron_model; updated imports from _compress.tools to puzzletron.tools.
Puzzletron Main Module
modelopt/torch/puzzletron/puzzletron.py
Renamed public function compress(...) to puzzletron(...); updated all internal imports from _compress.* to puzzletron.*; updated hydra_utils import path.
MIP (Mixed Integer Programming) Modules
modelopt/torch/puzzletron/mip/mip_and_realize_models.py, modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py, modelopt/torch/puzzletron/mip/run_puzzle.py
Updated import paths from _compress to puzzletron; modernized type annotations: List[...] → list[...], Optional[T] → T | None.
Activation Scoring & Pruning
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py, modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py, modelopt/torch/puzzletron/pruning/pruning_ckpts.py
Updated import paths from _compress to puzzletron; modernized type annotations (Optional → union syntax); updated progress messages from "Compress Progress" to "Puzzletron Progress".
Library & Stats Building
modelopt/torch/puzzletron/build_library_and_stats.py, modelopt/torch/puzzletron/replacement_library/build_replacement_library.py, modelopt/torch/puzzletron/replacement_library/replacement_library.py, modelopt/torch/puzzletron/replacement_library/replacement_utils.py
Updated import paths; modernized type annotations; consolidated docstrings; changed return type annotations from Type[...] to type[...]; updated module docstrings.
Subblock Stats & Dataset Utilities
modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py, modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py, modelopt/torch/puzzletron/dataset/prepare_dataset.py
Updated import paths from _compress to puzzletron; modernized type annotations; updated script entry point paths in docstrings; refactored dict iteration patterns.
DeciLM Converters
modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
Updated module import paths for decilm utilities; adjusted docstring formatting.
Scoring & Validation Tools
modelopt/torch/puzzletron/scoring/scoring.py, modelopt/torch/puzzletron/tools/validate_model.py, modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py, modelopt/torch/puzzletron/tools/validation_utils.py
Updated import paths from _compress.tools.* to puzzletron.tools.*; modernized type annotations in function signatures (Optional → union syntax).
Checkpoint & Training Tools
modelopt/torch/puzzletron/tools/checkpoint_utils.py, modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py, modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py, modelopt/torch/puzzletron/tools/post_init_sparse.py, modelopt/torch/puzzletron/tools/bypassed_training/child_init.py, modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
Updated import paths from _compress to puzzletron; modernized type annotations (Optional[T] → T | None); removed .keys() calls in dict membership checks; consolidated docstrings.
Utility & Data Modules
modelopt/torch/puzzletron/utils/checkpoint_manager.py, modelopt/torch/puzzletron/utils/data/dataloaders.py, modelopt/torch/puzzletron/utils/data/dataset.py, modelopt/torch/puzzletron/utils/utils.py, modelopt/torch/puzzletron/utils/validation.py, modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
Updated import paths; modernized type annotations; changed decorator from @functools.lru_cache(maxsize=None) to @functools.cache; updated Sequence import from collections.abc; consolidated docstrings.
Test Files & Utilities
tests/_test_utils/torch/puzzletron/resources/configs/validate_model_defaults.yaml, tests/_test_utils/torch/puzzletron/utils.py, tests/gpu/torch/puzzletron/decilm/converters/test_convert_llama3_config_to_decilm_config.py, tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py, tests/gpu/torch/puzzletron/nas/plugins/test_nas_search.py, tests/gpu/torch/puzzletron/test_puzzletron.py
Updated test imports to use PuzzletronModel instead of CompressModel; updated test utility imports from gpu.compress to puzzletron paths; renamed test function from test_compress to test_puzzletron; updated hydra config directory paths; updated mode strings from "compress" to "puzzletron".

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 54.78%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The PR title clearly and concisely summarizes the main change: renaming the compress module to puzzletron across the entire codebase.
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@kevalmorabia97 kevalmorabia97 removed the request for review from realAsma January 13, 2026 19:56
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
modelopt/torch/puzzletron/utils/checkpoint_manager.py (1)

136-139: Method name mismatch: load_state should be load_state_dict.

Line 138 calls hook.load_state(...), but the ForwardHook abstract base class in modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py declares load_state_dict(...) as the abstract method. This will cause an AttributeError at runtime since no load_state method exists on hook implementations.

🐛 Proposed fix
                 if module_name in hook_states:
-                        hook.load_state(hook_states[module_name])
+                        hook.load_state_dict(hook_states[module_name])
                         loaded_count += 1
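
For context, a minimal runnable sketch (simplified, hypothetical classes rather than the real ForwardHook hierarchy) of why calling the misnamed method raises AttributeError:

```python
from abc import ABC, abstractmethod

class ForwardHook(ABC):
    """Simplified stand-in for the real abstract base class."""

    @abstractmethod
    def load_state_dict(self, state: dict) -> None: ...

class ActivationHook(ForwardHook):
    def load_state_dict(self, state: dict) -> None:
        self.state = state  # restore whatever the hook previously saved

hook = ActivationHook()
hook.load_state_dict({"scores": [0.1, 0.2]})  # OK: matches the declared interface
# hook.load_state({"scores": [0.1, 0.2]})     # AttributeError: no 'load_state' method exists
```
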
modelopt/torch/puzzletron/replacement_library/build_replacement_library.py (1)

127-135: Bug: String literal {PUZZLE_DIR} is not interpolated.

The error message uses '{PUZZLE_DIR}/ckpts' inside a regular string, not an f-string, so the placeholder will be displayed literally instead of being substituted with the actual path.

🐛 Proposed fix
         if not teacher_checkpoint_dir.exists():
             raise ValueError(
-                "You must either provide the --teacher_checkpoint_dir argument, or create a link to the "
-                "teacher dir under '{PUZZLE_DIR}/ckpts'."
+                f"You must either provide the --teacher_checkpoint_dir argument, or create a link to the "
+                f"teacher dir under '{master_puzzle_dir}/ckpts'."
             )
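
For reference, a plain string keeps the braces literally; only an f-string interpolates them (generic illustration with a hypothetical variable):

```python
puzzle_dir = "/workspace/puzzle"

print("teacher dir under '{PUZZLE_DIR}/ckpts'")   # prints the braces literally
print(f"teacher dir under '{puzzle_dir}/ckpts'")  # prints: teacher dir under '/workspace/puzzle/ckpts'
```
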
modelopt/torch/puzzletron/puzzletron.py (1)

58-75: Renumber steps to be sequential - Step 3 is missing from the sequence.

The step comments skip from "Step 2" (line 61) directly to "Step 4" (line 66). Renumber the subsequent steps to "Step 3", "Step 4", and "Step 5" for clarity and consistency.

    # Step 2: pruning_ckpts (single process)
    if dist.is_master():
        pruning_ckpts.launch_prune_ckpt(hydra_cfg)
    dist.barrier()

    # Step 3: build_library_and_stats (single process)
    if dist.is_master():
        build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
    dist.barrier()

    # Step 4: calc_one_block_scores (distributed processing)
    scoring.launch_scoring(hydra_cfg)

    # Step 5: mip_and_realize_models (distributed processing)
    mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py (1)

49-89: Typo in comment: "prunining" → "pruning".

Line 84 has a typo in the comment.

✏️ Proposed fix
-    # Dataset path to use for scoring in prunining and NAS search
+    # Dataset path to use for scoring in pruning and NAS search
modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py (1)

165-169: Bug: Incorrect condition will always evaluate to False.

The condition isinstance(mlp_init_config_yaml, str) is None is incorrect. isinstance() returns a boolean, so comparing it to None with is will always be False. This means yaml.safe_load() will never be called, and mlp_init_config will always be set to mlp_init_config_yaml regardless of its type.

The intent appears to be: parse YAML if the input is a string, otherwise use as-is.

🐛 Proposed fix
     mlp_init_config = (
         yaml.safe_load(mlp_init_config_yaml)
-        if isinstance(mlp_init_config_yaml, str) is None
+        if mlp_init_config_yaml is not None and isinstance(mlp_init_config_yaml, str)
         else mlp_init_config_yaml
     )

Alternatively, if the intent is only to parse when it's a non-None string:

mlp_init_config = (
    yaml.safe_load(mlp_init_config_yaml)
    if isinstance(mlp_init_config_yaml, str)
    else mlp_init_config_yaml
)
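
A standalone illustration of why the original condition can never be true, plus the corrected intent (hypothetical config value):

```python
import yaml

config_yaml = "mlp_init_mode: from_teacher"

check = isinstance(config_yaml, str)  # True -- isinstance() returns a bool, never None
print(check is None)                  # False, so the yaml.safe_load() branch was never taken

# Corrected intent: parse only when the input is a string, otherwise use it as-is.
mlp_init_config = (
    yaml.safe_load(config_yaml) if isinstance(config_yaml, str) else config_yaml
)
print(mlp_init_config)  # {'mlp_init_mode': 'from_teacher'}
```
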
🤖 Fix all issues with AI agents
In @modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py:
- Line 457: Remove the erroneous duplicate TypeVar declaration "T_DataClass: TypeVar = type[dataclasses.dataclass]", which shadows the correct one. Delete that line so the original declaration T_DataClass = TypeVar("T_DataClass") (near the top) remains and _dataclass_from_dict and other code use the correct TypeVar.
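
For context, a correct TypeVar declaration and a dataclass-from-dict helper could look like this minimal sketch (hypothetical implementation and example dataclass, not the module's actual code):

```python
import dataclasses
from typing import TypeVar

T_DataClass = TypeVar("T_DataClass")  # the single, correct declaration

def _dataclass_from_dict(cls: type[T_DataClass], data: dict) -> T_DataClass:
    """Build a dataclass instance from a plain dict, dropping unknown keys (sketch only)."""
    field_names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in field_names})

@dataclasses.dataclass
class SubblockStats:  # hypothetical example dataclass
    num_params: int
    memory_mib: float

stats = _dataclass_from_dict(SubblockStats, {"num_params": 1024, "memory_mib": 3.5, "extra": 1})
print(stats)  # SubblockStats(num_params=1024, memory_mib=3.5)
```
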
🧹 Nitpick comments (10)
modelopt/torch/puzzletron/utils/utils.py (1)

95-97: Pre-existing bug: Missing f-string prefix.

The string literal at line 96 won't interpolate {file_path} - it's missing the f prefix. This is a pre-existing issue, but worth fixing while you're in this file.

Suggested fix
     if not os.path.exists(file_path):
-        print("file does not exist {file_path}")
+        print(f"file does not exist {file_path}")
         return None
examples/pruning/README.md (1)

10-10: Minor grammar fix: hyphenate compound adjective.

Per static analysis, "MIP based" should be "MIP-based" when used as a compound adjective before a noun.

Suggested fix
-1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA Research using Mixed Integer Programming (MIP) based NAS search algorithm.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA Research using Mixed Integer Programming (MIP)-based NAS search algorithm.
modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py (1)

16-18: Misplaced shebang line has no effect.

The shebang #!/usr/bin/env python3 on line 18 appears after the license header and docstring. For a shebang to be effective, it must be the very first line of the file. In its current position, it's just a comment with no functional impact. Consider either moving it to line 1 (before the license) or removing it entirely since this module is typically imported rather than executed directly.

tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py (2)

72-78: Variable shadowing: rank is redefined inside the conditional block.

The rank parameter on line 39 is shadowed by the redefinition on line 74 (rank = int(os.environ["RANK"])). This works because the code is inside an if rank == 0 block, but it's confusing and could cause maintenance issues.

♻️ Suggested improvement
     if rank == 0:
         # assertions for the score_pruning_activations step
-        rank = int(os.environ["RANK"])
+        env_rank = int(os.environ["RANK"])
         rank_filepath = (
-            f"pruning/pruning_scores/ffn_iterative/100samples_diverse_mini/rank_{rank}.pth"
+            f"pruning/pruning_scores/ffn_iterative/100samples_diverse_mini/rank_{env_rank}.pth"
         )

130-137: Same variable shadowing issue exists here.

Line 132 has the same rank redefinition pattern as the FFN pruning test.

♻️ Suggested improvement
     if rank == 0:
         # assertions for the score_pruning_activations step
-        rank = int(os.environ["RANK"])
+        env_rank = int(os.environ["RANK"])
         rank_filepath = (
             f"pruning/pruning_scores/attn_independent_kv_head_contribution/"
-            f"100samples_diverse_mini/rank_{rank}.pth"
+            f"100samples_diverse_mini/rank_{env_rank}.pth"
         )
tests/gpu/torch/puzzletron/test_puzzletron.py (1)

30-33: Comment still references "compress" terminology.

The comment on line 30 refers to "compress a model" which is inconsistent with the rename to Puzzletron. Consider updating for consistency.

Suggested update
-# The e2e test to compress a model based on Local Neural Architecture Search (Mixed Integer Programing NAS search)
+# The e2e test to run Puzzletron based on Local Neural Architecture Search (Mixed Integer Programming NAS search)
 # using a one-click command.
examples/puzzletron/main.py (2)

66-72: Docstring parameter name mismatch.

The docstring references config_path but the actual parameter is hydra_config_path.

Suggested fix
 def run_full_puzzletron(hydra_config_path: str):
     """Run the full puzzletron pipeline.

     Args:
-        config_path: Path to the YAML configuration file
+        hydra_config_path: Path to the YAML configuration file
     """

147-148: TODO comment references the renamed function.

The TODO comment now correctly references run_full_puzzletron() API. Consider tracking this TODO if it represents planned work.

Would you like me to open an issue to track integrating the MIP-only path into the mtn.search() API?

modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (2)

853-856: Consider specifying encoding when opening files.

The file is opened without an explicit encoding parameter. For JSON files, it's good practice to specify encoding="utf-8" to ensure consistent behavior across platforms.

Suggested fix
-        with open(expert_scores_file) as f:
+        with open(expert_scores_file, encoding="utf-8") as f:
             expert_scores = json.load(f)

1382-1385: Consider specifying encoding when opening files.

Similar to line 855, the file is opened without an explicit encoding parameter.

Suggested fix
-            with open(channel_importance_path) as f:
+            with open(channel_importance_path, encoding="utf-8") as f:
                 channel_ranking = json.load(f)["channel_importance_ranking"]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0eecfc6 and 861c9fe.

📒 Files selected for processing (105)
  • .github/CODEOWNERS
  • .pre-commit-config.yaml
  • examples/pruning/README.md
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/main.py
  • modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py
  • modelopt/torch/puzzletron/README.md
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
  • modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
  • modelopt/torch/puzzletron/build_library_and_stats.py
  • modelopt/torch/puzzletron/dataset/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/decilm/conversion_utils.py
  • modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/__init__.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__mamba_mixer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__megatron_tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/megatron_lm__tokenizer.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/tokenization_decilm.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__activations.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__cache_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__configuration_llama.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_attn_mask_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_flash_attention_utils_backward_compat.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_outputs.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__modeling_rope_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_44_2__pytorch_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__cache_utils.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__configuration_llama4.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/transformers_4_51_3__modeling_llama4_attention.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/variable_cache.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/vllm_yarn_utils.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py
  • modelopt/torch/puzzletron/pruning/pruning_ckpts.py
  • modelopt/torch/puzzletron/puzzletron.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_utils.py
  • modelopt/torch/puzzletron/scoring/scoring.py
  • modelopt/torch/puzzletron/sewing_kit/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/passage/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/__init__.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/common.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/tools/kd_model.py
  • modelopt/torch/puzzletron/tools/logger.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
  • modelopt/torch/puzzletron/tools/robust_json.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/validate_model.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
  • modelopt/torch/puzzletron/tools/validation_utils.py
  • modelopt/torch/puzzletron/utils/checkpoint_manager.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • modelopt/torch/puzzletron/utils/utils.py
  • modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
  • modelopt/torch/puzzletron/utils/validation.py
  • pyproject.toml
  • setup.py
  • tests/_test_utils/torch/puzzletron/resources/configs/Llama-3_1-8B-attn-pruning.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/Llama-3_1-8B-ffn-pruning.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/pruning/attn_pruning.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/pruning/ffn_pruning.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/pruning/hidden_dim_pruning.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/pruning/pruning_defaults.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/validate_model_defaults.yaml
  • tests/_test_utils/torch/puzzletron/resources/configs/validate_solutions_defaults.yaml
  • tests/_test_utils/torch/puzzletron/resources/tokenizer/special_tokens_map.json
  • tests/_test_utils/torch/puzzletron/resources/tokenizer/tokenizer.json
  • tests/_test_utils/torch/puzzletron/resources/tokenizer/tokenizer_config.json
  • tests/_test_utils/torch/puzzletron/resources/tokenizer/truncate_tokenizer.py
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/gpu/torch/puzzletron/conftest.py
  • tests/gpu/torch/puzzletron/decilm/converters/test_convert_llama3_config_to_decilm_config.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_search.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
🧰 Additional context used
🧬 Code graph analysis (27)
modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py (2)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/tools/validate_model.py (1)
  • validate_model (67-216)
tests/_test_utils/torch/puzzletron/utils.py (2)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/tools/hydra_utils.py (1)
  • register_hydra_resolvers (39-48)
modelopt/torch/puzzletron/tools/checkpoint_utils.py (2)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)
  • load_model_config (102-121)
modelopt/torch/puzzletron/tools/common.py (1)
  • infer_weights_dtype (19-22)
modelopt/torch/puzzletron/utils/data/dataloaders.py (2)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/utils/data/dataset.py (1)
  • ConstantLengthDataset (30-220)
modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (3)
modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)
  • parse_layer_replacement (29-43)
modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)
  • copy_tokenizer (160-192)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (2)
  • copy_deci_lm_hf_code (443-449)
  • save_safetensors_index (322-363)
tests/gpu/torch/puzzletron/test_puzzletron.py (3)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
tests/_test_utils/torch/puzzletron/utils.py (1)
  • setup_test_model_and_data (28-69)
modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py (1)
  • convert_llama3_to_decilm (144-149)
modelopt/torch/puzzletron/mip/mip_and_realize_models.py (2)
modelopt/torch/puzzletron/mip/run_puzzle.py (1)
  • run_puzzle (430-477)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py (3)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/tools/logger.py (1)
  • aprint (141-145)
modelopt/torch/puzzletron/tools/robust_json.py (1)
  • json_dump (62-66)
modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (4)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (2)
  • DeciLMDecoderLayer (1011-1548)
  • DeciLMForCausalLM (2435-2630)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)
  • load_model_config (102-121)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/utils/utils.py (1)
  • EmptyInitOnDevice (217-258)
modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py (5)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)
  • DeciLMForCausalLM (2435-2630)
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (5)
  • GQAInitMode (45-52)
  • HiddenSizeInitMode (70-74)
  • LinearInitMode (65-67)
  • MlpInitMode (55-62)
  • create_child_state_dict (326-512)
modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)
  • copy_tokenizer (160-192)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (3)
  • load_model_config (102-121)
  • _save_checkpoint (128-190)
  • copy_deci_lm_hf_code (443-449)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/mip/run_puzzle.py (3)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/tools/robust_json.py (1)
  • json_dump (62-66)
modelopt/torch/puzzletron/utils/parsing.py (2)
  • parse_json (63-66)
  • parse_path (69-72)
modelopt/torch/puzzletron/build_library_and_stats.py (4)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/replacement_library/build_replacement_library.py (2)
  • build_replacement_library (69-104)
  • launch_build_replacement_library (107-120)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)
  • launch_calc_subblock_stats (217-241)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/tools/validation_utils.py (3)
modelopt/torch/puzzletron/tools/validate_model.py (1)
  • validate_model (67-216)
modelopt/torch/puzzletron/tools/robust_json.py (1)
  • json_dump (62-66)
modelopt/torch/puzzletron/utils/validation.py (1)
  • LowMemorySparseTensor (68-85)
modelopt/torch/puzzletron/tools/post_init_sparse.py (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)
  • DeciLMForCausalLM (2435-2630)
modelopt/torch/puzzletron/pruning/pruning_ckpts.py (3)
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (1)
  • GQAInitMode (45-52)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)
  • load_model_config (102-121)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/scoring/scoring.py (3)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/tools/hydra_utils.py (1)
  • register_hydra_resolvers (39-48)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
tests/gpu/torch/puzzletron/decilm/converters/test_convert_llama3_config_to_decilm_config.py (3)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
tests/_test_utils/torch/puzzletron/utils.py (2)
  • create_and_save_small_llama_model (72-107)
  • create_tokenizer (110-116)
modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py (1)
  • convert_llama3_to_decilm (144-149)
modelopt/torch/puzzletron/dataset/prepare_dataset.py (1)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py (8)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (2)
  • DeciLMForCausalLM (2435-2630)
  • LMHead (2429-2432)
modelopt/torch/puzzletron/sewing_kit/core.py (6)
  • ExternalTarget (199-202)
  • ModuleTarget (226-240)
  • Needle (791-883)
  • RemoteTarget (244-260)
  • StitchedModule (342-764)
  • InputReducer (71-89)
modelopt/torch/puzzletron/sewing_kit/passage/core.py (1)
  • InputArgs (38-79)
modelopt/torch/puzzletron/sewing_kit/utils.py (2)
  • distributed_recv_obj (406-422)
  • distributed_send_obj (395-403)
modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)
  • init_module_with_state_dict (99-108)
modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (1)
  • DummyBlock (62-73)
modelopt/torch/puzzletron/utils/validation.py (1)
  • _organize_outputs (210-222)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (4)
modelopt/torch/puzzletron/tools/common.py (1)
  • infer_weights_dtype (19-22)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/tools/post_init_sparse.py (1)
  • SparsityMethod (28-95)
modelopt/torch/puzzletron/tools/robust_json.py (1)
  • json_dumps (58-59)
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (5)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py (3)
  • BlockConfig (297-308)
  • _get_dataclass_type (96-109)
  • _is_dataclass_type (85-93)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)
  • DeciLMConfig (39-202)
modelopt/torch/opt/config.py (1)
  • keys (132-134)
modelopt/torch/quantization/qtensor/base_qtensor.py (1)
  • dim (110-112)
modelopt/onnx/quantization/graph_utils.py (1)
  • layer_idx (808-810)
modelopt/torch/puzzletron/puzzletron.py (1)
modelopt/torch/puzzletron/tools/hydra_utils.py (1)
  • initialize_hydra_config_for_dir (51-71)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (5)
modelopt/torch/puzzletron/replacement_library/replacement_utils.py (1)
  • parse_layer_replacement (29-43)
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)
  • load_model_config (102-121)
modelopt/torch/puzzletron/tools/logger.py (1)
  • mprint (155-159)
modelopt/torch/puzzletron/tools/robust_json.py (1)
  • json_dump (62-66)
modelopt/torch/puzzletron/utils/parsing.py (1)
  • format_global_config (202-300)
modelopt/torch/puzzletron/replacement_library/replacement_utils.py (4)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py (1)
  • BlockConfig (297-308)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)
  • DeciLMConfig (39-202)
modelopt/torch/puzzletron/mip/utils.py (1)
  • sort_replacements (27-36)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py (4)
modelopt/torch/puzzletron/puzzletron.py (1)
  • puzzletron (32-77)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py (2)
  • FFNConfig (257-287)
  • MambaConfig (161-165)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/configuration_decilm.py (1)
  • DeciLMConfig (39-202)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)
  • DeciLMMoe (2270-2368)
modelopt/torch/puzzletron/utils/checkpoint_manager.py (1)
modelopt/torch/puzzletron/tools/logger.py (2)
  • aprint (141-145)
  • mprint (155-159)
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py (1)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py (1)
  • DeciLMForCausalLM (2435-2630)
🪛 LanguageTool
examples/pruning/README.md

[grammar] ~10-~10: Use a hyphen to join words.
Context: ...ch using Mixed Integer Programming (MIP) based NAS search algorithm. 1. FastNAS: ...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (113)
.github/CODEOWNERS (1)

27-27: LGTM!

The new CODEOWNERS entry for modelopt/torch/puzzletron follows the established naming convention and is correctly placed in alphabetical order among the other modelopt/torch paths.

Ensure the GitHub team @NVIDIA/modelopt-torch-puzzletron-codeowners has been created before merging this PR, otherwise code ownership rules won't apply.

.pre-commit-config.yaml (2)

27-38: LGTM!

The exclusion paths for ruff-check and ruff-format are consistently updated from _compress to puzzletron. The comment reference to test_puzzletron.py aligns with the broader rename.


110-110: LGTM!

The license header exclusion path is correctly updated to match the new puzzletron namespace.

modelopt/torch/puzzletron/tools/post_init_sparse.py (3)

20-20: LGTM!

Import path correctly updated to align with the package rename from _compress to puzzletron.


29-30: LGTM!

Docstring normalization looks good. Note that this abstract-like method relies on subclasses (e.g., SparsityMethod2o4) to provide actual implementations.


99-100: LGTM!

Docstring normalization consistent with the parent class method.

modelopt/torch/puzzletron/utils/utils.py (2)

217-258: LGTM!

The EmptyInitOnDevice class implementation is clean. The docstring reformatting maintains clarity.


24-28: Import path update is correct.

The import path has been updated from modelopt.torch._compress to modelopt.torch.puzzletron to reflect the package rename, and all three imported classes (AttentionConfig, BlockConfig, FFNConfig) are properly defined in the target module.

modelopt/torch/puzzletron/pruning/pruning_ckpts.py (4)

29-39: LGTM!

Import paths correctly updated to the new puzzletron namespace.


153-163: LGTM!

Plain strings are appropriate here since no variable interpolation is needed. Good practice to avoid unnecessary f-string overhead.


330-331: LGTM!

Raw string literal is the correct choice for regex patterns containing escape sequences.


43-43: No action required—Python version is already 3.10+.

The project explicitly targets Python 3.10+ via the target-version = "py310" configuration in pyproject.toml. The int | None union syntax (PEP 604) is natively supported at this version, so no compatibility issue exists and from __future__ import annotations is unnecessary.

Likely an incorrect or invalid review comment.

pyproject.toml (1)

83-102: LGTM!

The lint ignore path correctly updated from _compress to puzzletron namespace. The TODO comment appropriately reflects the ongoing migration.

examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml (1)

11-21: LGTM!

The comment updates correctly reflect the rename to puzzletron. Configuration values are unchanged.

modelopt/torch/puzzletron/utils/validation.py (2)

36-36: LGTM!

The import path correctly updated from _compress to puzzletron namespace.


16-20: Docstring formatting looks good.

The module docstring correctly describes the validation utilities.

modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py (1)

30-30: LGTM on import path update!

The import path has been correctly updated to the new puzzletron namespace.

modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py (3)

31-42: Import paths correctly updated to puzzletron namespace.

The imports have been properly migrated from modelopt.torch._compress.* to modelopt.torch.puzzletron.*, aligning with the PR objective.


119-122: Removal of explicit file mode is acceptable.

The explicit "r" mode was removed from open(stats_file). This is safe since "r" (read text mode) is the default for open().


181-184: Docstring reformatting approved.

Minor docstring formatting changes without semantic impact.

modelopt/torch/puzzletron/scoring/scoring.py (1)

29-33: Import paths correctly migrated to puzzletron namespace.

The imports have been properly updated from modelopt.torch._compress.tools.* to modelopt.torch.puzzletron.tools.*. The imported utilities (register_hydra_resolvers, mprint, validate_puzzle_solutions) are available at the new locations as confirmed by the relevant code snippets.

modelopt/torch/puzzletron/decilm/converters/convert_llama3_to_decilm.py (1)

25-28: Import paths correctly migrated to puzzletron namespace.

The imports have been properly updated from the old _compress namespace to modelopt.torch.puzzletron.*.

modelopt/torch/nas/plugins/megatron_hooks/base_hooks.py (1)

29-30: Import paths correctly migrated to puzzletron namespace.

The imports for aprint and json_dump have been properly updated from modelopt.torch._compress.tools.* to modelopt.torch.puzzletron.tools.*. The cross-package dependency (nas → puzzletron) is consistent with the broader refactoring in this PR.

modelopt/torch/puzzletron/utils/checkpoint_manager.py (3)

21-24: Type annotation modernization and import path updates look good.

The removal of Optional from typing imports and the migration to modelopt.torch.puzzletron.tools.logger are consistent with the PR objectives and Python 3.10+ type annotation style.


63-64: Return type annotation correctly modernized.

The return type dict[str, Any] | None uses the modern union syntax, consistent with the codebase's type annotation style.


75-76: Removal of explicit file mode is acceptable.

The explicit "r" mode was removed from open(self.progress_file). This is safe since read text mode is the default for open().

modelopt/torch/puzzletron/tools/validate_model.py (2)

38-50: LGTM!

Import paths correctly updated to the new puzzletron namespace. The imports align with the module's new location under modelopt.torch.puzzletron.


158-158: LGTM!

Lazy import of ScoringCheckpointManager correctly updated to the puzzletron namespace.

tests/_test_utils/torch/puzzletron/utils.py (3)

25-25: LGTM!

Import path correctly updated to the new puzzletron namespace for hydra utilities.


32-32: LGTM!

Docstring correctly updated to reflect the new "puzzletron" terminology.


114-114: Tokenizer resource directory is properly configured. The directory tests/_test_utils/torch/puzzletron/resources/tokenizer exists and contains all required tokenizer files (special_tokens_map.json, tokenizer.json, tokenizer_config.json, and truncate_tokenizer.py).

tests/gpu/torch/puzzletron/decilm/converters/test_convert_llama3_config_to_decilm_config.py (1)

19-23: LGTM!

Import paths correctly updated:

  • Test utilities now imported from _test_utils.torch.puzzletron.utils
  • Converter import updated to modelopt.torch.puzzletron.decilm.converters

Both align with the namespace refactor.

modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py (2)

31-51: LGTM!

All import paths correctly updated to the puzzletron namespace:

  • DeciLM model classes from puzzletron.decilm
  • Sewing kit components from puzzletron.sewing_kit
  • Utility modules from puzzletron.tools and puzzletron.utils

The imports are consistent with the module structure shown in the relevant code snippets.


72-72: LGTM!

Minor docstring formatting change moving content to the same line as the opening quotes.

modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py (1)

22-23: LGTM!

Import paths correctly updated to the puzzletron namespace:

  • mprint from puzzletron.tools.logger
  • validate_model from puzzletron.tools.validate_model

Both imports align with the module locations confirmed in the relevant code snippets.

setup.py (1)

105-115: LGTM! Dependency group renamed and expanded consistently.

The rename from compress to puzzletron aligns with the broader namespace refactoring. The comment correctly references the new subpackage path modelopt.torch.puzzletron.

modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py (4)

22-33: LGTM! Import modernization and namespace update.

The move of Hashable and Iterable to collections.abc follows Python 3.9+ best practices. The import path update from _compress to puzzletron is consistent with the PR's renaming objective.


45-45: Type annotation modernization looks good.

The float | None syntax is valid given the project's Python 3.10+ requirement specified in setup.py.


63-77: LGTM! Idiomatic dict iteration.

Iterating directly over the dict (for constraint_key in constraints) is equivalent to iterating over .keys() and is the preferred Pythonic style.


110-120: LGTM! Consistent dict initialization and iteration.

Using dict.fromkeys(constraints.keys(), 0) is an appropriate pattern for initializing a dict with a uniform default value. The direct iteration over constraints in the loop is consistent with earlier changes.

modelopt/torch/puzzletron/utils/data/dataset.py (3)

17-17: LGTM! Import modernization.

Moving Sequence from typing to collections.abc follows Python 3.9+ conventions.


61-64: LGTM! Type annotation modernization.

The union syntax Sequence[str] | None and int | None is cleaner than Optional[...] and valid for Python 3.10+.


290-291: Good simplification using @functools.cache.

@functools.cache (Python 3.9+) is equivalent to @functools.lru_cache(maxsize=None) and is more concise. Note that this caches based on the tokenizer object's identity/hash, which is the expected behavior here.
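
A tiny standalone demonstration of that equivalence (not the dataset code itself):

```python
import functools

@functools.lru_cache(maxsize=None)  # old spelling
def fib_old(n: int) -> int:
    return n if n < 2 else fib_old(n - 1) + fib_old(n - 2)

@functools.cache  # Python 3.9+ shorthand for the same unbounded cache
def fib_new(n: int) -> int:
    return n if n < 2 else fib_new(n - 1) + fib_new(n - 2)

assert fib_old(20) == fib_new(20) == 6765
```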

modelopt/torch/puzzletron/tools/validation_utils.py (3)

24-38: LGTM! Namespace migration and import cleanup.

The import paths correctly reference the new puzzletron namespace. The TYPE_CHECKING import for StitchedModule properly avoids circular imports at runtime.


47-50: LGTM! Function signature with modernized type hints.

The dict[str, Any] | None syntax is cleaner and consistent with the codebase-wide type annotation updates.


80-84: LGTM! Consistent type annotation style.

Matches the pattern used in validate_model_and_extract_hidden_states.

modelopt/torch/puzzletron/mip/run_puzzle.py (4)

23-49: LGTM! Import reorganization and namespace migration.

The imports have been correctly migrated to:

  1. collections.abc for Hashable and Iterable (Python 3.9+ best practice)
  2. New puzzletron namespace for all internal module imports

422-430: LGTM! Modernized generic type hints.

Using list[dict] and list[str] instead of List[dict] and List[str] is the preferred style for Python 3.9+.


582-584: LGTM! Clean default value initialization with override.

The pattern {**dict.fromkeys(all_metric_names, 0.0), **_extract_average_metrics(raw_metrics)} elegantly sets default values for all metrics while allowing real values to override them.
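
A small self-contained example of this defaults-then-override pattern (illustrative metric names and values):

```python
all_metric_names = ["accuracy", "latency_ms", "kl_div"]
raw_metrics = {"accuracy": 0.91}  # only some metrics were actually measured

# Every metric gets a 0.0 default; measured values override the defaults.
merged = {**dict.fromkeys(all_metric_names, 0.0), **raw_metrics}
print(merged)  # {'accuracy': 0.91, 'latency_ms': 0.0, 'kl_div': 0.0}
```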


599-604: LGTM! Direct dict iteration.

Iterating directly over raw_metrics is equivalent to raw_metrics.keys() and is the idiomatic Python style.

modelopt/torch/puzzletron/puzzletron.py (2)

16-29: Import paths and module docstring updated correctly for the namespace refactor.

The imports have been properly migrated from _compress to puzzletron namespace, and the module docstring is appropriately updated.


32-47: Function renamed and documentation updated appropriately.

The public API rename from compress to puzzletron is consistent with the PR objectives. The TODO comment format change from @TODO to TODO is a minor improvement.

modelopt/torch/puzzletron/replacement_library/build_replacement_library.py (2)

15-63: Import paths and module docstring correctly updated for the namespace refactor.

The imports have been properly migrated to the puzzletron namespace, and the usage documentation in the docstring is updated accordingly.


378-388: Type annotation modernized correctly.

The return type annotation uses Python 3.10+ style type[AttentionConfig] | type[FFNConfig] instead of Type[...] from typing, which is consistent with other type annotation updates in this PR.

modelopt/torch/puzzletron/replacement_library/replacement_library.py (4)

15-58: Import paths and module docstring correctly updated for the namespace refactor.

All imports have been properly migrated from _compress to puzzletron namespace.


62-66: Type annotation modernized correctly.

The parameter type annotation dict | None is consistent with Python 3.10+ style used throughout this refactor.


221-225: Minor iteration simplification.

Iterating directly over state_dict instead of state_dict.keys() is semantically equivalent and slightly more idiomatic.


319-322: Simplified string literal.

Using a plain string "non_block.pth" instead of an f-string is appropriate since no interpolation is needed.

modelopt/torch/puzzletron/replacement_library/replacement_utils.py (2)

15-26: Import paths correctly updated for the namespace refactor.

The imports have been properly migrated to the puzzletron namespace, including the relocated sort_replacements function from mip.utils.


46-46: Comment updated to reflect relocated function.

The comment correctly documents that sort_replacements was moved to modelopt.torch.puzzletron.mip.utils.

modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (5)

22-54: Import paths correctly updated for the namespace refactor.

All imports have been properly migrated to the puzzletron namespace. The Iterable import from collections.abc is the modern approach (Python 3.9+).


56-61: Type variable and usage documentation updated.

The T_DataClass TypeVar declaration and the usage example path are updated consistently.


69-92: Type annotation modernized correctly.

The benchmark_iterations: int | None parameter uses the modern union syntax consistently with other files in this PR.


217-223: Docstring condensed appropriately.

The single-line docstring is sufficient for this launcher function.


520-526: Minor iteration change in list comprehension.

The iteration for key in corresponding_bf16_args is semantically equivalent to the previous form. The change is neutral.

modelopt/torch/puzzletron/build_library_and_stats.py (2)

17-37: LGTM on namespace migration.

The module docstring and import paths have been correctly updated to the puzzletron namespace. The usage example in the docstring (line 25) properly references the new module path.


40-85: LGTM on the unified workflow function.

The function correctly orchestrates both steps with proper error handling and informative logging. The defensive hasattr check on line 84 for the optional moe_stats_filename is appropriate.

modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py (4)

26-46: LGTM on import updates.

All imports correctly migrated to the puzzletron namespace, maintaining consistency with the broader refactor.


92-138: LGTM on the convert function.

The function is correctly renamed and the progress messages are consistently updated to use "Puzzletron Progress". The TODO on line 116 regarding making the converter generic is noted.


141-183: LGTM on restore function and descriptor.

The restore_puzzletron_model function and PuzzletronDescriptor class are correctly renamed with the mode name updated to "puzzletron". All property returns reference the correct Puzzletron-prefixed types.


186-219: LGTM on the searcher implementation.

The PuzzletronSearcher class is correctly renamed with updated docstrings and progress messages. The search workflow steps (5-7/8) are properly orchestrated.

tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py (2)

23-27: LGTM on import updates.

Test utility and model imports correctly updated to the puzzletron namespace.


46-67: LGTM on test configuration updates.

The hydra config directory path and mode string are correctly updated to puzzletron. Test assertions remain unchanged, ensuring backward compatibility of the functionality.

modelopt/torch/puzzletron/utils/data/dataloaders.py (1)

16-32: LGTM on namespace migration.

The docstring is appropriately concise, and the import paths are correctly updated to the puzzletron namespace. No functional changes to the dataloader utilities.

modelopt/torch/puzzletron/dataset/prepare_dataset.py (1)

22-22: LGTM on import update.

The mprint import is correctly updated to the puzzletron namespace. No functional changes to the dataset preparation logic.

tests/_test_utils/torch/puzzletron/resources/configs/validate_model_defaults.yaml (1)

17-17: LGTM!

The load_dataset_fn path is correctly updated to reference the new puzzletron namespace, consistent with the broader refactor.

modelopt/torch/puzzletron/tools/checkpoint_utils.py (2)

33-34: LGTM!

Import paths are correctly updated to the new puzzletron namespace, aligning with the refactor.


58-60: LGTM!

The local import path is correctly updated to the puzzletron namespace while preserving the circular import avoidance pattern.

examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml (1)

17-17: LGTM!

The load_dataset_fn path is correctly updated to reference the new puzzletron namespace, consistent with other config files in this refactor.

modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (2)

34-52: LGTM!

All import paths are correctly updated to the puzzletron namespace, maintaining proper module dependencies.


272-276: LGTM!

The type annotation modernization from Optional[str] to str | None is appropriate and consistent with Python 3.10+ conventions used elsewhere in this refactor.

modelopt/torch/puzzletron/mip/mip_and_realize_models.py (2)

25-29: LGTM!

Import paths are correctly updated to the puzzletron namespace, properly importing run_puzzle, mprint, and validate_puzzle_solutions from their new locations.


32-34: LGTM!

The return type annotation modernization from List[str] to list[str] uses Python 3.9+ built-in generics, which is consistent with the type annotation style used across this refactor.

modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (3)

36-42: LGTM! Import paths correctly updated to puzzletron namespace.

The imports are consistently updated from the old _compress namespace to the new puzzletron namespace, aligning with the PR objectives.


72-77: LGTM! Docstring and import correctly updated.

The import path is properly updated to reference the new puzzletron.tools.checkpoint_utils module.


193-196: LGTM! Consistent import path update.

Same import path update applied consistently within split_checkpoint_to_subblocks.

modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (2)

38-46: LGTM! Import paths correctly migrated to puzzletron namespace.

All imports are consistently updated:

  • configuration_decilm and modeling_decilm from puzzletron.decilm
  • checkpoint_utils and logger from puzzletron.tools
  • EmptyInitOnDevice from puzzletron.utils.utils

243-246: LGTM! Sewing kit import path updated.

The import path for distributed utilities is correctly updated to the new puzzletron namespace.

tests/gpu/torch/puzzletron/nas/plugins/test_nas_search.py (2)

22-26: LGTM! Imports correctly updated for puzzletron namespace.

The test utilities and model class imports are properly migrated from _compress to puzzletron.


45-66: LGTM! Test configuration correctly updated.

The hydra config directory path and mode string are properly updated to align with the puzzletron namespace migration. The PuzzletronModel is correctly instantiated, the mode is set to "puzzletron", and the referenced config file Llama-3_1-8B-ffn-pruning.yaml exists at the specified path.

examples/puzzletron/README.md (4)

1-3: LGTM! Documentation title and description correctly updated.

The README properly reflects the rename from "compress" to "puzzletron" while maintaining the reference to the Puzzle paper.


19-19: LGTM! Installation dependency correctly updated.

The pip install command correctly references the new [puzzletron] extra instead of [compress].


37-37: LGTM! Dataset preparation command updated.

The module path is correctly updated to modelopt.torch.puzzletron.dataset.prepare_dataset.


57-70: LGTM! CLI commands and progress labels consistently updated.

All example commands and progress log output correctly reference the new puzzletron namespace and entry points.

modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py (4)

25-43: LGTM! Import paths correctly migrated to puzzletron namespace.

All imports are consistently updated to reference the new puzzletron module paths.


88-88: LGTM! CLI usage example correctly updated.

The module invocation path in the usage docstring is properly updated to modelopt.torch.puzzletron.tools.bypassed_training.init_child_from_parent.


103-108: LGTM! Type annotations modernized to PEP 604 union syntax.

The nullable type annotations are correctly updated from Optional[T] to T | None syntax.


210-219: LGTM! Profiling summary output formatting is clean.

The profiling summary formatting changes are minor and improve readability.

tests/gpu/torch/puzzletron/test_puzzletron.py (3)

22-28: LGTM!

Import paths correctly updated to the new puzzletron namespace. The imports align with the module structure changes in the PR.


36-46: LGTM!

Function renames from test_compress to test_puzzletron and _test_compress_multiprocess_job to _test_puzzletron_multiprocess_job are consistent with the PR objectives.


52-66: LGTM!

Config path correctly updated to tests/_test_utils/torch/puzzletron/resources/configs and the API call updated to puzzletron.puzzletron(...). The changes are consistent with the module rename.

examples/puzzletron/main.py (3)

36-43: LGTM!

Import paths correctly updated to the puzzletron namespace. The imports are consistent with the module restructuring.


91-106: LGTM!

Model class updated to PuzzletronModel and mode string correctly changed from "compress" to "puzzletron". The conversion flow aligns with the renamed NAS plugin.


155-162: LGTM!

The main() function correctly calls run_full_puzzletron in the non-MIP path.

modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (9)

25-30: LGTM on import modernization.

Using Callable from collections.abc instead of typing is the modern approach (recommended since Python 3.9). The simplified typing import is cleaner.


35-42: LGTM!

Import paths correctly updated to the puzzletron namespace, consistent with the module restructuring.


122-128: LGTM on idiomatic dict membership checks.

Removing .keys() from in checks is idiomatic Python. key in dict is equivalent to key in dict.keys() but more concise.


183-185: LGTM on dict iteration simplification.

Iterating directly over the dict (for new_key in new_state_dict) instead of .keys() is the preferred Python idiom.


334-339: Type hints consistently modernized.

The function signature uses modern union syntax throughout. This is consistent with the rest of the file.


1113-1116: LGTM on dict iteration simplification.

Iterating directly over bias_sd instead of bias_sd.keys() is the preferred Python idiom.


1366-1371: LGTM on type hints and docstring update.

The modernized type hints and updated docstring are consistent with the rest of the file's changes.


1573-1580: LGTM!

The function signature and docstring are consistent with the type modernization throughout the file.


96-103: No action needed; union syntax is compliant with project requirements.

The project explicitly targets Python 3.10+ (as specified in pyproject.toml with target-version = "py310" and confirmed by CI testing on Python 3.10). The T | None union syntax is fully supported at this version and is appropriately used throughout the codebase.

Likely an incorrect or invalid review comment.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/rename-compress-to-puzzletron branch from 861c9fe to c95e2f2 Compare January 13, 2026 20:03
@codecov

codecov bot commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.62%. Comparing base (83ac3b1) to head (4108780).
⚠️ Report is 2 commits behind head on feature/compress.

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #776   +/-   ##
=================================================
  Coverage             74.62%   74.62%           
=================================================
  Files                   192      192           
  Lines                 18989    18989           
=================================================
  Hits                  14171    14171           
  Misses                 4818     4818           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/rename-compress-to-puzzletron branch from c95e2f2 to 8fe2496 Compare January 13, 2026 20:43

@danielkorzekwa danielkorzekwa left a comment


a few minor comments

@@ -312,7 +312,7 @@ def _process_single_layer(
]:
key_possibly_missing_in_student = f".{layer_idx}.{key_possibly_missing_in_student}"
is_key_missing_from_student = (
len([k for k in new_state_dict.keys() if key_possibly_missing_in_student in k]) == 0


Note: be careful when removing .keys() calls (e.g. by Cursor AI); one time Cursor AI removed a .keys() for me where it should not have.

Collaborator Author


This is not by Cursor but by ruff. It only applies changes it is sure of; unsafe fixes are disabled in ruff.


I actually meant that ruff removed some .keys() calls, assuming a variable was a dictionary when it was not. That broke the puzzletron e2e test for me, so I then carefully reviewed all the changes ruff was making.
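
As an aside, a minimal sketch of how an automated .keys() removal can change behavior when the object merely looks like a dict (hypothetical container, not the actual code that broke the e2e test):

```python
class ScoreTable:
    """Hypothetical container: exposes .keys() but defines membership over values."""

    def __init__(self, data: dict):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __contains__(self, item):  # membership checks values, not keys
        return item in self._data.values()

table = ScoreTable({"ffn_11": 0.7})
print("ffn_11" in table.keys())  # True  -- checks the keys view
print("ffn_11" in table)         # False -- the "simplified" form checks values instead
```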


@dataclass
class ExternalTarget(TargetWithNamedInputs, TargetWithNamedOutputs, metaclass=Singleton):
"""External target for stitched modules."""


Meaningless comment; improve it.

Collaborator Author


Needed to add something, else doc building was failing. Please suggest what I should change this to.


Let's keep it as is; I would need to learn more about sewing_kit.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) January 14, 2026 10:50
@kevalmorabia97 kevalmorabia97 merged commit 43b3cfa into feature/compress Jan 14, 2026
23 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/rename-compress-to-puzzletron branch January 14, 2026 12:38