Skip to content

Add SwinTransformer support for torch_onnx quantization workflow#1235

Open
ajrasane wants to merge 1 commit intomainfrom
ajrasane/pytorch_quantization
Open

Add SwinTransformer support for torch_onnx quantization workflow#1235
ajrasane wants to merge 1 commit intomainfrom
ajrasane/pytorch_quantization

Conversation

@ajrasane
Copy link
Copy Markdown
Contributor

@ajrasane ajrasane commented Apr 10, 2026

Summary

  • Enable end-to-end quantize → ONNX export → TRT engine pipeline for SwinTransformer models (v1 and v2) across FP8, INT8, MXFP8, NVFP4, and auto precision modes
  • Add Conv2d quantization overrides for TRT compatibility (TRT only supports FP8/INT8 for convolutions)
  • Fix FP8 LayerNorm type mismatch in TRT stronglyTyped mode by adding LayerNormalization to change_casts_to_fp16
  • Fix cast_initializer_to_dtype crash when node has no initializer inputs
  • Add vision model support matrix to README (ViT, Swin, SwinV2)
  • Rewrite tests: parametrize over (ViT, Swin) × (fp8, int8, mxfp8, nvfp4, auto) with TRT engine build verification

Test plan

  • python -m pytest tests/examples/torch_onnx/test_torch_quant_to_onnx.py -v — 10 tests (2 models × 5 modes), all pass
  • Verified Swin accuracy on ImageNet-1k across all precisions (FP8: 81.29%, INT8: 81.12%, MXFP8: 81.32%, NVFP4: 80.79%, Auto: 80.84% TRT top-1 vs 81.37% base)
  • INT4_AWQ deferred (TODO in test file) — requires INT4 exporter changes for non-MatMul/Gemm consumer patterns

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • ONNX export now supports arbitrary timm vision models (auto device selection) and new CLI options for model selection and kwargs.
    • Configurable quantization overrides for Conv2d across quant modes; option to skip pretrained weights.
  • Bug Fixes

    • Broader FP16 cast handling for more ONNX ops to improve low-bit export fidelity.
    • Disabled inplace ReLU before auto-quantization to avoid incorrect transforms.
  • Documentation

    • Updated docs with supported models table, quantization mappings, and example CLI usage.
  • Tests

    • Expanded tests to cover multiple architectures, quant modes, and TensorRT build verification.

@ajrasane ajrasane requested review from a team as code owners April 10, 2026 22:18
@ajrasane ajrasane requested review from cjluo-nv and galagam April 10, 2026 22:18
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Adds timm-model support to ONNX export and quantization flows: new CLI option to export arbitrary timm vision models, model-specific input-shape resolution, Conv2d quantization overrides, expanded tests that build TensorRT engines for exported ONNX files.

Changes

Cohort / File(s) Summary
ONNX Export CLI
examples/onnx_ptq/download_example_onnx.py
Added --timm_model_name option, device selection, timm model instantiation, resolved input_shape via timm.data.resolve_model_data_config, ONNX path selection, and fp16/fp32 weights handling; branch runs alongside existing ViT export.
Quantize & Export Logic
examples/torch_onnx/torch_quant_to_onnx.py
Added get_quant_config(quantize_mode) and QUANT_CONFIG_DICT typing; introduced Conv2d override lists for FP8/INT8, apply overrides with warnings, added _disable_inplace_relu, tightened filter_func, added --no_pretrained and --model_kwargs, and route standard quantization via get_quant_config.
Documentation
examples/torch_onnx/README.md
Rewrote Vision Models docs and examples to reference --timm_model_name, added supported-models table and Conv2d quantization override descriptions; updated example CLI usage.
ONNX utils
modelopt/torch/_deploy/utils/torch_onnx.py
Extended change_casts_to_fp16 target consumer ops to also include LayerNormalization, Clip, Mul, and Exp alongside existing ops.
Tests & Test Helpers
tests/_test_utils/torch/vision_models.py, tests/examples/torch_onnx/test_torch_quant_to_onnx.py
Dummy input creation now uses timm.data.resolve_model_data_config(...)[\"input_size\"]; added swin_tiny_patch4_window7_224 to benchmarks; reworked parametrized test to cover multiple timm models and quant modes, added _verify_trt_engine_build() and TensorRT build assertions; tests pass --no_pretrained.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Export CLI
    participant Timm as Timm
    participant Model as Model (timm)
    participant Exporter as Export/Quantize Logic
    participant ONNX as ONNX File

    CLI->>Timm: create_model(timm_model_name, pretrained=...)
    Timm-->>CLI: model
    CLI->>Timm: resolve_model_data_config(model)
    Timm-->>CLI: input_size
    CLI->>Exporter: export_to_onnx(model, input_shape, weights_dtype...)
    Exporter->>Exporter: apply quant config / Conv2d overrides
    Exporter->>ONNX: write ONNX file
    ONNX-->>CLI: onnx_save_path
Loading
sequenceDiagram
    participant Test as Test Suite
    participant Exporter as Export Script
    participant ONNX as ONNX File
    participant TRT as trtexec
    participant Result as Build Result

    Test->>Exporter: run export (model_key, quantize_mode, --no_pretrained, ...)
    Exporter->>ONNX: save quantized ONNX
    ONNX-->>Test: onnx_save_path
    Test->>TRT: trtexec(onnx_save_path, --builderOptimizationLevel=...)
    TRT->>TRT: build engine
    TRT-->>Test: return code / success
    Test->>Result: assert build success
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Security Anti-Patterns ❌ Error json.loads(args.model_kwargs) in torch_quant_to_onnx.py lacks error handling and input validation, allowing unsanitized command-line arguments to be parsed without JSONDecodeError protection or dictionary key validation. Add try-except around json.loads to handle JSONDecodeError, validate parsed dictionary keys against a whitelist of acceptable timm model kwargs, and document expected JSON format in argument help text.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: adding SwinTransformer support to the torch_onnx quantization workflow, which is the primary objective of this PR.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ajrasane/pytorch_quantization

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 10, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1235/

Built to branch gh-pages at 2026-04-10 23:33 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/onnx_ptq/download_example_onnx.py`:
- Around line 53-58: The flag --timm_model_name is set with a non-None default
so the check if args.timm_model_name always succeeds; change the
parser.add_argument for "timm_model_name" to use default=None and then make the
model selection checks mutually exclusive (replace the separate if for
args.timm_model_name with an elif in the same chain that handles args.vit,
args.swin, etc.) so only one export path runs; update references to
args.timm_model_name and ensure help text still documents the expected default
behavior.

In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 52: Remove the prohibited "# nosec" bypass on the subprocess call in
tests/examples/torch_onnx/test_torch_quant_to_onnx.py: locate the
subprocess.run(...) invocation (the call using variables cmd,
capture_output=True, text=True, timeout=600) and simply delete the "# nosec"
comment; ensure the call remains a list-based invocation (no shell=True and no
external user input) so Bandit flags are not bypassed—if Bandit still flags it,
escalate to `@NVIDIA/modelopt-setup-codeowners` or adjust the project Bandit
config instead of adding "# nosec".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7a34ea4a-81c7-4f70-b2c8-b9d65a485dd1

📥 Commits

Reviewing files that changed from the base of the PR and between da0e8ff and 6455295.

📒 Files selected for processing (7)
  • examples/onnx_ptq/download_example_onnx.py
  • examples/torch_onnx/README.md
  • examples/torch_onnx/torch_quant_to_onnx.py
  • modelopt/onnx/quantization/qdq_utils.py
  • modelopt/torch/_deploy/utils/torch_onnx.py
  • tests/_test_utils/torch/vision_models.py
  • tests/examples/torch_onnx/test_torch_quant_to_onnx.py

Comment on lines +53 to +58
parser.add_argument(
"--timm_model_name",
type=str,
default="vit_base_patch16_224",
help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Logic issue: --timm_model_name always evaluates to truthy due to default value.

Since --timm_model_name has default="vit_base_patch16_224", the condition if args.timm_model_name: on line 99 is always True. This causes unintended behavior:

  1. Running python download_example_onnx.py --vit exports the ViT model twice (once via --vit block, once via --timm_model_name block).
  2. Running with no model flags still triggers the --timm_model_name block.
🐛 Proposed fix: Change default to None and use elif
     parser.add_argument(
         "--timm_model_name",
         type=str,
-        default="vit_base_patch16_224",
+        default=None,
         help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).",
     )
-    if args.timm_model_name:
+    elif args.timm_model_name:
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Also applies to: 99-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/onnx_ptq/download_example_onnx.py` around lines 53 - 58, The flag
--timm_model_name is set with a non-None default so the check if
args.timm_model_name always succeeds; change the parser.add_argument for
"timm_model_name" to use default=None and then make the model selection checks
mutually exclusive (replace the separate if for args.timm_model_name with an
elif in the same chain that handles args.vit, args.swin, etc.) so only one
export path runs; update references to args.timm_model_name and ensure help text
still documents the expected default behavior.

f"--builderOptimizationLevel={opt_level}",
]

result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: # nosec comment is prohibited by coding guidelines.

The # nosec comment to bypass Bandit security checks is explicitly prohibited. Per coding guidelines:

"Any use of '# nosec' comments to bypass Bandit security checks is not allowed. If a security-sensitive pattern is genuinely necessary, the PR must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description."

The subprocess.run() call here appears safe (no shell=True, arguments passed as a list, no user-supplied input), but the bypass mechanism itself is not allowed.

🔒 Proposed fix: Remove the nosec comment

The subprocess call is safe as-is since:

  • Arguments are passed as a list (not shell string)
  • shell=True is not used
  • All arguments are controlled by the test, not external input

Simply remove the # nosec comment:

-    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)  # nosec
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)

If Bandit still flags this, consider using a more targeted exclusion in the Bandit config file or requesting formal review from @NVIDIA/modelopt-setup-codeowners.

As per coding guidelines: "Prohibit the use of '# nosec' comments to bypass Bandit security checks."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec
result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 52, Remove the
prohibited "# nosec" bypass on the subprocess call in
tests/examples/torch_onnx/test_torch_quant_to_onnx.py: locate the
subprocess.run(...) invocation (the call using variables cmd,
capture_output=True, text=True, timeout=600) and simply delete the "# nosec"
comment; ensure the call remains a list-based invocation (no shell=True and no
external user input) so Bandit flags are not bypassed—if Bandit still flags it,
escalate to `@NVIDIA/modelopt-setup-codeowners` or adjust the project Bandit
config instead of adding "# nosec".

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.43%. Comparing base (da0e8ff) to head (15f3809).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1235      +/-   ##
==========================================
+ Coverage   76.04%   77.43%   +1.39%     
==========================================
  Files         350      350              
  Lines       40478    40478              
==========================================
+ Hits        30781    31344     +563     
+ Misses       9697     9134     -563     
Flag Coverage Δ
examples 43.76% <100.00%> (+2.39%) ⬆️
gpu 57.42% <0.00%> (-0.10%) ⬇️
unit 55.53% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ajrasane ajrasane force-pushed the ajrasane/pytorch_quantization branch from 6455295 to 9793444 Compare April 10, 2026 23:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/torch_onnx/torch_quant_to_onnx.py (1)

351-357: ⚠️ Potential issue | 🟠 Major

--no_pretrained / --model_kwargs are not propagated into calibration-data model setup

Line 351 and Line 373 call load_calibration_data(...), but that helper still builds its own model with pretrained=True and fixed kwargs (see examples/torch_onnx/torch_quant_to_onnx.py Line 129). This bypasses the new CLI behavior and can force unexpected weight downloads or mismatched data config for custom model kwargs.

💡 Suggested fix
-def load_calibration_data(model_name, data_size, batch_size, device, with_labels=False):
+def load_calibration_data(model, data_size, batch_size, device, with_labels=False):
@@
-    model = timm.create_model(model_name, pretrained=True, num_classes=1000)
     data_config = timm.data.resolve_model_data_config(model)
@@
-        data_loader = load_calibration_data(
-            args.timm_model_name,
+        data_loader = load_calibration_data(
+            model,
             args.calibration_data_size,
             args.batch_size,
             device,
             with_labels=True,
         )
@@
-            data_loader = load_calibration_data(
-                args.timm_model_name,
+            data_loader = load_calibration_data(
+                model,
                 args.calibration_data_size,
                 args.batch_size,
                 device,
                 with_labels=False,
             )

Also applies to: 373-379

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 351 - 357, Calls to
load_calibration_data are building their own model with hardcoded
pretrained=True and fixed kwargs; update the two callers to pass through the CLI
options (e.g., args.no_pretrained and args.model_kwargs) and change
load_calibration_data's signature to accept these parameters (e.g.,
pretrained_override or no_pretrained and model_kwargs) and use them when
constructing the timm model (set pretrained = not no_pretrained and pass
model_kwargs into the model creation path instead of fixed kwargs). Ensure both
places that call load_calibration_data (the lines around data_loader =
load_calibration_data(...) at start and the later call) are updated to forward
the flags so calibration uses the same model configuration as the rest of the
script.
♻️ Duplicate comments (1)
tests/examples/torch_onnx/test_torch_quant_to_onnx.py (1)

57-57: ⚠️ Potential issue | 🔴 Critical

Remove prohibited # nosec suppression

The subprocess invocation itself is fine (list args, no shell=True), but # nosec is not allowed in this repo and must be removed.

🔧 Minimal fix
-    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)  # nosec
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)

As per coding guidelines, "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 57, Remove the
prohibited "# nosec" suppression from the subprocess.run call in the test (the
line assigning to result via subprocess.run(cmd, capture_output=True, text=True,
timeout=600)) — leave the invocation intact (list args, no shell=True) but
delete the trailing "  # nosec" comment so Bandit checks will run normally;
verify tests still pass after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/torch_onnx/README.md`:
- Around line 305-309: Update the support matrix row for the model identifier
"swinv2_tiny_window8_256" in README.md to remove the Auto ✅ (change to ❌) or add
a footnote explaining it's currently unsupported; reference the test that skips
this combo (test_torch_quant_to_onnx.py) as the reason for the change so readers
know the limitation is intentional.

---

Outside diff comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 351-357: Calls to load_calibration_data are building their own
model with hardcoded pretrained=True and fixed kwargs; update the two callers to
pass through the CLI options (e.g., args.no_pretrained and args.model_kwargs)
and change load_calibration_data's signature to accept these parameters (e.g.,
pretrained_override or no_pretrained and model_kwargs) and use them when
constructing the timm model (set pretrained = not no_pretrained and pass
model_kwargs into the model creation path instead of fixed kwargs). Ensure both
places that call load_calibration_data (the lines around data_loader =
load_calibration_data(...) at start and the later call) are updated to forward
the flags so calibration uses the same model configuration as the rest of the
script.

---

Duplicate comments:
In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 57: Remove the prohibited "# nosec" suppression from the subprocess.run
call in the test (the line assigning to result via subprocess.run(cmd,
capture_output=True, text=True, timeout=600)) — leave the invocation intact
(list args, no shell=True) but delete the trailing "  # nosec" comment so Bandit
checks will run normally; verify tests still pass after removal.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 95b4fbb8-da54-4e92-8ebf-74879aa8e76f

📥 Commits

Reviewing files that changed from the base of the PR and between 6455295 and 9793444.

📒 Files selected for processing (6)
  • examples/onnx_ptq/download_example_onnx.py
  • examples/torch_onnx/README.md
  • examples/torch_onnx/torch_quant_to_onnx.py
  • modelopt/torch/_deploy/utils/torch_onnx.py
  • tests/_test_utils/torch/vision_models.py
  • tests/examples/torch_onnx/test_torch_quant_to_onnx.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/_test_utils/torch/vision_models.py
  • modelopt/torch/_deploy/utils/torch_onnx.py

Comment on lines +305 to +309
| Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) | | ✅ | ✅ | ✅ | ✅ | ✅ |
| [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Support matrix overstates swinv2_tiny Auto support

Line 309 marks Auto as ✅ for swinv2_tiny_window8_256, but tests explicitly skip that combo (tests/examples/torch_onnx/test_torch_quant_to_onnx.py Line 35). Please mark it unsupported (or add a footnote with the current limitation).

📝 Suggested docs correction
-| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) |||||||
| [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) |||||||
| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) |||||| |
| Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) |||||||
| [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) |||||||
| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) |||||| |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/torch_onnx/README.md` around lines 305 - 309, Update the support
matrix row for the model identifier "swinv2_tiny_window8_256" in README.md to
remove the Auto ✅ (change to ❌) or add a footnote explaining it's currently
unsupported; reference the test that skips this combo
(test_torch_quant_to_onnx.py) as the reason for the change so readers know the
limitation is intentional.

Enable end-to-end quantize-export-TRT pipeline for SwinTransformer models
(v1 and v2) across FP8, INT8, MXFP8, NVFP4, and auto precision modes.

Core fixes:
- Add LayerNormalization, Clip, Mul, Exp to change_casts_to_fp16 for FP8
  stronglyTyped compatibility (fixes type mismatches in Swin/SwinV2 TRT builds)

Example/test changes:
- Add Conv2d quantization overrides for TRT compatibility (MXFP8/NVFP4->FP8,
  INT4_AWQ->INT8) since TRT only supports FP8/INT8 for convolutions
- Add cpb_mlp and downsample to quantization filter exclusion list
- Add --no_pretrained and --model_kwargs CLI args for testing with tiny models
- Add --timm_model_name to download_example_onnx.py (default: ViT)
- Add SwinTransformer to vision_models.py with dynamic input size resolution
- Rewrite tests: parametrize over (ViT, Swin, SwinV2) x (fp8, int8, mxfp8,
  nvfp4, auto) with TRT engine build verification using --stronglyTyped
- Update README with vision model support matrix and Conv2d override docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/pytorch_quantization branch from 9793444 to 15f3809 Compare April 10, 2026 23:29
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tests/examples/torch_onnx/test_torch_quant_to_onnx.py (1)

53-53: ⚠️ Potential issue | 🔴 Critical

Remove the # nosec bypass.

This subprocess.run(...) call is already using a list of args and no shell=True, so the bypass is unnecessary and violates repo policy.

Suggested fix
-    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)  # nosec
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)

As per coding guidelines, "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 53, The
subprocess.run call assigning to result (result = subprocess.run(cmd,
capture_output=True, text=True, timeout=600)) includes an unnecessary "# nosec"
suppression; remove the "# nosec" comment and leave the call as-is (ensure cmd
remains a list and no shell=True is used) so Bandit checks are not bypassed
while preserving capture_output, text, and timeout parameters.
examples/onnx_ptq/download_example_onnx.py (1)

53-58: ⚠️ Potential issue | 🟠 Major

Make --timm_model_name opt-in and mutually exclusive.

With default="vit_base_patch16_224", Line 99 is always truthy. That means --vit exports twice, and --llama or even “no flags” still export a timm model.

Suggested fix
     parser.add_argument(
         "--timm_model_name",
         type=str,
-        default="vit_base_patch16_224",
+        default=None,
         help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).",
     )
-    if args.vit:
+    if args.vit:
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=1000).to(
             device
         )
         ...
-    if args.timm_model_name:
+    elif args.timm_model_name:
         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         model = timm.create_model(args.timm_model_name, pretrained=True, num_classes=1000).to(
             device
         )
         ...
-    if args.llama:
+    elif args.llama:
         ...

Also applies to: 99-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/onnx_ptq/download_example_onnx.py` around lines 53 - 58, The parser
currently always exports a timm model because
parser.add_argument("--timm_model_name", default="vit_base_patch16_224") is
truthy; change this to be opt-in by removing the default (use default=None or
action that requires the flag) and validate presence before exporting in the
export flow (e.g., check timm_model_name is not None). Also create an argparse
mutually exclusive group (parser.add_mutually_exclusive_group) and add the timm
option alongside the existing flags like --vit and --llama so only one export
path is allowed; update any checks that assume timm_model_name is always set
(export logic around the timm export function) to rely on the new exclusivity
and None-check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/onnx_ptq/download_example_onnx.py`:
- Around line 53-58: The parser currently always exports a timm model because
parser.add_argument("--timm_model_name", default="vit_base_patch16_224") is
truthy; change this to be opt-in by removing the default (use default=None or
action that requires the flag) and validate presence before exporting in the
export flow (e.g., check timm_model_name is not None). Also create an argparse
mutually exclusive group (parser.add_mutually_exclusive_group) and add the timm
option alongside the existing flags like --vit and --llama so only one export
path is allowed; update any checks that assume timm_model_name is always set
(export logic around the timm export function) to rely on the new exclusivity
and None-check.

In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 53: The subprocess.run call assigning to result (result =
subprocess.run(cmd, capture_output=True, text=True, timeout=600)) includes an
unnecessary "# nosec" suppression; remove the "# nosec" comment and leave the
call as-is (ensure cmd remains a list and no shell=True is used) so Bandit
checks are not bypassed while preserving capture_output, text, and timeout
parameters.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cd3f4d08-7ded-4c8d-b84e-dec1aa75f848

📥 Commits

Reviewing files that changed from the base of the PR and between 9793444 and 15f3809.

📒 Files selected for processing (6)
  • examples/onnx_ptq/download_example_onnx.py
  • examples/torch_onnx/README.md
  • examples/torch_onnx/torch_quant_to_onnx.py
  • modelopt/torch/_deploy/utils/torch_onnx.py
  • tests/_test_utils/torch/vision_models.py
  • tests/examples/torch_onnx/test_torch_quant_to_onnx.py
✅ Files skipped from review due to trivial changes (1)
  • examples/torch_onnx/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/_test_utils/torch/vision_models.py
  • modelopt/torch/_deploy/utils/torch_onnx.py
  • examples/torch_onnx/torch_quant_to_onnx.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant