Add SwinTransformer support for torch_onnx quantization workflow#1235
Add SwinTransformer support for torch_onnx quantization workflow#1235
Conversation
📝 WalkthroughWalkthroughAdds timm-model support to ONNX export and quantization flows: new CLI option to export arbitrary timm vision models, model-specific input-shape resolution, Conv2d quantization overrides, expanded tests that build TensorRT engines for exported ONNX files. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as Export CLI
participant Timm as Timm
participant Model as Model (timm)
participant Exporter as Export/Quantize Logic
participant ONNX as ONNX File
CLI->>Timm: create_model(timm_model_name, pretrained=...)
Timm-->>CLI: model
CLI->>Timm: resolve_model_data_config(model)
Timm-->>CLI: input_size
CLI->>Exporter: export_to_onnx(model, input_shape, weights_dtype...)
Exporter->>Exporter: apply quant config / Conv2d overrides
Exporter->>ONNX: write ONNX file
ONNX-->>CLI: onnx_save_path
sequenceDiagram
participant Test as Test Suite
participant Exporter as Export Script
participant ONNX as ONNX File
participant TRT as trtexec
participant Result as Build Result
Test->>Exporter: run export (model_key, quantize_mode, --no_pretrained, ...)
Exporter->>ONNX: save quantized ONNX
ONNX-->>Test: onnx_save_path
Test->>TRT: trtexec(onnx_save_path, --builderOptimizationLevel=...)
TRT->>TRT: build engine
TRT-->>Test: return code / success
Test->>Result: assert build success
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/onnx_ptq/download_example_onnx.py`:
- Around line 53-58: The flag --timm_model_name is set with a non-None default
so the check if args.timm_model_name always succeeds; change the
parser.add_argument for "timm_model_name" to use default=None and then make the
model selection checks mutually exclusive (replace the separate if for
args.timm_model_name with an elif in the same chain that handles args.vit,
args.swin, etc.) so only one export path runs; update references to
args.timm_model_name and ensure help text still documents the expected default
behavior.
In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 52: Remove the prohibited "# nosec" bypass on the subprocess call in
tests/examples/torch_onnx/test_torch_quant_to_onnx.py: locate the
subprocess.run(...) invocation (the call using variables cmd,
capture_output=True, text=True, timeout=600) and simply delete the "# nosec"
comment; ensure the call remains a list-based invocation (no shell=True and no
external user input) so Bandit flags are not bypassed—if Bandit still flags it,
escalate to `@NVIDIA/modelopt-setup-codeowners` or adjust the project Bandit
config instead of adding "# nosec".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7a34ea4a-81c7-4f70-b2c8-b9d65a485dd1
📒 Files selected for processing (7)
examples/onnx_ptq/download_example_onnx.pyexamples/torch_onnx/README.mdexamples/torch_onnx/torch_quant_to_onnx.pymodelopt/onnx/quantization/qdq_utils.pymodelopt/torch/_deploy/utils/torch_onnx.pytests/_test_utils/torch/vision_models.pytests/examples/torch_onnx/test_torch_quant_to_onnx.py
| parser.add_argument( | ||
| "--timm_model_name", | ||
| type=str, | ||
| default="vit_base_patch16_224", | ||
| help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).", | ||
| ) |
There was a problem hiding this comment.
Logic issue: --timm_model_name always evaluates to truthy due to default value.
Since --timm_model_name has default="vit_base_patch16_224", the condition if args.timm_model_name: on line 99 is always True. This causes unintended behavior:
- Running
python download_example_onnx.py --vitexports the ViT model twice (once via--vitblock, once via--timm_model_nameblock). - Running with no model flags still triggers the
--timm_model_nameblock.
🐛 Proposed fix: Change default to None and use elif
parser.add_argument(
"--timm_model_name",
type=str,
- default="vit_base_patch16_224",
+ default=None,
help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).",
)- if args.timm_model_name:
+ elif args.timm_model_name:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")Also applies to: 99-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/onnx_ptq/download_example_onnx.py` around lines 53 - 58, The flag
--timm_model_name is set with a non-None default so the check if
args.timm_model_name always succeeds; change the parser.add_argument for
"timm_model_name" to use default=None and then make the model selection checks
mutually exclusive (replace the separate if for args.timm_model_name with an
elif in the same chain that handles args.vit, args.swin, etc.) so only one
export path runs; update references to args.timm_model_name and ensure help text
still documents the expected default behavior.
| f"--builderOptimizationLevel={opt_level}", | ||
| ] | ||
|
|
||
| result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec |
There was a problem hiding this comment.
CRITICAL: # nosec comment is prohibited by coding guidelines.
The # nosec comment to bypass Bandit security checks is explicitly prohibited. Per coding guidelines:
"Any use of '# nosec' comments to bypass Bandit security checks is not allowed. If a security-sensitive pattern is genuinely necessary, the PR must be reviewed and approved by
@NVIDIA/modelopt-setup-codeownerswith an explicit justification in the PR description."
The subprocess.run() call here appears safe (no shell=True, arguments passed as a list, no user-supplied input), but the bypass mechanism itself is not allowed.
🔒 Proposed fix: Remove the nosec comment
The subprocess call is safe as-is since:
- Arguments are passed as a list (not shell string)
shell=Trueis not used- All arguments are controlled by the test, not external input
Simply remove the # nosec comment:
- result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec
+ result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)If Bandit still flags this, consider using a more targeted exclusion in the Bandit config file or requesting formal review from @NVIDIA/modelopt-setup-codeowners.
As per coding guidelines: "Prohibit the use of '# nosec' comments to bypass Bandit security checks."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec | |
| result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 52, Remove the
prohibited "# nosec" bypass on the subprocess call in
tests/examples/torch_onnx/test_torch_quant_to_onnx.py: locate the
subprocess.run(...) invocation (the call using variables cmd,
capture_output=True, text=True, timeout=600) and simply delete the "# nosec"
comment; ensure the call remains a list-based invocation (no shell=True and no
external user input) so Bandit flags are not bypassed—if Bandit still flags it,
escalate to `@NVIDIA/modelopt-setup-codeowners` or adjust the project Bandit
config instead of adding "# nosec".
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1235 +/- ##
==========================================
+ Coverage 76.04% 77.43% +1.39%
==========================================
Files 350 350
Lines 40478 40478
==========================================
+ Hits 30781 31344 +563
+ Misses 9697 9134 -563
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
6455295 to
9793444
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/torch_onnx/torch_quant_to_onnx.py (1)
351-357:⚠️ Potential issue | 🟠 Major
--no_pretrained/--model_kwargsare not propagated into calibration-data model setup
Line 351andLine 373callload_calibration_data(...), but that helper still builds its own model withpretrained=Trueand fixed kwargs (seeexamples/torch_onnx/torch_quant_to_onnx.pyLine 129). This bypasses the new CLI behavior and can force unexpected weight downloads or mismatched data config for custom model kwargs.💡 Suggested fix
-def load_calibration_data(model_name, data_size, batch_size, device, with_labels=False): +def load_calibration_data(model, data_size, batch_size, device, with_labels=False): @@ - model = timm.create_model(model_name, pretrained=True, num_classes=1000) data_config = timm.data.resolve_model_data_config(model) @@ - data_loader = load_calibration_data( - args.timm_model_name, + data_loader = load_calibration_data( + model, args.calibration_data_size, args.batch_size, device, with_labels=True, ) @@ - data_loader = load_calibration_data( - args.timm_model_name, + data_loader = load_calibration_data( + model, args.calibration_data_size, args.batch_size, device, with_labels=False, )Also applies to: 373-379
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/torch_onnx/torch_quant_to_onnx.py` around lines 351 - 357, Calls to load_calibration_data are building their own model with hardcoded pretrained=True and fixed kwargs; update the two callers to pass through the CLI options (e.g., args.no_pretrained and args.model_kwargs) and change load_calibration_data's signature to accept these parameters (e.g., pretrained_override or no_pretrained and model_kwargs) and use them when constructing the timm model (set pretrained = not no_pretrained and pass model_kwargs into the model creation path instead of fixed kwargs). Ensure both places that call load_calibration_data (the lines around data_loader = load_calibration_data(...) at start and the later call) are updated to forward the flags so calibration uses the same model configuration as the rest of the script.
♻️ Duplicate comments (1)
tests/examples/torch_onnx/test_torch_quant_to_onnx.py (1)
57-57:⚠️ Potential issue | 🔴 CriticalRemove prohibited
# nosecsuppressionThe subprocess invocation itself is fine (list args, no
shell=True), but# nosecis not allowed in this repo and must be removed.🔧 Minimal fix
- result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec + result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)As per coding guidelines, "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 57, Remove the prohibited "# nosec" suppression from the subprocess.run call in the test (the line assigning to result via subprocess.run(cmd, capture_output=True, text=True, timeout=600)) — leave the invocation intact (list args, no shell=True) but delete the trailing " # nosec" comment so Bandit checks will run normally; verify tests still pass after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/torch_onnx/README.md`:
- Around line 305-309: Update the support matrix row for the model identifier
"swinv2_tiny_window8_256" in README.md to remove the Auto ✅ (change to ❌) or add
a footnote explaining it's currently unsupported; reference the test that skips
this combo (test_torch_quant_to_onnx.py) as the reason for the change so readers
know the limitation is intentional.
---
Outside diff comments:
In `@examples/torch_onnx/torch_quant_to_onnx.py`:
- Around line 351-357: Calls to load_calibration_data are building their own
model with hardcoded pretrained=True and fixed kwargs; update the two callers to
pass through the CLI options (e.g., args.no_pretrained and args.model_kwargs)
and change load_calibration_data's signature to accept these parameters (e.g.,
pretrained_override or no_pretrained and model_kwargs) and use them when
constructing the timm model (set pretrained = not no_pretrained and pass
model_kwargs into the model creation path instead of fixed kwargs). Ensure both
places that call load_calibration_data (the lines around data_loader =
load_calibration_data(...) at start and the later call) are updated to forward
the flags so calibration uses the same model configuration as the rest of the
script.
---
Duplicate comments:
In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 57: Remove the prohibited "# nosec" suppression from the subprocess.run
call in the test (the line assigning to result via subprocess.run(cmd,
capture_output=True, text=True, timeout=600)) — leave the invocation intact
(list args, no shell=True) but delete the trailing " # nosec" comment so Bandit
checks will run normally; verify tests still pass after removal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 95b4fbb8-da54-4e92-8ebf-74879aa8e76f
📒 Files selected for processing (6)
examples/onnx_ptq/download_example_onnx.pyexamples/torch_onnx/README.mdexamples/torch_onnx/torch_quant_to_onnx.pymodelopt/torch/_deploy/utils/torch_onnx.pytests/_test_utils/torch/vision_models.pytests/examples/torch_onnx/test_torch_quant_to_onnx.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/_test_utils/torch/vision_models.py
- modelopt/torch/_deploy/utils/torch_onnx.py
| | Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto | | ||
| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | ||
| | [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ||
| | [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ||
| | [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
There was a problem hiding this comment.
Support matrix overstates swinv2_tiny Auto support
Line 309 marks Auto as ✅ for swinv2_tiny_window8_256, but tests explicitly skip that combo (tests/examples/torch_onnx/test_torch_quant_to_onnx.py Line 35). Please mark it unsupported (or add a footnote with the current limitation).
📝 Suggested docs correction
-| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto | | |
| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | |
| | [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| | [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| | [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| | Model | FP8 | INT8 | MXFP8 | NVFP4 | INT4_AWQ | Auto | | |
| | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | |
| | [vit_base_patch16_224](https://huggingface.co/timm/vit_base_patch16_224.augreg_in21k_ft_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| | [swin_tiny_patch4_window7_224](https://huggingface.co/timm/swin_tiny_patch4_window7_224.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| | [swinv2_tiny_window8_256](https://huggingface.co/timm/swinv2_tiny_window8_256.ms_in1k) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/torch_onnx/README.md` around lines 305 - 309, Update the support
matrix row for the model identifier "swinv2_tiny_window8_256" in README.md to
remove the Auto ✅ (change to ❌) or add a footnote explaining it's currently
unsupported; reference the test that skips this combo
(test_torch_quant_to_onnx.py) as the reason for the change so readers know the
limitation is intentional.
Enable end-to-end quantize-export-TRT pipeline for SwinTransformer models (v1 and v2) across FP8, INT8, MXFP8, NVFP4, and auto precision modes. Core fixes: - Add LayerNormalization, Clip, Mul, Exp to change_casts_to_fp16 for FP8 stronglyTyped compatibility (fixes type mismatches in Swin/SwinV2 TRT builds) Example/test changes: - Add Conv2d quantization overrides for TRT compatibility (MXFP8/NVFP4->FP8, INT4_AWQ->INT8) since TRT only supports FP8/INT8 for convolutions - Add cpb_mlp and downsample to quantization filter exclusion list - Add --no_pretrained and --model_kwargs CLI args for testing with tiny models - Add --timm_model_name to download_example_onnx.py (default: ViT) - Add SwinTransformer to vision_models.py with dynamic input size resolution - Rewrite tests: parametrize over (ViT, Swin, SwinV2) x (fp8, int8, mxfp8, nvfp4, auto) with TRT engine build verification using --stronglyTyped - Update README with vision model support matrix and Conv2d override docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
9793444 to
15f3809
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
tests/examples/torch_onnx/test_torch_quant_to_onnx.py (1)
53-53:⚠️ Potential issue | 🔴 CriticalRemove the
# nosecbypass.This
subprocess.run(...)call is already using a list of args and noshell=True, so the bypass is unnecessary and violates repo policy.Suggested fix
- result = subprocess.run(cmd, capture_output=True, text=True, timeout=600) # nosec + result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)As per coding guidelines, "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py` at line 53, The subprocess.run call assigning to result (result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)) includes an unnecessary "# nosec" suppression; remove the "# nosec" comment and leave the call as-is (ensure cmd remains a list and no shell=True is used) so Bandit checks are not bypassed while preserving capture_output, text, and timeout parameters.examples/onnx_ptq/download_example_onnx.py (1)
53-58:⚠️ Potential issue | 🟠 MajorMake
--timm_model_nameopt-in and mutually exclusive.With
default="vit_base_patch16_224", Line 99 is always truthy. That means--vitexports twice, and--llamaor even “no flags” still export a timm model.Suggested fix
parser.add_argument( "--timm_model_name", type=str, - default="vit_base_patch16_224", + default=None, help="Export any timm model to ONNX (e.g., swin_tiny_patch4_window7_224).", )- if args.vit: + if args.vit: device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=1000).to( device ) ... - if args.timm_model_name: + elif args.timm_model_name: device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model = timm.create_model(args.timm_model_name, pretrained=True, num_classes=1000).to( device ) ... - if args.llama: + elif args.llama: ...Also applies to: 99-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/onnx_ptq/download_example_onnx.py` around lines 53 - 58, The parser currently always exports a timm model because parser.add_argument("--timm_model_name", default="vit_base_patch16_224") is truthy; change this to be opt-in by removing the default (use default=None or action that requires the flag) and validate presence before exporting in the export flow (e.g., check timm_model_name is not None). Also create an argparse mutually exclusive group (parser.add_mutually_exclusive_group) and add the timm option alongside the existing flags like --vit and --llama so only one export path is allowed; update any checks that assume timm_model_name is always set (export logic around the timm export function) to rely on the new exclusivity and None-check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/onnx_ptq/download_example_onnx.py`:
- Around line 53-58: The parser currently always exports a timm model because
parser.add_argument("--timm_model_name", default="vit_base_patch16_224") is
truthy; change this to be opt-in by removing the default (use default=None or
action that requires the flag) and validate presence before exporting in the
export flow (e.g., check timm_model_name is not None). Also create an argparse
mutually exclusive group (parser.add_mutually_exclusive_group) and add the timm
option alongside the existing flags like --vit and --llama so only one export
path is allowed; update any checks that assume timm_model_name is always set
(export logic around the timm export function) to rely on the new exclusivity
and None-check.
In `@tests/examples/torch_onnx/test_torch_quant_to_onnx.py`:
- Line 53: The subprocess.run call assigning to result (result =
subprocess.run(cmd, capture_output=True, text=True, timeout=600)) includes an
unnecessary "# nosec" suppression; remove the "# nosec" comment and leave the
call as-is (ensure cmd remains a list and no shell=True is used) so Bandit
checks are not bypassed while preserving capture_output, text, and timeout
parameters.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cd3f4d08-7ded-4c8d-b84e-dec1aa75f848
📒 Files selected for processing (6)
examples/onnx_ptq/download_example_onnx.pyexamples/torch_onnx/README.mdexamples/torch_onnx/torch_quant_to_onnx.pymodelopt/torch/_deploy/utils/torch_onnx.pytests/_test_utils/torch/vision_models.pytests/examples/torch_onnx/test_torch_quant_to_onnx.py
✅ Files skipped from review due to trivial changes (1)
- examples/torch_onnx/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/_test_utils/torch/vision_models.py
- modelopt/torch/_deploy/utils/torch_onnx.py
- examples/torch_onnx/torch_quant_to_onnx.py
Summary
LayerNormalizationtochange_casts_to_fp16cast_initializer_to_dtypecrash when node has no initializer inputsTest plan
python -m pytest tests/examples/torch_onnx/test_torch_quant_to_onnx.py -v— 10 tests (2 models × 5 modes), all pass🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests