[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info by gcunhase · Pull Request #1697 · NVIDIA/Model-Optimizer

gcunhase · 2026-06-12T13:59:28Z

What does this PR do?

Type of change: Bug fix

INT4 quantization upgrades the model to opset >= 21, at which point ONNX Runtime
runs type inference while building the AWQ calibration InferenceSession. Custom
ops backed by TensorRT plugins (domain trt.plugins) have no ORT type-inference
function, so their output types are only known from the value_info that TensorRT
type/shape inference populated earlier in preprocessing. clear_stale_value_info
cleared value_info wholesale, dropping those types, so ORT failed output type
inference for the custom op at model load, e.g.:

Node (Conv-2) Op (IdentityConv) output arg (X2) type inference failed

modelopt/onnx/utils.py: in clear_stale_value_info, preserve value_info
entries for outputs of trt.plugins-domain nodes (which ORT cannot re-derive);
clear the rest as before.
tests/gpu/onnx/quantization/test_plugin.py: add a regression test quantizing a
model with the built-in CustomSkipLayerNormPluginDynamic plugin at INT4 +
awq_clip (the opset >= 21 path), asserting the quantized model is produced and the
custom op survives.

Usage

python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --quantize_mode=int4 \
    --calibration_method=awq_clip \
    --trt_plugins=/path/to/plugin.so

Testing

pytest tests/gpu/onnx/quantization/test_plugin.py -k int4_awq — fails before the fix
(ORT type-inference error at calibration-session load) and passes after. The full
test_plugin.py (including the existing INT8 quantization and autocast cases) passes.
The example here also failed before this fix, now passes.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices.

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: N/A

Additional info

Fixing regression inserted by #1565

Summary by CodeRabbit

Bug Fixes
- Preserve metadata for TensorRT plugin outputs during cleanup and correctly reconcile output data types so custom plugin ops remain intact after optimization/quantization.
Tests
- Added a GPU ONNX regression test covering int4 quantization with AWQ calibration to ensure TensorRT plugins are retained.

copy-pr-bot · 2026-06-12T13:59:32Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-12T13:59:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4a47d19b-7980-48fe-af2d-e25367dacc20

📥 Commits

Reviewing files that changed from the base of the PR and between fbf9c20 and 1af3ace.

📒 Files selected for processing (2)

modelopt/onnx/utils.py
tests/gpu/onnx/quantization/test_plugin.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/gpu/onnx/quantization/test_plugin.py
modelopt/onnx/utils.py

📝 Walkthrough

Walkthrough

clear_stale_value_info() now preserves value_info entries for outputs of nodes in the trt.plugins domain and returns the sum of reconciled Cast output fixes plus cleared value_info entries. A new INT4 + AWQ quantization regression test verifies the TRT custom op remains after quantization.

Changes

TRT Plugin Value Info Handling

Layer / File(s)	Summary
TRT plugin value_info preservation in `clear_stale_value_info()` `modelopt/onnx/utils.py`	Updated docstring and implementation to build a `preserve_names` set from nodes with `domain == "trt.plugins"`, retain matching `value_info` entries, count cleared entries, and return `fixed_outputs + n_cleared`.
INT4 quantization regression test for TRT plugins `tests/gpu/onnx/quantization/test_plugin.py`	Added `test_trt_plugin_quantization_int4_awq` which quantizes the TRT plugin ONNX model with `quantize_mode="int4"` and `calibration_method="awq_clip"`, checks the `.quant.onnx` file is produced, and asserts `CustomSkipLayerNormPluginDynamic` remains in the quantized graph.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: preserving trt.plugins custom-op value_info entries in the clear_stale_value_info function, which is the core fix described in both file changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR diff vs main changes only modelopt/onnx/utils.py and tests/gpu/onnx/quantization/test_plugin.py; these files contain none of the forbidden patterns (torch.load weights_only=False, numpy allow_pi...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-06-12T14:08:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.94%. Comparing base (cc17f2c) to head (1af3ace).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1697      +/-   ##
==========================================
- Coverage   77.09%   75.94%   -1.16%     
==========================================
  Files         511      511              
  Lines       56168    56171       +3     
==========================================
- Hits        43302    42657     -645     
- Misses      12866    13514     +648

Flag	Coverage Δ
examples	`40.59% <0.00%> (-1.36%)`	⬇️
gpu	`57.70% <100.00%> (-0.61%)`	⬇️
unit	`54.34% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gcunhase · 2026-06-12T14:11:52Z

/claude review

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/onnx/utils.py`:
- Around line 1894-1907: Reformat the multi-line set comprehension that builds
preserve_names to satisfy ruff formatting rules: adjust line breaks/indentation
so the comprehension over model.graph.node (filtering node.domain ==
"trt.plugins" and iterating for out in node.output) adheres to the project's
style; you can run `ruff format modelopt/onnx/utils.py` or manually reflow the
comprehension lines around preserve_names until ruff stops flagging it (ensure
preserve_names, the generator expressions over model.graph.node, node.domain and
node.output remain unchanged logically).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6d438101-5eb6-4870-9104-1a0f21b7ace0

📥 Commits

Reviewing files that changed from the base of the PR and between cc17f2c and fbf9c20.

📒 Files selected for processing (2)

modelopt/onnx/utils.py
tests/gpu/onnx/quantization/test_plugin.py

…lue_info int4 quantization forces an opset upgrade to >= 21, where ORT performs type inference at InferenceSession load. Custom (trt.plugins) ops have no ORT type-inference function, so their output types only exist via the value_info populated by TensorRT inference. clear_stale_value_info cleared value_info wholesale, dropping those types, which made ORT fail output type inference for the custom op (e.g. "Op (IdentityConv) output arg (X2) type inference failed"). Preserve value_info entries for trt.plugins-domain node outputs; clear the rest as before. Add a GPU regression test that quantizes a model with the built-in CustomSkipLayerNormPluginDynamic plugin at int4 + awq_clip (no plugin .so compilation needed). Signed-off-by: Gwenaelle Cunha Sergio <gcunhasergio@nvidia.com>

claude

Claude review passed — no blocking issues found. LGTM

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

Focused, well-scoped fix for a real regression. The change preserves value_info entries for outputs of trt.plugins-domain nodes (whose types ORT cannot re-derive) while still clearing the rest, which restores the int4+AWQ path through opset≥21.

Verified:

Algorithm: preserve_names correctly collects outputs of trt.plugins-domain nodes; the del + extend(preserved) reorder respects protobuf RepeatedField semantics.
No regression for non-plugin models: when no plugin nodes exist, preserve_names is empty and behavior is identical to before — the existing CPU unit test (test_clear_stale_value_info) is unaffected.
Caller compatibility: quantize.py:124 uses the return as a truthy check, so the slight change in counting semantics (n_cleared instead of len(value_info)) is benign and the docstring is updated to match.
Domain literal: "trt.plugins" matches the existing producer (set_trt_plugin_domain in trt_utils.py:253) and consumer (get_opset_version in utils.py:706).
Regression test: new GPU test exercises the exact int4 + awq_clip + opset≥21 path that originally failed at ORT calibration-session load.

One non-blocking suggestion left inline: add a CPU-only unit-test parametrization to tests/unit/onnx/test_onnx_utils.py::test_clear_stale_value_info covering the plugin-preservation branch so direct regressions are caught without GPU/TRT.

cjluo-nv

Bot review — DM the bot to share feedback.

Focused bug fix (+37/-6, 2 files) for a regression introduced by #1565: clear_stale_value_info previously dropped value_info wholesale, removing output types of trt.plugins custom-op nodes that ORT cannot re-derive, causing INT4/awq (opset >= 21) calibration-session loads to fail type inference. The fix preserves value_info entries whose names match outputs of trt.plugins-domain nodes and clears the rest as before.

Correctness checks:

The preserve set is built from node.output of trt.plugins-domain nodes; only those entries survive the clear. Logic is correct and minimal.
Return value changed from fixed_outputs + n_vi (total value_info) to fixed_outputs + n_cleared; docstring updated to match. The return value is diagnostic only — verified the three callers (quantize.py, referencerunner.py) don't depend on it.
Existing CPU unit test test_clear_stale_value_info still passes: the stale-output+value_info case yields 1 reconciled + 1 cleared = 2 (no trt.plugins nodes → preserved=[]), and the no-op case yields 0.

Tests: adds a GPU regression test (test_trt_plugin_quantization_int4_awq) exercising the opset>=21 path and asserting the custom op survives. The new preserve branch is only covered by the GPU test (no fast CPU unit case), but combined with the existing CPU test this is reasonable coverage for a targeted fix.

No licensing changes (standard NVIDIA header only), no architectural/design-review trigger, well under size limits. The PR body includes an auto-generated CodeRabbit summary but no injection attempts.

gcunhase marked this pull request as ready for review June 12, 2026 14:04

gcunhase requested a review from a team as a code owner June 12, 2026 14:04

gcunhase requested a review from cjluo-nv June 12, 2026 14:04

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread modelopt/onnx/utils.py

gcunhase force-pushed the dev/agent/6287717_preserve_custom_op_value_info branch from fbf9c20 to 1af3ace Compare June 12, 2026 14:15

claude Bot approved these changes Jun 12, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 12, 2026

View reviewed changes

cjluo-nv approved these changes Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info#1697

[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info#1697
gcunhase wants to merge 1 commit into
NVIDIA:mainfrom
gcunhase:dev/agent/6287717_preserve_custom_op_value_info

gcunhase commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

copy-pr-bot Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

gcunhase commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

claude Bot left a comment

Uh oh!

cjluo-nv left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gcunhase commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional info

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gcunhase commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gcunhase commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

codecov Bot commented Jun 12, 2026 •

edited

Loading