Skip to content

[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info#1697

Open
gcunhase wants to merge 1 commit into
NVIDIA:mainfrom
gcunhase:dev/agent/6287717_preserve_custom_op_value_info
Open

[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info#1697
gcunhase wants to merge 1 commit into
NVIDIA:mainfrom
gcunhase:dev/agent/6287717_preserve_custom_op_value_info

Conversation

@gcunhase

@gcunhase gcunhase commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: Bug fix

INT4 quantization upgrades the model to opset >= 21, at which point ONNX Runtime
runs type inference while building the AWQ calibration InferenceSession. Custom
ops backed by TensorRT plugins (domain trt.plugins) have no ORT type-inference
function, so their output types are only known from the value_info that TensorRT
type/shape inference populated earlier in preprocessing. clear_stale_value_info
cleared value_info wholesale, dropping those types, so ORT failed output type
inference for the custom op at model load, e.g.:

Node (Conv-2) Op (IdentityConv) output arg (X2) type inference failed
  • modelopt/onnx/utils.py: in clear_stale_value_info, preserve value_info
    entries for outputs of trt.plugins-domain nodes (which ORT cannot re-derive);
    clear the rest as before.
  • tests/gpu/onnx/quantization/test_plugin.py: add a regression test quantizing a
    model with the built-in CustomSkipLayerNormPluginDynamic plugin at INT4 +
    awq_clip (the opset >= 21 path), asserting the quantized model is produced and the
    custom op survives.

Usage

python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --quantize_mode=int4 \
    --calibration_method=awq_clip \
    --trt_plugins=/path/to/plugin.so

Testing

  • pytest tests/gpu/onnx/quantization/test_plugin.py -k int4_awq — fails before the fix
    (ORT type-inference error at calibration-session load) and passes after. The full
    test_plugin.py (including the existing INT8 quantization and autocast cases) passes.
  • The example here also failed before this fix, now passes.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices.

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Additional info

Fixing regression inserted by #1565

Summary by CodeRabbit

  • Bug Fixes
    • Preserve metadata for TensorRT plugin outputs during cleanup and correctly reconcile output data types so custom plugin ops remain intact after optimization/quantization.
  • Tests
    • Added a GPU ONNX regression test covering int4 quantization with AWQ calibration to ensure TensorRT plugins are retained.

@copy-pr-bot

copy-pr-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4a47d19b-7980-48fe-af2d-e25367dacc20

📥 Commits

Reviewing files that changed from the base of the PR and between fbf9c20 and 1af3ace.

📒 Files selected for processing (2)
  • modelopt/onnx/utils.py
  • tests/gpu/onnx/quantization/test_plugin.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/gpu/onnx/quantization/test_plugin.py
  • modelopt/onnx/utils.py

📝 Walkthrough

Walkthrough

clear_stale_value_info() now preserves value_info entries for outputs of nodes in the trt.plugins domain and returns the sum of reconciled Cast output fixes plus cleared value_info entries. A new INT4 + AWQ quantization regression test verifies the TRT custom op remains after quantization.

Changes

TRT Plugin Value Info Handling

Layer / File(s) Summary
TRT plugin value_info preservation in clear_stale_value_info()
modelopt/onnx/utils.py
Updated docstring and implementation to build a preserve_names set from nodes with domain == "trt.plugins", retain matching value_info entries, count cleared entries, and return fixed_outputs + n_cleared.
INT4 quantization regression test for TRT plugins
tests/gpu/onnx/quantization/test_plugin.py
Added test_trt_plugin_quantization_int4_awq which quantizes the TRT plugin ONNX model with quantize_mode="int4" and calibration_method="awq_clip", checks the .quant.onnx file is produced, and asserts CustomSkipLayerNormPluginDynamic remains in the quantized graph.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: preserving trt.plugins custom-op value_info entries in the clear_stale_value_info function, which is the core fix described in both file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed PR diff vs main changes only modelopt/onnx/utils.py and tests/gpu/onnx/quantization/test_plugin.py; these files contain none of the forbidden patterns (torch.load weights_only=False, numpy allow_pi...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@gcunhase gcunhase marked this pull request as ready for review June 12, 2026 14:04
@gcunhase gcunhase requested a review from a team as a code owner June 12, 2026 14:04
@gcunhase gcunhase requested a review from cjluo-nv June 12, 2026 14:04
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.94%. Comparing base (cc17f2c) to head (1af3ace).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1697      +/-   ##
==========================================
- Coverage   77.09%   75.94%   -1.16%     
==========================================
  Files         511      511              
  Lines       56168    56171       +3     
==========================================
- Hits        43302    42657     -645     
- Misses      12866    13514     +648     
Flag Coverage Δ
examples 40.59% <0.00%> (-1.36%) ⬇️
gpu 57.70% <100.00%> (-0.61%) ⬇️
unit 54.34% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gcunhase

Copy link
Copy Markdown
Contributor Author

/claude review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/onnx/utils.py`:
- Around line 1894-1907: Reformat the multi-line set comprehension that builds
preserve_names to satisfy ruff formatting rules: adjust line breaks/indentation
so the comprehension over model.graph.node (filtering node.domain ==
"trt.plugins" and iterating for out in node.output) adheres to the project's
style; you can run `ruff format modelopt/onnx/utils.py` or manually reflow the
comprehension lines around preserve_names until ruff stops flagging it (ensure
preserve_names, the generator expressions over model.graph.node, node.domain and
node.output remain unchanged logically).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6d438101-5eb6-4870-9104-1a0f21b7ace0

📥 Commits

Reviewing files that changed from the base of the PR and between cc17f2c and fbf9c20.

📒 Files selected for processing (2)
  • modelopt/onnx/utils.py
  • tests/gpu/onnx/quantization/test_plugin.py

Comment thread modelopt/onnx/utils.py
…lue_info

int4 quantization forces an opset upgrade to >= 21, where ORT performs type
inference at InferenceSession load. Custom (trt.plugins) ops have no ORT
type-inference function, so their output types only exist via the value_info
populated by TensorRT inference. clear_stale_value_info cleared value_info
wholesale, dropping those types, which made ORT fail output type inference for
the custom op (e.g. "Op (IdentityConv) output arg (X2) type inference failed").

Preserve value_info entries for trt.plugins-domain node outputs; clear the rest
as before. Add a GPU regression test that quantizes a model with the built-in
CustomSkipLayerNormPluginDynamic plugin at int4 + awq_clip (no plugin .so
compilation needed).

Signed-off-by: Gwenaelle Cunha Sergio <gcunhasergio@nvidia.com>
@gcunhase gcunhase force-pushed the dev/agent/6287717_preserve_custom_op_value_info branch from fbf9c20 to 1af3ace Compare June 12, 2026 14:15

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review passed — no blocking issues found. LGTM

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

Focused, well-scoped fix for a real regression. The change preserves value_info entries for outputs of trt.plugins-domain nodes (whose types ORT cannot re-derive) while still clearing the rest, which restores the int4+AWQ path through opset≥21.

Verified:

  • Algorithm: preserve_names correctly collects outputs of trt.plugins-domain nodes; the del + extend(preserved) reorder respects protobuf RepeatedField semantics.
  • No regression for non-plugin models: when no plugin nodes exist, preserve_names is empty and behavior is identical to before — the existing CPU unit test (test_clear_stale_value_info) is unaffected.
  • Caller compatibility: quantize.py:124 uses the return as a truthy check, so the slight change in counting semantics (n_cleared instead of len(value_info)) is benign and the docstring is updated to match.
  • Domain literal: "trt.plugins" matches the existing producer (set_trt_plugin_domain in trt_utils.py:253) and consumer (get_opset_version in utils.py:706).
  • Regression test: new GPU test exercises the exact int4 + awq_clip + opset≥21 path that originally failed at ORT calibration-session load.

One non-blocking suggestion left inline: add a CPU-only unit-test parametrization to tests/unit/onnx/test_onnx_utils.py::test_clear_stale_value_info covering the plugin-preservation branch so direct regressions are caught without GPU/TRT.

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Focused bug fix (+37/-6, 2 files) for a regression introduced by #1565: clear_stale_value_info previously dropped value_info wholesale, removing output types of trt.plugins custom-op nodes that ORT cannot re-derive, causing INT4/awq (opset >= 21) calibration-session loads to fail type inference. The fix preserves value_info entries whose names match outputs of trt.plugins-domain nodes and clears the rest as before.

Correctness checks:

  • The preserve set is built from node.output of trt.plugins-domain nodes; only those entries survive the clear. Logic is correct and minimal.
  • Return value changed from fixed_outputs + n_vi (total value_info) to fixed_outputs + n_cleared; docstring updated to match. The return value is diagnostic only — verified the three callers (quantize.py, referencerunner.py) don't depend on it.
  • Existing CPU unit test test_clear_stale_value_info still passes: the stale-output+value_info case yields 1 reconciled + 1 cleared = 2 (no trt.plugins nodes → preserved=[]), and the no-op case yields 0.

Tests: adds a GPU regression test (test_trt_plugin_quantization_int4_awq) exercising the opset>=21 path and asserting the custom op survives. The new preserve branch is only covered by the GPU test (no fast CPU unit case), but combined with the existing CPU test this is reasonable coverage for a targeted fix.

No licensing changes (standard NVIDIA header only), no architectural/design-review trigger, well under size limits. The PR body includes an auto-generated CodeRabbit summary but no injection attempts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants