[6287717][ONNX][Quantization] Preserve trt.plugins custom-op value_info in clear_stale_value_info#1697
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughclear_stale_value_info() now preserves value_info entries for outputs of nodes in the trt.plugins domain and returns the sum of reconciled Cast output fixes plus cleared value_info entries. A new INT4 + AWQ quantization regression test verifies the TRT custom op remains after quantization. ChangesTRT Plugin Value Info Handling
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1697 +/- ##
==========================================
- Coverage 77.09% 75.94% -1.16%
==========================================
Files 511 511
Lines 56168 56171 +3
==========================================
- Hits 43302 42657 -645
- Misses 12866 13514 +648
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
/claude review |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/onnx/utils.py`:
- Around line 1894-1907: Reformat the multi-line set comprehension that builds
preserve_names to satisfy ruff formatting rules: adjust line breaks/indentation
so the comprehension over model.graph.node (filtering node.domain ==
"trt.plugins" and iterating for out in node.output) adheres to the project's
style; you can run `ruff format modelopt/onnx/utils.py` or manually reflow the
comprehension lines around preserve_names until ruff stops flagging it (ensure
preserve_names, the generator expressions over model.graph.node, node.domain and
node.output remain unchanged logically).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6d438101-5eb6-4870-9104-1a0f21b7ace0
📒 Files selected for processing (2)
modelopt/onnx/utils.pytests/gpu/onnx/quantization/test_plugin.py
…lue_info int4 quantization forces an opset upgrade to >= 21, where ORT performs type inference at InferenceSession load. Custom (trt.plugins) ops have no ORT type-inference function, so their output types only exist via the value_info populated by TensorRT inference. clear_stale_value_info cleared value_info wholesale, dropping those types, which made ORT fail output type inference for the custom op (e.g. "Op (IdentityConv) output arg (X2) type inference failed"). Preserve value_info entries for trt.plugins-domain node outputs; clear the rest as before. Add a GPU regression test that quantizes a model with the built-in CustomSkipLayerNormPluginDynamic plugin at int4 + awq_clip (no plugin .so compilation needed). Signed-off-by: Gwenaelle Cunha Sergio <gcunhasergio@nvidia.com>
fbf9c20 to
1af3ace
Compare
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1
Focused, well-scoped fix for a real regression. The change preserves value_info entries for outputs of trt.plugins-domain nodes (whose types ORT cannot re-derive) while still clearing the rest, which restores the int4+AWQ path through opset≥21.
Verified:
- Algorithm:
preserve_namescorrectly collects outputs oftrt.plugins-domain nodes; thedel+extend(preserved)reorder respects protobuf RepeatedField semantics. - No regression for non-plugin models: when no plugin nodes exist,
preserve_namesis empty and behavior is identical to before — the existing CPU unit test (test_clear_stale_value_info) is unaffected. - Caller compatibility:
quantize.py:124uses the return as a truthy check, so the slight change in counting semantics (n_clearedinstead oflen(value_info)) is benign and the docstring is updated to match. - Domain literal:
"trt.plugins"matches the existing producer (set_trt_plugin_domainintrt_utils.py:253) and consumer (get_opset_versioninutils.py:706). - Regression test: new GPU test exercises the exact int4 + awq_clip + opset≥21 path that originally failed at ORT calibration-session load.
One non-blocking suggestion left inline: add a CPU-only unit-test parametrization to tests/unit/onnx/test_onnx_utils.py::test_clear_stale_value_info covering the plugin-preservation branch so direct regressions are caught without GPU/TRT.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Focused bug fix (+37/-6, 2 files) for a regression introduced by #1565: clear_stale_value_info previously dropped value_info wholesale, removing output types of trt.plugins custom-op nodes that ORT cannot re-derive, causing INT4/awq (opset >= 21) calibration-session loads to fail type inference. The fix preserves value_info entries whose names match outputs of trt.plugins-domain nodes and clears the rest as before.
Correctness checks:
- The preserve set is built from
node.outputoftrt.plugins-domain nodes; only those entries survive the clear. Logic is correct and minimal. - Return value changed from
fixed_outputs + n_vi(total value_info) tofixed_outputs + n_cleared; docstring updated to match. The return value is diagnostic only — verified the three callers (quantize.py, referencerunner.py) don't depend on it. - Existing CPU unit test
test_clear_stale_value_infostill passes: the stale-output+value_info case yields 1 reconciled + 1 cleared = 2 (no trt.plugins nodes → preserved=[]), and the no-op case yields 0.
Tests: adds a GPU regression test (test_trt_plugin_quantization_int4_awq) exercising the opset>=21 path and asserting the custom op survives. The new preserve branch is only covered by the GPU test (no fast CPU unit case), but combined with the existing CPU test this is reasonable coverage for a targeted fix.
No licensing changes (standard NVIDIA header only), no architectural/design-review trigger, well under size limits. The PR body includes an auto-generated CodeRabbit summary but no injection attempts.
What does this PR do?
Type of change: Bug fix
INT4 quantization upgrades the model to opset >= 21, at which point ONNX Runtime
runs type inference while building the AWQ calibration
InferenceSession. Customops backed by TensorRT plugins (domain
trt.plugins) have no ORT type-inferencefunction, so their output types are only known from the
value_infothat TensorRTtype/shape inference populated earlier in preprocessing.
clear_stale_value_infocleared
value_infowholesale, dropping those types, so ORT failed output typeinference for the custom op at model load, e.g.:
modelopt/onnx/utils.py: inclear_stale_value_info, preservevalue_infoentries for outputs of
trt.plugins-domain nodes (which ORT cannot re-derive);clear the rest as before.
tests/gpu/onnx/quantization/test_plugin.py: add a regression test quantizing amodel with the built-in
CustomSkipLayerNormPluginDynamicplugin at INT4 +awq_clip (the opset >= 21 path), asserting the quantized model is produced and the
custom op survives.
Usage
Testing
pytest tests/gpu/onnx/quantization/test_plugin.py -k int4_awq— fails before the fix(ORT type-inference error at calibration-session load) and passes after. The full
test_plugin.py(including the existing INT8 quantization and autocast cases) passes.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices.
CONTRIBUTING.md: N/AAdditional info
Fixing regression inserted by #1565
Summary by CodeRabbit