Integrate Automated QDQ placement tool - Part 2 #702

willg-nv · 2025-12-17T06:29:51Z

What does this PR do?

Type of change: new feature

Overview: This PR integrates the automated Q/DQ placement tool into ModelOpt. It is part 2 of 4 of the changes.

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

This PR contains the following changes:

Implement RegionPattern to represent the topological structure of Regions. InsertionPoints are also defined on RegionPattern. Regions with the same pattern are optimized at the same time.
Implement the RegionSearch class to divide an ONNX graph into small regions.
The RegionSearch Python file also provides an entry point to print out the region structures.
Unit tests for the new classes.

Usage

python -m modelopt.onnx.quantization.autotune.region_search --model model.onnx --verbose

Example output:

    ├─ Region 212 (Level 0, Type: COMPOSITE)
    │  ├─ Direct nodes: 0
    │  ├─ Total nodes (recursive): 9
    │  ├─ Children: 1
    │  ├─ Inputs: 3 tensors
    │  │    - xxx
    │  │    - xxx
    │  │    - xxx
    │  └─ Outputs: 1 tensors
    │       - xxx
    │
    │  Child regions:
    │
      ├─ Region 209 (Level 2, Type: LEAF) 
      │  ├─ Direct nodes: 9
      │  ├─ Total nodes (recursive): 9
      │  ├─ Children: 0
      │  ├─ Inputs: 11 tensors
      │  │    - xxx

Testing

Implemented unit tests for the new classes. All unit tests pass locally.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: Yes
Did you add or update any necessary documentation?: No. Documentation changes will be in Part 4.
Did you update Changelog?: No. The changelog update will be included in Part 4.

Additional Information

copy-pr-bot · 2025-12-17T06:29:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

willg-nv · 2025-12-22T01:47:10Z

Hi @ajrasane, could you help review this PR? Thanks!

modelopt/onnx/quantization/autotune/common.py

modelopt/onnx/quantization/autotune/region_pattern.py

ajrasane · 2026-01-07T23:39:29Z

modelopt/onnx/quantization/autotune/region_pattern.py

+
+        return scheme
+
+    def format_tree(self, region: Region, graph: gs.Graph, indent: int = 0) -> str:


Could you provide a small example of how this tree looks?

I have added region search tests that print the tree structure. Below is an example:

    tests/unit/onnx/quantization/autotune/test_region_search.py::TestPrintTree::test_print_tree_top_down_builder
    ============================================================
    Region Tree Structure:
    ============================================================
    ├─ Region 0 (Level 0, Type: LEAF)
    │  ├─ Direct nodes: 2
    │  ├─ Total nodes (recursive): 2
    │  ├─ Children: 0
    │  ├─ Inputs: 1 tensors
    │  │    - input
    │  └─ Outputs: 1 tensors
    │       - output
    │
    │  Nodes in this region:
    │    - Node 0: Conv (name: conv)
    │    - Node 1: Relu (name: relu)
    |
    ├─ <child region if exists>
    ============================================================

Currently, the two-stage region partitioner only creates regions with depth <= 2.

ajrasane · 2026-01-08T00:01:35Z

modelopt/onnx/quantization/autotune/region_pattern.py

+
+        Signature formats:
+        - Empty region: "EMPTY"
+        - Leaf region: "Op1->Op2->Op3" or "Op1[params]->Op2[params]"


Can this be saved as LEAF(ops)?

Why is an additional LEAF needed? I think op1->op2->...opn is enough to represent an op sequence. Adding LEAF would make the signatures of complex regions too long.

modelopt/onnx/quantization/autotune/region_pattern.py

ajrasane · 2026-01-08T00:15:40Z

modelopt/onnx/quantization/autotune/region_pattern.py

+            region_node_indices: Set of node indices in the current region
+
+        Returns:
+            Signature string examples:


How will the signatures of custom ops look? Could you provide an example?

Example res-block signature:

COMPOSITE(Conv[kernel_shape=3x3]->BatchNormalization->Relu->Conv[kernel_shape=3x3]->BatchNormalization+Conv[kernel_shape=1x1]->BatchNormalization+Add->Relu)

Graph structure:

                         Input
                           │
            ┌──────────────┴──────────────┐
            │                             │
            ▼                             ▼
    ┌───────────────┐             ┌───────────────┐
    │  Conv (3x3)   │             │  Conv (1x1)   │  (projection)
    └───────────────┘             └───────────────┘
            │                             │
            ▼                             ▼
    ┌───────────────┐             ┌───────────────┐
    │   BatchNorm   │             │   BatchNorm   │
    └───────────────┘             └───────────────┘
            │                             │
            ▼                             │
    ┌───────────────┐                     │
    │     Relu      │                     │
    └───────────────┘                     │
            │                             │
            ▼                             │
    ┌───────────────┐                     │
    │  Conv (3x3)   │                     │
    └───────────────┘                     │
            │                             │
            ▼                             │
    ┌───────────────┐                     │
    │   BatchNorm   │                     │
    └───────────────┘                     │
            │                             │
            └──────────────┬──────────────┘
                           ▼
                      ┌─────────┐
                      │   Add   │
                      └─────────┘
                           │
                           ▼
                      ┌─────────┐
                      │  Relu   │
                      └─────────┘

For custom plugin nodes, their names will also be added to the signature.
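To make the signature format concrete, here is a minimal sketch of how a linear signature such as `Conv[kernel_shape=3x3]->BatchNormalization->Relu` could be assembled. The `Node` dataclass and the `node_token` and `leaf_signature` helpers are hypothetical names used only for illustration; they are not the `RegionPattern` API from this PR, which reads ops and attributes from the ONNX graph.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not the PR's data model)."""

    op: str
    attrs: dict = field(default_factory=dict)


def node_token(node: Node) -> str:
    """Render one node as 'Op' or 'Op[key=AxB,...]'."""
    if not node.attrs:
        return node.op
    params = ",".join(
        f"{key}={'x'.join(str(v) for v in value)}"
        for key, value in sorted(node.attrs.items())
    )
    return f"{node.op}[{params}]"


def leaf_signature(nodes: list[Node]) -> str:
    """Join node tokens with '->'; an empty region maps to 'EMPTY'."""
    if not nodes:
        return "EMPTY"
    return "->".join(node_token(n) for n in nodes)


main_branch = [
    Node("Conv", {"kernel_shape": [3, 3]}),
    Node("BatchNormalization"),
    Node("Relu"),
]
signature = leaf_signature(main_branch)
print(signature)

# A custom plugin node (the op name here is made up) contributes its own name.
plugin_signature = leaf_signature([Node("MyCustomPlugin"), Node("Relu")])
print(plugin_signature)
```

Because the signature is a plain string, regions that share it can be grouped with an ordinary dictionary keyed by signature.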

modelopt/onnx/quantization/autotune/region_pattern.py

modelopt/onnx/quantization/autotune/__init__.py

modelopt/onnx/quantization/autotune/region_pattern.py

modelopt/onnx/quantization/autotune/region_search.py

tests/unit/onnx/quantization/autotune/test_pattern_cache.py

gcunhase · 2026-01-09T00:37:31Z

modelopt/onnx/quantization/qdq_utils.py

+    quantized_tensors = set()
+
+    for node in onnx_model.graph.node:
+        if node.op_type == "QuantizeLinear":


If --dq_only is enabled, there may only be a DQ node indicating that a tensor is quantized. Please verify that those cases are supported by this function.
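One possible way to cover the DQ-only case is sketched below. It is an assumption about what the function should report, not the code in `qdq_utils.py`: the sketch records the input of every QuantizeLinear node, plus the input of any DequantizeLinear node that is not fed by a QuantizeLinear node. The `Node` dataclass is a hypothetical stand-in for `onnx.NodeProto` that keeps only the `op_type`, `input`, and `output` fields.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for onnx.NodeProto (op_type, input, output)."""

    op_type: str
    input: list
    output: list


def get_quantized_tensors(nodes: list[Node]) -> set[str]:
    """Collect quantized tensor names, with or without a Q node."""
    q_outputs = {n.output[0] for n in nodes if n.op_type == "QuantizeLinear"}
    quantized = set()
    for node in nodes:
        if node.op_type == "QuantizeLinear":
            # Regular Q/DQ pair: the Q input is the tensor being quantized.
            quantized.add(node.input[0])
        elif node.op_type == "DequantizeLinear" and node.input[0] not in q_outputs:
            # DQ-only case: no Q node exists, so the DQ input is already
            # stored in quantized form (for example a pre-quantized weight).
            quantized.add(node.input[0])
    return quantized


nodes = [
    Node("QuantizeLinear", ["act", "act_scale", "act_zp"], ["act_q"]),
    Node("DequantizeLinear", ["act_q", "act_scale", "act_zp"], ["act_dq"]),
    Node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"]),
]
result = get_quantized_tensors(nodes)
print(sorted(result))
```

Whether the DQ-only entry should be the DQ input or the DQ output depends on how callers use the set, so that choice would need to be checked against the real function.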

See

Model-Optimizer/modelopt/onnx/quantization/__main__.py

Line 210 in 307fe71

"--dq_only",

Signed-off-by: Will Guo <willg@nvidia.com>

ajrasane · 2026-01-12T23:45:44Z

modelopt/onnx/quantization/autotune/region_search.py

+from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices
+
+# Module logger
+logger = logging.getLogger(__name__)


Could you use the logger created here for all the logging?
https://github.com/NVIDIA/Model-Optimizer/blob/727da95a9188aaeef6872a61acae9f1ffae844f6/modelopt/onnx/logging_config.py

ajrasane · 2026-01-13T00:10:36Z

modelopt/onnx/quantization/autotune/region_search.py

+        divergent_outputs = [
+            out.name for out in node.outputs if self._is_tensor_divergent(out.name)
+        ]
+        is_divergent = len(divergent_outputs) > 0


This can be simplified to:

is_divergent = any(self._is_tensor_divergent(out.name) for out in node.outputs)

ajrasane · 2026-01-13T00:15:39Z

modelopt/onnx/quantization/autotune/region_search.py

+                for next_node_idx in self.tensor_users_map[output.name]:
+                    if next_node_idx not in reachable:
+                        reachable[next_node_idx] = distance + 1
+                        queue.append((next_node_idx, distance + 1))


nit: can we skip adding the nodes to the queue if the distance + 1 < maxsteps?
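One reading of this nit is that a node discovered at `max_steps` still needs to be recorded but never needs to be expanded, so it does not have to enter the queue. The sketch below shows that variant of a bounded breadth-first search. The function name, the `node_outputs` argument, and the choice to include the start node at distance 0 are assumptions for illustration and may not match `region_search.py`.

```python
from collections import deque


def forward_reachable(start, node_outputs, tensor_users_map, max_steps):
    """Return {node_idx: distance} for nodes within max_steps of start."""
    reachable = {start: 0}
    # With max_steps == 0 there is nothing to expand.
    queue = deque([(start, 0)]) if max_steps > 0 else deque()
    while queue:
        node_idx, distance = queue.popleft()
        for tensor_name in node_outputs[node_idx]:
            for next_node_idx in tensor_users_map.get(tensor_name, []):
                if next_node_idx not in reachable:
                    reachable[next_node_idx] = distance + 1
                    # Nodes already at max_steps are never expanded,
                    # so only enqueue the ones that can still grow.
                    if distance + 1 < max_steps:
                        queue.append((next_node_idx, distance + 1))
    return reachable


# Linear chain 0 -> 1 -> 2 -> 3, described by made-up tensor names.
node_outputs = {0: ["t0"], 1: ["t1"], 2: ["t2"], 3: ["t3"]}
tensor_users_map = {"t0": [1], "t1": [2], "t2": [3]}
within_two = forward_reachable(0, node_outputs, tensor_users_map, max_steps=2)
print(within_two)
```

The result is the same as enqueueing every node and checking the distance on pop; the only difference is that the frontier nodes skip the queue.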

ajrasane · 2026-01-13T00:18:28Z

modelopt/onnx/quantization/autotune/region_search.py

+        2. All nodes between divergence and convergence
+
+        **Algorithm:**
+        1. Identify all branches from the divergent node


Is it a mandatory criterion that a region must start with a divergent node and end with a convergent node?

ajrasane · 2026-01-13T00:33:14Z

modelopt/onnx/quantization/autotune/region_search.py

+
+            # Share the tensor users map from Phase 1 to avoid recomputation.
+            # This map is expensive to build and is shared across all refinements.
+            region_builder.tensor_users_map = region_partitioner.tensor_users_map


Can we also share the forward_reachable_nodes map from Phase 1 to avoid recomputation?

ajrasane · 2026-01-13T00:38:29Z

modelopt/onnx/quantization/autotune/region_search.py

+        seen: set[int] = set()
+        unique_branches: list[int] = []
+        for branch_idx in branches:
+            if branch_idx not in seen:
+                seen.add(branch_idx)
+                unique_branches.append(branch_idx)
+        branches = unique_branches


This can be simplified to:

branches = list(dict.fromkeys(branches))
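A quick check that the two forms agree, using made-up branch indices. The one-liner relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward; `list(set(branches))` would not keep the order.

```python
branches = [4, 7, 4, 2, 7]

# Explicit loop from the diff.
seen: set[int] = set()
unique_branches: list[int] = []
for branch_idx in branches:
    if branch_idx not in seen:
        seen.add(branch_idx)
        unique_branches.append(branch_idx)

# One-liner: dict keys are unique and keep first-seen order.
deduped = list(dict.fromkeys(branches))
print(deduped)
```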

ajrasane · 2026-01-13T00:39:43Z

modelopt/onnx/quantization/autotune/region_search.py

+        branch_reachable: list[dict[int, int]] = []
+        for branch_idx in branches:
+            reachable = self.forward_reachable_nodes_map.get(branch_idx, {})
+            branch_reachable.append(reachable)


Can be simplified to:

branch_reachable = [self.forward_reachable_nodes_map.get(b, {}) for b in branches]

ajrasane · 2026-01-13T00:42:13Z

modelopt/onnx/quantization/autotune/region_search.py

+        common_nodes = set(branch_reachable[0].keys())
+        for reachable in branch_reachable[1:]:
+            common_nodes.intersection_update(reachable.keys())


Can this be simplified to:

common_nodes = set.intersection(*[set(r.keys()) for r in branch_reachable]) - {node_idx}
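Two details are worth checking before adopting the one-liner. First, `set.intersection(*[...])` raises `TypeError` when the list is empty, so it needs at least one branch. Second, subtracting `{node_idx}` is a behavior change relative to the loop shown in the diff. The sketch below wraps the expression with an empty-list guard; the `common_reachable` name, the sample distances, and the rule for picking a convergence node are assumptions for illustration only.

```python
def common_reachable(branch_reachable, node_idx):
    """Nodes reachable from every branch, excluding the divergent node."""
    if not branch_reachable:
        # set.intersection() needs at least one set argument.
        return set()
    return set.intersection(*(set(r) for r in branch_reachable)) - {node_idx}


# Two branches leaving node 1; values are distances from each branch head.
branch_reachable = [
    {3: 1, 5: 2, 6: 3},
    {4: 1, 5: 2, 6: 3},
]
common = common_reachable(branch_reachable, node_idx=1)

# Assumed selection rule: take the common node whose worst-case distance
# from any branch is smallest.
convergence = min(common, key=lambda n: max(r[n] for r in branch_reachable))
print(common, convergence)
```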

willg-nv requested a review from a team as a code owner December 17, 2025 06:29

willg-nv requested a review from ajrasane December 17, 2025 06:29

willg-nv changed the title ~~Dev willg integrate auto qdq placement part2~~ Integrate Automated QDQ placement tool - Part 2 Dec 17, 2025

This was referenced Dec 17, 2025

Integrate Automated QDQ placement tool - Part 3 #703

Open

Integrate Automated QDQ placement tool - Part 4 #704

Open

Integrate Automated QDQ placement tool - Part 1 #701

Open

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from 3f7ff31 to d3a6765 Compare December 31, 2025 02:16

ajrasane reviewed Jan 8, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 616285d to c95939a Compare January 8, 2026 08:35

gcunhase reviewed Jan 8, 2026

modelopt/onnx/quantization/autotune/__init__.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_pattern.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_pattern.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_search.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

modelopt/onnx/quantization/autotune/region_search.py (resolved)

gcunhase reviewed Jan 9, 2026

tests/unit/onnx/quantization/autotune/test_pattern_cache.py (outdated, resolved)

gcunhase reviewed Jan 9, 2026

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch 2 times, most recently from 0c4f114 to 4468ca2 Compare January 9, 2026 05:00

Integrate Automated QDQ placement tool - part 2

bc87ca7

Signed-off-by: Will Guo <willg@nvidia.com>

willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part2 branch from 4468ca2 to bc87ca7 Compare January 9, 2026 05:02

ajrasane reviewed Jan 13, 2026


3 participants
