-
Notifications
You must be signed in to change notification settings - Fork 237
Integrate Automated QDQ placement tool - Part 2 #702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Integrate Automated QDQ placement tool - Part 2 #702
Conversation
|
Hi @ajrasane , could you help me review this PR, thanks! |
3f7ff31 to
d3a6765
Compare
|
|
||
| return scheme | ||
|
|
||
| def format_tree(self, region: Region, graph: gs.Graph, indent: int = 0) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide a small example of how this tree looks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added tests for region search to print tree structure, below is an example:
tests/unit/onnx/quantization/autotune/test_region_search.py::TestPrintTree::test_print_tree_top_down_builder
============================================================
Region Tree Structure:
============================================================
├─ Region 0 (Level 0, Type: LEAF)
│ ├─ Direct nodes: 2
│ ├─ Total nodes (recursive): 2
│ ├─ Children: 0
│ ├─ Inputs: 1 tensors
│ │ - input
│ └─ Outputs: 1 tensors
│ - output
│
│ Nodes in this region:
│ - Node 0: Conv (name: conv)
│ - Node 1: Relu (name: relu)
|
├─ <child region if exists>
============================================================
Currently, the 2 stage region partitioner only creates regions with depth <= 2.
| Signature formats: | ||
| - Empty region: "EMPTY" | ||
| - Leaf region: "Op1->Op2->Op3" or "Op1[params]->Op2[params]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be saved as LEAF(ops)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why additional LEAF is needed? I think op1->op2->...opn is okay to represent op sequence. Adding LEAF would make complex region too long.
| region_node_indices: Set of node indices in the current region | ||
| Returns: | ||
| Signature string examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will the signatures of custom ops look? Could you provide an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example res-block signature:
COMPOSITE(Conv[kernel_shape=3x3]->BatchNormalization->Relu->Conv[kernel_shape=3x3]->BatchNormalization+Conv[kernel_shape=1x1]->BatchNormalization+Add->Relu)
graph structure:
Input
│
┌──────────────┴──────────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Conv (3x3) │ │ Conv (1x1) │ (projection)
└───────────────┘ └───────────────┘
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ BatchNorm │ │ BatchNorm │
└───────────────┘ └───────────────┘
│ │
▼ │
┌───────────────┐ │
│ Relu │ │
└───────────────┘ │
│ │
▼ │
┌───────────────┐ │
│ Conv (3x3) │ │
└───────────────┘ │
│ │
▼ │
┌───────────────┐ │
│ BatchNorm │ │
└───────────────┘ │
│ │
└──────────────┬──────────────┘
▼
┌─────────┐
│ Add │
└─────────┘
│
▼
┌─────────┐
│ Relu │
└─────────┘
For custom plugin node, thier name will also be added to the signature.
616285d to
c95939a
Compare
| quantized_tensors = set() | ||
|
|
||
| for node in onnx_model.graph.node: | ||
| if node.op_type == "QuantizeLinear": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If --dq_only is enabled, there may only be the DQ node indicating that a tensor is being quantized. Please verify that those cases are supporting with this function.
See
| "--dq_only", |
0c4f114 to
4468ca2
Compare
Signed-off-by: Will Guo <willg@nvidia.com>
4468ca2 to
bc87ca7
Compare
| from modelopt.onnx.quantization.graph_utils import get_tensor_consumer_node_indices | ||
|
|
||
| # Module logger | ||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the logger created here for all the logging?
https://github.com/NVIDIA/Model-Optimizer/blob/727da95a9188aaeef6872a61acae9f1ffae844f6/modelopt/onnx/logging_config.py
| divergent_outputs = [ | ||
| out.name for out in node.outputs if self._is_tensor_divergent(out.name) | ||
| ] | ||
| is_divergent = len(divergent_outputs) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to:
is_divergent = any(self._is_tensor_divergent(out.name) for out in node.outputs)| for next_node_idx in self.tensor_users_map[output.name]: | ||
| if next_node_idx not in reachable: | ||
| reachable[next_node_idx] = distance + 1 | ||
| queue.append((next_node_idx, distance + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we skip adding the nodes to the queue if the distance + 1 < maxsteps?
| 2. All nodes between divergence and convergence | ||
|
|
||
| **Algorithm:** | ||
| 1. Identify all branches from the divergent node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a mandatory criteria that a region must start with a divergent node and end with a convergent node?
|
|
||
| # Share the tensor users map from Phase 1 to avoid recomputation. | ||
| # This map is expensive to build and is shared across all refinements. | ||
| region_builder.tensor_users_map = region_partitioner.tensor_users_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also share the forward_reachable_nodes map form Phase 1 to avoid recomputation?
| seen: set[int] = set() | ||
| unique_branches: list[int] = [] | ||
| for branch_idx in branches: | ||
| if branch_idx not in seen: | ||
| seen.add(branch_idx) | ||
| unique_branches.append(branch_idx) | ||
| branches = unique_branches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to:
branches = list(dict.fromkeys(branches))| branch_reachable: list[dict[int, int]] = [] | ||
| for branch_idx in branches: | ||
| reachable = self.forward_reachable_nodes_map.get(branch_idx, {}) | ||
| branch_reachable.append(reachable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified to:
branch_reachable = [self.forward_reachable_nodes_map.get(b, {}) for b in branches]| common_nodes = set(branch_reachable[0].keys()) | ||
| for reachable in branch_reachable[1:]: | ||
| common_nodes.intersection_update(reachable.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simplified to:
common_nodes = set.intersection(*[set(r.keys()) for r in branch_reachable]) - {node_idx}
What does this PR do?
Type of change: new feature
Overview: This PR integrate automated Q/DQ placement tool to ModelOpt. This PR is 2/4 parts of the cahnges.
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
This PR contains the following changes:
Usage
Example output:
Testing
Implemented unit tests for new classes. All unit tests could get pass locally.
Before your PR is "Ready for review"
Additional Information