
Add Qwen3-VL model support + multi-image input support in Qwen VL family#2345

Merged
xiaoyu-work merged 30 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-vl
Mar 20, 2026

Conversation

Contributor

@hanbitmyths commented Mar 4, 2026

This PR adds support for exporting and optimizing Qwen3-VL (and Qwen2.5-VL) vision-language models through Olive, including new ONNX graph surgery passes, 8-bit quantization enhancements, and a cast chain elimination pass.

  • Add Qwen3-VL / Qwen2.5-VL model export support via Model Builder and torch export
  • New pass: CastChainElimination removes redundant Cast→Cast chains (e.g., fp32→fp16→fp32) by collapsing them into a single Cast or eliminating them entirely when source and target types match.
  • GemmToMatMulAdd graph surgery converts Gemm nodes to MatMul+Add for broader runtime compatibility.
  • ReciprocalMulToDiv graph surgery fuses Reciprocal→Mul patterns into a single Div node.
  • DeduplicateSubgraphInitializers graph surgery merges duplicate initializers that share identical tensor data.
  • DeduplicateNodes graph surgery removes duplicate nodes that have identical op_type, attributes, and inputs.
  • Add 8-bit integer Gather quantization support to the RTN quantization pass.
  • Skip quantization of unused initializers.
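
The Gemm rewrite in the list above rests on the identity Gemm(A, B, C) = alpha·(A·B) + beta·C; with the default alpha = beta = 1 a Gemm node is exactly a MatMul followed by an Add, which is why the split is numerically safe. A minimal pure-Python sketch of that equivalence (illustrative only, not Olive's implementation):

```python
def matmul(a, b):
    # naive matrix multiply over 2-D lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, c):
    return [[a[i][j] + c[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def gemm(a, b, c, alpha=1.0, beta=1.0):
    # ONNX Gemm semantics (transA/transB omitted for brevity)
    ab = matmul(a, b)
    return [[alpha * ab[i][j] + beta * c[i][j] for j in range(len(ab[0]))]
            for i in range(len(ab))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.5, 0.5], [0.5, 0.5]]

# with alpha == beta == 1, Gemm decomposes into MatMul + Add
assert gemm(A, B, C) == add(matmul(A, B), C)
```

Runtimes that lack a fused Gemm kernel (or restrict its attribute combinations) can always run the MatMul+Add pair, which is what makes the rewrite useful for compatibility.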

- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
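
For context on the 8-bit Gather change: RTN (round-to-nearest) quantization maps each weight block to integers with one scale per block, and the Gather path applies the same idea to embedding tables. A deliberately simplified symmetric-int8 sketch of the core step (hypothetical helper names, not the pass itself):

```python
def rtn_quantize_int8(weights):
    # symmetric round-to-nearest: one scale per block, no zero point
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.1, -0.5, 0.25, 1.27]
q, s = rtn_quantize_int8(w)
w_hat = dequantize(q, s)

# round-to-nearest bounds the per-weight error by half a quantization step
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

The real pass additionally handles blocking along a chosen axis, shared initializers, and the Gather-specific quantize_axis convention noted later in this thread.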
hanbitmyths and others added 4 commits March 3, 2026 22:55
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
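
The reshape bug described above comes from flattening before packing: a global flatten pairs nibbles across row boundaries, which breaks when the packing axis is short (k_blocks == 1). A pure-Python illustration of axis-aware int4 packing (hypothetical names, not the actual `_pack_int4_along_axis` helper):

```python
def pack_int4_along_last_axis(rows):
    # rows: 2-D list of unsigned 4-bit values (0..15)
    packed = []
    for row in rows:
        if len(row) % 2:
            row = row + [0]  # pad odd rows; never borrow from the next row
        packed.append([(row[i] & 0xF) | ((row[i + 1] & 0xF) << 4)
                       for i in range(0, len(row), 2)])
    return packed

# three rows of a single 4-bit zero point each (k_blocks == 1):
# a global flatten would pair 1 with 2 and leave 3 dangling;
# per-row packing keeps each value in its own byte
zp = [[1], [2], [3]]
assert pack_int4_along_last_axis(zp) == [[0x01], [0x02], [0x03]]
```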
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
Address PR review feedback from @devang-ml and @justinchuby: use
onnxscript.optimizer.optimize() instead of ORT InferenceSession with
session.enable_cast_chain_elimination to eliminate redundant Cast chains.

- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback
  (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix
from PR microsoft#2355 (ByteSize EncodeError handling).
…n elimination

Use a custom CastCastRoundTrip rewrite rule instead of the full
onnxscript.optimizer.optimize() call. The rewrite rule specifically
targets round-trip Cast chains (e.g. fp32->fp16->fp32) by checking
that the final cast type matches the original input type, and replaces
them with Identity.

This is simpler, faster, and avoids the TypeInferenceError fallback
that was needed with the full optimizer. The onnxscript rewrite()
function also runs RemoveUnusedNodesPass and RemoveUnusedOpsetsPass
automatically.

Validated: weights identical, 0.00 max abs diff, eval 69% unchanged.
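
The rule described above can be sketched at the graph level: walk Cast nodes, and when a Cast consumes another Cast whose input already has the final target type, the pair is a round trip and collapses to an Identity. A toy model using plain dicts (an assumption for illustration; the real pass uses the onnxscript rewriter, not this representation):

```python
def eliminate_roundtrip_casts(nodes, value_types):
    # nodes: [{'op': 'Cast', 'input': name, 'output': name, 'to': dtype}, ...]
    # value_types: dtype of each graph input before any casting
    producers = {n['output']: n for n in nodes}
    out = []
    for n in nodes:
        up = producers.get(n.get('input'))
        if (n['op'] == 'Cast' and up is not None and up['op'] == 'Cast'
                and n['to'] == value_types.get(up['input'])):
            # e.g. fp32 -> fp16 -> fp32: replace the round trip with Identity
            out.append({'op': 'Identity', 'input': up['input'],
                        'output': n['output']})
        else:
            out.append(n)
    return out

nodes = [
    {'op': 'Cast', 'input': 'x', 'output': 'x16', 'to': 'fp16'},
    {'op': 'Cast', 'input': 'x16', 'output': 'x32', 'to': 'fp32'},
]
new = eliminate_roundtrip_casts(nodes, {'x': 'fp32'})
assert new[1] == {'op': 'Identity', 'input': 'x', 'output': 'x32'}
```

Note the first Cast is left in place here; if nothing else consumes it, a dead-node cleanup removes it, matching the commit's point that onnxscript's rewrite() runs RemoveUnusedNodesPass automatically.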
Move _ensure_com_microsoft_opset and eliminate_cast_chains into
ModelOptimizer class. Add fix_com_microsoft_opset and
cast_chain_elimination config flags to OnnxPeepholeOptimizer.

Remove standalone OnnxCastChainElimination pass, its olive_config
entry, and its test file. Move tests into test_peephole_optimizer.py.

Per devang-ml's review: consolidate into existing pass to avoid
introducing a new one.
Add onnxscript_optimize, onnxoptimizer_optimize, and
fuse_reshape_operations config flags (default True for backward
compatibility). This allows recipe configs to disable the default
optimizations and only run opset fixup + cast chain elimination,
producing byte-identical models to the old standalone pass.
devang-ml previously approved these changes Mar 18, 2026

Why this is needed:
ORT's ``convert_float_to_float16`` (``float16.py``) may insert identical
``Cast`` nodes in parallel branches that each declare the same output tensor
Collaborator


Would it make sense to fix convert_float_to_float16 itself?

@hanbitmyths
Contributor Author

/azp run Olive CI

@azure-pipelines

Commenter does not have sufficient privileges for PR 2345 in repo microsoft/Olive

Copilot AI review requested due to automatic review settings March 19, 2026 00:53
Contributor

Copilot AI left a comment


Pull request overview

This PR extends Olive’s ONNX optimization/quantization pipeline to better support Qwen VL-family exports by adding new ONNX graph-surgery utilities, enhancing RTN quantization (notably Gather + shared weights + initializer cleanup), and expanding the peephole optimizer with optional cast-chain elimination and com.microsoft opset fixups.

Changes:

  • Enhanced OnnxBlockWiseRtnQuantization to support Gather 8-bit quantization, handle shared-weight initializers, and remove unused initializers post-quantization.
  • Added new GraphSurgeries proto-level surgeons: GemmToMatMulAdd, ReciprocalMulToDiv, DeduplicateSubgraphInitializers, and DeduplicateNodes, plus corresponding tests.
  • Extended OnnxPeepholeOptimizer with configurable optimization steps, an opset fix-up helper, and a cast-chain elimination rewrite rule, with new unit tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Summary per file:

  • olive/passes/onnx/rtn_quantization.py: Adds Gather 8-bit support, shared initializer de-duping, and unused-initializer cleanup for RTN quantization.
  • olive/passes/onnx/peephole_optimizer.py: Adds optional com.microsoft opset fixup and cast-chain elimination; makes optimizer steps configurable.
  • olive/passes/onnx/graph_surgeries.py: Introduces several new proto-based graph surgery passes for compatibility and cleanup.
  • olive/passes/onnx/common.py: Adds compatibility fallback for FOLDED_FROM_KEY import.
  • test/passes/onnx/test_rtn_quantization.py: Expands RTN quantization tests for Gather 8-bit, axis forcing, shared weights, and initializer cleanup.
  • test/passes/onnx/test_peephole_optimizer.py: Adds unit tests for opset fixup and cast-chain elimination behavior.
  • test/passes/onnx/test_graph_surgeries.py: Adds tests validating new graph surgery passes and numerical correctness where applicable.

CI uses an ORT version that supports max IR version 11, but newer
ONNX packages default to IR version 13. Pin to 10 to match the
convention used by existing tests.
… assert

- GemmToMatMulAdd: create new transposed initializer instead of
  mutating shared one in-place; use base_name fallback for empty
  node.name to avoid duplicate tensor names.
- ReciprocalMulToDiv: build consumer map upfront to avoid O(N^2)
  graph scans; re-check actual inputs for stale consumer references.
- test_rtn_quantization: add found assertion in
  test_gather_quantize_axis_forced_to_last_dim.

Validated: 0.00 max abs diff, eval 69% unchanged.
@xiaoyu-work xiaoyu-work enabled auto-merge (squash) March 20, 2026 22:51
@xiaoyu-work xiaoyu-work merged commit f56d223 into microsoft:main Mar 20, 2026
15 checks passed