Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.
Describe the bug
Found that after upgrading to the 0.40 tag, a model conversion workflow that previously worked began failing. This is due to the additional input validation checks added to ReferenceRunner:
Model-Optimizer/modelopt/onnx/autocast/referencerunner.py, line 47 in b286165
When a model input has a shape like ['batch', 1, 1, 1], input_shape in ReferenceRunner becomes [0, 1, 1, 1]. Only the dim_value field is checked, which does not exist for a dynamic input (it carries a dim_param field instead), so every dynamic dimension resolves to 0. This shape is later compared exactly against the incoming calibration data, and input validation fails.
A side note: the calibration_data argument of convert_to_mixed_precision is typed as just str | None:
Model-Optimizer/modelopt/onnx/autocast/convert.py, line 56 in bdd10c2:
calibration_data: str | None = None,
ReferenceRunner does support passing an OrderedDict:
Model-Optimizer/modelopt/onnx/autocast/referencerunner.py, line 110 in bdd10c2:
elif isinstance(inputs, (dict, OrderedDict)):
As a user, it is nicer to pass a dict in memory than to create a file just to pass its path along; it would be nice to update the calibration_data argument to match what ReferenceRunner can already consume.
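A hedged sketch of what the more flexible argument handling could look like; the helper name and the exact accepted types are my assumption, not the current ModelOpt API:

```python
import numpy as np


def load_calibration_data(calibration_data):
    """Hypothetical helper: accept an .npz path or an in-memory dict of arrays."""
    if calibration_data is None:
        return None
    if isinstance(calibration_data, dict):  # OrderedDict is a dict subclass
        return dict(calibration_data)
    # Otherwise assume a path-like string pointing at an .npz file.
    with np.load(calibration_data) as data:
        return {name: data[name] for name in data.files}


# In-memory dict path: no temp file needed.
inputs = {"input": np.ones((1, 1), dtype=np.float32)}
print(load_calibration_data(inputs)["input"].shape)  # → (1, 1)
```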
Steps/Code to reproduce bug
```python
import numpy as np
import onnx
from modelopt.onnx.autocast.convert import convert_to_mixed_precision

input = onnx.helper.make_tensor_value_info(
    "input", onnx.TensorProto.FLOAT, ["batch", 1]
)
output = onnx.helper.make_tensor_value_info(
    "output", onnx.TensorProto.FLOAT, ["batch", 1]
)
node = onnx.helper.make_node(
    "Relu",
    ["input"],
    ["output"],
)
graph = onnx.helper.make_graph(
    nodes=[node],
    name="foo",
    inputs=[input],
    outputs=[output],
    initializer=[],
)
model = onnx.helper.make_model(graph)
onnx.save(model, "foo.onnx")

input_arr = np.ones((1, 1), dtype=np.float32)
np.savez("calibration_data.npz", input=input_arr)
convert_to_mixed_precision("foo.onnx", calibration_data="calibration_data.npz")
```
Expected behavior
Expect the above reproducer to pass; currently it raises a ValueError on the dimension check:
ValueError: Input shape from 'input' does not match provided input shape: [0, 1] vs [1, 1]. Please make sure that your calibration data matches the ONNX input shapes
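On the validation side, the fix could be a comparison that treats dynamic dimensions as wildcards rather than comparing exactly; a sketch under that assumption (the function name is mine):

```python
def shapes_match(model_shape, data_shape):
    # None marks a dynamic dim (dim_param in the ONNX proto); match it
    # against any concrete size instead of comparing the proto default 0.
    if len(model_shape) != len(data_shape):
        return False
    return all(m is None or m == d for m, d in zip(model_shape, data_shape))


print(shapes_match([None, 1], [1, 1]))  # → True, dynamic batch accepted
print(shapes_match([0, 1], [1, 1]))    # → False, the current exact comparison
```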
Who can help?
System information
- Container used (if applicable): n/a
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): SUSE Linux Enterprise Server 15 SP6
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA RTX A4000
- GPU memory size: 15.0 GB
- Number of GPUs: 1
- Library versions (if applicable):
- Python: 3.12.11
- ModelOpt version or commit hash: 0.40.0
- CUDA: 13.0
- PyTorch: n/a
- Transformers: n/a
- TensorRT-LLM: ?
- ONNXRuntime: 1.23.2
- TensorRT: 10.13