Description
I used quantize_dynamic from onnxruntime to quantize an ONNX model, but the result cannot be converted to a TensorRT plan. The error is "Non-zero zero point is not supported." Do you know how to fix it?
[02/11/2026-23:33:24] [E] [TRT] ModelImporter.cpp:138: --- Begin node ---
input: "/model/embeddings/tok_embeddings/Gather_output_0_quantized"
input: "model.embeddings.tok_embeddings.weight_scale"
input: "model.embeddings.tok_embeddings.weight_zero_point"
output: "/model/embeddings/tok_embeddings/Gather_output_0"
name: "/model/embeddings/tok_embeddings/Gather_output_0_DequantizeLinear"
op_type: "DequantizeLinear"
[02/11/2026-23:33:24] [E] [TRT] ModelImporter.cpp:139: --- End node ---
[02/11/2026-23:33:24] [E] [TRT] ModelImporter.cpp:141: ERROR: onnxOpImporters.cpp:1584 In function QuantDequantLinearHelper:
[6] Assertion failed: shiftIsAllZeros(zeroPoint): Non-zero zero point is not supported. Please set kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLAto enable asymmetric quantization if it is on DLA.
import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'onnx_models/model.onnx'
model_quant = 'onnx_models/model.quant.opt.onnx'

opt = {
    "WeightSymmetric": True,
    "ActivationSymmetric": True,
}

quantized_model = quantize_dynamic(model_fp32, model_quant, extra_options=opt)
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):