I’m testing INT8 quantization using an ASP 2:4 sparse model.
Case A: Export the sparse FP32 ONNX without QuantizeLinear/DequantizeLinear (Q/DQ) nodes and build an INT8 engine with Polygraphy (sparsity enabled).
-> TensorRT layer info indicates that sparsity-enabled tactics/kernels are selected for some layers.
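For reference, the Case A build was done roughly as follows. This is a minimal sketch; the file names are placeholders, and I believe Polygraphy's `--sparse-weights` flag is the switch that enables sparse tactics, but please correct me if a different option is recommended:

```shell
# Build an INT8 engine from the sparse FP32 ONNX (no Q/DQ nodes),
# asking TensorRT to consider sparse-weight tactics.
polygraphy convert sparse_fp32.onnx \
    --int8 \
    --sparse-weights \
    -o sparse_int8.engine
```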
Case B: Starting from the same sparse model, export a Q/DQ ONNX using pytorch-quantization and build it with: trtexec --int8 --sparsity=enable
-> Layer info still shows HasSparseWeights=1 for some layers, but sparsity-enabled tactics/kernels do not appear to be selected.
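The Case B export looked roughly like this (a minimal sketch: `build_model` and the input shape are placeholders for my actual sparse, calibrated model):

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Replace supported torch.nn layers with quantized variants before
# building/loading the model, so fake-quant nodes are inserted.
quant_modules.initialize()

model = build_model()  # placeholder: the ASP 2:4 sparse model after calibration/QAT
model.eval()

# Emit real QuantizeLinear/DequantizeLinear nodes in the ONNX graph
# instead of the internal fake-quant ops.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

dummy = torch.randn(1, 3, 224, 224)  # placeholder input shape
torch.onnx.export(model, dummy, "sparse_qdq.onnx",
                  opset_version=13)  # opset >= 13 for per-channel Q/DQ
```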
Questions:
Is this difference between the no-Q/DQ ONNX and the Q/DQ ONNX expected for 2:4 sparsity?
If not, are there recommended export/build settings for a Q/DQ ONNX that enable sparsity tactics?
Environment: Jetson (JetPack 6.1.2 / TensorRT 8.6.1)