I’m testing INT8 quantization using an ASP 2:4 sparse model.
Case A: Export the sparse FP32 ONNX without QuantizeLinear/DequantizeLinear (Q/DQ) nodes and build an INT8 engine with Polygraphy (sparsity enabled).
-> TensorRT layer info indicates that sparsity-enabled tactics/kernels are selected for some layers.
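For reference, the Case A build was done roughly as follows. This is a minimal sketch; the file names are placeholders, and I believe Polygraphy's `--sparse-weights` flag is the switch that enables sparse tactics, but please correct me if a different option is recommended:

```shell
# Build an INT8 engine from the sparse FP32 ONNX (no Q/DQ nodes),
# asking TensorRT to consider sparse-weight tactics.
polygraphy convert sparse_fp32.onnx \
    --int8 \
    --sparse-weights \
    -o sparse_int8.engine
```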
Case B: Starting from the same sparse model, export a Q/DQ ONNX using pytorch-quantization and build it with: trtexec --int8 --sparsity=enable
-> Layer info still shows HasSparseWeights=1 for some layers, but sparsity-enabled tactics/kernels do not appear to be selected.
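The Case B export looked roughly like this (a minimal sketch: `build_model` and the input shape are placeholders for my actual sparse, calibrated model):

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Replace supported torch.nn layers with quantized variants before
# building/loading the model, so fake-quant nodes are inserted.
quant_modules.initialize()

model = build_model()  # placeholder: the ASP 2:4 sparse model after calibration/QAT
model.eval()

# Emit real QuantizeLinear/DequantizeLinear nodes in the ONNX graph
# instead of the internal fake-quant ops.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

dummy = torch.randn(1, 3, 224, 224)  # placeholder input shape
torch.onnx.export(model, dummy, "sparse_qdq.onnx",
                  opset_version=13)  # opset >= 13 for per-channel Q/DQ
```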
Questions:
Is this difference between the no-Q/DQ ONNX and the Q/DQ ONNX expected for 2:4 sparsity?
If not, are there recommended export/build settings for a Q/DQ ONNX that enable sparsity tactics?
Environment: Jetson (JetPack 6.1.2 / TensorRT 8.6.1)