New LP on PTQ/QAT in ExecuTorch #2889
annietllnd wants to merge 2 commits into ArmDeveloperEcosystem:main from
Conversation
> ## PTQ vs QAT: what changes in practice?
>
> PTQ and QAT both aim to run your model with quantized operators (typically INT8). The difference is where you pay the cost: PTQ optimizes for speed of iteration, while QAT optimizes for quality and robustness.
We should probably have the terms written out in full to start with before using the acronyms everywhere.
E.g.
Post-training quantization (PTQ) and Quantization-aware training (QAT)
> @@ -0,0 +1,46 @@
> ---
> title: Understanding PTQ and QAT
Thoughts on some more detail in the title, e.g.
Understanding quantization with ExecuTorch and the Arm backend
> In this Learning Path, you use quantization as part of the ExecuTorch Arm backend. The goal is to export a quantized model that can run on Arm hardware with dedicated neural accelerators (NX).
>
> To keep the workflow concrete, you start with a complete, runnable CIFAR-10-based example that exports `.vgf` artifacts end to end. After you have a known-good baseline, you can apply the same steps to your own upscaler model and training loop.
to your own neural network and training code.
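
For reference, a minimal sketch of the data side of such a baseline could look like the following. It assumes torchvision is available; the dataset root, transform, and batch sizes are illustrative rather than taken from the Learning Path.

```python
# Minimal sketch: CIFAR-10 loaders for the baseline training and
# calibration passes (paths and batch sizes are illustrative).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # 3x32x32 float tensors in [0, 1]

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
calib_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
calib_loader = DataLoader(calib_set, batch_size=32, shuffle=False)
```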
> In a nutshell, the Arm backend in ExecuTorch consists of the following building blocks:
>
> - TOSA (Tensor Operator Set Architecture) provides a standardized operator set for acceleration on Arm platforms.
provides an open, standardized, minimal operator set for neural network operations to be lowered to. It is utilized by Arm platforms and accelerators.
> PTQ keeps training simple. You train your FP32 model as usual, then run a calibration pass using representative inputs to determine quantization parameters (for example, scales). After calibration, you convert the model and export a quantized graph.
>
> PTQ is a good default when you need a fast iteration loop and you have a calibration set that looks like the actual inference data. For upscalers, PTQ can be good enough for early bring-up, especially when your goal is to validate the export and integration path.
I would also add that PTQ, depending on the model and use case, can still provide good quality results equal to the original floating-point graph.
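
To make the calibrate-then-convert step concrete, here is a minimal PTQ sketch using the PT2E flow. The graph-capture call and the `prepare_pt2e`/`convert_pt2e` import path vary between PyTorch/TorchAO releases, and the Arm quantizer import path, class name, constructor, and configuration API are assumptions here; check the ExecuTorch Arm backend documentation for the exact names in your version. `model` and `calib_loader` come from your own training and data code.

```python
# Minimal PTQ sketch (PT2E flow). Names marked "assumed" are not verified
# against a specific ExecuTorch release.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

# Assumed import path and class name (differs between ExecuTorch releases):
from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)

model = model.eval()
example_input = (torch.randn(1, 3, 32, 32),)

# Capture the graph for quantization (the capture API differs across
# PyTorch versions).
captured = torch.export.export_for_training(model, example_input).module()

quantizer = TOSAQuantizer()  # assumed constructor; some releases need a TOSA spec
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)

# Calibration pass: run representative inputs so the observers can collect
# the ranges used to compute scales and zero-points.
with torch.no_grad():
    for images, _ in calib_loader:
        prepared(images)

# Fold observers into quantize/dequantize ops: this is the quantized graph
# you hand to the export step.
quantized = convert_pt2e(prepared)
```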
> QAT simulates quantization effects during training. You prepare the model for QAT, fine-tune with fake-quantization enabled, then convert and export.
>
> QAT is worth the extra effort when PTQ introduces visible artifacts. This is common for image-to-image tasks because small numeric changes can show up as banding, ringing, or loss of fine detail.
introduces a visible drop in model accuracy. For example, this is common for
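
A matching QAT sketch, under the same assumptions as the PTQ sketch above (the Arm quantizer names are assumed, and `train_loader` and `loss_fn` come from your own training code):

```python
# Minimal QAT sketch (PT2E flow): prepare, fine-tune with fake-quant
# active, then convert. Quantizer names are assumed, as in the PTQ sketch.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from executorch.backends.arm.quantizer.arm_quantizer import (  # assumed path
    TOSAQuantizer,
    get_symmetric_quantization_config,
)

example_input = (torch.randn(1, 3, 32, 32),)
captured = torch.export.export_for_training(model, example_input).module()

quantizer = TOSAQuantizer()  # assumed constructor, see the PTQ sketch
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_qat_pt2e(captured, quantizer)

# Short fine-tuning phase with fake-quantization in the graph. A couple of
# epochs at a reduced learning rate is a common starting point.
optimizer = torch.optim.Adam(prepared.parameters(), lr=1e-4)
for _ in range(2):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(prepared(images), targets)
        loss.backward()
        optimizer.step()

quantized = convert_pt2e(prepared)  # quantized graph, ready for export
```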
> For Arm-based platforms, the workflow stays consistent across models:
>
> 1. Train and evaluate the upscaler in PyTorch.
- Train and evaluate the neural network in PyTorch.
> 2. Quantize (PTQ or QAT) to reduce runtime cost.
> 3. Export through TOSA and generate a `.vgf` artifact.
- Export with ExecuTorch (via TOSA) to generate a `.vgf` artifact.
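
As a rough sketch of step 3, the quantized graph is re-exported and lowered with ExecuTorch before serialization. The generic `to_edge_transform_and_lower` / `to_executorch` pattern is shown below; the Arm/VGF-specific partitioner and compile-spec classes are assumptions (they live under `executorch.backends.arm` and change between releases), so the partitioner is left as a commented placeholder. The Learning Path's example script is the authoritative reference for producing the `.vgf` artifact.

```python
# Sketch of the export step: lower the quantized graph with ExecuTorch and
# serialize the program. The Arm/VGF-specific partitioner is an assumption
# and therefore only shown as a commented placeholder.
import torch
from executorch.exir import to_edge_transform_and_lower

exported = torch.export.export(quantized, example_input)

edge = to_edge_transform_and_lower(
    exported,
    # partitioner=[<Arm/VGF partitioner for your ExecuTorch release>],
)
program = edge.to_executorch()

# Without the Arm partitioner this is a generic .pte; the Arm backend
# tooling used in the Learning Path produces the .vgf artifact.
with open("model_quantized.pte", "wb") as f:
    f.write(program.buffer)
```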
> After that, you take the same PTQ export logic and apply it to your own model and calibration data.
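
One way to reuse the PTQ logic across models, as the paragraph above suggests, is to wrap the calibrate-and-convert steps in a small helper that takes your own model and calibration batches. The helper name and the `make_quantizer` callback are illustrative, not part of any library.

```python
# Illustrative helper: the same PTQ steps, parameterized over your own
# model, example input, and calibration data. `make_quantizer` is a
# callback you supply that builds the Arm quantizer for your release.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e


def ptq_quantize(model, example_input, calib_batches, make_quantizer):
    """Calibration-based PTQ; returns the converted (quantized) graph."""
    captured = torch.export.export_for_training(model.eval(), example_input).module()
    prepared = prepare_pt2e(captured, make_quantizer())
    with torch.no_grad():
        for batch in calib_batches:   # your own representative inputs
            prepared(batch)
    return convert_pt2e(prepared)
```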
> ## Run the end-to-end PTQ example (CIFAR-10)
I would just remove the CIFAR-10 part as this is just the dataset and not the task
> ## Advanced: drop-in QAT export to VGF for your own project
>
> If PTQ introduces visible artifacts, QAT is the next step. The workflow is the same as PTQ, but you insert a short fine-tuning phase after you prepare the model for QAT.
If PTQ degrades model accuracy too much,
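
To decide whether PTQ quality is acceptable or QAT is worth the extra effort, a quick numeric check against the FP32 reference helps alongside visual inspection. A simple PSNR comparison on a held-out batch might look like this; `fp32_model`, `quantized`, and `val_loader` come from your own code, and this is plain PyTorch rather than anything from the Learning Path.

```python
# Illustrative quality check: PSNR of the quantized model's output against
# the FP32 reference on one held-out batch.
import torch


def psnr(x: torch.Tensor, ref: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    mse = torch.mean((x - ref) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)


images, _ = next(iter(val_loader))
with torch.no_grad():
    reference = fp32_model(images)
    output = quantized(images)
print(f"PSNR vs FP32 reference: {psnr(output, reference).item():.2f} dB")
```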
> You now have a complete reference workflow for quantizing an image-to-image model with TorchAO and exporting INT8 `.vgf` artifacts using the ExecuTorch Arm backend. You also have a practical baseline you can use to debug export issues before you switch to your production model and data.
>
> When you move from the CIFAR-10 proxy model to your own upscaler, keep these constraints in mind:
When you move from the CIFAR-10 proxy model to your own model, keep these constraints in mind:
Wait for author feedback before merging