
New LP on PTQ/QAT in ExecuTorch #2889

Open
annietllnd wants to merge 2 commits into ArmDeveloperEcosystem:main from annietllnd:neural-graphics

Conversation

@annietllnd
Collaborator

Wait for author feedback before merging


## PTQ vs QAT: what changes in practice?

PTQ and QAT both aim to run your model with quantized operators (typically INT8). The difference is where you pay the cost: PTQ optimizes for speed of iteration, while QAT optimizes for quality and robustness.


We should probably have the terms written out in full to start with before using the acronyms everywhere.

E.g.
Post-training quantization (PTQ) and Quantization-aware training (QAT)

@@ -0,0 +1,46 @@
---
title: Understanding PTQ and QAT


Thoughts on some more detail in the title, e.g.

Understanding quantization with ExecuTorch and the Arm backend


In this Learning Path, you use quantization as part of the ExecuTorch Arm backend. The goal is to export a quantized model that can run on Arm hardware with dedicated neural accelerators (NX).

To keep the workflow concrete, you start with a complete, runnable CIFAR-10-based example that exports `.vgf` artifacts end to end. After you have a known-good baseline, you can apply the same steps to your own upscaler model and training loop.

@Burton2000 Feb 16, 2026


to your own neural network and training code.


In a nutshell, the Arm backend in ExecuTorch consists of the following building blocks:

- TOSA (Tensor Operator Set Architecture) provides a standardized operator set for acceleration on Arm platforms.


provides an open, standardized, minimal operator set for neural network operations to be lowered to. It is utilized by Arm platforms and accelerators.


PTQ keeps training simple. You train your FP32 model as usual, then run a calibration pass using representative inputs to determine quantization parameters (for example, scales and zero points). After calibration, you convert the model and export a quantized graph.

PTQ is a good default when you need a fast iteration loop and you have a calibration set that looks like the actual inference data. For upscalers, PTQ can be good enough for early bring-up, especially when your goal is to validate the export and integration path.


For neural networks


I would also add that PTQ, depending on the model and use case, can still provide good quality results equal to the original floating-point graph.
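
For illustration, here is a minimal sketch of the PTQ flow described above, using the PT2E quantization APIs (`torch.export.export_for_training` is the PyTorch 2.5+ capture API; older releases used a different capture call). The Arm quantizer import path, the `TOSAQuantizer` constructor arguments, and `model`, `example_input`, `calibration_loader`, and `compile_spec` are assumptions or placeholders, not the Learning Path's exact code.

```python
# Minimal PTQ sketch using the PT2E flow. The Arm quantizer import path and
# constructor below are assumptions; check the ExecuTorch Arm backend docs
# for the exact names in your release.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

# Assumed import path (may differ between ExecuTorch releases):
from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)


def ptq_quantize(model, example_input, calibration_loader, compile_spec):
    # Capture the trained FP32 model as an exportable graph.
    exported = torch.export.export_for_training(model.eval(), (example_input,)).module()

    # Insert observers according to the quantizer's annotation rules.
    quantizer = TOSAQuantizer(compile_spec)  # constructor arguments are an assumption
    quantizer.set_global(get_symmetric_quantization_config())
    prepared = prepare_pt2e(exported, quantizer)

    # Calibration pass: run representative inputs so observers collect ranges.
    with torch.no_grad():
        for batch, _ in calibration_loader:
            prepared(batch)

    # Replace observers with quantize/dequantize ops to get the INT8 graph.
    return convert_pt2e(prepared)
```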


QAT simulates quantization effects during training. You prepare the model for QAT, fine-tune with fake-quantization enabled, then convert and export.

QAT is worth the extra effort when PTQ introduces visible artifacts. This is common for image-to-image tasks because small numeric changes can show up as banding, ringing, or loss of fine detail.


introduces visible drop in model accuracy. For example, this is common for
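
For comparison, a QAT sketch under the same assumptions as the PTQ example: the structural differences are `prepare_qat_pt2e` instead of `prepare_pt2e` and a short fine-tuning loop in place of the calibration pass. The `quantizer` argument is the same (assumed) Arm quantizer object, and the optimizer, loss, and epoch count are placeholders.

```python
# Minimal QAT sketch using the PT2E flow; the quantizer object is built the
# same (assumed) way as in the PTQ sketch. Training details are placeholders.
import torch
from torch.ao.quantization import move_exported_model_to_eval
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e


def qat_quantize(model, example_input, train_loader, quantizer, num_epochs=3):
    exported = torch.export.export_for_training(model.train(), (example_input,)).module()

    # Insert fake-quantization ops so the fine-tune sees quantization error.
    prepared = prepare_qat_pt2e(exported, quantizer)

    optimizer = torch.optim.Adam(prepared.parameters(), lr=1e-4)
    loss_fn = torch.nn.L1Loss()  # placeholder loss for an image-to-image model

    for _ in range(num_epochs):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(prepared(inputs), targets)
            loss.backward()
            optimizer.step()

    # Switch train-only behavior (dropout, batch norm) to eval, then fold
    # fake-quant into real quantize/dequantize ops for export.
    move_exported_model_to_eval(prepared)
    return convert_pt2e(prepared)
```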


For Arm-based platforms, the workflow stays consistent across models:

1. Train and evaluate the upscaler in PyTorch.


  1. Train and evaluate the neural network in PyTorch.


1. Train and evaluate the upscaler in PyTorch.
2. Quantize (PTQ or QAT) to reduce runtime cost.
3. Export through TOSA and generate a `.vgf` artifact.


  1. Export with ExecuTorch (via TOSA) to generate a .vgf artifact.


After that, you take the same PTQ export logic and apply it to your own model and calibration data.
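
The export step can be sketched roughly as below. `torch.export.export`, `to_edge_transform_and_lower`, and `to_executorch` are standard ExecuTorch APIs, but `VgfCompileSpec` and `VgfPartitioner` are hypothetical placeholder names for whatever the Arm backend exposes for the VGF target in your release, and how the standalone `.vgf` artifact is materialized depends on that tooling.

```python
# Sketch of the export step for the quantized graph produced above. The
# VgfCompileSpec / VgfPartitioner names are hypothetical placeholders; use
# the actual compile spec and partitioner from the ExecuTorch Arm backend.
import torch
from executorch.exir import to_edge_transform_and_lower

from executorch.backends.arm.vgf import VgfCompileSpec, VgfPartitioner  # assumed path


def export_quantized(quantized_module, example_input, out_path="model.pte"):
    # Re-export the converted (INT8) graph for inference.
    exported = torch.export.export(quantized_module, (example_input,))

    # Lower the TOSA-compatible portion of the graph to the Arm/VGF target.
    compile_spec = VgfCompileSpec()
    edge = to_edge_transform_and_lower(
        exported, partitioner=[VgfPartitioner(compile_spec)]
    )

    # Serialize the ExecuTorch program that wraps the lowered payload.
    with open(out_path, "wb") as f:
        f.write(edge.to_executorch().buffer)
```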

## Run the end-to-end PTQ example (CIFAR-10)


I would just remove the CIFAR-10 part as this is just the dataset and not the task


## Advanced: drop-in QAT export to VGF for your own project

If PTQ introduces visible artifacts, QAT is the next step. The workflow is the same as PTQ, but you insert a short fine-tuning phase after you prepare the model for QAT.

@Burton2000 Feb 16, 2026


If PTQ degrades model accuracy too much,
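
One way to decide whether that extra effort is needed is to compare the FP32 and PTQ outputs on a held-out batch before committing to QAT. The sketch below uses PSNR as the quality proxy; the metric, the 35 dB threshold, and the model/loader names are placeholders you would replace with whatever you track for your own model.

```python
# Quick quality check: compare FP32 and PTQ (INT8) outputs on validation data
# and fall back to QAT if the drop is too large. Threshold is a placeholder.
import math
import torch


def psnr(reference: torch.Tensor, test: torch.Tensor, peak: float = 1.0) -> float:
    # Peak signal-to-noise ratio in dB; assumes outputs are scaled to [0, peak].
    mse = torch.mean((reference - test) ** 2).item()
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


@torch.no_grad()
def ptq_is_good_enough(fp32_model, quantized_module, val_loader, min_psnr_db=35.0):
    # Assumes fp32_model is already in eval mode.
    scores = [psnr(fp32_model(inputs), quantized_module(inputs)) for inputs, _ in val_loader]
    mean_psnr = sum(scores) / len(scores)
    print(f"Mean PSNR (FP32 vs INT8): {mean_psnr:.2f} dB")
    return mean_psnr >= min_psnr_db
```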


You now have a complete reference workflow for quantizing an image-to-image model with TorchAO and exporting INT8 `.vgf` artifacts using the ExecuTorch Arm backend. You also have a practical baseline you can use to debug export issues before you switch to your production model and data.

When you move from the CIFAR-10 proxy model to your own upscaler, keep these constraints in mind:


When you move from the CIFAR-10 proxy model to your own model, keep these constraints in mind:

