New LP on PTQ/QAT in ExecuTorch #2889

---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Quantization with ExecuTorch and the Arm backend

Post-training quantization (PTQ) and quantization-aware training (QAT) both aim to run your model with quantized operators (typically INT8). The difference is where you pay the cost: PTQ optimizes for speed of iteration, while QAT optimizes for quality and robustness.

In this Learning Path, you use quantization as part of the ExecuTorch Arm backend. The goal is to export a quantized model that can run on Arm hardware with dedicated neural accelerators (NX).

To keep the workflow concrete, you start with a complete, runnable CIFAR-10-based example that exports `.vgf` artifacts end to end. After you have a known-good baseline, you can apply the same steps to your own neural network and training code.

In a nutshell, the Arm backend in ExecuTorch provides an open, standardized, minimal operator set that neural network operations are lowered to. It is used by Arm platforms and accelerators. Below is an overview of the main components.

- TOSA (Tensor Operator Set Architecture) provides a standardized operator set for acceleration on Arm platforms.
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. provides an open, standardized, minimal operator set for neural networks operations to be lowered to. It is utilized by Arm platforms and accelerators.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
- The ExecuTorch Arm backend lowers your PyTorch model to TOSA and uses an ahead-of-time (AOT) compilation flow.
- The VGF backend produces a portable artifact you can carry into downstream tools, including `.vgf` files.

### Post-training quantization (PTQ)

PTQ keeps training simple. You train your FP32 model as usual, then run a calibration pass using representative inputs to determine quantization parameters (for example, scales). After calibration, you convert the model and export a quantized graph.

PTQ is a good default when you need a fast iteration loop and you have a calibration set that resembles the actual inference data. PTQ is often good enough for early bring-up, especially when your goal is to validate the export and integration path. Depending on the model and use case, PTQ can deliver quality comparable to the original floating-point graph.

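To make the calibration step concrete, here is a pure-Python sketch of what a calibration pass computes. This is an illustration of the idea only, not the TorchAO or ExecuTorch API: real observers track ranges per tensor (or per channel) and support several range-estimation schemes, while this sketch uses a simple asymmetric min/max scheme over a hypothetical list of activation values.

```python
# Illustration only: how PTQ calibration can derive INT8 parameters
# (scale and zero-point) from representative data. The min/max scheme
# below is one common choice, not the only one.

def calibrate(samples):
    """Derive an asymmetric INT8 scale and zero-point from the observed range."""
    lo, hi = min(samples), max(samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0             # map the range onto [-128, 127]
    zero_point = round(-128 - lo / scale)
    return scale, zero_point

def quantize(x, scale, zp):
    """FP32 -> INT8 with clamping to the representable range."""
    return max(-128, min(127, round(x / scale + zp)))

def dequantize(q, scale, zp):
    """INT8 -> FP32."""
    return (q - zp) * scale

# Hypothetical representative activations stand in for a calibration set.
acts = [0.1, 0.5, 1.9, 0.02, 1.2]
scale, zp = calibrate(acts)
roundtrip = [dequantize(quantize(a, scale, zp), scale, zp) for a in acts]
max_err = max(abs(a - r) for a, r in zip(acts, roundtrip))
print("scale:", scale, "zero_point:", zp)
print("max round-trip error:", max_err)
```

The round-trip error stays within about half a scale step, which is why a calibration set that resembles real inference data matters: the scale is only as good as the range it was derived from.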
### Quantization-aware training (QAT)

QAT simulates quantization effects during training. You prepare the model for QAT, fine-tune with fake-quantization enabled, then convert and export.

QAT is worth the extra training cost when PTQ introduces a visible drop in model quality. This is common for image-to-image tasks, because small numeric changes can show up as banding, ringing, or loss of fine detail.

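The fake-quantization idea can be sketched in a few lines of plain Python. This is a conceptual illustration, not the TorchAO QAT API: in a real QAT flow, inserted fake-quantize modules apply this quantize-dequantize step to weights and activations during the forward pass (with a straight-through estimator for gradients), so the loss already reflects INT8 rounding while the optimizer keeps updating full-precision weights. The scale value below is an assumed example.

```python
# Illustration only: the "fake quantization" step QAT inserts into the
# forward pass. Weights stay FP32 for the optimizer; the forward pass
# sees the value INT8 hardware would produce, so the training loss
# includes the rounding error and the weights can adapt to it.

def fake_quant(x, scale, zp=0, qmin=-128, qmax=127):
    """Quantize-dequantize: the FP32 value nearest x that INT8 can represent."""
    q = max(qmin, min(qmax, round(x / scale + zp)))
    return (q - zp) * scale

scale = 2.0 / 255.0               # assumed scale for a roughly [-1, 1] range

w = 0.4173                        # a hypothetical FP32 weight under training
w_seen = fake_quant(w, scale)     # what the forward pass (and the loss) sees
print("fp32 weight:", w)
print("after fake quant:", w_seen)
print("rounding error the loss can react to:", w - w_seen)
```

Because the forward pass already pays the rounding cost during fine-tuning, the converted INT8 model behaves much closer to what training optimized for than a model quantized only after training.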
## How this maps to the Arm backend

For Arm-based platforms, the workflow stays consistent across models:

1. Train and evaluate the neural network in PyTorch.
2. Quantize (PTQ or QAT) to reduce runtime cost.
3. Export with ExecuTorch (via TOSA) to generate a `.vgf` artifact.
4. Run the `.vgf` model in your Vulkan-based pipeline.

In later sections, you will generate the `.vgf` file by using the ExecuTorch Arm backend VGF partitioner.

With this background, you will now set up a working Python environment and run a baseline export-ready model.
---
title: Set up your environment for ExecuTorch quantization
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

In this section, you create a Python environment with PyTorch, TorchAO, and the ExecuTorch components needed for quantization and `.vgf` export.

{{% notice Note %}}
If you already use [Neural Graphics Model Gym](/learning-paths/mobile-graphics-and-gaming/model-training-gym), keep that environment and reuse it here.
{{% /notice %}}

## Create a virtual environment

Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
```

## Clone the ExecuTorch repository

In your virtual environment, clone the ExecuTorch repository and run the installation script:

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
./install_executorch.sh
```

## Run the Arm backend setup script

From the root of the cloned `executorch` repository, run the Arm backend setup script:

```bash
./examples/arm/setup.sh \
  --i-agree-to-the-contained-eula \
  --disable-ethos-u-deps \
  --enable-mlsdk-deps
```

In the same terminal session, source the generated setup script so the Arm backend tools (including the model converter) are available on your `PATH`:

```bash
source ./examples/arm/arm-scratch/setup_path.sh
```

Verify the model converter is available:

```bash
command -v model-converter || command -v model_converter
```

Verify your imports:

```python
import torch
import torchvision
import torchao

import executorch
import executorch.backends.arm
from executorch.backends.arm.vgf.partitioner import VgfPartitioner

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchao:", torchao.__version__)
```

{{% notice Tip %}}
If `executorch.backends.arm` is missing, you installed an ExecuTorch build without the Arm backend. Use an ExecuTorch build that includes `executorch.backends.arm` and the VGF partitioner.

If you checked out a specific ExecuTorch branch (for example, `release/1.0`) and you run into version mismatches, check out the main branch of ExecuTorch from the cloned repository and install from source:

```bash
pip install -e .
```
{{% /notice %}}

With your environment set up, you are ready to run PTQ and generate a `.vgf` artifact from a calibrated model.