1 change: 1 addition & 0 deletions assets/contributors.csv
@@ -112,3 +112,4 @@ Yahya Abouelseoud,Arm,,,,
Steve Suzuki,Arm,,,,
Qixiang Xu,Arm,,,,
Phalani Paladugu,Arm,phalani-paladugu,phalani-paladugu,,
Richard Burton,Arm,Burton2000,,,
@@ -47,7 +47,7 @@ further_reading:
link: https://huggingface.co/Arm/neural-super-sampling
type: website
- resource:
-    title: Vulkan ML Sample Learning Path
+    title: Vulkan Samples Learning Path
link: /learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/
type: learningpath

@@ -0,0 +1,46 @@
---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Quantization with ExecuTorch and the Arm backend

PTQ and QAT both aim to run your model with quantized operators (typically INT8). The difference is where you pay the cost: PTQ optimizes for speed of iteration, while QAT optimizes for quality and robustness.

Review comment:
We should probably have the terms written out in full to start with before using the acronyms everywhere.

E.g. Post-training quantization (PTQ) and Quantization-aware training (QAT)

Author reply:
This is done on the starting page for the LP which will be displayed before the introduction, but good catch! In _index.md if you're curious

Reviewer:
ah nice thanks for confirming!

In this Learning Path, you use quantization as part of the ExecuTorch Arm backend. The goal is to export a quantized model that can run on Arm hardware with dedicated neural accelerators (NX).

To keep the workflow concrete, you start with a complete, runnable CIFAR-10-based example that exports `.vgf` artifacts end to end. After you have a known-good baseline, you can apply the same steps to your own neural network and training code.

In a nutshell, the Arm backend in ExecuTorch provides an open, standardized, minimal operator set for neural network operations to be lowered to. It is used by Arm platforms and accelerators. Below is an overview of the main components.

- TOSA (Tensor Operator Set Architecture) provides a standardized operator set for acceleration on Arm platforms.

Review comment (suggested wording):
provides an open, standardized, minimal operator set for neural networks operations to be lowered to. It is utilized by Arm platforms and accelerators.

Author reply:
Done

- The ExecuTorch Arm backend lowers your PyTorch model to TOSA and uses an ahead-of-time (AOT) compilation flow.
- The VGF backend produces a portable `.vgf` artifact that you can carry into downstream tools.

### Post-training quantization (PTQ)

PTQ keeps training simple. You train your FP32 model as usual, then run a calibration pass using representative inputs to determine quantization parameters (for example, scales). After calibration, you convert the model and export a quantized graph.
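To make the sequence concrete, here is a minimal PTQ sketch using the PT2E quantization APIs. The quantizer class, its import path, and the configuration helper (`TOSAQuantizer`, `get_symmetric_quantization_config`) are assumptions for illustration and may be named or located differently in your ExecuTorch version; depending on your PyTorch and TorchAO versions, `prepare_pt2e` and `convert_pt2e` may live under `torchao.quantization.pt2e` instead of `torch.ao.quantization`. The calibration data here is random and stands in for representative inputs.

```python
import torch
import torch.nn as nn
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Assumed import path and names; check the Arm backend in your ExecuTorch checkout
from executorch.backends.arm.quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)

# Small stand-in FP32 model; replace with your trained network
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Export to a graph that the quantizer can annotate
graph = torch.export.export(model, example_inputs).module()

# Insert observers according to a symmetric INT8 configuration
quantizer = TOSAQuantizer()  # hypothetical constructor; the real one may take a compile spec
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(graph, quantizer)

# Calibration pass: representative inputs drive the observed ranges and scales
for _ in range(32):
    prepared(torch.randn(1, 3, 32, 32))

# Replace observed operators with their quantized equivalents
quantized = convert_pt2e(prepared)
```

The key point is that no gradients are involved: calibration only runs forward passes to collect statistics before the graph is converted.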

PTQ is a good default when you need a fast iteration loop and you have a calibration set that looks like the actual inference data. For neural networks, PTQ can be good enough for early bring-up, especially when your goal is to validate the export and integration path. Depending on the model and use case, PTQ can deliver quality comparable to the original floating-point graph.

### Quantization-aware training (QAT)

QAT simulates quantization effects during training. You prepare the model for QAT, fine-tune with fake-quantization enabled, then convert and export.
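Under the same assumptions as the PTQ sketch above (illustrative quantizer name and import paths), the QAT flow swaps the calibration pass for a short fine-tuning loop with fake quantization in the forward pass. Depending on your PyTorch version, the export step for QAT may need `torch.export.export_for_training` instead of `torch.export.export`, and the `is_qat` flag on the config helper is an assumption.

```python
import torch
import torch.nn as nn
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
# Assumed import path and names, as in the PTQ sketch
from executorch.backends.arm.quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)

# Stand-in model and data; replace with your network, dataset, and training loop
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
example_inputs = (torch.randn(1, 3, 32, 32),)

# Export and insert fake-quantization observers for training
# (some versions require torch.export.export_for_training here)
graph = torch.export.export(model, example_inputs).module()
quantizer = TOSAQuantizer()  # hypothetical constructor
quantizer.set_global(get_symmetric_quantization_config(is_qat=True))  # is_qat is assumed
prepared = prepare_qat_pt2e(graph, quantizer)

# Fine-tune with quantization effects simulated in the forward pass
optimizer = torch.optim.SGD(prepared.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):
    inputs = torch.randn(8, 3, 32, 32)
    labels = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss = loss_fn(prepared(inputs), labels)
    loss.backward()
    optimizer.step()

# Convert to the quantized graph for export
quantized = convert_pt2e(prepared)
```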

QAT is worth the extra training cost when quantization introduces a visible drop in model quality. For example, this is common for image-to-image tasks because small numeric changes can show up as banding, ringing, or loss of fine detail.

## How this maps to the Arm backend

For Arm-based platforms, the workflow stays consistent across models:

1. Train and evaluate the neural network in PyTorch.
2. Quantize (PTQ or QAT) to reduce runtime cost.
3. Export with ExecuTorch (via TOSA) to generate a `.vgf` artifact.
4. Run the `.vgf` model in your Vulkan-based pipeline.

In later sections, you will generate the `.vgf` file by using the ExecuTorch Arm backend VGF partitioner.
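As a rough sketch of that step, and continuing from the `quantized` graph produced in the PTQ or QAT examples above, the lowering can look like the following. The `VgfPartitioner` import path matches the environment check later in this Learning Path, but its constructor arguments and the exact way the `.vgf` artifact is written out are assumptions; follow the Arm backend examples in your ExecuTorch checkout for the current API.

```python
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.arm.vgf.partitioner import VgfPartitioner

# Re-export the quantized model produced by PTQ or QAT
example_inputs = (torch.randn(1, 3, 32, 32),)
exported_program = torch.export.export(quantized, example_inputs)

# Delegate supported operators to the VGF backend (constructor arguments are
# assumed; the real partitioner may require a compile spec describing the target)
lowered = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VgfPartitioner()],
)

# Serialize the program; later sections show how the .vgf artifact is obtained
# from this ahead-of-time flow
executorch_program = lowered.to_executorch()
with open("model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```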

With this background, you will now set up a working Python environment and run a baseline export-ready model.
@@ -0,0 +1,86 @@
---
title: Set up your environment for ExecuTorch quantization
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

In this section, you create a Python environment with PyTorch, TorchAO, and ExecuTorch components needed for quantization and `.vgf` export.

{{% notice Note %}}
If you already use [Neural Graphics Model Gym](/learning-paths/mobile-graphics-and-gaming/model-training-gym), keep that environment and reuse it here.
{{% /notice %}}

## Create a virtual environment

Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
```

## Clone the ExecuTorch repository

In your virtual environment, clone the ExecuTorch repository and run the installation script:

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
./install_executorch.sh
```

## Run the Arm backend setup script

From the root of the cloned `executorch` repository, run the Arm backend setup script:

```bash
./examples/arm/setup.sh \
--i-agree-to-the-contained-eula \
--disable-ethos-u-deps \
--enable-mlsdk-deps
```

In the same terminal session, source the generated setup script so the Arm backend tools (including the model converter) are available on your `PATH`:

```bash
source ./examples/arm/arm-scratch/setup_path.sh
```

Verify the model converter is available:

```bash
command -v model-converter || command -v model_converter
```

Verify your imports:

```python
import torch
import torchvision
import torchao

import executorch
import executorch.backends.arm
from executorch.backends.arm.vgf.partitioner import VgfPartitioner

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchao:", torchao.__version__)
```

{{% notice Tip %}}
If `executorch.backends.arm` is missing, you installed an ExecuTorch build without the Arm backend. Use an ExecuTorch build that includes `executorch.backends.arm` and the VGF partitioner.

If you checked out a specific ExecuTorch branch (for example, `release/1.0`) and you run into version mismatches, check out the main branch of ExecuTorch from the cloned repository and install from source:

```bash
pip install -e .
```
{{% /notice %}}

With your environment set up, you are ready to run PTQ and generate a `.vgf` artifact from a calibrated model.