diff --git a/assets/contributors.csv b/assets/contributors.csv
index 69b9b68be3..b1ed1f8230 100644
--- a/assets/contributors.csv
+++ b/assets/contributors.csv
@@ -112,3 +112,4 @@ Yahya Abouelseoud,Arm,,,,
 Steve Suzuki,Arm,,,,
 Qixiang Xu,Arm,,,,
 Phalani Paladugu,Arm,phalani-paladugu,phalani-paladugu,,
+Richard Burton,Arm,Burton2000,,,
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/model-training-gym/_index.md b/content/learning-paths/mobile-graphics-and-gaming/model-training-gym/_index.md
index e106b163bc..f93b9bad7e 100644
--- a/content/learning-paths/mobile-graphics-and-gaming/model-training-gym/_index.md
+++ b/content/learning-paths/mobile-graphics-and-gaming/model-training-gym/_index.md
@@ -47,7 +47,7 @@ further_reading:
     link: https://huggingface.co/Arm/neural-super-sampling
     type: website
   - resource:
-    title: Vulkan ML Sample Learning Path
+    title: Vulkan Samples Learning Path
     link: /learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/
     type: learningpath
diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/1-introduction.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/1-introduction.md
new file mode 100644
index 0000000000..ea2991bb3e
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/1-introduction.md
@@ -0,0 +1,46 @@
---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Quantization with ExecuTorch and the Arm backend

PTQ and QAT both aim to run your model with quantized operators (typically INT8). The difference is where you pay the cost: PTQ optimizes for speed of iteration, while QAT optimizes for quality and robustness.

In this Learning Path, you use quantization as part of the ExecuTorch Arm backend. The goal is to export a quantized model that can run on Arm hardware with dedicated neural accelerators (NX).

To keep the workflow concrete, you start with a complete, runnable CIFAR-10-based example that exports `.vgf` artifacts end to end. After you have a known-good baseline, you can apply the same steps to your own neural network and training code.

In a nutshell, the Arm backend in ExecuTorch provides an open, standardized, minimal operator set that neural network operations are lowered to, and it is used by Arm platforms and accelerators. Below is an overview of the main components.

- TOSA (Tensor Operator Set Architecture) provides a standardized operator set for acceleration on Arm platforms.
- The ExecuTorch Arm backend lowers your PyTorch model to TOSA and uses an ahead-of-time (AOT) compilation flow.
- The VGF backend produces a portable artifact you can carry into downstream tools, including `.vgf` files.

### Post-training quantization (PTQ)

PTQ keeps training simple. You train your FP32 model as usual, then run a calibration pass using representative inputs to determine quantization parameters (for example, scales). After calibration, you convert the model and export a quantized graph.

PTQ is a good default when you need a fast iteration loop and you have a calibration set that looks like the actual inference data. For neural networks, PTQ can be good enough for early bring-up, especially when your goal is to validate the export and integration path. Depending on the model and use case, PTQ can deliver quality close to that of the original floating-point graph.
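To see the shape of this flow before the full example, here is a minimal sketch using the same TorchAO PT2E and Arm quantizer APIs you use later in this Learning Path. The names `model`, `example_input`, and `calibration_batches` are placeholders for your own objects; the complete, runnable version follows in the PTQ section.

```python
import torch

from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.arm.tosa.specification import TosaSpecification
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

# Placeholders: model, example_input, and calibration_batches come from your own code.
exported = torch.export.export(model.eval(), example_input, strict=True).module()

quantizer = TOSAQuantizer(TosaSpecification.create_from_string("TOSA-1.00+INT"))
quantizer.set_global(get_symmetric_quantization_config(is_qat=False))

prepared = prepare_pt2e(exported, quantizer)   # insert observers around quantizable ops
with torch.no_grad():
    for batch in calibration_batches:          # run representative inputs to collect ranges
        prepared(batch)

quantized = convert_pt2e(prepared)             # fold observations into INT8 quantize/dequantize ops
```
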
### Quantization-aware training (QAT)

QAT simulates quantization effects during training. You prepare the model for QAT, fine-tune with fake-quantization enabled, then convert and export.

QAT reduces the visible quality drop that quantization can otherwise introduce. This matters for image-to-image tasks in particular, because small numeric changes can show up as banding, ringing, or loss of fine detail; fine-tuning with fake quantization lets the model learn to compensate for these effects.

## How this maps to the Arm backend

For Arm-based platforms, the workflow stays consistent across models:

1. Train and evaluate the neural network in PyTorch.
2. Quantize (PTQ or QAT) to reduce runtime cost.
3. Export with ExecuTorch (via TOSA) to generate a `.vgf` artifact.
4. Run the `.vgf` model in your Vulkan-based pipeline.

In later sections, you will generate the `.vgf` file using the ExecuTorch Arm backend VGF partitioner.

With this background, you will now set up a working Python environment and run a baseline export-ready model.
diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/2-set-up-your-environment.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/2-set-up-your-environment.md
new file mode 100644
index 0000000000..3d222445df
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/2-set-up-your-environment.md
@@ -0,0 +1,86 @@
---
title: Set up your environment for ExecuTorch quantization
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

In this section, you create a Python environment with the PyTorch, TorchAO, and ExecuTorch components needed for quantization and `.vgf` export.

{{% notice Note %}}
If you already use [Neural Graphics Model Gym](/learning-paths/mobile-graphics-and-gaming/model-training-gym), keep that environment and reuse it here.
{{% /notice %}}

## Create a virtual environment

Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
```

## Clone the ExecuTorch repository

In your virtual environment, clone the ExecuTorch repository and run the installation script:

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
./install_executorch.sh
```

## Run the Arm backend setup script

From the root of the cloned `executorch` repository, run the Arm backend setup script:

```bash
./examples/arm/setup.sh \
    --i-agree-to-the-contained-eula \
    --disable-ethos-u-deps \
    --enable-mlsdk-deps
```

In the same terminal session, source the generated setup script so the Arm backend tools (including the model converter) are available on your `PATH`:

```bash
source ./examples/arm/arm-scratch/setup_path.sh
```

Verify the model converter is available:

```bash
command -v model-converter || command -v model_converter
```

Verify that the required Python packages import correctly:

```python
import torch
import torchvision
import torchao

import executorch
import executorch.backends.arm
from executorch.backends.arm.vgf.partitioner import VgfPartitioner

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchao:", torchao.__version__)
```

{{% notice Tip %}}
If `executorch.backends.arm` is missing, you installed an ExecuTorch build without the Arm backend. Use an ExecuTorch build that includes `executorch.backends.arm` and the VGF partitioner.

+ +If you checked out a specific ExecuTorch branch (for example, `release/1.0`) and you run into version mismatches, check out the main branch of ExecuTorch from the cloned repository and install from source: + +```bash +pip install -e . +``` +{{% /notice %}} + +With your environment set up, you are ready to run PTQ and generate a `.vgf` artifact from a calibrated model. diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/3-run-ptq-and-export-vgf.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/3-run-ptq-and-export-vgf.md new file mode 100644 index 0000000000..7b4c2f8cf2 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/3-run-ptq-and-export-vgf.md @@ -0,0 +1,299 @@ +--- +title: Apply PTQ and export a quantized VGF model +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Overview + +In this section, you apply post-training quantization (PTQ) to an image-to-image model and export a `.vgf` artifact. + +The default workflow in this Learning Path is end to end: you run a complete CIFAR-10-based example, generate a `.vgf` artifact, and validate that the Arm backend export path works on your machine. + +After that, you take the same PTQ export logic and apply it to your own model and calibration data. + +## Run the end-to-end PTQ example + +Create a file called `quantize_and_export_vgf.py` and add the following code. + +This example uses CIFAR-10 as a convenient image source. It constructs a low-resolution input by downsampling an image, then trains the model to reconstruct the original image. This is a practical proxy for a real neural upscaler. + +```python +import torch +from torch.utils.data import DataLoader +from torch import nn +from torchvision import datasets, transforms + +import torch.nn.functional as F + +from executorch.backends.arm.tosa.specification import TosaSpecification +from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec +from executorch.backends.arm.vgf.partitioner import VgfPartitioner +from executorch.backends.arm.quantizer.arm_quantizer import ( + get_symmetric_quantization_config, + TOSAQuantizer, +) +from executorch.exir import to_edge_transform_and_lower + +from torchao.quantization.pt2e.quantize_pt2e import ( + convert_pt2e, + prepare_pt2e, +) + + +class SmallUpscalerModel(nn.Module): + """Small image-to-image model for upscaling workflows.""" + + def __init__(self): + super().__init__() + self.net = nn.Sequential( + nn.Conv2d(3, 32, kernel_size=3, padding=1), + nn.ReLU(), + nn.Conv2d(32, 32, kernel_size=3, padding=1), + nn.ReLU(), + nn.Conv2d(32, 3, kernel_size=3, padding=1), + ) + + def forward(self, x_lowres): + # Upscale input first, then refine. + x = F.interpolate(x_lowres, scale_factor=2.0, mode="bilinear", align_corners=False) + x = self.net(x) + return x + + +def get_data_loaders(root="./data", batch_size=64): + tfm = transforms.Compose([ + transforms.ToTensor(), + ]) + train_ds = datasets.CIFAR10(root=root, train=True, download=True, transform=tfm) + test_ds = datasets.CIFAR10(root=root, train=False, download=True, transform=tfm) + train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, drop_last=True) + test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False) + return train_loader, test_loader + + +def make_lowres_input(x_hr: torch.Tensor): + # Simulate a game render at lower resolution. 
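    # For 32x32 CIFAR-10 frames, the 0.5 scale factor produces 16x16 low-resolution inputs.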
+ return F.interpolate(x_hr, scale_factor=0.5, mode="bilinear", align_corners=False) + + +@torch.no_grad() +def evaluate_psnr(model, loader, device="cpu", max_batches=50): + psnr_sum = 0.0 + count = 0 + + for i, (x_hr, _y) in enumerate(loader): + if max_batches is not None and i >= max_batches: + break + + x_hr = x_hr.to(device) + x_lr = make_lowres_input(x_hr) + pred = model(x_lr) + + mse = F.mse_loss(pred, x_hr) + psnr = 10.0 * torch.log10(1.0 / mse.clamp_min(1e-12)) + + psnr_sum += psnr.item() + count += 1 + + return psnr_sum / max(1, count) + + +def train_model( + model: nn.Module, + train_loader: DataLoader, + test_loader: DataLoader | None = None, + device: str = "cpu", + epochs: int = 1, + lr: float = 1e-3, + log_every: int = 50, +): + model.to(device) + optimizer = torch.optim.AdamW(model.parameters(), lr=lr) + + for epoch in range(epochs): + model.train() + + for step, (x_hr, _y) in enumerate(train_loader): + x_hr = x_hr.to(device) + x_lr = make_lowres_input(x_hr) + + optimizer.zero_grad() + pred = model(x_lr) + loss = F.mse_loss(pred, x_hr) + loss.backward() + optimizer.step() + + if log_every and (step % log_every == 0): + print(f"epoch={epoch+1} step={step} loss={loss.item():.6f}") + + if test_loader is not None: + model.eval() + psnr = evaluate_psnr(model, test_loader, device=device, max_batches=50) + print(f"epoch={epoch+1} end psnr={psnr:.2f} dB") + model.train() + + return model + + +def make_example_input_from_loader(loader, batch_size=1): + x_hr, _y = next(iter(loader)) + + # Use channels_last to reduce transpose noise in the exported graph. + x_hr = x_hr[:batch_size].to("cpu").to(memory_format=torch.channels_last) + x_lr = make_lowres_input(x_hr) + + return (x_lr,) + + +def make_calibration_batches(loader: DataLoader, num_batches: int): + cal = [] + for i, (x_hr, _y) in enumerate(loader): + if i >= num_batches: + break + x_lr = make_lowres_input(x_hr.to("cpu")) + cal.append(x_lr) + + if len(cal) == 0: + raise RuntimeError("Calibration set is empty; check loader/num_batches.") + + return cal + + +def ptq_example(device="cpu"): + """PTQ example: calibrate, convert, then export to VGF.""" + + # 1) Train (or load) a baseline model. + model = SmallUpscalerModel() + train_loader, test_loader = get_data_loaders() + + # Keep training short for the tutorial. + model = train_model(model, train_loader, test_loader, device=device, epochs=1) + + model = model.to("cpu") + model.eval() + + # 2) Export the FP32 model. + example_input = make_example_input_from_loader(train_loader, batch_size=1) + exported_model = torch.export.export(model, example_input, strict=True).module(check_guards=False) + + # 3) Configure the Arm backend quantizer. + tosa_spec = "TOSA-1.00+INT" + quantizer = TOSAQuantizer(TosaSpecification.create_from_string(tosa_spec)) + quantizer.set_global(get_symmetric_quantization_config(is_qat=False)) + + # 4) Prepare for PTQ. + quantized_export_model = prepare_pt2e(exported_model, quantizer) + + # 5) Calibrate with representative inputs. + calibration_loader, _ = get_data_loaders(batch_size=1) + calibration_data = make_calibration_batches(calibration_loader, num_batches=500) + + with torch.no_grad(): + for x_lr in calibration_data: + quantized_export_model(x_lr) + + # 6) Convert to an INT8 model. + quantized_export_model = convert_pt2e(quantized_export_model) + + # 7) Export again so the quantized graph is captured. + aten_dialect = torch.export.export( + quantized_export_model, + args=example_input, + strict=True, + ) + + # 8) Partition and dump a `.vgf` artifact. 
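    # The compile spec reuses the same TOSA specification as the quantizer;
    # dump_intermediate_artifacts_to() selects the directory where the exported files, including the .vgf, are written.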
+ compile_spec = VgfCompileSpec(TosaSpecification.create_from_string(tosa_spec)) + vgf_partitioner = VgfPartitioner( + compile_spec.dump_intermediate_artifacts_to("./output/") + ) + + to_edge_transform_and_lower(aten_dialect, partitioner=[vgf_partitioner]) + + +if __name__ == "__main__": + ptq_example(device="cpu") +``` + +## Run the PTQ example + +Run the script: + +```bash +python quantize_and_export_vgf.py +``` + +The output is similar to: + +```output +epoch=1 step=0 loss=0.208134 +epoch=1 step=50 loss=0.053812 +epoch=1 end psnr=19.42 dB +``` + +You should also see files created under `./output/`. The exact filenames depend on your ExecuTorch version and backend configuration, but the directory should include an exported `.vgf` artifact. + +{{% notice Tip %}} +If export fails because of `bilinear` resize, switch the interpolation modes in `make_lowres_input()` and `forward()` to `mode="nearest"`. This keeps the tutorial flow intact while you investigate backend operator support. +{{% /notice %}} + +## Advanced: export PTQ to VGF in your own project + +Once the end-to-end example works, the next step is to apply the same flow to your own model. + +{{% notice Note %}} +If you don't have a workflow or model, you can skip this section and proceed to the next page. +{{% /notice %}} + + +If you already have a trained model, this is the minimal PTQ-to-`.vgf` flow. Start from your FP32 PyTorch module (`model_fp32`), an `example_input` tuple that matches your real inference inputs, and a list of representative `calibration_batches` (typically 100–500 samples). + +```python +import torch + +from executorch.backends.arm.tosa.specification import TosaSpecification +from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec +from executorch.backends.arm.vgf.partitioner import VgfPartitioner +from executorch.backends.arm.quantizer.arm_quantizer import ( + get_symmetric_quantization_config, + TOSAQuantizer, +) +from executorch.exir import to_edge_transform_and_lower + +from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e + + +def export_vgf_int8_ptq( + model_fp32: torch.nn.Module, + example_input: tuple[torch.Tensor, ...], + calibration_batches: list[torch.Tensor], + output_dir: str, + tosa_spec: str = "TOSA-1.00+INT", +): + model_fp32 = model_fp32.to("cpu") + model_fp32.eval() + + exported = torch.export.export(model_fp32, example_input, strict=True).module(check_guards=False) + + quantizer = TOSAQuantizer(TosaSpecification.create_from_string(tosa_spec)) + quantizer.set_global(get_symmetric_quantization_config(is_qat=False)) + q = prepare_pt2e(exported, quantizer) + + with torch.no_grad(): + for x in calibration_batches: + q(x) + + q = convert_pt2e(q) + aten_dialect = torch.export.export(q, args=example_input, strict=True) + + compile_spec = VgfCompileSpec(TosaSpecification.create_from_string(tosa_spec)) + vgf_partitioner = VgfPartitioner(compile_spec.dump_intermediate_artifacts_to(output_dir)) + to_edge_transform_and_lower(aten_dialect, partitioner=[vgf_partitioner]) +``` + +When you use your own model, the most important input is the calibration set. Treat it like a contract: if it does not look like your actual inference data, PTQ quality can degrade. + +Next, you will repeat the flow with QAT. 
diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/4-run-qat-and-export-vgf.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/4-run-qat-and-export-vgf.md new file mode 100644 index 0000000000..1825cbac18 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/4-run-qat-and-export-vgf.md @@ -0,0 +1,191 @@ +--- +title: Apply QAT and export a quantized VGF model +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Overview + +In this section, you run quantization-aware training (QAT) so the model learns to be robust to quantization effects. + +The default workflow is to extend the runnable CIFAR-10 based example from the PTQ step so you can compare PTQ and QAT outputs using the same model and data. + +## Extend the sample model with QAT + +In `quantize_and_export_vgf.py`, update your TorchAO imports to include the QAT helpers: + +```python +from torchao.quantization.pt2e.quantize_pt2e import ( + convert_pt2e, + prepare_qat_pt2e, +) +from torchao.quantization.pt2e import ( + move_exported_model_to_eval, + move_exported_model_to_train, +) +``` + +Then add the QAT training function and example entry point. This code prepares the exported model for QAT, fine-tunes for a small number of epochs, converts to INT8, and exports to `.vgf`. + +```python +def train_model_qat( + model: nn.Module, + train_loader: DataLoader, + test_loader: DataLoader | None = None, + device: str = "cpu", + epochs: int = 1, + lr: float = 1e-4, + log_every: int = 50, +): + model.to(device) + optimizer = torch.optim.AdamW(model.parameters(), lr=lr) + + for epoch in range(epochs): + move_exported_model_to_train(model) + + for step, (x_hr, _y) in enumerate(train_loader): + x_hr = x_hr.to(device) + x_lr = make_lowres_input(x_hr) + + optimizer.zero_grad() + pred = model(x_lr) + loss = F.mse_loss(pred, x_hr) + loss.backward() + optimizer.step() + + if log_every and (step % log_every == 0): + print(f"qat epoch={epoch+1} step={step} loss={loss.item():.6f}") + + if test_loader is not None: + move_exported_model_to_eval(model) + psnr = evaluate_psnr(model, test_loader, device=device, max_batches=50) + print(f"qat epoch={epoch+1} end psnr={psnr:.2f} dB") + + return model + + +def qat_example(device="cpu"): + """QAT example: prepare for QAT, fine-tune, convert, then export to VGF.""" + + # 1) Train (or load) a baseline FP32 model. + model = SmallUpscalerModel() + train_loader, test_loader = get_data_loaders() + model = train_model(model, train_loader, test_loader, device=device, epochs=1) + + # 2) Export the FP32 model. + example_input = make_example_input_from_loader(train_loader, batch_size=1) + exported_model = torch.export.export(model, example_input, strict=True).module(check_guards=False) + + # 3) Configure the Arm backend quantizer. + tosa_spec = "TOSA-1.00+INT" + quantizer = TOSAQuantizer(TosaSpecification.create_from_string(tosa_spec)) + quantizer.set_global(get_symmetric_quantization_config(is_qat=True)) + + # 4) Prepare for QAT. + qat_ready_model = prepare_qat_pt2e(exported_model, quantizer) + + # 5) Fine-tune with fake-quant enabled. + qat_ready_model = train_model_qat( + qat_ready_model, + train_loader, + test_loader, + device=device, + epochs=1, + lr=1e-4, + ) + + # 6) Convert to an INT8 model. 
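    # Move the model to CPU and eval behavior first so convert_pt2e() folds the learned fake-quant parameters into INT8 ops.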
+ qat_ready_model = qat_ready_model.to("cpu") + move_exported_model_to_eval(qat_ready_model) + qat_int8_model = convert_pt2e(qat_ready_model) + + # 7) Export again so the quantized graph is captured. + aten_dialect = torch.export.export( + qat_int8_model, + args=example_input, + strict=True, + ) + + # 8) Partition and dump a `.vgf` artifact. + compile_spec = VgfCompileSpec(TosaSpecification.create_from_string(tosa_spec)) + vgf_partitioner = VgfPartitioner( + compile_spec.dump_intermediate_artifacts_to("./output_qat/") + ) + + to_edge_transform_and_lower(aten_dialect, partitioner=[vgf_partitioner]) +``` + +## Run the QAT example + +Update `__main__` to call `qat_example()` and run the script: + +```bash +python quantize_and_export_vgf.py +``` + +{{% notice Tip %}} +If export fails with a missing model converter error, you likely forgot to source the Arm backend `setup_path.sh` in your current terminal session. +{{% /notice %}} + +As the script runs, you should see QAT training logs (prefixed with `qat`). When export completes, you should see `.vgf` output under `./output_qat/`. + +## Advanced: drop-in QAT export to VGF for your own project + +If PTQ degrades model accuracy too much, QAT is the next step. The workflow is the same as PTQ, but you insert a short fine-tuning phase after you prepare the model for QAT. + +In practice, you already have a training loop for your upscaler. The simplest way to use QAT is to reuse that loop and point it at the QAT-prepared exported model. + +The snippet below gives you a minimal structure you can drop into your project. You supply: + +- `model_fp32`: your baseline FP32 model +- `example_input`: a tuple of input tensors +- `fine_tune_qat`: a function that runs your fine-tuning loop (one or more epochs) + +```python +import torch + +from executorch.backends.arm.tosa.specification import TosaSpecification +from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec +from executorch.backends.arm.vgf.partitioner import VgfPartitioner +from executorch.backends.arm.quantizer.arm_quantizer import ( + get_symmetric_quantization_config, + TOSAQuantizer, +) +from executorch.exir import to_edge_transform_and_lower + +from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_qat_pt2e +from torchao.quantization.pt2e import move_exported_model_to_eval + + +def export_vgf_int8_qat( + model_fp32: torch.nn.Module, + example_input: tuple[torch.Tensor, ...], + fine_tune_qat, + output_dir: str, + tosa_spec: str = "TOSA-1.00+INT", +): + model_fp32 = model_fp32.to("cpu") + model_fp32.eval() + + exported = torch.export.export(model_fp32, example_input, strict=True).module(check_guards=False) + + quantizer = TOSAQuantizer(TosaSpecification.create_from_string(tosa_spec)) + quantizer.set_global(get_symmetric_quantization_config(is_qat=True)) + qat_ready = prepare_qat_pt2e(exported, quantizer) + + # Run your fine-tuning loop here. This is where QAT earns its keep. + fine_tune_qat(qat_ready) + + qat_ready = qat_ready.to("cpu") + move_exported_model_to_eval(qat_ready) + q = convert_pt2e(qat_ready) + aten_dialect = torch.export.export(q, args=example_input, strict=True) + + compile_spec = VgfCompileSpec(TosaSpecification.create_from_string(tosa_spec)) + vgf_partitioner = VgfPartitioner(compile_spec.dump_intermediate_artifacts_to(output_dir)) + to_edge_transform_and_lower(aten_dialect, partitioner=[vgf_partitioner]) +``` + +Next, you will validate your model files by visualizing them in the Model Explorer. 
diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/5-validate-and-choose-a-quantization-strategy.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/5-validate-and-choose-a-quantization-strategy.md new file mode 100644 index 0000000000..8c9062c9c6 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/5-validate-and-choose-a-quantization-strategy.md @@ -0,0 +1,41 @@ +--- +title: Inspect the graph with Model Explorer +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +If you use `.vgf` as an intermediate artifact, it helps to inspect the exported graph before you integrate it into your runtime. + +## Install the Model Explorer + +Install and launch Model Explorer with the VGF adapter: + +```bash +pip install vgf-adapter-model-explorer +pip install torch ai-edge-model-explorer +model-explorer --extensions=vgf_adapter_model_explorer +``` + +Open the `.vgf` file from `./output/` and `./output_qat/`. + +When you review the graph, look for unexpected layout conversions (for example, extra transpose operations), operators that you did not intend to run on your GPU path, and model I/O shapes that do not match your integration. + +## Advanced: connect the model to an ML Extensions for Vulkan workflow + +The fastest way to understand the integration constraints is to start from a known-good sample and then replace the model. + +Use the Learning Path [Get started with neural graphics using ML Extensions for Vulkan](/learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/) and focus on how the sample loads and executes `.vgf` artifacts. This is where you validate assumptions about input and output tensor formats and where any required color-space or layout conversions happen. + +## Wrap-up + +You now have a complete reference workflow for quantizing an image-to-image model with TorchAO and exporting INT8 `.vgf` artifacts using the ExecuTorch Arm backend. You also have a practical baseline you can use to debug export issues before you switch to your production model and data. + +When you move from the CIFAR-10 proxy model to your own model, keep these constraints in mind: + +- Treat calibration data as part of your model contract. If PTQ quality drops, start by fixing the representativeness of calibration inputs. +- Use QAT when PTQ introduces visible artifacts or regressions that matter to your visual quality bar. +- Validate early by inspecting the exported graph so you can catch unexpected layouts, operators, or tensor shapes before runtime integration. + +Continue to the last page to go deeper on further resources. diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_index.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_index.md new file mode 100644 index 0000000000..767628fd22 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_index.md @@ -0,0 +1,61 @@ +--- +title: Quantize models with ExecuTorch + +draft: true +cascade: + draft: true + +minutes_to_complete: 60 + +who_is_this_for: This is an advanced topic for ML developers who want to reduce latency and memory bandwidth by exporting INT8 models to the `.vgf` file format using the ExecuTorch Arm backend. 
+ +learning_objectives: + - Explain when to use post-training quantization (PTQ) vs quantization-aware training (QAT) + - Prepare and quantize a PyTorch model using TorchAO PT2E quantization APIs + - Export the quantized model to TOSA and generate a model artifact with the ExecuTorch Arm backend + - Validate the exported graph by visualizing it using Google's Model Explorer + +prerequisites: + - Basic PyTorch model training and evaluation experience + - A development machine with Python 3.10+ and PyTorch installed that runs ExecuTorch + +author: +- Richard Burton +- Annie Tallund + +### Tags +skilllevels: Advanced +subjects: ML +armips: + - Mali +tools_software_languages: + - ExecuTorch + - TorchAO + - Vulkan + - TOSA + - NX +operatingsystems: + - Linux + - macOS + - Windows + +further_reading: + - resource: + title: Get started with neural graphics using ML Extensions for Vulkan + link: /learning-paths/mobile-graphics-and-gaming/vulkan-ml-sample/ + type: learningpath + - resource: + title: Neural Graphics Development Kit + link: https://developer.arm.com/mobile-graphics-and-gaming/neural-graphics + type: website + - resource: + title: Arm neural technology in ExecuTorch 1.0 + link: https://developer.arm.com/community/arm-community-blogs/b/ai-blog/posts/arm-neural-technology-in-executorch-1-0 + type: website + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_next-steps.md b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/quantize-neural-upscaling-models/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +---