diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/01_Fundamentals.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/01_Fundamentals.md
new file mode 100644
index 0000000000..0dcca61c93
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/01_Fundamentals.md
@@ -0,0 +1,103 @@
+---
+# User change
+title: "ONNX Fundamentals"
+
+weight: 2
+
+layout: "learningpathall"
+---
+The goal of this tutorial is to provide developers with a practical, end-to-end pathway for working with Open Neural Network Exchange (ONNX) in real-world scenarios. Starting from the fundamentals, we will build a simple neural network model in Python, export it to the ONNX format, and demonstrate how it can be used for both inference and training on Arm64 platforms. Along the way, we will cover model optimization techniques such as layer fusion, and conclude by deploying the optimized model into a fully functional Android application. By following this series, you will gain not only a solid understanding of ONNX’s philosophy and ecosystem but also the hands-on skills required to integrate ONNX into your own projects from prototyping to deployment.
+
+In this first step, we will introduce the ONNX standard and explain why it has become a cornerstone of modern machine learning workflows. You will learn what ONNX is, how it represents models in a framework-agnostic format, and why this matters for developers targeting different platforms such as desktops, Arm64 devices, or mobile environments. We will also discuss the role of ONNX Runtime as the high-performance engine that brings these models to life, enabling efficient inference and even training across CPUs, GPUs, and specialized accelerators. Finally, we will outline the typical ONNX workflow, from training in frameworks like PyTorch or TensorFlow, through export and optimization, to deployment on edge and Android devices, which we will gradually demonstrate throughout the tutorial.
+
+## What is ONNX
+ONNX is an open standard for representing machine learning models in a framework-independent format. Instead of being tied to the internal model representation of a specific framework—such as PyTorch, TensorFlow, or scikit-learn—ONNX provides a universal way to describe models using a common set of operators, data types, and computational graphs.
+
+At its core, an ONNX model is a directed acyclic graph (DAG) where nodes represent mathematical operations (e.g., convolution, matrix multiplication, activation functions) and edges represent tensors flowing between these operations. This standardized representation allows models trained in one framework to be exported once and executed anywhere, without requiring the original framework at runtime.
+
+ONNX was originally developed by Microsoft and Facebook to address a growing need in the machine learning community: the ability to move models seamlessly between training environments and deployment targets. Today, it is supported by a wide ecosystem of contributors and hardware vendors, making it the de facto choice for interoperability and cross-platform deployment.
+
+For developers, this means flexibility. You can train your model in PyTorch, export it to ONNX, run it with ONNX Runtime on an Arm64 device such as a Raspberry Pi, and later deploy it inside an Android application without rewriting the model. This portability is the main reason ONNX has become a central building block in modern AI workflows.
+
+A useful way to think of ONNX is to compare it to a PDF for machine learning models. Just as a PDF file ensures that a document looks the same regardless of whether you open it in Adobe Reader, Preview on macOS, or a web browser, ONNX ensures that a machine learning model behaves consistently whether you run it on a server GPU, a Raspberry Pi, or an Android phone. It is this “write once, run anywhere” principle that makes ONNX especially powerful for developers working across diverse hardware platforms.
+
+At the same time, ONNX is not a closed box. Developers can extend the format with custom operators or layers when standard ones are not sufficient. This flexibility makes it possible to inject novel research ideas, proprietary operations, or hardware-accelerated kernels into an ONNX model while still benefiting from the portability of the core standard. In other words, ONNX gives you both consistency across platforms and extensibility for innovation.
+
+## Why ONNX Matters
+Machine learning today is not limited to one framework or one device. A model might be trained in PyTorch on a GPU workstation, tested in TensorFlow on a cloud server, and then finally deployed on an Arm64-based edge device or Android phone. Without a common standard, moving models between these environments would be complex, error-prone, and often impossible. ONNX solves this problem by acting as a universal exchange format, ensuring that models can flow smoothly across the entire development and deployment pipeline.
+
+The main reasons ONNX matters are:
+1. Interoperability – ONNX eliminates framework lock-in. You can train in PyTorch, validate in TensorFlow, and deploy with ONNX Runtime on almost any device, from servers to IoT boards.
+2. Performance – ONNX Runtime includes highly optimized execution backends, supporting hardware acceleration through Arm NEON, CUDA, DirectML, and Android NNAPI. This means the same model can run efficiently across a wide spectrum of hardware.
+3. Portability – Once exported to ONNX, the model can be deployed to Arm64 devices (like Raspberry Pi or AWS Graviton servers) or even embedded in an Android app, without rewriting the code.
+4. Ecosystem – The ONNX Model Zoo provides ready-to-use, pre-trained models for vision, NLP, and speech tasks, making it easy to start from state-of-the-art baselines.
+5. Extensibility – Developers can inject their own layers or custom operators when the built-in operator set is not sufficient, enabling innovation while preserving compatibility.
+
+In short, ONNX matters because it turns the fragmented ML ecosystem into a cohesive workflow, empowering developers to focus on building applications rather than wrestling with conversion scripts or hardware-specific code.
+
+## ONNX Model Structure
+An ONNX model is more than just a collection of weights—it is a complete description of the computation graph that defines how data flows through the network. Understanding this structure is key to seeing why ONNX is both portable and extensible.
+
+At a high level, an ONNX model consists of three main parts:
+1. Graph, which is the heart of the model, represented as a directed acyclic graph (DAG). In this graph, nodes correspond to operations (e.g., Conv, Relu, MatMul), while edges represent the tensors flowing between nodes, carrying input and output data.
+2. Opset (Operator Set), which is a versioned collection of supported operations. Opsets guarantee that models exported with one framework will behave consistently when loaded by another, as long as the same opset version is supported.
+3. Metadata, which contains information about inputs, outputs, tensor shapes, and data types. Metadata can also include custom annotations such as the model author, domain, or framework version.
+
+This design allows ONNX to describe anything from a simple logistic regression to a deep convolutional neural network. For example, a single ONNX graph might define:
+* An input tensor representing a camera image.
+* A sequence of convolution and pooling layers.
+* Fully connected layers leading to classification probabilities.
+* An output tensor with predicted labels.
+
+Because the ONNX format is based on a standardized graph representation, it is both human-readable (with tools like Netron for visualization) and machine-executable (parsed directly by ONNX Runtime or other backends).
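+
+To make this concrete, here is a minimal sketch (assuming you already have an exported .onnx file on disk; we create one, smallnet.onnx, in the next section) that uses the onnx Python package to print the opset, the graph's nodes, and its input/output metadata:
+
+```python
+import onnx
+
+# Load an exported model (any .onnx file works; the path is a placeholder)
+model = onnx.load("smallnet.onnx")
+
+# Opset: the versioned operator set the graph relies on
+for opset in model.opset_import:
+    print("opset domain:", opset.domain or "ai.onnx", "version:", opset.version)
+
+# Graph: nodes are operations, connected by named tensors
+for node in model.graph.node:
+    print(node.op_type, list(node.input), "->", list(node.output))
+
+# Metadata: named inputs and outputs
+print("inputs: ", [i.name for i in model.graph.input])
+print("outputs:", [o.name for o in model.graph.output])
+```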
+
+Importantly, ONNX models are not static. Developers can insert, remove, or replace nodes in the graph, making it possible to add new layers, prune unnecessary ones, or fuse operations for optimization. This graph-level flexibility is what enables many of the performance improvements we’ll explore later in this tutorial, such as layer fusion and quantization.
+
+## ONNX Runtime
+While ONNX provides a standard way to represent models, it still needs a high-performance engine to actually execute them. This is where ONNX Runtime (ORT) comes in. ONNX Runtime is the official, open-source inference engine for ONNX models, designed to run them quickly and efficiently across a wide variety of hardware.
+
+At its core, ONNX Runtime is optimized for speed, portability, and extensibility:
+1. Cross-platform support. ORT runs on Windows, Linux, and macOS, as well as mobile platforms like Android and iOS. It supports both x86 and Arm64 architectures, making it suitable for deployment from cloud servers to edge devices such as Raspberry Pi boards and smartphones.
+
+2. Hardware acceleration. ORT integrates with a wide range of execution providers (EPs) that tap into hardware capabilities:
+* Arm Kleidi kernels accelerated with Arm NEON, SVE2, and SME2 instructions for efficient CPU execution on Arm64.
+* CUDA for NVIDIA GPUs.
+* DirectML for Windows.
+* NNAPI on Android, enabling direct access to mobile accelerators (DSPs, NPUs).
+
+3. Inference and training. ONNX Runtime also supports training and fine-tuning, making it possible to use the same runtime across the entire ML lifecycle.
+
+4. Optimization built in. ORT can automatically apply graph optimizations such as constant folding, operator fusion, or memory layout changes to squeeze more performance out of your model.
+
+For developers, this means you can take a model trained in PyTorch, export it to ONNX, and then run it with ONNX Runtime on virtually any device—without worrying about the underlying hardware differences. The runtime abstracts away the complexity, choosing the best available execution provider for your environment.
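+
+As a quick illustration, the snippet below (a minimal sketch; the model path is a placeholder) queries which execution providers your ONNX Runtime build exposes and creates a session with an ordered preference list, falling back to the CPU provider when an accelerator is not available:
+
+```python
+import onnxruntime as ort
+
+# Execution providers compiled into this ONNX Runtime build
+available = ort.get_available_providers()
+print("Available providers:", available)
+
+# Prefer hardware-backed providers, but keep CPU as a guaranteed fallback
+preferred = ["NnapiExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
+providers = [p for p in preferred if p in available]
+
+sess = ort.InferenceSession("model.onnx", providers=providers)
+print("Active providers:", sess.get_providers())
+```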
+
+This flexibility makes ONNX Runtime a powerful bridge between training frameworks and deployment targets, and it is the key technology that allows ONNX models to run effectively on Arm64 platforms and Android devices.
+
+## How ONNX Fits into the Workflow
+
+One of the biggest advantages of ONNX is how naturally it integrates into a developer’s machine learning workflow. Instead of locking you into a single framework from training to deployment, ONNX provides a bridge that connects different stages of the ML lifecycle.
+
+A typical ONNX workflow looks like this:
+1. Train the model. You first use your preferred framework (e.g., PyTorch, TensorFlow, or scikit-learn) to design and train a model. At this stage, you benefit from the flexibility and ecosystem of the framework of your choice.
+2. Export to ONNX. Once trained, the model is exported into the ONNX format using built-in converters (such as torch.onnx.export for PyTorch). This produces a portable .onnx file describing the network architecture, weights, and metadata.
+3. Run inference with ONNX Runtime. The ONNX model can now be executed on different devices using ONNX Runtime. On Arm64 hardware, ONNX Runtime can take advantage of Arm Kleidi kernels accelerated with NEON, SVE2, and SME2 instructions, while on Android devices it can leverage NNAPI to access mobile accelerators (where available).
+4. Optimize the model. Apply graph optimizations like layer fusion, constant folding, or quantization to improve performance and reduce memory usage, making the model more suitable for edge and mobile deployments.
+5. Deploy. Finally, the optimized ONNX model is packaged into its target environment. This could be an Arm64-based embedded system (e.g., Raspberry Pi), a server powered by Arm CPUs (e.g., AWS Graviton), or an Android application distributed via the Play Store.
+
+This modularity means developers are free to mix and match the best tools for each stage: train in PyTorch, optimize with ONNX Runtime, and deploy to Android—all without rewriting the model. By decoupling training from inference, ONNX enables efficient workflows that span from research experiments to production-grade applications.
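+
+As a small example of step 4, ONNX Runtime can apply its built-in graph optimizations when a session is created and, optionally, save the optimized graph to disk for deployment. A minimal sketch (file names are placeholders):
+
+```python
+import onnxruntime as ort
+
+so = ort.SessionOptions()
+# Enable all graph-level optimizations (constant folding, operator fusion, ...)
+so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+# Persist the optimized graph so the optimization cost is paid only once
+so.optimized_model_filepath = "model_optimized.onnx"
+
+sess = ort.InferenceSession("model.onnx", sess_options=so,
+                            providers=["CPUExecutionProvider"])
+```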
+
+## Example Use Cases
+ONNX is already widely adopted in real-world applications where portability and performance are critical. A few common examples include:
+1. Computer Vision at the Edge – Running an object detection model (e.g., YOLOv5 exported to ONNX) on a Raspberry Pi 4 or NVIDIA Jetson, enabling low-cost cameras to detect people, vehicles, or defects in real time.
+2. Mobile Applications – Deploying face recognition or image classification models inside an Android app using ONNX Runtime Mobile, with NNAPI acceleration for efficient on-device inference.
+3. Natural Language Processing (NLP) – Running BERT-based models on Arm64 cloud servers (like AWS Graviton) to provide fast, low-cost inference for chatbots and translation services.
+4. Healthcare Devices – Using ONNX to integrate ML models into portable diagnostic tools or wearable sensors, where Arm64 processors dominate due to their low power consumption.
+5. Cross-platform Research to Production – Training experimental architectures in PyTorch, exporting them to ONNX, and validating them across different backends to ensure consistent performance.
+6. AI Accelerator Integration – ONNX is especially useful for hardware vendors building custom AI accelerators. Since accelerators often cannot support the full range of ML operators, ONNX’s extensible operator model allows manufacturers to plug in custom kernels where hardware acceleration is available, while gracefully falling back to the standard runtime for unsupported ops. This makes it easier to adopt new hardware without rewriting entire models.
+
+## Summary
+In this section, we introduced ONNX as an open standard for representing machine learning models across frameworks and platforms. We explored its model structure—graphs, opsets, and metadata—and explained the role of ONNX Runtime as the high-performance execution engine. We also showed how ONNX fits naturally into the ML workflow: from training in PyTorch or TensorFlow, to exporting and optimizing the model, and finally deploying it on Arm64 or Android devices.
+
+A useful way to think of ONNX is as the PDF of machine learning models—a universal, consistent format that looks the same no matter where you open it, but with the added flexibility to inject your own layers and optimizations.
+
+Beyond portability for developers, ONNX is also valuable for hardware and AI-accelerator builders. Because accelerators often cannot support every possible ML operator, ONNX’s extensible operator model allows manufacturers to seamlessly integrate custom kernels where acceleration is available, while relying on the runtime for unsupported operations. This combination of consistency, flexibility, and extensibility makes ONNX a cornerstone technology for both AI application developers and hardware vendors.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/02_Setup.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/02_Setup.md
new file mode 100644
index 0000000000..fe9179c87c
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/02_Setup.md
@@ -0,0 +1,130 @@
+---
+# User change
+title: "Environment Setup"
+
+weight: 3
+
+layout: "learningpathall"
+---
+
+## Objective
+This step gets you ready to build, export, run, and optimize ONNX models on Arm64. You’ll set up Python, install ONNX and ONNX Runtime, and confirm that hardware-backed execution providers are available.
+
+## Choosing the hardware
+You can choose a variety of hardware, including:
+* Edge boards (Linux/Arm64) - Raspberry Pi 4/5 (64-bit OS), Jetson (Arm64 CPU; GPU via CUDA if using NVIDIA stack), Arm servers (e.g., AWS Graviton).
+* Apple Silicon (macOS/Arm64) - Great for development, deploy to Arm64 Linux later.
+* Windows on Arm - Dev/test on WoA, deploy to Linux Arm64 for production if desired.
+
+The nice thing about ONNX is that the **same model file** can run across all of these, so your setup is flexible.
+
+## Install Python
+Depending on the hardware you use, follow the matching installation path:
+
+1. Linux (Arm64). In the console type:
+```console
+sudo apt update
+sudo apt install -y python3 python3-venv python3-pip build-essential libopenblas-dev
+```
+
+2. macOS (Apple Silicon):
+```console
+brew install python
+```
+
+3. Windows on Arm:
+* Install Python 3.10+ from python.org (Arm64 build).
+* Ensure pip is on PATH.
+
+After installing Python, open a terminal or console, create a clean virtual environment, and update pip and wheel:
+
+```console
+python3 -m venv .venv
+source .venv/bin/activate # on Windows use: .venv\Scripts\activate
+python -m pip install --upgrade pip wheel
+```
+
+Using a virtual environment keeps dependencies isolated and avoids conflicts with system-wide Python packages.
+
+## Install Core Packages
+Start by installing the minimal stack:
+```console
+pip install onnx onnxruntime onnxscript netron numpy
+```
+The above will install the following:
+* onnx – core library for loading/saving ONNX models.
+* onnxruntime – high-performance runtime to execute models.
+* onnxscript – required for the new Dynamo-based exporter.
+* netron – tool for visualizing ONNX models.
+* numpy – used for tensor manipulation.
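+
+Optionally, you can confirm at this point that you are running an Arm64 build of Python and see which execution providers your onnxruntime wheel exposes (a quick, optional check):
+
+```python
+import platform
+import onnxruntime as ort
+
+print("Machine:", platform.machine())               # expect aarch64 or arm64
+print("Providers:", ort.get_available_providers())  # e.g. ['CPUExecutionProvider']
+```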
+
+Now, install PyTorch (we’ll use it later to build and export a sample model):
+
+```console
+pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
+```
+
+## Verify the installation
+Let’s verify everything works end-to-end by defining a toy network and exporting it to ONNX.
+
+Create a new file 01_Init.py and add the following code:
+
+```python
+import torch, torch.nn as nn
+import onnx, onnxruntime as ort
+import numpy as np
+
+class SmallNet(nn.Module):
+ def __init__(self):
+ super().__init__()
+ self.seq = nn.Sequential(
+ nn.Conv2d(1, 8, 3, padding=1),
+ nn.ReLU(),
+ nn.AdaptiveAvgPool2d((1,1)),
+ nn.Flatten(),
+ nn.Linear(8, 10)
+ )
+ def forward(self, x): return self.seq(x)
+
+m = SmallNet().eval()
+dummy = torch.randn(1, 1, 28, 28)
+
+torch.onnx.export(
+ m, dummy, "smallnet.onnx",
+ input_names=["input"], output_names=["logits"],
+ opset_version=19,
+ do_constant_folding=True,
+ keep_initializers_as_inputs=False,
+ dynamo=True
+)
+
+# Quick sanity run
+sess = ort.InferenceSession("smallnet.onnx", providers=["CPUExecutionProvider"])
+out = sess.run(["logits"], {"input": dummy.numpy()})[0]
+print("Output shape:", out.shape, "Providers:", sess.get_providers())
+```
+
+Then, run it as follows:
+
+```console
+python3 01_Init.py
+```
+
+You should see the following output:
+```output
+python3 01_Init.py
+[torch.onnx] Obtain model graph for `SmallNet([...]` with `torch.export.export(..., strict=False)`...
+[torch.onnx] Obtain model graph for `SmallNet([...]` with `torch.export.export(..., strict=False)`... ✅
+[torch.onnx] Run decomposition...
+[torch.onnx] Run decomposition... ✅
+[torch.onnx] Translate the graph into ONNX...
+[torch.onnx] Translate the graph into ONNX... ✅
+Output shape: (1, 10) Providers: ['CPUExecutionProvider']
+```
+
+The 01_Init.py script serves as a quick end-to-end validation of your ONNX environment. It defines a very small convolutional neural network (SmallNet) in PyTorch, which consists of a convolution layer, activation function, pooling, flattening, and a final linear layer that outputs 10 logits. Instead of training the model, we simply run it in evaluation mode on a random input tensor to make sure the graph structure works. This model is then exported to the ONNX format using PyTorch’s new Dynamo-based exporter, producing a portable smallnet.onnx file.
+
+After export, the script immediately loads the ONNX model with ONNX Runtime and executes a forward pass using the CPU execution provider. This verifies that the installation of ONNX, ONNX Runtime, and PyTorch is correct and that models can flow seamlessly from definition to inference. By printing the output tensor’s shape and the active execution provider, the script demonstrates that the toolchain is fully functional on your Arm64 device, giving you a solid baseline before moving on to more advanced models and optimizations.
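+
+Optionally, you can also inspect the exported graph with the netron package installed earlier; it serves an interactive view of the model's nodes that you can open in your browser:
+
+```console
+netron smallnet.onnx
+```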
+
+## Summary
+You now have a fully functional ONNX development environment on Arm64. Python and all required packages are installed, and you successfully exported a small PyTorch model to ONNX using the new Dynamo exporter, ensuring forward compatibility. Running the model with ONNX Runtime confirmed that inference works end-to-end with the CPU execution provider, proving that your toolchain is correctly configured. With this foundation in place, the next step is to build and export a more complete model and run it on Arm64 hardware to establish baseline performance before applying optimizations.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/03_PreparingData.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/03_PreparingData.md
new file mode 100644
index 0000000000..b5bea9f3b2
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/03_PreparingData.md
@@ -0,0 +1,261 @@
+---
+# User change
+title: "Preparing a Synthetic Sudoku Digit Dataset"
+
+weight: 4
+
+layout: "learningpathall"
+---
+
+## Big picture
+Our end goal is a camera-to-solution Sudoku app that runs efficiently on Arm64 devices (e.g., Raspberry Pi or Android phones). ONNX is the glue: we’ll train the digit recognizer in PyTorch, export it to ONNX, and run it anywhere with ONNX Runtime (CPU EP on edge devices, NNAPI EP on Android). Everything around the model—grid detection, perspective rectification, and solving—stays deterministic and lightweight.
+
+## Objective
+In this step, we will generate a custom dataset of Sudoku puzzles and their digit crops, which we’ll use to train a digit recognition model. Starting from a Hugging Face parquet dataset that provides paired puzzle/solution strings, we transform raw boards into realistic, book-style Sudoku pages, apply camera-like augmentations to mimic mobile captures, and automatically slice each page into 81 labeled cell images. This yields a large, diverse, perfectly labeled set of digits (0–9 with 0 = blank) without manual annotation. By the end, you’ll have a structured dataset ready to train a lightweight model in the next section.
+
+## Why Synthetic Generation?
+When building a Sudoku digit recognizer, the hardest part is obtaining a well-labeled dataset that matches real capture conditions. MNIST contains handwritten digits, which differ from printed, grid-aligned Sudoku digits; relying on it alone hurts real-world performance.
+
+By generating synthetic Sudoku pages directly from the parquet dataset, we get:
+1. Perfect labeling. Since the puzzle content is known, every cropped cell automatically comes with the correct label (digit or blank), eliminating manual annotation.
+2. Control over style. We can render Sudoku pages to look like those in printed books, with realistic fonts, grid lines, and difficulty levels controlled by how many cells are left blank.
+3. Robustness through augmentation. By applying perspective warps, blur, noise, and lighting variations, we simulate how a smartphone camera might capture a Sudoku page, improving the model’s ability to handle real-world photos.
+4. Scalability. With millions of Sudoku solutions available, we can easily generate tens of thousands of training samples in minutes, ensuring a dataset that is both large and diverse.
+
+This synthetic data generation strategy allows us to create a custom-fit dataset for our Sudoku digit recognition problem, bridging the gap between clean digital puzzles and noisy real-world inputs.
+
+## What we’ll produce
+By the end of this step, you will have two complementary datasets:
+1. Digit crops for training the classifier. A folder tree structured for torchvision.datasets.ImageFolder, containing tens of thousands of labeled 28×28 images of Sudoku digits (0–9, with 0 meaning blank):
+
+```console
+data/
+ train/
+ 0/....png (blank)
+ 1/....png
+ ...
+ 9/....png
+ val/
+ 0/....png
+ ...
+ 9/....png
+```
+
+These will be used in the next step to train a lightweight model for digit recognition.
+
+2. Rendered Sudoku grids for camera simulation. Full-page Sudoku images (both clean book-style and augmented camera-like versions) stored in:
+```console
+data/
+ grids/
+ train/
+ 000001_clean.png
+ 000001_cam.png
+ ...
+ val/
+ ...
+```
+
+These grid images allow us to later test the end-to-end pipeline: detect the board with OpenCV, rectify perspective, classify each cell using the ONNX digit recognizer, and then solve the Sudoku puzzle.
+
+Together, these datasets provide both the micro-level data needed to train the digit recognizer and the macro-level data to simulate the camera pipeline for testing and deployment.
+
+## Implementation
+Start by creating a new file 02_PrepareData.py and modify it as follows:
+```python
+import os, random, pathlib
+import numpy as np
+import cv2 as cv
+import pandas as pd
+from tqdm import tqdm
+
+random.seed(0)
+
+# Parameters
+PARQUET_PATH = "train_1.parquet" # path to your downloaded HF Parquet
+OUT_DIR = pathlib.Path("data")
+N_TRAIN = 1000 # how many puzzles to render for training
+N_VAL = 100 # how many for validation
+IMG_H, IMG_W = 1200, 800 # page size (portrait-ish)
+GRID_MARGIN = 60 # outer margin, px
+CELL_SIZE = 28 # output crop size for classifier (MNIST-like)
+FONT = cv.FONT_HERSHEY_SIMPLEX
+#
+
+def str_to_grid(s: str):
+ """81-char '012345678' string -> 9x9 list of ints."""
+ s = s.strip()
+ assert len(s) == 81, f"bad length: {len(s)}"
+ return [[int(s[9*r+c]) for c in range(9)] for r in range(9)]
+
+def load_puzzles(parquet_path, n_train, n_val):
+ """Load puzzles/solutions; return two lists of 9x9 int grids for train/val."""
+ df = pd.read_parquet(parquet_path, engine="pyarrow")
+ # Shuffle reproducibly
+ df = df.sample(frac=1.0, random_state=0).reset_index(drop=True)
+ # Keep only needed columns if present
+ need_cols = [c for c in ["puzzle", "solution"] if c in df.columns]
+ if not need_cols or "puzzle" not in need_cols:
+ raise ValueError(f"Expected 'puzzle' (and optionally 'solution') columns; got: {list(df.columns)}")
+
+ # Slice train/val partitions
+ df_train = df.iloc[:n_train]
+ df_val = df.iloc[n_train:n_train+n_val]
+
+ puzzles_train = [str_to_grid(p) for p in df_train["puzzle"].astype(str)]
+ puzzles_val = [str_to_grid(p) for p in df_val["puzzle"].astype(str)]
+
+ # Solutions are optional (useful later for solver validation)
+ solutions_train = [str_to_grid(s) for s in df_train["solution"].astype(str)] if "solution" in df_train else None
+ solutions_val = [str_to_grid(s) for s in df_val["solution"].astype(str)] if "solution" in df_val else None
+
+ return (puzzles_train, solutions_train), (puzzles_val, solutions_val)
+
+def draw_grid(img, size=9, margin=GRID_MARGIN):
+ H, W = img.shape[:2]
+ step = (min(H, W) - 2*margin) // size
+ x0 = (W - size*step) // 2
+ y0 = (H - size*step) // 2
+ for i in range(size+1):
+ thickness = 3 if i % 3 == 0 else 1
+ # vertical
+ cv.line(img, (x0 + i*step, y0), (x0 + i*step, y0 + size*step), (0, 0, 0), thickness)
+ # horizontal
+ cv.line(img, (x0, y0 + i*step), (x0 + size*step, y0 + i*step), (0, 0, 0), thickness)
+ return (x0, y0, step)
+
+def put_digit(img, r, c, d, x0, y0, step):
+ if d == 0:
+ return # blank cell
+ text = str(d)
+ scale = step / 60.0
+ thickness = 2
+ (tw, th), base = cv.getTextSize(text, FONT, scale, thickness)
+ cx = x0 + c*step + (step - tw)//2
+ cy = y0 + r*step + (step + th)//2 - th//4
+ cv.putText(img, text, (cx, cy), FONT, scale, (0, 0, 0), thickness, cv.LINE_AA)
+
+def render_page(puzzle9x9):
+ page = np.full((IMG_H, IMG_W, 3), 255, np.uint8)
+ x0, y0, step = draw_grid(page, 9, GRID_MARGIN)
+ for r in range(9):
+ for c in range(9):
+ put_digit(page, r, c, puzzle9x9[r][c], x0, y0, step)
+ return page, (x0, y0, step)
+
+def aug_camera(img):
+ """Light camera-like augmentation: perspective jitter + optional Gaussian blur."""
+ H, W = img.shape[:2]
+ def jitter(pt, s=20):
+ return (pt[0] + random.randint(-s, s), pt[1] + random.randint(-s, s))
+ src = np.float32([(0, 0), (W, 0), (W, H), (0, H)])
+ dst = np.float32([jitter((0,0)), jitter((W,0)), jitter((W,H)), jitter((0,H))])
+ M = cv.getPerspectiveTransform(src, dst)
+ warped = cv.warpPerspective(img, M, (W, H), flags=cv.INTER_LINEAR, borderValue=(220, 220, 220))
+ if random.random() < 0.5:
+ k = random.choice([1, 2])
+ warped = cv.GaussianBlur(warped, (2*k+1, 2*k+1), 0)
+ return warped
+
+def ensure_dirs(split):
+ for cls in range(10): # 0..9 (0 == blank)
+ (OUT_DIR / split / str(cls)).mkdir(parents=True, exist_ok=True)
+
+def save_crops(page, geom, puzzle9x9, split, base_id):
+ x0, y0, step = geom
+ idx = 0
+ for r in range(9):
+ for c in range(9):
+ x1, y1 = x0 + c*step, y0 + r*step
+ roi = page[y1:y1+step, x1:x1+step]
+ g = cv.cvtColor(roi, cv.COLOR_BGR2GRAY)
+ g = cv.resize(g, (CELL_SIZE, CELL_SIZE), interpolation=cv.INTER_AREA)
+ label = puzzle9x9[r][c] # 0 for blank, 1..9 digits
+ out_path = OUT_DIR / split / str(label) / f"{base_id}_{idx:02d}.png"
+ cv.imwrite(str(out_path), g)
+ idx += 1
+
+def process_split(puzzles, split_name, n_limit):
+ ensure_dirs(split_name)
+ grid_dir = OUT_DIR / "grids" / split_name
+ grid_dir.mkdir(parents=True, exist_ok=True)
+
+ N = min(n_limit, len(puzzles))
+ for i in tqdm(range(N), desc=f"render {split_name}"):
+ puzzle = puzzles[i]
+
+ # Clean page
+ page, geom = render_page(puzzle)
+ save_crops(page, geom, puzzle, split_name, base_id=f"{i:06d}_clean")
+ cv.imwrite(str(grid_dir / f"{i:06d}_clean.png"), page)
+
+ # Camera-like
+ warped = aug_camera(page)
+ save_crops(warped, geom, puzzle, split_name, base_id=f"{i:06d}_cam")
+ cv.imwrite(str(grid_dir / f"{i:06d}_cam.png"), warped)
+
+def main():
+ (p_train, _s_train), (p_val, _s_val) = load_puzzles(PARQUET_PATH, N_TRAIN, N_VAL)
+ process_split(p_train, "train", N_TRAIN)
+ process_split(p_val, "val", N_VAL)
+ print("Done. Output under:", OUT_DIR.resolve())
+
+if __name__ == "__main__":
+ main()
+```
+
+At the top, you set basic knobs for the generator: where to read the Parquet file, where to write outputs, how many puzzles to render for train/val, page size, grid margin, crop size, and the OpenCV font. Tweaking these lets you control dataset scale, visual style, and classifier input size (e.g., CELL_SIZE=32 if you want a slightly larger digit crop).
+
+The method str_to_grid(s) converts an 81-character Sudoku string into a 9×9 list of integers. Each character represents a cell: 0 is blank, 1–9 are digits. This is the canonical internal representation used throughout the script.
+
+Then, we have load_puzzles(parquet_path, n_train, n_val), which loads the dataset from Parquet, shuffles it deterministically, and slices it into train/val partitions. It returns the puzzles (and, if present, solutions) as 9×9 integer grids. In this step we only need puzzle for rendering and labeling digit crops (blanks included); solution is useful later for solver validation.
+
+Subsequently, draw_grid(img, size=9, margin=GRID_MARGIN) draws a Sudoku grid on a blank page image. It computes the step size from the page dimensions and margin, then draws both thin inner lines and thick 3×3 box boundaries. It returns the top-left corner (x0, y0) and the cell size (step), which are reused to place digits and to locate each cell for cropping.
+
+Next, put_digit(img, r, c, d, x0, y0, step) renders a single digit d at row r, column c inside the grid. The text is centered in the cell using the font metrics; if d == 0, it leaves the cell blank. This mirrors printed-book Sudoku styling so our crops look realistic.
+
+Another method, render_page(puzzle9x9), builds a complete “book-style” Sudoku page: creates a white canvas, draws the grid, loops over all 81 cells, and writes digits using put_digit. It returns the page plus the grid geometry (x0, y0, step) for subsequent cropping.
+
+The aug_camera(img) function applies a light, camera-like augmentation to mimic smartphone captures: a small perspective warp (random corner jitter) and optional Gaussian blur. The warp uses a light gray border fill so any exposed areas look like paper rather than colored artifacts. This produces a second version of each page that’s closer to real-world inputs.
+
+Afterward, ensure_dirs(split) makes the class directories for a given split (train or val) so that crops can be saved in data/{split}/{class}/.... The classes are 0..9 with 0 = blank.
+
+The save_crops(page, geom, puzzle9x9, split, base_id) function slices the page into 81 cell crops using the grid geometry, converts each crop to grayscale, resizes it to CELL_SIZE × CELL_SIZE, and saves it into the appropriate class directory based on the puzzle’s value at that cell (0..9). Using the puzzle for labels ensures we learn to recognize blanks as well as digits.
+
+Then, process_split(puzzles, split_name, n_limit) is the workhorse for each partition. For each puzzle, it (1) renders a clean page, saves its 81 crops, and writes the full page under data/grids/{split}; then (2) generates an augmented “camera-like” version and saves its crops and full page too. This gives you both micro-level training data (crops) and macro-level test images (full grids) for the later camera pipeline.
+
+Finally, main() loads train/val puzzles from Parquet and calls process_split for each. When it finishes, you’ll have:
+```console
+data/
+ train/
+ 0/… 1/… … 9/…
+ val/
+ 0/… … 9/…
+ grids/
+ train/ (..._clean.png, ..._cam.png)
+ val/ (..._clean.png, ..._cam.png)
+```
+
+## Launching instructions
+1. Install dependencies (inside your virtual env):
+```console
+pip install pandas pyarrow opencv-python tqdm numpy
+```
+
+2. Place the Parquet file (e.g., train_1.parquet) next to the script or update PARQUET_PATH accordingly. Here we used the file from [this location](https://huggingface.co/datasets/Ritvik19/Sudoku-Dataset/blob/main/train_1.parquet).
+
+3. Run the generator:
+```console
+python3 02_PrepareData.py
+```
+
+4. Inspect outputs:
+* Digit crops live under data/train/{0..9}/ and data/val/{0..9}/.
+* Full-page grids (clean + camera-like) live under data/grids/train/ and data/grids/val/.
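+
+To inspect the crops programmatically, here is a small optional sketch that counts the generated images per class and split:
+
+```python
+from pathlib import Path
+
+# Count generated digit crops per class for each split
+for split in ["train", "val"]:
+    for cls in range(10):
+        d = Path("data") / split / str(cls)
+        n = len(list(d.glob("*.png"))) if d.exists() else 0
+        print(f"{split}/{cls}: {n} images")
+```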
+
+Tips
+* Start small (N_TRAIN=1000, N_VAL=100) to verify everything, then scale up.
+* If you want larger inputs for the classifier, increase CELL_SIZE to 32 or 40.
+* To make augmentation a bit stronger (more realistic), slightly increase the perspective jitter in aug_camera, add brightness/contrast jitter, or a faint gradient shadow overlay.
+
+## Summary
+After running this step you’ll have a robust, labeled, Sudoku-specific dataset: thousands of digit crops (including blanks) for training and realistic full-page grids for pipeline testing. You’re ready for the next step—training the digit recognizer and exporting it to ONNX.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/04_Training.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/04_Training.md
new file mode 100644
index 0000000000..332db92273
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/04_Training.md
@@ -0,0 +1,238 @@
+---
+# User change
+title: "Train the Digit Recognizer"
+
+weight: 5
+
+layout: "learningpathall"
+---
+
+## Objective
+We will now train a small CNN to classify Sudoku cell crops into 10 classes (0=blank, 1..9=digit), verify accuracy, then export the model to ONNX using the Dynamo exporter and sanity-check parity with ONNX Runtime. This gives us a portable model ready for Arm64 inference and later Android deployment.
+
+## Creating a model
+We use a tiny convolutional neural network (CNN) called DigitNet, designed to be both fast (so it runs efficiently on Arm64 and mobile) and accurate enough for recognizing 28×28 grayscale crops of Sudoku digits. It expects 1 input channel (in_channels=1) because we forced grayscale in the preprocessing step.
+
+We start by creating a new file digitnet_model.py and defining the DigitNet class:
+```python
+import torch
+import torch.nn as nn
+
+class DigitNet(nn.Module):
+ """
+ Tiny CNN for Sudoku digit classification.
+ Classes: 0..9 where 0 = blank.
+ Input: (N,1,H,W) grayscale (default 28x28).
+ """
+ def __init__(self, num_classes: int = 10):
+ super().__init__()
+ self.net = nn.Sequential(
+ nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
+ nn.MaxPool2d(2),
+ nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
+ nn.AdaptiveAvgPool2d((1,1)),
+ nn.Flatten(),
+ nn.Linear(32, num_classes),
+ )
+
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
+ return self.net(x)
+```
+
+We use a very compact convolutional neural network (CNN), which we call DigitNet, to recognize Sudoku digits. The goal is to have a model that is simple enough to run efficiently on Arm64 and mobile devices, but still powerful enough to tell apart the ten classes we care about (0 for blank, and digits 1 through 9).
+
+The network expects each input to be a 28×28 grayscale crop, so it begins with a convolution layer that has one input channel and sixteen filters. This first convolution is responsible for learning very low-level patterns such as strokes or edges. Immediately after, a ReLU activation introduces non-linearity, which allows the network to combine those simple features into more expressive ones. A max-pooling layer then reduces the spatial resolution by half, making the representation more compact and less sensitive to small translations.
+
+At this point, the feature maps are passed through a second convolutional layer with thirty-two filters. This stage learns richer patterns, for example combinations of edges that form loops or intersections that distinguish an “8” from a “0” or a “6”. Another ReLU activation adds the necessary non-linearity to these higher-level features.
+
+Instead of flattening the entire feature map, we apply an adaptive average pooling operation that squeezes each of the thirty-two channels down to a single number. This effectively summarizes the information across the whole image and ensures the model produces a fixed-length representation regardless of the exact input size. After pooling, the features are flattened into a one-dimensional vector.
+
+The final step is a fully connected layer that maps the thirty-two features to ten output values, one for each class. These values are raw scores (logits) that indicate how strongly the model associates the input crop with each digit. During training, a cross-entropy loss will turn these logits into probabilities and guide the model to adjust its weights.
+
+In practice, this means that when you feed in a batch of grayscale Sudoku cells of shape [N, 1, 28, 28], DigitNet transforms them step by step into a batch of [N, 10] outputs, where each row contains the scores for the ten possible classes. Despite its simplicity, this small CNN strikes a balance between speed and accuracy that makes it ideal for Sudoku digit recognition on resource-constrained devices.
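+
+Before wiring the model into the training script, you can sanity-check its input and output shapes with a tiny, optional snippet (it assumes digitnet_model.py is in the same directory):
+
+```python
+import torch
+from digitnet_model import DigitNet
+
+model = DigitNet(num_classes=10).eval()
+x = torch.randn(4, 1, 28, 28)   # a batch of four 28x28 grayscale crops
+with torch.no_grad():
+    logits = model(x)
+print(logits.shape)             # expected: torch.Size([4, 10])
+```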
+
+## Training a model
+We will now prepare a self-contained script that trains the model above on the data prepared earlier. Start by creating a new file 03_Training.py and adding the following code:
+```python
+import os, random, numpy as np
+import torch as tr
+import torch.nn as nn
+import torch.nn.functional as F
+from torch.utils.data import DataLoader
+from torchvision import datasets, transforms
+from tqdm import tqdm
+from torch.export import Dim
+import onnxruntime as ort
+
+from digitnet_model import DigitNet
+
+# Configuration
+random.seed(0); np.random.seed(0); tr.manual_seed(0)
+DEVICE = "cpu" # keep CPU for portability
+DATA_DIR = "data" # data/train/0..9, data/val/0..9
+ARTI_DIR = "artifacts"
+os.makedirs(ARTI_DIR, exist_ok=True)
+
+BATCH = 256
+EPOCHS = 10
+LR = 1e-3
+WEIGHT_DECAY = 1e-4
+LABEL_SMOOTH = 0.05
+
+# Datasets (force grayscale to match model)
+tfm_train = transforms.Compose([
+ transforms.Grayscale(num_output_channels=1), # force 1-channel input
+ transforms.ToTensor(),
+ transforms.Normalize((0.5,), (0.5,)),
+ transforms.RandomApply([transforms.GaussianBlur(3)], p=0.15),
+ transforms.RandomAffine(degrees=5, translate=(0.02,0.02), scale=(0.95,1.05)),
+])
+tfm_val = transforms.Compose([
+ transforms.Grayscale(num_output_channels=1), # force 1-channel input
+ transforms.ToTensor(),
+ transforms.Normalize((0.5,), (0.5,)),
+])
+
+train_ds = datasets.ImageFolder(os.path.join(DATA_DIR, "train"), transform=tfm_train)
+val_ds = datasets.ImageFolder(os.path.join(DATA_DIR, "val"), transform=tfm_val)
+
+train_loader = DataLoader(train_ds, batch_size=BATCH, shuffle=True, num_workers=0)
+val_loader = DataLoader(val_ds, batch_size=BATCH, shuffle=False, num_workers=0)
+
+def evaluate(model: nn.Module, loader: DataLoader) -> float:
+ model.eval()
+ correct = total = 0
+ with tr.no_grad():
+ for x, y in loader:
+ x, y = x.to(DEVICE), y.to(DEVICE)
+ pred = model(x).argmax(1)
+ correct += (pred == y).sum().item()
+ total += y.numel()
+ return correct / total if total else 0.0
+
+def main():
+ # Sanity: verify loader channels
+ xb, _ = next(iter(train_loader))
+ print("Train batch shape:", xb.shape) # expect [B, 1, 28, 28]
+
+ model = DigitNet(num_classes=10).to(DEVICE)
+ opt = tr.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
+
+ best_acc, best_state = 0.0, None
+ for ep in range(1, EPOCHS + 1):
+ model.train()
+ for x, y in tqdm(train_loader, desc=f"epoch {ep}/{EPOCHS}"):
+ x, y = x.to(DEVICE), y.to(DEVICE)
+ opt.zero_grad()
+ logits = model(x)
+ loss = F.cross_entropy(logits, y, label_smoothing=LABEL_SMOOTH)
+ loss.backward()
+ opt.step()
+
+ acc = evaluate(model, val_loader)
+ print(f"val acc: {acc:.4f}")
+ if acc > best_acc:
+ best_acc = acc
+ best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
+
+ if best_state is not None:
+ model.load_state_dict(best_state)
+ print(f"Best val acc: {best_acc:.4f}")
+
+ # Save PyTorch weights (optional)
+ tr.save(model.state_dict(), os.path.join(ARTI_DIR, "digitnet_best.pth"))
+
+ # Export to ONNX with dynamic batch using the Dynamo API
+ model.eval()
+ dummy = tr.randn(1, 1, 28, 28)
+ onnx_path = os.path.join(ARTI_DIR, "sudoku_digitnet.onnx")
+
+ tr.onnx.export(
+ model, # model
+ dummy, # input tensor corresponds to arg name 'x'
+ onnx_path, # output .onnx
+ input_names=["input"], # ONNX *display* name (independent of arg name)
+ output_names=["logits"],
+ opset_version=19,
+ do_constant_folding=True,
+ keep_initializers_as_inputs=False,
+ dynamo=True,
+ dynamic_shapes={"x": {0: Dim("N")}}
+ )
+
+ print("Exported:", onnx_path)
+
+ # quick parity with a big batch (proves dynamic batch works)
+ sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+ x = tr.randn(512, 1, 28, 28)
+ onnx_logits = sess.run(["logits"], {"input": x.numpy().astype(np.float32)})[0]
+ pt_logits = model(x).detach().numpy()
+ print("Parity MAE:", np.mean(np.abs(onnx_logits - pt_logits)))
+
+if __name__ == "__main__":
+ main()
+```
+
+This file is a self-contained trainer for the Sudoku digit classifier. It starts by fixing random seeds for reproducibility and sets DEVICE="cpu" so the workflow runs the same on desktops and Arm64 boards. It expects the dataset from the previous step under data/train/0..9 and data/val/0..9, and creates an artifacts/ folder for all outputs.
+
+The script builds two dataloaders (train/val) with a preprocessing stack that forces grayscale (Grayscale(num_output_channels=1)) so inputs match the model’s first convolution, converts to tensors, and normalizes to a centered range. Light augmentations on the training split—small affine jitter and occasional blur—mimic camera variability without distorting the digits. Batch size, epochs, and learning rate are set to conservative defaults so training is smooth on CPU; you can scale them up later.
+
+Then, the script instantiates the DigitNet(num_classes=10) model. The optimizer is AdamW with mild weight decay to control overfitting. The loss is cross-entropy with label smoothing (0.05), which reduces over-confidence and helps on easily confused shapes (like 6, 8, and 9).
+
+The training loop runs for a fixed number of epochs, iterating mini-batches from the training set. After each epoch, it evaluates on the validation split and logs the accuracy. The script keeps track of the best model state seen so far (based on val accuracy) and restores it at the end, ensuring the final model corresponds to your best epoch, not just the last one.
+
+The file will create two artifacts:
+1. digitnet_best.pth — the best PyTorch weights (handy for quick experiments, fine-tuning, or debugging later).
+2. sudoku_digitnet.onnx — the exported ONNX model, produced with PyTorch’s Dynamo exporter and a dynamic batch dimension. Dynamic batch means the model accepts input of shape [N, 1, 28, 28] for any N, which is ideal for efficient batched inference on Arm64 and for Android integration.
+
+Right after export, the script runs a parity test: it feeds the same randomly generated batch through both the PyTorch model and the ONNX model (executed by ONNX Runtime) and prints the mean absolute error between their logits. A tiny value confirms the exported graph faithfully matches your trained network.
+
+## Running the script
+To run the training script, type:
+
+```console
+python3 03_Training.py
+```
+
+The script will train, validate, export, and verify the digit recognizer in one go. After it finishes, you’ll have both a portable ONNX model and a PyTorch checkpoint ready for the next step—building the image processor that detects the Sudoku grid, rectifies it, segments cells, and performs batched ONNX inference to reconstruct the board for solving.
+
+Here is a sample run:
+
+```output
+python3 03_Training.py
+Train batch shape: torch.Size([256, 1, 28, 28])
+epoch 1/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:24<00:00, 7.82it/s]
+val acc: 0.8099
+epoch 2/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:18<00:00, 8.05it/s]
+val acc: 0.8378
+epoch 3/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:17<00:00, 8.09it/s]
+val acc: 0.8855
+epoch 4/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:20<00:00, 7.97it/s]
+val acc: 0.9180
+epoch 5/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:20<00:00, 7.97it/s]
+val acc: 0.9527
+epoch 6/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:22<00:00, 7.88it/s]
+val acc: 0.9635
+epoch 7/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:22<00:00, 7.88it/s]
+val acc: 0.9777
+epoch 8/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:21<00:00, 7.91it/s]
+val acc: 0.9854
+epoch 9/10: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:21<00:00, 7.91it/s]
+val acc: 0.9912
+epoch 10/10: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1597/1597 [03:21<00:00, 7.91it/s]
+val acc: 0.9928
+Best val acc: 0.9928
+[torch.onnx] Obtain model graph for `DigitNet([...]` with `torch.export.export(..., strict=False)`...
+[torch.onnx] Obtain model graph for `DigitNet([...]` with `torch.export.export(..., strict=False)`... ✅
+[torch.onnx] Run decomposition...
+[torch.onnx] Run decomposition... ✅
+[torch.onnx] Translate the graph into ONNX...
+[torch.onnx] Translate the graph into ONNX... ✅
+Applied 1 of general pattern rewrite rules.
+Exported: artifacts/sudoku_digitnet.onnx
+Parity MAE: 1.0251999e-05
+```
+
+## Summary
+By running the training script you train the DigitNet CNN on the Sudoku digit dataset, steadily improving accuracy across epochs until the model surpasses 99% validation accuracy. The process builds on the earlier steps where we first defined the model architecture in digitnet_model.py and then prepared a dedicated training script to handle data loading, augmentation, optimization, and evaluation. During training the best-performing model state is saved, and at the end it is exported to the ONNX format with dynamic batch support. A parity check confirms that the ONNX and PyTorch versions produce virtually identical outputs (mean error ~1e-5). You now have a validated ONNX model (artifacts/sudoku_digitnet.onnx) and a PyTorch checkpoint (digitnet_best.pth), both ready for integration into the Sudoku image processing pipeline. Before moving on to grid detection and solving, however, we will first run standalone inference to confirm the model’s predictions on individual digit crops.
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/05_Inference.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/05_Inference.md
new file mode 100644
index 0000000000..11bae36fd9
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/05_Inference.md
@@ -0,0 +1,249 @@
+---
+# User change
+title: "Inference and Model Evaluation"
+
+weight: 6
+
+layout: "learningpathall"
+---
+
+## Objective
+In this section, we validate the digit recognizer by running inference on the validation dataset using both the PyTorch checkpoint and the exported ONNX model. We verify that PyTorch and ONNX Runtime produce consistent results, analyze class-level behavior using a confusion matrix, and generate visual diagnostics for debugging and documentation. This step acts as a final verification checkpoint before integrating the model into the full OpenCV-based Sudoku processing pipeline.
+
+Before introducing geometric processing, grid detection, and perspective correction, it is important to confirm that the digit recognizer works reliably in isolation. By validating inference and analyzing errors at the digit level, we ensure that any future issues in the end-to-end system can be attributed to image processing or geometry rather than the classifier itself.
+
+## Inference and Evaluation Script
+Create a new file named 04_Test.py and paste the script below into it. This script evaluates the digit recognizer in a way that closely mirrors deployment conditions. It compares PyTorch and ONNX Runtime inference, measures accuracy on the validation dataset, and generates visual diagnostics that reveal both strengths and remaining failure modes of the model.
+
+```python
+import os, numpy as np, torch
+from torchvision import datasets, transforms
+from torch.utils.data import DataLoader
+from tqdm import tqdm
+import matplotlib.pyplot as plt
+
+from digitnet_model import DigitNet
+
+DATA_DIR = "data"
+ARTI_DIR = "artifacts"
+os.makedirs(ARTI_DIR, exist_ok=True)
+
+ONNX_PATH = os.path.join(ARTI_DIR, "sudoku_digitnet.onnx") # fp32
+
+# Same normalization as training (and force grayscale → 1 channel)
+tfm_val = transforms.Compose([
+ transforms.Grayscale(num_output_channels=1),
+ transforms.ToTensor(),
+ transforms.Normalize((0.5,), (0.5,))
+])
+val_ds = datasets.ImageFolder(os.path.join(DATA_DIR, "val"), transform=tfm_val)
+val_loader = DataLoader(val_ds, batch_size=512, shuffle=False, num_workers=0)
+
+DIGIT_NAMES = [str(i) for i in range(10)] # 0 = blank, 1..9 = digits
+
+
+def evaluate_pytorch(model, loader):
+ model.eval()
+ correct = total = 0
+ with torch.no_grad():
+ for x, y in loader:
+ pred = model(x).argmax(1)
+ correct += (pred == y).sum().item()
+ total += y.numel()
+ return correct / total if total else 0.0
+
+
+def confusion_matrix_onnx(onnx_model_path, loader):
+ import onnxruntime as ort
+ sess = ort.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])
+ mat = np.zeros((10, 10), dtype=np.int64)
+ total = 0
+ correct = 0
+ for x, y in tqdm(loader, desc="ONNX eval"):
+ # x: torch tensor [N,1,28,28] normalized to [-1,1]
+ inp = x.numpy().astype(np.float32)
+ logits = sess.run(["logits"], {"input": inp})[0] # [N,10]
+ pred = logits.argmax(axis=1)
+ y_np = y.numpy()
+ for t, p in zip(y_np, pred):
+ mat[t, p] += 1
+ correct += (pred == y_np).sum()
+ total += y_np.size
+ acc = float(correct) / float(total) if total else 0.0
+ return acc, mat
+
+
+def plot_confusion_matrix(cm, classes=DIGIT_NAMES, normalize=False, title="Confusion matrix", fname=None):
+ """Plot confusion matrix. If normalize=True, rows sum to 1."""
+ cm_plot = cm.astype("float")
+ if normalize:
+ row_sums = cm_plot.sum(axis=1, keepdims=True) + 1e-12
+ cm_plot = cm_plot / row_sums
+
+ plt.figure(figsize=(6, 5))
+ plt.imshow(cm_plot, interpolation="nearest")
+ plt.title(title)
+ plt.colorbar()
+ tick_marks = np.arange(len(classes))
+ plt.xticks(tick_marks, classes)
+ plt.yticks(tick_marks, classes)
+
+ # Label each cell
+ thresh = cm_plot.max() / 2.0
+ for i in range(cm_plot.shape[0]):
+ for j in range(cm_plot.shape[1]):
+ txt = f"{cm_plot[i, j]:.2f}" if normalize else f"{int(cm_plot[i, j])}"
+ plt.text(j, i, txt,
+ horizontalalignment="center",
+ verticalalignment="center",
+ fontsize=7,
+ color="white" if cm_plot[i, j] > thresh else "black")
+
+ plt.ylabel("True label")
+ plt.xlabel("Predicted label")
+ plt.tight_layout()
+ if fname:
+ plt.savefig(fname, dpi=150)
+ print(f"Saved: {fname}")
+ plt.show()
+
+
+def sample_predictions_onnx(onnx_path, dataset, k=24, seed=0):
+ """Show a grid of sample predictions (mix of correct and misclassified)."""
+ import onnxruntime as ort
+ rng = np.random.default_rng(seed)
+ sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+
+ # Over-sample candidates then choose some wrong + some right
+ idxs = rng.choice(len(dataset), size=min(k * 2, len(dataset)), replace=False)
+ imgs, ys, preds = [], [], []
+
+ for i in idxs:
+ x, y = dataset[i] # x: [1,28,28] after transforms; y: int
+ x_np = x.unsqueeze(0).numpy().astype(np.float32) # [1,1,28,28]
+ logits = sess.run(["logits"], {"input": x_np})[0] # [1,10]
+ p = int(np.argmax(logits, axis=1)[0])
+ imgs.append(x.squeeze(0).numpy()) # [28,28]
+ ys.append(int(y))
+ preds.append(p)
+
+ mis_idx = [i for i, (t, p) in enumerate(zip(ys, preds)) if t != p]
+ cor_idx = [i for i, (t, p) in enumerate(zip(ys, preds)) if t == p]
+ picked = (mis_idx[:k // 2] + cor_idx[:k - len(mis_idx[:k // 2])])[:k]
+ if not picked: # fallback
+ picked = list(range(min(k, len(imgs))))
+
+ # Plot grid
+ import math
+ cols = 8
+ rows = math.ceil(len(picked) / cols)
+ plt.figure(figsize=(cols * 1.6, rows * 1.8))
+ for j, idx in enumerate(picked):
+ plt.subplot(rows, cols, j + 1)
+ plt.imshow(imgs[idx], cmap="gray")
+ t, p = ys[idx], preds[idx]
+ title = f"T:{t} P:{p}"
+ color = "green" if t == p else "red"
+ plt.title(title, color=color, fontsize=9)
+ plt.axis("off")
+ plt.tight_layout()
+ out = os.path.join(ARTI_DIR, "samples_grid.png")
+ plt.savefig(out, dpi=150)
+ print(f"Saved: {out}")
+ plt.show()
+
+def main():
+ # Optional: evaluate the best PyTorch checkpoint for reference
+ pt_ckpt = os.path.join(ARTI_DIR, "digitnet_best.pth")
+ if os.path.exists(pt_ckpt):
+ model = DigitNet()
+ model.load_state_dict(torch.load(pt_ckpt, map_location="cpu"))
+ pt_acc = evaluate_pytorch(model, val_loader)
+ print(f"PyTorch val acc: {pt_acc:.4f}")
+ else:
+ print("No PyTorch checkpoint found; skipping PT eval.")
+
+ # Evaluate ONNX fp32
+ if os.path.exists(ONNX_PATH):
+ acc, cm = confusion_matrix_onnx(ONNX_PATH, val_loader)
+ print(f"ONNX fp32 val acc: {acc:.4f}")
+ print("Confusion matrix (rows=true, cols=pred):\n", cm)
+
+ # Plots: counts + normalized
+ plot_confusion_matrix(cm, normalize=False,
+ title="ONNX fp32 – Confusion (counts)",
+ fname=os.path.join(ARTI_DIR, "cm_fp32_counts.png"))
+ plot_confusion_matrix(cm, normalize=True,
+ title="ONNX fp32 – Confusion (row-normalized)",
+ fname=os.path.join(ARTI_DIR, "cm_fp32_norm.png"))
+
+ # Sample predictions grid
+ try:
+ sample_predictions_onnx(ONNX_PATH, val_ds, k=24)
+ except Exception as e:
+ print("Sample grid skipped:", e)
+ else:
+ print("Missing ONNX model:", ONNX_PATH)
+
+if __name__ == "__main__":
+ main()
+```
+
+The script first loads the validation dataset using the same preprocessing pipeline as training, including forced grayscale conversion to ensure a single input channel. It then optionally evaluates the best PyTorch checkpoint (digitnet_best.pth) to establish a reference accuracy.
+
+Next, the exported ONNX model (sudoku_digitnet.onnx) is loaded using ONNX Runtime and evaluated in batches. Because the model was exported with a dynamic batch dimension, inference can be performed efficiently on larger batches, which is representative of how the model will be used later in the pipeline.
+
+The script expects two things from the earlier steps:
+1. A validation dataset stored under data/val/0..9/…
+2. A trained model exported in the previous step and stored under artifacts/
+ * artifacts/digitnet_best.pth (optional, PyTorch weights)
+ * artifacts/sudoku_digitnet.onnx (required, ONNX model)
+
+From the ONNX predictions, the script computes an overall accuracy and builds a confusion matrix (true class vs. predicted class) that reveals which digits are being confused.
+
+In addition to printing accuracy metrics, the script generates two types of diagnostic outputs:
+1. Confusion matrix visualizations, saved as:
+ * artifacts/cm_fp32_counts.png (raw counts)
+ * artifacts/cm_fp32_norm.png (row-normalized)
+2. A grid of example predictions, saved as:
+ * artifacts/samples_grid.png
+
+These artifacts provide both quantitative and qualitative insight into model performance.
+
+In the sample grid, each tile shows one crop together with its True label (T:) and Predicted label (P:), with correct predictions highlighted in green and mistakes highlighted in red. This makes it easy to quickly verify that the classifier behaves sensibly and to spot remaining failure modes.
+
+## Running the script
+Run the evaluation script from the project root:
+
+```console
+python3 04_Test.py
+```
+
+In the example below, the PyTorch and ONNX accuracies match exactly, confirming that the export process preserved model behavior.
+
+```console
+python3 04_Test.py
+PyTorch val acc: 0.9928
+ONNX eval: 100%|███████████████████████████████████████████████████████████| 32/32 [00:01<00:00, 21.06it/s]
+ONNX fp32 val acc: 0.9928
+Confusion matrix (rows=true, cols=pred):
+ [[12623 7 0 0 0 0 0 0 0 0]
+ [ 0 420 0 0 0 0 0 0 0 0]
+ [ 0 0 331 0 4 0 1 0 0 0]
+ [ 0 1 0 332 0 1 0 0 0 0]
+ [ 0 0 0 0 460 0 0 0 0 0]
+ [ 0 1 0 1 0 486 2 0 0 0]
+ [ 1 0 0 0 0 19 387 0 1 2]
+ [ 0 1 0 0 0 0 0 375 0 0]
+ [ 0 0 0 0 0 6 27 0 297 10]
+ [ 0 1 0 0 0 14 10 0 7 372]]
+Saved: artifacts/cm_fp32_counts.png
+```
+
+
+The confusion matrix provides more insight than a single accuracy number. Each row corresponds to the true class, and each column corresponds to the predicted class. A strong diagonal indicates correct classification. In this output, blank cells (class 0) are almost always recognized correctly, while the remaining errors occur primarily between visually similar printed digits such as 6, 8, and 9.
+
+This behavior is expected and indicates that the model has learned meaningful digit features. The remaining confusions are rare and can be addressed later through targeted augmentation or higher-resolution crops if needed.
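+
+If you also want per-class numbers, the row-normalized diagonal of the confusion matrix gives them directly. The helper below is a small sketch (the function name is ours; cm is the matrix returned by confusion_matrix_onnx):
+
+```python
+import numpy as np
+
+def per_class_accuracy(cm):
+    """Row-normalized diagonal: fraction of each true class predicted correctly."""
+    cm = np.asarray(cm, dtype=np.float64)
+    row_sums = np.maximum(cm.sum(axis=1, keepdims=True), 1.0)
+    return np.diag(cm / row_sums)
+
+# Example usage with the matrix printed above:
+# for digit, acc in enumerate(per_class_accuracy(cm)):
+#     print(f"class {digit}: {acc:.3f}")
+```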
+
+## Summary
+With inference validated and error modes understood, the digit recognizer is now ready to be embedded into the full Sudoku image-processing pipeline, where OpenCV will be used to detect the grid, rectify perspective, segment cells, and run batched ONNX inference to reconstruct and solve complete puzzles.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/06_SudokuProcessor.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/06_SudokuProcessor.md
new file mode 100644
index 0000000000..4a6a7d1173
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/06_SudokuProcessor.md
@@ -0,0 +1,448 @@
+---
+# User change
+title: "Sudoku Processor. From Image to Solution"
+weight: 7
+layout: "learningpathall"
+
+---
+
+## Objective
+
+In this section, we integrate all previous components into a complete Sudoku processing pipeline. Starting from a full Sudoku image, we detect and rectify the grid, split it into individual cells, recognize digits using the ONNX model, and finally solve the puzzle using a deterministic solver. By the end of this step, you will have an end-to-end system that takes a photograph of a Sudoku puzzle and produces a solved board, along with visual outputs for debugging and validation.
+
+## Context
+So far, we have:
+1. Generated a synthetic, well-labeled Sudoku digit dataset,
+2. Trained a lightweight CNN (DigitNet) to recognize digits and blanks,
+3. Exported the model to ONNX with dynamic batch support,
+4. Validated inference correctness and analyzed errors using confusion matrices.
+
+At this point, the digit recognizer is reliable in isolation. The remaining challenge is connecting vision with reasoning: extracting the Sudoku grid from an image, mapping each cell to a digit, and applying a solver. This section bridges that gap.
+
+## Overview of the pipeline
+To implement the Sudoku processor, create a file named sudoku_processor.py and paste the implementation below:
+
+```python
+import cv2 as cv
+import numpy as np
+import onnxruntime as ort
+
+class SudokuProcessor:
+ def __init__(
+ self,
+ onnx_path: str,
+ input_size: int = 28,
+ warp_size: int = 450,
+ blank_class: int = 0,
+ blank_conf_threshold: float = 0.65,
+ providers=("CPUExecutionProvider",),
+ ):
+ """
+ onnx_path: path to sudoku_digitnet.onnx
+ input_size: model input spatial size (28)
+ warp_size: size of rectified square board (e.g., 450 => each cell ~50px)
+ blank_class: class index used for blanks (0)
+ blank_conf_threshold: if model confidence < threshold, treat as blank (helps noisy cells)
+ """
+ self.onnx_path = onnx_path
+ self.input_size = input_size
+ self.warp_size = warp_size
+ self.blank_class = blank_class
+ self.blank_conf_threshold = blank_conf_threshold
+
+ self.sess = ort.InferenceSession(onnx_path, providers=list(providers))
+ self.input_name = self.sess.get_inputs()[0].name # typically "input"
+ self.output_name = self.sess.get_outputs()[0].name # typically "logits"
+
+ def process_image(self, bgr: np.ndarray, overlay: bool = True):
+ """
+ Returns:
+ board (9x9 ints with 0 for blank),
+ solved_board (9x9 ints, or None if unsolved),
+ debug dict (warped, contours, etc.),
+ overlay_bgr (optional solution overlay)
+ """
+ warped, H, quad = self.detect_and_warp_board(bgr)
+ cells = self.split_cells(warped)
+ board, conf = self.recognize_board(cells)
+
+ solved = [row[:] for row in board]
+ ok = solve_sudoku(solved)
+
+ overlay_img = None
+ if overlay and ok:
+ overlay_img = self.overlay_solution(bgr, H, board, solved)
+
+ debug = {
+ "warped": warped,
+ "homography": H,
+ "quad": quad,
+ "confidence": conf,
+ }
+ return board, (solved if ok else None), debug, overlay_img
+
+ # -----------------------------
+ # Board detection / rectification
+ # -----------------------------
+ def detect_and_warp_board(self, bgr: np.ndarray):
+ """
+ Finds the largest Sudoku-like quadrilateral and warps it to a square.
+ Returns warped_board, homography, quad_points.
+ """
+ gray = cv.cvtColor(bgr, cv.COLOR_BGR2GRAY)
+ blur = cv.GaussianBlur(gray, (5, 5), 0)
+
+ # Strong binary image helps contour finding (works well for printed grids)
+ thr = cv.adaptiveThreshold(
+ blur, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY_INV, 31, 7
+ )
+
+ # Remove small noise, connect lines a bit
+ kernel = cv.getStructuringElement(cv.MORPH_RECT, (3, 3))
+ thr = cv.morphologyEx(thr, cv.MORPH_CLOSE, kernel, iterations=2)
+
+ contours, _ = cv.findContours(thr, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
+ if not contours:
+ raise RuntimeError("No contours found. Try a clearer image or different thresholding.")
+
+ # Pick the largest contour that approximates to 4 points
+ contours = sorted(contours, key=cv.contourArea, reverse=True)
+ quad = None
+ for c in contours[:20]:
+ peri = cv.arcLength(c, True)
+ approx = cv.approxPolyDP(c, 0.02 * peri, True)
+ if len(approx) == 4:
+ quad = approx.reshape(4, 2).astype(np.float32)
+ break
+
+ if quad is None:
+ raise RuntimeError("Could not find a 4-corner Sudoku grid. Try a more fronto-parallel image.")
+
+ quad = order_quad_points(quad)
+
+ dst = np.array(
+ [[0, 0], [self.warp_size - 1, 0], [self.warp_size - 1, self.warp_size - 1], [0, self.warp_size - 1]],
+ dtype=np.float32,
+ )
+ H = cv.getPerspectiveTransform(quad, dst)
+ warped = cv.warpPerspective(bgr, H, (self.warp_size, self.warp_size))
+
+ return warped, H, quad
+
+ # -----------------------------
+ # Cell splitting / preprocessing
+ # -----------------------------
+ def split_cells(self, warped_bgr: np.ndarray):
+ """
+ Splits a rectified square board into 81 cell images.
+ Returns list of (r, c, cell_bgr).
+ """
+ cells = []
+ step = self.warp_size // 9
+ for r in range(9):
+ for c in range(9):
+ y0, y1 = r * step, (r + 1) * step
+ x0, x1 = c * step, (c + 1) * step
+ cell = warped_bgr[y0:y1, x0:x1].copy()
+ cells.append((r, c, cell))
+ return cells
+
+ def preprocess_cell(self, cell_bgr: np.ndarray):
+ """
+ Produces a 28x28 float32 tensor in the same normalization as training:
+ grayscale -> [0,1] -> normalize to [-1,1] via (x-0.5)/0.5
+ Also tries to suppress grid lines / borders by cropping margins.
+ """
+ g = cv.cvtColor(cell_bgr, cv.COLOR_BGR2GRAY)
+
+ # Crop a margin to remove grid lines/borders
+ h, w = g.shape
+ m = int(0.12 * min(h, w)) # ~12% margin
+ g = g[m:h - m, m:w - m]
+
+ # Binarize & clean (helps isolate printed digits)
+ g_blur = cv.GaussianBlur(g, (3, 3), 0)
+ bw = cv.adaptiveThreshold(g_blur, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY_INV, 21, 5)
+
+ # Remove small specks
+ bw = cv.morphologyEx(bw, cv.MORPH_OPEN, np.ones((2, 2), np.uint8), iterations=1)
+
+ # If almost empty => likely blank
+ if (bw > 0).sum() < 15:
+ # Return a near-empty input; classifier should produce blank
+ resized = cv.resize(g, (self.input_size, self.input_size), interpolation=cv.INTER_AREA)
+ else:
+ # Use bw mask to focus on digit; keep as grayscale for the model
+ resized = cv.resize(g, (self.input_size, self.input_size), interpolation=cv.INTER_AREA)
+
+ x = resized.astype(np.float32) / 255.0
+ x = (x - 0.5) / 0.5 # [-1,1]
+ x = x[None, None, :, :] # [1,1,H,W]
+ return x
+
+ # -----------------------------
+ # Inference
+ # -----------------------------
+ def recognize_board(self, cells):
+ """
+ Runs batched ONNX inference on 81 cells and returns:
+ board[9][9] with 0 for blank
+ conf[9][9] with max softmax probability
+ """
+ xs = []
+ coords = []
+ for r, c, cell in cells:
+ coords.append((r, c))
+ xs.append(self.preprocess_cell(cell))
+
+ X = np.concatenate(xs, axis=0).astype(np.float32) # [81,1,28,28]
+ logits = self.sess.run([self.output_name], {self.input_name: X})[0] # [81,10]
+ probs = softmax(logits, axis=1)
+ pred = probs.argmax(axis=1)
+ conf = probs.max(axis=1)
+
+ board = [[0 for _ in range(9)] for _ in range(9)]
+ conf_grid = [[0.0 for _ in range(9)] for _ in range(9)]
+ for i, (r, c) in enumerate(coords):
+ p = int(pred[i])
+ cf = float(conf[i])
+
+ # Optional safety: low-confidence => blank
+ if cf < self.blank_conf_threshold:
+ p = self.blank_class
+
+ board[r][c] = p
+ conf_grid[r][c] = cf
+
+ return board, conf_grid
+
+ # -----------------------------
+ # Overlay
+ # -----------------------------
+ def overlay_solution(self, original_bgr, H, board, solved):
+ """
+ Overlays ONLY the filled-in digits (where original board has 0).
+ """
+ invH = np.linalg.inv(H)
+ overlay = original_bgr.copy()
+
+ step = self.warp_size // 9
+ # Create a transparent layer in warped space then map back
+ layer = np.zeros((self.warp_size, self.warp_size, 3), dtype=np.uint8)
+
+ for r in range(9):
+ for c in range(9):
+ if board[r][c] != 0:
+ continue
+ d = solved[r][c]
+ # text placement in warped coordinates
+ x = int(c * step + step * 0.32)
+ y = int(r * step + step * 0.72)
+ cv.putText(layer, str(d), (x, y), cv.FONT_HERSHEY_SIMPLEX, 1.2, (0, 200, 0), 2, cv.LINE_AA)
+
+ # Warp overlay layer back to original image
+ h0, w0 = original_bgr.shape[:2]
+ back = cv.warpPerspective(layer, invH, (w0, h0))
+
+ # Blend
+ mask = (back.sum(axis=2) > 0).astype(np.uint8) * 255
+ mask3 = cv.merge([mask, mask, mask])
+ overlay = np.where(mask3 > 0, cv.addWeighted(overlay, 0.6, back, 0.4, 0), overlay)
+ return overlay
+
+
+# -----------------------------
+# Solver (backtracking)
+# -----------------------------
+def solve_sudoku(board):
+ pos = find_empty(board)
+ if pos is None:
+ return True
+ r, c = pos
+ for v in range(1, 10):
+ if valid(board, r, c, v):
+ board[r][c] = v
+ if solve_sudoku(board):
+ return True
+ board[r][c] = 0
+ return False
+
+
+def find_empty(board):
+ for r in range(9):
+ for c in range(9):
+ if board[r][c] == 0:
+ return (r, c)
+ return None
+
+
+def valid(board, r, c, v):
+ # row
+ for j in range(9):
+ if board[r][j] == v:
+ return False
+ # col
+ for i in range(9):
+ if board[i][c] == v:
+ return False
+ # box
+ br, bc = 3 * (r // 3), 3 * (c // 3)
+ for i in range(br, br + 3):
+ for j in range(bc, bc + 3):
+ if board[i][j] == v:
+ return False
+ return True
+
+
+# -----------------------------
+# Utilities
+# -----------------------------
+def order_quad_points(pts):
+ """
+ Orders 4 points into: top-left, top-right, bottom-right, bottom-left.
+ """
+ pts = np.array(pts, dtype=np.float32)
+ s = pts.sum(axis=1)
+ diff = np.diff(pts, axis=1).reshape(-1)
+
+ tl = pts[np.argmin(s)]
+ br = pts[np.argmax(s)]
+ tr = pts[np.argmin(diff)]
+ bl = pts[np.argmax(diff)]
+
+ return np.array([tl, tr, br, bl], dtype=np.float32)
+
+
+def softmax(x, axis=1):
+ x = x - np.max(x, axis=axis, keepdims=True)
+ e = np.exp(x)
+ return e / (np.sum(e, axis=axis, keepdims=True) + 1e-12)
+```
+
+The Sudoku processor follows a sequence of steps:
+1. Grid detection – find the outer Sudoku grid in the input image.
+2. Perspective rectification – warp the grid to a square, top-down view.
+3. Cell extraction – split the rectified grid into 81 cell images.
+4. Digit recognition – run batched ONNX inference to classify each cell.
+5. Board reconstruction – assemble a 9×9 numeric board.
+6. Solving – apply a backtracking Sudoku solver.
+7. Visualization – overlay the solution and render clean board images.
+
+We encapsulate the entire pipeline in a reusable class called SudokuProcessor. This class loads the ONNX model once and exposes a single high-level method that processes an input image and returns both intermediate results and final outputs.
+
+Conceptually, the processor:
+* Accepts a BGR image,
+* Returns the recognized board, the solved board (if solvable), and optional visual overlays.
+
+This design keeps inference fast and makes the processor easy to integrate later into an Android application or embedded system.
+
+## Grid detection and rectification
+The first task is to locate the Sudoku grid in the image. We convert the image to grayscale, apply adaptive thresholding, and use contour detection to find large rectangular shapes. The largest contour that approximates a quadrilateral is assumed to be the Sudoku grid.
+
+Once the four corners are identified, we compute a perspective transform and warp the grid into a square image. This rectified representation removes camera tilt and perspective distortion, allowing all subsequent steps to assume a fixed geometry.
+
+We order the four corners consistently (top-left → top-right → bottom-right → bottom-left) before computing the perspective transform.
+
+## Splitting the grid into cells
+After rectification, the grid is divided evenly into a 9×9 array. Each cell is cropped based on its row and column index. At this stage, every cell corresponds to one Sudoku position and is ready for preprocessing and classification.
+
+Each cell undergoes light preprocessing before inference:
+* Conversion to grayscale,
+* Cropping of a small margin to suppress grid lines,
+* Adaptive thresholding and morphological cleanup to isolate printed digits,
+* Resizing to the model’s input size (28×28),
+* Normalization to match the training distribution.
+
+We crop a margin to suppress grid lines, because grid strokes can dominate the digit pixels and cause systematic misclassification. Cells with very little foreground content are treated as blank candidates, reducing false digit detections in empty cells.
+
+## Batched ONNX inference
+All 81 cell tensors are stacked into a single batch and passed to ONNX Runtime in one call. Because the model was exported with a dynamic batch dimension, this batched inference is efficient and mirrors how the model will be used in production.
+
+The output logits are converted to probabilities, and the most likely class is selected for each cell. Optionally, a confidence threshold can be applied so that low-confidence predictions are treated as blanks.
+
+The result is a 9×9 board where:
+* 0 represents a blank cell,
+* 1–9 represent recognized digits.
+
+## Solving the Sudoku
+With the recognized board constructed, we apply a classic backtracking Sudoku solver. This solver deterministically fills empty cells while respecting Sudoku constraints (row, column, and 3×3 block rules).
+
+If the solver succeeds, we obtain a complete solution. If it fails, the failure usually indicates one or more recognition errors, which can be diagnosed using the intermediate visual outputs.
+
+## Visualization and outputs
+The processor saves several artifacts to help debugging and demonstration:
+- `artifacts/warped.png` – rectified top-down view of the Sudoku grid.
+- `artifacts/overlay_solution.png` – solution digits overlaid onto the original image (if solved).
+- (Optional) `artifacts/recognized_board.png`, `artifacts/solved_board.png`, `artifacts/boards_side_by_side.png` – clean board renderings if you enabled those helpers.
+
+The driver script below saves warped.png and overlay_solution.png by default.
+
+## Running the processor
+A small driver script (05_RunSudokuProcessor.py) demonstrates how to use the SudokuProcessor:
+
+```python
+import os
+import cv2 as cv
+
+from sudoku_processor import SudokuProcessor
+
+def print_board(board, title="Board"):
+ print("\n" + title)
+ for r in range(9):
+ row = ""
+ for c in range(9):
+ v = board[r][c]
+ row += (". " if v == 0 else f"{v} ")
+ if c % 3 == 2 and c != 8:
+ row += "| "
+ print(row.strip())
+ if r % 3 == 2 and r != 8:
+ print("-" * 21)
+
+
+def main():
+ # Use any image path you like:
+ # - a real photo
+ # - a synthetic grid, e.g. data/grids/val/000001_cam.png
+ img_path = "data/grids/val/000001_cam.png"
+ onnx_path = os.path.join("artifacts", "sudoku_digitnet.onnx")
+
+ bgr = cv.imread(img_path)
+ if bgr is None:
+ raise RuntimeError(f"Could not read image: {img_path}")
+
+ proc = SudokuProcessor(onnx_path=onnx_path, warp_size=450, blank_conf_threshold=0.65)
+
+ board, solved, dbg, overlay = proc.process_image(bgr, overlay=True)
+
+ print_board(board, "Recognized board")
+ if solved is None:
+ print("\nSolver failed (board might contain recognition errors).")
+ else:
+ print_board(solved, "Solved board")
+
+ # Save debug outputs
+ cv.imwrite("artifacts/warped.png", dbg["warped"])
+ if overlay is not None:
+ cv.imwrite("artifacts/overlay_solution.png", overlay)
+ print("\nSaved: artifacts/overlay_solution.png")
+ print("Saved: artifacts/warped.png")
+
+if __name__ == "__main__":
+ main()
+```
+
+You simply provide the path to a Sudoku image and the ONNX model; the script prints the recognized and solved boards and saves the debug images to the artifacts/ directory.
+
+A representative result is shown below:
+
+
+
+## Summary
+By completing this section, you have built a full vision-to-solution Sudoku system:
+1. A trained and validated ONNX digit recognizer,
+2. A robust OpenCV-based image processing pipeline,
+3. A deterministic solver,
+4. Clear visual diagnostics at every stage.
+
+In the next step of the learning path, we will focus on optimization and deployment.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/07_Optimisation.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/07_Optimisation.md
new file mode 100644
index 0000000000..a8e38c6ca7
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/07_Optimisation.md
@@ -0,0 +1,424 @@
+---
+title: "Model Enhancements and Optimizations"
+weight: 8
+layout: "learningpathall"
+---
+
+## Objective
+In this section, we improve the Sudoku system from a working prototype into something that is faster, smaller, and more robust on Arm64-class hardware. We start by measuring a baseline, then apply ONNX Runtime optimizations and quantization, and finally address the most common real bottleneck: image preprocessing. At each step we re-check accuracy and solve rate so performance gains don’t come at the cost of correctness.
+
+## Establish a baseline
+Before applying any optimizations, it is essential to understand where time is actually being spent in the Sudoku pipeline. Without this baseline, it is impossible to tell whether an optimization is effective or whether it simply shifts the bottleneck elsewhere.
+
+In the current system, the total latency of processing a single Sudoku image is composed of four main stages:
+* Grid detection and warping – locating the outer Sudoku grid and rectifying it using a perspective transform. This step relies entirely on OpenCV and depends on image resolution, lighting, and grid clarity.
+* Cell preprocessing – converting each of the 81 cells into a normalized 28×28 grayscale input for the neural network. This includes cropping margins, thresholding, and morphological operations. In practice, this stage is often the dominant cost.
+* ONNX inference – running the digit recognizer on all 81 cells as a single batch. Thanks to dynamic batch support, this step is typically fast compared to preprocessing.
+* Solving – applying a backtracking Sudoku solver to the recognized board. This step is usually negligible in runtime, unless recognition errors lead to difficult or contradictory boards.
+
+To quantify these contributions, we will add simple timing measurements around each stage of the pipeline using a high-resolution clock (time.perf_counter()). For each processed image, we will print a breakdown:
+* warp_ms – time spent on grid detection and perspective rectification
+* split_ms – time spent splitting the warped grid into 81 cells
+* preprocess_ms – total time spent preprocessing all 81 cells
+* onnx_ms – time spent running batched ONNX inference
+* solve_ms – time spent solving the Sudoku
+* total_ms – end-to-end processing time
+
+## Performance measurements
+Open sudoku_processor.py and add the following import:
+
+```python
+import time
+```
+
+Then, modify process_image as follows:
+```python
+def process_image(self, bgr: np.ndarray, overlay: bool = True):
+ """
+ Returns:
+ board (9x9 ints with 0 for blank),
+ solved_board (9x9 ints, or None if unsolved),
+ debug dict (warped, homography, confidence, timing),
+ overlay_bgr (optional solution overlay)
+ """
+ timing = {}
+
+ t_total0 = time.perf_counter()
+
+ # --- Grid detection + warp ---
+ t0 = time.perf_counter()
+ warped, H, quad = self.detect_and_warp_board(bgr)
+ timing["warp_ms"] = (time.perf_counter() - t0) * 1000.0
+
+ # --- Cell splitting ---
+ t0 = time.perf_counter()
+ cells = self.split_cells(warped)
+ timing["split_ms"] = (time.perf_counter() - t0) * 1000.0
+
+ # --- Preprocessing (81 cells) ---
+ t0 = time.perf_counter()
+ xs = []
+ coords = []
+ for r, c, cell in cells:
+ coords.append((r, c))
+ xs.append(self.preprocess_cell(cell))
+ X = np.concatenate(xs, axis=0).astype(np.float32) # [81,1,28,28]
+ timing["preprocess_ms"] = (time.perf_counter() - t0) * 1000.0
+
+ # --- ONNX inference ---
+ t0 = time.perf_counter()
+ logits = self.sess.run([self.output_name], {self.input_name: X})[0]
+ timing["onnx_ms"] = (time.perf_counter() - t0) * 1000.0
+
+ # --- Postprocess predictions ---
+ probs = softmax(logits, axis=1)
+ pred = probs.argmax(axis=1)
+ conf = probs.max(axis=1)
+
+ board = [[0 for _ in range(9)] for _ in range(9)]
+ conf_grid = [[0.0 for _ in range(9)] for _ in range(9)]
+ for i, (r, c) in enumerate(coords):
+ p = int(pred[i])
+ cf = float(conf[i])
+ if cf < self.blank_conf_threshold:
+ p = self.blank_class
+ board[r][c] = p
+ conf_grid[r][c] = cf
+
+ # --- Solve ---
+ t0 = time.perf_counter()
+ solved = [row[:] for row in board]
+ ok = solve_sudoku(solved)
+ timing["solve_ms"] = (time.perf_counter() - t0) * 1000.0
+
+ # --- Overlay (optional) ---
+ overlay_img = None
+ if overlay and ok:
+ t0 = time.perf_counter()
+ overlay_img = self.overlay_solution(bgr, H, board, solved)
+ timing["overlay_ms"] = (time.perf_counter() - t0) * 1000.0
+ else:
+ timing["overlay_ms"] = 0.0
+
+ timing["total_ms"] = (time.perf_counter() - t_total0) * 1000.0
+
+ debug = {
+ "warped": warped,
+ "homography": H,
+ "quad": quad,
+ "confidence": conf_grid,
+ "timing": timing,
+ }
+
+ return board, (solved if ok else None), debug, overlay_img
+```
+
+Finally, print the timings in 05_RunSudokuProcessor.py:
+```python
+def main():
+ # Use any image path you like:
+ # - a real photo
+ # - a synthetic grid, e.g. data/grids/val/000001_cam.png
+ img_path = "data/grids/val/000002_cam.png"
+ onnx_path = os.path.join("artifacts", "sudoku_digitnet.onnx")
+
+ bgr = cv.imread(img_path)
+ if bgr is None:
+ raise RuntimeError(f"Could not read image: {img_path}")
+
+ proc = SudokuProcessor(onnx_path=onnx_path, warp_size=450, blank_conf_threshold=0.65)
+
+ board, solved, dbg, overlay = proc.process_image(bgr, overlay=True)
+
+ print_board(board, "Recognized board")
+ if solved is None:
+ print("\nSolver failed (board might contain recognition errors).")
+ else:
+ print_board(solved, "Solved board")
+
+ # Save debug outputs
+ cv.imwrite("artifacts/warped.png", dbg["warped"])
+ if overlay is not None:
+ cv.imwrite("artifacts/overlay_solution.png", overlay)
+ print("\nSaved: artifacts/overlay_solution.png")
+ print("Saved: artifacts/warped.png")
+
+ tim = dbg["timing"]
+ print(
+ f"warp={tim['warp_ms']:.1f} ms | "
+ f"preprocess={tim['preprocess_ms']:.1f} ms | "
+ f"onnx={tim['onnx_ms']:.1f} ms | "
+ f"solve={tim['solve_ms']:.1f} ms | "
+ f"total={tim['total_ms']:.1f} ms"
+ )
+
+if __name__ == "__main__":
+ main()
+```
+
+The sample output will look as follows:
+```output
+python3 05_RunSudokuProcessor.py
+
+Recognized board
+. . . | 7 . . | 6 . .
+. . 4 | . . . | 1 . 9
+. . . | 1 5 . | . . .
+---------------------
+. . . | . 1 . | . . .
+. . . | . . . | . . .
+3 . . | . . . | . 6 .
+---------------------
+7 . . | . . . | . . .
+. . 9 | . . . | . . .
+. . . | . . . | . . .
+
+Solved board
+1 2 3 | 7 4 9 | 6 5 8
+5 6 4 | 2 3 8 | 1 7 9
+8 9 7 | 1 5 6 | 2 3 4
+---------------------
+2 4 5 | 6 1 3 | 8 9 7
+9 1 6 | 4 8 7 | 3 2 5
+3 7 8 | 5 9 2 | 4 6 1
+---------------------
+7 3 1 | 8 2 5 | 9 4 6
+4 5 9 | 3 6 1 | 7 8 2
+6 8 2 | 9 7 4 | 5 1 3
+
+Saved: artifacts/overlay_solution.png
+Saved: artifacts/warped.png
+warp=11.9 ms | preprocess=3.3 ms | onnx=1.9 ms | solve=3.1 ms | total=48.2 ms
+```
+
+## Folder benchmark
+The single-image measurements introduced earlier are useful for understanding the rough structure of the pipeline and for verifying that ONNX inference is not the main computational bottleneck. In our case, batched ONNX inference typically takes less than 2 ms, while grid detection, warping, and preprocessing dominate the runtime. However, individual measurements can be noisy due to caching effects, operating system scheduling, and Python overhead.
+
+To obtain more reliable performance numbers, we extend the evaluation to multiple images and compute aggregated statistics. This allows us to track not only average performance, but also variability and tail latency, which are particularly important for interactive applications.
+
+To do this, add two helper functions to 05_RunSudokuProcessor.py, and make sure the runner script has import glob and import numpy as np at the top.
+
+The first function, summarize, computes basic statistics from a list of timing measurements:
+* mean – average runtime
+* median – robust central tendency
+* p90 / p95 – tail latency (90th and 95th percentiles), which indicate how bad the slow cases are
+
+```python
+def summarize(values):
+ values = np.asarray(values, dtype=np.float64)
+ return {
+ "mean": float(values.mean()),
+ "median": float(np.median(values)),
+ "p90": float(np.percentile(values, 90)),
+ "p95": float(np.percentile(values, 95)),
+ }
+```
+
+The second function, benchmark_folder, runs the full Sudoku pipeline on a collection of images and aggregates timing results across multiple runs:
+
+```python
+def benchmark_folder(proc, folder_glob, limit=100, warmup=10, overlay=False):
+ paths = sorted(glob.glob(folder_glob))
+ if not paths:
+ raise RuntimeError(f"No images matched: {folder_glob}")
+ paths = paths[:limit]
+
+ # Warmup
+ for p in paths[:min(warmup, len(paths))]:
+ bgr = cv.imread(p)
+ if bgr is None:
+ continue
+ proc.process_image(bgr, overlay=overlay)
+
+ # Benchmark
+ agg = {k: [] for k in ["warp_ms", "preprocess_ms", "onnx_ms", "solve_ms", "total_ms"]}
+ solved_cnt = 0
+ total_cnt = 0
+
+ for p in paths:
+ bgr = cv.imread(p)
+ if bgr is None:
+ continue
+
+ board, solved, dbg, _ = proc.process_image(bgr, overlay=overlay)
+ tim = dbg["timing"]
+
+ for k in agg:
+ agg[k].append(tim[k])
+
+ total_cnt += 1
+ if solved is not None:
+ solved_cnt += 1
+
+ print(f"\nSolved {solved_cnt}/{total_cnt} ({(solved_cnt/total_cnt*100.0 if total_cnt else 0):.1f}%)")
+
+ print("\nTiming summary (ms):")
+ for k in ["warp_ms", "preprocess_ms", "onnx_ms", "solve_ms", "total_ms"]:
+ s = summarize(agg[k])
+ print(f"{k:14s} mean={s['mean']:.2f} median={s['median']:.2f} p90={s['p90']:.2f} p95={s['p95']:.2f}")
+```
+
+Finally, we invoke the benchmark in the main() function:
+
+```python
+def main():
+ onnx_path = os.path.join("artifacts", "sudoku_digitnet.onnx")
+
+ proc = SudokuProcessor(onnx_path=onnx_path, warp_size=450, blank_conf_threshold=0.65)
+
+ benchmark_folder(proc, "data/grids/val/*_cam.png", limit=30, warmup=10, overlay=False)
+
+if __name__ == "__main__":
+ main()
+```
+
+This evaluates the processor on a representative subset of camera-like validation grids, prints aggregated timing statistics, and reports the overall solve rate.
+
+Aggregated benchmarks provide a much more accurate picture than single measurements, especially when individual stages take only a few milliseconds. By reporting median and tail latencies, you can see whether occasional slow cases exist and whether an optimization truly improves user-perceived performance. Percentiles are particularly useful when a few slow cases exist (e.g., harder solves), because they reveal tail latency. These results form a solid quantitative baseline that you can reuse to evaluate every optimization that follows.
+
+Here is the sample output of the updated script:
+```output
+python3 05_RunSudokuProcessor.py
+
+Solved 30/30 (100.0%)
+
+Timing summary (ms):
+warp_ms mean=10.25 median=10.27 p90=10.57 p95=10.59
+preprocess_ms mean=3.01 median=2.98 p90=3.16 p95=3.21
+onnx_ms mean=1.27 median=1.24 p90=1.30 p95=1.45
+solve_ms mean=74.76 median=2.02 p90=48.51 p95=74.82
+total_ms mean=89.41 median=16.97 p90=62.95 p95=89.43
+```
+
+Notice that solve_ms (and therefore total_ms) has a much larger mean than median. This indicates a small number of outliers where the solver takes significantly longer. In practice, this occurs when one or more digits are misrecognized, forcing the backtracking solver to explore many branches before finding a solution (or failing). For interactive applications, median and p95 latency are more informative than the mean, as they better reflect typical user experience.
+
+## ONNX Runtime session optimizations
+Now that you can measure onnx_ms and total_ms, the first low-effort improvement is to enable ONNX Runtime’s built-in graph optimizations and tune CPU threading. These changes do not modify the model, but can reduce inference overhead and improve throughput.
+
+In sudoku_processor.py, update the ONNX Runtime session initialization in __init__ to use SessionOptions:
+```python
+so = ort.SessionOptions()
+so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+
+self.sess = ort.InferenceSession(onnx_path, sess_options=so, providers=list(providers))
+```
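+
+Graph optimization covers the built-in optimizations mentioned above; for the CPU threading part, SessionOptions also exposes thread counts. The values below are illustrative only and should be benchmarked on your target hardware:
+
+```python
+# Optional: tune CPU threading (illustrative values; measure on your own device)
+so.intra_op_num_threads = 4   # threads used inside a single operator
+so.inter_op_num_threads = 1   # threads used across independent operators
+```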
+
+Re-run 05_RunSudokuProcessor.py and compare onnx_ms and total_ms to the baseline.
+
+```output
+python3 05_RunSudokuProcessor.py
+
+Solved 30/30 (100.0%)
+
+Timing summary (ms):
+warp_ms mean=10.43 median=10.36 p90=10.89 p95=10.96
+preprocess_ms mean=3.13 median=3.11 p90=3.34 p95=3.42
+onnx_ms mean=1.28 median=1.26 p90=1.37 p95=1.47
+solve_ms mean=78.61 median=2.01 p90=50.15 p95=77.87
+total_ms mean=93.58 median=17.06 p90=65.10 p95=92.55
+```
+
+This result is expected for such a small model: ONNX inference is already efficient, and the dominant costs lie in image preprocessing and occasional solver backtracking. This highlights why system-level profiling is essential before focusing on model-level optimizations.
+
+## Quantize the model (FP32 -> INT8)
+Quantization is one of the most impactful optimizations for Arm64 and mobile deployments because it reduces both model size and compute cost. For CNNs, the most compatible approach is static INT8 quantization in QDQ format. This uses a small calibration set to estimate activation ranges and typically works well across runtimes.
+
+Create a small script 06_QuantizeModel.py:
+
+```python
+import os, glob
+import numpy as np
+import cv2 as cv
+
+from onnxruntime.quantization import (
+ quantize_static, CalibrationDataReader, QuantFormat, QuantType
+)
+
+ARTI_DIR = "artifacts"
+FP32_PATH = os.path.join(ARTI_DIR, "sudoku_digitnet.onnx")
+INT8_PATH = os.path.join(ARTI_DIR, "sudoku_digitnet.int8.onnx")
+
+# ---- Calibration data reader ----
+class SudokuCalibReader(CalibrationDataReader):
+ def __init__(self, folder_glob="data/train/0/*.png", limit=500, input_name="input", input_size=28):
+ self.input_name = input_name
+ self.input_size = input_size
+
+ paths = sorted(glob.glob(folder_glob))[:limit]
+ self._iter = iter(paths)
+
+ def get_next(self):
+ try:
+ p = next(self._iter)
+ except StopIteration:
+ return None
+
+ g = cv.imread(p, cv.IMREAD_GRAYSCALE)
+ if g is None:
+ return self.get_next()
+
+ g = cv.resize(g, (self.input_size, self.input_size), interpolation=cv.INTER_AREA)
+ x = g.astype(np.float32) / 255.0
+ x = (x - 0.5) / 0.5
+ x = x[None, None, :, :] # [1,1,28,28]
+ return {self.input_name: x}
+
+# ---- Run quantization ----
+reader = SudokuCalibReader(folder_glob="data/train/*/*.png", limit=1000)
+
+print("Quantizing (QDQ static INT8)...")
+quantize_static(
+ model_input=FP32_PATH,
+ model_output=INT8_PATH,
+ calibration_data_reader=reader,
+ quant_format=QuantFormat.QDQ, # key: keep Conv as Conv with Q/DQ wrappers
+ activation_type=QuantType.QInt8,
+ weight_type=QuantType.QInt8,
+ per_channel=True # usually helps conv accuracy
+)
+
+print("Saved:", INT8_PATH)
+```
+
+Run the quantization script with python3 06_QuantizeModel.py.
+
+Then update the runner script to point to the quantized model:
+
+```python
+onnx_path = os.path.join("artifacts", "sudoku_digitnet.int8.onnx")
+```
+
+Re-run the processor and compare:
+* onnx_ms (should improve or remain similar)
+* total_ms
+* solve success (should remain stable)
+
+Also compare file sizes:
+```console
+ls -lh artifacts/sudoku_digitnet.onnx artifacts/sudoku_digitnet.int8.onnx
+```
+Even when inference time changes only modestly, size reduction is typically significant and matters for Android packaging.
+
+In this pipeline, quantization primarily reduces model size and improves deployability, while runtime speedups may be modest because inference is already a small fraction of the total latency.
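+
+To confirm that accuracy survived quantization, you can evaluate both models on the validation folder. The snippet below is a self-contained sketch (the eval_onnx helper name is ours, and the preprocessing mirrors the calibration reader rather than the full training transforms):
+
+```python
+import glob, os
+import cv2 as cv
+import numpy as np
+import onnxruntime as ort
+
+def eval_onnx(path, val_root="data/val", size=28):
+    """Top-1 accuracy of an ONNX classifier on data/val/<class>/*.png."""
+    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
+    inp = sess.get_inputs()[0].name
+    correct = total = 0
+    for cls_dir in sorted(glob.glob(os.path.join(val_root, "*"))):
+        label = int(os.path.basename(cls_dir))
+        for p in glob.glob(os.path.join(cls_dir, "*.png")):
+            g = cv.imread(p, cv.IMREAD_GRAYSCALE)
+            g = cv.resize(g, (size, size), interpolation=cv.INTER_AREA)
+            x = ((g.astype(np.float32) / 255.0) - 0.5) / 0.5
+            logits = sess.run(None, {inp: x[None, None]})[0]
+            correct += int(np.argmax(logits) == label)
+            total += 1
+    return correct / max(total, 1)
+
+print("fp32 acc:", eval_onnx("artifacts/sudoku_digitnet.onnx"))
+print("int8 acc:", eval_onnx("artifacts/sudoku_digitnet.int8.onnx"))
+```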
+
+## Preprocessing-focused optimizations (highest impact)
+The measurements above show that ONNX inference accounts for only a small fraction of the total runtime. In practice, the largest performance gains come from optimizing image preprocessing.
+
+The most effective improvements include:
+- Converting the rectified board to grayscale **once**, instead of converting each cell independently.
+- Adding an early “blank cell” check to skip expensive thresholding and morphology for empty cells.
+- Using simpler thresholding (e.g., Otsu) on clean images, and reserving adaptive thresholding for difficult lighting conditions.
+- Reducing or conditionally disabling morphological operations when cells already appear clean.
+
+These changes typically reduce `preprocess_ms` more than any model-level optimization, and therefore have the greatest impact on end-to-end latency.
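+
+A minimal sketch of these ideas is shown below. The method name and thresholds are illustrative (it assumes the warp_size and input_size attributes of SudokuProcessor) and it is not the tutorial's reference implementation:
+
+```python
+def preprocess_cells_fast(self, warped_bgr):
+    """Leaner preprocessing: grayscale once, early blank check, Otsu instead of
+    adaptive thresholding, no morphology. Returns a batch plus blank flags."""
+    gray = cv.cvtColor(warped_bgr, cv.COLOR_BGR2GRAY)      # convert once per board
+    step = self.warp_size // 9
+    xs, blank_flags = [], []
+    for r in range(9):
+        for c in range(9):
+            cell = gray[r * step:(r + 1) * step, c * step:(c + 1) * step]
+            m = int(0.12 * step)                            # crop grid-line margin
+            cell = cell[m:step - m, m:step - m]
+            is_blank = cell.std() < 10                      # early blank check
+            if not is_blank:
+                # Otsu (cheaper than adaptive) only feeds the blank heuristic
+                _, bw = cv.threshold(cell, 0, 255,
+                                     cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
+                is_blank = int((bw > 0).sum()) < 15
+            blank_flags.append(is_blank)
+            resized = cv.resize(cell, (self.input_size, self.input_size),
+                                interpolation=cv.INTER_AREA)
+            x = (resized.astype(np.float32) / 255.0 - 0.5) / 0.5
+            xs.append(x[None, None, :, :])
+    return np.concatenate(xs, axis=0), blank_flags          # [81,1,28,28], 81 flags
+```
+
+Cells flagged as blank can then be forced to class 0 in recognize_board, skipping the confidence check for those positions.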
+
+## Summary
+In this section, we transformed the Sudoku solver from a functional prototype into a system with measurable, well-understood performance characteristics. By instrumenting the pipeline with fine-grained timing, we identified where computation is actually spent and established a quantitative baseline.
+
+We showed that:
+- Batched ONNX inference is already efficient (≈1–2 ms per board).
+- Image preprocessing dominates runtime and offers the largest optimization potential.
+- Solver backtracking introduces rare but significant tail-latency outliers.
+- ONNX Runtime optimizations and INT8 quantization improve deployability, even when raw inference speed gains are modest.
+
+Most importantly, we demonstrated a systematic optimization workflow: **measure first, optimize second, and always re-validate correctness**. With performance, robustness, and accuracy validated, the Sudoku pipeline is now ready for its final step—deployment as a fully on-device Android application.
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/08_Android.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/08_Android.md
new file mode 100644
index 0000000000..e05b6fd2a2
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/08_Android.md
@@ -0,0 +1,1059 @@
+---
+# User change
+title: "Android Deployment. From Model to App"
+
+weight: 9
+
+layout: "learningpathall"
+---
+
+## Objective
+In this section, we transition from a desktop prototype to a fully on-device Android application. The goal is to demonstrate how the optimized Sudoku pipeline—image preprocessing, ONNX inference, and deterministic solving—can be packaged and executed entirely on a mobile device, without relying on any cloud services.
+
+Rather than starting with a live camera feed, we begin with a fixed input bitmap that was generated earlier in the learning path. This approach allows us to focus on correctness, performance, and integration details before introducing additional complexity such as camera permissions, real-time capture, and varying lighting conditions. By keeping the input controlled, we can verify that the Android implementation faithfully reproduces the behavior observed in Python.
+
+Over the course of this section, we will:
+1. Create a new Android project and add the required dependencies.
+2. Bundle the trained ONNX model and a sample Sudoku image with the application.
+3. Implement a minimal user interface that loads the image and triggers the solver.
+4. Re-implement the Sudoku processing pipeline on Android, including preprocessing, batched ONNX inference, and solving.
+5. Display the solved result as an image, confirming that the entire pipeline runs locally on the device.
+
+By the end of this section, you will have a working Android app that takes a Sudoku image, runs neural network inference and solving on-device, and displays the solution. This completes the learning path by showing how a trained and optimized ONNX model can be deployed in a real mobile application, closing the loop from data generation and training to practical, end-user deployment.
+
+## Project creation
+We start by creating a new Android project using Android Studio. This project will host the Sudoku solver application and serve as the foundation for integrating ONNX Runtime and OpenCV.
+
+1. Create a new project:
+* Open Android Studio and click New Project.
+* In the Templates screen, select Phone and Tablet, then choose Empty Views Activity.
+
+
+This template creates a minimal Android application without additional UI components, which is ideal for a focused, step-by-step integration.
+
+* Click Next to proceed to the project configuration screen.
+
+2. Configure the project. In the configuration screen, fill in the fields as follows:
+* Name: SudokuSolverOnnx. This is the application name that will appear in Android Studio and on the device.
+* Package name: com.arm.sudokusolveronnx. This package name clearly reflects the purpose of the app and its use of ONNX on Arm platforms.
+* Save location. Choose a convenient directory on your system (for example, your repositories folder).
+* Language: Kotlin. Kotlin is the recommended language for modern Android development and integrates cleanly with ONNX Runtime APIs.
+* Minimum SDK: API 24 (Android 7.0 – Nougat). This provides wide device coverage while remaining compatible with ONNX Runtime and OpenCV.
+* Build configuration language: Kotlin DSL (build.gradle.kts). We use the Kotlin DSL for Gradle, which is now the recommended option.
+
+
+
+* After confirming these settings, click Finish. Android Studio will create the project and generate a basic MainActivity along with the necessary Gradle files.
+
+## View
+We now define the user interface of the Android application. The goal of this view is to remain intentionally simple while clearly exposing the end-to-end Sudoku workflow. The interface will consist of:
+* A button row at the top that allows the user to load a Sudoku image and trigger the solver.
+* A status text area used to display short messages (for example, whether an image has been loaded or the puzzle has been solved).
+* An input image view that displays the selected Sudoku bitmap.
+* An output image view that displays the solved result.
+
+This layout is sufficient to validate that the ONNX model, preprocessing pipeline, and solver are all working correctly on Android before adding more advanced features such as camera input or animations.
+
+To define the view, open res/layout/activity_main.xml and replace its contents with a layout along the lines of the listing below. The exact attributes are flexible; what matters is that the view IDs (btnLoadRandom, btnSolve, txtStatus, imgInput, imgOutput) match the ones referenced later in MainActivity.kt:
+```xml
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Minimal layout: a pinned header (buttons + status) above a scrollable image area.
+     View IDs must match those used in MainActivity.kt. -->
+<androidx.constraintlayout.widget.ConstraintLayout
+    xmlns:android="http://schemas.android.com/apk/res/android"
+    xmlns:app="http://schemas.android.com/apk/res-auto"
+    android:layout_width="match_parent"
+    android:layout_height="match_parent"
+    android:padding="16dp">
+
+    <LinearLayout
+        android:id="@+id/buttonRow"
+        android:layout_width="0dp"
+        android:layout_height="wrap_content"
+        android:orientation="horizontal"
+        app:layout_constraintTop_toTopOf="parent"
+        app:layout_constraintStart_toStartOf="parent"
+        app:layout_constraintEnd_toEndOf="parent">
+
+        <Button
+            android:id="@+id/btnLoadRandom"
+            android:layout_width="0dp"
+            android:layout_height="wrap_content"
+            android:layout_weight="1"
+            android:text="Load image" />
+
+        <Button
+            android:id="@+id/btnSolve"
+            android:layout_width="0dp"
+            android:layout_height="wrap_content"
+            android:layout_weight="1"
+            android:enabled="false"
+            android:text="Solve" />
+    </LinearLayout>
+
+    <TextView
+        android:id="@+id/txtStatus"
+        android:layout_width="0dp"
+        android:layout_height="wrap_content"
+        android:text="Ready"
+        app:layout_constraintTop_toBottomOf="@id/buttonRow"
+        app:layout_constraintStart_toStartOf="parent"
+        app:layout_constraintEnd_toEndOf="parent" />
+
+    <ScrollView
+        android:layout_width="0dp"
+        android:layout_height="0dp"
+        android:fillViewport="true"
+        app:layout_constraintTop_toBottomOf="@id/txtStatus"
+        app:layout_constraintBottom_toBottomOf="parent"
+        app:layout_constraintStart_toStartOf="parent"
+        app:layout_constraintEnd_toEndOf="parent">
+
+        <LinearLayout
+            android:layout_width="match_parent"
+            android:layout_height="wrap_content"
+            android:orientation="vertical">
+
+            <TextView
+                android:layout_width="wrap_content"
+                android:layout_height="wrap_content"
+                android:text="Input" />
+
+            <ImageView
+                android:id="@+id/imgInput"
+                android:layout_width="match_parent"
+                android:layout_height="wrap_content"
+                android:adjustViewBounds="true" />
+
+            <TextView
+                android:layout_width="wrap_content"
+                android:layout_height="wrap_content"
+                android:text="Solved" />
+
+            <ImageView
+                android:id="@+id/imgOutput"
+                android:layout_width="match_parent"
+                android:layout_height="wrap_content"
+                android:adjustViewBounds="true" />
+        </LinearLayout>
+    </ScrollView>
+</androidx.constraintlayout.widget.ConstraintLayout>
+```
+
+This layout uses a ConstraintLayout as the root container so that it adapts cleanly across different screen sizes. The UI is split into two parts: a pinned header and a scrollable content area. The pinned header at the top contains a horizontal row with two equally sized buttons:
+* Load image randomly selects one of the bundled Sudoku bitmaps and displays it in the Input view.
+* Solve triggers the inference and solving pipeline (it starts disabled and becomes enabled after an image is loaded).
+
+Directly below the buttons, a status text field provides quick feedback to the user (for example, whether an image has been loaded or the solver is running).
+
+Below the header, the screen contains a ScrollView that holds the image content:
+* The Input section displays the selected Sudoku bitmap.
+* The Solved section displays the output image produced after running inference and solving.
+
+Because the image area is scrollable, the layout remains usable even on smaller screens, while the buttons and status remain accessible at all times.
+
+When rendered, this produces a clear, vertically structured interface with a fixed control panel at the top and large input/output images underneath, as shown in the figure below.
+
+
+
+At this stage, the UI is intentionally minimal. In the next step, we will connect this view to the application logic in MainActivity, load a sample Sudoku bitmap, and wire up the Load image and Solve buttons to the ONNX-based processing pipeline.
+
+## Preparing input images for the Android app
+Before wiring the application logic, we need to provide the Android app with a small set of Sudoku images that it can load and solve. For this learning path, we deliberately use a fixed collection of pre-generated images (from Preparing a Synthetic Sudoku Digit Dataset) instead of a camera feed. This keeps the Android integration simple and allows us to focus on ONNX inference and solver integration first.
+
+1. Select Sudoku images. From the earlier Python steps, select a small number of generated Sudoku images. These can be either:
+* Clean grids (book-style), or
+* Camera-like grids (with perspective distortion and noise).
+
+For example, you might choose:
+* data/grids/val/000000_cam.png
+* data/grids/val/000000_clean.png
+* ...
+
+Using both clean and camera-like images is useful later for testing robustness.
+
+2. Rename images to Android-friendly names. Android resource names must follow strict rules:
+* lowercase letters only,
+* numbers allowed,
+* underscores allowed,
+* no spaces or hyphens.
+
+Rename your files accordingly, for example:
+* sudoku_01.png
+* sudoku_cam_01.png
+
+3. Copy the renamed PNG files into the following directory of your Android project:
+
+app/src/main/res/drawable/
+
+After copying, Android Studio will automatically generate resource IDs for these images.
+
+4. Once the images are in place:
+* Let Android Studio finish indexing and syncing.
+* Open the Project view and navigate to res/drawable/.
+* Verify that the images appear without errors.
+
+You should now be able to reference these images in Kotlin code using identifiers such as R.drawable.sudoku_01 and R.drawable.sudoku_cam_01.
+
+In this tutorial, the Load image button will randomly select one of these drawable resources and display it in the app. This provides a deterministic and repeatable input source while validating the full Sudoku pipeline on Android.
+
+At this point, the Android project has all the static resources it needs. In the next step, we will implement MainActivity.kt, wire up the Load image and Solve buttons, and display the selected Sudoku image in the UI.
+
+## Preparing the ONNX model for the Android app
+In addition to the input images, the Android application needs access to the trained ONNX model so that it can run inference directly on the device. Android does not allow arbitrary file access by default, so the model must be bundled with the app as an asset.
+
+1. From the previous optimization steps, you should have at least one ONNX model available in your Python project, for example:
+* sudoku_digitnet.onnx (FP32 model), or
+* sudoku_digitnet.int8.onnx (INT8 quantized model, if supported on your target device).
+
+For the initial Android integration, it is recommended to start with the FP32 model, as it offers the broadest compatibility. You can switch to the quantized model later once everything is working end-to-end.
+
+2. In your Android project, create the following directory if it does not already exist:
+
+app/src/main/assets/
+
+If you prefer to keep assets organized, you can also create a subfolder:
+
+app/src/main/assets/models/
+
+Both approaches work. In the examples that follow, we will assume the model is placed directly under assets/.
+
+3. Prepare the model for Android so that it does not reference external resources. Android assets are not regular filesystem paths, so models that reference external .onnx.data files will fail to load unless they are merged into a single ONNX file. To do so, create another Python file, 07_PrepareModelForAndroid.py:
+```python
+import onnx
+from onnx import external_data_helper
+
+IN_PATH = "artifacts/sudoku_digitnet.onnx"
+OUT_PATH = "artifacts/sudoku_digitnet_android.onnx"
+
+model = onnx.load(IN_PATH)
+
+# If the model references external data, load it into the model object
+external_data_helper.load_external_data_for_model(model, base_dir="artifacts")
+
+# Clear external locations so it can be saved as a single file
+for init in model.graph.initializer:
+ if init.data_location == onnx.TensorProto.EXTERNAL:
+ init.data_location = onnx.TensorProto.DEFAULT
+ # Remove external data metadata entries
+ del init.external_data[:]
+
+# Save as a single-file ONNX
+onnx.save_model(model, OUT_PATH, save_as_external_data=False)
+print("Saved:", OUT_PATH)
+```
+
+4. Run the script. Then, copy the resulting ONNX model file (sudoku_digitnet_android.onnx) into the assets directory, for example:
+```console
+app/src/main/assets/sudoku_digitnet_android.onnx
+```
+
+or, if using a subfolder:
+```console
+app/src/main/assets/models/sudoku_digitnet_android.onnx
+```
+5. After copying the model:
+* Let Android Studio finish indexing and syncing the project.
+* In the Project view, expand the assets folder.
+* Verify that the ONNX file appears without any warnings or errors.
+
+6. Assets are accessed via the Android AssetManager. Later in this tutorial, we will load the model using code similar to:
+```console
+assets.open("sudoku_digitnet_android.onnx")
+```
+If you placed the model in a subfolder, include the relative path:
+```console
+assets.open("models/sudoku_digitnet_android.onnx")
+```
+This input stream will be passed to ONNX Runtime to create an inference session on the device. At this point, the Android project contains:
+* A set of Sudoku images in res/drawable/,
+* A trained ONNX model in assets/.
+
+In the next step, we will implement MainActivity.kt, wire up the Load image and Solve buttons, and verify that the app can successfully load both the image and the ONNX model before running inference.
+
+## Implement MainActivity.kt (Load image + basic UI wiring)
+Open app/src/main/java/com/arm/sudokusolveronnx/MainActivity.kt and replace it with:
+
+```kotlin
+package com.arm.sudokusolveronnx
+
+import android.graphics.Bitmap
+import android.graphics.BitmapFactory
+import android.os.Bundle
+import android.widget.Button
+import android.widget.ImageView
+import android.widget.TextView
+import androidx.appcompat.app.AppCompatActivity
+import kotlin.random.Random
+
+class MainActivity : AppCompatActivity() {
+
+ private lateinit var btnLoadRandom: Button
+ private lateinit var btnSolve: Button
+ private lateinit var txtStatus: TextView
+ private lateinit var imgInput: ImageView
+ private lateinit var imgOutput: ImageView
+
+ private var currentBitmap: Bitmap? = null
+
+ // Clean and camera-like pools (you copied these into res/drawable/)
+ private val sudokuCleanImages = listOf(
+ R.drawable.sudoku_01,
+ R.drawable.sudoku_02,
+ R.drawable.sudoku_03,
+ R.drawable.sudoku_04,
+ R.drawable.sudoku_05,
+ R.drawable.sudoku_06,
+ R.drawable.sudoku_07,
+ R.drawable.sudoku_08,
+ R.drawable.sudoku_09,
+ R.drawable.sudoku_10,
+ )
+
+ private val sudokuCamImages = listOf(
+ R.drawable.sudoku_cam_01,
+ R.drawable.sudoku_cam_02,
+ R.drawable.sudoku_cam_03,
+ R.drawable.sudoku_cam_04,
+ R.drawable.sudoku_cam_05,
+ R.drawable.sudoku_cam_06,
+ R.drawable.sudoku_cam_07,
+ R.drawable.sudoku_cam_08,
+ R.drawable.sudoku_cam_09,
+ R.drawable.sudoku_cam_10,
+ )
+
+ override fun onCreate(savedInstanceState: Bundle?) {
+ super.onCreate(savedInstanceState)
+ setContentView(R.layout.activity_main)
+
+ btnLoadRandom = findViewById(R.id.btnLoadRandom)
+ btnSolve = findViewById(R.id.btnSolve)
+ txtStatus = findViewById(R.id.txtStatus)
+ imgInput = findViewById(R.id.imgInput)
+ imgOutput = findViewById(R.id.imgOutput)
+
+ btnSolve.isEnabled = false
+ txtStatus.text = "Ready"
+
+ btnLoadRandom.setOnClickListener {
+ loadRandomSudokuImage(useCameraLike = true)
+ }
+
+ btnSolve.setOnClickListener {
+ txtStatus.text = "Solve clicked (engine not wired yet)"
+ imgOutput.setImageBitmap(currentBitmap) // temporary: mirror input
+ }
+ }
+
+ private fun loadRandomSudokuImage(useCameraLike: Boolean) {
+ val pool = if (useCameraLike) sudokuCamImages else sudokuCleanImages
+ val resId = pool[Random.nextInt(pool.size)]
+
+ val bmp = BitmapFactory.decodeResource(resources, resId)
+ currentBitmap = bmp
+
+ imgInput.setImageBitmap(bmp)
+ imgOutput.setImageDrawable(null)
+
+ btnSolve.isEnabled = true
+ txtStatus.text = if (useCameraLike) "Loaded camera-like Sudoku" else "Loaded clean Sudoku"
+ }
+}
+```
+
+What this gives you immediately:
+* Tapping Load image loads a random sudoku_cam_XX drawable into the Input view.
+* The Solve button becomes enabled.
+* Tapping Solve, for now, just mirrors the input into the output view and updates the status text; we will replace this placeholder with the real ONNX and OpenCV solver next.
+
+Run the app now. This verifies your layout IDs are correct and your drawables are packaged properly.
+
+After implementing MainActivity.kt, you may encounter a compile-time error when building the app for the first time. This is expected and is related to the Android SDK level used by the project template.
+
+The error occurs because recent versions of Android Studio and its dependencies (in particular androidx.activity and related libraries) require a newer Compile SDK than the default project configuration provides.
+
+Android Studio templates sometimes lag behind the latest library requirements. In this project, we are using up-to-date AndroidX components, which expect the project to be compiled against Android API level 35.
+
+This does not affect which devices your app can run on. It only affects which APIs are available at compile time.
+
+To fix the error, open the Gradle build file for the app module:
+
+app/build.gradle.kts
+
+Update the android {} block so that compileSdk is set to 35, as shown below:
+
+```text
+plugins {
+ alias(libs.plugins.android.application)
+ alias(libs.plugins.kotlin.android)
+}
+
+android {
+ namespace = "com.arm.sudokusolveronnx"
+ compileSdk = 35
+
+ defaultConfig {
+ applicationId = "com.arm.sudokusolveronnx"
+ minSdk = 24
+ targetSdk = 34
+ versionCode = 1
+ versionName = "1.0"
+
+ testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
+ }
+
+ buildTypes {
+ release {
+ isMinifyEnabled = false
+ proguardFiles(
+ getDefaultProguardFile("proguard-android-optimize.txt"),
+ "proguard-rules.pro"
+ )
+ }
+ }
+
+ compileOptions {
+ sourceCompatibility = JavaVersion.VERSION_11
+ targetCompatibility = JavaVersion.VERSION_11
+ }
+
+ kotlinOptions {
+ jvmTarget = "11"
+ }
+}
+
+dependencies {
+ implementation(libs.androidx.core.ktx)
+ implementation(libs.androidx.appcompat)
+ implementation(libs.material)
+ implementation(libs.androidx.activity)
+ implementation(libs.androidx.constraintlayout)
+
+ testImplementation(libs.junit)
+ androidTestImplementation(libs.androidx.junit)
+ androidTestImplementation(libs.androidx.espresso.core)
+}
+```
+
+After making this change:
+1. Click Sync Now when Android Studio prompts you.
+2. Rebuild and run the app.
+
+The project should now compile and launch successfully.
+
+At this point, the app should start, display the UI, and allow you to load random Sudoku images. In the next step, we will replace the placeholder logic in the Solve button with the real ONNX- and OpenCV-based Sudoku processing engine.
+
+## Processing pipeline on Android
+With the user interface and static resources in place, we can now wire the full Sudoku processing pipeline on Android. Conceptually, this pipeline mirrors the Python implementation developed earlier in the learning path, but is reimplemented using Android-compatible components.
+
+The pipeline consists of four stages:
+1. Grid detection and rectification (OpenCV). The input bitmap is converted to an OpenCV matrix, the Sudoku grid is detected, and a perspective transform is applied to obtain a top-down, square view of the board.
+2. Digit recognition (ONNX Runtime). The rectified grid is split into 81 cells, each cell is preprocessed to match the training distribution, and all cells are passed as a single batch to the ONNX model for digit recognition.
+3. Solving (Kotlin). The recognized board is solved using a deterministic backtracking algorithm. This step is lightweight but can exhibit occasional tail latency when recognition errors introduce ambiguity.
+4. Rendering and overlay. The solution is rendered back onto the original image by inverse-warping a transparent overlay from the rectified grid space to the input image.
+
+### Dependencies
+To support this pipeline, we add three dependencies:
+* ONNX Runtime for on-device inference,
+* OpenCV for image processing and geometric transformations,
+* Kotlin coroutines to ensure that heavy computation runs off the UI thread.
+
+Open app/build.gradle.kts and add the following dependencies:
+```text
+dependencies {
+ implementation("com.microsoft.onnxruntime:onnxruntime-android:1.18.0")
+ implementation("org.opencv:opencv:4.10.0")
+ implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
+}
+```
+
+Then sync Gradle. Using the Maven dependencies (rather than manually importing the OpenCV Android SDK) keeps the setup simple for this tutorial.
+
+Then, make sure you have:
+```console
+app/src/main/assets/sudoku_digitnet_android.onnx
+```
+
+### Core components
+The Android implementation is organized into three small, focused components:
+1. SudokuSolver.kt. Implements a classic backtracking Sudoku solver. This logic is deterministic and independent of the machine learning model.
+2. SudokuEngine.kt. Encapsulates the full vision and inference pipeline. It loads the ONNX model from assets, performs grid detection, preprocessing, batched inference, solving, and overlay generation.
+3. BoardRenderer.kt. Provides a utility to render a clean Sudoku grid bitmap. This is useful for debugging and for visualizing results independent of the original image.
+
+This separation keeps the codebase readable and makes it easy to extend or replace individual stages later.
+
+Create these Kotlin files under:
+```console
+app/src/main/java/com/arm/sudokusolveronnx/
+```
+
+* SudokuSolver.kt (backtracking)
+
+```kotlin
+package com.arm.sudokusolveronnx
+
+object SudokuSolver {
+ fun solve(board: Array<IntArray>): Boolean {
+ val pos = findEmpty(board) ?: return true
+ val r = pos.first
+ val c = pos.second
+ for (v in 1..9) {
+ if (isValid(board, r, c, v)) {
+ board[r][c] = v
+ if (solve(board)) return true
+ board[r][c] = 0
+ }
+ }
+ return false
+ }
+
+ private fun findEmpty(board: Array<IntArray>): Pair<Int, Int>? {
+ for (r in 0 until 9) for (c in 0 until 9) if (board[r][c] == 0) return r to c
+ return null
+ }
+
+ private fun isValid(board: Array<IntArray>, r: Int, c: Int, v: Int): Boolean {
+ for (j in 0 until 9) if (board[r][j] == v) return false
+ for (i in 0 until 9) if (board[i][c] == v) return false
+ val br = (r / 3) * 3
+ val bc = (c / 3) * 3
+ for (i in br until br + 3) for (j in bc until bc + 3) if (board[i][j] == v) return false
+ return true
+ }
+}
+```
+
+* SudokuEngine.kt (OpenCV + ONNX Runtime inference)
+
+```kotlin
+package com.arm.sudokusolveronnx
+
+import ai.onnxruntime.OnnxTensor
+import ai.onnxruntime.OrtEnvironment
+import ai.onnxruntime.OrtSession
+import android.content.Context
+import android.graphics.Bitmap
+import org.opencv.android.Utils
+import org.opencv.core.*
+import org.opencv.imgproc.Imgproc
+import java.nio.FloatBuffer
+
+class SudokuEngine(
+ private val context: Context,
+ private val modelAssetName: String = "sudoku_digitnet_android.onnx",
+ private val warpSize: Int = 450,
+ private val inputSize: Int = 28,
+ private val blankConfThreshold: Float = 0.65f
+) {
+ private val env: OrtEnvironment = OrtEnvironment.getEnvironment()
+ private val session: OrtSession
+
+ init {
+ val modelBytes = context.assets.open(modelAssetName).use { it.readBytes() }
+ val opts = OrtSession.SessionOptions()
+ session = env.createSession(modelBytes, opts)
+ }
+
+ private data class WarpResult(
+ val warped: Mat,
+ val H: Mat // perspective transform from original -> warped
+ )
+
+ data class Result(
+ val recognized: Array<IntArray>,
+ val solved: Array<IntArray>?,
+ val solvedBitmap: Bitmap?,
+ val overlayBitmap: Bitmap?
+ )
+
+ fun solveBitmap(input: Bitmap): Result {
+ // Bitmap -> Mat (BGR/RGBA depending on Utils, but works for our pipeline)
+ val bgr = Mat()
+ Utils.bitmapToMat(input, bgr)
+
+ val warp = detectAndWarp(bgr) ?: return Result(emptyBoard(), null, null, null)
+
+ val board = recognizeBoard(warp.warped)
+ val solved = board.map { it.clone() }.toTypedArray()
+ val ok = SudokuSolver.solve(solved)
+
+ return if (ok) {
+ val solvedGrid = BoardRenderer.render(solved) // keep if you still want it
+ val overlay = makeOverlayBitmap(input, warp.H, board, solved)
+ Result(board, solved, solvedGrid, overlay)
+ } else {
+ Result(board, null, null, null)
+ }
+ }
+
+ private fun emptyBoard(): Array<IntArray> = Array(9) { IntArray(9) }
+
+ private fun detectAndWarp(bgr: Mat): WarpResult? {
+ val gray = Mat()
+ Imgproc.cvtColor(bgr, gray, Imgproc.COLOR_BGR2GRAY)
+ Imgproc.GaussianBlur(gray, gray, Size(5.0, 5.0), 0.0)
+
+ val thr = Mat()
+ Imgproc.adaptiveThreshold(
+ gray, thr, 255.0,
+ Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
+ Imgproc.THRESH_BINARY_INV,
+ 31, 7.0
+ )
+
+ val kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, Size(3.0, 3.0))
+ Imgproc.morphologyEx(thr, thr, Imgproc.MORPH_CLOSE, kernel, Point(-1.0, -1.0), 2)
+
+ val contours = ArrayList<MatOfPoint>()
+ Imgproc.findContours(thr, contours, Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE)
+ if (contours.isEmpty()) return null
+ contours.sortByDescending { Imgproc.contourArea(it) }
+
+ var quad: MatOfPoint2f? = null
+ for (i in 0 until minOf(20, contours.size)) {
+ val c = contours[i]
+ val peri = Imgproc.arcLength(MatOfPoint2f(*c.toArray()), true)
+ val approx = MatOfPoint2f()
+ Imgproc.approxPolyDP(MatOfPoint2f(*c.toArray()), approx, 0.02 * peri, true)
+ if (approx.total().toInt() == 4) {
+ quad = approx
+ break
+ }
+ }
+ if (quad == null) return null
+
+ val pts = orderQuad(quad.toArray())
+ val dst = arrayOf(
+ Point(0.0, 0.0),
+ Point((warpSize - 1).toDouble(), 0.0),
+ Point((warpSize - 1).toDouble(), (warpSize - 1).toDouble()),
+ Point(0.0, (warpSize - 1).toDouble())
+ )
+
+ val M = Imgproc.getPerspectiveTransform(MatOfPoint2f(*pts), MatOfPoint2f(*dst))
+ val warped = Mat()
+ Imgproc.warpPerspective(bgr, warped, M, Size(warpSize.toDouble(), warpSize.toDouble()))
+ return WarpResult(warped = warped, H = M)
+ }
+
+ private fun orderQuad(pts: Array<Point>): Array<Point> {
+ // order: TL, TR, BR, BL
+ val sum = pts.map { it.x + it.y }
+ val diff = pts.map { it.x - it.y }
+ val tl = pts[sum.indices.minBy { sum[it] }]
+ val br = pts[sum.indices.maxBy { sum[it] }]
+ val tr = pts[diff.indices.maxBy { diff[it] }]
+ val bl = pts[diff.indices.minBy { diff[it] }]
+ return arrayOf(tl, tr, br, bl)
+ }
+
+ private fun recognizeBoard(warpedBgr: Mat): Array<IntArray> {
+ val step = warpSize / 9
+ val inputs = FloatArray(81 * 1 * inputSize * inputSize)
+
+ var idx = 0
+ for (r in 0 until 9) {
+ for (c in 0 until 9) {
+ val cell = warpedBgr.submat(r * step, (r + 1) * step, c * step, (c + 1) * step)
+ val tensor = preprocessCell(cell) // FloatArray length = 1*28*28
+ System.arraycopy(tensor, 0, inputs, idx * inputSize * inputSize, inputSize * inputSize)
+ idx++
+ }
+ }
+
+ val shape = longArrayOf(81, 1, inputSize.toLong(), inputSize.toLong())
+ val fb = FloatBuffer.wrap(inputs)
+ val inputTensor = OnnxTensor.createTensor(env, fb, shape)
+
+ val out = session.run(mapOf("input" to inputTensor))
+ val logits = out[0].value as Array<FloatArray> // [81][10]
+ out.close()
+ inputTensor.close()
+
+ val board = Array(9) { IntArray(9) }
+ for (i in 0 until 81) {
+ val probs = softmax(logits[i])
+ var bestK = 0
+ var bestV = probs[0]
+ for (k in 1 until probs.size) {
+ if (probs[k] > bestV) { bestV = probs[k]; bestK = k }
+ }
+ val r = i / 9
+ val c = i % 9
+ board[r][c] = if (bestV < blankConfThreshold) 0 else bestK
+ }
+ return board
+ }
+
+ private fun preprocessCell(cellBgr: Mat): FloatArray {
+ val gray = Mat()
+ Imgproc.cvtColor(cellBgr, gray, Imgproc.COLOR_BGR2GRAY)
+
+ val m = (0.12 * minOf(gray.rows(), gray.cols())).toInt()
+ val cropped = gray.submat(m, gray.rows() - m, m, gray.cols() - m)
+
+ val resized = Mat()
+ Imgproc.resize(cropped, resized, Size(inputSize.toDouble(), inputSize.toDouble()), 0.0, 0.0, Imgproc.INTER_AREA)
+
+ val out = FloatArray(inputSize * inputSize)
+ var k = 0
+ for (y in 0 until inputSize) {
+ for (x in 0 until inputSize) {
+ val v = resized.get(y, x)[0].toFloat() / 255f
+ out[k++] = (v - 0.5f) / 0.5f
+ }
+ }
+ return out
+ }
+
+ private fun softmax(x: FloatArray): FloatArray {
+ var max = x[0]
+ for (v in x) if (v > max) max = v
+ val e = FloatArray(x.size)
+ var sum = 0f
+ for (i in x.indices) {
+ val v = kotlin.math.exp((x[i] - max).toDouble()).toFloat()
+ e[i] = v
+ sum += v
+ }
+ for (i in e.indices) e[i] /= (sum + 1e-12f)
+ return e
+ }
+
+ private fun makeOverlayBitmap(
+ originalBitmap: Bitmap,
+ H: Mat,
+ recognized: Array<IntArray>,
+ solved: Array<IntArray>
+ ): Bitmap {
+ // Convert original to Mat (could be RGBA on Android)
+ val original = Mat()
+ Utils.bitmapToMat(originalBitmap, original)
+
+ // Ensure original is BGR (3 channels)
+ val originalBgr = Mat()
+ if (original.channels() == 4) {
+ Imgproc.cvtColor(original, originalBgr, Imgproc.COLOR_RGBA2BGR)
+ } else {
+ original.copyTo(originalBgr)
+ }
+
+ // Create layer in warped space (BGR)
+ val layer = Mat.zeros(warpSize, warpSize, CvType.CV_8UC3)
+ val step = warpSize / 9
+ for (r in 0 until 9) {
+ for (c in 0 until 9) {
+ if (recognized[r][c] != 0) continue
+ val d = solved[r][c]
+ val x = (c * step + step * 0.32).toInt()
+ val y = (r * step + step * 0.72).toInt()
+ Imgproc.putText(
+ layer, d.toString(),
+ Point(x.toDouble(), y.toDouble()),
+ Imgproc.FONT_HERSHEY_SIMPLEX,
+ 1.2,
+ Scalar(0.0, 200.0, 0.0), // green in BGR
+ 2,
+ Imgproc.LINE_AA
+ )
+ }
+ }
+
+ // Inverse warp to original size (BGR)
+ val invH = Mat()
+ Core.invert(H, invH)
+
+ val back = Mat.zeros(originalBgr.size(), CvType.CV_8UC3)
+ Imgproc.warpPerspective(layer, back, invH, originalBgr.size())
+
+ // Mask where back has pixels
+ val mask = Mat()
+ Imgproc.cvtColor(back, mask, Imgproc.COLOR_BGR2GRAY)
+ Imgproc.threshold(mask, mask, 1.0, 255.0, Imgproc.THRESH_BINARY)
+
+ // Blend (same size + same channels)
+ val blended = Mat()
+ Core.addWeighted(originalBgr, 0.6, back, 0.4, 0.0, blended)
+
+ // Copy only where mask is present
+ val outBgr = originalBgr.clone()
+ blended.copyTo(outBgr, mask)
+
+ // Convert back to bitmap (need RGBA for Android Bitmap)
+ val outRgba = Mat()
+ Imgproc.cvtColor(outBgr, outRgba, Imgproc.COLOR_BGR2RGBA)
+
+ val outBmp = Bitmap.createBitmap(originalBitmap.width, originalBitmap.height, Bitmap.Config.ARGB_8888)
+ Utils.matToBitmap(outRgba, outBmp)
+ return outBmp
+ }
+}
+```
+
+* BoardRenderer.kt
+```kotlin
+package com.arm.sudokusolveronnx
+
+import android.graphics.*
+
+object BoardRenderer {
+ fun render(board: Array<IntArray>, cell: Int = 80, margin: Int = 24): Bitmap {
+ val size = 9 * cell + 2 * margin
+ val bmp = Bitmap.createBitmap(size, size, Bitmap.Config.ARGB_8888)
+ val canvas = Canvas(bmp)
+ canvas.drawColor(Color.WHITE)
+
+ val thin = Paint().apply { color = Color.BLACK; strokeWidth = 2f; isAntiAlias = true }
+ val thick = Paint().apply { color = Color.BLACK; strokeWidth = 6f; isAntiAlias = true }
+
+ // grid
+ for (i in 0..9) {
+ val p = if (i % 3 == 0) thick else thin
+ val x = (margin + i * cell).toFloat()
+ val y = (margin + i * cell).toFloat()
+ canvas.drawLine(x, margin.toFloat(), x, (margin + 9 * cell).toFloat(), p)
+ canvas.drawLine(margin.toFloat(), y, (margin + 9 * cell).toFloat(), y, p)
+ }
+
+ val textPaint = Paint().apply {
+ color = Color.BLACK
+ textSize = (cell * 0.62f)
+ isAntiAlias = true
+ textAlign = Paint.Align.CENTER
+ typeface = Typeface.create(Typeface.SANS_SERIF, Typeface.BOLD)
+ }
+
+ val fm = textPaint.fontMetrics
+ val textYOffset = (fm.ascent + fm.descent) / 2f
+
+ for (r in 0 until 9) {
+ for (c in 0 until 9) {
+ val v = board[r][c]
+ if (v == 0) continue
+ val cx = margin + c * cell + cell / 2f
+ val cy = margin + r * cell + cell / 2f - textYOffset
+ canvas.drawText(v.toString(), cx, cy, textPaint)
+ }
+ }
+ return bmp
+ }
+}
+```
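+
+Because SudokuSolver.kt has no Android, OpenCV, or ONNX Runtime dependencies, the solving stage can also be verified in isolation with a plain JUnit test. The sketch below is optional and illustrative; the test class name and its placement under app/src/test/java/ are assumptions rather than part of the project as shipped.
+
+```kotlin
+package com.arm.sudokusolveronnx
+
+import org.junit.Assert.assertTrue
+import org.junit.Test
+
+// Hypothetical unit test: exercises the backtracking solver without any image
+// processing or inference. Uses the JUnit dependency already declared in build.gradle.kts.
+class SudokuSolverTest {
+
+    @Test
+    fun solvesAnEmptyBoard() {
+        val board = Array(9) { IntArray(9) } // all zeros, meaning every cell is empty
+        assertTrue(SudokuSolver.solve(board))
+        // After solving, every cell must hold a digit between 1 and 9.
+        assertTrue(board.all { row -> row.all { it in 1..9 } })
+    }
+}
+```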
+
+### Overlay rendering
+Instead of simply rendering a solved grid, the application overlays the missing digits directly onto the original Sudoku image. This is achieved by drawing the solution in the rectified grid space and then mapping it back to the input image using the inverse perspective transform.
+
+Only cells that were originally empty are filled, and the solution digits are rendered in green to distinguish them from the original puzzle. This approach closely matches how real-world Sudoku solver apps present results and provides an intuitive visual confirmation that the pipeline is working correctly.
+
+## MainActivity integration
+MainActivity acts as a thin integration layer between the UI and the processing engine. Its responsibilities are intentionally minimal:
+* loading a random Sudoku image from resources,
+* invoking the solver on a background thread,
+* updating the UI with the solved result or an error message.
+
+All heavy computation is delegated to SudokuEngine, which ensures that the UI remains responsive during processing.
+
+```kotlin
+package com.arm.sudokusolveronnx
+
+import android.graphics.Bitmap
+import android.graphics.BitmapFactory
+import android.os.Bundle
+import android.widget.Button
+import android.widget.ImageView
+import android.widget.TextView
+import androidx.appcompat.app.AppCompatActivity
+import androidx.lifecycle.lifecycleScope
+import kotlinx.coroutines.Dispatchers
+import kotlinx.coroutines.launch
+import kotlinx.coroutines.withContext
+import kotlin.random.Random
+import org.opencv.android.OpenCVLoader
+
+class MainActivity : AppCompatActivity() {
+
+ private lateinit var btnLoadRandom: Button
+ private lateinit var btnSolve: Button
+ private lateinit var txtStatus: TextView
+ private lateinit var imgInput: ImageView
+ private lateinit var imgOutput: ImageView
+ private lateinit var engine: SudokuEngine
+
+ private var currentBitmap: Bitmap? = null
+
+ // Clean and camera-like pools (you copied these into res/drawable/)
+ private val sudokuCleanImages = listOf(
+ R.drawable.sudoku_01,
+ R.drawable.sudoku_02,
+ R.drawable.sudoku_03,
+ R.drawable.sudoku_04,
+ R.drawable.sudoku_05,
+ R.drawable.sudoku_06,
+ R.drawable.sudoku_07,
+ R.drawable.sudoku_08,
+ R.drawable.sudoku_09,
+ R.drawable.sudoku_10,
+ )
+
+ private val sudokuCamImages = listOf(
+ R.drawable.sudoku_cam_01,
+ R.drawable.sudoku_cam_02,
+ R.drawable.sudoku_cam_03,
+ R.drawable.sudoku_cam_04,
+ R.drawable.sudoku_cam_05,
+ R.drawable.sudoku_cam_06,
+ R.drawable.sudoku_cam_07,
+ R.drawable.sudoku_cam_08,
+ R.drawable.sudoku_cam_09,
+ R.drawable.sudoku_cam_10,
+ )
+
+ override fun onCreate(savedInstanceState: Bundle?) {
+ super.onCreate(savedInstanceState)
+ setContentView(R.layout.activity_main)
+
+ btnLoadRandom = findViewById(R.id.btnLoadRandom)
+ btnSolve = findViewById(R.id.btnSolve)
+ txtStatus = findViewById(R.id.txtStatus)
+ imgInput = findViewById(R.id.imgInput)
+ imgOutput = findViewById(R.id.imgOutput)
+
+ btnSolve.isEnabled = false
+
+ val ok = OpenCVLoader.initLocal()
+ txtStatus.text = if (ok) "OpenCV ready" else "OpenCV init failed"
+
+ engine = SudokuEngine(this)
+
+ btnLoadRandom.setOnClickListener {
+ loadRandomSudokuImage(useCameraLike = true)
+ }
+
+ btnSolve.setOnClickListener {
+ val bmp = currentBitmap ?: return@setOnClickListener
+ txtStatus.text = "Solving..."
+ btnSolve.isEnabled = false
+
+ lifecycleScope.launch {
+ val result = withContext(Dispatchers.Default) {
+ engine.solveBitmap(bmp)
+ }
+
+ if (result.overlayBitmap != null) {
+ imgOutput.setImageBitmap(result.overlayBitmap)
+ txtStatus.text = "Solved"
+ } else {
+ txtStatus.text = "Solve failed (recognition errors)"
+ }
+
+ btnSolve.isEnabled = true
+ }
+ }
+ }
+
+ private fun loadRandomSudokuImage(useCameraLike: Boolean) {
+ val pool = if (useCameraLike) sudokuCamImages else sudokuCleanImages
+ val resId = pool[Random.nextInt(pool.size)]
+
+ val bmp = BitmapFactory.decodeResource(resources, resId)
+ currentBitmap = bmp
+
+ imgInput.setImageBitmap(bmp)
+ imgOutput.setImageDrawable(null)
+
+ btnSolve.isEnabled = true
+ txtStatus.text = if (useCameraLike) "Loaded camera-like Sudoku" else "Loaded clean Sudoku"
+ }
+}
+```
+
+Depending on your OpenCV packaging, you may use initLocal() or initDebug().
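+
+If you want the activity to tolerate either packaging without further changes, a small helper such as the sketch below can hide the difference. This is a minimal sketch, assuming the 4.9+ Maven artifact used in this tutorial, where both entry points are available; the helper name is illustrative.
+
+```kotlin
+import org.opencv.android.OpenCVLoader
+
+// Minimal sketch: prefer initLocal() (used with the Maven artifact in this tutorial)
+// and fall back to the legacy initDebug() if local initialization fails at runtime.
+fun ensureOpenCvLoaded(): Boolean =
+    OpenCVLoader.initLocal() || OpenCVLoader.initDebug()
+```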
+
+## Testing the application
+With the full pipeline integrated, the application can now be tested end-to-end on an Android device or emulator.
+
+To test the app:
+1. Tap Load image to randomly select one of the bundled Sudoku bitmaps (clean or camera-like).
+2. The selected image is displayed in the Input section.
+3. Tap Solve to run the complete on-device pipeline:
+* OpenCV detects and rectifies the Sudoku grid,
+* the ONNX model performs batched digit recognition,
+* the Sudoku solver reconstructs and solves the board,
+* the solution is overlaid back onto the original image.
+
+The figures below show two representative test cases. In each example, the upper image corresponds to the original Sudoku puzzle, while the lower image shows the same puzzle with the missing digits filled in and overlaid in green. This visual comparison confirms that grid detection, digit recognition, solving, and rendering are all functioning correctly on-device.
+
+
+
+
+These tests demonstrate that the application is robust to perspective distortion and partial digit placement, and that the model performs reliably when deployed via ONNX Runtime on Android.
+
+## Summary and next steps
+In this learning path, you have built a complete, end-to-end workflow for deploying machine learning models with ONNX on Arm64 and mobile devices. Starting from model development in Python, you moved step by step through export, optimization, and integration, ultimately deploying a fully functional solution that runs entirely on an Android device.
+
+Along the way, you trained and exported a neural network to the ONNX format, explored how to optimize inference for edge deployment, and built a robust vision pipeline using OpenCV. You then brought these components together on Android by integrating ONNX Runtime and implementing a Sudoku solver that performs image preprocessing, neural network inference, and deterministic solving entirely on-device, without any cloud dependency.
+
+It is also important to recognize the current limitations of the approach. While the system performs well on most test images, there are cases where digit recognition may fail or produce ambiguous results—particularly under challenging lighting conditions, strong perspective distortion, or when digits are faint or partially occluded. In such cases, recognition errors can propagate to the solver, leading to longer solve times or, occasionally, failure to find a valid solution. These limitations are typical for lightweight, on-device vision systems and highlight the trade-offs between model complexity, robustness, and performance at the edge.
+
+Despite these constraints, the application stands as a practical and self-contained example of edge AI deployment. It demonstrates how ONNX can serve as a bridge between model development and real-world deployment, enabling the same model to move seamlessly from a desktop environment to a mobile platform.
+
+From here, there are many natural directions for improvement. You could enhance robustness by incorporating additional training data or more advanced preprocessing, extend the app to use live camera input with CameraX, refine the user experience with animations and progress feedback, or experiment with quantized models on devices that support additional execution providers. The same architectural pattern can also be applied beyond Sudoku, to other document- or grid-based vision problems where lightweight, on-device inference is essential.
+
+This concludes the learning path and provides a solid foundation for building, optimizing, and deploying ONNX-based machine learning applications on Arm64 and mobile platforms.
+
+## Companion code
+You can find the companion code in these repositories:
+1. [Sudoku solver](https://github.com/dawidborycki/SudokuSolverOnnx.git)
+2. [Python scripts](https://github.com/dawidborycki/ONNX-LP.git)
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/01.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/01.png
new file mode 100644
index 0000000000..c32ae10bba
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/01.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/02.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/02.png
new file mode 100644
index 0000000000..48021bad66
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/02.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/03.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/03.png
new file mode 100644
index 0000000000..8f61d07fa0
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/03.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/04.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/04.png
new file mode 100644
index 0000000000..a2544c74ef
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/04.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/05.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/05.png
new file mode 100644
index 0000000000..273ac7bbf9
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/05.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/06.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/06.png
new file mode 100644
index 0000000000..8e461219f9
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/06.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/07.png b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/07.png
new file mode 100644
index 0000000000..84b3b41b57
Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/onnx/Figures/07.png differ
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/_index.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/_index.md
new file mode 100644
index 0000000000..0921cec1a1
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/_index.md
@@ -0,0 +1,67 @@
+---
+title: "ONNX in Action: Building, Optimizing, and Deploying Models on Arm64 and Mobile"
+
+minutes_to_complete: 240
+
+who_is_this_for: This is an introductory topic for developers who are interested in creating, optimizing, and deploying machine learning models with ONNX. It is especially useful for those targeting Arm64-based devices (such as Raspberry Pi, mobile SoCs, or Android smartphones) and looking to run efficient inference at the edge.
+
+learning_objectives:
+ - Describe what ONNX is, and what it can offer in the ML ecosystem.
+ - Build and export a simple neural network model in Python to ONNX format.
+ - Perform inference and training using ONNX Runtime.
+ - Apply optimization techniques to improve performance.
+ - Deploy an optimized ONNX model inside an Android app.
+
+prerequisites:
+ - A development machine with Python 3.10+ installed.
+ - Basic familiarity with PyTorch or TensorFlow.
+ - An Arm64 device (e.g., Raspberry Pi or Android smartphone).
+ - "[Android Studio](https://developer.android.com/studio) installed for deployment testing."
+
+author: Dawid Borycki
+
+### Tags
+skilllevels: Introductory
+subjects: Machine Learning, Edge AI
+armips:
+ - Cortex-A
+ - Neoverse
+operatingsystems:
+ - Windows
+ - Linux
+ - macOS
+tools_software_languages:
+ - Python
+ - PyTorch
+ - TensorFlow
+ - ONNX
+ - ONNX Runtime
+ - Android
+ - Android Studio
+ - Kotlin
+ - Java
+
+further_reading:
+ - resource:
+ title: ONNX
+ link: https://onnx.ai
+ type: documentation
+ - resource:
+ title: ONNX Runtime
+ link: https://onnxruntime.ai
+ type: documentation
+ - resource:
+ title: Getting Started with ONNX Runtime on Mobile
+ link: https://onnxruntime.ai/docs/tutorials/mobile
+ type: tutorial
+ - resource:
+ title: Optimizing Models with ONNX Runtime
+ link: https://onnxruntime.ai/docs/performance/model-optimizations.html
+ type: documentation
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1
+layout: "learningpathall"
+learning_path_main_page: "yes"
+---
\ No newline at end of file
diff --git a/content/learning-paths/mobile-graphics-and-gaming/onnx/_next-steps.md b/content/learning-paths/mobile-graphics-and-gaming/onnx/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/mobile-graphics-and-gaming/onnx/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---