A neural network that recognizes handwritten digits, written entirely in x86-64 NASM assembly using raw Linux syscalls, SSE2/SSE4.1 and floating-point instructions.
Achieves 96.5% test accuracy on MNIST (9,652/10,000 correct), comparable to equivalent Python/NumPy implementations.
Includes a live drawing app: draw digits on a canvas and watch the NASM neural network classify them in real time (~0.7ms per inference).
Input (784) → Hidden (32, sigmoid) → Output (10, softmax)
- Training: Mini-batch SGD, batch size 32, learning rate 0.1, 30 epochs
- Loss: Cross-entropy with softmax output
- Weight init: Xavier scaling (hidden: ±0.0815, output: ±0.378)
- Data: MNIST IDX binary files parsed at runtime (no preprocessing tools)
- Precision: IEEE 754 double-precision (64-bit) throughout
- Linux x86-64 (tested on WSL2)
- NASM assembler (`sudo apt install nasm`)
- GNU ld linker (usually included with `binutils`)
- GCC (for the live app server)
- Python 3 (optional, for downloading MNIST data)

```sh
./scripts/download_mnist.sh
```

This downloads the four MNIST files (~55MB) into `data/`.

```sh
make clean && make
./build/mnistlearn
```

Training takes ~1–2 minutes (30 epochs). Output:
```
=== NASM Learn Complex — MNIST ===
Training 784-32-10 network...
Epoch 1/30 | Loss: 0.6105 | Accuracy: 85.26%
...
Epoch 30/30 | Loss: 0.0788 | Accuracy: 97.78%
Saving weights... done.
--- Test Results ---
Overall accuracy: 96.52% (9652/10000)
Per-digit accuracy:
0: 98.98%
1: 98.77%
2: 95.93%
...
```
Trained weights are saved to `data/weights.bin` (203,600 bytes).
```sh
cd app
make clean && make
make run
```

Open http://localhost:8080 in your browser. Draw a digit on the canvas and watch predictions update live as you draw.

The server links directly with the NASM neural network object files. The same assembly code that trained the network runs inference with zero overhead.
```
nasm-learn-complex/
├── include/
│   └── constants.inc     # Network dims, syscall numbers, training params
├── src/
│   ├── main.asm          # Entry point, training loop, weight save, test eval
│   ├── data/
│   │   └── mnist.asm     # MNIST IDX file parser, pixel normalization
│   ├── math/
│   │   ├── dot.asm       # Dot product (leaf function)
│   │   ├── exp.asm       # e^x via range reduction + Horner polynomial
│   │   ├── ln.asm        # ln(x) via IEEE 754 decomposition + Pade series
│   │   ├── sigmoid.asm   # Sigmoid and derivative
│   │   └── softmax.asm   # Numerically stable softmax
│   ├── nn/
│   │   ├── forward.asm   # Dense layer forward pass (sigmoid + linear)
│   │   ├── backward.asm  # Backpropagation (output softmax + hidden sigmoid)
│   │   └── update.asm    # SGD weight update, zero/scale buffer utilities
│   ├── io/
│   │   ├── print.asm     # Print routines (string, double, uint64, percent)
│   │   └── string.asm    # Double-to-ASCII conversion
│   └── timing/
│       └── clock.asm     # Wall-clock timing via clock_gettime
├── app/
│   ├── server.c          # HTTP inference server (links NASM .o files)
│   ├── index.html        # Drawing canvas + live prediction UI
│   └── Makefile          # Builds server with NASM objects
├── scripts/
│   └── download_mnist.sh # Downloads MNIST data files
├── data/                 # MNIST files + trained weights (gitignored)
├── docs/plans/           # Design documents
└── Makefile              # Builds the training program
```
Total: 2,680 lines of NASM assembly + 170 lines of C server + 300 lines of HTML/JS frontend.
- **Data loading:** `mnist.asm` opens MNIST IDX binary files with `sys_open`, reads them with `sys_read` in a loop, parses big-endian headers with `bswap`, normalizes pixel bytes to `[0, 1]` doubles, and one-hot encodes labels.
- **Forward pass:** For each sample, `forward.asm` computes `output = sigmoid(W·x + b)` for the hidden layer and raw linear output for the softmax layer. `softmax.asm` converts raw outputs to probabilities (numerically stable: subtracts the max before `exp`).
- **Loss:** Cross-entropy loss `−ln(softmax[true_class])`, computed via `ln.asm`, which decomposes IEEE 754 doubles into exponent + mantissa for a Padé approximation.
- **Backward pass:** `backward.asm` computes gradients. The softmax + cross-entropy gradient simplifies to `delta[i] = output[i] - target[i]`. Hidden-layer gradients use the sigmoid derivative `s(1−s)` and the chain rule.
- **Update:** After each mini-batch (32 samples), gradients are averaged and weights are updated: `w -= lr * grad`.
- **Shuffling:** A Fisher-Yates shuffle randomizes training order each epoch, using an LCG PRNG seeded from `rdtsc`.
The inference server (`app/server.c`) links directly with the same NASM `.o` files used in training. It:

- Loads trained weights from `data/weights.bin`
- Serves the HTML/JS drawing interface
- Accepts POST requests with 784 raw pixel bytes
- Normalizes pixels, calls `forward_sigmoid` → `forward_linear` → `softmax`
- Returns JSON probabilities
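The request path (normalize → forward → JSON) can be illustrated in C. The helper names and the JSON shape below are assumptions for illustration, not the server's actual output format:

```c
#include <stdio.h>
#include <string.h>

/* Convert the 784 raw pixel bytes from the POST body to [0, 1] doubles. */
static void normalize(const unsigned char *pixels, double *x, int n) {
    for (int i = 0; i < n; i++) x[i] = pixels[i] / 255.0;
}

/* Format 10 class probabilities as JSON (shape is an assumption). */
static int to_json(const double *p, char *buf, size_t cap) {
    size_t off = 0;
    off += snprintf(buf + off, cap - off, "{\"probs\":[");
    for (int i = 0; i < 10; i++)
        off += snprintf(buf + off, cap - off, "%s%.6f", i ? "," : "", p[i]);
    off += snprintf(buf + off, cap - off, "]}");
    return (int)off;
}
```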
The frontend draws on a 280×280 canvas, downsamples to 28×28 by averaging 10×10 blocks, centers the digit using its center of mass (matching MNIST preprocessing), and requests a fresh prediction on every stroke.
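The block-averaging downsample the frontend performs in JavaScript looks like this in C (function name and row-major buffer layout are illustrative):

```c
/* Downsample a 280x280 grayscale canvas to 28x28 by averaging each
   10x10 block of source pixels into one destination pixel. */
static void downsample(const unsigned char *src, unsigned char *dst) {
    for (int r = 0; r < 28; r++) {
        for (int c = 0; c < 28; c++) {
            unsigned sum = 0;
            for (int dr = 0; dr < 10; dr++)
                for (int dc = 0; dc < 10; dc++)
                    sum += src[(r * 10 + dr) * 280 + (c * 10 + dc)];
            dst[r * 28 + c] = (unsigned char)(sum / 100);
        }
    }
}
```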
All functions follow the SysV AMD64 ABI:

- Arguments: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` (then stack)
- Callee-saved: `rbx`, `rbp`, `r12`–`r15`
- Return: `rax` (integer), `xmm0` (float)
- All XMM registers are caller-saved
- Stack aligned to 16 bytes before every `call`
`data/weights.bin` contains raw IEEE 754 doubles in this order:
| Buffer | Count | Bytes |
|---|---|---|
| w_hidden | 25,088 | 200,704 |
| b_hidden | 32 | 256 |
| w_output | 320 | 2,560 |
| b_output | 10 | 80 |
| Total | 25,450 | 203,600 |
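A minimal C loader matching this layout might look as follows. It assumes native little-endian doubles (what the training program writes on x86-64); the function name and argument order are illustrative:

```c
#include <stdio.h>

enum { N_IN = 784, N_HID = 32, N_OUT = 10 };

/* Buffer sizes (in doubles) implied by the table above. */
#define W_HIDDEN (N_IN * N_HID)   /* 25088 doubles */
#define B_HIDDEN (N_HID)          /*    32 doubles */
#define W_OUTPUT (N_HID * N_OUT)  /*   320 doubles */
#define B_OUTPUT (N_OUT)          /*    10 doubles */

/* Read the four buffers back-to-back, in file order. Returns 0 on success. */
static int load_weights(const char *path, double *wh, double *bh,
                        double *wo, double *bo) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fread(wh, sizeof(double), W_HIDDEN, f) == W_HIDDEN
          && fread(bh, sizeof(double), B_HIDDEN, f) == B_HIDDEN
          && fread(wo, sizeof(double), W_OUTPUT, f) == W_OUTPUT
          && fread(bo, sizeof(double), B_OUTPUT, f) == B_OUTPUT;
    fclose(f);
    return ok ? 0 : -1;
}
```

The four counts sum to 25,450 doubles, i.e. 203,600 bytes, matching the file size above.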
| Metric | Value |
|---|---|
| Training time | ~60–135s (30 epochs, hardware dependent) |
| Train accuracy | 97.8% |
| Test accuracy | 96.5% |
| Inference latency | ~0.7ms round-trip (including HTTP) |
| Weight file size | 199 KB |
| Binary size (training) | ~27 KB |
| Binary size (server) | ~25 KB |
Licensed under the Apache License, Version 2.0. See NOTICE for details.