NASM Learn Complex: MNIST Digit Recognition

A neural network that recognizes handwritten digits, written entirely in x86-64 NASM assembly using raw Linux syscalls and SSE2/SSE4.1 floating-point instructions.

Achieves 96.5% test accuracy on MNIST (9,652/10,000 correct), on par with an equivalent Python/NumPy implementation.

Includes a live drawing app: draw digits on a canvas and watch the NASM neural network classify them in real time (~0.7ms per inference).

Architecture

Input (784) → Hidden (32, sigmoid) → Output (10, softmax)
  • Training: Mini-batch SGD, batch size 32, learning rate 0.1, 30 epochs
  • Loss: Cross-entropy with softmax output
  • Weight init: Xavier scaling (hidden: ±0.0815, output: ±0.378)
  • Data: MNIST IDX binary files parsed at runtime (no preprocessing tools)
  • Precision: IEEE 754 double-precision (64-bit) throughout

Quick Start

Prerequisites

  • Linux x86-64 (tested on WSL2)
  • NASM assembler (sudo apt install nasm)
  • GNU ld linker (usually included with binutils)
  • GCC (for the live app server)
  • Python 3 (optional, for downloading MNIST data)

1. Download MNIST Data

./scripts/download_mnist.sh

This downloads the four MNIST files (~55MB) into data/.
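These files are raw IDX binaries: a big-endian 32-bit magic number (0x00000803 for images, 0x00000801 for labels) followed by big-endian dimension fields, then raw bytes. A C sketch of the two conversions the parser performs (the assembly does the byte swap with a single bswap; these helper names are illustrative):

```c
#include <stdint.h>

/* IDX header fields are big-endian 32-bit integers; assemble them
   byte-by-byte into host order (what bswap does in one instruction). */
uint32_t idx_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Pixel bytes 0..255 are normalized to [0, 1] doubles for the network. */
double idx_normalize(unsigned char px) {
    return (double)px / 255.0;
}
```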

2. Train the Network

make clean && make
./build/mnistlearn

Training takes ~1-2 minutes (30 epochs). Output:

=== NASM Learn Complex — MNIST ===
Training 784-32-10 network...
Epoch 1/30 | Loss: 0.6105 | Accuracy: 85.26%
...
Epoch 30/30 | Loss: 0.0788 | Accuracy: 97.78%
Saving weights... done.

--- Test Results ---
Overall accuracy: 96.52% (9652/10000)

Per-digit accuracy:
  0: 98.98%
  1: 98.77%
  2: 95.93%
  ...

Trained weights are saved to data/weights.bin (203,600 bytes).

3. Run the Live Drawing App

cd app
make clean && make
make run

Open http://localhost:8080 in your browser. Draw a digit on the canvas and watch predictions update live as you draw.

The server links directly with the NASM neural network object files. The same assembly code that trained the network runs inference with zero overhead.

Project Structure

nasm-learn-complex/
├── include/
│   └── constants.inc          # Network dims, syscall numbers, training params
├── src/
│   ├── main.asm               # Entry point, training loop, weight save, test eval
│   ├── data/
│   │   └── mnist.asm          # MNIST IDX file parser, pixel normalization
│   ├── math/
│   │   ├── dot.asm            # Dot product (leaf function)
│   │   ├── exp.asm            # e^x via range reduction + Horner polynomial
│   │   ├── ln.asm             # ln(x) via IEEE 754 decomposition + Padé series
│   │   ├── sigmoid.asm        # Sigmoid and derivative
│   │   └── softmax.asm        # Numerically stable softmax
│   ├── nn/
│   │   ├── forward.asm        # Dense layer forward pass (sigmoid + linear)
│   │   ├── backward.asm       # Backpropagation (output softmax + hidden sigmoid)
│   │   └── update.asm         # SGD weight update, zero/scale buffer utilities
│   ├── io/
│   │   ├── print.asm          # Print routines (string, double, uint64, percent)
│   │   └── string.asm         # Double-to-ASCII conversion
│   └── timing/
│       └── clock.asm          # Wall-clock timing via clock_gettime
├── app/
│   ├── server.c               # HTTP inference server (links NASM .o files)
│   ├── index.html             # Drawing canvas + live prediction UI
│   └── Makefile               # Builds server with NASM objects
├── scripts/
│   └── download_mnist.sh      # Downloads MNIST data files
├── data/                      # MNIST files + trained weights (gitignored)
├── docs/plans/                # Design documents
└── Makefile                   # Builds the training program

Total: 2,680 lines of NASM assembly + 170 lines of C server + 300 lines of HTML/JS frontend.

How It Works

Training (Pure Assembly)

  1. Data loading: mnist.asm opens MNIST IDX binary files with sys_open, reads them with sys_read in a loop, parses big-endian headers with bswap, normalizes pixel bytes to [0, 1] doubles, and one-hot encodes labels.

  2. Forward pass: For each sample, forward.asm computes output = sigmoid(W·x + b) for the hidden layer and raw linear output for the softmax layer. softmax.asm converts raw outputs to probabilities (numerically stable: subtracts max before exp).

  3. Loss: Cross-entropy loss −ln(softmax[true_class]) computed via ln.asm, which decomposes IEEE 754 doubles into exponent + mantissa for a Padé approximation.

  4. Backward pass: backward.asm computes gradients. The softmax+cross-entropy gradient simplifies to delta[i] = output[i] - target[i]. Hidden layer gradients use the sigmoid derivative s(1−s) and chain rule.

  5. Update: After each mini-batch (32 samples), gradients are averaged and weights are updated: w -= lr * grad.

  6. Shuffling: Fisher-Yates shuffle randomizes training order each epoch, using an LCG PRNG seeded from rdtsc.

Live App (C + NASM)

The inference server (app/server.c) links directly with the same NASM .o files used in training. It:

  1. Loads trained weights from data/weights.bin
  2. Serves the HTML/JS drawing interface
  3. Accepts POST requests with 784 raw pixel bytes
  4. Normalizes pixels, calls forward_sigmoid → forward_linear → softmax
  5. Returns JSON probabilities

The frontend draws on a 280x280 canvas, downsamples to 28x28 by averaging 10x10 blocks, centers the digit using its center of mass (matching MNIST preprocessing), and requests a fresh prediction on every stroke.
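The frontend's downsampling step is plain block averaging; a C rendition of the same arithmetic (the actual code is JavaScript):

```c
/* Average each 10x10 block of a 280x280 grayscale canvas down to one
   pixel of a 28x28 image, as the drawing frontend does. */
void downsample_280_to_28(const unsigned char *src, unsigned char *dst) {
    for (int r = 0; r < 28; r++) {
        for (int c = 0; c < 28; c++) {
            unsigned int sum = 0;
            for (int y = 0; y < 10; y++)
                for (int x = 0; x < 10; x++)
                    sum += src[(r * 10 + y) * 280 + (c * 10 + x)];
            dst[r * 28 + c] = (unsigned char)(sum / 100);
        }
    }
}
```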

Key Technical Details

ABI Compliance

All functions follow the SysV AMD64 ABI:

  • Arguments: rdi, rsi, rdx, rcx, r8, r9 (then stack)
  • Callee-saved: rbx, rbp, r12–r15
  • Return: rax (integer), xmm0 (floating-point)
  • All XMM registers are caller-saved
  • Stack aligned to 16 bytes before every call

Weight File Format

data/weights.bin contains raw IEEE 754 doubles in this order:

Buffer     Count    Bytes
w_hidden   25,088   200,704
b_hidden   32       256
w_output   320      2,560
b_output   10       80
Total      25,450   203,600
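A C sketch of reading the file back in that order (load_weights is an illustrative name, not the server's actual function):

```c
#include <stdio.h>

/* Load data/weights.bin: w_hidden, b_hidden, w_output, b_output --
   25,450 doubles, 203,600 bytes, in the order the table above gives.
   Returns 0 on success, -1 on any short read or open failure. */
int load_weights(const char *path, double *wh, double *bh,
                 double *wo, double *bo) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int ok = fread(wh, sizeof(double), 784 * 32, f) == 784 * 32
          && fread(bh, sizeof(double), 32, f)       == 32
          && fread(wo, sizeof(double), 32 * 10, f)  == 32 * 10
          && fread(bo, sizeof(double), 10, f)       == 10;
    fclose(f);
    return ok ? 0 : -1;
}
```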

Performance

Metric                  Value
Training time           ~60–135 s (30 epochs, hardware dependent)
Train accuracy          97.8%
Test accuracy           96.5%
Inference latency       ~0.7 ms round-trip (including HTTP)
Weight file size        199 KB
Binary size (training)  ~27 KB
Binary size (server)    ~25 KB

License

Licensed under the Apache License, Version 2.0. See NOTICE for details.
