High-Performance, Zero-Copy Graph Engine for Massive Datasets on Consumer Hardware.
GraphZero is a C++ graph processing engine with lightweight Python bindings designed to solve the "Memory Wall" in Graph Neural Networks (GNNs). It allows you to load and sample 100 Million+ node graphs (like ogbn-papers100M) and their massive feature matrices on a standard 16GB RAM laptop—something standard libraries like PyTorch Geometric (PyG) or DGL cannot do.
GNN datasets can be massive. ogbn-papers100M contains 111 Million nodes, 1.6 Billion edges, and gigabytes of node embeddings.
- Standard approach (PyG/NetworkX): Tries to load the entire graph structure and all node features into RAM before training begins.
- The Result: `MemoryError` (OOM) on consumer hardware. You need 64GB+ RAM servers just to load the data.
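
A rough back-of-the-envelope estimate shows why the load step fails. The sketch below uses the rounded counts quoted above and assumes that edges are held as two 64-bit integers each (a COO-style edge list) and that every node carries 128 float32 features. The feature width is an assumption, since this README only says "gigabytes".

```python
# Rough memory estimate for ogbn-papers100M using the rounded counts quoted above.
NUM_NODES = 111_000_000        # "111 Million nodes"
NUM_EDGES = 1_600_000_000      # "1.6 Billion edges"
FEATURE_DIM = 128              # assumption: not stated in this README
GIB = 2 ** 30

# COO edge list: two int64 endpoints (8 bytes each) per edge.
edge_bytes = NUM_EDGES * 2 * 8
# Dense float32 feature matrix: 4 bytes per value.
feature_bytes = NUM_NODES * FEATURE_DIM * 4

print(f"edges:    {edge_bytes / GIB:.1f} GiB")
print(f"features: {feature_bytes / GIB:.1f} GiB")
print(f"total:    {(edge_bytes + feature_bytes) / GIB:.1f} GiB")
```

Under these assumptions the edge array alone needs roughly 24 GiB, which already exceeds a 16GB laptop before any features are loaded.
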
GraphZero abandons the "Load-to-RAM" model. Instead, it uses a custom Zero-Copy Architecture:
- Memory Mapping (`mmap`): The graph and its features stay on disk. The OS only loads the specific "hot" pages needed for computation into RAM via page faults.
- Compressed CSR (`.gl`): A custom binary format that compresses raw edges by ~60% (30GB CSV $\to$ 13GB binary).
- Columnar Tensor Store (`.gd`): A raw, C-contiguous binary format for node features that instantly translates to PyTorch tensors without memory allocation.
- Parallel Sampling: OpenMP-accelerated random walks that saturate NVMe SSD throughput, using thread-local RNGs to eliminate lock contention.
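
To make the CSR idea concrete, here is a minimal pure-Python sketch that turns an edge list into an offsets array and a neighbor array. It illustrates the general CSR layout only. The actual `.gl` on-disk layout and its compression scheme are not described in this README, and the names `build_csr` and `neighbors` are hypothetical.

```python
from array import array

def build_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from (src, dst) pairs."""
    # Count the out-degree of every node.
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1

    # offsets[v] .. offsets[v + 1] delimits v's neighbors in `targets`.
    offsets = array("Q", [0] * (num_nodes + 1))
    for v in range(num_nodes):
        offsets[v + 1] = offsets[v] + degree[v]

    # Place each destination in the next free slot of its source's range.
    targets = array("Q", [0] * len(edges))
    cursor = list(offsets[:num_nodes])
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def neighbors(offsets, targets, v):
    """Return the neighbors of v as a slice of the targets array."""
    return targets[offsets[v]:offsets[v + 1]]

edges = [(0, 1), (1, 2), (0, 2), (2, 0)]
offsets, targets = build_csr(3, edges)
print(list(offsets))                         # [0, 2, 3, 4]
print(list(neighbors(offsets, targets, 0)))  # [1, 2]
```

Each edge costs one fixed-width entry in `targets` plus one shared offset per node, instead of a text `src,dst` pair, and both arrays are flat buffers that can be memory-mapped as-is.
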
Task: Load ogbn-papers100M (56GB Raw) and perform random walks.
Hardware: Windows Laptop (16GB RAM, NVMe SSD).
| Metric | GraphZero (v0.2) | PyTorch Geometric |
|---|---|---|
| Load Time | 0.000000 s ⚡ | FAILED (Crash) ❌ |
| Peak RAM Usage | ~5.1 GB (OS Cache) | >24.1 GiB (Required) |
| Throughput | 1,264,000 steps/s | N/A |
| Status | ✅ Success | ❌ OOM Error |
Left: GraphZero loading instantly and utilizing OS Page Cache. Right: PyG crashing with `Unable to allocate 24.1 GiB`.
GraphZero is available on PyPI:

    pip install graphzero

GraphZero uses high-efficiency binary formats. Convert your generic CSV lists once.
Example `edges.csv` (the `weight` column is optional):

    src,dst,weight
    0,1,0.5
    1,2,1.0

Convert the edge list and the feature table with the Python API:

    import graphzero as gz

    # 1. Convert Topology (Edges & Weights) to .gl
    gz.convert_csv_to_gl(
        input_csv="dataset/edges.csv",
        output_bin="graph.gl",
        directed=True
    )

    # 2. Convert Node Features to .gd (Float32, Int64, etc.)
    gz.convert_csv_to_gd(
        csv_path="dataset/features.csv",
        out_path="features.gd",
        dtype=gz.DataType.FLOAT32
    )

Once converted, the graph and its multi-gigabyte feature matrix are accessible immediately, without first being loaded into RAM.

    import graphzero as gz
    import numpy as np

    # TOPOLOGY
    # Zero-Copy Load (Instant)
    g = gz.Graph("graph.gl")

    # Define Start Nodes
    start_nodes = np.random.randint(0, g.num_nodes, 1000).astype(np.uint64)

    # Parallel Biased Random Walk (Node2Vec style: p=1.0, q=0.5)
    walks = g.batch_random_walk(
        start_nodes=start_nodes,
        walk_length=10,
        p=1.0,
        q=0.5
    )

    # FEATURES
    # Zero-Copy Feature Load (Instant)
    fs = gz.FeatureStore("features.gd")

    # Get a 2D NumPy/PyTorch-compatible tensor mapped directly to the file on the SSD.
    # No upfront RAM allocation: pages are faulted in only when accessed.
    node_features = fs.get_tensor()
    print(f"Graph loaded. Feature Matrix Shape: {node_features.shape}")

GraphZero is built for Systems & GNN enthusiasts.
- Core: C++20 with `nanobind` for Python bindings.
- Parallelism: Uses `#pragma omp` with thread-local deterministic RNGs.
- IO: Direct `CreateFileMapping` (Windows) and `mmap` (Linux) calls with alignment optimization (4KB/2MB pages).
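
The thread-local deterministic RNG idea can be sketched in Python. This is an analogy, not GraphZero's C++ code: each worker owns a private `random.Random` seeded from a base seed plus its chunk index, so no RNG state is shared and the result does not depend on thread scheduling. The toy adjacency dict and the names `walk_chunk` and `parallel_walks` are made up for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy graph: node 3 is a dead end (no outgoing edges).
ADJ = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}

def walk_chunk(chunk_id, starts, walk_length, base_seed):
    """Run uniform random walks for one chunk with its own private RNG."""
    rng = random.Random(base_seed + chunk_id)  # no state shared across workers
    walks = []
    for node in starts:
        walk = [node]
        for _ in range(walk_length - 1):
            nbrs = ADJ[walk[-1]]
            if not nbrs:          # dead end: stop this walk early
                break
            walk.append(rng.choice(nbrs))
        walks.append(walk)
    return walks

def parallel_walks(starts, walk_length, base_seed=42, workers=2):
    """Split start nodes into chunks and walk them concurrently."""
    chunks = [starts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(walk_chunk, i, c, walk_length, base_seed)
                   for i, c in enumerate(chunks)]
        return [f.result() for f in futures]

first = parallel_walks([0, 1, 2, 3], walk_length=5)
second = parallel_walks([0, 1, 2, 3], walk_length=5)
print(first == second)  # True: same seeds give the same walks
```

Python threads will not speed this up because of the GIL. The point is only the seeding scheme, which carries over to one RNG per OpenMP thread.
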
GraphZero currently supports the following high-performance ML capabilities:
Graph Structural Engine
- Instant Ingestion: Fast `mmap`-backed loading of directed, undirected, and weighted graphs.
- Zero-Copy CSR: Custom `.gl` binary format with dense, contiguous memory layout and 64-byte CPU cache line alignment.
- Thread-Safe Sampling: OpenMP-accelerated `batch_random_walk_uniform` and `batch_random_fanout`.
- Biased Walks (Node2Vec): Hardware-optimized Alias Table generation for $O(1)$ weighted sampling (`batch_random_walk` with `p` and `q` parameters).
- Fault-Tolerant: Automatic handling of dead-ends (sinks) and out-of-bounds nodes.
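
For readers unfamiliar with alias tables, the sketch below shows the textbook construction (Vose's alias method) in plain Python. It is not taken from GraphZero's source, and `build_alias_table` and `alias_sample` are hypothetical names. Building the table takes linear time in the number of neighbors, and each draw afterwards costs one random slot plus one comparison.

```python
import random

def build_alias_table(weights):
    """Build probability and alias arrays with Vose's alias method."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # mean of scaled weights is 1.0
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        sm = small.pop()
        lg = large.pop()
        prob[sm] = scaled[sm]
        alias[sm] = lg
        # The large entry donates the mass needed to fill slot sm.
        scaled[lg] = scaled[lg] + scaled[sm] - 1.0
        (small if scaled[lg] < 1.0 else large).append(lg)
    # Leftovers are 1.0 up to floating-point error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def alias_sample(prob, alias, rng):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

weights = [0.5, 1.0, 2.5]   # e.g. edge weights of one node's neighbors
prob, alias = build_alias_table(weights)
rng = random.Random(0)
counts = [0] * len(weights)
for _ in range(40_000):
    counts[alias_sample(prob, alias, rng)] += 1
# Empirical frequencies should approach 0.125, 0.25 and 0.625.
print([round(c / 40_000, 3) for c in counts])
```

In a Node2Vec-style walk the weights would additionally depend on `p` and `q`, which this sketch leaves out.
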
Graph Data Engine
- Columnar Tensor Store: Custom `.gd` binary format for storing $N \times F$ feature matrices.
- Strong Typing: Native C++ template dispatching supporting `FLOAT32`, `FLOAT64`, `INT32`, and `INT64`.
- Zero-Copy Bridge: Direct translation of `mmap` pointers to NumPy/PyTorch multidimensional arrays.
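
The zero-copy bridge can be illustrated with the standard library alone. The sketch assumes a header-less, row-major float32 file. The real `.gd` layout (headers, dtype tags) is not specified in this README, and `get_row` and the file name are hypothetical.

```python
import mmap
import os
import tempfile
from array import array

NUM_ROWS, NUM_COLS = 4, 3

# Write a tiny row-major float32 matrix to disk: row i holds [i, i+0.5, i+1].
path = os.path.join(tempfile.mkdtemp(), "toy_features.bin")
values = array("f", [i + 0.5 * j for i in range(NUM_ROWS) for j in range(NUM_COLS)])
with open(path, "wb") as f:
    values.tofile(f)

# Map the file read-only; pages are read from disk only when touched.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Reinterpret the mapped bytes as float32 values without copying them.
flat = memoryview(mm).cast("f")

def get_row(flat_view, row, num_cols):
    """Return a zero-copy view of one row of a row-major matrix."""
    start = row * num_cols
    return flat_view[start:start + num_cols]

row = get_row(flat, 2, NUM_COLS)
row_values = list(row)   # copying happens only here, for printing
print(row_values)        # [2.0, 2.5, 3.0]

# Release the views before closing the mapping.
row.release()
flat.release()
mm.close()
```

The same mapped buffer could be handed to `numpy.frombuffer` to obtain an array without copying. How GraphZero exposes it through `nanobind` is not shown here.
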
- v0.3 (The Algorithmic Core): High-performance analytics engine adding OpenMP-accelerated Parallel BFS/DFS, PageRank, and Connected Components.
- v0.4 (Dynamic Updates): Breaking the immutable CSR barrier via an LSM-Tree/Adjacency List memory overlay to allow real-time edge/node insertions.
- v0.5 (Production Hardening): ACID-compliant safety for multi-process PyTorch training using Reader-Writer Locks, Write-Ahead Logging (WAL), and graceful exception handling.
MIT License. Created by Krish Singaria (IIT Mandi).

