
# TensorPy Roadmap

This file tracks the current project state as a practical checklist. Checked items have already landed on main; unchecked items are the next planned steps, not vague ideas.

## Language Core

- [x] Python-like expressions, statements, loops, slicing, and container literals
- [x] Functions, lambdas, closures, and `*args`
- [x] Classes, inheritance, and `super()` method calls
- [x] Exceptions with `try` / `except`
- [x] Module imports, package imports, and module caching
- [x] REPL and script execution
- [ ] More complete keyword-argument and star-argument parity with Python
- [ ] Deeper compatibility coverage for advanced class and descriptor behavior
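The landed items above cover most everyday Python surface. A quick sketch of that surface, written as plain Python since TensorPy targets Python-compatible semantics (the names here are illustrative, not from the repo):

```python
# Exercises the language-core features listed above: classes,
# inheritance, super(), closures, *args, slicing, and try/except.

class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + " makes a sound"

class Dog(Animal):
    def speak(self):
        return super().speak() + " (woof)"

def make_counter():
    count = [0]            # closed-over mutable cell

    def bump(*steps):      # *args: variable-arity increment
        count[0] += sum(steps) if steps else 1
        return count[0]

    return bump

bump = make_counter()

try:
    [1, 2, 3][10]          # out-of-range index raises
except IndexError:
    caught = True

print(Dog("Rex").speak())   # Rex makes a sound (woof)
print(bump(2, 3))           # 5
print([1, 2, 3, 4][1:3])    # [2, 3]
```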

## Runtime And Memory

- [x] Mark-and-sweep garbage collector
- [x] VM root walking and heap edge traversal
- [x] Platform abstraction layer for filesystem, threads, timing, and process-facing operations
- [x] Opaque platform handles instead of leaking pthread types in public headers
- [x] Metal support optional at build time (disable with `make METAL=0`)
- [x] Initial shared memory abstraction via `tpMemAlloc` / `tpMemFree` / friends
- [ ] Finish routing remaining raw allocations through the memory layer
- [ ] Split time APIs into clearer wall-clock vs CPU-time abstractions
- [ ] Harden portability for non-Apple platforms beyond the current POSIX-first baseline
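The collector items above describe the classic mark-and-sweep design: mark everything reachable from the roots, then sweep the rest. A toy sketch of that algorithm in plain Python (illustrative only; the real collector walks VM roots and heap edges in C):

```python
# Toy mark-and-sweep: objects form a graph, roots are pinned,
# and anything unreachable from a root is swept.

class Obj:
    def __init__(self, name):
        self.name = name
        self.edges = []      # outgoing references (heap edges)
        self.marked = False

def mark(obj):
    if obj.marked:
        return
    obj.marked = True
    for child in obj.edges:  # traverse heap edges
        mark(child)

def collect(heap, roots):
    for obj in heap:         # reset marks from the last cycle
        obj.marked = False
    for root in roots:       # mark phase: walk from the roots
        mark(root)
    return [o for o in heap if o.marked]   # sweep phase

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.edges.append(b)            # a -> b reachable; c is garbage
survivors = collect([a, b, c], roots=[a])
print([o.name for o in survivors])   # ['a', 'b']
```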

## Concurrency And Systems Primitives

- [x] Threads
- [x] Mutexes
- [x] Condition variables
- [x] Atomics
- [x] Thread pool
- [x] `parallel_for`
- [ ] Add more stress- and race-oriented regression tests
- [ ] Expose more concurrency primitives ergonomically at the TensorPy layer
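A `parallel_for` over a thread pool typically splits the index range into chunks and dispatches each chunk to a worker. A sketch of those semantics using Python's stdlib pool (the native primitive is implemented in C, and its actual API will differ):

```python
# parallel_for sketch: split [0, n) into chunks and run body(i)
# for each index on a pool of worker threads.
from concurrent.futures import ThreadPoolExecutor

def parallel_for(n, body, workers=4):
    chunk = (n + workers - 1) // workers   # ceil-divide the range

    def run(start):
        for i in range(start, min(start + chunk, n)):
            body(i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, range(0, n, chunk)))  # force completion

out = [0] * 8
parallel_for(8, lambda i: out.__setitem__(i, i * i))
print(out)   # [0, 1, 4, 9, 16, 25, 36, 49]
```

Distinct indices write to distinct slots, so the body needs no locking here; a body with shared state would need one of the mutex or atomic primitives listed above.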

## Tensor And Compute Runtime

- [x] Native tensor, dtype, and device objects
- [x] CPU float32 eager ops
- [x] CPU scalar, SIMD, and threaded compute paths for core kernels
- [x] Tensor reshape, cast, transfer, reduction, activation, and matmul
- [x] CPU autograd subset for practical training loops
- [x] `sgd_step` and `adam_step`
- [x] Metal tensor allocation and transfer
- [x] Metal elementwise fill, add, mul, scalar add, scalar mul
- [x] Metal matmul for 2D float32 tensors
- [x] Metal reductions
- [x] Metal softmax
- [x] Metal layernorm
- [ ] Better async scheduling to reduce `waitUntilCompleted` overhead
- [ ] More aggressive CPU matmul optimization, likely via Accelerate / BLAS on macOS
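The `sgd_step` / `adam_step` items boil down to well-known update rules. A plain-Python sketch of that math over flat lists of floats (hyperparameter names like `lr` and `beta1` follow the usual conventions and are assumptions, not the repo's API):

```python
# Optimizer-step math: vanilla SGD, and Adam with first/second
# moment estimates plus bias correction.
import math

def sgd_step(params, grads, lr=0.01):
    return [p - lr * g for p, g in zip(params, grads)]

def adam_step(params, grads, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # 1st moment
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # 2nd moment
        m_hat = m[i] / (1 - beta1 ** t)              # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

print(sgd_step([1.0, 2.0], [0.5, -0.5], lr=0.1))   # [0.95, 2.05]

m, v = [0.0], [0.0]
print(adam_step([1.0], [1.0], m, v, t=1))          # ~[0.999]
```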

## NN And Training Stack

- [x] Module base class
- [x] Recursive parameter registration
- [x] Linear, Conv2d, ReLU, Sigmoid, Tanh, Flatten, Sequential
- [x] Embedding
- [x] RNNCell, RNN
- [x] LSTMCell, LSTM
- [x] GRUCell, GRU
- [x] LogisticRegression, MLP, SimpleCNN
- [x] Adam
- [x] CrossEntropyLoss
- [x] Dropout
- [x] LayerNorm module wrapper
- [x] `train()` / `eval()`
- [ ] More end-to-end recurrent training tests on real datasets
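"Recursive parameter registration" means a module reports its own parameters plus those of every child module, so an optimizer can be handed one flat iterable. A minimal sketch of the pattern (plain Python; the class and method names are illustrative, not the repo's API):

```python
# Minimal recursive parameter registration: parameters() yields this
# module's own Parameter attributes, then recurses into child Modules.

class Parameter:
    def __init__(self, value):
        self.value = value

class Module:
    def parameters(self):
        for attr in vars(self).values():
            if isinstance(attr, Parameter):
                yield attr
            elif isinstance(attr, Module):   # recurse into children
                yield from attr.parameters()

class Linear(Module):
    def __init__(self, n_in, n_out):
        self.weight = Parameter([[0.0] * n_in for _ in range(n_out)])
        self.bias = Parameter([0.0] * n_out)

class MLP(Module):
    def __init__(self):
        self.fc1 = Linear(4, 8)
        self.fc2 = Linear(8, 2)

print(len(list(MLP().parameters())))   # 4 (two weights, two biases)
```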

## Standard Library Surface

- [x] `json`
- [x] `re`
- [x] `math`
- [x] `time`
- [x] `random`
- [x] `os`
- [x] `io`
- [x] `path`
- [x] `logging`
- [x] `traceback`
- [x] `sys`
- [x] `collections`
- [x] `itertools`
- [x] `functools`
- [x] `env`
- [x] `config`
- [x] `host`
- [x] `array`
- [x] `ml`
- [x] `types`
- [x] `inspect`
- [ ] Continue filling Python-adjacent utility gaps where they meaningfully improve usability

## Near-Term Priorities

- [ ] Benchmark and improve CPU matmul
- [ ] Reduce Metal synchronization overhead
- [ ] Add more true Metal kernels for training-critical ops
- [ ] Expand training regression coverage, including MNIST-style smoke tests
- [ ] Keep the repository clean of generated datasets and local test binaries
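For the matmul benchmarking item, a throwaway timing harness is usually enough to compare kernels. A sketch using only the stdlib, with a naive triple loop standing in for whichever kernel is under test (sizes and names are placeholders, not the repo's benchmarks):

```python
# Micro-benchmark sketch: time a matmul kernel best-of-N and report
# GFLOP/s, so candidate kernels can be compared apples to apples.
import time

def naive_matmul(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):           # ikj loop order: row-friendly
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

def bench(kernel, n=64, reps=3):
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    best = float("inf")
    for _ in range(reps):            # best-of-reps to damp noise
        t0 = time.perf_counter()
        kernel(a, b, n)
        best = min(best, time.perf_counter() - t0)
    gflops = 2 * n ** 3 / best / 1e9  # ~2n^3 flops per n x n matmul
    return best, gflops

secs, gflops = bench(naive_matmul, n=32)
print("%.4fs  %.3f GFLOP/s" % (secs, gflops))
```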