## Summary
Create a Python module that loads WarpForth-compiled PTX kernels and launches them with NumPy/PyTorch tensors as arguments. This replaces `warpforth-runner` for real workloads.
## Motivation
The existing `warpforth-runner` is a standalone C++ tool designed for testing — it takes CSV values on the command line. For real ML workloads, we need to pass large tensors (millions of elements) directly from Python without serialization overhead.
## Design

### Core API
```python
from warpforth import WarpForthKernel

# Compile and load
kernel = WarpForthKernel("attention.forth")

# Launch with PyTorch tensors (zero-copy via .data_ptr())
kernel.launch(
    Q_gpu, K_gpu, V_gpu, O_gpu,  # GPU tensors
    seq_len, head_dim,           # scalar params
    grid=(seq_len, 1, 1),
    block=(64, 1, 1),
)
```
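The same entry point should also accept host-side NumPy arrays, which the module copies to the GPU before launch (see the acceptance criteria below). A hypothetical usage sketch of that path, mirroring the PyTorch example above:

```python
import numpy as np

# Host-side NumPy inputs; the module is expected to auto-copy them to the GPU
Q, K, V = (np.random.rand(seq_len, head_dim) for _ in range(3))
O = np.zeros((seq_len, head_dim))

kernel.launch(Q, K, V, O, seq_len, head_dim,
              grid=(seq_len, 1, 1), block=(64, 1, 1))
```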
### Implementation

- Use PyCUDA's `cuda.module_from_buffer()` to load PTX
- Accept both NumPy arrays (copy to GPU) and PyTorch CUDA tensors (zero-copy via `data_ptr()`)
- Subprocess call to `warpforthc` for compilation, or accept pre-compiled PTX
- Parse `\!` header directives from the Forth source to determine parameter types and order
- Map f64 arrays to `float64` device pointers, i64 arrays to `int64` device pointers
- Handle scalar params (pass by value, not pointer); a sketch of the whole argument-mapping path follows this list
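A minimal sketch of how these pieces could fit together, assuming PyCUDA and a pre-compiled PTX buffer. The names `_prepare_arg` and `launch` are illustrative, not part of any existing API:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a default CUDA context
import pycuda.driver as cuda


def _prepare_arg(obj):
    """Map one Python-side value onto a CUDA kernel argument."""
    if hasattr(obj, "data_ptr"):             # PyTorch CUDA tensor: zero-copy
        return np.uintp(obj.data_ptr())      # raw device pointer, passed by value
    if isinstance(obj, np.ndarray):          # NumPy array: copy host -> device
        dev = cuda.mem_alloc(obj.nbytes)
        cuda.memcpy_htod(dev, np.ascontiguousarray(obj))
        return dev                           # DeviceAllocation is a valid kernel arg
    if isinstance(obj, float):               # f64 scalar: bitcast to i64
        return np.float64(obj).view(np.int64)
    if isinstance(obj, int):                 # i64 scalar: pass by value
        return np.int64(obj)
    raise TypeError(f"unsupported argument type: {type(obj)}")


def launch(ptx: bytes, entry: str, *args, grid, block):
    """Load WarpForth-emitted PTX and launch a single kernel."""
    mod = cuda.module_from_buffer(ptx)
    func = mod.get_function(entry)
    func(*[_prepare_arg(a) for a in args], grid=grid, block=block)
```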
### Parameter mapping

| Forth declaration | Python input | CUDA argument |
|---|---|---|
| `\! param X f64[N]` | `torch.Tensor` (float64, CUDA) | Device pointer |
| `\! param X i64[N]` | `torch.Tensor` (int64, CUDA) | Device pointer |
| `\! param X f64` | `float` | Value (bitcast to i64) |
| `\! param X i64` | `int` | Value |
## Files to create
- `demo/warpforth.py` — The integration module
- `demo/requirements.txt` or `pyproject.toml` — Dependencies (pycuda, numpy, torch)
## Acceptance criteria
- Can load a WarpForth-compiled PTX kernel
- Can launch with PyTorch CUDA tensors (zero-copy)
- Can launch with NumPy arrays (auto-copy to GPU)
- Correctly handles both array and scalar parameters
- Works with the naive attention kernel from #44
## Dependencies
- #44 — Naive attention kernel in Forth (first consumer of this integration)