LLM-Speed is a CUDA kernel optimization project for LLM inference experiments, covering FlashAttention, Tensor Core GEMM, Python bindings, and verification workflows.
- CUDA kernels in `src/` and reusable primitives in `include/`
- Python bindings and packaging in `python/`, `setup.py`, and `pyproject.toml`
- Tests and benchmarks in `tests/` and `benchmarks/`
- GitHub Pages site for documentation entry, reading paths, and project updates
```bash
pip install -r requirements.txt
pip install -e .
```

```bash
cmake --preset release
cmake --build build/release -j$(nproc)
```
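The `cmake --preset release` step requires a `CMakePresets.json` at the repository root. That file is not shown here, but a minimal sketch that would satisfy the commands above might look like the following (the `binaryDir` and the CUDA architecture value are illustrative assumptions, not the project's actual settings):

```json
{
  "version": 3,
  "configurePresets": [
    {
      "name": "release",
      "binaryDir": "${sourceDir}/build/release",
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "Release",
        "CMAKE_CUDA_ARCHITECTURES": "80"
      }
    }
  ]
}
```

`binaryDir` matches the `build/release` path used by the build command above.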
```bash
pytest tests/ -v
```

- Project docs: https://lessup.github.io/llm-speed/ (the site home explains where to start, what to read next, and how the docs are organized)
- See `CONTRIBUTING.md` for the contribution workflow
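The verification workflow mentioned above typically means checking an optimized kernel's output against a slow but obviously correct reference. As an illustrative sketch of that pattern in pure Python (all names here are hypothetical, not the project's actual API): a naive attention reference is compared against an online-softmax reformulation of the same computation, which is the rewrite FlashAttention-style kernels are built on.

```python
import math

def softmax(row):
    # Numerically stable softmax: subtract the row max before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Naive O(n^2) single-head attention over nested Python lists."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        probs = softmax(scores)
        out.append([sum(p * vj[c] for p, vj in zip(probs, v))
                    for c in range(len(v[0]))])
    return out

def flash_attention_reference(q, k, v):
    """Online-softmax attention: one streaming pass over the keys per query,
    keeping a running max, running denominator, and running weighted sum,
    as FlashAttention-style kernels do per tile."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        m = float("-inf")        # running max of scores
        l = 0.0                  # running softmax denominator
        acc = [0.0] * len(v[0])  # running weighted sum of values
        for kj, vj in zip(k, v):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            m_new = max(m, s)
            corr = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first key
            w = math.exp(s - m_new)
            l = l * corr + w
            acc = [a * corr + w * vc for a, vc in zip(acc, vj)]
            m = m_new
        out.append([a / l for a in acc])
    return out

def allclose(a, b, atol=1e-6):
    # Elementwise comparison of two 2-D nested lists within a tolerance.
    return all(abs(x - y) <= atol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Real kernel tests follow the same shape, with the reference on the left and the CUDA kernel's output on the right of the tolerance check.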
MIT License