
Commit 7fe46f4

📝 add article
1 parent 7bf1b96 commit 7fe46f4

4 files changed

Lines changed: 127 additions & 10 deletions

File tree

README.md

Lines changed: 0 additions & 6 deletions
@@ -2,12 +2,6 @@
22

33
This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
44

5-
## TODO
6-
7-
- [ ] From conference to journal
8-
- [ ] Multi-GPU series (model, data, tensor, pipeline parallelism, maybe with [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)?)
9-
- [ ] [Triton](https://triton-lang.org/main/index.html) and [Helion](https://helionlang.com) (GPU Python-like programming language)
10-
115
## Installation
126

137
```bash
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
1+
---
2+
title: Installing Packages from Wheels
3+
description: Understanding why some packages are a nightmare to install and how to sidestep the pain with prebuilt wheels.
4+
slug: install-from-wheels
5+
tags: [package-management]
6+
---
7+
8+
You type `pip install flash-attn`. You hit Enter. You wait. And wait. Your fan spins up like a jet engine. 10 minutes later: **compilation error**. Congratulations, you've experienced one of the most classic rites of passage in the ML world.
9+
10+
This guide explains what's going on, why some packages are a nightmare to install, and how to sidestep the pain with prebuilt wheels.
11+
12+
<!-- truncate -->
13+
14+
---
15+
16+
## What Even Is a Wheel?
17+
18+
A Python **wheel** (`.whl` file) is a prebuilt binary distribution. Under the hood it is a zip archive that contains already-compiled code, ready to be dropped into your Python environment.
19+
20+
When you do a normal `pip install some-package`, pip first checks if a wheel exists for your platform. If it does — great, it's fast. If it doesn't, pip falls back to downloading the source and **compiling it on your machine**. That's where things get messy.
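If you would rather have pip fail fast than quietly fall back to a source build, pip's `--only-binary :all:` option restricts it to wheels. Here is a minimal sketch that builds such a command from Python. The helper name `wheel_only_install_command` is made up for illustration.

```python
import sys
from typing import List


def wheel_only_install_command(package: str) -> List[str]:
    """Build a pip command that refuses to compile from source.

    Hypothetical helper: only the pip flags themselves are real.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",  # wheels only; error out if none matches
        package,
    ]


if __name__ == "__main__":
    cmd = wheel_only_install_command("some-package")
    print(" ".join(cmd))
    # To actually run it, pass cmd to subprocess.run(cmd, check=True).
```

Using `sys.executable -m pip` keeps the install tied to the interpreter that runs the script, which matters when several environments are on the same machine.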
21+
22+
```
23+
some_package-1.0.0-cp310-cp310-linux_x86_64.whl
24+
                     │     │        │
25+
                Python   ABI tag   Platform
26+
                version
27+
```
28+
29+
The filename encodes exactly what it was built for. `cp310` means CPython 3.10. `linux_x86_64` means 64-bit Linux. If your environment doesn't match, pip refuses to install it and reports that the file `is not a supported wheel on this platform`.
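To make the naming scheme concrete, here is a small sketch that splits a wheel filename into its parts using only the standard library. The `WheelName` type and the `parse_wheel_filename` function are written for this example. For real tooling, the `packaging` library provides a more thorough `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # optional build tag, usually absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split name-version[-build]-python-abi-platform.whl into fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


info = parse_wheel_filename("some_package-1.0.0-cp310-cp310-linux_x86_64.whl")
print(info.python_tag, info.abi_tag, info.platform_tag)
# cp310 cp310 linux_x86_64
```

Splitting on `-` works because dashes inside the distribution name are replaced with underscores when the wheel is built.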
30+
31+
---
32+
33+
## The Issue Behind the Scenes
34+
35+
`flash-attn`, `xformers`, `bitsandbytes`, `apex` — these packages share a common trait: they have **CUDA kernels** baked in. That means they need to be compiled against:
36+
37+
1. A specific **CUDA version** (e.g., 11.8, 12.1, 12.4)
38+
2. A specific **PyTorch version** (e.g., 2.1.0, 2.3.0)
39+
3. Your **Python version**
40+
41+
When you `pip install flash-attn` from source, your machine has to compile thousands of lines of CUDA code. This takes **15–40 minutes**, requires `nvcc` (the NVIDIA CUDA compiler) to be installed and on your PATH, and will fail spectacularly if any version is mismatched.
42+
43+
The error usually looks like one of these:
44+
45+
```
46+
# The "I don't even have a compiler" error
47+
error: command 'gcc' failed: No such file or directory
48+
49+
# The "kernels weren't built for your GPU" error (shows up at runtime, not at install time)
50+
RuntimeError: CUDA error: no kernel image is available for execution on the device
51+
52+
# The "nvcc not found" classic
53+
nvcc: command not found
54+
55+
# The cryptic one that sends you to Stack Overflow at 2am
56+
ninja: build stopped: subcommand failed.
57+
```
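Several of these failures come down to a build tool missing from your PATH. A rough pre-flight check, assuming that `gcc`, `nvcc`, and `ninja` are the tools a source build would look for (the exact set depends on the package), could look like this:

```python
import shutil
from typing import List, Sequence


def missing_build_tools(
    tools: Sequence[str] = ("gcc", "nvcc", "ninja"),
) -> List[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("All build tools found (version mismatches are still possible).")
```

Finding every tool does not guarantee a successful build; it only rules out the "command not found" family of errors.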
58+
59+
---
60+
61+
## The Smart Way: Install from a Prebuilt Wheel
62+
63+
Most popular CUDA packages maintain a repo of prebuilt wheels for common CUDA + PyTorch + Python combinations. Instead of compiling, you download the exact binary you need.
64+
65+
### Step 1: Know Your Environment
66+
67+
Before hunting for a wheel, figure out exactly what you're working with:
68+
69+
```bash
70+
# Python version
71+
python --version
72+
73+
# PyTorch version + CUDA it was built with
74+
python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
75+
76+
# CUDA toolkit version on your system
77+
nvcc --version
78+
# or, if nvcc isn't installed:
79+
nvidia-smi # shows driver's max supported CUDA version
80+
```
81+
82+
Example output you might see:
83+
84+
```
85+
Python 3.10.12
86+
2.3.0+cu121
87+
12.1
88+
```
89+
90+
So you need: **Python 3.10**, **PyTorch 2.3.0**, **CUDA 12.1**.
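The same checks can be gathered in one small script. This is a sketch, not a complete detector: it assumes CPython for the `cp` tag, builds only a rough platform string, and returns `None` for the PyTorch fields when `torch` is not installed. The function name `describe_environment` is made up for this example.

```python
import platform
import sys
from typing import Any, Dict


def describe_environment() -> Dict[str, Any]:
    """Collect the values you need to pick a matching wheel."""
    info: Dict[str, Any] = {
        # Assumes CPython; other interpreters use a different prefix.
        "python_tag": f"cp{sys.version_info.major}{sys.version_info.minor}",
        # Rough approximation of the platform tag, e.g. linux_x86_64.
        "platform": f"{platform.system().lower()}_{platform.machine()}",
        "torch": None,
        "torch_cuda": None,
        "cxx11_abi": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__        # e.g. 2.3.0+cu121
    info["torch_cuda"] = torch.version.cuda  # CUDA that PyTorch was built with
    info["cxx11_abi"] = torch.compiled_with_cxx11_abi()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The `cxx11_abi` value is worth noting because some wheel filenames carry a `cxx11abiTRUE` or `cxx11abiFALSE` marker.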
91+
92+
### Step 2: Find the Right Wheel
93+
94+
**For `flash-attn`**, the prebuilt wheels live on GitHub [releases](https://github.com/Dao-AILab/flash-attention/releases).
95+
96+
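Look for a filename whose tags match your setup: the Python tag (`cp310`), the PyTorch version (`torch2.3`), and a CUDA tag in the same major series as your PyTorch build (`cu12x` for CUDA 12.1). The `cxx11abi` flag should agree with what `torch.compiled_with_cxx11_abi()` returns. For the example above, you'd grab something like:
97+
98+
```
99+
flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl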
100+
```
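Matching by eye gets tedious when a release lists dozens of files. The sketch below checks a filename against your environment. The regular expression is inferred only from the flash-attn filenames shown in this article, and comparing the CUDA version by major series alone is an assumption, so treat it as a starting point rather than a rule. Both function names are made up for this example.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the filenames in this article (assumption);
# other projects name their wheels differently.
_BUILD_TAG = re.compile(
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)-"
)


def parse_build_tag(filename: str) -> Optional[Dict[str, str]]:
    """Extract the CUDA, PyTorch, and ABI markers, or None if absent."""
    match = _BUILD_TAG.search(filename)
    return match.groupdict() if match else None


def wheel_matches(
    filename: str, torch_version: str, cuda_version: str, python_tag: str
) -> bool:
    """Rough compatibility check between a wheel name and an environment."""
    tag = parse_build_tag(filename)
    if tag is None:
        return False
    # "2.3.0+cu121" -> "2.3"
    torch_major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    cuda_major = cuda_version.split(".")[0]  # "12.1" -> "12"
    return (
        tag["torch"] == torch_major_minor
        and tag["cuda"].startswith(cuda_major)  # assumption: major series only
        and f"-{python_tag}-" in filename
    )


name = "flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
print(wheel_matches(name, "2.3.0+cu121", "12.1", "cp310"))
# True
```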
101+
102+
### Step 3: Install It
103+
104+
Once you have the URL or have downloaded the file:
105+
106+
```bash
107+
# Install directly from URL (no download needed)
108+
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
109+
# or with uv
110+
uv add https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
111+
112+
# Or install from a local file you downloaded
113+
pip install flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
114+
# or with uv
115+
uv add flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
116+
```
117+
118+
Done. No compiler needed. No 30-minute wait. Just a clean, fast install.
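To confirm the install landed in the environment you expect, you can ask the standard library which version is registered. A minimal sketch, where `installed_version` is a helper written for this example:

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("flash-attn"))
```

This only reads package metadata. Importing the package in a Python session is still the real test that the compiled kernels load on your GPU.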

blog/tags.yml

Lines changed: 6 additions & 1 deletion
@@ -26,4 +26,9 @@ slurm:
2626
cloud-computing:
2727
label: Cloud Computing
2828
permalink: /cloud-computing
29-
description: Cloud Computing
29+
description: Cloud Computing
30+
31+
package-management:
32+
label: Package Management
33+
permalink: /package-management
34+
description: Package Management

package-lock.json

Lines changed: 3 additions & 3 deletions
