| 1 | +--- |
| 2 | +title: Installing Packages from Wheels |
| 3 | +description: Understanding why some packages are a nightmare to install and how to sidestep the pain with prebuilt wheels. |
| 4 | +slug: install-from-wheels |
| 5 | +tags: [package-management] |
| 6 | +--- |
| 7 | + |
| 8 | +You type `pip install flash-attn`. You hit Enter. You wait. And wait. Your fan spins up like a jet engine. Ten minutes later: **compilation error**. Congratulations, you've been through a classic rite of passage in the ML world. |
| 9 | + |
| 10 | +This guide explains what's going on, why some packages are a nightmare to install, and how to sidestep the pain with prebuilt wheels. |
| 11 | + |
| 12 | +<!-- truncate --> |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## What Even Is a Wheel? |
| 17 | + |
| 18 | +A Python **wheel** (`.whl` file) is a prebuilt distribution. It is literally a zip archive laid out so pip can unpack it straight into your Python environment, and for packages with native extensions it already contains the compiled binaries. |
| 19 | + |
| 20 | +When you do a normal `pip install some-package`, pip first checks if a wheel exists for your platform. If it does — great, it's fast. If it doesn't, pip falls back to downloading the source and **compiling it on your machine**. That's where things get messy. |
| 21 | + |
| 22 | +``` |
| 23 | +some_package-1.0.0-cp310-cp310-linux_x86_64.whl |
| 24 | +                     │      │       │ |
| 25 | +                  Python ABI tag Platform |
| 26 | +                  version |
| 27 | +``` |
| 28 | + |
| 29 | +The filename encodes exactly what it was built for. `cp310` means CPython 3.10. `linux_x86_64` means 64-bit Linux. If your environment doesn't match, pip won't even try to install it. |
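
To make the naming scheme concrete, here is a minimal sketch that splits a wheel filename into its tags using only the standard library. The helper names `parse_wheel_filename` and `cpython_tag` are made up for this example, and the parser assumes a well-formed filename; it is not how pip itself does the matching.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its components (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # A wheel name has 5 dash-separated parts, or 6 with an optional build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def cpython_tag(major: int, minor: int) -> str:
    """Build the CPython tag for a given version, e.g. (3, 10) -> 'cp310'."""
    return f"cp{major}{minor}"


info = parse_wheel_filename("some_package-1.0.0-cp310-cp310-linux_x86_64.whl")
print(info)
# Does this wheel target CPython 3.10?
print(info["python"] == cpython_tag(3, 10))
```

If the Python tag or platform tag printed here differs from your interpreter and OS, that wheel is not meant for your environment.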
| 30 | + |
| 31 | +--- |
| 32 | + |
| 33 | +## The Issue Behind the Scenes |
| 34 | + |
| 35 | +`flash-attn`, `xformers`, `bitsandbytes`, `apex` — these packages share a common trait: they have **CUDA kernels** baked in. That means they need to be compiled against: |
| 36 | + |
| 37 | +1. A specific **CUDA version** (e.g., 11.8, 12.1, 12.4) |
| 38 | +2. A specific **PyTorch version** (e.g., 2.1.0, 2.3.0) |
| 39 | +3. Your **Python version** |
| 40 | + |
| 41 | +When you `pip install flash-attn` from source, your machine has to compile thousands of lines of CUDA code. This takes **15–40 minutes**, requires `nvcc` (the NVIDIA CUDA compiler) to be installed and on your PATH, and will fail spectacularly if any version is mismatched. |
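
If you do want to attempt a source build, a quick preflight check can save you the wait. The sketch below only looks for executables on your PATH. The tool list (`nvcc`, `gcc`, `ninja`) is an assumption taken from the error messages this guide discusses, not an official list of build requirements, and `find_build_tools` is a name invented for this example.

```python
import shutil


def find_build_tools(tools=("nvcc", "gcc", "ninja")) -> dict:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_build_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

Finding every tool does not guarantee a successful build, since the versions still have to agree, but a missing one guarantees a failure.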
| 42 | + |
| 43 | +The error usually looks like one of these: |
| 44 | + |
| 45 | +``` |
| 46 | +# The "I don't even have a compiler" error |
| 47 | +error: command 'gcc' failed: No such file or directory |
| 48 | + |
| 49 | +# The "built for a different GPU" error (appears at runtime, not during install) |
| 50 | +RuntimeError: CUDA error: no kernel image is available for execution on the device |
| 51 | + |
| 52 | +# The "nvcc not found" classic |
| 53 | +nvcc: command not found |
| 54 | + |
| 55 | +# The cryptic one that sends you to Stack Overflow at 2am |
| 56 | +ninja: build stopped: subcommand failed. |
| 57 | +``` |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## The Smart Way: Install from a Prebuilt Wheel |
| 62 | + |
| 63 | +Many popular CUDA packages publish prebuilt wheels for common CUDA + PyTorch + Python combinations. Instead of compiling, you download the exact binary you need. |
| 64 | + |
| 65 | +### Step 1: Know Your Environment |
| 66 | + |
| 67 | +Before hunting for a wheel, figure out exactly what you're working with: |
| 68 | + |
| 69 | +```bash |
| 70 | +# Python version |
| 71 | +python --version |
| 72 | + |
| 73 | +# PyTorch version + CUDA it was built with |
| 74 | +python -c "import torch; print(torch.__version__); print(torch.version.cuda)" |
| 75 | + |
| 76 | +# CUDA toolkit version on your system |
| 77 | +nvcc --version |
| 78 | +# or, if nvcc isn't installed: |
| 79 | +nvidia-smi # shows driver's max supported CUDA version |
| 80 | +``` |
| 81 | + |
| 82 | +Example output you might see: |
| 83 | + |
| 84 | +``` |
| 85 | +Python 3.10.12 |
| 86 | +2.3.0+cu121 |
| 87 | +12.1 |
| 88 | +``` |
| 89 | + |
| 90 | +So you need: **Python 3.10**, **PyTorch 2.3.0**, **CUDA 12.1**. |
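
Those three version strings translate into the fragments you will look for in a wheel filename. The sketch below does that translation with plain string handling. `wheel_fragments` is a hypothetical helper, and the exact CUDA fragment varies between projects and releases (for example `cu12` versus `cu122`), so it only produces the major-version prefix.

```python
def wheel_fragments(python_version: str, torch_version: str, cuda_version: str) -> dict:
    """Turn version strings into the fragments that appear in wheel filenames."""
    py_major, py_minor = python_version.split(".")[:2]
    # Drop the local build suffix, e.g. "2.3.0+cu121" -> "2.3.0".
    torch_release = torch_version.split("+")[0]
    torch_major, torch_minor = torch_release.split(".")[:2]
    cuda_major = cuda_version.split(".")[0]
    return {
        "python": f"cp{py_major}{py_minor}",
        "torch": f"torch{torch_major}.{torch_minor}",
        "cuda": f"cu{cuda_major}",  # prefix only; filenames may say cu12, cu121, cu122, ...
    }


print(wheel_fragments("3.10.12", "2.3.0+cu121", "12.1"))
# {'python': 'cp310', 'torch': 'torch2.3', 'cuda': 'cu12'}
```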
| 91 | + |
| 92 | +### Step 2: Find the Right Wheel |
| 93 | + |
| 94 | +**For `flash-attn`**, the prebuilt wheels live on GitHub [releases](https://github.com/Dao-AILab/flash-attention/releases). |
| 95 | + |
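| 96 | +Look for a filename that matches your setup. For the example above, you'd grab something like the file below: `cu122` marks a CUDA 12.x build, `torch2.3` the PyTorch minor version, and `cp310` the Python version. The `cxx11abi` flag has to match your PyTorch build as well, which `torch.compiled_with_cxx11_abi()` reports. |
| 97 | + |
| 98 | +``` |
| 99 | +flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl |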
| 100 | +``` |
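
Release pages can list dozens of files, so it may help to filter them mechanically. The sketch below assumes the `+cu…torch…cxx11abi…-cp…` naming pattern seen in the filenames in this guide; other projects name their wheels differently. `matching_wheels` is a made-up helper, and the candidate list is just the two filenames quoted in this post, not a real listing of the release page.

```python
import re

# Pattern inferred from the flash-attn filenames shown in this guide (an assumption).
PATTERN = re.compile(
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<python>cp\d+)-"
)


def matching_wheels(filenames, python_tag, torch_minor, cuda_major):
    """Keep filenames whose Python tag, PyTorch version, and CUDA major version match."""
    matches = []
    for name in filenames:
        found = PATTERN.search(name)
        if found is None:
            continue  # not in the expected naming scheme
        if (
            found["python"] == python_tag
            and found["torch"] == torch_minor
            and found["cuda"].startswith(cuda_major)
        ):
            matches.append(name)
    return matches


candidates = [
    "flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp310-cp310-linux_x86_64.whl",
    "flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl",
]
print(matching_wheels(candidates, python_tag="cp310", torch_minor="2.3", cuda_major="12"))
```

This filter ignores the `cxx11abi` flag and the platform tag, so check those by eye before installing.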
| 101 | + |
| 102 | +### Step 3: Install It |
| 103 | + |
| 104 | +Once you have the URL or have downloaded the file: |
| 105 | + |
| 106 | +```bash |
| 107 | +# Install directly from URL (no download needed) |
| 108 | +pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl |
| 109 | +# or with uv |
| 110 | +uv add https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl |
| 111 | + |
| 112 | +# Or install from a local file you downloaded |
| 113 | +pip install flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl |
| 114 | +# or with uv |
| 115 | +uv add flash_attn-2.6.3+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl |
| 116 | +``` |
| 117 | + |
| 118 | +Done. No compiler needed. No 30-minute wait. Just a clean, fast install. |
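
To confirm the install landed in the environment you expect, you can ask Python for the installed version. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a name chosen for this example, and it assumes the distribution is registered as `flash-attn`.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("flash-attn"))
```

This reads package metadata only. It does not import the package, so it will not catch a wheel that installs cleanly but fails to load on your GPU.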