La Perf is a benchmark suite for comparing AI performance across different devices.
The goal of this project is to create an all-in-one source of the information you need before buying your next laptop or PC for local AI tasks.
It’s designed for AI/ML engineers who prefer to run workloads locally — and for AI enthusiasts who want to understand real-world device performance.
See full benchmark results here: Laperf Results
La Perf is a collection of reproducible tests and community-submitted results for:

- **Embeddings** — ✅ Ready (sentence-transformers, IMDB dataset)
- **LLM inference** — ✅ Ready (LM Studio and Ollama, Awesome Prompts dataset)
  - LM Studio: gpt-oss-20b
    - macOS: `mlx-community/gpt-oss-20b-MXFP4-Q8` (MLX MXFP4-Q8)
    - Other platforms: `lmstudio-community/gpt-oss-20b-GGUF` (GGUF)
  - Ollama: gpt-oss-20b
- **VLM inference** — ✅ Ready (LM Studio and Ollama, Hallucination_COCO dataset)
  - LM Studio: Qwen3-VL-8B-Thinking
    - macOS: `mlx-community/Qwen3-VL-8B-Thinking-4bit` (MLX 4-bit)
    - Other platforms: `Qwen/Qwen3-VL-8B-Thinking-GGUF-Q4_K_M` (Q4_K_M)
  - Ollama: qwen3-vl:8b
    - all platforms: `qwen3-vl:8b` (Q4_K_M)
Note for Mac users: if possible, prefer LM Studio with the MLX backend, which gives 10–20% more performance than GGUF. Since the benchmark runs both LM Studio and Ollama by default, you'll see the difference between the MLX and GGUF formats in the results.
The MLX backend makes the benchmark harder to maintain, but it provides a more realistic performance picture, since it's easy to convert a safetensors model into an MLX x-bit model.
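For illustration, with the mlx-lm package such a conversion is a single command (the model path here is just an example):

```bash
pip install mlx-lm
# Convert a Hugging Face safetensors model to MLX and quantize it (4-bit by default)
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.2 -q
```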
La Perf is compatible with Linux, macOS, and Windows. For embedding tasks, 8 GB of RAM is usually sufficient. However, for all tasks it is recommended to have at least 16 GB; 18 GB is better, and 24 GB or more gives the best performance and reduces swap usage.
It’s designed to run anywhere the uv package manager is installed.
It’s recommended to use a GPU from NVIDIA, AMD, Intel, or Apple, since AI workloads run significantly faster on GPUs. Make sure to enable full GPU offload in tools like LM Studio or Ollama for optimal performance.
For embedding tasks, La Perf automatically detects your available device and runs computations accordingly.
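For reference, a minimal sketch of how this kind of detection usually works with PyTorch (not La Perf's actual code):

```python
import torch

def detect_device() -> str:
    """Pick the best available backend: CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(detect_device())  # e.g. "mps" on an Apple Silicon Mac
```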
Last Updated: 2025-11-19
| Device | Platform | CPU | GPU | VRAM | Emb RPS P50 | LLM TPS P50 (lms) | LLM TPS P50 (ollama) | VLM TPS P50 (lms) | VLM TPS P50 (ollama) | GPU Power P50 | CPU Power P50 | Emb Efficiency (RPS/W) | LLM Efficiency (TPS/W) lms | LLM Efficiency (TPS/W) ollama | VLM Efficiency (TPS/W) lms | VLM Efficiency (TPS/W) ollama |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ASUSTeK COMPUTER ASUS Vivobook Pro N6506MV | 🐧 Linux | Intel(R) Core(TM) Ultra 9 185H (16) | NVIDIA GeForce RTX 4060 Laptop GPU | 8 GB | 119.1 | 8.8 | 10.0 | 11.8 | 8.4 | 16.6 W | - | 7.18 | 0.53 | 0.60 | 0.71 | 0.51 |
| Mac16,6 | 🍏 macOS | Apple M4 Max (14) | Apple M4 Max (32 cores) | shared with system RAM | 56.2 | 61.5 | 61.4 | 55.4 | 45.7 | 11.7 W | 1.0 W | 4.79 | 5.24 | 5.24 | 4.72 | 3.89 |
| Mac16,6 (battery) | 🍏 macOS | Apple M4 Max (14) (battery) | Apple M4 Max (32 cores) (battery) | shared with system RAM | 56.2 | 59.1 | 60.6 | 54.8 | 44.9 | 11.4 W | 1.0 W | 4.94 | 5.21 | 5.33 | 4.83 | 3.95 |
| OpenStack Nova 26.0.7-1 A100 40GB | 🐧 Linux | Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz | NVIDIA A100-PCIE-40GB | 39 GB | 453.6 | - | 113.5 | - | 108.0 | 218.2 W | - | 2.08 | - | 0.52 | - | 0.50 |
| OpenStack Nova A100 80GB | 🐧 Linux | Intel Xeon Processor (Icelake) | NVIDIA A100 80GB PCIe | 79 GB | 623.8 | - | 135.5 | - | 121.2 | 230.5 W | - | 2.71 | - | 0.59 | - | 0.53 |
| OpenStack Nova RTX3090 | 🐧 Linux | Intel Xeon Processor (Cascadelake) | NVIDIA GeForce RTX 3090 | 24 GB | 349.5 | - | 114.8 | - | 105.3 | 345.6 W | - | 1.01 | - | 0.33 | - | 0.30 |
| OpenStack Nova RTX4090 | 🐧 Linux | Intel Xeon Processor (Icelake) | NVIDIA GeForce RTX 4090 | 24 GB | 643.6 | - | 148.7 | - | 130.4 | 282.5 W | - | 2.28 | - | 0.53 | - | 0.46 |
| OpenStack Nova Tesla T4 | 🐧 Linux | Intel Xeon Processor (Cascadelake) | Tesla T4 | 15 GB | 133.7 | - | 41.5 | - | 32.6 | 68.9 W | - | 1.94 | - | 0.60 | - | 0.47 |
- RPS - Requests Per Second (embeddings throughput)
- TPS - Tokens Per Second (generation speed)
- W - Watts (power consumption)
- Efficiency metrics (RPS/W, TPS/W) are calculated using GPU power consumption
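As a sanity check on the math: for the Mac16,6 row, 61.5 LLM TPS at 11.7 W of GPU power gives 61.5 / 11.7 ≈ 5.26 TPS/W, close to the reported 5.24 (small differences presumably come from rounding in the displayed values).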
For a full quickstart and setup instructions, please visit the La Perf documentation: Quickstart.
```bash
git clone https://github.com/bogdanminko/laperf.git
cd laperf
```

La Perf works out of the box with default settings, but you can customize it for different providers:

```bash
cp .env.example .env
# Edit .env to change URLs, models, dataset sizes, etc.
```

See .env.example for all available options, including how to use custom OpenAI-compatible providers like vLLM, TGI, or LocalAI.
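For a rough idea of the shape, a hypothetical override pointing the benchmark at a local vLLM server (the variable names below are illustrative; the authoritative list is in .env.example):

```bash
# Hypothetical variable names; check .env.example for the real ones.
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_MODEL=gpt-oss-20b
```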
```bash
uv sync
```

This will:
- Create a virtual environment
- Install all required dependencies
- Set up the project for immediate use
Using make:

```bash
make bench
```

Using uv:

```bash
uv run python main.py
```

This will:

- Auto-detect your hardware (CUDA / MPS / CPU)
- Run all available benchmarks (all are pre-selected — you can toggle individual ones in the TUI using Space)
- Save the results to `results/report_{your_device}.json`
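The report is plain JSON, so it's easy to inspect programmatically. A minimal sketch (the file name depends on your device; no particular schema is assumed here):

```python
import json
from pathlib import Path

# "my_device" is a placeholder; the real name is derived from your hardware.
report = json.loads(Path("results/report_my_device.json").read_text())
print(json.dumps(report, indent=2)[:500])  # peek at the top of the structure
```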
La Perf includes a standalone real-time power monitoring tool that works independently from benchmarks.
📦 PyPI Package: laperf-power
Option 1: Run without installation (recommended) ⭐

```bash
# Lightweight standalone package (~5 MB with psutil)
# PyPI: https://pypi.org/project/laperf-power/
uvx laperf-power

# With custom options
uvx laperf-power --interval 1.0 --output metrics.json
```

Option 2: Install as a global tool

```bash
# Lightweight standalone package
uv tool install laperf-power
# or: pip install laperf-power

# Now available everywhere
laperf-power
laperf-power --interval 10.0 --no-sudo
```

Option 3: Development mode (from source)

```bash
git clone https://github.com/bogdanminko/laperf.git
cd laperf/laperf-power
uv pip install -e .
laperf-power
```

Usage:

```
laperf-power [OPTIONS]

Options:
  --interval SECONDS   Sampling interval in seconds (default: 10.0)
  --no-sudo            Disable sudo powermetrics on macOS
  --output FILE        Save results to JSON file
  -h, --help           Show help message
```

Press Ctrl+C to stop and view statistics.
- GPU: Power (W), Utilization (%), VRAM (GB), Temperature (°C)
- CPU: Power (W, macOS only with sudo), Utilization (%)
- System: RAM usage (GB), Battery drain (%)
Example output:

```
⚡ REAL-TIME POWER MONITORING
================================================================================
Started: 2025-11-27 14:30:00
Interval: 1.0s
================================================================================
Press Ctrl+C to stop and view statistics

[Sample #42] GPU: 11.7W 32% 8.2GB | CPU: 15% 1.0W | RAM: 16.3GB | Temp: 45°C
```
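Under the hood, this kind of monitor is just a sampling loop. A minimal sketch of one tick (illustrative, not laperf-power's actual source), assuming Linux with a single NVIDIA GPU:

```python
import subprocess
import psutil

# CPU and RAM via psutil (cross-platform)
cpu_util = psutil.cpu_percent(interval=1.0)   # %
ram_gb = psutil.virtual_memory().used / 1e9   # GB

# GPU power / utilization / VRAM / temperature via nvidia-smi (single GPU assumed)
out = subprocess.check_output(
    [
        "nvidia-smi",
        "--query-gpu=power.draw,utilization.gpu,memory.used,temperature.gpu",
        "--format=csv,noheader,nounits",
    ],
    text=True,
)
power_w, gpu_util, vram_mib, temp_c = (float(v) for v in out.split(","))
print(
    f"GPU: {power_w:.1f}W {gpu_util:.0f}% {vram_mib / 1024:.1f}GB | "
    f"CPU: {cpu_util:.0f}% | RAM: {ram_gb:.1f}GB | Temp: {temp_c:.0f}°C"
)
```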
Platform Support:

- macOS: Full support (with sudo for GPU/CPU power via `powermetrics`)
- Linux (NVIDIA): GPU metrics via `nvidia-smi`
- Windows: Basic CPU/RAM metrics via `psutil`
For production deployments on cloud GPU instances or dedicated servers, you can use our Docker image:
```bash
docker pull bogdan01m/laperf-cli:latest
```

NVIDIA GPUs:

```bash
docker run --gpus all -it --rm \
  -v $(pwd)/results:/app/results \
  bogdan01m/laperf-cli:latest
```

AMD GPUs (ROCm):

```bash
docker run --device=/dev/kfd --device=/dev/dri -it --rm \
  -v $(pwd)/results:/app/results \
  bogdan01m/laperf-cli:latest
```

CPU only:

```bash
docker run -it --rm \
  -v $(pwd)/results:/app/results \
  bogdan01m/laperf-cli:latest
```

Note: Results will be saved to the mounted ./results directory on your host machine.
If you use LaPerf in your research or reports, please cite it as follows:
Minko B. (2025). LaPerf: Local AI Performance Benchmark Suite. GitHub repository. Available at: https://github.com/bogdan01m/laperf Licensed under the Apache License, Version 2.0.
BibTeX:

```bibtex
@software{laperf,
  author  = {Bogdan Minko},
  title   = {LaPerf: Local AI Performance Benchmark Suite},
  year    = {2025},
  url     = {https://github.com/bogdan01m/laperf},
  license = {Apache-2.0},
  note    = {GitHub repository}
}
```