@imperatormk

Summary

  • Adds native MPS (Metal Performance Shaders) backend for Apple Silicon Macs
  • Implements 8-bit optimizers (Adam, RMSprop, Lion, Momentum, AdEMAMix) using dynamic codebook quantization
  • Adds INT8 and 4-bit (NF4/FP4) quantization and matrix multiplication support
  • Updates README to reflect full MPS support
  • Enables MPS device in 8-bit optimizer test suite

Dependencies

Requires mps-bitsandbytes for Metal kernels:

pip install mps-bitsandbytes

Implementation Details

  • New bitsandbytes/backends/mps/ module with ops registration
  • Uses torch.library.register_kernel for seamless integration with existing torch.ops.bitsandbytes calls (see the registration sketch after this list)
  • Dynamic codebook quantization matching CUDA behavior (256 values, blockwise absmax scaling); a reference sketch follows the kernel list below
  • Falls back gracefully when mps_bitsandbytes package is not installed
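Roughly, the registration plus fallback could look like the sketch below. The op schema shown is illustrative (the real one comes from the upstream torch.library definitions), and mps_bitsandbytes.quantize_blockwise is a hypothetical entry point into the compiled Metal kernels.

```python
# Sketch only: register an MPS kernel for one existing bitsandbytes op and
# fall back silently when the Metal package is missing. The op schema shown
# is illustrative, and mps_bitsandbytes.quantize_blockwise is a hypothetical
# entry point, not a confirmed API.
import torch

try:
    import mps_bitsandbytes

    @torch.library.register_kernel("bitsandbytes::quantize_blockwise", "mps")
    def _(A: torch.Tensor, code: torch.Tensor, blocksize: int):
        # Delegate to the Metal implementation shipped with mps-bitsandbytes.
        return mps_bitsandbytes.quantize_blockwise(A, code, blocksize)

except ImportError:
    # Package not installed: torch.ops.bitsandbytes.* keeps using the
    # default (non-Metal) implementation on MPS tensors.
    pass
```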

Registered kernels:

  • 4-bit: quantize_4bit, dequantize_4bit, gemv_4bit
  • 8-bit blockwise: quantize_blockwise, dequantize_blockwise
  • INT8: int8_linear_matmul, int8_mm_dequant, int8_vectorwise_quant, int8_vectorwise_dequant, int8_scaled_mm
  • Optimizers: optimizer_update_8bit_blockwise (Adam, Momentum, RMSprop, Lion, AdEMAMix)
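To make the quantization scheme concrete, here is a simplified pure-PyTorch reference of blockwise absmax scaling against a 256-entry codebook. It illustrates the behavior described above but is not the Metal kernel; the linspace codebook is a stand-in for bitsandbytes' dynamic map.

```python
# Simplified reference of blockwise absmax + codebook quantization
# (256 codebook values, per-block absmax scaling). Not the Metal kernel.
import torch

def quantize_blockwise_ref(A: torch.Tensor, code: torch.Tensor, blocksize: int = 256):
    flat = A.reshape(-1).float()
    pad = (-flat.numel()) % blocksize
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.reshape(-1, blocksize)

    # Per-block absmax scaling maps each block into [-1, 1].
    absmax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scaled = blocks / absmax

    # Nearest-neighbour lookup into the 256-entry codebook -> uint8 indices.
    idx = (scaled.unsqueeze(-1) - code.view(1, 1, -1)).abs().argmin(dim=-1)
    return idx.to(torch.uint8), absmax.squeeze(1)

def dequantize_blockwise_ref(idx, absmax, code, shape, blocksize: int = 256):
    vals = code[idx.long()] * absmax.unsqueeze(1)
    return vals.reshape(-1)[: torch.Size(shape).numel()].reshape(shape)

# Stand-in codebook; the backend uses bitsandbytes' 256-value dynamic map.
codebook = torch.linspace(-1.0, 1.0, 256)
x = torch.randn(1024)
q, amax = quantize_blockwise_ref(x, codebook)
x_hat = dequantize_blockwise_ref(q, amax, codebook, x.shape)
```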

Notes

There are 12 bf16/fp16 8-bit optimizer test failures, all marginally over tolerance:

  • bf16 adam/momentum/rmsprop/ademamix: relerr 0.0017-0.002 (threshold 0.0016)
  • fp16 rmsprop: relerr 0.0007-0.0008 (threshold 0.0006)

Adds native Metal GPU acceleration for MPS devices via mps-bitsandbytes.
When installed, it automatically registers optimized kernels for:

- 4-bit quantization (NF4/FP4): quantize, dequantize, gemv
- 8-bit blockwise quantization
- INT8 linear operations
- 8-bit optimizers (Adam, Lion, SGD, RMSprop)

Falls back to the default PyTorch implementation if mps-bitsandbytes
is not installed.

Tested on Apple M3 Max with 218/218 Linear4bit tests passing.
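For orientation, a minimal end-to-end check on an MPS machine could look like the sketch below. It is illustrative rather than part of the PR's test suite, and assumes both this backend and mps-bitsandbytes are installed.

```python
# Illustrative sketch (not from the PR): run a 4-bit linear layer on MPS.
import torch
import torch.nn as nn
import bitsandbytes as bnb

device = "mps" if torch.backends.mps.is_available() else "cpu"

fp16_linear = nn.Linear(64, 64, bias=False, dtype=torch.float16)
q_linear = bnb.nn.Linear4bit(64, 64, bias=False, quant_type="nf4",
                             compute_dtype=torch.float16)
q_linear.load_state_dict(fp16_linear.state_dict())
q_linear = q_linear.to(device)  # weights are quantized on the device transfer

x = torch.randn(1, 64, dtype=torch.float16, device=device)
with torch.no_grad():
    print(q_linear(x).shape)  # torch.Size([1, 64])
```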
Attention operations can produce non-contiguous tensors that fail with
.view(). Using .reshape() handles both contiguous and non-contiguous cases.
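A quick demonstration of the contiguity issue the .reshape() change works around:

```python
# Transposed (attention-style) tensors are non-contiguous, and .view()
# refuses to operate on them; .reshape() copies only when it has to.
import torch

x = torch.randn(2, 8, 16)   # e.g. (batch, heads, dim)
y = x.transpose(0, 1)       # non-contiguous after transpose
print(y.is_contiguous())    # False

try:
    y.view(8, -1)           # raises RuntimeError: needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)

z = y.reshape(8, -1)        # works for both contiguous and non-contiguous
print(z.shape)              # torch.Size([8, 32])
```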