@imperatormk

Summary

  • Adds native MPS (Metal Performance Shaders) backend for Apple Silicon Macs
  • Implements 8-bit optimizers (Adam, RMSprop, Lion, Momentum, AdEMAMix) using dynamic codebook quantization
  • Adds INT8 and 4-bit (NF4/FP4) quantization and matrix multiplication support
  • Updates README to reflect full MPS support
  • Enables MPS device in 8-bit optimizer test suite

Dependencies

Requires mps-bitsandbytes for Metal kernels:

pip install mps-bitsandbytes

Implementation Details

  • New bitsandbytes/backends/mps/ module with ops registration
  • Uses torch.library.register_kernel for seamless integration with existing torch.ops.bitsandbytes calls (see the registration sketch after this list)
  • Dynamic codebook quantization matching CUDA behavior (256 values, blockwise absmax scaling); a reference sketch follows the kernel list below
  • Falls back gracefully when mps_bitsandbytes package is not installed
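Roughly, the registration plus fallback could look like the sketch below. The op schema shown is illustrative (the real one comes from the upstream torch.library definitions), and mps_bitsandbytes.quantize_blockwise is a hypothetical entry point into the compiled Metal kernels.

```python
# Sketch only: register an MPS kernel for one existing bitsandbytes op and
# fall back silently when the Metal package is missing. The op schema shown
# is illustrative, and mps_bitsandbytes.quantize_blockwise is a hypothetical
# entry point, not a confirmed API.
import torch

try:
    import mps_bitsandbytes

    @torch.library.register_kernel("bitsandbytes::quantize_blockwise", "mps")
    def _(A: torch.Tensor, code: torch.Tensor, blocksize: int):
        # Delegate to the Metal implementation shipped with mps-bitsandbytes.
        return mps_bitsandbytes.quantize_blockwise(A, code, blocksize)

except ImportError:
    # Package not installed: torch.ops.bitsandbytes.* keeps using the
    # default (non-Metal) implementation on MPS tensors.
    pass
```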

Registered kernels:

  • 4-bit: quantize_4bit, dequantize_4bit, gemv_4bit
  • 8-bit blockwise: quantize_blockwise, dequantize_blockwise
  • INT8: int8_linear_matmul, int8_mm_dequant, int8_vectorwise_quant, int8_vectorwise_dequant, int8_scaled_mm
  • Optimizers: optimizer_update_8bit_blockwise (Adam, Momentum, RMSprop, Lion, AdEMAMix)
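To make the quantization scheme concrete, here is a simplified pure-PyTorch reference of blockwise absmax scaling against a 256-entry codebook. It illustrates the behavior described above but is not the Metal kernel; the linspace codebook is a stand-in for bitsandbytes' dynamic map.

```python
# Simplified reference of blockwise absmax + codebook quantization
# (256 codebook values, per-block absmax scaling). Not the Metal kernel.
import torch

def quantize_blockwise_ref(A: torch.Tensor, code: torch.Tensor, blocksize: int = 256):
    flat = A.reshape(-1).float()
    pad = (-flat.numel()) % blocksize
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.reshape(-1, blocksize)

    # Per-block absmax scaling maps each block into [-1, 1].
    absmax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scaled = blocks / absmax

    # Nearest-neighbour lookup into the 256-entry codebook -> uint8 indices.
    idx = (scaled.unsqueeze(-1) - code.view(1, 1, -1)).abs().argmin(dim=-1)
    return idx.to(torch.uint8), absmax.squeeze(1)

def dequantize_blockwise_ref(idx, absmax, code, shape, blocksize: int = 256):
    vals = code[idx.long()] * absmax.unsqueeze(1)
    return vals.reshape(-1)[: torch.Size(shape).numel()].reshape(shape)

# Stand-in codebook; the backend uses bitsandbytes' 256-value dynamic map.
codebook = torch.linspace(-1.0, 1.0, 256)
x = torch.randn(1024)
q, amax = quantize_blockwise_ref(x, codebook)
x_hat = dequantize_blockwise_ref(q, amax, codebook, x.shape)
```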

Notes

There are 12 bf16/fp16 8-bit optimizer test failures, all marginally over tolerance:

  • bf16 adam/momentum/rmsprop/ademamix: relerr 0.0017-0.002 (threshold 0.0016)
  • fp16 rmsprop: relerr 0.0007-0.0008 (threshold 0.0006)

Adds native Metal GPU acceleration for MPS devices via mps-bitsandbytes.
When installed, it automatically registers optimized kernels for:

- 4-bit quantization (NF4/FP4): quantize, dequantize, gemv
- 8-bit blockwise quantization
- INT8 linear operations
- 8-bit optimizers (Adam, Lion, SGD, RMSprop)

Falls back to the default PyTorch implementation if mps-bitsandbytes
is not installed.

Tested on Apple M3 Max with 218/218 Linear4bit tests passing.
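For orientation, a minimal end-to-end check on an MPS machine could look like the sketch below. It is illustrative rather than part of the PR's test suite, and assumes both this backend and mps-bitsandbytes are installed.

```python
# Illustrative sketch (not from the PR): run a 4-bit linear layer on MPS.
import torch
import torch.nn as nn
import bitsandbytes as bnb

device = "mps" if torch.backends.mps.is_available() else "cpu"

fp16_linear = nn.Linear(64, 64, bias=False, dtype=torch.float16)
q_linear = bnb.nn.Linear4bit(64, 64, bias=False, quant_type="nf4",
                             compute_dtype=torch.float16)
q_linear.load_state_dict(fp16_linear.state_dict())
q_linear = q_linear.to(device)  # weights are quantized on the device transfer

x = torch.randn(1, 64, dtype=torch.float16, device=device)
with torch.no_grad():
    print(q_linear(x).shape)  # torch.Size([1, 64])
```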
Attention operations can produce non-contiguous tensors that fail with
.view(). Using .reshape() handles both contiguous and non-contiguous cases.
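A quick demonstration of the contiguity issue the .reshape() change works around:

```python
# Transposed (attention-style) tensors are non-contiguous, and .view()
# refuses to operate on them; .reshape() copies only when it has to.
import torch

x = torch.randn(2, 8, 16)   # e.g. (batch, heads, dim)
y = x.transpose(0, 1)       # non-contiguous after transpose
print(y.is_contiguous())    # False

try:
    y.view(8, -1)           # raises RuntimeError: needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)

z = y.reshape(8, -1)        # works for both contiguous and non-contiguous
print(z.shape)              # torch.Size([8, 32])
```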