
Conversation

@kernelpool (Contributor) commented Jan 18, 2026

Proposed changes

Sharding 6-bit (and other non-power-of-2) quantized models with certain input dimensions (such as 1536) can fail because input_dims *= 32 // bits truncates the integer division before multiplying. See: ml-explore/mlx-lm#771 (comment)

For 6-bit with packed dimension 288:

  • Before: 288 * (32 // 6) = 288 * 5 = 1440 -> wrong
  • After: (288 * 32) // 6 = 1536 -> correct

This caused shard_linear to fail with:
ValueError: [quantize] ... matrix has shape (6144,1440)
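The truncation described above can be sketched in a few lines of plain Python (the function names here are illustrative, not the actual mlx-lm identifiers):

```python
# Sketch of the dimension-recovery bug for non-power-of-2 bit widths.
# With b bits per element, 32 // b truncates when b does not divide 32
# (e.g. 6-bit: 32 // 6 == 5), so multiplying by the truncated quotient
# under-counts the unpacked dimension.

def unpack_dims_buggy(packed_dims: int, bits: int) -> int:
    # Original ordering: multiply by the truncated elements-per-word count.
    return packed_dims * (32 // bits)

def unpack_dims_fixed(packed_dims: int, bits: int) -> int:
    # Fixed ordering: multiply first, then divide, so no precision is lost
    # when packed_dims * 32 is divisible by bits.
    return packed_dims * 32 // bits

assert unpack_dims_buggy(288, 6) == 1440   # wrong -> shape (6144, 1440)
assert unpack_dims_fixed(288, 6) == 1536   # correct

# Power-of-2 widths are unaffected, since 32 // bits is exact:
assert unpack_dims_buggy(384, 4) == unpack_dims_fixed(384, 4) == 3072
```

Reordering the operations changes nothing for 2-, 4-, and 8-bit quantization, which is why the bug only surfaced with 6-bit models.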

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@awni (Member) left a comment

Thanks for fixing that!

@awni awni merged commit ca14d3d into ml-explore:main Jan 18, 2026
15 checks passed
