@Anri-Lombard
Contributor

Fixes #1800 by lowering the four-step FFT threshold from 4096 to 1024, which forces recursive decomposition so that no sub-FFT exceeds Metal's threadgroup memory limit. Adds test cases for sizes 2^21, 2^22, and 2^23.

Fixes ml-explore#1800

The FFT failed for arrays of size 2^21 and 2^22 with a "kernel not found" error because the four-step decomposition produced sub-FFTs (n1=2048 or 4096) that exceeded Metal's threadgroup memory limit.

This fix lowers the four-step FFT threshold from 4096 to 1024, so recursive decomposition kicks in earlier and every constituent FFT fits within Metal's 32KB threadgroup memory limit.
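For readers unfamiliar with the four-step scheme, here is a minimal NumPy sketch of the idea, not the MLX/Metal implementation. It assumes power-of-two lengths and always picks `n1 = max_size`; the function name `four_step_fft` and that splitting rule are hypothetical, and `np.fft.fft` stands in for a single-kernel FFT. Only the constant name `MAX_SAFE_FFT_SIZE` and its value come from this PR.

```python
import numpy as np

# Threshold named in this PR; how it is used below is illustrative only.
MAX_SAFE_FFT_SIZE = 1024


def four_step_fft(x, max_size=MAX_SAFE_FFT_SIZE):
    """FFT over the last axis, splitting until every sub-FFT has length <= max_size.

    Hypothetical helper: assumes the length and max_size are powers of two.
    """
    x = np.asarray(x, dtype=np.complex128)
    n = x.shape[-1]
    if n <= max_size:
        # Small enough: stand-in for an FFT that fits in one kernel.
        return np.fft.fft(x, axis=-1)

    n1 = max_size
    assert n % n1 == 0, "sketch assumes power-of-two sizes"
    n2 = n // n1

    # View the signal as an (n1, n2) matrix: m[..., a, b] = x[..., a * n2 + b].
    m = x.reshape(x.shape[:-1] + (n1, n2))

    # Step 1: length-n1 FFTs down the columns (n1 == max_size, so no recursion).
    y = np.swapaxes(four_step_fft(np.swapaxes(m, -1, -2), max_size), -1, -2)

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * c * b / n).
    c = np.arange(n1)[:, None]
    b = np.arange(n2)[None, :]
    y = y * np.exp(-2j * np.pi * c * b / n)

    # Step 3: length-n2 FFTs along the rows; recurses again if n2 > max_size.
    z = four_step_fft(y, max_size)

    # Step 4: transpose so that X[..., c + d * n1] = z[..., c, d].
    return np.swapaxes(z, -1, -2).reshape(x.shape)


# Usage: compare against NumPy's reference FFT on a modest size.
rng = np.random.default_rng(0)
signal = rng.standard_normal(2**14) + 1j * rng.standard_normal(2**14)
assert np.allclose(four_step_fft(signal), np.fft.fft(signal))
```

With a threshold of 1024, a length-2^21 input in this sketch splits into 1024 x 2048, and the 2048-length rows split once more into 1024 x 2, so no sub-FFT is longer than the threshold.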

Changes:
- Added MAX_SAFE_FFT_SIZE constant (1024)
- Updated plan_fft to use MAX_SAFE_FFT_SIZE instead of MAX_STOCKHAM_FFT_SIZE
- Added test cases for sizes 2^21, 2^22, and 2^23 to prevent regression

Development

Successfully merging this pull request may close these issues.

[BUG] FFT fails on certain array lengths
