Name and Version
version: 9459 (07ac3ce)
built with Clang 19.1.5 for Windows x86_64 (latest version)
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 5060 Ti
Models
No response
Problem description & steps to reproduce
The run aborts with the following assertion failure:
D:/a/beellama.cpp/beellama.cpp/ggml/src/ggml-backend.cpp:272: GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds") failed
This worked in the previous version, where I was also getting a 1.5x+ speedup on the MoE model. Now only dense models seem to work.
First Bad Commit
No response
Relevant log output
Logs