I get fatal errors on startup regardless of whether I use your precompiled binaries or build my own from master, but the error differs depending on the model loaded. Here's the end of the startup log for each.
Unsloth/gemma-4-31B-it-UD-Q4_K_XL:
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve: CUDA0 compute buffer size = 522.50 MiB
sched_reserve: CUDA_Host compute buffer size = 61.02 MiB
sched_reserve: graph nodes = 2584
sched_reserve: graph splits = 291 (with bs=512), 71 (with bs=1)
sched_reserve: reserve took 40.29 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
C:\Users\anbee\projects\beellama.cpp\ggml\src\ggml-cpu\ops.cpp:4447: fatal error
C:\Users\anbee\projects\beellama.cpp\ggml\src\ggml-cpu\ops.cpp:4447: fatal error
C:\Users\anbee\projects\beellama.cpp\ggml\src\ggml-cpu\ops.cpp:4447: fatal error
The same thing happens even when I add --no-warmup. When I try Qwen, beellama won't even load the model:
Unsloth/Qwen3.6-27B-UD-Q5_K_XL:
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 866, got 862
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'F:/LLMs/Qwen3.6-27B-UD-Q5_K_XL.gguf'
srv load_model: failed to load model, 'F:/LLMs/Qwen3.6-27B-UD-Q5_K_XL.gguf'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
Both of these models work perfectly in vanilla llama.cpp. For reference, I have an RTX 4090 and a Ryzen 7950X3D with 96 GB RAM.