Hi, I'm getting buggy output from Qwen3-Coder-Next on the second, third, and every subsequent call of create_completion.
from llama_cpp import Llama

model = Llama(
    "path/to/gguf/file.gguf",  # placeholder path to the Qwen3-Coder-Next GGUF
    n_gpu_layers=-1,  # offload all layers to the GPU
)
prompt = """<|im_start|>system
You're a helpful AI assistant.
<|im_end|>
<|im_start|>user
Introduce yourself.
<|im_end|>
<|im_start|>assistant"""
res = model.create_completion(prompt, stop=["<|im_end|>"])
print(res["choices"][0]["text"])
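For completeness, the repeated calls are just back-to-back invocations on the same Llama instance; a minimal loop like this (the iteration count is arbitrary) triggers it:

# Re-run the identical completion on the same model object;
# the first iteration is fine, the later ones produce the buggy output.
for i in range(3):
    res = model.create_completion(prompt, stop=["<|im_end|>"])
    print(f"--- call {i + 1} ---")
    print(res["choices"][0]["text"])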
The first call of create_completion gave me:
Hello! I'm Qwen, a large-scale language model developed by Alibaba Cloud
which I guess is okay.
But the second call gave buggy output, and the log says:
WARN: memory_seq_rm(0, 23, -1) failed. Executing fallback: memory_seq_rm(0, 0, -1)
I guess there's a bug in the KV-cache management: judging from the log, removing the cache entries past the 23-token cached prefix failed, and the fallback cleared the whole sequence instead.
Note that this also happens with the previous version, 3.23.
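If the problem really is stale cache state, one workaround sketch is to clear the model state between calls. Llama.reset() resets the cached token state in llama-cpp-python, though I haven't verified that it avoids this particular bug:

# Possible workaround (untested against this bug): drop the cached
# token state so the next call re-evaluates the prompt from scratch.
model.reset()
res = model.create_completion(prompt, stop=["<|im_end|>"])
print(res["choices"][0]["text"])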