
Garbage output from Qwen3-Coder-Next on the second generation call, with a KV cache-related warning #59

@yamikumo-DSD

Description


Hi, I'm getting garbage output from Qwen3-Coder-Next on the second, third, and every later call to create_completion.

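from llama_cpp import Llama

# Load the GGUF model with all layers offloaded to the GPU
model = Llama(
    "path/to/gguf/file.gguf",
    n_gpu_layers=-1
)

prompt = """<|im_start|>system
You're a helpful AI assistant.
<|im_end|>
<|im_start|>user
Introduce yourself.
<|im_end|>
<|im_start|>assistant"""

# Call create_completion repeatedly on the same Llama instance:
# the first call is fine, the second and later calls return garbage
for i in range(2):
    res = model.create_completion(prompt, stop=["<|im_end|>"])
    print(f"--- call {i + 1} ---")
    print(res["choices"][0]["text"])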

The first call to create_completion returned:

Hello! I'm Qwen, a large-scale language model developed by Alibaba Cloud

That looks fine. The second call, however, returned:

</</</</</</</</</

along with this warning in the log:

WARN: memory_seq_rm(0, 23, -1) failed. Executing fallback: memory_seq_rm(0, 0, -1)

I suspect a bug in how the KV cache is managed between calls.
This also happens with the previous version, 3.23.
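To make the warning easier to read, here is a minimal toy model. It is hypothetical and is not llama.cpp or llama-cpp-python code: ToyKVCache, reuse_prefix, and run are illustrative names, tokens are plain ints, and nothing is generated. It assumes the caller reuses the longest prefix shared between the cached tokens and the new prompt, then asks the cache to drop everything after that prefix, which is how I read memory_seq_rm(0, 23, -1) (sequence 0, positions 23 to the end). The toy cache can optionally refuse partial removal, which triggers the same fallback as in the log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for one sequence's KV cache (not llama.cpp code)."""
    supports_partial_rm: bool
    tokens: list[int] = field(default_factory=list)  # cached token ids, by position

    def seq_rm(self, p0: int, p1: int = -1) -> bool:
        """Drop cached positions [p0, p1); p1 = -1 means 'to the end'."""
        end = len(self.tokens) if p1 < 0 else p1
        if p0 > 0 and not self.supports_partial_rm:
            return False  # this toy cache can only be cleared from position 0
        del self.tokens[p0:end]
        return True


def reuse_prefix(cache: ToyKVCache, prompt: list[int]) -> int:
    """Trim the cache to the longest shared prefix; return how many prompt tokens are reused."""
    n_keep = 0
    # Assumption: the last prompt token is always re-evaluated so fresh logits exist.
    for cached, new in zip(cache.tokens, prompt[:-1]):
        if cached != new:
            break
        n_keep += 1
    if not cache.seq_rm(n_keep):
        print(f"WARN: memory_seq_rm(0, {n_keep}, -1) failed. "
              f"Executing fallback: memory_seq_rm(0, 0, -1)")
        cache.seq_rm(0)
        n_keep = 0  # the fallback emptied the cache, so no prefix may be reused
    return n_keep


def run(cache: ToyKVCache, prompt: list[int]) -> int:
    """Simulate one call: trim the cache, then 'evaluate' the uncached prompt tokens."""
    n_keep = reuse_prefix(cache, prompt)
    cache.tokens.extend(prompt[n_keep:])
    return n_keep
```

With a made-up 24-token prompt such as list(range(24)), calling run twice on ToyKVCache(supports_partial_rm=False) prints a WARN line with p0 = 23, as in the log above, and the second call returns 0, so the whole prompt is re-evaluated. If a real implementation cleared the cache in the fallback but still treated the first n_keep tokens as cached, it would decode the tail of the prompt with no context. That is one possible explanation for the garbage output, but it is unconfirmed.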
