fix: benchmark Qwen3.5 compatibility — disable thinking, robust JSON …#150

Merged
solderzzc merged 1 commit into develop from
feature/benchmark-qwen35-fixes
Mar 14, 2026
@solderzzc
Member

…parsing, token streaming

- Disable Qwen3.5 thinking via empty <think></think> assistant prefix injection
- Add balanced brace JSON parser to handle trailing thinking text
- Add buffered token streaming with [C]/[R] field tagging
- Add smart early abort: 100 reasoning tokens, 2x maxTokens, 2000 global cap
- Add full prompt logging with inline image support ([IMG:] protocol)
- Add Qwen3.5 recommended non-thinking params (presence_penalty 1.5)
- Remove unsupported response_format and chat_template_kwargs for llama-server
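
For readers unfamiliar with the balanced brace approach named in the list above, here is a minimal Python sketch of the general idea. It is not the code from this PR: the function name `extract_first_json_object` is hypothetical, and the language and the retry-on-invalid-span behavior are assumptions made for illustration. It scans for the first `{`, tracks brace depth while ignoring braces inside string literals, and parses the span as soon as the depth returns to zero, so any trailing thinking text after the object is ignored.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first balanced {...} span in `text` that parses as JSON, else None.

    Hypothetical helper for illustration; not taken from the PR.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Inside a string literal: braces do not count toward depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not valid JSON; try the next "{"
        start = text.find("{", start + 1)
    return None


if __name__ == "__main__":
    raw = '{"label": "person", "note": "a } inside"} Wait, let me reconsider {this'
    print(extract_first_json_object(raw))
```

Tracking string state matters because a `}` inside a quoted value would otherwise close the object early. The sketch gives up on a candidate span once it balances but fails to parse, then retries from the next `{`.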
@solderzzc solderzzc merged commit bf4c517 into develop Mar 14, 2026
1 check passed
@solderzzc solderzzc deleted the feature/benchmark-qwen35-fixes branch March 14, 2026 00:28