[DRAFT] Add configurable reasoning preservation for openai completions #268
base: master
Conversation
Force-pushed from a94e839 to 1be0ff3:
Standard behavior: discard reasoning before last user message (based on OpenAI SDK). Model-level config enables preservation (e.g. GLM-4.7 with preserved thinking).
Changes:
- prune-history: add keep-history-reasoning parameter
- Tests: cover both pruning and preservation
- Docs: add keepHistoryReasoning to model schema
## Unreleased
- (OpenAI Chat) - Configurable reasoning history via `keepHistoryReasoning` (model-level, default: prune)
I wonder if this default for all openai-chat is a good idea, or whether we should enable it only for the google provider in config.clj. Also, shouldn't this be a per-provider flag rather than per-model?
Based on what I've found in OpenCode and the OpenAI SDK, it is the default for every model:
- OpenCode transforms the `<think>` block into a message of type reasoning, and that type is ignored by the OpenAI SDK and never reaches the LLM.
- Delta reasoning messages are usually persisted for one turn and then thrown away (as you can see in the GLM docs below, first vs second diagram).

It shouldn't be per provider, because you might want to configure it only for e.g. GLM-4.7 (second diagram here ->
https://docs.z.ai/guides/capabilities/thinking-mode).
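To make the rule concrete, here is a minimal sketch of the pruning idea described above: drop reasoning messages that appear before the last user message unless the model opts into keeping them. The function name, option key, and message shape are illustrative assumptions, not the PR's actual code.

```clojure
;; Illustrative sketch only: names and message shape are assumptions,
;; not the actual prune-history implementation from this PR.
(defn prune-history-reasoning
  "Drops reasoning messages that precede the last user message,
   unless keep-history-reasoning? is set for the model."
  [messages {:keys [keep-history-reasoning?]}]
  (if keep-history-reasoning?
    messages
    (let [last-user-idx (or (last (keep-indexed
                                    (fn [i msg]
                                      (when (= :user (:role msg)) i))
                                    messages))
                            -1)]
      (vec (keep-indexed
             (fn [i msg]
               ;; reasoning from earlier turns is discarded,
               ;; reasoning from the current turn is kept
               (when-not (and (< i last-user-idx)
                              (= :reason (:type msg)))
                 msg))
             messages)))))
```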
OK, let's keep an eye on whether it affects any openai-chat model. I still think it's a slightly dangerous move, but if the OpenAI SDK is really doing that, it may be fine.
@ericdallo If you approve, I will merge it tomorrow after fixing the integration tests.
docs/models.md (Outdated)
=== "History reasoning"
`keepHistoryReasoning` preserves reasoning in conversation history. Set for specific models:
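For illustration, a per-model entry might look something like the sketch below; everything except the `keepHistoryReasoning` key is an assumption about the config shape, not the project's actual schema.

```clojure
;; Hypothetical model config sketch (EDN-style); only keepHistoryReasoning
;; comes from this PR, the surrounding keys are illustrative assumptions.
{:models
 {"glm-4.7"       {:keepHistoryReasoning true} ;; keep reasoning across turns
  "deepseek-chat" {}}}                         ;; default: reasoning is pruned
```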
Can we elaborate on the fact that some models require this? Most people won't understand what it means or why they may or may not need to use it.
I am not sure there is any model that requires it by default. I mean:
- No model requires reasoning to be returned in `<think>` tags (but users can experiment and see if there are models that benefit from it).
- It looks like GLM-4.7 with `"clear_thinking": false` might benefit from it, while some other models can simply ignore it. For example, DeepSeek:
> When the next user question begins (Turn 2.1), the previous reasoning_content should be removed, while keeping other elements to send to the API. If reasoning_content is retained and sent to the API, the API will ignore it.

In OpenCode you can't even configure this. So I would say that people might care about this only in very specific scenarios where they already understand what it is about.
TL;DR: I wouldn't say that any model requires it, but maybe there is room to improve the description.
I mean they require it to be false. Imagine a user checking the docs and seeing that flag; it's not clear enough why one would want to set it to true.
I see, it is not a hard requirement; you can experiment with it.
`<think>` blocks returned as an assistant message will influence model behavior, and I can imagine scenarios where it is worth experimenting with, while reasoning_content may or may not be dropped.
I am trying to come up with a short sentence that explains it better than it does right now. What about this?
> Persists the model's internal reasoning chain within the chat context for future turns. Set for specific model configuration if needed.
Looks good. I think it's important to mention something like: "Some models don't work when reasoning is sent back."
What are those models though?
I will add a link to the GLM config there and explain the motivation behind false.
Well, Gemini, right? It won't work if it's true?
No, in the case of Gemini it was specifically about XML thinking, so there is probably no model that would not support it. But maybe this change will help it break less (or not at all). It's hard to say; it's relatively rare and evaluating it is not free, hehe.
Discard reasoning before last user message (based on OpenAI SDK).
Model-level config enables preservation (e.g. GLM-4.7 with preserved thinking).
Changes:
- prune-history: add keep-history-reasoning parameter
- Tests: cover both pruning and preservation
- Docs: add keepHistoryReasoning to model schema
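To illustrate the two behaviors described above, consider a hypothetical history; the message shape here is an assumption for illustration, not the project's actual data model.

```clojure
;; Turn 1 produced reasoning; the user then asked a follow-up question.
;; With the default (prune), the earlier reasoning message is dropped
;; before the next request is sent; with keepHistoryReasoning it is kept.
(def history
  [{:role :user      :type :text   :content "First question"}
   {:role :assistant :type :reason :content "internal reasoning from turn 1"} ;; pruned by default
   {:role :assistant :type :text   :content "First answer"}
   {:role :user      :type :text   :content "Follow-up question"}])
```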
I added an entry to the changelog under the unreleased section.
@ericdallo Subjectively, the behavior seems better now. Since it's difficult to evaluate, I went through the OpenAI SDK with OpenCode via Opus, and it found that they discard it as well. I couldn't find any documentation that would suggest we should do otherwise by default.