feat: support claude agent SDK-style structured outputs in the OpenCode SDK #8161
Conversation
Add outputFormat option to session.prompt() for requesting structured JSON output. When type is 'json_schema', injects a StructuredOutput tool that validates model output against the provided schema.
- Add OutputFormat schema types (text, json_schema) to message-v2.ts
- Add structured_output field to AssistantMessage
- Add StructuredOutputError for validation failures
- Implement createStructuredOutputTool helper in prompt.ts
- Integrate structured output into agent loop with retry support
- Regenerate OpenAPI spec with new types
- Add unit tests for schema validation
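For context, a rough sketch of the OutputFormat shape this commit describes; anything beyond the type, schema, and retryCount fields named above is an assumption, not the actual code:

```ts
import { z } from "zod"

// Sketch only: field names beyond "type", "schema", and "retryCount" are assumed.
export const OutputFormat = z.discriminatedUnion("type", [
  z.object({ type: z.literal("text") }),
  z.object({
    type: z.literal("json_schema"),
    // Arbitrary user-supplied JSON Schema that the StructuredOutput tool input
    // is validated against.
    schema: z.record(z.any()),
    // How many validation failures may be retried (assumed to default to 0).
    retryCount: z.number().int().min(0).optional(),
  }),
])
export type OutputFormat = z.infer<typeof OutputFormat>
```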
Regenerate SDK types to include outputFormat and structured_output fields.
- Fix loop exit condition to check processor.message.finish instead of result === "stop" (processor.process() returns "continue" on normal exit)
- Add system prompt instruction when json_schema mode is enabled to ensure model calls the StructuredOutput tool
- Add integration tests for structured output functionality
- Fix test Session.messages call to use object parameter format
Document the outputFormat feature for requesting structured JSON output:
- Basic usage example with JSON schema
- Output format types (text, json_schema)
- Schema configuration options (type, schema, retryCount)
- Error handling for StructuredOutputError
- Best practices for using structured output
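A hypothetical usage sketch in the spirit of those docs; the option names follow the commit above, while the client surface and response field access are assumptions:

```ts
// Hypothetical usage; "client" and "session" come from prior SDK setup,
// and the exact response shape is an assumption.
const result = await client.session.prompt({
  sessionID: session.id,
  parts: [{ type: "text", text: "Summarize this issue as JSON." }],
  outputFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        severity: { type: "string", enum: ["low", "medium", "high"] },
      },
      required: ["title", "severity"],
    },
    retryCount: 2, // extra attempts if validation fails
  },
})

// structured_output is populated on the assistant message once validation passes;
// a StructuredOutputError is raised if it never does.
console.log(result.structured_output)
```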
When structured output is requested, the model must call the StructuredOutput tool instead of responding with plain text. Changes:
- Add toolChoice parameter to LLM.StreamInput
- Pass toolChoice: "required" to streamText when outputFormat is json_schema
- This forces the model to call a tool, ensuring structured output is captured
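To illustrate the idea (assuming the Vercel AI SDK v5 surface with tool, inputSchema, and jsonSchema; this is not the PR's actual code):

```ts
import { jsonSchema, streamText, tool } from "ai"

// "model", "messages", and "userSchema" are assumed to come from the surrounding setup.
const stream = streamText({
  model,
  messages,
  tools: {
    StructuredOutput: tool({
      description: "Return the final answer as an object matching the required JSON schema.",
      inputSchema: jsonSchema(userSchema), // the user-supplied JSON Schema
    }),
  },
  // Forces the model to call a tool, so the structured output is always captured.
  toolChoice: "required",
})
```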
Merge upstream changes while preserving structured output feature:
- Keep tools deprecation notice from upstream
- Keep bypassAgentCheck parameter from upstream
- Keep variant field on user messages from upstream
- Preserve outputFormat and StructuredOutput tool injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
cc @thdxr :)
The following comment was made by an LLM and may be inaccurate: No duplicate PRs found
Happy to clean up merge conflicts if this PR is on the right track implementation-wise |
This is a solid start! I have some questions around the implementation.
question: retry logic
Maybe I'm missing something, but where is the retry logic? I see there is a retryCount in the OutputFormat (with the json_schema discriminant); however, I can't find where it is used. The error seems to be hard-coded to 0 retries when created.
discussion: architecture
Is this the best path to take? I understand this is reverse-engineered from Claude Code, but let's be honest: I'm here because Claude Code was pissing me off. They couldn't even figure out how to remove MCP tools from their prompts when the tools are disabled for subagents.
Important
Actually, I researched the Claude Agent SDK and I'm fairly certain (from reading through their code) that they use the native output_format as defined here. This may not have always been the case, but they do seem to be using API-enforced structured outputs now.
That said, what do you think about an approach that uses provider-level structured outputs? Almost all providers implement something like this for their models, and I think we could provide a clean abstraction on top of those features.
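For example, such an abstraction could lean on the AI SDK's generateObject, which delegates to native structured-output support where the provider offers it (illustrative sketch only):

```ts
import { generateObject } from "ai"
import { z } from "zod"

// "model" is a provider model instance from the surrounding setup.
const { object } = await generateObject({
  model,
  schema: z.object({
    title: z.string(),
    severity: z.enum(["low", "medium", "high"]),
  }),
  prompt: "Summarize this issue.",
})
```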
I personally built a system similar to this PR inside my own plugin. Instead of a tool call, it simply injected the expected response format and asked the model to reply with an object that matched. It does work, there's no doubt of that, but it has its limitations. In some scenarios, assuming the retry logic is implemented, you end up with a few extra round trips to correct the outputs. It's easy enough to enforce this at the tool layer, but different models have different levels of "willingness to cooperate".
Regarding my implementation, I injected the Zod schemas directly (in a format the LLM could understand) and asked it to just generate that output instead of calling a tool. I found this method slightly more reliable than requesting a tool call, because many models see tool calls as "non-terminal" to their execution loops. Or at least, that's a working theory I have based on how many times I've seen OpenAI models get sassy when I ask them to end with tool calls.
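A rough reconstruction of that prompt-injection approach (illustrative only, not the plugin's actual code):

```ts
import { z } from "zod"
import { zodToJsonSchema } from "zod-to-json-schema"

const Reply = z.object({
  answer: z.string(),
  confidence: z.number().min(0).max(1),
})

// Inject the schema into the prompt instead of registering a tool.
const system = [
  "Reply with a single JSON object and nothing else.",
  "It must match this JSON Schema:",
  JSON.stringify(zodToJsonSchema(Reply), null, 2),
].join("\n")

// Validate the raw model text; a failure here is what would trigger a retry round trip.
function parseReply(text: string) {
  const parsed = Reply.safeParse(JSON.parse(text))
  if (!parsed.success) throw new Error("reply did not match the schema; retry")
  return parsed.data
}
```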
Anyway, food for thought.
TLDR
- Is there retry logic?
- Should we look into first-class provider support instead of prompt injection + output validation + retries?
Fascinating, do you have evals or a reproduction setup that demonstrates this? The architectural approach is similar to Claude Code's because (a) OpenCode's harness is largely based on Claude Code's, (b) not all inference providers support structured outputs, so this approach is more generalizable, and (c) I had conversations with one of the OpenCode maintainers to validate the approach before I started. Will double-check the retry logic once I'm back at my machine.
Unfortunately no, this was just me messing with it across different models. I do think it's worth doing some research on before merging into OpenCode core though. I wouldn't mind trying that out and reporting my findings, since I could get access to both methodologies quite easily.
I don't think that's true (at least not currently). I just pulled the latest from the Claude Agent SDK, and they are using the native output_format.
That I totally agree with. That's also why I'm fairly certain Claude Code does NOT do this: they only call into their own APIs, which support it, and it would be nonsensical for them not to use it. For us, though, we don't have that luxury, which is why I implemented something similar in my plugin. That said, I wonder if there is a two-stage approach we could take: attempt to use the provider support directly and fall back to this injection/tool method if it's not available. I think that opens the door to doing this now, then later adding provider support (maybe even per-provider) as it becomes available.
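Sketch of that two-stage idea; every name here is hypothetical and only the decision point is shown:

```ts
// All names are hypothetical; only the decision point is illustrated.
type PromptInput = { model: string; prompt: string; schema: Record<string, unknown> }
type PromptResult = { structured_output?: unknown }

async function structuredPrompt(
  input: PromptInput,
  opts: {
    supportsNative: (model: string) => Promise<boolean>
    native: (input: PromptInput) => Promise<PromptResult>       // provider output_format / response_format
    toolFallback: (input: PromptInput) => Promise<PromptResult> // this PR's StructuredOutput tool path
  },
): Promise<PromptResult> {
  // Prefer the provider's native structured outputs when available...
  if (await opts.supportsNative(input.model)) return opts.native(input)
  // ...otherwise fall back to tool injection + validation + retries.
  return opts.toolFallback(input)
}
```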
BTW @K-Mistele want to move the "architecture" talk back to the issue? It might be larger than just this PR, especially if we do a two-stage thing (this PR + some other stuff).
Good to know, let's set aside guesswork for now then.
This is a straightforward feature, and a research project before merging seems unwarranted; happy to leave that decision up to the maintainers, though.
Unfortunately, this is guesswork, and it is incorrect. Perhaps I should've been clearer: the tool-injection approach is precisely how Claude Code implements this feature, based on analysis of proxied network traffic from the harness. It does not use the output_format feature in the API; it injects a tool called StructuredOutput. The prompts I provided are very similar to what the Claude harness uses for the StructuredOutput tool it injects for this purpose.
See above. I am certain this is exactly the approach the harness uses, and verifying it is trivial.
Correct. Retries are a tactical implementation decision to compensate for smaller models or lower-precision inference providers, which OpenCode supports and which many users may wish to use.
I attempted to do just that with a proxy and didn't see that tool. Granted, I also don't know how I would specifically get Claude Code to trigger a structured output; I got it to ask a question, but that just called its own question tool. The only reason I could see Claude Code using a tool like that at the end of the generation would be to ensure that streaming responses (which return chunks of JSON) result in a properly formatted result at the end. But I feel like that would be best done in the actual harness, rather than relying on a non-deterministic LLM to hopefully call the tool.
I think you misunderstood me, or maybe I am misunderstanding you... I get the value of retries, and I see that you provided a way to configure a number of retries. The issue I'm seeing is that the logic to perform those retries, increment the count, and check the retry limit seems to be missing. If you pass in a retryCount, I can't see where it is ever read. Am I missing something? How does this implementation handle the retries?
Checking on the retries right now.
Yes, use the Claude Agent SDK (https://platform.claude.com/docs/en/agent-sdk/typescript); a link with more details about how to use structured outputs in the agent SDK is given in the PR description (or here). Do that and put a proxy in front of it. You'll see something like this (note that the input parameters and descriptions come from your specified JSON schema).
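Since the captured request itself isn't reproduced above, here is a hypothetical illustration (not the actual proxied traffic) of what an injected StructuredOutput tool could look like in an Anthropic Messages API request body; the input_schema would come from the JSON schema passed to the agent SDK:

```ts
// Hypothetical illustration, not the captured traffic.
const requestBody = {
  model: "claude-...", // model id elided; whichever model the harness selected
  max_tokens: 4096,
  tools: [
    {
      name: "StructuredOutput",
      description: "Return the final response as an object matching the required schema.",
      input_schema: {
        // Taken from the JSON schema passed to the agent SDK.
        type: "object",
        properties: {
          title: { type: "string" },
          severity: { type: "string", enum: ["low", "medium", "high"] },
        },
        required: ["title", "severity"],
      },
    },
  ],
  messages: [{ role: "user", content: "Summarize this issue." }],
}
```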
Sounds good!
Ahh, that's very interesting. Thanks for providing that! What's interesting is that I know the Anthropic client SDK uses the underlying API's native support for this.

The tool/validate/retry loop you proposed here is good for all the reasons you mentioned. So, just to be clear, I'm not against this approach at all. I think it's a good way to ensure it "always" (within reason) works regardless of the underlying provider, model capability, etc. I also want to be clear that I'm not against using a tool to do the validation. My anecdotal experience was just that: anecdotal. I simply wanted to provide context and see whether that had been tested beforehand. I certainly didn't do a thorough analysis of the best method to use, and I assume Anthropic verified their approach, but only against their own models.

When it comes to relying on output from just the prompt vs. requiring a tool, I would much rather rely on a tool. The only reason I opted to do something else in my plugin was that certain low-powered OpenAI models didn't reliably call tools. I wanted to use a tool and am now thinking I'll try that method again; it's possible I didn't implement it fully by requesting that the model always call a tool.

The only real concern I have for this PR is the retry loop. The discussion about using provider APIs directly is not material to this work, as it would be a follow-up regardless.
Add 5 new unit tests that verify the retry mechanism for structured output validation:
- Multiple validation failures trigger multiple onError calls
- Success after failures correctly calls onSuccess
- Error messages guide model to fix issues and retry
- Simulates retry state tracking matching prompt.ts logic
- Simulates successful retry after initial failures

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
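A sketch of the kind of unit test this commit describes, assuming bun:test and Zod with a stand-in validate helper; it is not the actual test code:

```ts
import { expect, test } from "bun:test"
import { z } from "zod"

// Hypothetical stand-in for the validation helper the tests exercise.
function validate(
  input: unknown,
  schema: z.ZodTypeAny,
  hooks: { onError: (message: string) => void; onSuccess: (value: unknown) => void },
) {
  const parsed = schema.safeParse(input)
  if (parsed.success) hooks.onSuccess(parsed.data)
  else hooks.onError(parsed.error.message)
}

test("multiple validation failures trigger multiple onError calls", () => {
  const schema = z.object({ title: z.string() })
  const errors: string[] = []
  let success: unknown
  for (const attempt of [{ title: 42 }, {}, { title: "ok" }]) {
    validate(attempt, schema, {
      onError: (message) => errors.push(message),
      onSuccess: (value) => (success = value),
    })
  }
  expect(errors.length).toBe(2)
  expect(success).toEqual({ title: "ok" })
})
```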
Thanks for adding the tests! I want to make sure I'm following the implementation correctly. Looking at the tests, though, they seem to simulate the retry state tracking rather than drive the real agent loop. Here's the specific flow I'm trying to trace:
From what I can see in prompt.ts, the loop exits once the message finishes, regardless of whether the StructuredOutput validation succeeded.
For retries to work, I'd expect the code to: 1) catch the validation failure, 2) feed the error back to the model (e.g. as a tool result), and 3) keep the loop running until retryCount is exhausted.
Without steps 2-3, the LLM never gets another chance; it doesn't know validation failed because the loop has already exited. Could you point me to where this happens? I might be looking at the wrong place, but I've traced through the loop and can't find it. The tests verify the validation and retry bookkeeping in isolation, not that the agent loop actually performs another round trip (a rough sketch of what I'd expect is below). Happy to hop on a call if that's easier! I just want to make sure users get the retry behavior they're expecting.
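For illustration only, a minimal sketch of the loop-level handling I mean; every name in it (runWithStructuredOutput, step, validate, feedback) is hypothetical, not taken from the PR:

```ts
// Hypothetical sketch, not the PR's code: exit only when validation passes or
// retries are exhausted; otherwise feed the error back so the model can try again.
type Validation = { ok: true; value: unknown } | { ok: false; errors: string }

async function runWithStructuredOutput(opts: {
  retryCount: number
  step: () => Promise<unknown>                // one model round trip, tool call included
  validate: (output: unknown) => Validation   // JSON Schema check of the tool input
  feedback: (errors: string) => Promise<void> // push the error back as a tool result
}): Promise<unknown> {
  let retriesLeft = opts.retryCount
  while (true) {
    const output = await opts.step()
    const result = opts.validate(output)
    if (result.ok) return result.value
    if (retriesLeft-- <= 0) throw new Error(`structured output invalid: ${result.errors}`)
    await opts.feedback(result.errors)
  }
}
```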

What does this PR do?
This PR adds support for structured generation in the OpenCode SDK (v2) by implementing an injected StructuredOutput tool with an arbitrary user-defined JSON schema. This follows Claude Code's approach, as indicated by debugging and proxy analysis.
When set, toolChoice is forced to "required" and the agent loop exits once the input to the tool call passes validation against the user-supplied JSON Schema. Retries are supported, and the retry count is customizable (important for smaller models or providers with poor support for structured generation!).
Prompts and descriptions are based on those used in the Claude Agent SDK. You can learn more about structured outputs in the Claude Agent SDK here.
Closes #5639
How did you verify your code works?
This PR contains both unit tests and integration tests.