@K-Mistele

What does this PR do?

This PR adds support for structured generation in the OpenCode SDK (v2) by implementing an injected StructuredOutput tool with an arbitrary user-defined JSON schema.

This implementation follows Claude Code's approach, as indicated by debugging and proxy analysis.

When outputFormat is set, toolChoice is forced to "required" and the agent loop exits once the input to the tool call passes validation against the user-supplied JSON Schema.

Retries are supported, and the retry count is customizable (important for smaller models or providers with poor support for structured generation!).
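
A minimal usage sketch (the `outputFormat`, `retryCount`, and `structured_output` names follow this PR's commit notes; the exact SDK shapes may differ):

```ts
// Sketch only: the session object is assumed to already exist; the option shape
// below mirrors the commit notes in this PR rather than a published API.
declare const session: {
  prompt(input: {
    parts: { type: "text"; text: string }[]
    outputFormat?: {
      type: "json_schema"
      schema: Record<string, unknown>
      retryCount?: number
    }
  }): Promise<unknown>
}

const result = await session.prompt({
  parts: [{ type: "text", text: "Summarize this repository." }],
  outputFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: { summary: { type: "string" } },
      required: ["summary"],
    },
    retryCount: 2, // extra attempts when the tool input fails schema validation
  },
})
// On success, the validated object is surfaced on the assistant message
// (structured_output field); StructuredOutputError is raised once retries are exhausted.
```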

Prompts and descriptions are based on those used in the Claude Agent SDK.

You can learn more about structured outputs in the Claude Agent SDK here.

Closes #5639

How did you verify your code works?

This PR contains both unit tests and integration tests.

K-Mistele and others added 7 commits January 12, 2026 20:09
Add outputFormat option to session.prompt() for requesting structured JSON
output. When type is 'json_schema', injects a StructuredOutput tool that
validates model output against the provided schema.

- Add OutputFormat schema types (text, json_schema) to message-v2.ts
- Add structured_output field to AssistantMessage
- Add StructuredOutputError for validation failures
- Implement createStructuredOutputTool helper in prompt.ts
- Integrate structured output into agent loop with retry support
- Regenerate OpenAPI spec with new types
- Add unit tests for schema validation
Regenerate SDK types to include outputFormat and structured_output fields.
- Fix loop exit condition to check processor.message.finish instead of
  result === "stop" (processor.process() returns "continue" on normal exit)
- Add system prompt instruction when json_schema mode is enabled to ensure
  model calls the StructuredOutput tool
- Add integration tests for structured output functionality
- Fix test Session.messages call to use object parameter format
Document the outputFormat feature for requesting structured JSON output:
- Basic usage example with JSON schema
- Output format types (text, json_schema)
- Schema configuration options (type, schema, retryCount)
- Error handling for StructuredOutputError
- Best practices for using structured output
When structured output is requested, the model must call the
StructuredOutput tool instead of responding with plain text.

Changes:
- Add toolChoice parameter to LLM.StreamInput
- Pass toolChoice: "required" to streamText when outputFormat is json_schema
- This forces the model to call a tool, ensuring structured output is captured
Merge upstream changes while preserving structured output feature:
- Keep tools deprecation notice from upstream
- Keep bypassAgentCheck parameter from upstream
- Keep variant field on user messages from upstream
- Preserve outputFormat and StructuredOutput tool injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@K-Mistele
Author

cc @thdxr :)

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

No duplicate PRs found

@K-Mistele
Author

Happy to clean up merge conflicts if this PR is on the right track implementation-wise

@eXamadeus eXamadeus left a comment

This is a solid start! I have some questions around the implementation.

question: retry logic

Maybe I'm missing something, but where is the retry logic? I see there is a retryCount in the OutputFormat (with the json_schema discriminant); however, I can't determine where it is used. The error seems to be hard-coded to 0 retries when created.

discussion: architecture

Is this the best path to take? I understand this is reverse-engineered from Claude Code, but let's be honest: I'm here because Claude Code was pissing me off. They couldn't even figure out how to remove MCP tools from their prompts when the tools are disabled for subagents.

> [!IMPORTANT]
> Actually, I researched the Claude Agent SDK and I'm fairly certain (from reading through their code) that they use native output_format as defined here. This may not have always been the case, but they do seem to be using API-enforced structured outputs now.

That said, what do you think about an approach that utilizes the provider-level structured outputs? Almost all providers implement something for their models, and I think we could likely provide a clean abstraction that utilizes those features.

I personally built a system similar to this PR inside my own plugin. Instead of a tool call, it simply injected the expected response format and requested that the model reply with an object that matched. It does work, there's no doubt of that, but it does have its limitations. In some scenarios, assuming the retry logic is implemented, you end up with a few extra round trips to correct the outputs. It's easy enough to enforce this at the tool layer, but different models have different levels of "willingness to cooperate".

Regarding my implementation, I injected the Zod schemas directly (in a format the LLM could understand) and asked it to just generate that output, instead of calling a tool. I found this method to be slightly more reliable than requesting a tool call, because many models see tool calls as "non-terminal" to their execution loops. Or at least, this is a working theory I have based on how many times I've seen OpenAI models get sassy when I request they end with tool calls.
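
Roughly the shape of what I did (toy sketch, not my actual plugin code):

```ts
import { z } from "zod"

// Toy sketch of the prompt-injection approach described above; not the real plugin code.
const Output = z.object({
  summary: z.string(),
  confidence: z.number().min(0).max(1),
})

// 1. Describe the expected reply shape directly in the prompt (no tool involved).
const instruction =
  'Reply with ONLY a JSON object of the form {"summary": string, "confidence": number between 0 and 1}.'

// 2. Validate the plain-text reply; on failure, the error gets fed back and the request retried.
function parseReply(reply: string): z.infer<typeof Output> | null {
  try {
    return Output.parse(JSON.parse(reply))
  } catch {
    return null
  }
}
```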

Anyway, food for thought.

TLDR

  1. Is there retry logic?
  2. Should we look into first-class provider support instead of prompt injection + output validation + retries?

@K-Mistele
Author

> I found this method to be slightly more reliable than requesting a tool call

Fascinating, do you have evals or some reproduction setup that demonstrates this?

The architectural approach is similar to Claude Code's because (a) OpenCode's harness is largely based on Claude Code's, (b) not all inference providers support structured outputs, so this approach is more generalizable, and (c) I had conversations with one of the OpenCode maintainers to validate the approach before I started.

Will double-check retry logic once I'm back at my machine

@eXamadeus

eXamadeus commented Jan 19, 2026

> Fascinating, do you have evals or some reproduction setup that demonstrates this?

Unfortunately no, this was just me messing with it across different models. I do think it's worth doing some research on before merging into OpenCode core though. I wouldn't mind trying that out and reporting my findings, since I could get access to both methodologies quite easily.

> The architectural approach is similar to Claude Code's

I don't think that's true (at least not currently). I just pulled the latest from the Claude Agent SDK and they are using the output_format parameter, which corresponds with their API. I also didn't see any retry logic in the code I was reading through.

> not all inference providers support structured outputs, so this approach is more generalizable

That I totally agree with. That's also why I'm fairly certain Claude Code does NOT do this because they only call into their APIs which support this. It would be nonsensical for them to not use it, actually.

For us though...we don't have that luxury. It's why I implemented something similar in my plugin.

That said, I wonder if there is a two-stage approach we could take? We could attempt to use the provider support directly, or fall back to this injection/tool method if it's not available. I think that actually opens the door for doing this, then later adding provider support (maybe even per-provider) as it becomes available.
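
Something like this, just to illustrate the shape of the idea (all of these names are made up):

```ts
// Illustrative only: none of these functions exist in OpenCode today; this just
// sketches "try provider-native structured output, else fall back to the tool".
type OutputFormat = { type: "json_schema"; schema: Record<string, unknown>; retryCount?: number }

declare function supportsNativeStructuredOutput(providerID: string, modelID: string): boolean
declare function promptWithNativeResponseFormat(format: OutputFormat): Promise<unknown>
declare function promptWithStructuredOutputTool(format: OutputFormat): Promise<unknown> // this PR's path

async function promptStructured(providerID: string, modelID: string, format: OutputFormat) {
  if (supportsNativeStructuredOutput(providerID, modelID)) {
    // Stage 1: the provider enforces the schema itself (e.g. a response_format / output_format option).
    return promptWithNativeResponseFormat(format)
  }
  // Stage 2: fall back to the injected StructuredOutput tool + validation + retries.
  return promptWithStructuredOutputTool(format)
}
```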

@eXamadeus

eXamadeus commented Jan 19, 2026

BTW @K-Mistele, want to move the "architecture" talk back to the issue? It might be larger than just this PR, especially if we do a two-stage thing (this PR + some other stuff).

@K-Mistele
Author

> Unfortunately no, this was just me messing with it across different models.

Good to know, let's set aside guesswork for now then.

> I do think it's worth doing some research on before merging into OpenCode core though.

This is a straightforward feature, and a research project before merging seems unwarranted; happy to leave that decision up to the maintainers, though.

> I don't think that's true (at least not currently). I just pulled the latest from the claude agents SDK and they are using the output_format parameters which corresponds with their API.

Unfortunately, this is guesswork, and it is incorrect. Perhaps I should've been clearer: the tool-injection approach is precisely how Claude Code implements this feature, based on analysis of proxied network traffic from the harness.

It does not use the output_format feature in the API, it injects a tool called StructuredOutput.

The prompts I provided are very similar to what the Claude harness uses for the StructuredOutput tool it injects for this purpose.
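
Concretely, the proxied request contains a tool entry shaped roughly like this (sketch; description paraphrased and schema values illustrative, not verbatim):

```ts
// Roughly what appears in the request: a standard tool definition whose input
// schema is the user-supplied JSON Schema. Description and schema are illustrative.
const userSuppliedJsonSchema = {
  type: "object",
  properties: { answer: { type: "string" } },
  required: ["answer"],
}

const injectedTool = {
  name: "StructuredOutput",
  description: "Return your final answer in the required structured format.",
  input_schema: userSuppliedJsonSchema,
}
```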

> I'm fairly certain Claude Code does NOT do this because they only call into their APIs which support this. It would be nonsensical for them to not use it, actually.

See above. I am precisely certain this is exactly the approach the harness uses. Verifying this is trivial.

> I also didn't see any retry logic in the code I was reading through.

Correct; retries are a tactical implementation decision to compensate for the smaller models or lower-precision inference providers that OpenCode supports and that many users may wish to use.

@eXamadeus

eXamadeus commented Jan 19, 2026

> @eXamadeus:
> I'm fairly certain Claude Code does NOT do this because they only call into their APIs which support this. It would be nonsensical for them to not use it, actually.
>
> @K-Mistele:
> See above. I am precisely certain this is exactly the approach the harness uses. Verifying this is trivial.

I attempted to do just that with a proxy and didn't see that tool. Granted, I also don't know how I would specifically get Claude Code to trigger a structured output. I got it to ask a question, but that just called its AskUserQuestion tool. Do you have more information on how you verified this? Or could you provide the verification?

The only reason I could see for Claude Code using a tool like that at the end of the generation would be to ensure that streaming responses (which return chunks of JSON) end up as a properly formatted result. But I feel like that would be best done in the actual harness, not by relying on a non-deterministic LLM to hopefully call the tool.

> @eXamadeus:
> I also didn't see any retry logic in the code I was reading through.
>
> @K-Mistele:
> Correct, retries are a tactical implementation decision to compensate for smaller models or lower-precision inference providers which OpenCode supports and which many users may wish to use.

I think you misunderstood me, or maybe I am misunderstanding you...

I get the value of retries, and I see that you provided a way to configure the number of retries. The issue I'm seeing is that the logic to perform those retries, increment the attempt count, and check for retry limits seems to be missing. If you pass in retryCount: 3, from what I can tell the tooling does not perform retries when a failure occurs.

Am I missing something? How does this implementation handle the retries?

@K-Mistele
Author

K-Mistele commented Jan 19, 2026

Checking on the retries right now.

> I attempted to do just that with a proxy and didn't see that tool. Granted, I also don't know how I would specifically get Claude Code to trigger a structured output. Do you have more information on how you verified this?

Yes, use the Claude Agent SDK (https://platform.claude.com/docs/en/agent-sdk/typescript) with outputFormat specified.

A link to more details about how to use structured outputs in the Agent SDK is given in the PR description (or here).

Do that and put a proxy in front of it. You'll see something like this (note that the input parameters and descriptions come from your specified JSON schema):

[Screenshot, 2026-01-19: proxied request showing the injected StructuredOutput tool]

@eXamadeus

eXamadeus commented Jan 19, 2026

> Checking on the retries right now.

Sounds good!

> You'll see something like this

Ahh, that's very interesting. Thanks for providing that!

What's interesting is I know that the Anthropic Client SDK uses the underlying API output_format directly, so I'm a little surprised that their Agent SDK doesn't. There's probably some reason for that choice, like supporting Bedrock models, older APIs, etc., but that's anyone's guess, as Anthropic isn't exactly...open with their processes.

The tool/validate/retry loop you proposed here is good for all the reasons you mentioned. So just to be clear, I'm not against using this approach at all. I think it's a good method to ensure it "always" (within reason) works regardless of the underlying provider/model capability/etc.

I also want to be clear that I'm not against using a tool to do the validation. My anecdotal experience was just that: anecdotal. I simply wanted to provide context to see whether that had been tested beforehand. I certainly didn't do a thorough analysis of the best method to use, and I assume Anthropic verified their approach, but only against their models.

When it comes to relying on output from just a prompt vs. requiring a tool, I would much rather rely on a tool. The only reason I opted to do something else in my plugin was that certain low-powered OpenAI models didn't reliably call tools. I wanted to use a tool and am now thinking I'll try that method again. It's possible I didn't implement it fully by requesting that the model always call a tool.

The only real concern I have for this PR is the retry loop. The discussion about using provider APIs directly is not material to this work, as it would be a follow-up regardless.

K-Mistele and others added 2 commits January 19, 2026 11:28
Add 5 new unit tests that verify the retry mechanism for structured
output validation:
- Multiple validation failures trigger multiple onError calls
- Success after failures correctly calls onSuccess
- Error messages guide model to fix issues and retry
- Simulates retry state tracking matching prompt.ts logic
- Simulates successful retry after initial failures

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@eXamadeus

eXamadeus commented Jan 20, 2026

Thanks for adding the tests! I want to make sure I'm following the implementation correctly.

Looking at the test "simulates retry state tracking (like prompt.ts does)", I see it creates its own structuredOutputRetries counter and implements a retry loop within the test itself. But when I look at prompt.ts, I'm not finding the corresponding loop in the actual implementation.

Here's the specific flow I'm trying to trace:

  1. User configures retryCount: 3
  2. Model produces invalid output (fails schema validation)
  3. What happens next?

From what I can see in prompt.ts:

  • Line ~660 creates StructuredOutputError with retries: 0 hardcoded
  • The loop breaks immediately
  • outputFormat.retryCount is stored but never read

For retries to work, I'd expect the code to:

  1. Catch the validation failure
  2. Inject an error message back into the conversation (so the LLM knows to try again)
  3. Continue the loop instead of breaking
  4. Only break with StructuredOutputError after exhausting retryCount

Without steps 2-3, the LLM never gets another chance. It doesn't know validation failed because the loop already exited.
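
Roughly the shape I'd expect (illustrative sketch only; these names don't correspond to the actual prompt.ts code):

```ts
// Illustrative sketch of the retry flow described above; names and shapes are invented.
type Validation = { ok: true; value: unknown } | { ok: false; error: string }

declare function runAgentTurn(feedback?: string): Promise<string> // returns the StructuredOutput tool input
declare function validateAgainstSchema(input: string): Validation

async function promptWithRetries(retryCount: number): Promise<unknown> {
  let attempts = 0
  let feedback: string | undefined
  while (true) {
    const toolInput = await runAgentTurn(feedback)
    const result = validateAgainstSchema(toolInput)
    if (result.ok) return result.value // success: exit with the validated object
    attempts++
    if (attempts > retryCount) {
      // only now surface the failure (StructuredOutputError in the real implementation)
      throw new Error(`structured output validation failed after ${attempts} attempts: ${result.error}`)
    }
    // steps 2-3: feed the validation error back so the model can correct itself, then loop
    feedback = `StructuredOutput validation failed: ${result.error}. Call the tool again with corrected input.`
  }
}
```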

Could you point me to where this happens? I might be looking at the wrong place, but I've traced through prompt.ts a few times and I'm not seeing it.

The tests verify that createStructuredOutputTool correctly fires onSuccess/onError callbacks when called, but they test the tool in isolation by manually invoking execute() multiple times. What I'm trying to find is where prompt.ts itself re-invokes the loop on failure.

Happy to hop on a call if that's easier! I just want to make sure users get the retry behavior they're expecting.
