Skip to content

fix(ai): undo strict-mode null-widening before structured-output validation#732

Open
DrewHoo wants to merge 2 commits into
TanStack:mainfrom
DrewHoo:fix/structured-output-schema-aware-null-normalization
Open

fix(ai): undo strict-mode null-widening before structured-output validation#732
DrewHoo wants to merge 2 commits into
TanStack:mainfrom
DrewHoo:fix/structured-output-schema-aware-null-normalization

Conversation

@DrewHoo

@DrewHoo DrewHoo commented Jun 9, 2026

Copy link
Copy Markdown

Summary

Structured output rejects a valid provider response when the schema has an optional field. To satisfy OpenAI-style strict schemas, optional fields are widened to required + nullable, so the provider returns null for an absent optional — but that null is then validated against the original schema, where .optional() means T | undefined, not T | null. Validation throws.

This adds a schema-aware undoNullWidening to @tanstack/ai-utils and runs it on the structured-output result before Standard Schema validation, so the synthesized nulls are dropped while genuine .nullable()/.nullish() nulls are preserved.

Fixes #731

Reproduction

import * as v from 'valibot'
import { toStandardJsonSchema } from '@valibot/to-json-schema'

const schema = toStandardJsonSchema(
  v.object({ title: v.string(), note: v.optional(v.string()) }),
)
// chat({ outputSchema: schema, ... }) with @tanstack/ai-openrouter
// → StandardSchemaValidationError: Invalid type: Expected string but received null

The outbound schema sent to the provider widens note to { "type": ["string", "null"] } and adds it to required, so the model returns { "title": "…", "note": null }. The original Valibot schema rejects that null.

Root cause

convertSchemaToJsonSchema(schema, { forStructuredOutput: true }) widens optional → required + nullable (correct, and required for strict mode). The round trip just never undoes it: the response is validated against the un-widened original via parseWithStandardSchema, which rejects the null the widening told the model to send.

Fix

  • @tanstack/ai-utils gains undoNullWidening(value, schema) — a schema-aware counterpart to transformNullsToUndefined. It consults the original (un-widened) JSON Schema and drops a null only where the schema disallows one (a synthesized widening), preserving nulls a .nullable()/.nullish() field genuinely allows, and leaving nulls under shapes the schema doesn't describe untouched.
  • @tanstack/ai runs it inside the structured-output validate callback (activities/chat), before parseWithStandardSchema, using convertSchemaToJsonSchema(outputSchema) (no forStructuredOutput) as the nullability source of truth.

Why core, not the adapter

By the time the schema reaches the adapter it's already been widened (forStructuredOutput: true), so the adapter can no longer tell a synthesized null from a genuine one. Only the activity layer still holds the original schema, so the normalization has to live there. This also fixes it for every adapter on the validated path at once, rather than per-adapter.

Tests

  • @tanstack/ai-utils: unit coverage for undoNullWidening — optional vs. nullable, mixed, nested nullable objects (via anyOf), array items, schemaless passthrough.
  • @tanstack/ai: end-to-end chat({ outputSchema }) regression — a provider null on an optional field now resolves (key absent), and a null on a .nullable() field is preserved.
  • Full @tanstack/ai suite (1005 tests) + tsc + eslint pass.

Out of scope / follow-up

There's a symmetric latent issue in the @tanstack/openai-base adapters: they blind-strip all nulls (transformNullsToUndefined), so a genuinely .nullable() required field that returns null is stripped and then fails validation as missing — the inverse of the bug fixed here.

field passthrough adapters (e.g. OpenRouter) blind-strip adapters (openai-base)
optional (T | undefined) ❌ before this PR → ✅
nullable required (T | null)

Unifying that is intentionally not in this PR: the openai-base null-strip runs inside structuredOutputStream (shared by the non-streaming Promise path and the streamed path, which re-emits the adapter chunk), so fixing it cleanly means either threading the original schema down to the adapters or normalizing the streaming chunk pipeline — a larger, more invasive change. Happy to follow up with that if you'd prefer a single unified pass; undoNullWidening is exported and ready to be reused there.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Structured output validation now treats provider-emitted nulls for optional fields as absent (omitted), while preserving genuine nullable fields, preventing spurious validation errors.
  • New Features

    • Added a schema-aware null-normalization utility to strip only synthesized nulls when a schema is available, improving validation accuracy.
  • Tests

    • Added tests covering optional vs. nullable behavior, arrays, tuples, and ambiguous schema cases.

…dation

Optional fields are widened to required+nullable for strict structured output, so providers return `null` for an absent optional. Validating that `null` against the original schema failed (`.optional()` is `T | undefined`, not `T | null`), surfacing as a StandardSchemaValidationError — most visibly through @tanstack/ai-openrouter, whose adapter preserves provider nulls.

Add `undoNullWidening(value, schema)` to @tanstack/ai-utils: a schema-aware counterpart to `transformNullsToUndefined` that drops only synthesized nulls (those the original JSON Schema disallows) while preserving the ones a `.nullable()`/`.nullish()` field genuinely allows. The chat activity runs it on the structured-output result before Standard Schema validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3007340e-a5b2-4470-bc11-f1e05dce2cdd

📥 Commits

Reviewing files that changed from the base of the PR and between 16af8c9 and 3f44678.

📒 Files selected for processing (2)
  • packages/ai-utils/src/transforms.ts
  • packages/ai-utils/tests/transforms.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/ai-utils/src/transforms.ts

📝 Walkthrough

Walkthrough

This PR adds a schema-aware undoNullWidening utility to ai-utils, re-exports it, integrates it into chat structured-output validation (applies before Standard Schema parsing when outputSchema is a Standard JSON Schema), adds unit and end-to-end tests, and records the change in a changeset with dependency updates.

Changes

Structured Output Null Normalization

Layer / File(s) Summary
undoNullWidening utility implementation
packages/ai-utils/src/transforms.ts, packages/ai-utils/src/index.ts
New JsonSchemaNode type and undoNullWidening(value, schema?) function walk JSON-shaped values with a minimal schema to remove only nulls disallowed by the original schema, resolving composite schemas via anyOf/oneOf, handling tuple vs uniform arrays, and omitting keys collapsed to undefined. transformNullsToUndefined JSDoc clarified and ai-utils index re-exports updated.
ai-utils unit tests
packages/ai-utils/tests/transforms.test.ts
Tests for undoNullWidening: optional-key null removal, nullable null preservation, mixed optional/nullable, nested anyOf recursion, array item and tuple handling, ambiguous anyOf preservation, schema-absent passthrough, and unknown-property null preservation.
Chat structured-output validation integration
packages/ai/src/activities/chat/index.ts, packages/ai/package.json
Imports undoNullWidening and isStandardJSONSchema, derives a validation JSON Schema when outputSchema is a Standard JSON Schema, applies undoNullWidening(data, validationSchema) before calling parseWithStandardSchema, and adds @tanstack/ai-utils workspace dependency.
Structured output null normalization tests
packages/ai/tests/chat-structured-output-null-normalization.test.ts
End-to-end tests verify optional provider null is dropped before validation (field omitted) and .nullable() provider null is preserved (field present with null).
Changeset documentation
.changeset/structured-output-undo-null-widening.md
Records the fix for strict-mode null-widening validation errors, documents the new undoNullWidening utility, and captures version bumps for ai-utils (minor) and ai (patch).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • TanStack/ai#662: Updates convertSchemaToJsonSchema standard-schema detection logic referenced by the structured-output validation path.

Suggested reviewers

  • AlemTuzlak
  • tombeckenham

Poem

🐰 I nudged a null back to where it came from, neat —
Synth-tails removed, real nulls kept sweet.
Fields that were optional hop out of sight,
Nullable friends stay, in plain daylight.
Cheers — a tidy payload, tidy delight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding schema-aware null-widening reversal before structured-output validation to fix validation failures.
Description check ✅ Passed The description is comprehensive and covers the problem statement, root cause, fix implementation, test coverage, and out-of-scope considerations with clear explanations and code examples.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-utils/src/transforms.ts`:
- Around line 84-95: resolveSchema currently returns the first
object/array-shaped variant from schema.anyOf/schema.oneOf which can be wrong
when multiple non-null branches match; change the logic in resolveSchema to
collect all non-null matching variants (using the same shape-check: array vs
object heuristics), and only return a variant when exactly one non-null match is
found, otherwise return the original schema to avoid destructive rewrites that
strip valid null branches; reference the resolveSchema function and the local
variables variants, match, isArray when making this change.
- Around line 110-115: The array handling currently applies resolved.items[0] to
every element; change the mapping to pick the schema per index when
resolved.items is an array: compute itemSchemaForIndex =
Array.isArray(resolved.items) ? resolved.items[i] ||
resolved.items[resolved.items.length - 1] || resolved.additionalItems ||
resolved.items[0] : resolved.items, then call value.map((item, i) => walk(item,
itemSchemaForIndex)); update the block around resolveSchema, resolved, and walk
to use the index-aware schema selection so tuple-style schemas are handled
correctly.

In `@packages/ai-utils/tests/transforms.test.ts`:
- Around line 2-6: The import order violates ESLint import/order: combine the
type and value imports into a single statement or place the value imports before
the type import so types don't come first; specifically adjust the imports
involving JsonSchemaNode, transformNullsToUndefined, and undoNullWidening so
they are imported in one consolidated line or with transformNullsToUndefined and
undoNullWidening imported before the type JsonSchemaNode to satisfy the linter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd2ec226-feb0-42be-b3f6-5e7d26a784d9

📥 Commits

Reviewing files that changed from the base of the PR and between e8ce0e1 and 16af8c9.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (7)
  • .changeset/structured-output-undo-null-widening.md
  • packages/ai-utils/src/index.ts
  • packages/ai-utils/src/transforms.ts
  • packages/ai-utils/tests/transforms.test.ts
  • packages/ai/package.json
  • packages/ai/src/activities/chat/index.ts
  • packages/ai/tests/chat-structured-output-null-normalization.test.ts

Comment thread packages/ai-utils/src/transforms.ts
Comment thread packages/ai-utils/src/transforms.ts Outdated
Comment thread packages/ai-utils/tests/transforms.test.ts Outdated
…oNullWidening

Addresses CodeRabbit review on TanStack#732:

- resolveSchema now descends only when exactly one non-null anyOf/oneOf branch matches the value's shape; ambiguous unions keep the original schema rather than risk stripping a null a sibling branch allows.

- Array walking applies tuple-style `items: [a, b, …]` schemas per index instead of always using the first.

Adds coverage for both and fixes the test's import order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DrewHoo

DrewHoo commented Jun 9, 2026

Copy link
Copy Markdown
Author

Thanks @coderabbitai — all three addressed in 3f44678:

  • resolveSchema ambiguous branches: now resolves only when exactly one non-null anyOf/oneOf branch matches the value's shape; ambiguous unions keep the original schema and stop descending, rather than risk stripping a null a sibling branch allows.
  • Tuple items: array walking applies items: [a, b, …] per index instead of always using the first.
  • Test import order: fixed — used a separate top-level import type rather than the suggested inline type specifier, since the repo's import/consistent-type-specifier-style rule requires the separate form.

Added unit coverage for both the tuple and ambiguous-union cases.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@DrewHoo: Thanks for the thorough follow-up! All three fixes look well-considered — especially the conservative approach to ambiguous union resolution (keeping the original schema and stopping descent rather than guessing) which avoids incorrectly stripping nulls that a sibling branch permits. The tuple-aware index walking and the additional unit coverage for both new cases are great additions.

(ノ✧▽✧)ノ 🐇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

@tanstack/ai-openrouter: optional structured output fields are rewritten as nullable but rejected by original schema validation

1 participant