Skip to content

fix: restore lenient variable coercion for built-in scalars in GraphQL runtime#655

Open
vsseixaso wants to merge 1 commit into
masterfrom
fix/lenient-graphql-variable-coercion
Open

fix: restore lenient variable coercion for built-in scalars in GraphQL runtime#655
vsseixaso wants to merge 1 commit into
masterfrom
fix/lenient-graphql-variable-coercion

Conversation

@vsseixaso
Copy link
Copy Markdown
Contributor

Summary

Restore the pre-graphql@14.5 lenient coercion of GraphQL variables for the built-in Int, Float, and String scalars in the runtime that ships with @vtex/api. Apps migrating from node@4.x (which used @vtex/api@^3 with graphql@^0.13.2) to node@7.x (@vtex/api@^7 with graphql@^14.5.8) started seeing GraphQL validation errors for requests that previously worked, even though no resolver, schema, or client code had changed.

This patch is a runtime-only fix in node-vtex-api — no schema changes, no app changes, and no API surface changes.

Problem

graphql-js 14.5+ made coerceInt, coerceFloat, and coerceString strict. Before 14.5, those scalars used Number(value) and String(value) and accepted these implicit conversions; in 14.5+ they throw immediately, e.g.:

Variable "$categoryId" got invalid value "20"; Expected type Int.
Int cannot represent non-integer value: "20"

Variable "$items" got invalid value 1 at "items[0].quantity"; Expected type String.
String cannot represent a non string value: 1

Variable "$geoCoordinates" got invalid value 12.4964 at "geoCoordinates[1]"; Expected type String.
String cannot represent a non string value: 12.4964

Variable "$postalCode" got invalid value 1000; Expected type String.
String cannot represent a non string value: 1000

These errors come from coerceVariableValues deep inside graphql-js, called from this very runtime's run.ts middleware. They reproduce reliably by hitting an app's internal __graphql route directly (i.e. without going through vtex.graphql-server, which already does its own pre-coercion before proxying).

The same requests succeeded under @vtex/api@^3 because graphql@0.13.2's coerceInt did Number(value) on a string and coerceString did String(value) on a number. The migration from node@4.xnode@7.x mandates @vtex/api@^7, which transitively bumps graphql-js to ^14.5.8, surfacing this regression for every app on node@7.x whose clients send numeric strings for Int (or vice versa).

How the fix works

A new pure utility coerceVariableValuesLenient(schema, document, variables, operationName) is invoked from the existing run middleware before execute():

  1. Locates the OperationDefinitionNode for the request (resolves operationName when there are multiple operations, no-ops on ambiguity).
  2. For each VariableDefinitionNode, resolves the AST type to a GraphQLInputType via typeFromAST.
  3. Recursively walks NonNull, List, and InputObject wrappers, applying lenient coercion at the leaves only when the target is a built-in scalar:
    • Int ← string of a finite, in-range integer; boolean
    • Float ← string of a finite number; boolean
    • String ← number, boolean
    • ID ← number → string
  4. Returns a new object — the caller's input is never mutated.

Custom scalars and enums are left untouched. Values that are not safely convertible (e.g. "abc" for Int, "20.5" for Int, arrays for scalars, objects for String, integers outside Int32 range) are passed through unchanged so that graphql-js still emits its native error. We do not silently mask invalid client input.

The whole pre-processor is wrapped in a try/catch that falls back to the original variables on any internal failure, so the worst case is identical to today's behavior.

Disabling

Set the env var to opt out without redeploying:

VTEX_API_DISABLE_LENIENT_VARIABLE_COERCION=true

Files changed

File Type Purpose
src/service/worker/runtime/graphql/utils/coerceVariablesLenient.ts new Pre-processor implementation
src/service/worker/runtime/graphql/utils/coerceVariablesLenient.test.ts new 37 unit tests
src/service/worker/runtime/graphql/middlewares/run.ts modified Calls the pre-processor before execute()
CHANGELOG.md modified [Unreleased] entry

Compatibility & safety

  • No schema or API change. GraphQLInt, GraphQLString, etc. continue to be the standard built-ins; resolvers receive the same JS types as before (number for Int, string for String).
  • No global mutation. Nothing on the graphql-js singletons or the schema instance is patched. The coercion is local to the request pipeline.
  • No new dependency. Uses only public graphql-js exports (typeFromAST, isScalarType, isListType, isNonNullType, isInputObjectType).
  • Resolver contract preserved. Coerced values are still produced by graphql-js's own coerceVariableValues after our pass; we only feed it a value of the expected JS type when a safe conversion exists.

Test plan

  • Unit tests for every error pattern observed in production logs ($categoryId: Int = "20", ShippingItem.quantity = 1, ShippingItem.seller = 1, ShippingItem sent as a single object instead of a list, geoCoordinates: [String] sent as [Float], postalCode: String = 1000, pageSize: Int = "2").
  • Unit tests confirming no coercion when the value is already correctly typed.
  • Unit tests confirming values that should still raise the standard graphql-js error ("abc", "", "20.5", arrays, objects, ints outside Int32 range) are passed through untouched.
  • Unit tests for [Int!]!, nested InputObjects, unknown fields preserved.
  • Unit tests for boolean/number ↔ scalar conversions matching graphql 0.13 parity.
  • Unit tests for operation selection (operationName disambiguation, single-operation default, multi-operation no-op).
  • Unit tests for the VTEX_API_DISABLE_LENIENT_VARIABLE_COERCION feature flag.
  • End-to-end tests with a real execute() call confirming the production error disappears with the fix and reappears without it.
  • Full pre-existing test suite remains green (170 tests pass; the 4 failing suites are pre-existing TypeScript compilation errors in unrelated LRUCache/axios/metrics files — the same 4 fail on master without this change).
  • No new TS or TSlint errors in the modified files.

Rollout plan

  1. Land this patch and cut a 7.3.2 release.
  2. Apps on node@7.x pick it up via yarn upgrade @vtex/api@^7.3.2.
  3. Optional: short canary on a beta workspace before promoting to stable globally.
  4. Rollback path: ship VTEX_API_DISABLE_LENIENT_VARIABLE_COERCION=true to the affected runner — instant, no redeploy. Or revert the commit and ship 7.3.3.

Made with Cursor

…L runtime

graphql-js 14.5+ (bundled with @vtex/api since v6) made variable coercion
strict for the built-in `Int`, `Float`, and `String` scalars: `Int.parseValue`
rejects strings, `Float.parseValue` rejects strings, and `String.parseValue`
rejects numbers. This broke apps migrating from `node@4.x` (which used
`@vtex/api@^3` with graphql 0.13, where these conversions were implicit) to
`node@7.x` — well-formed clients started receiving errors like:

    Variable "$categoryId" got invalid value "20"; Expected type Int.
    Int cannot represent non-integer value: "20"

Pre-coerce variable values against the operation's variable definitions
before calling `execute()`, replicating the lenient conversions of
graphql-js 0.13 (string of finite integer → Int, number → String, etc.).
The traversal walks NonNull, List, and InputObject wrappers so nested
fields are coerced as well. Custom scalars and enums are left untouched.

Values that cannot be safely converted (e.g. `"abc"` for Int, floats with
fractional parts for Int, arrays for scalars) are passed through unchanged,
so graphql-js still produces the same native error.

Disable with `VTEX_API_DISABLE_LENIENT_VARIABLE_COERCION=true`.

Co-authored-by: Cursor <cursoragent@cursor.com>
@vsseixaso vsseixaso requested a review from mendescamara May 5, 2026 20:28
@vsseixaso
Copy link
Copy Markdown
Contributor Author

vsseixaso commented May 5, 2026

@mendescamara dropped an interesting suggestion offline that we're evaluating as an alternative to (or in addition to) the pre-processor in this PR. Posting it here so the analysis lives next to the code and we can decide together.

The alternative: schema-side parseValue patch

Override the scalar's parseValue directly on the schema instance, in src/service/worker/runtime/graphql/schema/index.ts, right after makeExecutableSchema:

const IntType = schema.getType('Int') as GraphQLScalarType
IntType.parseValue = (value) => {
  const parsed = parseInt(value, 10)
  if (isNaN(parsed)) throw new Error(`Not a valid integer: ${value}`)
  return parsed
}

It's much smaller, idiomatic to graphql-js, and would also cover parseLiteral (inline literals in the query string) if extended.

Things to validate before going down this path

A couple of subtleties surfaced while looking at the implementation. Sharing them so we can decide whether they're acceptable, mitigable, or actual blockers.

1. schema.getType('Int') returns the global singleton, not a per-schema clone.

Confirmed by reading node_modules/graphql/type/scalars.js:

var GraphQLInt = new _definition.GraphQLScalarType({ name: 'Int', ... })
exports.GraphQLInt = GraphQLInt

schema.getType('Int') is the same reference as require('graphql').GraphQLInt. Mutating it would leak to every other consumer of graphql in the same node process — other schemas, federated schemas, request validators, introspection tooling, third-party packages, tests of unrelated suites, etc. As far as I could see, scoping the change to a single schema with graphql-tools@4 would mean poking schema._typeMap (private API, fragile across graphql-js versions). Worth checking if there's a cleaner hook I missed.

2. parseInt(value, 10) doesn't match graphql 0.13's coercion in a few edge cases.

The example uses parseInt, but graphql 0.13 used Number(value) + an integer check. The two differ in ways that matter:

Input parseInt(value, 10) graphql 0.13 (Number + integer check) What we want
"20.5" 20 (silently truncates) throws throws
"20abc" 20 (stops at non-digit) throws throws
"3e2" 3 (not 300) 300 300
true NaN 1 1
[20] 20 (Array.toString) throws throws

The first row is the one that worries me most — silently truncating "20.5" to 20 would mask a real client bug instead of surfacing it. Using Number(value) + Number.isInteger() + Int32 range check would address this and match 0.13.x semantics exactly. Easy to fix in either approach.

Where the schema patch would clearly be the better tool

If we ever decide to also relax inline literals (category(id: 20.5) written directly in the query string, no variables), parseLiteral is the right hook and the schema-side approach is the natural fit. That's a separate, narrower scope — the prod logs don't show any such cases today, so it's not blocking this PR, but it's a good reason to keep the schema-side approach on the table.

Suggested next step

Happy to take the schema-side route if we can find a way to scope the mutation per schema (or convince ourselves the global mutation is acceptable). Until then, the pre-processor in this PR is a conservative way to stop the bleeding without touching shared state. WDYT?

rootValue: null,
schema,
variableValues,
variableValues: coercedVariables,
Copy link
Copy Markdown

@mendescamara mendescamara May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vamos precisar testar, mas acho que esse approach não vai contemplar os casos de variáveis não literais inline, quando o valor é informado diretamente na Query sem usar o parâmetro variables 🤔 (ex. user(id: "123"))

Diferente de quando passa por variável:

// A query permanece estática (não muda)
const GET_USER = `
  query GetUser($userId: ID!) {
    user(id: $userId) {
      name
    }
  }
`;

// Os valores são passados separadamente
const variables = { userId: "123" };

client.query({ query: GET_USER, variables });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um outro approach seria fazer o Override dos Scalars Nativos via Schema Transform

Após makeExecutableSchema em schema/index.ts, manipular o _typeMap da schema para substituir os scalars built-in

Copy link
Copy Markdown
Contributor Author

@vsseixaso vsseixaso May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validei aqui e inline literal não é coberto mesmo. E também o inline já era rejeitado no graphql 0.13, peguei os erros do log de prod e todos passam as variáveis pelo objeto variables, nenhum usa inline literal. Então o escopo atual resolve os casos reais, sem regressão.

Fiz um teste mandando uma query com inline literal pra mostrar pra identificar a diferença na mensagem de erro:

{
  "errors": [{
    "message": "Expected type Int, found \"20\".",
    "stack": "GraphQLError: Expected type Int, found \"20\".\n    at isValidScalar (.../ValuesOfCorrectType.js:159)\n    at validate (.../validate.js:73)\n    at parseAndValidateQueryToSchema (.../query.js:28)"
  }]
}

Erro quando vem com Variable:

{
  "errors": [{
    "message": "Variable \"$categoryId\" got invalid value \"20\"; Expected type Int. Int cannot represent non-integer value: \"20\"",
    "stack": "TypeError: Int cannot represent non-integer value: \"20\"\n    at GraphQLScalarType.coerceInt [as parseValue] (.../scalars.js:55)\n    at coerceInputValueImpl (.../coerceInputValue.js:127)\n    at coerceVariableValues (.../values.js:119)\n    at runHttpQuery (.../run.js:9)"
  }]
}

No caso, a decisão fica sendo sobre querer suportar algo além do graphql 0.13 ou se fazer "Override dos Scalars Nativos via Schema Transform" teria algum ganho adicional

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top! Então na versão antiga (graphql 0.13) já tinha essa proteção para o inline?

To super confortável de seguir com esse approach 👊🏻

Copy link
Copy Markdown

@mendescamara mendescamara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants