fix(core): resolve forwarded stream keys across deployments#2191
fix(core): resolve forwarded stream keys across deployments#2191pranaygp wants to merge 1 commit into
Conversation
🦋 Changeset detectedLatest commit: bf73d05 The changes in this PR will be included in the next version bump. This PR includes changesets to release 20 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
❌ Some benchmark jobs failed:
Check the workflow run for details. |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (12 failed)express (1 failed):
fastify (3 failed):
hono (2 failed):
nextjs-turbopack (1 failed):
nuxt (3 failed):
sveltekit (1 failed):
vite (1 failed):
📦 Local Production (1 failed)sveltekit-stable (1 failed):
📋 Other (2 failed)e2e-vercel-prod-tanstack-start (2 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
❌ 📦 Local Production
✅ 🐘 Local Postgres
❌ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
TooTallNate
left a comment
There was a problem hiding this comment.
Approve — solid fix, with one small consistency concern worth thinking about
The cross-deployment encryption bug is real and the fix is well-targeted: add deploymentId to the writable stream descriptor wire format, stamp it via STREAM_SERVER_DEPLOYMENT_ID_SYMBOL on writables that have it, and use it in getEncryptionKeyForRun(runId, { deploymentId }) instead of the assumed-current-deployment fallback. Legacy descriptors without a deploymentId fall back to loading the owning run via world.runs.get(runId) — that's correctness-preserving for in-flight serialized payloads.
Verified
- After rebasing onto current
main: 12 files, +278/-22 (the scary "removals" ofexperimentalSetAttributesfrominterfaces.tsand the encryption error handling inserialization.tsare stale-branch artifacts from being forked before #2134, #2157, and #2145 landed) pnpm install --frozen-lockfile✓pnpm turbo run build --filter @workflow/core✓pnpm --filter @workflow/core test✓ (1083/1083)- The two new serialization tests both pass:
- Happy path — forwarded writable with
deploymentId→ usesgetEncryptionKeyForRun(runId, { deploymentId })and does NOT callruns.get - Legacy fallback — descriptor without
deploymentId→ callsruns.get(runId)first, thengetEncryptionKeyForRun(run)✓
- Happy path — forwarded writable with
CI noise
14 failures on the latest run, but they're all cascading from swc-plugin-workflow WASM build failing with undefined symbol: __emit_diagnostics. That's the bug fixed by #2174 (5dabbeeca fix(swc-plugin): allow wasm host imports during link) which landed on main after this branch was forked. A rebase would resolve them all.
One thing worth thinking about — step-handler.ts uses process.env.VERCEL_DEPLOYMENT_ID
The two workflowDeploymentId sites are:
| Site | Source |
|---|---|
runtime.ts (inline step execution) |
workflowRun.deploymentId / bgRun.deploymentId — the workflow's ACTUAL deployment ✓ |
step-handler.ts (background step execution) |
process.env.VERCEL_DEPLOYMENT_ID — the CURRENT runtime deployment |
For background steps, these are only the same if the queue routes steps to the workflow's original deployment. If queues route steps to a NEWER deployment, process.env.VERCEL_DEPLOYMENT_ID is the newer one, not the workflow's.
Reading the world-vercel getEncryptionKeyForRun, the existing background-step encryption key resolution ALREADY assumes current-deployment (it calls memoizeEncryptionKey(world, workflowRunId) with no context, which then uses local HKDF when running inside Vercel). So this PR's choice in step-handler.ts is at least consistent with how step encryption already works — if the existing background-step encryption is correct (i.e., queue routes to original deployment, or some other invariant), then the new workflowDeploymentId source is also correct.
If the existing encryption has a latent bug for background steps running on newer deployments, this PR carries that forward — but doesn't make it worse, and the parent/child cross-deployment case (which is what the PR is actually trying to fix) IS handled correctly via the inline runtime.ts paths using workflowRun.deploymentId.
Worth confirming the invariant: do background step queues always route to the workflow's original deployment, or can they route to newer deployments? If the latter, there's a follow-up to also resolve the workflow's actual deployment in step-handler.ts (probably load the run first, similar to the legacy fallback in this PR's getForwardedWritableEncryptionKey).
Not a blocker — the PR's scope is the parent/child writable case, and that's handled correctly.
Bonus: the test design caught a subtle correctness point
The legacy-fallback test asserts the right behavior when descriptors are mid-flight from older SDK versions:
expect(runsGet).toHaveBeenCalledWith('wrun_parent');
expect(getEncryptionKeyForRun).toHaveBeenCalledWith(parentRun);That confirms the fallback uses the full WorkflowRun overload (which extracts deploymentId from run.deploymentId), not the broken (runId, undefined-context) overload. Good test instinct.
PR status
Currently draft. Approving in spirit pending:
- Rebase onto current
main(resolves the stale-branch CI cascades + the deletions-in-diff confusion) - Optional: thread an explicit
workflowDeploymentIdthrough the background step path if cross-deployment step routing is actually possible
Approving.
Summary
Root Cause
Cross-run writable revivers only carried the parent
runIdand calledgetEncryptionKeyForRun(targetRunId). In@workflow/world-vercel, a string lookup without deployment context resolves against the executing deployment, so a child started on a newer deployment encrypted new chunks using its key while the parent stream is read using the older run key.Context
This upstreams the behavior temporarily patched downstream in https://github.com/vercel/ash/pull/852, while avoiding its extra owner-run lookup for newly serialized descriptors. Legacy or in-flight descriptors retain a correctness fallback.
Validation
pnpm turbo build --filter=@workflow/core...pnpm --filter @workflow/core typecheckpnpm --filter @workflow/core exec vitest run src/serialization.test.ts src/step/writable-stream.test.tspnpm --filter @workflow/core testpnpm biome format <changed files>pnpm changeset status --since=main