[DEBUG] Trace replay event log and step/hook/sleep assignments#2127
Draft
TooTallNate wants to merge 4 commits into
Temporary diagnostic instrumentation for investigating intermittent CorruptedEventLogError 'step consumer mismatch' failures. Emits console.log lines tagged 'WF_TRACE' at four points:
- runWorkflow start: dumps the full event array the replay will consume (eventIds, types, correlationIds, stepNames) plus a sha256 digest
- step/hook/sleep subscribe: per-replay correlationId -> name assignment
- step consumer mismatch: structured record of the failure including the event index in the SDK's view of the log
- runWorkflow end: completed | failed | suspended

Used to diff successive replays of the same runId and confirm whether the SDK actually sees the same event array each time.
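As a rough illustration of the replay-start trace described above, the sketch below hashes the fields listed in the commit message and emits a tagged line. It is not the helper from this PR: the `TraceEvent` shape, the function names `digestEvents`, `formatTrace`, and `traceReplayStart`, and the JSON-after-tag line layout are all assumptions made for the example.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical event shape; the field names follow those listed in the description.
interface TraceEvent {
  eventId: string;
  eventType: string;
  correlationId?: string;
  stepName?: string;
}

// Hash only the fields that matter for replay, in log order, so two replays
// that saw the same event array produce the same digest.
function digestEvents(events: TraceEvent[]): string {
  const canonical = events.map((e) => [
    e.eventId,
    e.eventType,
    e.correlationId ?? null,
    e.stepName ?? null,
  ]);
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}

// Assumed line layout: the WF_TRACE tag followed by one JSON object.
function formatTrace(kind: string, payload: Record<string, unknown>): string {
  return `WF_TRACE ${JSON.stringify({ kind, ...payload })}`;
}

// Emits the replay-start line and returns it so callers can inspect it.
function traceReplayStart(runId: string, events: TraceEvent[]): string {
  const line = formatTrace('replay_start', {
    runId,
    count: events.length,
    digest: digestEvents(events),
    events,
  });
  console.log(line);
  return line;
}
```

Hashing a canonical array rather than the raw event objects keeps the digest insensitive to key order or extra fields, which is what makes it usable as a quick equality check between replays.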
🧪 E2E Test Results
❌ Some tests failed

Summary

❌ Failed Tests
▲ Vercel Production (2 failed)
express (1 failed):
nextjs-webpack (1 failed):
🌍 Community Worlds (69 failed)
mongodb-dev (1 failed):
redis-dev (1 failed):
turso-dev (1 failed):
turso (66 failed):

Details by Category
❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details.
Set WORKFLOW_SERVER_URL_OVERRIDE to https://workflow-server-7pxaxn4d4.vercel.sh to validate Pranay's monotonic-append fix (workflow-server#456) against the hook/sleep stress repro. If the preview server correctly reorders events so eventIds reflect commit order (instead of letting a slow hook_received commit with an early eventId behind a wait_completed that committed first), the corrupted-event-log failures should disappear without any SDK-side fencing changes.
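The ordering problem described above can be checked mechanically on a captured log. The sketch below is a minimal illustration, not server or SDK code: `CommittedEvent` and `findOrderInversions` are hypothetical names, and it assumes eventIds are strings that sort lexicographically in allocation order, which the source does not confirm.

```typescript
// Hypothetical record: one event as it appears in the log, listed in commit order.
interface CommittedEvent {
  eventId: string; // ASSUMPTION: sorts lexicographically in allocation order
  eventType: string;
}

// Returns the indices of events whose eventId is smaller than an eventId that
// committed before them, i.e. places where commit order and id order disagree.
function findOrderInversions(commitOrder: CommittedEvent[]): number[] {
  const inversions: number[] = [];
  let maxSeen = '';
  commitOrder.forEach((event, index) => {
    if (event.eventId < maxSeen) {
      inversions.push(index);
    } else {
      maxSeen = event.eventId;
    }
  });
  return inversions;
}
```

For example, a wait_completed with eventId `0002` that commits before a hook_received with eventId `0001` yields `[1]`. If the server-side fix behaves as described, a log captured against the preview server would be expected to yield an empty array.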
The set-bypass-cookie flow triggers a 307 redirect-and-set-cookie response intended for browsers. Node's fetch in the SDK doesn't follow the cookie dance, so it loops on the 307. Vercel's docs are explicit that for API-client usage the bare x-vercel-protection-bypass header should authenticate each request directly, without set-bypass-cookie.
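A minimal sketch of the header-only approach described above, assuming a bypass secret is available to the caller. `bypassHeaders` and `fetchProtected` are hypothetical helpers written for this example, not functions from the SDK; only the `x-vercel-protection-bypass` header name comes from the text.

```typescript
// Builds request headers that authenticate each request directly.
// Deliberately omits any set-bypass-cookie header, since that flow answers
// with a redirect intended for browsers.
function bypassHeaders(
  secret: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  return { ...extra, 'x-vercel-protection-bypass': secret };
}

// Fetches a protected URL and fails fast on a redirect instead of looping on it.
async function fetchProtected(url: string, secret: string): Promise<Response> {
  const res = await fetch(url, {
    headers: bypassHeaders(secret),
    redirect: 'manual', // do not follow redirects automatically
  });
  if (res.type === 'opaqueredirect' || (res.status >= 300 && res.status < 400)) {
    throw new Error(`Unexpected redirect (${res.status}) from ${url}`);
  }
  return res;
}
```

Using `redirect: 'manual'` is a design choice for this sketch: it turns a silent 307 loop into an immediate, readable error if the header is ever rejected.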
Summary
Temporary diagnostic instrumentation to investigate intermittent CorruptedEventLogError "step consumer mismatch" failures. Not for merging. Branched off stable so the CI tarball can be deployed against the repro app.

What it does
Emits console.log lines tagged WF_TRACE at four points so we can diff successive replays of the same runId:
- runWorkflow start (packages/core/src/workflow.ts): dumps the full event array the replay will consume: eventId, eventType, correlationId, eventData.stepName / resumeAt, plus a sha256 digest for quick equality checks.
- step / hook / sleep subscribe (packages/core/src/step.ts, workflow/hook.ts, workflow/sleep.ts): per-replay assignment of correlationId → stepName / token / resumeAt, with a monotonic per-invocation seq.
- step consumer mismatch (packages/core/src/step.ts): structured record of the failure, including the offending event's index in the SDK's view of the log.
- runWorkflow end: completed | failed | suspended.

All logging is routed through a single throwaway helper, packages/core/src/__debug-replay-trace.ts, so the diff is easy to revert.

How to use
Trigger the repro app, grep Vercel runtime logs for WF_TRACE, then group lines by runId + inv to compare what each replay invocation saw. If digest differs between two replay_start lines for the same runId, the event array is unstable across replays and the root cause is on the server side. If digest matches but the step subscribe seq + stepName mapping differs, the SDK is the source of non-determinism.

Notes
@workflow/core pass).
stable, still 22 after).
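The digest comparison from "How to use" can be scripted once the runtime logs are saved locally. The sketch below is illustrative only: it assumes each trace line carries the WF_TRACE tag followed by a single JSON object with `kind`, `runId`, `inv`, and `digest` fields, which may not match the exact layout the helper emits. `parseReplayStarts` and `unstableRuns` are hypothetical names.

```typescript
interface ReplayStart {
  runId: string;
  inv: string;
  digest: string;
}

const TAG = 'WF_TRACE ';

// Extracts replay_start records from raw log text, skipping unrelated lines.
// ASSUMPTION: the tag is followed by one JSON object per line.
function parseReplayStarts(logText: string): ReplayStart[] {
  const out: ReplayStart[] = [];
  for (const line of logText.split('\n')) {
    const at = line.indexOf(TAG);
    if (at === -1) continue;
    try {
      const rec = JSON.parse(line.slice(at + TAG.length));
      if (rec.kind === 'replay_start') {
        out.push({
          runId: String(rec.runId),
          inv: String(rec.inv),
          digest: String(rec.digest),
        });
      }
    } catch (_err) {
      // Not valid JSON after the tag; ignore the line.
    }
  }
  return out;
}

// Returns the runIds whose replays saw more than one distinct digest,
// i.e. runs where the event array was not stable across replays.
function unstableRuns(starts: ReplayStart[]): string[] {
  const digestsByRun = new Map<string, Set<string>>();
  for (const s of starts) {
    const seen = digestsByRun.get(s.runId) ?? new Set<string>();
    seen.add(s.digest);
    digestsByRun.set(s.runId, seen);
  }
  return Array.from(digestsByRun.entries())
    .filter(([, digests]) => digests.size > 1)
    .map(([runId]) => runId);
}
```

A run that shows up in `unstableRuns` points at the server side. A run that does not would still need its subscribe lines compared per `inv` before the SDK can be ruled in or out.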