fix: deduplicate state during V1-to-V2 schedule migration #9725

Merged
chaptersix merged 4 commits into temporalio:main from chaptersix:fix/schedule-migration-dedup-recent-actions on Apr 6, 2026

chaptersix (Contributor) commented Mar 27, 2026

Summary

Fixes two bugs in the V1-to-V2 (CHASM) schedule migration that produce corrupt RecentActions state:

  1. Duplicate RecentActions entries. V1's recordAction puts the same workflow in both RunningWorkflows and RecentActions. The migration converted both lists independently into BufferedStart entries and concatenated them, creating two entries for the same workflow execution.

  2. Identical RequestIds for concurrent running workflows. convertRunningWorkflowsToBufferedStarts generated the same deterministic RequestId for every running workflow, because all inputs were identical (nominal and actual time were both set to migrationTime). With the ALLOW_ALL overlap policy, multiple workflows can be running at migration time, so several BufferedStart entries ended up sharing one RequestId.

Changes

  • convertRecentActionsToBufferedStarts now filters out entries whose RunId matches a RunningWorkflows entry, since those are already converted by convertRunningWorkflowsToBufferedStarts.
  • convertRunningWorkflowsToBufferedStarts now includes the RunId in the backfillID tag parameter to GenerateRequestID, ensuring each running workflow gets a unique RequestId.
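
The first change can be sketched as a set-based filter. This is a minimal illustration, not the code from this PR: the `RunningWorkflow` and `RecentAction` types and the `filterRecentActions` helper are hypothetical stand-ins for the real schedule state, and only the RunId comparison is taken from the description above.

```go
package main

import "fmt"

// RunningWorkflow and RecentAction are simplified, hypothetical stand-ins
// for the V1 schedule state. The real types carry more fields.
type RunningWorkflow struct {
	WorkflowID string
	RunID      string
}

type RecentAction struct {
	WorkflowID string
	RunID      string
}

// filterRecentActions drops every recent action whose RunID also appears in
// the running list, so each workflow execution is converted exactly once.
func filterRecentActions(recent []RecentAction, running []RunningWorkflow) []RecentAction {
	runningRunIDs := make(map[string]struct{}, len(running))
	for _, wf := range running {
		runningRunIDs[wf.RunID] = struct{}{}
	}

	filtered := make([]RecentAction, 0, len(recent))
	for _, action := range recent {
		if _, isRunning := runningRunIDs[action.RunID]; isRunning {
			continue // already covered by the running-workflow conversion
		}
		filtered = append(filtered, action)
	}
	return filtered
}

func main() {
	running := []RunningWorkflow{{WorkflowID: "wf-2", RunID: "run-b"}}
	recent := []RecentAction{
		{WorkflowID: "wf-1", RunID: "run-a"}, // finished earlier
		{WorkflowID: "wf-2", RunID: "run-b"}, // still running, duplicate
	}
	for _, action := range filterRecentActions(recent, running) {
		fmt.Println(action.WorkflowID, action.RunID)
	}
}
```

Keying the set on RunID rather than WorkflowID keeps earlier, completed runs of the same workflow ID in the converted history.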

V1's recordAction puts the same workflow in both RunningWorkflows and
RecentActions. When migrating to CHASM, both lists were independently
converted to BufferedStarts and concatenated, creating duplicate entries
for the same workflow execution with different RequestIds.

The running entry would later get updated via Nexus callback (COMPLETED),
but the stale duplicate from RecentActions retained its original RUNNING
status, causing the UI to show the same workflow twice with conflicting
statuses.

Filter out RecentActions entries whose RunId matches a RunningWorkflows
entry during V1-to-V2 conversion, since those are already handled by
convertRunningWorkflowsToBufferedStarts.

When multiple workflows are running concurrently (ALLOW_ALL overlap
policy), convertRunningWorkflowsToBufferedStarts generated identical
RequestIds for all of them because the only differentiating inputs
(nominal/actual time) were both set to migrationTime.

Include the RunId in the backfillID tag parameter so each running
workflow gets a distinct RequestId. Without this, recordCompletedAction
would match the first BufferedStart for every completion callback,
misattributing completions.

chaptersix marked this pull request as ready for review March 27, 2026 18:28
chaptersix requested review from a team as code owners March 27, 2026 18:28
chaptersix changed the title from "fix: deduplicate RecentActions during V1-to-V2 schedule migration" to "fix: deduplicate state during V1-to-V2 schedule migration" Mar 27, 2026
chaptersix marked this pull request as draft March 27, 2026 18:35
chaptersix marked this pull request as ready for review March 27, 2026 20:38

CreateSchedule with EnableChasm=true writes a CHASM sentinel that
blocks the subsequent migration activity with a NotFound error due to
a cache invalidation race in the CHASM engine. Enable CHASM only after
the V1 schedule is created and the running workflow is recorded.

chaptersix merged commit 960a728 into temporalio:main on Apr 6, 2026
47 checks passed