
Force transition instances in inconsistent state #618

Merged
gabriel-samfira merged 1 commit into cloudbase:main from gabriel-samfira:account-for-recovery-case
Feb 11, 2026


@gabriel-samfira gabriel-samfira commented Feb 11, 2026

If GARM is killed or restarted while creating a runner, there is a chance that runners remain stuck in the creating or deleting state. We've started checking state transitions in GARM and only allow a transition when the new state makes sense under normal circumstances. However, when recovering from a crash, instances may be in an inconsistent state from which we need to recover.

This change adds a ForceUpdateInstance() function that ignores state transition inconsistencies. For now, we only use it when spinning up a scale set and checking instance states.

This change also fixes a locking issue.

Fixes: #610
Includes: #598

If GARM is killed or restarted while creating a runner, there is a chance
that runners remain stuck in the creating or deleting state. We've started
checking state transitions in GARM and only allow a transition when the new
state makes sense under normal circumstances. However, when recovering from
a crash, instances may be in an inconsistent state from which we need to
recover.

This change adds a ForceUpdateInstance() function that ignores state
transition inconsistencies. For now, we only use it when spinning up a
scale set and checking instance states.

This change also fixes a locking issue.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
@gabriel-samfira gabriel-samfira merged commit c7d34e0 into cloudbase:main Feb 11, 2026
4 checks passed
@gabriel-samfira gabriel-samfira deleted the account-for-recovery-case branch February 11, 2026 18:51

Development

Successfully merging this pull request may close these issues.

New "Agent mode" code broke scaleset initialization
