@camilamacedo86 camilamacedo86 commented Jan 11, 2026

When upgrading OLM from standard (Helm runtime) to experimental (Boxcutter runtime), the BoxcutterStorageMigrator creates a ClusterExtensionRevision from the existing Helm release. However, the migrated revision was created without status conditions, causing a race condition where it wasn't recognized as "Installed".

This fix sets an initial Succeeded status on migrated revisions, ensuring they're immediately recognized and allowing version upgrades to proceed correctly after OLM upgrades.

Fixes test-upgrade-st2ex-e2e flake failures.
Encountered while starting to validate the resilience of a workload when its catalog is deleted.
Example: https://github.com/operator-framework/operator-controller/actions/runs/20890017311/job/60019736069

What is the problem

When we upgrade OLM itself from standard to experimental, our installed extensions get "stuck" and can't be upgraded anymore.

Real-World Scenario

What We're Doing

Day 1: You install OLM standard edition and install PostgreSQL operator v2.0.0

  • Everything works great ✅
  • Your databases are running ✅

Day 2: You want to try the new Boxcutter runtime (experimental features)

  • You upgrade OLM from standard to experimental
  • OLM upgrade completes successfully ✅
  • PostgreSQL still runs fine ✅

Day 3: PostgreSQL v2.1.0 is released with bug fixes you need

  • You try to upgrade PostgreSQL from v2.0.0 → v2.1.0
  • It FAILS
  • Error: "Cannot determine installed version"
  • You're stuck on the old version with its bugs ❌

What Was Happening (Before Fix)

The Migration Process

When OLM upgrades from Helm to Boxcutter:

  1. Migration starts - OLM needs to convert Helm storage to Boxcutter storage
  2. Creates ClusterExtensionRevision - Copies all your installed manifests
  3. BUT - Forgets to mark it as "successfully installed"
  4. Race condition - System checks what's installed before status is set
  5. System thinks - "Nothing is installed yet, this is still rolling out"
  6. Result - Can't compute upgrade path without knowing current version
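
The failure mode in the steps above can be sketched in a few lines of Go. This is an illustration only, not the operator-controller code: `Condition`, `Revision`, and `installedVersion` are hypothetical, simplified stand-ins, and the sketch assumes that "installed" means "newest revision with a Succeeded=True condition".

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string
}

// Revision is a hypothetical, trimmed-down ClusterExtensionRevision.
type Revision struct {
	Name       string
	Version    string
	Conditions []Condition
}

// installedVersion returns the version of the newest revision that reports
// Succeeded=True. Revisions without that condition are treated as still
// rolling out and are skipped.
func installedVersion(revs []Revision) (string, bool) {
	for i := len(revs) - 1; i >= 0; i-- {
		for _, c := range revs[i].Conditions {
			if c.Type == "Succeeded" && c.Status == "True" {
				return revs[i].Version, true
			}
		}
	}
	return "", false
}

func main() {
	// A freshly migrated revision before the fix: manifests copied, no status.
	migrated := []Revision{{Name: "postgres-1", Version: "2.0.0"}}
	if _, ok := installedVersion(migrated); !ok {
		fmt.Println("cannot determine installed version")
	}
}
```

With no conditions on the only revision, the lookup finds nothing, so an upgrade path cannot be computed even though the workload is running.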

The Timing Issue

Second 0: Migration creates revision
Second 0.1: ClusterExtension asks "What's installed?"  
Second 0.1: Answer: "Don't know, revision has no status"
Second 0.1: Upgrade attempt FAILS
Second 2.0: Background controller sets status ← TOO LATE!

What's Fixed Now

After the Fix

When OLM upgrades from Helm to Boxcutter:

  1. Migration starts - Same as before
  2. Creates ClusterExtensionRevision - Same as before
  3. ✨ NEW: Immediately marks it as succeeded - No waiting!
  4. System checks - "What's installed?"
  5. System knows - "v2.0.0 is installed and working"
  6. Result - Upgrade to v2.1.0 works perfectly!
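
A minimal sketch of the fixed flow, under stated assumptions: `HelmRelease`, `Revision`, `migrate`, and `isInstalled` are hypothetical names, and the reason string "Migrated" is a guess at the value of the new constant. In a real cluster the status is usually written through a separate status update; the sketch collapses creation and status into one step to show the intended end state.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type, Status, Reason string
}

// HelmRelease and Revision are hypothetical, trimmed-down types used only
// for illustration; they are not the real API types.
type HelmRelease struct {
	Name, Version string
}

type Revision struct {
	Name, Version string
	Conditions    []Condition
}

// migrate builds a revision from an existing Helm release and marks it
// Succeeded=True in the same step, so a reader never sees it without status.
// The reason string "Migrated" is an assumed value.
func migrate(rel HelmRelease) Revision {
	return Revision{
		Name:    rel.Name + "-1",
		Version: rel.Version,
		Conditions: []Condition{
			{Type: "Succeeded", Status: "True", Reason: "Migrated"},
		},
	}
}

// isInstalled reports whether the revision carries Succeeded=True.
func isInstalled(rev Revision) bool {
	for _, c := range rev.Conditions {
		if c.Type == "Succeeded" && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	rev := migrate(HelmRelease{Name: "postgres", Version: "2.0.0"})
	if isInstalled(rev) {
		fmt.Printf("installed version: v%s\n", rev.Version)
	}
}
```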

Comparison Table

| What You're Doing | Before Fix | After Fix |
| --- | --- | --- |
| Install OLM standard + PostgreSQL v2.0.0 | ✅ Works | ✅ Works |
| Upgrade OLM standard → experimental | ✅ OLM upgrades | ✅ OLM upgrades |
| PostgreSQL still running after OLM upgrade | ✅ Still runs | ✅ Still runs |
| System knows what version is installed | NO | YES (v2.0.0) |
| Try to upgrade PostgreSQL to v2.1.0 | FAILS | WORKS |
| Error message | "cannot determine installed version" | None - succeeds |
| Your databases | ⚠️ Stuck on old version | ✅ Running new version |
| Manual work required | ❌ Delete and reinstall | ✅ None - automatic |

Copilot AI review requested due to automatic review settings January 11, 2026 18:35
netlify bot commented Jan 11, 2026

Deploy Preview for olmv1 ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 3f3514f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/olmv1/deploys/6963ed6cc0819000080130d3 |
| 😎 Deploy Preview | https://deploy-preview-2440--olmv1.netlify.app |

openshift-ci bot commented Jan 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign perdasilva for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@camilamacedo86 changed the title from "🐛 (fix) Helm to Boxcutter migration during OLM upgrade" to "WIP 🐛 (fix) Helm to Boxcutter migration during OLM upgrade" Jan 11, 2026
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 11, 2026
Copilot AI left a comment

Pull request overview

This pull request fixes a race condition that occurs during the upgrade from standard OLM (Helm runtime) to experimental OLM (Boxcutter runtime). The issue arose because migrated ClusterExtensionRevisions were created without a Succeeded=True status condition, causing them not to be recognized as "Installed" until the ClusterExtensionRevision controller reconciled them. This timing gap led to version resolution failures during OLM upgrades.

Changes:

  • Added a new ClusterExtensionRevisionReasonMigrated constant for tracking migration status
  • Set initial Succeeded=True status condition on migrated revisions immediately after creation
  • Enhanced documentation explaining the race condition and its resolution
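
For illustration, the two code changes listed above could look roughly like the following. This is a sketch using only the standard library, not the PR's actual diff: `reasonMigrated` stands in for ClusterExtensionRevisionReasonMigrated with an assumed value of "Migrated", and `setCondition` is a hypothetical helper that upserts by condition type so that writing the same condition twice would not duplicate it.

```go
package main

import "fmt"

const (
	conditionTypeSucceeded = "Succeeded"
	// reasonMigrated mirrors the new constant; the string value is assumed.
	reasonMigrated = "Migrated"
)

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type, Status, Reason string
}

// setCondition inserts c, or overwrites the existing condition with the same
// Type. It reports whether the slice was changed.
func setCondition(conds *[]Condition, c Condition) bool {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			if (*conds)[i] == c {
				return false // identical condition already present
			}
			(*conds)[i] = c
			return true
		}
	}
	*conds = append(*conds, c)
	return true
}

func main() {
	var conds []Condition
	migrated := Condition{
		Type:   conditionTypeSucceeded,
		Status: "True",
		Reason: reasonMigrated,
	}
	first := setCondition(&conds, migrated)  // adds the condition
	second := setCondition(&conds, migrated) // repeat write is a no-op
	fmt.Println(first, second, len(conds))
}
```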

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| api/v1/clusterextensionrevision_types.go | Added new ClusterExtensionRevisionReasonMigrated constant for status condition reasons |
| internal/operator-controller/applier/boxcutter.go | Added status update logic to set Succeeded=True condition on migrated revisions with comprehensive documentation |


@camilamacedo86 changed the title from "WIP 🐛 (fix) Helm to Boxcutter migration during OLM upgrade" to "🐛 (fix) Helm to Boxcutter migration during OLM upgrade" Jan 11, 2026
@openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 11, 2026
codecov bot commented Jan 11, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.05%. Comparing base (9d8fda0) to head (3f3514f).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/operator-controller/applier/boxcutter.go | 77.77% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2440   +/-   ##
=======================================
  Coverage   73.05%   73.05%           
=======================================
  Files         100      100           
  Lines        7641     7650    +9     
=======================================
+ Hits         5582     5589    +7     
- Misses       1623     1624    +1     
- Partials      436      437    +1     
| Flag | Coverage Δ |
| --- | --- |
| e2e | 46.70% <0.00%> (-0.07%) ⬇️ |
| experimental-e2e | 48.68% <0.00%> (-0.07%) ⬇️ |
| unit | 57.13% <77.77%> (+0.02%) ⬆️ |


@camilamacedo86 requested review from Copilot and removed request for ankitathomas, Copilot, joelanford, pedjak and perdasilva January 12, 2026 00:23
@camilamacedo86 changed the title from "🐛 (fix) Helm to Boxcutter migration during OLM upgrade" to "WIP 🐛 (fix) Helm to Boxcutter migration during OLM upgrade" Jan 12, 2026
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 12, 2026