Add per-node secret rotation tracking with drift detection#1781

Open
lmiccini wants to merge 1 commit into openstack-k8s-operators:main from lmiccini:nodeset_rmqu_finalizer_configmap

Conversation

@lmiccini
Contributor

@lmiccini lmiccini commented Jan 27, 2026

Implements persistent tracking of secret versions deployed to each node
in OpenStackDataPlaneNodeSet to coordinate safe deletion of old
credentials during gradual rollouts.

Implementation:

- ConfigMap-based storage (`<nodeset-name>-secret-tracking`) records
  which secret versions are deployed to each node

- Tracks "Current" (deployed) vs "Expected" (cluster) secret states:
  - Current: Hash of secrets actually deployed to nodes
  - Expected: Hash of secrets currently in cluster
  - Drift detected when Current != Expected

- Deployment processing updates tracking data per node with secret
  hashes, skipping stale deployments (hash != cluster hash)

- Drift detection runs after each reconciliation, comparing cluster
  secrets against tracking ConfigMap, using APIReader to bypass cache

- Status field SecretDeployment reports:
  - UpdatedNodes: count of nodes on current secret versions
  - AllNodesUpdated: whether all nodes have current versions
  - ConfigMapName, TotalNodes, LastUpdateTime

- APIReader field added to reconciler to read directly from Kubernetes
  API, bypassing controller-runtime cache for accurate drift detection

This enables safe credential deletion only when all nodes across all
nodesets sharing the credentials have been updated.
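
The Current/Expected comparison described above can be sketched roughly as follows (illustrative only; the PR itself is Go operator code, and the field names here are taken from the ConfigMap layout shown later in the thread):

```python
# Illustrative sketch of the "Current" vs "Expected" tracking logic:
# recompute per-node update counts against the hash currently in the
# cluster, and only advance currentHash once every node matches it.
def detect_drift(tracking: dict, cluster_hash: str) -> dict:
    expected = cluster_hash
    nodes = tracking["nodes"]
    updated = [n for n, info in nodes.items() if info["secretHash"] == expected]
    all_updated = len(updated) == len(nodes)
    # currentHash only advances when ALL nodes carry the same version
    current = expected if all_updated else tracking["currentHash"]
    return {
        "currentHash": current,
        "expectedHash": expected,
        "updatedNodes": len(updated),
        "allNodesUpdated": all_updated,
        "drift": current != expected,
    }
```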

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/56ac80bd0e7547ad88350eb0206886b5

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 18m 47s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 38s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 37m 31s
adoption-standalone-to-crc-ceph-provider FAILURE in 3h 01m 55s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 51m 23s
openstack-operator-docs-preview POST_FAILURE in 2m 32s

@stuggi stuggi requested a review from slagle January 28, 2026 08:13
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/db62c9cd33b34a538c7eccf243769b6a

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 26s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 56s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 03s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 46m 57s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 34m 08s
openstack-operator-docs-preview POST_FAILURE in 3m 15s

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 3885c4a to c1fe8f8 Compare February 7, 2026 18:56
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b5d3972863e64857b2da5055f867ef55

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 43s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 41s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 22s
adoption-standalone-to-crc-ceph-provider FAILURE in 2h 05m 30s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 43m 01s
✔️ openstack-operator-docs-preview SUCCESS in 3m 14s

@lmiccini
Contributor Author

lmiccini commented Feb 8, 2026

/retest

@lmiccini
Contributor Author

lmiccini commented Feb 8, 2026

recheck

@lmiccini
Contributor Author

lmiccini commented Feb 8, 2026

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from cbfbb7c to f52529a Compare February 8, 2026 15:01
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from f52529a to 017d2ca Compare February 10, 2026 06:41
@lmiccini
Contributor Author

/test functional

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 97bb482 to b1d9350 Compare February 12, 2026 15:46
@lmiccini
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@slagle
Contributor

slagle commented Feb 17, 2026

Is preventing the deletion of in-use rabbitmq users the point of this PR? Why do we need these finalizers to enable "safe rotation"?

I'm concerned about the size and complexity of this PR. Personally, this is difficult to review. We might want to come up with a simpler design that we code without AI, and then let AI build on top of that. I'm having a hard time reasoning about all the different changes here.

This also adds some service specific code to the dataplane (nova, neutron, ironic). While we have some instances of that, we have really tried to avoid that in the past, and do things generically and let CRD fields drive the generic code.

I'm just brainstorming, but a simpler solution might be:

  • We know the Secret/ConfigMaps in use at service deployment time.
  • Services have a field whose value we use to inspect the Secret/ConfigMap and we save the value found (such as transportURL) on the NodeSet or Deployment Status when the Deployment succeeds
  • rabbitmq user deletion checks NodeSet or Deployment Status and, if it finds that user in use, blocks the deletion.

For example, the nova Service has in the spec:

serviceTrackingFields:
  - dataSource: # ConfigMapRef or SecretRef
    fieldPattern: "nova-transport-url-pattern"

Then during Service Deployment, there is similar logic to GetNovaCellRabbitMqUserFromSecret, we get the value of the user and save it on the NodeSet and/or Deployment Status. If we attempt to rotate or delete the user, and that user is still set on a Status, the operation is blocked.

I would also delay solving the problem of enforcing that all nodes in the nodeset have been updated by a Deployment. This is a wider problem that should be solved separately from the user rotation problem.
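
The proposed status-based check might look roughly like this (a hypothetical sketch; `user_from_transport_url` and `deletion_blocked` are illustrative names, not code from the PR):

```python
# Hypothetical sketch of the simpler flow proposed above: extract the
# rabbitmq user from the transport URL recorded at deployment time,
# then block deletion while any Status still records that user.
from urllib.parse import urlparse


def user_from_transport_url(url: str) -> str:
    # e.g. "rabbit://user7:pass@rabbitmq:5672/" -> "user7"
    return urlparse(url).username


def deletion_blocked(user: str, users_in_statuses: list) -> bool:
    # True while some NodeSet/Deployment Status still references the user
    return user in users_in_statuses
```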

Contributor

@slagle slagle left a comment

See previous comment

@slagle
Contributor

slagle commented Feb 17, 2026

Or even simpler...we already have the Secret and ConfigMap hashes saved in the Deployment statuses. If the rabbitmq user rotation see that those hashes are out of date, the rotation, or at least the old user deletion part of the rotation is blocked.

@lmiccini
Contributor Author

lmiccini commented Feb 18, 2026

Is preventing the deletion of in-use rabbitmq users the point of this PR? Why do we need these finalizers to enable "safe rotation"?

I'm concerned about the size and complexity of this PR. Personally, this is difficult to review. We might want to come up with a simpler design that we code without AI, and then let AI build on top of that. I'm having a hard time reasoning about all the different changes here.

This also adds some service specific code to the dataplane (nova, neutron, ironic). While we have some instances of that, we have really tried to avoid that in the past, and do things generically and let CRD fields drive the generic code.

I'm just brainstorming, but a simpler solution might be:

* We know the Secret/ConfigMaps in use at service deployment time.

* Services have a field whose value we use to inspect the Secret/ConfigMap and we save the value found (such as transportURL) on the NodeSet or Deployment Status when the Deployment succeeds

* rabbitmq user deletion checks NodeSet or Deployment Status and, if it finds that user in use, blocks the deletion.

For example, the nova Service has in the spec:

serviceTrackingFields:
  - dataSource: # ConfigMapRef or SecretRef
    fieldPattern: "nova-transport-url-pattern"

Then during Service Deployment, there is similar logic to GetNovaCellRabbitMqUserFromSecret, we get the value of the user and save it on the NodeSet and/or Deployment Status. If we attempt to rotate or delete the user, and that user is still set on a Status, the operation is blocked.

I would also delay solving the problem of enforcing that all nodes in the nodeset have been updated by a Deployment. This is a wider problem that should be solved separately from the user rotation problem.

Thanks @slagle, appreciate you taking the time.
The logic is more or less what you are proposing here.
We add finalizers to the rabbitmq users so that each service can "signal" that they are in use, and garbage-collect a user only once no finalizer is left, following the same pattern we use in other places, to avoid leftover credentials that could pose a security risk.

The additional tracking "on top" is required because nova_compute, neutron and ironic agents running on the dataplane could each use a different rabbitmq user, so I track which nodes in a nodeset ran a deployment for those services and store that in a configmap, updating it until all nodes have reconciled to the hashes you mention in your last comment. Here is how it could look:

[zuul@localhost ~]$ oc get configmap openstack-edpm-ipam-service-tracking -o yaml
apiVersion: v1
data:
  neutron.secretHash: 6e657574726f6e2d646863702d6167656e742d6e657574726f6e2d636f6e6669673a313737303632353235383b6e657574726f6e2d7372696f762d6167656e742d6e657574726f6e2d636f6e6669673a313737303632353235383b
  neutron.updatedNodes: '[]'
  nova.secretHash: 6e6f76612d63656c6c312d636f6d707574652d636f6e6669673a313737303634333733313b
  nova.updatedNodes: '["edpm-compute-0","edpm-compute-1"]'
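
The `*.secretHash` values above appear to be hex-encoded `name:resourceVersion;` pairs, so they can be decoded for inspection (a small helper sketch, not code from the PR):

```python
# The *.secretHash values in the tracking ConfigMap above are
# hex-encoded "name:resourceVersion;" pairs; decoding makes them
# human-readable when debugging.
def decode_tracking(value: str) -> str:
    return bytes.fromhex(value).decode()
```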

If I understand correctly, you would like to flip this around and have infra-operator track each nodeset's rabbitmq usage instead? I'm not sure having infra-operator introspect dataplane objects is my preferred approach, especially because we have no way of knowing whether another service that uses rabbitmq will be added tomorrow, which would leave us playing catch-up with the dataplane. That said, I can try to prototype something and see how ugly it gets.
Thanks again.

@stuggi
Contributor

stuggi commented Feb 23, 2026

If I understand correctly you would like to flip this around and have infra-operator track each nodeset rabbitmq usage instead?

We cannot do that: it would introduce a circular dependency, because infra-operator would gain a dependency on the openstack-operator.

Implements persistent tracking of secret versions deployed to each node
in OpenStackDataPlaneNodeSet to coordinate safe deletion of old
credentials during gradual rollouts.

Implementation:

- ConfigMap-based storage (`<nodeset-name>-secret-tracking`) records
  which secret versions are deployed to each node

- Tracks "Current" (deployed) vs "Expected" (cluster) secret states:
  - Current: Hash of secrets actually deployed to nodes
  - Expected: Hash of secrets currently in cluster
  - Drift detected when Current != Expected

- Deployment processing updates tracking data per node with secret
  hashes, skipping stale deployments (hash != cluster hash)

- Drift detection runs after each reconciliation, comparing cluster
  secrets against tracking ConfigMap, using APIReader to bypass cache

- Status field SecretDeployment reports:
  - UpdatedNodes: count of nodes on current secret versions
  - AllNodesUpdated: whether all nodes have current versions
  - ConfigMapName, TotalNodes, LastUpdateTime

- APIReader field added to reconciler to read directly from Kubernetes
  API, bypassing controller-runtime cache for accurate drift detection

This enables safe credential deletion only when all nodes across all
nodesets sharing the credentials have been updated.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from b1d9350 to cd2cef1 Compare February 23, 2026 14:10
@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lmiccini
Once this PR has been reviewed and has the lgtm label, please ask for approval from slagle. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lmiccini lmiccini changed the title Nodeset rabbitmquser finalizer management and status tracking via configmap Add per-node secret rotation tracking with drift detection Feb 23, 2026
@lmiccini
Contributor Author

  • NodeSet: openstack-edpm-ipam with 2 nodes: compute-0, compute-1
  • Shared Secret: nova-cell1-compute-config (contains RabbitMQ credentials)
  • Initial User: user7 (hash: n5b4h...)
  • Rotated User: user8 (hash: n656h...)

Stage 1: Initial State - All Nodes on Old Credentials

Secret (cluster state)

apiVersion: v1
kind: Secret
metadata:
  name: nova-cell1-compute-config
  resourceVersion: "12345"
data:
  transport_url: "rabbit://user7:pass@rabbitmq:5672/"  # Old credentials

ConfigMap (tracking state)

apiVersion: v1
kind: ConfigMap
metadata:
  name: openstack-edpm-ipam-secret-tracking
data:
  nova-cell1-compute-config: |
    {
      "currentHash": "n5b4h...",
      "expectedHash": "n5b4h...",
      "nodes": {
        "compute-0": {
          "secretHash": "n5b4h...",
          "deploymentName": "edpm-deployment-initial",
          "lastUpdated": "2026-02-20T10:00:00Z"
        },
        "compute-1": {
          "secretHash": "n5b4h...",
          "deploymentName": "edpm-deployment-initial",
          "lastUpdated": "2026-02-20T10:00:00Z"
        }
      }
    }

NodeSet Status

status:
  secretDeployment:
    configMapName: openstack-edpm-ipam-secret-tracking
    totalNodes: 2
    updatedNodes: 2              # ✓ All nodes on current version
    allNodesUpdated: true         # ✓ Safe to delete old credentials (if they existed)
    lastUpdateTime: "2026-02-20T10:00:00Z"

State: All nodes running with user7, no drift, system stable.


Stage 2: Credential Rotation - Cluster Secret Changes

Administrator rotates RabbitMQ credentials by updating the openstackcontrolplane, switching cell1 to use a different user.

Secret (cluster state) - CHANGED

apiVersion: v1
kind: Secret
metadata:
  name: nova-cell1-compute-config
  resourceVersion: "67890"      # ← Changed
data:
  transport_url: "rabbit://user8:pass@rabbitmq:5672/"  # ← New credentials

ConfigMap (tracking state) - UNCHANGED

apiVersion: v1
kind: ConfigMap
metadata:
  name: openstack-edpm-ipam-secret-tracking
data:
  nova-cell1-compute-config: |
    {
      "currentHash": "n5b4h...",     # Still old hash
      "expectedHash": "n5b4h...",    # Still old hash
      "nodes": {
        "compute-0": {
          "secretHash": "n5b4h...",  # Still old hash
          "deploymentName": "edpm-deployment-initial",
          "lastUpdated": "2026-02-20T10:00:00Z"
        },
        "compute-1": {
          "secretHash": "n5b4h...",  # Still old hash
          "deploymentName": "edpm-deployment-initial",
          "lastUpdated": "2026-02-20T10:00:00Z"
        }
      }
    }

NodeSet Status - DRIFT DETECTED

status:
  secretDeployment:
    configMapName: openstack-edpm-ipam-secret-tracking
    totalNodes: 2
    updatedNodes: 0               # ← Changed: drift detected, reset to 0
    allNodesUpdated: false        # ← Changed: drift exists
    lastUpdateTime: "2026-02-23T11:36:46Z"  # ← Updated by drift detection

State: Drift detected! Nodes still have user7, but cluster expects user8.
Action Required: Deploy to update nodes.
Credential Status: ⚠️ Cannot delete user7 - nodes still using it!


Stage 3: Partial Deployment - Update compute-0 Only

Administrator creates deployment with ansibleLimit: compute-0.

Deployment

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: edpm-deployment-c0-limit
spec:
  nodeSets:
    - openstack-edpm-ipam
  ansibleLimit: compute-0        # Only this node

After deployment completes:

Secret (cluster state) - UNCHANGED

data:
  transport_url: "rabbit://user8:pass@rabbitmq:5672/"  # Still user8

ConfigMap (tracking state) - PARTIALLY UPDATED

apiVersion: v1
kind: ConfigMap
metadata:
  name: openstack-edpm-ipam-secret-tracking
data:
  nova-cell1-compute-config: |
    {
      "currentHash": "n5b4h...",     # ← NOT updated (compute-1 still on n5b4h)
      "expectedHash": "n656h...",    # ← Updated to cluster hash
      "nodes": {
        "compute-0": {
          "secretHash": "n656h...",  # ← Updated to user8
          "deploymentName": "edpm-deployment-c0-limit",
          "lastUpdated": "2026-02-23T12:00:00Z"
        },
        "compute-1": {
          "secretHash": "n5b4h...",  # ← Still on user7
          "deploymentName": "edpm-deployment-initial",
          "lastUpdated": "2026-02-20T10:00:00Z"
        }
      }
    }

NodeSet Status - PARTIAL UPDATE

status:
  secretDeployment:
    configMapName: openstack-edpm-ipam-secret-tracking
    totalNodes: 2
    updatedNodes: 1               # Only 1 of 2 nodes updated
    allNodesUpdated: false        # ← Still false
    lastUpdateTime: "2026-02-23T12:00:00Z"

State: compute-0 now has user8, compute-1 still has user7.
Credential Status: ⚠️ CRITICAL - Cannot delete user7! compute-1 still needs it!


Stage 4: Full Deployment - Update All Remaining Nodes

Administrator deploys to all nodes (or remaining nodes).

Deployment

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: edpm-deployment-full
spec:
  nodeSets:
    - openstack-edpm-ipam
  # No ansibleLimit - all nodes

After deployment completes:

Secret (cluster state) - UNCHANGED

data:
  transport_url: "rabbit://user8:pass@rabbitmq:5672/"  # Still user8

ConfigMap (tracking state) - FULLY UPDATED

apiVersion: v1
kind: ConfigMap
metadata:
  name: openstack-edpm-ipam-secret-tracking
data:
  nova-cell1-compute-config: |
    {
      "currentHash": "n656h...",     # ← Updated: all nodes on n656h
      "expectedHash": "n656h...",    # Matches cluster
      "nodes": {
        "compute-0": {
          "secretHash": "n656h...",  # user8
          "deploymentName": "edpm-deployment-full",
          "lastUpdated": "2026-02-23T13:00:00Z"
        },
        "compute-1": {
          "secretHash": "n656h...",  # ← Updated to user8
          "deploymentName": "edpm-deployment-full",
          "lastUpdated": "2026-02-23T13:00:00Z"
        }
      }
    }

NodeSet Status - ALL UPDATED

status:
  secretDeployment:
    configMapName: openstack-edpm-ipam-secret-tracking
    totalNodes: 2
    updatedNodes: 2               # ← All nodes updated
    allNodesUpdated: true         # ← Safe to proceed!
    lastUpdateTime: "2026-02-23T13:00:00Z"

State: All nodes now have user8, no drift.
Credential Status: ✓ SAFE - Can now delete user7 credentials!


Stage 5: Multiple NodeSets Scenario

What if multiple NodeSets share the same credentials?

Setup

  • NodeSet 1: openstack-edpm-compute (2 nodes: compute-0, compute-1)
  • NodeSet 2: openstack-edpm-storage (2 nodes: storage-0, storage-1)
  • Shared Secret: nova-cell1-compute-config (both use it)
  • Total Nodes: 4 nodes across 2 NodeSets

After Partial Deployment (compute NodeSet only)

Compute NodeSet Status

status:
  secretDeployment:
    totalNodes: 2
    updatedNodes: 2
    allNodesUpdated: true         # ✓ Compute NodeSet is done

Storage NodeSet Status

status:
  secretDeployment:
    totalNodes: 2
    updatedNodes: 0
    allNodesUpdated: false        # ✗ Storage NodeSet not updated

Credential Status: ⚠️ BLOCKED - Even though compute NodeSet shows allNodesUpdated: true, storage nodes still need user7!

After Deploying Both NodeSets

Compute NodeSet Status

status:
  secretDeployment:
    totalNodes: 2
    updatedNodes: 2
    allNodesUpdated: true         #

Storage NodeSet Status

status:
  secretDeployment:
    totalNodes: 2
    updatedNodes: 2
    allNodesUpdated: true         #

Credential Status: ✓ SAFE - All 4 nodes across both NodeSets updated. Now safe to delete user7!


Key Observations

currentHash vs expectedHash

  • currentHash: The hash of secrets actually deployed to nodes

    • Only updated when ALL nodes have the same version
    • Used to detect when it's safe to delete old credentials
  • expectedHash: The hash of secrets in the cluster (desired state)

    • Always matches current cluster secret hash
    • Used to detect drift

Drift Detection Logic

if currentHash != expectedHash:
    drift_detected = true
    updatedNodes = 0
    allNodesUpdated = false
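
A runnable form of the pseudocode above (field names taken from the walkthrough; the actual implementation lives in the Go reconciler):

```python
# Runnable form of the drift-detection pseudocode above: when the
# deployed hash no longer matches the cluster hash, the status counters
# are reset so nothing treats the old credentials as deletable.
def apply_drift_detection(status: dict, current_hash: str, expected_hash: str) -> dict:
    if current_hash != expected_hash:
        status["updatedNodes"] = 0
        status["allNodesUpdated"] = False
    return status
```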

Credential Deletion Safety

Old credentials can ONLY be deleted when:

  1. ALL NodeSets sharing the secret show allNodesUpdated: true
  2. Each NodeSet's currentHash == expectedHash
  3. No deployments in progress
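
The three conditions above combine into a single cross-NodeSet check, sketched here (illustrative; `deploymentInProgress` is an assumed field name standing in for condition 3):

```python
# Sketch of the cross-NodeSet safety rule above: old credentials may
# only be removed when every NodeSet sharing the secret is fully
# updated, in sync with the cluster, and has no deployment running.
def safe_to_delete(nodesets: list) -> bool:
    return all(
        ns["allNodesUpdated"]
        and ns["currentHash"] == ns["expectedHash"]
        and not ns["deploymentInProgress"]
        for ns in nodesets
    )
```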

Stale Deployment Handling

If deployment edpm-deployment-old was created before rotation but completes after:

  • Deployment has stale secret hash (n5b4h) from when it was created
  • Cluster now has new secret hash (n656h)
  • Action: Skip this deployment entirely - don't update tracking
  • Reason: Prevents incorrectly marking nodes as "updated" when they got old credentials
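
The stale-deployment rule reduces to a single hash comparison at completion time, sketched here (illustrative helper name, not code from the PR):

```python
# Sketch of the stale-deployment rule above: a deployment created
# before a rotation carries the old secret hash, so its completion
# must NOT update the tracking data.
def should_record(deployment_hash: str, cluster_hash: str) -> bool:
    return deployment_hash == cluster_hash
```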

@lmiccini
Contributor Author

This new approach can be used directly by openstack-operator to set and remove finalizers on rabbitmq users, or by infra-operator to read the nodeset status and do the finalizer management (openstack-k8s-operators/infra-operator@main...lmiccini:infra-operator:track_dataplaneusers)

@openshift-ci
Contributor

openshift-ci bot commented Feb 23, 2026

@lmiccini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/openstack-operator-build-deploy-kuttl 1698305 link true /test openstack-operator-build-deploy-kuttl
ci/prow/precommit-check cd2cef1 link true /test precommit-check
ci/prow/openstack-operator-build-deploy-kuttl-4-18 cd2cef1 link true /test openstack-operator-build-deploy-kuttl-4-18

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
