
Conversation

@dprince
Contributor

@dprince dprince commented Nov 14, 2025

Rescaffold the nova-operator to operator-sdk 1.41.1, which includes:

  • Reorganize project structure (pkg/ -> internal/)
  • Move webhook implementations to internal/webhook/v1beta1/
  • Add new cmd/main.go entrypoint with updated controller initialization
  • Update RBAC, certmanager, and prometheus configurations
  • Enhance network policies for metrics and webhook traffic
  • Remove auto-generated test suite scaffolding
  • Update build workflow and Dockerfile to version 1.41.1

This upgrade modernizes the operator structure and aligns with the latest operator-sdk best practices.
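
For reviewers less familiar with the go/v4 layout, below is a minimal sketch of roughly what the rescaffolded cmd/main.go entrypoint looks like. It is illustrative only: the leader-election ID and the placeholder comments for Nova controller/webhook registration are assumptions, not copied from this PR.

```go
// Illustrative sketch of an operator-sdk 1.41.1 / kubebuilder go/v4 style
// cmd/main.go entrypoint; Nova-specific controller and webhook wiring from
// internal/controller and internal/webhook/v1beta1 is elided.
package main

import (
	"crypto/tls"
	"flag"
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/healthz"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {
	// The real entrypoint also registers the nova-operator v1beta1 API group here.
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
}

func main() {
	var metricsAddr, probeAddr string
	var enableLeaderElection bool
	var tlsOpts []func(*tls.Config)

	flag.StringVar(&metricsAddr, "metrics-bind-address", "0",
		"The address the metrics endpoint binds to. Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081",
		"The address the probe endpoint binds to.")
	flag.BoolVar(&enableLeaderElection, "leader-elect", false,
		"Enable leader election for the controller manager.")
	opts := zap.Options{Development: true}
	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:                 scheme,
		Metrics:                metricsserver.Options{BindAddress: metricsAddr},
		WebhookServer:          webhook.NewServer(webhook.Options{TLSOpts: tlsOpts}),
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "nova-operator-lock", // illustrative ID only
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	// Controllers and webhooks would be set up against mgr here.

	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
		setupLog.Error(err, "unable to set up health check")
		os.Exit(1)
	}
	if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
		setupLog.Error(err, "unable to set up ready check")
		os.Exit(1)
	}

	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}
```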

Jira: OSPRH-21969

Depends-On: openstack-k8s-operators/openstack-operator#1683

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9a74a948246442dfb17dea450b4c74a2

❌ openstack-meta-content-provider FAILURE in 11m 42s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

@dprince dprince force-pushed the operator_sdk_1.41.1 branch from 76db895 to 41a2f63 on November 14, 2025 21:42
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b77ba055c14a4fb896e67692835bbe63

❌ openstack-meta-content-provider FAILURE in 16m 35s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

@danpawlik
Contributor

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8f758e3f635f4b039bd378d654dbf317

❌ openstack-meta-content-provider FAILURE in 14m 51s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

@danpawlik
Contributor

recheck

@danpawlik
Contributor

On an earlier held node, setting project_layout to v4 seems to help:

-    operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
+    operators.operatorframework.io/project_layout: go.kubebuilder.io/v4

but that should not be necessary, because openstack-k8s-operators/openstack-operator#1683 already contains that change.

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1683 is needed.

@danpawlik
Contributor

Updated the Depends-On in the first comment.

@danpawlik
Contributor

recheck

@danpawlik
Contributor

Wondering if the error in the nova-operator-tempest-multinode job:

2025-11-17 08:42:59.440354 | controller |         File "/tmp/ansible_kubernetes.core.k8s_payload_9gcprbir/ansible_kubernetes.core.k8s_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/k8s/service.py", line 201, in retrieve
2025-11-17 08:42:59.440360 | controller |       ansible_collections.kubernetes.core.plugins.module_utils.k8s.exceptions.CoreException: Failed to retrieve requested object: HTTPSConnectionPool(host='api.crc.testing', port=6443): Max retries exceeded with url: /api/v1/namespaces/openstack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f5889f7e6d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-11-17 08:42:59.440366 | controller |

is related to recent changes in crc-cloud: crc-org/crc-cloud#209.
I ran the same command two minutes after the failure and the cluster seems to be up and ready...

Let's try with a recheck; if this turns out to be flaky, I'll open another PR that adds a retry + delay.

So far, I think operator-sdk is at the correct version.
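
If it does turn out to be flaky, the actual retry + delay would belong in the Ansible CI role, but the idea is simply to poll the API endpoint until it answers instead of failing on the first connection refused. A rough Go sketch of that wait loop, with the endpoint, attempt count, and delay as assumptions:

```go
// Illustrative only: poll the CRC API server until it accepts connections,
// instead of failing on the first "connection refused".
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func waitForAPI(url string, attempts int, delay time.Duration) error {
	// The CRC cluster uses a self-signed certificate, so skip verification
	// for this readiness probe only.
	client := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			return nil // the API server is answering; an auth error is fine here
		}
		lastErr = err
		time.Sleep(delay)
	}
	return fmt.Errorf("API server not reachable after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Endpoint, attempt count, and delay are assumptions for illustration.
	if err := waitForAPI("https://api.crc.testing:6443/healthz", 30, 10*time.Second); err != nil {
		panic(err)
	}
	fmt.Println("cluster API is up")
}
```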

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ab36e589caeb4c0fb83160c69b89aa41

✔️ openstack-meta-content-provider SUCCESS in 3h 05m 42s
❌ nova-operator-kuttl FAILURE in 38m 34s
❌ nova-operator-tempest-multinode FAILURE in 21m 51s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 49m 27s

@dprince
Contributor Author

dprince commented Nov 17, 2025

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/62cf6212155e4c6ea9419d73dafdb617

✔️ openstack-meta-content-provider SUCCESS in 2h 43m 47s
❌ nova-operator-kuttl FAILURE in 39m 20s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 26m 09s
❌ nova-operator-tempest-multinode-ceph FAILURE in 21m 29s

@dprince
Contributor Author

dprince commented Nov 17, 2025

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/02a0c3ee95164448bb616b28a857ab3e

✔️ openstack-meta-content-provider SUCCESS in 3h 17m 11s
❌ nova-operator-kuttl FAILURE in 39m 25s
❌ nova-operator-tempest-multinode FAILURE in 21m 57s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 55m 25s

@dprince dprince force-pushed the operator_sdk_1.41.1 branch from 41a2f63 to 8e33039 on November 18, 2025 12:21
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4c902cf5b693468f98c0c6515fb3a2be

✔️ openstack-meta-content-provider SUCCESS in 3h 18m 57s
❌ nova-operator-kuttl FAILURE in 38m 57s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 25m 35s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 56m 56s

@dprince
Contributor Author

dprince commented Nov 18, 2025

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/40ac4540071444dd8ba36feae8cc9791

✔️ openstack-meta-content-provider SUCCESS in 2h 56m 34s
❌ nova-operator-kuttl FAILURE in 32m 26s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 20m 37s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 39m 05s

Explicitly delete any running nova-operator deployments from openstack-operator here, as
label selectors can change, and installing a service catalog/index like this alongside
openstack-operator (which is what CI appears to do?) is not recommended unless
the initialization resource controller in openstack-operator is paused
and existing deployments are cleaned up properly
var tlsOpts []func(*tls.Config)
flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
Contributor

pprofBindAddress is missing from the current implementation. Let's add this when we bump to the new infra-op version.
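
For reference, a small sketch of how that could look once we bump: controller-runtime's manager Options already exposes PprofBindAddress, so only a flag and one Options field are needed. The flag name and default below are assumptions mirroring the metrics flag style, not part of this PR.

```go
// Hypothetical sketch: wiring a pprof endpoint into the manager once the
// newer infra-op baseline is in place.
package main

import (
	"flag"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	var pprofAddr string
	// Assumed flag name and default, mirroring the scaffolded metrics flag.
	flag.StringVar(&pprofAddr, "pprof-bind-address", "",
		"The address the pprof endpoint binds to. Leave empty to disable pprof.")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// ...the scaffolded Scheme, Metrics, WebhookServer, etc. stay as-is...
		PprofBindAddress: pprofAddr,
	})
	if err != nil {
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```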

Comment on lines +451 to +454
# explicitly to delete any running nova-operator deployments from openstack-operator here as
# label selectors can change and installing a service catalog/index like this alongside
# openstack-operator (what CI appears to do?) is not recommended
oc delete deployment nova-operator-controller-manager -n openstack-operators --ignore-not-found=true
Contributor

In the context mentioned in the comment, wouldn't the openstack-operator-controller-operator pod just recreate the Deployment again right after we delete it here? I wonder if we need to use the OpenStack interface to drop the replicas for the Nova operator to 0 [1]?

[1] https://github.com/openstack-k8s-operators/openstack-operator/blob/17b1faec894dfcad58164b52f38cf6acda76f9dc/api/operator/v1beta1/openstack_types.go#L223

Contributor Author

No, because in CI they are setting the initialization openstack-operator-controller-operator replicas to 0.

Contributor

@stuggi stuggi left a comment

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Nov 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprince, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 5c47a41 into openstack-k8s-operators:main Nov 25, 2025
7 checks passed