[multiple] Co-locate provisionserver with metal3 to prevent DHCP failures #3691

Open
mnietoji wants to merge 1 commit into openstack-k8s-operators:main from mnietoji:dhcp_provisioning

Conversation

@mnietoji mnietoji commented Feb 17, 2026

[multiple] Co-locate provisionserver with metal3 to prevent DHCP failures

When the metal3-dnsmasq pod restarts during a node's DHCP lease renewal on the
provisioning network (172.23.0.0/24), NetworkManager fails to renew the lease
and sets ipv4.method=disabled. The NMState operator then preserves this
disabled state, causing permanent loss of provisioning network connectivity on
that node.

The issue occurs when the OpenStackProvisionServer and metal3 pods run on
different nodes. If metal3 restarts while a node is attempting DHCP renewal,
the temporary unavailability of metal3-dnsmasq causes the renewal to fail.

Solution:
Automatically detect the node running the metal3 pod (via the k8s-app=metal3
label) and set provisionServerNodeSelector in baremetalSetTemplate so that
OpenStackProvisionServer is scheduled on the same node. Provisioning network
connectivity is then maintained because metal3-static-ip-manager keeps a static
IP (172.23.0.3) on the metal3 node regardless of dnsmasq restarts.

Signed-off-by: Miguel Angel Nieto Jimenez <mnietoji@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
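
For readers who want to see the shape of the approach, here is a minimal sketch of the detection step described under "Solution". It is not the code from this PR. It assumes the pod list was already fetched as JSON (for example with `oc get pods -l k8s-app=metal3 -o json`), that the node is pinned through the well-known `kubernetes.io/hostname` node label, and that `provisionServerNodeSelector` sits directly under `baremetalSetTemplate`. The function names and the `master-0` hostname are hypothetical.

```python
import json


def metal3_node_name(pod_list: dict) -> str:
    """Return the node name of the first scheduled pod in a Kubernetes PodList dict."""
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if node:  # pods that are not scheduled yet have no nodeName
            return node
    raise LookupError("no scheduled metal3 pod found")


def provision_server_selector(node_name: str) -> dict:
    """Build a values fragment that pins the provision server to node_name.

    Assumption: the selector key is the well-known kubernetes.io/hostname
    label and the field nests directly under baremetalSetTemplate.
    """
    return {
        "baremetalSetTemplate": {
            "provisionServerNodeSelector": {"kubernetes.io/hostname": node_name}
        }
    }


if __name__ == "__main__":
    # Hypothetical PodList, trimmed to the only field the sketch reads.
    pods = {"items": [{"spec": {"nodeName": "master-0"}}]}
    print(json.dumps(provision_server_selector(metal3_node_name(pods)), indent=2))
```

Raising an error when no scheduled pod is found keeps the failure visible, rather than silently leaving the provision server free to land on a different node.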

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c72a795e19ee4ef1994a72979d3b02ba

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 53m 36s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 11s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 40m 42s
✔️ cifmw-pod-zuul-files SUCCESS in 5m 20s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 52s
cifmw-pod-pre-commit FAILURE in 8m 40s
✔️ cifmw-molecule-devscripts SUCCESS in 10m 12s

@openshift-ci
Contributor

openshift-ci bot commented Feb 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign michburk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/419cbfb4c38c4c90a492120cc24c67ca

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 55m 45s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 22m 44s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 42m 54s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 43s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 00s
cifmw-pod-pre-commit FAILURE in 8m 25s
✔️ cifmw-molecule-devscripts SUCCESS in 10m 26s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bb09852a1750455a9f54d8a9a5c3f190

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 51s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 15s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 35m 25s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 54s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 00s
cifmw-pod-pre-commit FAILURE in 7m 52s
✔️ cifmw-molecule-devscripts SUCCESS in 10m 17s

Contributor

@danpawlik danpawlik left a comment

Thank you for the PR, LGTM.
Could you share the testproject result via DM?

@mnietoji mnietoji force-pushed the dhcp_provisioning branch 3 times, most recently from b378912 to 9a374c6 on February 22, 2026 21:45
@mnietoji mnietoji changed the title from "Fix provisioning network DHCP timeout race condition" to "[multiple] Co-locate provisionserver with metal3 to prevent DHCP failures" on Feb 22, 2026
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/48d00299c0da4681a65b751e604fc62b

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 22m 33s
podified-multinode-edpm-deployment-crc FAILURE in 23m 41s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 42s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 2h 09m 04s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 59s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 30s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 34s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 27s
✔️ cifmw-architecture-validate-hci SUCCESS in 3m 54s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 07s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 04s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/cb65dea83fcb4bd4924a69fc8156e64d

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 27m 43s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 51s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 23m 57s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 2h 14m 58s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 52s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 36s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 35s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 28s
✔️ cifmw-architecture-validate-hci SUCCESS in 4m 02s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 12s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 18s

@mnietoji mnietoji enabled auto-merge (rebase) February 23, 2026 22:15
@mnietoji
Contributor Author

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e0bffd51141e4e21b058b228aef2fff8

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 29m 41s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 08s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 34m 54s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 2h 16m 36s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 50s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 40s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 54s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 44s
✔️ cifmw-architecture-validate-hci SUCCESS in 4m 30s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 38s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 39s

@amartyasinha
Contributor

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c69c9689e2e04aa0a1ceaaf172e60e01

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 36m 21s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 25s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 31m 37s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 2h 22m 35s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 25s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 35s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 46s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 51s
✔️ cifmw-architecture-validate-hci SUCCESS in 4m 14s
cifmw-molecule-ci_gen_kustomize_values TIMED_OUT in 30m 45s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 22s

@danpawlik
Contributor

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/480004750ed042aba015d0bf8ab41162

✔️ openstack-k8s-operators-content-provider SUCCESS in 28m 15s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 11m 20s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 14m 34s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 14m 24s
✔️ cifmw-pod-zuul-files SUCCESS in 6m 11s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 10m 20s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 6m 37s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 36s
✔️ cifmw-architecture-validate-hci SUCCESS in 5m 23s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 6m 15s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 24s

@amartyasinha
Contributor

recheck

Contributor

@amartyasinha amartyasinha left a comment

/lgtm

@danpawlik
Contributor

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/268e95b083da44ca8380b7b24f71af4a

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 36m 11s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 02s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 14m 29s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 14m 47s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 58s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 30s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 5m 06s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 24s
✔️ cifmw-architecture-validate-hci SUCCESS in 4m 16s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 49s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 24s

3 participants