[manila-csi-plugin] Retry proxied probe on Unavailable#3125

Open
carterpewpew wants to merge 1 commit into kubernetes:master from carterpewpew:fix/manila-retry-unavailable

@carterpewpew

What this PR does / why we need it:
The Manila CSI node plugin fatally exits on startup when its proxied CSI driver socket (e.g. NFS CSI plugin) is not yet available. After a node reboot, both DaemonSets restart concurrently and the Manila driver crashes because upstream ProbeForever only retries on DeadlineExceeded, not on Unavailable (connection refused). This causes a fatal exit within ~1 second instead of retrying within the existing 15-second timeout window. This PR wraps the ProbeForever call in a retry loop that specifically handles codes.Unavailable, bounded by the existing context timeout.

Which issue this PR fixes (if applicable):
fixes #3111

Special notes for reviewers:

  1. Deploy Manila CSI with a proxied NFS driver
  2. Reboot a node so both DaemonSets restart concurrently
  3. Observe that the Manila node plugin retries instead of fatally exiting

Release note:

[manila-csi-plugin] Fix fatal exit on startup when proxied CSI driver socket is not yet ready by retrying on transient Unavailable errors.

The Manila CSI node plugin fatally exits on startup when the proxied
CSI driver socket (e.g. NFS) is not yet available, because ProbeForever
only retries on DeadlineExceeded and immediately returns on Unavailable.

Wrap the ProbeForever call in a retry loop that retries on
codes.Unavailable within the existing 15-second context timeout, so
transient connection errors during concurrent DaemonSet restarts no
longer cause a fatal exit.

Signed-off-by: Jathavedhan M <jathavedhan.m@ibm.com>
k8s-ci-robot added the release-note label (Denotes a PR that will be considered when it comes time to generate release notes.) Jun 8, 2026
k8s-ci-robot requested a review from Fedosin June 8, 2026 17:04
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gouthampacha for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested a review from tsmetana June 8, 2026 17:04
@k8s-ci-robot

Contributor

Welcome @carterpewpew!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot added the following labels Jun 8, 2026:
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
@k8s-ci-robot

Contributor

Hi @carterpewpew. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Jun 8, 2026

Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

[manila-csi-plugin] Node plugin fatally exits on startup when proxied CSI driver socket is not yet ready