Skip to content

[cinder-csi-plugin] Wait for volume availability before attach#3124

Open
hemna wants to merge 1 commit into
kubernetes:masterfrom
hemna:fix/wait-volume-available-before-attach
Open

[cinder-csi-plugin] Wait for volume availability before attach#3124
hemna wants to merge 1 commit into
kubernetes:masterfrom
hemna:fix/wait-volume-available-before-attach

Conversation

@hemna

@hemna hemna commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

ControllerPublishVolume now waits for the volume to reach available or in-use status before calling the Cinder attachment API.

Previously, if the CO called ControllerPublishVolume immediately after CreateVolume, the volume could still be in creating state on the backend. This caused Cinder to reject the attachment with a 409 Conflict:

Volume X status must be available or downloading to reserve, but the current status is creating.

This forced the CO (external-attacher) to retry blindly, generating unnecessary API calls to Cinder and delaying volume attachment. The issue is most pronounced with storage backends where volume creation is asynchronous and takes several seconds (e.g., when volumes are backed by network-attached storage).

How the fix works:

A new WaitVolumeTargetStatusWithContext method uses wait.PollUntilContextCancel which:

  1. Polls the volume status every 3 seconds
  2. Returns immediately if the volume is already available or in-use
  3. Returns an error if the volume enters an error state
  4. Respects the gRPC context deadline (no fixed step count — bounded by the CO's RPC timeout)

If the context expires before the volume is ready, the driver returns FAILED_PRECONDITION which tells the CO to retry with exponential backoff.

Which issue this PR fixes(if applicable):
fixes #

Special notes for reviewers:

The volumeReadyPollInterval is set to 3 seconds, which balances responsiveness against API load on the Cinder service. The total wait time is bounded by the gRPC context deadline (typically 15-30s depending on the external-attacher configuration), not a fixed backoff.

The existing WaitVolumeTargetStatus (used in ControllerExpandVolume) is left unchanged to avoid disrupting existing behavior.

Release note:

[cinder-csi-plugin] ControllerPublishVolume now waits for volumes to become available before calling the Cinder attach API, eliminating 409 Conflict errors when volumes are still being provisioned.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 5, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Hi @hemna. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from stephenfin and zetaab June 5, 2026 14:38
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 5, 2026
@hemna hemna force-pushed the fix/wait-volume-available-before-attach branch from 4e4519a to 35031dc Compare June 5, 2026 14:45
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2026
@hemna hemna force-pushed the fix/wait-volume-available-before-attach branch from 35031dc to 4396d1c Compare June 6, 2026 20:58
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2026
@stephenfin

Copy link
Copy Markdown
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 8, 2026
@hemna

hemna commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@hemna

hemna commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest openstack-cloud-csi-cinder-sanity-test

@hemna

hemna commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

The CI failure seems unrelated to my patch. k8s is killing the CI job.

@hemna

hemna commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@hemna

hemna commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@kayrus kayrus left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR, see my notes

Comment thread pkg/csi/cinder/controllerserver.go Outdated

// volumeReadyPollInterval is the interval at which the volume status is
// polled when waiting for a volume to become available before attaching.
volumeReadyPollInterval = 3 * time.Second

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to make this variable configurable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added volume-status-poll-delay (in seconds) to BlockStorageOpts, defaulting to 3 seconds via VolumeStatusPollInterval().

// context with a deadline (e.g., gRPC request context) and wants the wait
// to be bounded by that deadline rather than a fixed number of steps.
// The interval parameter controls how often the volume status is polled.
func (os *OpenStack) WaitVolumeTargetStatusWithContext(ctx context.Context, volumeID string, tStatus []string, interval time.Duration) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method can be extended to return volume object, or even better, adjust the GetInstanceByID method to wait for the volume status. Currently the logic calls WaitVolumeTargetStatusWithContext, then GetInstanceByID, which does not make sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted to only available is accepted for non-multiattach volumes. If a non-multiattach volume is already in-use, it fails immediately rather than waiting. Multiattach volumes accept both available and in-use.

Comment thread pkg/csi/cinder/controllerserver.go Outdated
// "in-use" for multiattach) state. Without this wait, the driver would
// immediately hit a 409 Conflict if CreateVolume has just been called and
// the volume is still being provisioned on the backend.
targetStatus := []string{openstack.VolumeAvailableStatus, openstack.VolumeInUseStatus}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these target statuses should depend on multiattach feature turned on or off. in some cases, when cinder has issues and the volume is already in use, the CSI plugin will try to attach an in-use volume and get 409.

@hemna hemna force-pushed the fix/wait-volume-available-before-attach branch from d77b6c3 to 2516855 Compare June 9, 2026 12:34
Comment thread pkg/csi/cinder/controllerserver.go Outdated
return false, fmt.Errorf("volume %s is already in-use and does not support multiattach", volumeID)
}

for _, eState := range openstack.VolumeErrorStates {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to use the https://pkg.go.dev/slices#Contains

Comment thread pkg/csi/cinder/controllerserver.go Outdated
Comment on lines +389 to +397
if vol.Multiattach && vol.Status == openstack.VolumeInUseStatus {
return true, nil
}

// If a non-multiattach volume is already in-use, it cannot be
// attached again - fail immediately rather than waiting.
if !vol.Multiattach && vol.Status == openstack.VolumeInUseStatus {
return false, fmt.Errorf("volume %s is already in-use and does not support multiattach", volumeID)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach would be cleaner

Suggested change
if vol.Multiattach && vol.Status == openstack.VolumeInUseStatus {
return true, nil
}
// If a non-multiattach volume is already in-use, it cannot be
// attached again - fail immediately rather than waiting.
if !vol.Multiattach && vol.Status == openstack.VolumeInUseStatus {
return false, fmt.Errorf("volume %s is already in-use and does not support multiattach", volumeID)
}
if vol.Status == openstack.VolumeInUseStatus {
if vol.Multiattach {
return true, nil
}
// If a non-multiattach volume is already in-use, it cannot be
// attached again - fail immediately rather than waiting
return false, fmt.Errorf("volume %s is already in-use and does not support multiattach", volumeID)
}


_ = volume // volume is available for future use (e.g. multiattach validation)

_, err = cloud.GetInstanceByID(ctx, instanceID)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volumes.Get API call still happens here in GetInstanceByID after the waitForVolumeAttachable. Does it make sense to merge these two methods?

}
}
for _, eState := range volumeErrorStates {
for _, eState := range VolumeErrorStates {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to use the https://pkg.go.dev/slices#Contains

Comment thread pkg/csi/cinder/openstack/openstack.go Outdated
RescanOnResize bool `gcfg:"rescan-on-resize"`
IgnoreVolumeAZ bool `gcfg:"ignore-volume-az"`
IgnoreVolumeMicroversion bool `gcfg:"ignore-volume-microversion"`
VolumeStatusPollDelay int `gcfg:"volume-status-poll-delay"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the documentation must be updated accordingly with a supported time format example

@hemna hemna force-pushed the fix/wait-volume-available-before-attach branch from 2516855 to 075b611 Compare June 9, 2026 14:55
@hemna

hemna commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

volumes.Get API call still happens here in GetInstanceByID after the waitForVolumeAttachable. Does it make sense to merge these two methods?

GetInstanceByID does not call volumes.Get — it calls servers.Get (Nova compute API) to verify the target instance exists. These are checking two different resources: the volume readiness vs. the compute node existence.

The actual redundant volumes.Get call is inside AttachVolume (openstack_volumes.go line 209), which re-fetches the volume to check if it's already attached (idempotency) and to read the Multiattach flag for compute API microversion selection. Eliminating that redundancy would require changing AttachVolume's signature to accept a *volumes.Volume parameter, which changes the IOpenStack interface and all its implementations (mock, fakecloud, etc.) — that seems out of scope for this bugfix PR but could be a follow-up optimization.

ControllerPublishVolume now waits for the volume to reach a state
valid for attachment before calling the Cinder attachment API.

Previously, if the CO called ControllerPublishVolume immediately after
CreateVolume, the volume could still be in 'creating' state on the
backend. This caused Cinder to reject the attachment with a 409
Conflict, forcing the CO to retry blindly.

The new waitForVolumeAttachable helper:
- Polls the volume status using wait.PollUntilContextCancel, bounded
  by the gRPC context deadline (no fixed step count).
- Returns immediately if the volume is already 'available'.
- For multiattach volumes, also accepts 'in-use' as a valid state.
- For non-multiattach volumes, fails immediately if the volume is
  already 'in-use' (cannot attach again).
- Fails immediately if the volume enters an error state.
- Returns the volume object, eliminating the redundant GetVolume call.
- Returns FAILED_PRECONDITION on timeout, signaling the CO to retry
  with exponential backoff.

The poll interval defaults to 3 seconds and is configurable via the
'volume-status-poll-delay' option in the [BlockStorage] config section.
@hemna hemna force-pushed the fix/wait-volume-available-before-attach branch from 075b611 to f413b34 Compare June 13, 2026 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants