Skip to content

Conversation

@mweibel
Copy link
Collaborator

@mweibel mweibel commented Dec 16, 2025

This fixes a couple of race conditions with node stage/unstage etc. calls.

adjust udevadm calls
udevadm when called concurrently may potentially redo symlinks.
Adding a lock, so if multiple pods attach at the same time, we don't
call udevadm concurrently.
Adjusts the order so we first check if the device is available, before
doing udevadm calls.
Adjust order of udevadm calls itself: first trigger, then settle.

Reason:

  • settle waits for new events to finish
  • trigger triggers new events but doesn't wait for them

improve node stage volume mounting

  • add volume locks to avoid concurrent operations on the same volume
  • add detection if symlinks don't add up to the right volume
  • increase logging
  • test: add concurrency/race condition tests

lint issues

  • fix ptr/nonptr receiver
  • add lint target and fix lint errors

@mweibel mweibel requested review from alakae and disperate December 16, 2025 16:43
@mweibel mweibel force-pushed the release/3.5.7-prerelease branch 6 times, most recently from 4f1856c to 39b17bf Compare December 23, 2025 13:38
udevadm when called concurrently may potentially redo symlinks.
Adding a lock, so if multiple pods attach at the same time, we don't
call udevadm concurrently.
Adjusts the order so we first check if the device is available, before
doing udevadm calls.
Adjust order of udevadm calls itself: first trigger, then settle.

Reason:
- settle waits for new events to finish
- trigger triggers new events but doesn't wait for them

Improve logging and increase the amount of logs (in debug level) to
potentially debug issues with more data.
- add volume locks to avoid concurrent operations on the same volume
- add detection if symlinks don't add up to the right volume
- increase logging once more
- test: add concurrency/race condition tests
- fix ptr/nonptr receiver
- add lint target and fix lint errors
@mweibel mweibel force-pushed the release/3.5.7-prerelease branch from 9952c20 to cbcd901 Compare January 12, 2026 07:52
@mweibel mweibel changed the title 3.5.7-prerelease fix race conditions on stage/unstage volume Jan 12, 2026
@mweibel mweibel marked this pull request as ready for review January 12, 2026 08:02
@mweibel
Copy link
Collaborator Author

mweibel commented Jan 12, 2026

workflow release charts failed due to me rebasing/squashing commits.

Comment on lines +98 to +99
GOBIN=$(shell go env GOPATH)/bin
GOLANGCI_LINT_VER=v2.8.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Consider installing from a tool.mod file, similar to the approach used in CCM:

https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/blob/66c1b44edb6e69cb4ae1ac79d0f501fd9e6f192a/Makefile#L8

I generally prefer to avoid curl | bash. Let me know your thoughts on this approach!

| 1.28 | v3.3.0 | v3.5.6 |
| 1.29 | v3.3.0 | v3.5.6 |
| 1.30 | v3.3.0 | v3.5.6 |
| 1.24 | v3.1.0 | v3.5.7-rc2 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must be extra careful that all these entries are correct, when releasing:

  • update lines for the latests kubernetes versions
  • min version 1.28 according to CHANGELOG.md
  • be sure that the final version does not mention any rc versions.

Comment on lines +3 to +12
## v3.5.7-rc2 - 2025.12.23
* Minimum supported Kubernetes version is now 1.28.
* add volume locks to avoid concurrent operations on the same volume
* add detection if symlinks don't add up to the right volume
* increase logging once more

## v3.5.7-rc1 - 2025.12.17
* Adjust udevadm calls & reorder them, and put them behind a mutex to avoid a potential race condition
* Improve logging by adding debug logs
* Add `--log-level=(trace|debug|info|warn|error)` flag to customize log level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what do you think; is it better to mention both rcs in the final version, or should we create a new single entry for v3.5.7?

}).Info("successfully found attached volume_id at device_path")

// Debug logging to help diagnose potential race conditions with concurrent volume mounts
resolvedSource, resolveErr := filepath.EvalSymlinks(source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: can we wrap EvalSymlinks into a log level check?

e.g.: if d.log.Logger.IsLevelEnabled(logrus.DebugLevel).

}

// Resolve source symlink for debug logging
resolvedSource, resolveErr := filepath.EvalSymlinks(source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in the other file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants