-
Notifications
You must be signed in to change notification settings - Fork 11
fix race conditions on stage/unstage volume #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
4f1856c to
39b17bf
Compare
udevadm when called concurrently may potentially redo symlinks. Adding a lock, so if multiple pods attach at the same time, we don't call udevadm concurrently. Adjusts the order so we first check if the device is available, before doing udevadm calls. Adjust order of udevadm calls itself: first trigger, then settle. Reason: - settle waits for new events to finish - trigger triggers new events but doesn't wait for them Improve logging and increase the amount of logs (in debug level) to potentially debug issues with more data.
- add volume locks to avoid concurrent operations on the same volume - add detection if symlinks don't add up to the right volume - increase logging once more - test: add concurrency/race condition tests - fix ptr/nonptr receiver - add lint target and fix lint errors
9952c20 to
cbcd901
Compare
|
workflow |
| GOBIN=$(shell go env GOPATH)/bin | ||
| GOLANGCI_LINT_VER=v2.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Consider installing from a tool.mod file, similar to the approach used in CCM:
I generally prefer to avoid curl | bash. Let me know your thoughts on this approach!
| | 1.28 | v3.3.0 | v3.5.6 | | ||
| | 1.29 | v3.3.0 | v3.5.6 | | ||
| | 1.30 | v3.3.0 | v3.5.6 | | ||
| | 1.24 | v3.1.0 | v3.5.7-rc2 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we must be extra careful that all these entries are correct, when releasing:
- update lines for the latests kubernetes versions
- min version 1.28 according to CHANGELOG.md
- be sure that the final version does not mention any rc versions.
| ## v3.5.7-rc2 - 2025.12.23 | ||
| * Minimum supported Kubernetes version is now 1.28. | ||
| * add volume locks to avoid concurrent operations on the same volume | ||
| * add detection if symlinks don't add up to the right volume | ||
| * increase logging once more | ||
|
|
||
| ## v3.5.7-rc1 - 2025.12.17 | ||
| * Adjust udevadm calls & reorder them, and put them behind a mutex to avoid a potential race condition | ||
| * Improve logging by adding debug logs | ||
| * Add `--log-level=(trace|debug|info|warn|error)` flag to customize log level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: what do you think; is it better to mention both rcs in the final version, or should we create a new single entry for v3.5.7?
| }).Info("successfully found attached volume_id at device_path") | ||
|
|
||
| // Debug logging to help diagnose potential race conditions with concurrent volume mounts | ||
| resolvedSource, resolveErr := filepath.EvalSymlinks(source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can we wrap EvalSymlinks into a log level check?
e.g.: if d.log.Logger.IsLevelEnabled(logrus.DebugLevel).
| } | ||
|
|
||
| // Resolve source symlink for debug logging | ||
| resolvedSource, resolveErr := filepath.EvalSymlinks(source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as in the other file.
This fixes a couple of race conditions with node stage/unstage etc. calls.
adjust udevadm calls
udevadm when called concurrently may potentially redo symlinks.
Adding a lock, so if multiple pods attach at the same time, we don't
call udevadm concurrently.
Adjusts the order so we first check if the device is available, before
doing udevadm calls.
Adjust order of udevadm calls itself: first trigger, then settle.
Reason:
improve node stage volume mounting
lint issues