New workaround for PSM3 issues causing test failures in 2025.06 #121
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/generic
New job on instance
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/generic
New job on instance
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/generic
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=x86_64/generic
New job on instance
casparvl
left a comment
Looks good to me. Also clear that it shouldn't reintroduce our prior issue. Approved!
See EESSI/software-layer#1288: the test step is failing for all x86_64 targets. I could more or less reproduce it interactively, but with a different error, so I'm going to debug it a bit more here.
edit: to make the conclusions easier to find, I'm putting them here:
The ReFrame logs of the failed tests done by the bot contained PSM3 timeout messages like:
Initially I could not reproduce that with a manually submitted Slurm job on the AWS cluster, though I did get this:
which I could solve by setting `export PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe`.
Then I noticed that `test_suite.sh` already has a workaround, see https://github.com/EESSI/software-layer-scripts/blob/main/test_suite.sh#L174 (software-layer-scripts/test_suite.sh, line 174 in d341916).
`FI_PROVIDER="^psm3"` completely disables the PSM3 provider, and that solves the issue.
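For anyone who wants to try the same thing in an interactive job, the two settings mentioned above can be sketched roughly as follows. This is only an illustration: the function name `apply_psm3_workaround` is hypothetical and is not taken from `test_suite.sh`, and whether the oversubscribe setting is needed at all will depend on how the job is launched.

```shell
#!/usr/bin/env bash
# Sketch only: apply_psm3_workaround is a hypothetical helper name,
# not something that exists in test_suite.sh.

apply_psm3_workaround() {
    # Exclude the PSM3 libfabric provider. In FI_PROVIDER, a leading "^"
    # turns the provider list into an exclusion list.
    export FI_PROVIDER="^psm3"

    # Setting that resolved the separate error seen in the manually
    # submitted Slurm job; it may not be needed in every environment.
    export PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe
}

apply_psm3_workaround

# Print the resulting environment so it ends up in the job log.
echo "FI_PROVIDER=${FI_PROVIDER}"
echo "PRTE_MCA_rmaps_default_mapping_policy=${PRTE_MCA_rmaps_default_mapping_policy}"
```

Sourcing this before starting the test run keeps both settings in one place, so they are easy to drop again if the underlying PSM3 problem gets fixed.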