Open
Labels
bug, build automation, build issue (Issues relating to the tutorials build)
Description
Add Link
My PR for a new tutorial: #3763
Describe the bug
The Jenkins CI has a few hard-coded behaviors that are confusing. The linux.16xlarge.nvidia.gpu label in Jenkins appears to be a legacy label that means "needs multi-GPU". It doesn't describe the actual hardware that is used.
get_files_to_run.py has a hard-coded check for this exact key, which routes any tutorial that uses it to shard 0. Shard 0 maps to WORKER_ID=1 in the matrix (also confusing), and shard 1's runner is a linux.g5.12xlarge.nvidia.gpu, not a 16xlarge.
I'm reporting this because it caused some confusion while I was getting the CI for my PR to pass. If it were up to me, the key would be '4-gpu' instead of linux.16xlarge.nvidia.gpu.
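To make the suggestion concrete, here is a rough sketch of how the routing could look with a descriptive key. It is not copied from get_files_to_run.py: the function and constant names are hypothetical, accepting both keys during a transition is my own assumption, and the general "WORKER_ID = shard + 1" rule is only inferred from the single shard 0 / WORKER_ID=1 case described above.

```python
# Hypothetical sketch; names are illustrative, not taken from get_files_to_run.py.
LEGACY_MULTI_GPU_KEY = "linux.16xlarge.nvidia.gpu"  # current key, as described in this issue
PROPOSED_MULTI_GPU_KEY = "4-gpu"                     # descriptive key suggested in this issue
MULTI_GPU_SHARD = 0                                  # shard that receives multi-GPU tutorials


def needs_multi_gpu(needs):
    """Return True if a tutorial's `needs` value requests the multi-GPU runner."""
    # Assumption: accept both keys so existing metadata keeps working.
    return needs in (LEGACY_MULTI_GPU_KEY, PROPOSED_MULTI_GPU_KEY)


def assign_shard(needs, default_shard):
    """Route multi-GPU tutorials to the dedicated shard; leave others unchanged."""
    return MULTI_GPU_SHARD if needs_multi_gpu(needs) else default_shard


def worker_id_for_shard(shard):
    """Assumed mapping: shards are 0-based, WORKER_ID is 1-based."""
    return shard + 1
```

With named constants like these, the "multi-GPU" meaning of the key is stated once in the code instead of being implied by a runner label that no longer matches the hardware.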
Describe your environment
PyTorch tutorial CI