[BUG] Confusing machine names in Jenkins CI #3780

@crypdick

Add Link

My PR for a new tutorial: #3763

Describe the bug

The Jenkins CI has a few hard-coded behaviors that are confusing. The linux.16xlarge.nvidia.gpu label in Jenkins appears to be a legacy label that means "needs multi-GPU". It doesn't describe the hardware that is actually used.

get_files_to_run.py has a hard-coded check for this exact key, which routes matching files to shard 0. Shard 0 maps to WORKER_ID=1 in the matrix (also confusing), and the runner for that matrix entry (shard 1) is a linux.g5.12xlarge.nvidia.gpu, not a 16xl.
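To make the chain of indirection concrete, here is a minimal sketch of the behavior as I understand it. This is not the actual code from get_files_to_run.py: the function names, the `needs` metadata field, the round-robin placement of the remaining files, and the `MATRIX_RUNNERS` table are all assumptions made for illustration. Only the label, the shard 0 routing, the WORKER_ID=1 offset, and the 12xlarge runner come from what I observed.

```python
# Hypothetical sketch, NOT the real get_files_to_run.py.
LEGACY_MULTI_GPU_LABEL = "linux.16xlarge.nvidia.gpu"

# Assumed view of the CI matrix: WORKER_ID is 1-based, and only the
# first entry is known from this report.
MATRIX_RUNNERS = {1: "linux.g5.12xlarge.nvidia.gpu"}


def assign_shards(metadata, num_shards):
    """Split files into shards; files with the legacy label go to shard 0.

    `metadata` maps filename -> dict with an optional "needs" key
    (the field name is an assumption).
    """
    shards = [[] for _ in range(num_shards)]
    others = []
    for filename, info in sorted(metadata.items()):
        if info.get("needs") == LEGACY_MULTI_GPU_LABEL:
            # The hard-coded check: an exact string match on the label.
            shards[0].append(filename)
        else:
            others.append(filename)
    for i, filename in enumerate(others):
        # Assumed placement: round-robin over the remaining shards.
        idx = 1 + i % (num_shards - 1) if num_shards > 1 else 0
        shards[idx].append(filename)
    return shards


def runner_for_shard(shard_index):
    """0-based shard index -> 1-based WORKER_ID -> runner label."""
    worker_id = shard_index + 1
    return MATRIX_RUNNERS.get(worker_id, "unknown")
```

With this sketch, a file tagged linux.16xlarge.nvidia.gpu lands in shard 0, and `runner_for_shard(0)` returns the 12xlarge label, which is the mismatch described above.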

I'm reporting this because it caused some confusion while I was getting CI to pass on my PR. If it were up to me, the key would be '4-gpu' instead of linux.16xlarge.nvidia.gpu.
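One possible shape for the rename, as a rough sketch only: accept a descriptive key such as '4-gpu' and keep the old label as an alias so existing tutorials keep working. The `LEGACY_ALIASES` table and `normalize_requirement` helper are hypothetical names, not anything that exists in the repo today.

```python
# Hypothetical sketch of a backward-compatible rename.
# Maps the legacy runner-style label to a descriptive capability key.
LEGACY_ALIASES = {"linux.16xlarge.nvidia.gpu": "4-gpu"}


def normalize_requirement(needs):
    """Return the descriptive key for a requirement, translating legacy labels.

    Unknown values (including None) are passed through unchanged.
    """
    return LEGACY_ALIASES.get(needs, needs)


def needs_multi_gpu(needs):
    """True if the requirement, old or new spelling, means 'needs multi-GPU'."""
    return normalize_requirement(needs) == "4-gpu"
```

The routing check could then compare against '4-gpu' only, and the legacy label could be removed once no tutorial metadata uses it.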

Describe your environment

PyTorch tutorial CI
