
Node Auto-Provisioning failing for certain GPU nodes (T4) #402

@agam

Description


How to re-create

A job that requests nvidia.com/gpu, if it results in a new node being spun up by GKE Node Auto-Provisioning, will fail to be scheduled on that node. A minimal example is sketched below.
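For concreteness, something like the following is enough to trigger auto-provisioning of a T4 node (the job name, image, and command are illustrative placeholders rather than my exact workload; the parts that matter are the nodeSelector and the nvidia.com/gpu limit):

# Sketch of a GPU job that triggers Node Auto-Provisioning for a T4 node.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-smoke-test            # placeholder name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        # standard GKE label that tells auto-provisioning which accelerator to attach
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      containers:
      - name: cuda-check
        image: nvidia/cuda:12.2.0-base-ubuntu22.04   # any CUDA-capable image works
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
EOF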

Why is this bad

  • Using GPU nodes with Node Auto-Provisioning in GKE is broken (at least for T4s; I'm not sure which other GPU types are affected)
  • It feels strange that such a core "elasticity behavior" is broken without acknowledgement -- hoping this issue gets attention and results in at least an ETA for a fix

Details on error

  • The provisioned node gets an nvidia-device-plugin pod
  • That pod has an nvidia-driver-installer init container
  • The init container is stuck on startup, logging the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0 100   720  100   720    0     0   113k      0 --:--:-- --:--:-- --:--:--  117k
GPU driver auto installation is disabled.
Waiting for GPU driver libraries to be available.
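That output can be pulled from the stuck init container with something like this (the pod name is a placeholder, and I'm assuming the DaemonSet lives in kube-system as usual on GKE):

# Find the device-plugin pod that landed on the newly provisioned node,
# then read the nvidia-driver-installer init container's logs.
kubectl get pods -n kube-system -o wide | grep nvidia-device-plugin
kubectl logs -n kube-system <nvidia-device-plugin-pod> -c nvidia-driver-installer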

As a result, the kubelet never registers the nvidia.com/gpu resource, which means that the job (which triggered the node creation in the first place!) can't get its pods scheduled on it.
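This is easy to confirm from the node object itself -- the Allocatable section never gains an nvidia.com/gpu entry, and the job's pods stay Pending (node name is a placeholder):

# The auto-provisioned node never advertises nvidia.com/gpu in Allocatable,
# so the GPU pods that triggered it remain Pending.
kubectl describe node <provisioned-node-name> | grep -A 8 'Allocatable:'
kubectl get pods --field-selector=status.phase=Pending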

Prior context:

This follows up on the issue below, which appears to have regressed (and which I cannot reopen):

#356
