DRA GA, admin access, priority lists #103484
base: main
🤖 Tue Dec 16 16:18:01 - Prow CI generated the docs preview:
@tkashem @sairameshv PTAL
| -- | ||
| where: | ||
| `spec.devices.requests.name`:: Configures a resource request. |
| `spec.devices.requests.name`:: Configures a resource request. | |
| `spec.devices.requests`:: Specifies requests for devices |
| `spec.devices.requests.name`:: Configures a resource request. | ||
| `spec.devices.requests.name.firstAvailable/exactly:`:: Specifies whether all of the requested devices must be available. This value must be `exactly` or `firstAvailable`. |
| `spec.devices.requests.name.firstAvailable/exactly:`:: Specifies whether all of the requested devices must be available. This value must be `exactly` or `firstAvailable`. | |
| `spec.devices.requests.firstAvailable/exactly:`:: `exactly` specifies the details for a single request that must be met exactly for the request to be satisfied. `firstAvailable` contains subrequests, of which exactly one will be selected by the scheduler. So if there are two entries in the list, the scheduler will only check the second one if it determines that the first one can not be used. |
"So if there are two entries in the list"
The upstream docs seem to suggest that there can be only two subrequests. Is this true?
"With the prioritized list feature, a second alternative can be specified,"
There can be a list of subrequests, i.e. more than two.
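To make the point above concrete, here is a minimal sketch of a `ResourceClaim` whose single request lists three subrequests under `firstAvailable`. The claim name, the request name, the `1g-5gb` subrequest, and the `driver.example.com` attribute with its `profile` values are hypothetical and only illustrate the shape; the real attribute names depend on the DRA driver in use.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-prioritized-claim    # hypothetical name
spec:
  devices:
    requests:
    - name: gpu                      # hypothetical request name
      firstAvailable:                # subrequests are considered in the order listed
      - name: 3g-20gb
        deviceClassName: example-device-class
        selectors:
        - cel:
            # hypothetical attribute; actual names come from the driver
            expression: 'device.attributes["driver.example.com"].profile == "3g.20gb"'
      - name: 2g-10gb
        deviceClassName: example-device-class
        selectors:
        - cel:
            expression: 'device.attributes["driver.example.com"].profile == "2g.10gb"'
      - name: 1g-5gb                 # a third alternative, to show the list is not limited to two
        deviceClassName: example-device-class
        selectors:
        - cel:
            expression: 'device.attributes["driver.example.com"].profile == "1g.5gb"'
```

The scheduler would select exactly one of these subrequests, moving to a later entry only if the earlier ones cannot be used.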
| `spec.devices.requests.deviceClassName`:: Specifies which device class to use with this request. | ||
| `spec.devices.requests.selectors`:: Specifies CEL expressions that request devices in the specified device class. |
| `spec.devices.requests.deviceClassName`:: Specifies which device class to use with this request. | |
| `spec.devices.requests.selectors`:: Specifies CEL expressions that request devices in the specified device class. |
deviceClassName is used inside a subrequest, i.e. spec.devices.requests.exactly.deviceClassName or spec.devices.requests.firstAvailable.deviceClassName, and the selectors are inherited by the subrequest.
@sairameshv I am suggesting here that the requested device must be in the device class in the subrequest. Is that correct?
Your suggestion is correct, but the reference/syntax is incorrect.
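For reference, a minimal sketch of where these fields sit in a `ResourceClaimTemplate` under the `resource.k8s.io/v1` API: `deviceClassName` and `selectors` are nested under `exactly` rather than directly under the request. The template name, request name, and the `driver.example.com` attribute are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: example-claim-template       # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu                    # hypothetical request name
        exactly:                     # a single request that must be met exactly
          deviceClassName: example-device-class
          selectors:
          - cel:
              # hypothetical attribute; actual names come from the driver
              expression: 'device.attributes["driver.example.com"].profile == "2g.10gb"'
```

With this layout, the documented path would read `spec.spec.devices.requests.exactly.deviceClassName` for a template, or `spec.devices.requests.exactly.deviceClassName` for a resource claim.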
| Admins and operators can create a _resource claim_ to request a GPU from a specific device class. A resource claim differs from a resource claim template by allowing you to share GPUs with multiple pods. Also, resource claims are not deleted when a requesting pod is terminated. | ||
| + | ||
| The following example resource claim template uses CEL expressions to request specific devices in the `example-device-class` device class that are of a specific size. | ||
| The following example resource claim uses CEL expressions to request specific devices in the `example-device-class` device class that are of a specific size. Because the `exactly` parameter is included in the request, all of the devices must be available before the scheduler can create the requesting pod. Alternatively, you can specify `firstAvailable` instead of `exactly` to create a prioritized list of devices, in case one of the requested devices is not available. For more information, see _Priority lists_ in this section. |
Similar suggestion as above
all of the devices must be available before the scheduler can create the requesting pod
Is this true?
It is true when using exactly, but my point is that it is a prerequisite, and these lines seem to emphasize the prerequisite conditions rather than why the field exists or what it signifies. I may be wrong.
| -- | ||
| where: | ||
| `spec.devices.requests.name.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device. |
| `spec.devices.requests.name.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device. | |
| `spec.devices.requests.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device. |
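As an illustration of the corrected path in the suggestion above, a minimal sketch of a `ResourceClaim` that sets `adminAccess` under `exactly`. The claim and request names are hypothetical, and any cluster-side prerequisites for admin access are not shown here.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-admin-claim          # hypothetical name
spec:
  devices:
    requests:
    - name: gpu-admin                # hypothetical request name
      exactly:
        deviceClassName: example-device-class
        adminAccess: true            # field sits under exactly, not under the request name
```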
| Priority lists:: | ||
| In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available. |
| Priority lists:: | |
| In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available. | |
| Prioritized list:: | |
| A prioritized list of subrequests for requests in resource claim templates or resource claims can be provided. This allows users to specify alternative devices that can be used by the workload if the primary choice is not available. |
| Priority lists:: | ||
| In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available. | ||
| + | ||
| In the `ResourceClaimTemplate` or `ResourceClaim` object, you must specify `exactly` to require all devices to be available or `firstAvailable` to create a priority list. |
I think we should focus on the functionality of the exactly and firstAvailable options rather than insisting on the availability of the devices when these options are used.
| `spec.spec.devices.requests`:: Configures a resource request. | ||
| `spec.spec.devices.requests.firstAvailable:`:: Specifies whether all of the requested devices must be available. In this example, `firstAvailable` specifies two sub-requests named `2g-10gb` and `3g-20gb`. Alternatively, you can specify `exactly` to request one specific device. |
Specifies whether all of the requested devices must be available
The user requests a device, such as a GPU, through a resource claim or a resource claim template, using either spec.spec.devices.requests.firstAvailable or spec.spec.devices.requests.exactly. The former lets the scheduler select the first available device from the list of subrequests. So I don't think this explanation matches what the field does.
| + | ||
| * `spec.devices.requests.firstAvailable` specifies multiple requests for a device, of which only one device needs to be available before the scheduler can create the requesting pod. The scheduler checks the availability of the devices in the order listed and selects the first available device. The scheduler can create the pod if one of the requested devices is available. | ||
| * `spec.devices.requests.exactly` specifies one or more requests for a device, for which all of the devices must be available before the scheduler can create the requesting pod. Each of the devices must match the request exactly for the request to be satisfied. If any of the requested devices is not available, the scheduler cannot create the pod. |
This seems to be incorrect. Based on the upstream doc reference, Exactly specifies the details for a single request that must be met exactly for the request to be satisfied.
@sairameshv Thank you. Not sure where I picked up that idea that you can make multiple requests. I agree, it seems incorrect.
Do we support devices.requests.exactly.allocationMode and devices.requests.exactly.count? Perhaps that is what confused me. If so, and I request exactly 3 of the devices, and all 3 devices are not available, what happens?
/retest
@asahay19 Can you PTAL?
/retest
@mburke5678: all tests passed!
Promoting the DRA/Allocating GPUs feature to GA.
Added information on admin access and priority lists.
Updated the API from resource.k8s.io/v1beta1 to resource.k8s.io/v1
Added that exactly or firstAvailable is required in the resource claim or template.
Removed TP references.
OSDOCS-17427
OSDOCS-17430
OSDOCS-17431
Link to docs preview:
QE review: