Conversation

@mburke5678
Contributor

@mburke5678 mburke5678 commented Dec 5, 2025

Promoting the DRA/Allocating GPUs feature to GA.

Added information on admin access and priority lists.
Updated the API from `resource.k8s.io/v1beta1` to `resource.k8s.io/v1`.
Added that `exactly` or `firstAvailable` is required in the resource claim or resource claim template.
Removed Technology Preview (TP) references.

OSDOCS-17427
OSDOCS-17430
OSDOCS-17431

Link to docs preview:

QE review:

  • QE has approved this change.

@mburke5678 mburke5678 added this to the Planned for 4.21 GA milestone Dec 5, 2025
@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 5, 2025
@ocpdocs-previewbot

ocpdocs-previewbot commented Dec 5, 2025

🤖 Tue Dec 16 16:18:01 - Prow CI generated the docs preview:

https://103484--ocpdocs-pr.netlify.app/openshift-enterprise/latest/nodes/pods/nodes-pods-allocate-dra.html

@mburke5678
Contributor Author

@tkashem @sairameshv PTAL

--
where:

`spec.devices.requests.name`:: Configures a resource request.
Member

Suggested change
`spec.devices.requests.name`:: Configures a resource request.
`spec.devices.requests`:: Specifies the requests for devices.


`spec.devices.requests.name`:: Configures a resource request.

`spec.devices.requests.name.firstAvailable/exactly:`:: Specifies whether all of the requested devices must be available. This value must be `exactly` or `firstAvailable`.
Member

Suggested change
`spec.devices.requests.name.firstAvailable/exactly:`:: Specifies whether all of the requested devices must be available. This value must be `exactly` or `firstAvailable`.
`spec.devices.requests.firstAvailable/exactly:`:: `exactly` specifies the details for a single request that must be met exactly for the request to be satisfied. `firstAvailable` contains subrequests, of which exactly one will be selected by the scheduler. So if there are two entries in the list, the scheduler checks the second one only if it determines that the first one cannot be used.

Contributor Author

@sairameshv

So if there are two entries in the list

The upstream docs seem to suggest that there can be only two subrequests. Is this true?

With the prioritized list feature, a second alternative can be specified,

Member

There can be a list of subrequests, that is, more than two subrequests are allowed.
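
For illustration, a minimal sketch of a `firstAvailable` request with three subrequests might look like the following. The subrequest names `2g-10gb` and `3g-20gb` and the `example-device-class` device class come from the examples in this PR. The claim name, the request name, the third subrequest, the `driver.example.com` domain, and the CEL expressions are hypothetical placeholders, not values taken from the docs under review.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-prioritized-claim      # hypothetical name
spec:
  devices:
    requests:
    - name: gpu                        # hypothetical request name
      firstAvailable:                  # ordered list; entries are tried from top to bottom
      - name: 2g-10gb                  # first choice
        deviceClassName: example-device-class
        selectors:
        - cel:
            # hypothetical attribute published by a hypothetical driver
            expression: device.attributes["driver.example.com"].profile == "2g.10gb"
      - name: 3g-20gb                  # used only if the first choice cannot be satisfied
        deviceClassName: example-device-class
        selectors:
        - cel:
            expression: device.attributes["driver.example.com"].profile == "3g.20gb"
      - name: any-gpu                  # hypothetical third fallback, no selector
        deviceClassName: example-device-class
```

Exactly one of the subrequests is selected for the request, so the list length is not limited to two entries.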

Comment on lines 79 to 82

`spec.devices.requests.deviceClassName`:: Specifies which device class to use with this request.

`spec.devices.requests.selectors`:: Uses CEL expressions to request devices in the specified device class.
Member

Suggested change
`spec.devices.requests.deviceClassName`:: Specifies which device class to use with this request.
`spec.devices.requests.selectors`:: Uses CEL expressions to request devices in the specified device class.

`deviceClassName` is used inside a subrequest, that is, `spec.devices.requests.exactly.deviceClassName` or `spec.devices.requests.firstAvailable.deviceClassName`, and the selectors are inherited by the subrequest.

Contributor Author

@sairameshv I am suggesting here that the requested device must be in the device class in the subrequest. Is that correct?

Member

Your suggestion is correct, but the field reference (the syntax of the path) is incorrect.
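
As a sketch of the field placement discussed here, assuming the `resource.k8s.io/v1` layout, `deviceClassName` and `selectors` sit under `exactly` rather than directly under the request. The claim name, the request name, the `driver.example.com` domain, and the CEL expression are hypothetical; only `example-device-class` comes from the PR text.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-resource-claim         # hypothetical name
spec:
  devices:
    requests:
    - name: gpu-request                # hypothetical request name
      exactly:                         # a single request that must be met exactly
        deviceClassName: example-device-class
        selectors:
        - cel:
            # hypothetical capacity check against a hypothetical driver domain
            expression: device.capacity["driver.example.com"].memory == quantity("64Gi")
```

With this layout, the documented paths would read `spec.devices.requests.exactly.deviceClassName` and `spec.devices.requests.exactly.selectors`.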

Admins and operators can create a _resource claim_ to request a GPU from a specific device class. A resource claim differs from a resource claim template by allowing you to share GPUs with multiple pods. Also, resource claims are not deleted when a requesting pod is terminated.
+
The following example resource claim template uses CEL expressions to request specific devices in the `example-device-class` device class that are of a specific size.
The following example resource claim uses CEL expressions to request specific devices in the `example-device-class` device class that are of a specific size. Because the `exactly` parameter is included in the request, all of the devices must be available before the scheduler can create the requesting pod. Alternatively, you can specify `firstAvailable` instead of `exactly` to create a prioritized list of devices, in case one of the requested devices is not available. For more information, see _Priority lists_ in this section.
Member

Similar suggestion as above

Contributor Author

@sairameshv

all of the devices must be available before the scheduler can create the requesting pod

Is this true?

Member

It is true when using `exactly`, but my point is that it is a prerequisite, and these lines seem to emphasize the prerequisite conditions rather than why the field exists or what it signifies. I may be wrong.

--
where:

`spec.devices.requests.name.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device.
Member

Suggested change
`spec.devices.requests.name.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device.
`spec.devices.requests.exactly.adminAccess:true`:: Specifies that the admin access mode is enabled for the specified device.
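
A minimal sketch of where `adminAccess` lands under the suggested path, assuming the `resource.k8s.io/v1` layout. The template name, namespace, request name, and `allocationMode` value are hypothetical choices for illustration. Upstream Kubernetes also requires the namespace to carry the `resource.kubernetes.io/admin-access: "true"` label; whether the docs in this PR should mention that is an assumption to confirm.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: example-admin-claim-template   # hypothetical name
  namespace: example-admin-namespace   # hypothetical; assumed to be labeled for admin access
spec:
  spec:
    devices:
      requests:
      - name: admin-request            # hypothetical request name
        exactly:
          deviceClassName: example-device-class
          allocationMode: All          # hypothetical choice: request all matching devices
          adminAccess: true            # enables admin access mode for this request
```

Because this is a template, the path gains an extra `spec` level: `spec.spec.devices.requests.exactly.adminAccess`.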

Comment on lines 166 to 167
Priority lists::
In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available.
Member

Suggested change
Priority lists::
In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available.
Prioritized list::
A prioritized list of subrequests for requests in resource claim templates or resource claims can be provided. This allows users to specify alternative devices that can be used by the workload if the primary choice is not available.

Priority lists::
In both resource claim templates and resource claims, you can either require all requested devices be available before a pod can be scheduled, or create a prioritized list of two resources in case the first resource is not available.
+
In the `ResourceClaimTemplate` or `ResourceClaim` object, you must specify `exactly` to require all devices to be available or `firstAvailable` to create a priority list.
Member

I think we should focus on what the `exactly` and `firstAvailable` options do, rather than on the availability of the devices when these options are used.


`spec.spec.devices.requests`:: Configures a resource request.

`spec.spec.devices.requests.firstAvailable:`:: Specifies whether all of the requested devices must be available. In this example, `firstAvailable` specifies two sub-requests, named `2g-10gb` and `3g-20gb`. Alternatively, you can specify `exactly` to request one specific device.
Member

Specifies whether all of the requested devices must be available

A user requests a device, such as a GPU, through a resource claim or a resource claim template, and expresses the request through either `spec.spec.devices.requests.firstAvailable` or `spec.spec.devices.requests.exactly`. With `firstAvailable`, the scheduler selects the first available device from the list of subrequests. So I think this explanation does not match the expected behavior.

+
* `spec.devices.requests.firstAvailable` specifies multiple requests for a device, of which only one device needs to be available before the scheduler can create the requesting pod. The scheduler checks the availability of the devices in the order listed and selects the first available device. The scheduler can create the pod if one of the requested devices is available.

* `spec.devices.requests.exactly` specifies one or more requests for a device, for which all of the devices must be available before the scheduler can create the requesting pod. Each of the devices must match the request exactly for the request to be satisfied. If any of the requested devices is not available, the scheduler cannot create the pod.
Member

This seems to be incorrect. According to the upstream documentation, `exactly` specifies the details for a single request that must be met exactly for the request to be satisfied.

Contributor Author

@mburke5678 mburke5678 Dec 10, 2025

@sairameshv Thank you. Not sure where I picked up that idea that you can make multiple requests. I agree, it seems incorrect.

Do we support `devices.requests.exactly.allocationMode` and `devices.requests.exactly.count`? Perhaps that is what confused me. If so, and I request exactly 3 devices but fewer than 3 are available, what happens?
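
For reference, a sketch of the kind of request this question describes, assuming the upstream `resource.k8s.io/v1` fields `allocationMode` and `count` are supported in this release. That support is exactly what is being asked here, so treat it as unconfirmed. The claim name and request name are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: example-three-gpu-claim        # hypothetical name
spec:
  devices:
    requests:
    - name: three-gpus                 # hypothetical request name
      exactly:                         # still a single request, but for several devices
        deviceClassName: example-device-class
        allocationMode: ExactCount     # upstream default mode; assumed to apply here
        count: 3                       # number of matching devices requested
```

In the upstream API, a single `exactly` request with `count: 3` is different from three separate requests. The behavior when fewer than three matching devices are available is left for the reviewers to confirm.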

@mburke5678
Contributor Author

/retest

@mburke5678
Contributor Author

@asahay19 Can you PTAL?

@mburke5678
Contributor Author

/retest

Michael Burke added 2 commits December 16, 2025 10:14
@openshift-ci

openshift-ci bot commented Dec 16, 2025

@mburke5678: all tests passed!


Labels

branch/enterprise-4.21 size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


3 participants