diff --git a/gpu-operator/deploy-kata-containers.rst b/gpu-operator/deploy-kata-containers.rst
index 6cd877c40..3d9801d06 100644
--- a/gpu-operator/deploy-kata-containers.rst
+++ b/gpu-operator/deploy-kata-containers.rst
@@ -225,6 +225,31 @@ Kubernetes Cluster
   Refer to the `Kata Containers documentation `_ for more details on the Kata runtime and VFIO cold-plug.
 
+* Increase the kubelet runtime request timeout to 20 minutes to avoid timeouts when pulling large images.
+  The kubelet can deallocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state.
+
+  Increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the Kata shim configurations in Kata Containers.
+  The default timeout is 2 minutes.
+
+  Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
+
+  .. code-block:: yaml
+     :emphasize-lines: 3
+
+     apiVersion: kubelet.config.k8s.io/v1beta1
+     kind: KubeletConfiguration
+     runtimeRequestTimeout: 20m
+
+  Restart the kubelet service to apply the change:
+
+  .. code-block:: console
+
+     $ sudo systemctl restart kubelet
+
+  If you need a timeout of more than 1200 seconds (20 minutes), you must also adjust the Kata Agent's ``image_pull_timeout``, which defaults to 1200 seconds.
+  This setting also sets the confidential data hub's image pull API timeout, in seconds.
+  To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value through the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` pod annotation.
+
 .. _label-nodes-kata-containers:
 
 Label Nodes to use Kata Containers
diff --git a/gpu-operator/platform-support.rst b/gpu-operator/platform-support.rst
index 93edc6fe1..67949a47c 100644
--- a/gpu-operator/platform-support.rst
+++ b/gpu-operator/platform-support.rst
@@ -559,6 +559,46 @@ KubeVirt and OpenShift Virtualization with NVIDIA vGPU is supported on the follo
 KubeVirt with NVIDIA vGPU is supported on nodes with Linux kernel < 6.0, such as Ubuntu 22.04 LTS.
 
+****************************************************************************************
+Support for Kata Containers, Confidential Containers, and OpenShift Sandboxed Containers
+****************************************************************************************
+
+The GPU Operator supports running GPU workloads in lightweight virtual machines using
+`Kata Containers `__ for single- and multi-GPU passthrough workloads.
+Confidential Containers are also supported through Kata Containers and the NVIDIA Reference Architecture for Confidential Containers.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 40 60
+
+   * - Component
+     - Support
+   * - Kata Containers
+     - 3.29.0 and higher (installed with the upstream ``kata-deploy`` Helm chart)
+   * - NVIDIA Reference Architecture for Confidential Containers
+     - Refer to the NVIDIA Confidential Containers :doc:`support matrix documentation `.
+   * - OpenShift Sandboxed Containers
+     - 1.12 (Technology Preview support)
+
+For details on installing Kata Containers with the GPU Operator, refer to the :doc:`deploy-kata-containers` page.
+That page also describes the limitations and restrictions that apply when using Kata Containers with the GPU Operator.
+
+Refer to the `Red Hat OpenShift Sandboxed Containers `__ documentation for more details.
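The ``io.katacontainers.config.hypervisor.kernel_params`` annotation described in the Kata Containers deployment notes can carry the ``agent.image_pull_timeout`` kernel parameter on a per-pod basis. The following is a minimal sketch of such a pod spec; the pod name, the runtime class name (``kata-qemu-nvidia-gpu``), and the container image are illustrative assumptions, not values from this documentation:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: cuda-kata-example  # assumed name
     annotations:
       # Raise the Kata Agent image pull timeout above the 1200s default (value in seconds).
       io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
   spec:
     runtimeClassName: kata-qemu-nvidia-gpu  # assumed runtime class name
     containers:
     - name: cuda
       image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # assumed image
       command: ["sleep", "infinity"]
       resources:
         limits:
           nvidia.com/gpu: 1

Note that raising the agent timeout this way only helps if the kubelet ``runtimeRequestTimeout`` is also raised at least as high, since the kubelet enforces its own limit first.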
+
+***********************************
+Support for Confidential Containers
+***********************************
+
+The GPU Operator supports deploying Confidential Containers using Kata Containers and the NVIDIA Reference Architecture for Confidential Containers.
+This is a dedicated architecture for deploying Confidential Containers on Kubernetes clusters.
+It supports everything listed in the preceding support table.
+
+For additional details on the NVIDIA Reference Architecture for Confidential Containers, including supported GPUs, host CPU platforms, operating systems, and software components, refer to the NVIDIA Confidential Containers :doc:`support matrix documentation `.
+
+The GPU Operator offers Technology Preview support for
+`Red Hat OpenShift Sandboxed Containers `__ v1.12
+to deploy Confidential Containers workloads.
+
 **************************
 Support for GPUDirect RDMA
 **************************