From f0c35263a6657f25bba57f82f160b406ad1158da Mon Sep 17 00:00:00 2001 From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> Date: Thu, 30 Apr 2026 11:26:47 -0400 Subject: [PATCH 1/2] Init pass at restructuring TOC Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> --- .../confidential-containers-deploy.rst | 233 +++++++----------- confidential-containers/configure-cc-mode.rst | 156 ++++++++++++ .../configure-image-pull-timeouts.rst | 109 ++++++++ .../configure-multi-gpu.rst | 149 +++++++++++ confidential-containers/index.rst | 64 ++++- confidential-containers/prerequisites.rst | 128 ++++++++++ .../run-sample-workload.rst | 118 +++++++++ .../supported-platforms.rst | 2 +- 8 files changed, 803 insertions(+), 156 deletions(-) create mode 100644 confidential-containers/configure-cc-mode.rst create mode 100644 confidential-containers/configure-image-pull-timeouts.rst create mode 100644 confidential-containers/configure-multi-gpu.rst create mode 100644 confidential-containers/prerequisites.rst create mode 100644 confidential-containers/run-sample-workload.rst diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst index d48fe8797..2b150619a 100644 --- a/confidential-containers/confidential-containers-deploy.rst +++ b/confidential-containers/confidential-containers-deploy.rst @@ -38,7 +38,7 @@ Overview The high-level workflow for configuring Confidential Containers is as follows: -#. Configure the :ref:`Prerequisites `. +#. Configure the :doc:`Prerequisites `. #. :ref:`Label Nodes ` that you want to use with Confidential Containers. @@ -49,8 +49,8 @@ The high-level workflow for configuring Confidential Containers is as follows: This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads. The GPU Operator uses the node labels to determine what software components to deploy to a node. -After installation, you can :ref:`run a sample GPU workload ` in a confidential container. -You can also configure :doc:`Attestation ` with the Trustee framework. +After installation, you can :doc:`run a sample GPU workload ` in a confidential container. +You can also configure :doc:`Attestation ` with the Trustee framework. The Trustee attestation service is typically deployed on a separate, trusted environment. After configuration, you can schedule workloads that request GPU resources and use the ``kata-qemu-nvidia-gpu-tdx`` or ``kata-qemu-nvidia-gpu-snp`` runtime classes for secure deployment. @@ -188,7 +188,7 @@ Installation .. _coco-label-nodes: Label Nodes ------------ +=========== #. Get a list of the nodes in your cluster: @@ -248,7 +248,7 @@ After labeling the node, you can continue to the next steps to install Kata Cont .. _coco-install-kata-chart: Install the Kata Containers Helm Chart --------------------------------------- +====================================== Install Kata Containers using the ``kata-deploy`` Helm chart. The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers. @@ -343,7 +343,7 @@ The minimum required version is 3.29.0. .. 
_coco-install-gpu-operator: Install the NVIDIA GPU Operator --------------------------------- +================================ Install the NVIDIA GPU Operator and configure it to deploy Confidential Container components. @@ -447,14 +447,62 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe If you have an issue deploying the GPU Operator, refer to the :doc:`NVIDIA GPU Operator troubleshooting guide ` for guidance on troubleshooting and resolving issues. -With Kata Containers and the GPU Operator installed, you can start using your cluster to run Confidential Containers workloads. -To run a sample workload, refer to the :ref:`Run a Sample Workload ` section. +.. _coco-configuration-settings: + +Optional: Confidential Containers Configuration Settings +-------------------------------------------------------- + +The following are the available GPU Operator configuration settings to enable Confidential Containers: + +.. list-table:: + :widths: 20 50 30 + :header-rows: 1 + + * - Parameter + - Description + - Default + + * - ``sandboxWorkloads.enabled`` + - Enables sandbox workload management in the GPU Operator for virtual + machine-style workloads and related operands. + - ``false`` + + * - ``sandboxWorkloads.defaultWorkload`` + - Specifies the default type of workload for the cluster, one of ``container``, ``vm-passthrough``, or ``vm-vgpu``. + + Setting ``vm-passthrough`` or ``vm-vgpu`` can be helpful if you plan to run all or mostly virtual machines in your cluster. + - ``container`` + + * - ``sandboxWorkloads.mode`` + - Specifies the sandbox mode to use when deploying sandbox workloads. + Accepted values are ``kubevirt`` (default) and ``kata``. + - ``kubevirt`` -For further configuration settings, refer to the following sections: + * - ``sandboxDevicePlugin.env`` + - Optional list of environment variables passed to the NVIDIA Sandbox + Device Plugin pod. Each list item is an ``EnvVar`` object with required + ``name`` and optional ``value`` fields. + - ``[]`` (empty list) + +.. _coco-configuration-heterogeneous-clusters: + +Optional: Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types +---------------------------------------------------------------------------------------------- + +By default, the NVIDIA GPU Operator creates a single resource type for GPUs, ``nvidia.com/pgpu``. +In clusters where all GPUs are the same model, a single resource type is sufficient. -* :ref:`Managing the Confidential Computing Mode ` -* :ref:`Configuring Workloads to use Multi-GPU Passthrough ` -* :ref:`Configuring GPU or NVSwitch Resource Types Name ` +In heterogeneous clusters, where you have different GPU types on your nodes, you might want to use specific GPU types for your workload. +To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the sandbox device plugin by adding the following to your GPU Operator installation: +``--set sandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and +``--set sandboxDevicePlugin.env[0].value=""``. + +When this variable is set to ``""``, the sandbox device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type. +Use the exposed device resource types in pod specs by specifying respective resource limits. + +Similarly, NVSwitches are exposed as resources of type ``nvidia.com/nvswitch`` by default. 
+You can include ``--set sandboxDevicePlugin.env[0].name=NVSWITCH_ALIAS`` and
+``--set sandboxDevicePlugin.env[0].value=""`` for the device plugin environment variable when installing the GPU Operator to configure advertising behavior similar to ``P_GPU_ALIAS``.
+If you set both aliases in the same installation, give each environment variable its own index, for example ``env[0]`` for ``P_GPU_ALIAS`` and ``env[1]`` for ``NVSWITCH_ALIAS``.

 .. _coco-run-sample-workload:

@@ -489,17 +537,16 @@ A pod manifest for a confidential container GPU workload requires that you speci

 * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run.

 * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs.
-  If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the Kata device plugin.
-  Refer to the :ref:`Configuring GPU or NVSwitch Resource Types Name ` section on this page for more details.
+  If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the sandbox device plugin.
+  Refer to the :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types <coco-configuration-heterogeneous-clusters>` section on this page for more details.

 * If you have machines that support multi-GPU passthrough, use a pod deployment manifest that specifies 8 PGPU and 4 NVSwitch resources.

   .. code-block:: yaml

-     resources:
-       limits:
-         nvidia.com/pgpu: "8"
-         nvidia.com/nvswitch: "4"
+     limits:
+       nvidia.com/pgpu: "8"
+       nvidia.com/nvswitch: "4"

   .. note::
      If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, also refer to :ref:`Managing the Confidential Computing Mode <managing-confidential-computing-mode>` for details on how to set the ``ppcie`` mode.

@@ -510,7 +557,7 @@ A pod manifest for a confidential container GPU workload requires that you speci

      .. code-block:: console

         $ kubectl apply -f cuda-vectoradd-kata.yaml
-
+
      *Example Output:*

      .. code-block:: output

@@ -518,9 +565,9 @@ A pod manifest for a confidential container GPU workload requires that you speci

         pod/cuda-vectoradd-kata created

-3. Verify the pod is running:
+   Optional: Verify the pod is running.

-   .. code-block:: console
+   .. code-block:: console

      $ kubectl get pod cuda-vectoradd-kata

@@ -531,7 +578,7 @@ A pod manifest for a confidential container GPU workload requires that you speci

      NAME                  READY   STATUS    RESTARTS   AGE
      cuda-vectoradd-kata   1/1     Running   0          10s

-4. View the logs from the pod after the container starts:
+3. View the logs from the pod after the container starts:

   .. code-block:: console

@@ -548,107 +595,13 @@ A pod manifest for a confidential container GPU workload requires that you speci

      Test PASSED
      Done

-5. Delete the pod:
+4. Delete the pod:

   .. code-block:: console

      $ kubectl delete -f cuda-vectoradd-kata.yaml

-.. _coco-configuration-settings:
-
-Common GPU Operator Configuration Settings
-===========================================
-
-The following are the available GPU Operator configuration settings to enable Confidential Containers:
-
-.. list-table::
-   :widths: 20 50 30
-   :header-rows: 1
-
-   * - Parameter
-     - Description
-     - Default
-
-   * - ``sandboxWorkloads.enabled``
-     - Enables sandbox workload management in the GPU Operator for virtual
-       machine-style workloads and related operands.
-     - ``false``
-
-   * - ``sandboxWorkloads.defaultWorkload``
-     - Specifies the default type of workload for the cluster, one of ``container``, ``vm-passthrough``, or ``vm-vgpu``.
- - Setting ``vm-passthrough`` or ``vm-vgpu`` can be helpful if you plan to run all or mostly virtual machines in your cluster. - - ``container`` - - * - ``sandboxWorkloads.mode`` - - Specifies the sandbox mode to use when deploying sandbox workloads. - Accepted values are ``kubevirt`` (default) and ``kata``. - - ``kubevirt`` - - * - ``kataSandboxDevicePlugin.env`` - - Optional list of environment variables passed to the NVIDIA Kata - Device Plugin pod. Each list item is an ``EnvVar`` object with required - ``name`` and optional ``value`` fields. - Use the setting to configure ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` for the Kata sandbox device plugin. - Refer to the :ref:`Configuring GPU or NVSwitch Resource Types Name ` section for more details. - - ``[]`` (empty list) - -.. _coco-configuration-heterogeneous-clusters: - -Configuring GPU or NVSwitch Resource Types Name ------------------------------------------------- - -By default, the NVIDIA GPU Operator creates a resource type for GPUs and NVSwitches, ``nvidia.com/pgpu`` and ``nvidia.com/nvswitch``. -You can reference this name in your manifests to request GPU or NVSwitch resources for your workload. -If you want to use a different name, you can set the ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` environment variables in the Kata device plugin to your preferred name. -In clusters where all GPUs are the same model, a single resource type is typically sufficient. - -In heterogeneous clusters, where you have different GPU types on your nodes, you might want to use specific GPU types for your workload. -To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the Kata sandbox device plugin by adding the following to your GPU Operator installation: -``--set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and -``--set kataSandboxDevicePlugin.env[0].value=""``. - -When this variable is set to ``""``, the Kata device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type. -Use the exposed device resource types in pod specs by specifying respective resource limits. - -Similarly, you can set ``NVSWITCH_ALIAS`` to ``""`` to advertise model-specific NVSwitch resource types. - -The following example installs the GPU Operator with both ``P_GPU_ALIAS`` and ``NVSWITCH_ALIAS`` configured: - -.. code-block:: console - - $ helm install --wait --timeout 10m --generate-name \ - -n gpu-operator --create-namespace \ - nvidia/gpu-operator \ - --set sandboxWorkloads.enabled=true \ - --set sandboxWorkloads.mode=kata \ - --set nfd.enabled=true \ - --set nfd.nodefeaturerules=true \ - --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS \ - --set kataSandboxDevicePlugin.env[0].value="" \ - --set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \ - --set kataSandboxDevicePlugin.env[1].value="" \ - --version=v26.3.1 - -After installing the GPU Operator, you can view the GPU or NVSwitch resource types available on a node by running the following command: - -.. code-block:: console - - $ kubectl get node $NODE_NAME -o json | grep nvidia.com - -.. note:: - The ``NODE_NAME`` environment variable was set in the :ref:`Label Nodes ` section. - If you want to view the resource types for a different node, you can update the ``NODE_NAME`` environment variable and run the command again. - -*Example Output:* - -.. code-block:: output - - "nvidia.com/GH100_H100L_94GB": "1" - - - .. 
_managing-confidential-computing-mode: Managing the Confidential Computing Mode @@ -677,7 +630,7 @@ The supported modes are: * - Mode - Description - Configuration Method - * - ``on`` (default) + * - ``on`` - Enable Confidential Computing. - cluster-wide default, node-level override * - ``off`` @@ -688,15 +641,15 @@ The supported modes are: On the NVIDIA Hopper architecture multi-GPU passthrough uses protected PCIe (PPCIE) which claims exclusive use of the NVSwitches for a single Confidential Container - virtual machine. + virtual machine. If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, - set the GPU mode to ``ppcie`` mode. - + set the GPU mode to ``ppcie`` mode. + The NVIDIA Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB), meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case. - node-level override - + You can set a cluster-wide default mode, and you can set the mode on individual nodes. The mode that you set on a node has higher precedence than the cluster-wide default mode. @@ -770,11 +723,11 @@ To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, * The ``nvidia.com/cc.mode`` label is the desired state. -* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. +* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. Its value mirrors the applied mode ``on``, ``off``, or ``ppcie``, after the transition is complete on the node. A value of ``failed`` indicates that the last mode transition encountered an error. -* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. +* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``. .. note:: @@ -784,10 +737,8 @@ To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, ``nvidia.com/cc.mode.state`` have the same value. -.. _coco-configuration-multi-gpu-passthrough: - -Configuring Workloads to use Multi-GPU Passthrough -=================================================== +Configuring Multi-GPU Passthrough Support +========================================= To configure multi-GPU passthrough, you can specify the following resource limits in your manifests: @@ -802,7 +753,7 @@ You must assign all the GPUs and NVSwitches on the node in your manifest to the On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single Confidential Container. When using NVIDIA Hopper nodes for multi-GPU passthrough, transition your node's GPU Confidential Computing mode to ``ppcie`` by applying the ``nvidia.com/cc.mode=ppcie`` label. -Refer to the :ref:`Managing the Confidential Computing Mode ` section for details. +Refer to the :ref:`Managing the Confidential Computing Mode ` section for details. The NVIDIA Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB) and only requires the GPU Confidential Computing mode to be set to ``on``. 
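As a sketch, assuming the ``NODE_NAME`` environment variable is set as in the :ref:`Label Nodes <coco-label-nodes>` section, the architecture-appropriate node label looks like this:

.. code-block:: console

   $ # NVIDIA Hopper: multi-GPU passthrough requires the ppcie mode
   $ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite

   $ # NVIDIA Blackwell: NVLink encryption keeps the switches outside the TCB; use on
   $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite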
@@ -815,24 +766,9 @@ Configure Image Pull Timeouts

 The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start.
 Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state.

-If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers.
-
-Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
-
-.. code-block:: yaml
-   :emphasize-lines: 3
-
-   apiVersion: kubelet.config.k8s.io/v1beta1
-   kind: KubeletConfiguration
-   runtimeRequestTimeout: 20m
-
-Restart the kubelet service to apply the change:
-
-.. code-block:: console
-
-   $ sudo systemctl restart kubelet
+Configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a timeout value higher than the two-minute default.
+Consider setting this value to 20 minutes (``20m``) to match the Kata Containers defaults for the NVIDIA shim's ``create_container_timeout`` and the agent's ``image_pull_timeout``.

-Additional timeouts to consider updating are the NVIDIA Shim and Kata Agent Policy timeouts.
 The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
 This controls the time the shim allows for a container to remain in the container-creating state.
@@ -843,8 +779,7 @@ To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim c

 Next Steps
 ==========

-* Refer to the :doc:`Attestation ` page for more information on configuring attestation.
+* :doc:`Run a Sample Workload <run-sample-workload>` to verify your deployment.
 * To help manage the lifecycle of Kata Containers, install the `Kata Lifecycle Manager `_.
   This Argo Workflows-based tool manages Kata Containers upgrades and day-two operations.
-* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information.
-* Licensing information is available on the :doc:`Licensing ` page.
\ No newline at end of file
+* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information.
\ No newline at end of file
diff --git a/confidential-containers/configure-cc-mode.rst b/confidential-containers/configure-cc-mode.rst
new file mode 100644
index 000000000..e730b6855
--- /dev/null
+++ b/confidential-containers/configure-cc-mode.rst
@@ -0,0 +1,156 @@
+.. license-header
+   SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+   SPDX-License-Identifier: Apache-2.0
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+
+.. _managing-confidential-computing-mode:
+
+*****************************************
+Managing the Confidential Computing Mode
+*****************************************
+
+You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=<mode>`` option.
+The default value of ``ccManager.defaultMode`` is ``on``.
+You can set this option when you install NVIDIA GPU Operator or afterward by modifying the cluster-policy instance of the ClusterPolicy object.
+
+When you change the mode, the manager performs the following actions:
+
+* Evicts the other GPU Operator operands from the node.
+
+  However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
+
+* Unbinds the GPU from the VFIO PCI device driver.
+* Changes the mode and resets the GPU.
+* Reschedules the other GPU Operator operands.
+
+The supported modes are:
+
+.. list-table::
+   :widths: 15 55 30
+   :header-rows: 1
+
+   * - Mode
+     - Description
+     - Configuration Method
+   * - ``on``
+     - Enable Confidential Computing.
+     - cluster-wide default, node-level override
+   * - ``off``
+     - Disable Confidential Computing.
+     - cluster-wide default, node-level override
+   * - ``ppcie``
+     - Enable Confidential Computing on NVIDIA Hopper GPUs.
+
+       On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE),
+       which claims exclusive use of the NVSwitches for a single Confidential Container
+       virtual machine.
+       If you are using NVIDIA Hopper GPUs for multi-GPU passthrough,
+       set the GPU mode to ``ppcie`` mode.
+
+       The NVIDIA Blackwell architecture uses NVLink
+       encryption which places the switches outside of the Trusted Computing Base (TCB),
+       meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case.
+     - node-level override
+
+You can set a cluster-wide default mode, and you can set the mode on individual nodes.
+The mode that you set on a node has higher precedence than the cluster-wide default mode.
+
+Setting a Cluster-Wide Default Mode
+====================================
+
+To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field like the following example:
+
+.. code-block:: console
+
+   $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
+       --type=merge \
+       -p '{"spec": {"ccManager": {"defaultMode": "on"}}}'
+
+*Example Output:*
+
+.. code-block:: output
+
+   clusterpolicy.nvidia.com/cluster-policy patched
+
+.. note::
+
+   The ``ppcie`` mode cannot be set as a cluster-wide default; it can only be set as a node label value.
+
+Setting a Node-Level Mode
+==========================
+
+To set a node-level mode, apply the ``nvidia.com/cc.mode=<mode>`` label on the node.
+
+Set the ``NODE_NAME`` environment variable to the name of the node you want to configure:
+
+.. code-block:: console
+
+   $ export NODE_NAME="<node-name>"
+
+Then apply the label:
+
+.. code-block:: console
+
+   $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite
+
+The mode that you set on a node has higher precedence than the cluster-wide default mode.
+
+Verifying a Mode Change
+========================
+
+To verify that a mode change was successful, view the ``nvidia.com/cc.mode``,
+``nvidia.com/cc.mode.state``, and ``nvidia.com/cc.ready.state`` node labels:
+
+.. code-block:: console
+
+   $ kubectl get node $NODE_NAME -o json | \
+       jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))'
+
+*Example Output (CC mode disabled):*
+
+..
code-block:: json + + { + "nvidia.com/cc.mode": "off", + "nvidia.com/cc.mode.state": "off", + "nvidia.com/cc.ready.state": "false" + } + +*Example Output (CC mode enabled):* + +.. code-block:: json + + { + "nvidia.com/cc.mode": "on", + "nvidia.com/cc.mode.state": "on", + "nvidia.com/cc.ready.state": "true" + } + +* The ``nvidia.com/cc.mode`` label is the desired state. + +* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. + Its value mirrors the applied mode ``on``, ``off``, or ``ppcie``, after the transition is complete on the node. + A value of ``failed`` indicates that the last mode transition encountered an error. + +* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. + It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``. + +.. note:: + + It can take one to two minutes for GPU state transitions to complete and the labels to be updated. + A mode change is complete and successful when ``nvidia.com/cc.mode`` and + ``nvidia.com/cc.mode.state`` have the same value. diff --git a/confidential-containers/configure-image-pull-timeouts.rst b/confidential-containers/configure-image-pull-timeouts.rst new file mode 100644 index 000000000..8e9b2dac1 --- /dev/null +++ b/confidential-containers/configure-image-pull-timeouts.rst @@ -0,0 +1,109 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _configure-image-pull-timeouts: + +***************************** +Configure Image Pull Timeouts +***************************** + +The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. +Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state. + +The timeout chain has three components that you might need to configure: + +* **Kubelet** ``runtimeRequestTimeout``: Controls how long kubelet waits for the container runtime to respond. Default: ``2m``. +* **Kata shim** ``create_container_timeout``: Controls how long the NVIDIA shim allows a container to remain in container creating state. Default: ``1200s`` (20 minutes). +* **Kata Agent** ``image_pull_timeout``: Controls the agent-side timeout for guest-image pull. Default: ``1200s`` (20 minutes). + +Configure the Kubelet Timeout +============================== + +Configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a higher timeout value than the two-minute default. +Set this value to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers. 
+ +Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): + +.. code-block:: yaml + :emphasize-lines: 3 + + apiVersion: kubelet.config.k8s.io/v1beta1 + kind: KubeletConfiguration + runtimeRequestTimeout: 20m + +Restart the kubelet service to apply the change: + +.. code-block:: console + + $ sudo systemctl restart kubelet + +Configure Timeouts Beyond 20 Minutes +====================================== + +If you need a timeout of more than 1200 seconds (20 minutes), you must also adjust the Kata Agent Policy's ``image_pull_timeout`` value. + +You can set this value either through a pod annotation or by modifying the shim configuration. + +Using a Pod Annotation +----------------------- + +Add the ``io.katacontainers.config.hypervisor.kernel_params`` annotation to your pod manifest with the desired ``agent.image_pull_timeout`` value in seconds: + +.. code-block:: yaml + :emphasize-lines: 7 + + apiVersion: v1 + kind: Pod + metadata: + name: large-model-kata + namespace: default + annotations: + io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800" + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp + restartPolicy: Never + containers: + - name: model-server + image: "nvcr.io/nvidia/example-large-model:latest" + resources: + limits: + nvidia.com/pgpu: "1" + memory: 64Gi + +In this example, ``agent.image_pull_timeout=1800`` sets the agent-side timeout to 30 minutes (1800 seconds). + +Using the Shim Configuration +----------------------------- + +To set the timeout globally, add the ``agent.image_pull_timeout`` kernel parameter to your Kata shim configuration file. +The shim configuration files are located in ``/opt/kata/share/defaults/kata-containers/`` on the worker nodes. + +Add the parameter to the ``kernel_params`` field in the ``[hypervisor.qemu]`` section: + +.. code-block:: toml + :emphasize-lines: 2 + + [hypervisor.qemu] + kernel_params = "agent.image_pull_timeout=1800" + +.. note:: + + When setting timeouts beyond 20 minutes, ensure that all three timeout values in the chain are consistent: + the kubelet ``runtimeRequestTimeout``, the Kata shim ``create_container_timeout``, and the + agent ``image_pull_timeout`` should all be set to accommodate the expected image pull duration. diff --git a/confidential-containers/configure-multi-gpu.rst b/confidential-containers/configure-multi-gpu.rst new file mode 100644 index 000000000..edc27d513 --- /dev/null +++ b/confidential-containers/configure-multi-gpu.rst @@ -0,0 +1,149 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. 
_coco-multi-gpu-passthrough: + +***************************************** +Configuring Multi-GPU Passthrough Support +***************************************** + +Multi-GPU passthrough assigns all GPUs and NVSwitches on a node to a single Confidential Container virtual machine. +This configuration is required for NVSwitch (NVLink) based HGX systems running confidential workloads. + +You must assign all the GPUs and NVSwitches on the node to the same Confidential Container virtual machine. +Configuring only a subset of GPUs for Confidential Computing on a single node is not supported. + +Prerequisites +============= + +* Complete the :doc:`Confidential Containers deployment ` steps. +* Verify that your node has multi-GPU hardware (NVSwitch-based HGX system). + +Set the Confidential Computing Mode +==================================== + +The required CC mode depends on your GPU architecture. + +Set the ``NODE_NAME`` environment variable to the name of the node you want to configure: + +.. code-block:: console + + $ export NODE_NAME="" + +**NVIDIA Hopper architecture:** + +Multi-GPU passthrough on Hopper uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single Confidential Container. +Set the node's CC mode to ``ppcie``: + +.. code-block:: console + + $ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite + +**NVIDIA Blackwell architecture:** + +The Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB). +The ``ppcie`` mode is not required. Use ``on`` mode: + +.. code-block:: console + + $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite + +Refer to :doc:`Managing the Confidential Computing Mode ` for details on verifying the mode change. + +Run a Multi-GPU Workload +======================== + +1. Create a file, such as ``multi-gpu-kata.yaml``, with a pod manifest that requests all GPUs and NVSwitches on the node: + + .. code-block:: yaml + :emphasize-lines: 7,14-16 + + apiVersion: v1 + kind: Pod + metadata: + name: multi-gpu-kata + namespace: default + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp + restartPolicy: Never + containers: + - name: cuda-sample + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" + resources: + limits: + nvidia.com/pgpu: "8" + nvidia.com/nvswitch: "4" + memory: 128Gi + + Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type. + + .. note:: + + If you configured ``P_GPU_ALIAS`` for heterogeneous clusters, replace ``nvidia.com/pgpu`` with the model-specific resource type. + Refer to :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` for details. + +2. Create the pod: + + .. code-block:: console + + $ kubectl apply -f multi-gpu-kata.yaml + + *Example Output:* + + .. code-block:: output + + pod/multi-gpu-kata created + +3. Verify the pod is running: + + .. code-block:: console + + $ kubectl get pod multi-gpu-kata + + *Example Output:* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + multi-gpu-kata 1/1 Running 0 30s + +4. Verify that all GPUs are visible inside the container: + + .. code-block:: console + + $ kubectl exec multi-gpu-kata -- nvidia-smi -L + + *Example Output:* + + .. 
code-block:: output + + GPU 0: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 1: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 2: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 3: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 4: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 5: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 6: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 7: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + +5. Delete the pod: + + .. code-block:: console + + $ kubectl delete -f multi-gpu-kata.yaml diff --git a/confidential-containers/index.rst b/confidential-containers/index.rst index a5024ad2d..e3f522d7d 100644 --- a/confidential-containers/index.rst +++ b/confidential-containers/index.rst @@ -25,11 +25,34 @@ NVIDIA Confidential Containers Architecture :hidden: :titlesonly: - Release Notes Overview Supported Platforms + +.. toctree:: + :caption: Install + :hidden: + :titlesonly: + + Prerequisites Deploy Confidential Containers + Run a Sample Workload + +.. toctree:: + :caption: Configure + :hidden: + :titlesonly: + + Managing the CC Mode + Multi-GPU Passthrough + Image Pull Timeouts Attestation + +.. toctree:: + :caption: Reference + :hidden: + :titlesonly: + + Release Notes Licensing @@ -51,28 +74,57 @@ This is documentation for NVIDIA's implementation of Confidential Containers inc Learn about the validated hardware, OS, and component versions. + .. grid-item-card:: :octicon:`checklist;1.5em;sd-mr-1` Prerequisites + :link: prerequisites + :link-type: doc + + Hardware, BIOS, and Kubernetes cluster requirements. + .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy Confidential Containers :link: confidential-containers-deploy :link-type: doc - Use this page to deploy with the NVIDIA GPU Operator on Kubernetes. + Install Kata Containers and the NVIDIA GPU Operator on Kubernetes. + + .. grid-item-card:: :octicon:`play;1.5em;sd-mr-1` Run a Sample Workload + :link: run-sample-workload + :link-type: doc + + Verify your deployment by running a GPU workload in a confidential container. + + .. grid-item-card:: :octicon:`gear;1.5em;sd-mr-1` Managing the CC Mode + :link: configure-cc-mode + :link-type: doc + + Set the confidential computing mode on NVIDIA GPUs at cluster or node level. + + .. grid-item-card:: :octicon:`cpu;1.5em;sd-mr-1` Multi-GPU Passthrough + :link: configure-multi-gpu + :link-type: doc + + Configure multi-GPU passthrough for NVSwitch-based HGX systems. + + .. grid-item-card:: :octicon:`clock;1.5em;sd-mr-1` Image Pull Timeouts + :link: configure-image-pull-timeouts + :link-type: doc + + Tune image pull timeouts for large container images in confidential VMs. .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation :link: attestation :link-type: doc - Learn about remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. - + Remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. .. grid-item-card:: :octicon:`note;1.5em;sd-mr-1` Release Notes :link: release-notes :link-type: doc - Review new features and known issues for each release. + New features and known issues for each release. .. grid-item-card:: :octicon:`law;1.5em;sd-mr-1` Licensing :link: licensing :link-type: doc - Learn about the licensing information for Confidential Containers documentation. + Licensing information for Confidential Containers documentation. 
diff --git a/confidential-containers/prerequisites.rst b/confidential-containers/prerequisites.rst
new file mode 100644
index 000000000..4ec0e6b0b
--- /dev/null
+++ b/confidential-containers/prerequisites.rst
@@ -0,0 +1,128 @@
+.. license-header
+   SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+   SPDX-License-Identifier: Apache-2.0
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+
+.. _coco-prerequisites:
+
+*************
+Prerequisites
+*************
+
+Complete the following prerequisites before deploying Confidential Containers.
+Refer to the :doc:`Supported Platforms <supported-platforms>` page for validated hardware and software versions.
+
+Hardware and BIOS
+=================
+
+* Use a supported platform configured for Confidential Computing.
+  For more information on machine setup, refer to :doc:`Supported Platforms <supported-platforms>`.
+
+* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS).
+  With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER).
+  Enable these features in the host BIOS.
+
+* Configure hosts to support IOMMU.
+  You can check if your host is configured for IOMMU by running the following command:
+
+  .. code-block:: console
+
+     $ ls /sys/kernel/iommu_groups
+
+  If the output of this command includes 0, 1, and so on, then your host is configured for IOMMU.
+
+  If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument for AMD CPUs, or ``intel_iommu=on`` for Intel CPUs.
+  For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance::
+
+     ...
+     GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau"
+     ...
+
+  After making the change, configure the bootloader:
+
+  .. code-block:: console
+
+     $ sudo update-grub
+
+  *Example Output:*
+
+  .. code-block:: output
+
+     Sourcing file `/etc/default/grub'
+     Generating grub configuration file ...
+     Found linux image: /boot/vmlinuz-5.15.0-generic
+     Found initrd image: /boot/initrd.img-5.15.0-generic
+     done
+
+  Reboot the host after configuring the bootloader.
+
+  .. note::
+
+     After configuring IOMMU, you might see QEMU warnings about PCI P2P DMA when running GPU workloads.
+     These are expected and can be safely ignored.
+     Refer to :ref:`coco-limitations` for details.
+
+* Ensure that no NVIDIA GPU drivers are installed on the host.
+  Confidential Containers uses VFIO to pass GPUs directly to the confidential VM, and host-level GPU drivers interfere with VFIO device binding.
+
+  To check if NVIDIA GPU drivers are installed, run the following command:
+
+  .. code-block:: console
+
+     $ lsmod | grep nvidia
+
+  If the output is empty, no NVIDIA GPU drivers are loaded.
+  If modules such as ``nvidia``, ``nvidia_uvm``, or ``nvidia_modeset`` are listed, NVIDIA GPU drivers are present and must be removed before proceeding.
+  Refer to `Removing the Driver `_ in the NVIDIA Driver Installation Guide.
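+
+  As an optional extra check, you can list the NVIDIA PCI devices and confirm that no kernel driver has claimed them.
+  This is a sketch; ``10de`` is the NVIDIA PCI vendor ID, and the exact output varies by system:
+
+  .. code-block:: console
+
+     $ # List NVIDIA devices; no "Kernel driver in use: nvidia" line should appear
+     $ lspci -nnk -d 10de: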
+ +Kubernetes Cluster +================== + +* A Kubernetes cluster with cluster administrator privileges. + Refer to the :ref:`Supported Software Components ` table for supported Kubernetes versions. + +* Helm installed on your cluster. + Refer to the `Helm documentation `_ for installation instructions. + +* Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster. + + * ``KubeletPodResourcesGet``: Enabled by default on Kubernetes v1.34 and later. + On older versions, you must enable it explicitly. + The Kata runtime uses this feature gate to query the Kubelet Pod Resources API and discover allocated GPU devices during sandbox creation. + + * ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29 and is not enabled by default. + This feature gate is required to support pod deployments that use multiple snapshotters side-by-side. + + Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): + + .. code-block:: yaml + + apiVersion: kubelet.config.k8s.io/v1beta1 + kind: KubeletConfiguration + featureGates: + KubeletPodResourcesGet: true + RuntimeClassInImageCriApi: true + + If your ``config.yaml`` already has a ``featureGates`` section, add the gates to the existing section rather than creating a duplicate. + + Restart the Kubelet service to apply the changes: + + .. code-block:: console + + $ sudo systemctl restart kubelet + +Next Steps +========== + +After completing the prerequisites, proceed to :doc:`Deploy Confidential Containers `. diff --git a/confidential-containers/run-sample-workload.rst b/confidential-containers/run-sample-workload.rst new file mode 100644 index 000000000..12b7e433c --- /dev/null +++ b/confidential-containers/run-sample-workload.rst @@ -0,0 +1,118 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _coco-run-sample-workload: + +********************* +Run a Sample Workload +********************* + +After completing the :doc:`deployment steps `, you can verify your installation by running a sample GPU workload in a confidential container. + +A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX. + +1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the kata-qemu-nvidia-gpu-snp runtime class: + + .. 
code-block:: yaml + :emphasize-lines: 7,14 + + apiVersion: v1 + kind: Pod + metadata: + name: cuda-vectoradd-kata + namespace: default + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp + restartPolicy: Never + containers: + - name: cuda-vectoradd + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" + resources: + limits: + nvidia.com/pgpu: "1" + memory: 16Gi + + The following are Confidential Containers configurations in the sample manifest: + + * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run. + + * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs. + If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the sandbox device plugin. + Refer to the :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` for more details. + + * If you have machines that support multi-GPU passthrough, refer to the :doc:`Configuring Multi-GPU Passthrough ` page for a complete workload example and architecture-specific CC mode requirements. + + +2. Create the pod: + + .. code-block:: console + + $ kubectl apply -f cuda-vectoradd-kata.yaml + + *Example Output:* + + .. code-block:: output + + pod/cuda-vectoradd-kata created + + + Optional: Verify the pod is running. + + .. code-block:: console + + $ kubectl get pod cuda-vectoradd-kata + + *Example Output:* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + cuda-vectoradd-kata 1/1 Running 0 10s + +3. View the logs from the pod after the container starts: + + .. code-block:: console + + $ kubectl logs -n default cuda-vectoradd-kata + + *Example Output:* + + .. code-block:: output + + [Vector addition of 50000 elements] + Copy input data from the host memory to the CUDA device + CUDA kernel launch with 196 blocks of 256 threads + Copy output data from the CUDA device to the host memory + Test PASSED + Done + +4. Delete the pod: + + .. code-block:: console + + $ kubectl delete -f cuda-vectoradd-kata.yaml + + +Next Steps +========== + +* Configure :doc:`Attestation ` with the Trustee framework to enable remote verification of your confidential environment. +* Set up :doc:`multi-GPU passthrough ` for NVSwitch-based HGX systems. +* Tune :doc:`image pull timeouts ` if you are pulling large container images. +* Manage the :doc:`confidential computing mode ` on your GPUs. diff --git a/confidential-containers/supported-platforms.rst b/confidential-containers/supported-platforms.rst index 986d170ef..d871b2a45 100644 --- a/confidential-containers/supported-platforms.rst +++ b/confidential-containers/supported-platforms.rst @@ -58,7 +58,7 @@ NVIDIA GPUs .. note:: Multi-GPU passthrough on NVIDIA Hopper HGX systems requires that you set the Confidential Computing mode to ``ppcie`` mode. - Refer to :ref:`Managing the Confidential Computing Mode ` in the deployment guide for details. + Refer to :doc:`Managing the Confidential Computing Mode ` for details. .. 
note:: From 8fac0fb76b3b00e0024b03b43363621ae1a14e70 Mon Sep 17 00:00:00 2001 From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> Date: Tue, 12 May 2026 11:40:23 -0400 Subject: [PATCH 2/2] Rebase, add configure workload page Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> --- confidential-containers/attestation.rst | 74 +-- .../confidential-containers-deploy.rst | 463 ++---------------- confidential-containers/configure-cc-mode.rst | 21 +- .../configure-image-pull-timeouts.rst | 109 ----- .../configure-multi-gpu.rst | 234 +++++++-- confidential-containers/configure.rst | 57 +++ confidential-containers/index.rst | 41 +- confidential-containers/licensing.rst | 4 +- confidential-containers/overview.rst | 35 +- confidential-containers/prerequisites.rst | 73 ++- confidential-containers/release-notes.rst | 11 +- .../run-sample-workload.rst | 57 ++- .../supported-platforms.rst | 17 +- 13 files changed, 501 insertions(+), 695 deletions(-) delete mode 100644 confidential-containers/configure-image-pull-timeouts.rst create mode 100644 confidential-containers/configure.rst diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst index 56528dbc5..cd745286b 100644 --- a/confidential-containers/attestation.rst +++ b/confidential-containers/attestation.rst @@ -19,12 +19,13 @@ .. _attestation: -*********** +########### Attestation -*********** +########### -This page provides an overview of how to configure remote attestation for Confidential Container workloads. -Attestation cryptographically verifies the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload. + +The :doc:`Confidential Containers deployment guide ` configures your cluster to run workloads in a Confidential Container. +To strengthen workload security, configure attestation to verify the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload. Attestation is required for any feature that depends on secrets, including: @@ -35,12 +36,19 @@ Attestation is required for any feature that depends on secrets, including: When a workload requires a secret, such as a key to decrypt a container image or model, guest components collect hardware evidence from the active CPU and GPU enclaves. The evidence is sent to a remote verifier, Trustee, which evaluates the evidence against configured policies and conditionally releases the secret. +Trustee is typically deployed in a separate trusted environment that is reachable from your worker nodes over the network. + +.. note:: -For background on how attestation fits into the Confidential Containers architecture, refer to the :doc:`NVIDIA Confidential Containers Reference Architecture overview `. + This page is an educational overview of attestation with Confidential Containers, not a complete configuration guide. + The attestation workflow is fully documented in the upstream `Confidential Containers documentation `_, which is the source of truth for setup and configuration details. + Attestation is not required to deploy Confidential Containers, but is required for features that rely on secrets in your cluster. + +************* Prerequisites -============= +************* * A Kubernetes cluster configured to deploy Confidential Containers workloads. Refer to the :doc:`deployment guide ` for configuration steps. @@ -50,8 +58,9 @@ Prerequisites Trustee does not require Confidential Computing hardware or a GPU. 
* Network connectivity from the worker nodes in your Kubernetes cluster to the Trustee instance.

+**********************
 Configuration Workflow
-======================
+**********************

 After you meet the prerequisites, complete the following steps to enable attestation:

@@ -63,14 +72,14 @@ After configuration, the Confidential Containers runtime automatically runs the

 .. _provision-trustee:

+*****************
 Provision Trustee
-=================
+*****************

 Trustee is an open-source framework used in Confidential Containers to verify attestation evidence and conditionally release secrets.
 For a full overview of attestation with Trustee, refer to the upstream `Trustee documentation `_.

-To provision a Trustee instance, follow the upstream `Install Trustee in Docker `_ guide.
-This is the recommended install method.
+To provision a Trustee instance, follow the recommended upstream `Install Trustee in Docker `_ guide.

 .. note::

@@ -83,45 +92,50 @@ After you complete installation, Trustee is configured to use the NVIDIA Remote

 .. _configure-workloads-trustee:

+***********************************
 Configure Workloads for Attestation
-====================================
+***********************************

-To enable attestation for your workloads, point them to the Trustee network endpoint, sometimes referred to as the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec:
+To enable attestation for your workloads, point them to the Trustee network endpoint, also called the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec:

 .. code-block:: yaml

    io.katacontainers.config.hypervisor.kernel_params: "agent.aa_kbc_params=cc_kbc::http://<trustee-host>:<trustee-port>"

-Replace ```` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes, and ```` with the port (default: ``8080``).
+Replace ``<trustee-host>`` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes.
+Replace ``<trustee-port>`` with the port that Trustee listens on (default: ``8080``).

 Refer to the upstream `Setup Confidential Containers `_ documentation for more information on configuring workloads for attestation.

 .. _customize-attestation:

-Customize Attestation Workflows
-===============================
+*****************************************
+Optional: Customize Attestation Workflows
+*****************************************

+Confidential Containers enables sensible default attestation policies for NVIDIA Confidential Computing GPUs.
+In most cases, the default policy is appropriate and you only need to provide reference values.
+For more information, refer to the upstream `Confidential Containers reference values `_ documentation.
+
+You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets.
+Refer to the upstream documentation on `using the KBS Client Tool `_.
+
+For more advanced customization, refer to the following upstream Confidential Containers documentation:

-After Trustee is provisioned and workloads are configured, you can customize attestation workflows to enforce your desired security policies.
-This can include configuring the following:
-
-* KBS Client Tool: Configure Trustee resources and secrets by using the Key Broker Service (KBS) Client Tool.
-  Refer to the upstream documentation on `using the KBS Client Tool `_.
-* Configure resources: Create resources, or secrets, that your workloads need.
- Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources. -* Configure policies: Confidential Containers uses different policy types to secure workload at different layers. - Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies. - -Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them. +* `Resources `_: Create the resources, such as secrets, that your workloads need. +* `Policies `_: Configure the policy types that secure workloads at different layers. +* `Features `_: Browse the full list of attestation features and how to configure them. +*************** Troubleshooting -=============== +*************** If attestation does not succeed after provisioning Trustee, enable debug logging by setting the ``RUST_LOG=debug`` environment variable in the Trustee environment. Use the Trustee log to diagnose the attestation process. +********** Next Steps -========== +********** -* Refer to the :doc:`deployment guide ` for Confidential Containers setup instructions. -* Refer to the upstream `Confidential Containers Features `_ documentation for a complete list of attestation-dependent features. -* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information. +* Refer to the upstream `Confidential Containers Features `_ for complete documentation on attestation features. +* If you haven't already, refer to the :doc:`Confidential Containers deployment guide ` to configure your environment for confidential workloads. diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst index 2b150619a..b71162738 100644 --- a/confidential-containers/confidential-containers-deploy.rst +++ b/confidential-containers/confidential-containers-deploy.rst @@ -19,9 +19,9 @@ .. _confidential-containers-deploy: -****************************** +############################## Deploy Confidential Containers -****************************** +############################## This page describes deploying Kata Containers and the NVIDIA GPU Operator. These are key pieces of the NVIDIA Confidential Containers Reference Architecture used to manage GPU resources on your cluster and deploy workloads into Confidential Containers. @@ -32,9 +32,9 @@ This guide assumes you are familiar with the NVIDIA GPU Operator, Kata Container Refer to the :doc:`NVIDIA GPU Operator ` and `Kata Containers `_ documentation for more information on these software components. Refer to the `Kubernetes documentation `_ for more information on Kubernetes cluster administration. - +******** Overview -======== +******** The high-level workflow for configuring Confidential Containers is as follows: @@ -55,140 +55,13 @@ The Trustee attestation service is typically deployed on a separate, trusted env After configuration, you can schedule workloads that request GPU resources and use the ``kata-qemu-nvidia-gpu-tdx`` or ``kata-qemu-nvidia-gpu-snp`` runtime classes for secure deployment. -.. _coco-prerequisites: - -Prerequisites -============= - -Hardware and BIOS ------------------ - -* Use a supported platform configured for Confidential Computing. - For more information on machine setup, refer to :doc:`Supported Platforms `. - -* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS). 
With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER). Enable these features in the host BIOS. - -* Configure hosts to support IOMMU. - You can check if your host is configured for IOMMU by running the following command: - - .. code-block:: console - - $ ls /sys/kernel/iommu_groups - - If the output of this command includes 0, 1, and so on, then your host is configured for IOMMU. - - If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument for AMD CPUs, or ``intel_iommu=on`` for Intel CPUs. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance: - - .. code-block:: console - - ... - GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau" - ... - - After making the change, configure the bootloader. - - .. code-block:: console - - $ sudo update-grub - - *Example Output:* - - .. code-block:: output - - Sourcing file `/etc/default/grub' - Generating grub configuration file ... - Found linux image: /boot/vmlinuz-5.15.0-generic - Found initrd image: /boot/initrd.img-5.15.0-generic - done - - Reboot the host after configuring the bootloader. - - .. note:: - - After configuring IOMMU, you might see QEMU warnings about PCI P2P DMA when running GPU workloads. - These are expected and can be safely ignored. - Refer to :ref:`coco-limitations` for details. - -* Ensure that no NVIDIA GPU drivers are installed on the host. - Confidential Containers uses VFIO to pass GPUs directly to the confidential VM, and host-level GPU drivers interfere with VFIO device binding. - - To check if NVIDIA GPU drivers are installed, run the following command: - - .. code-block:: console - - $ lsmod | grep nvidia - - If the output is empty, no NVIDIA GPU drivers are loaded. - If modules such as ``nvidia``, ``nvidia_uvm``, or ``nvidia_modeset`` are listed, NVIDIA GPU drivers are present and must be removed before proceeding. - Refer to `Removing the Driver `_ in the NVIDIA Driver Installation Guide. - -Kubernetes Cluster ------------------- - -* A Kubernetes cluster with cluster administrator privileges. - Refer to the :ref:`Supported Software Components ` table for supported Kubernetes versions. - -* containerd version 2.2.2 installed. - Refer to the `containerd Getting Started guide `_ for installation instructions. - - To verify the installed version, run the following command: - - .. code-block:: console - - $ containerd --version - - *Example Output:* - - .. code-block:: output - - containerd containerd.io 2.2.2 ... - -* Helm installed. - Use the command below to install Helm or refer to the `Helm documentation `_ for installation instructions. - - .. code-block:: console - - $ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \ - && chmod 700 get_helm.sh \ - && ./get_helm.sh - - -* Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster. - - * ``KubeletPodResourcesGet``: Enabled by default on Kubernetes v1.34 and later. - On older versions, you must enable it explicitly. - The Kata runtime uses this feature gate to query the Kubelet Pod Resources API and discover allocated GPU devices during sandbox creation. - - * ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29 and is not enabled by default. - This feature gate is required to support pod deployments that use multiple snapshotters side-by-side. 
-  - Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
-
-    .. code-block:: yaml
-
-       apiVersion: kubelet.config.k8s.io/v1beta1
-       kind: KubeletConfiguration
-       featureGates:
-         KubeletPodResourcesGet: true
-         RuntimeClassInImageCriApi: true
-
-    If your ``config.yaml`` already has a ``featureGates`` section, add the gates to the existing section rather than creating a duplicate.
-
-    Restart the Kubelet service to apply the changes:
-
-    .. code-block:: console
-
-       $ sudo systemctl restart kubelet
-
 .. _installation-and-configuration:

-Installation
-============
-
 .. _coco-label-nodes:

+***********
 Label Nodes
-===========
+***********

 #. Get a list of the nodes in your cluster:
@@ -247,8 +120,9 @@ After labeling the node, you can continue to the next steps to install Kata Cont

 .. _coco-install-kata-chart:

+**************************************
 Install the Kata Containers Helm Chart
-======================================
+**************************************

 Install Kata Containers using the ``kata-deploy`` Helm chart.
 The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.
@@ -342,8 +216,9 @@ The minimum required version is 3.29.0.

 .. _coco-install-gpu-operator:

+*******************************
 Install the NVIDIA GPU Operator
-================================
+*******************************

 Install the NVIDIA GPU Operator and configure it to deploy Confidential Container components.
@@ -420,6 +295,7 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe
    .. note:: It can take several minutes for all GPU Operator pods to be in the Running state.
       If you are not seeing the expected output, you can view the logs for the GPU Operator pods:
+
       .. code-block:: console

          $ kubectl logs -n gpu-operator <POD_NAME>
@@ -449,8 +325,8 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe

 .. _coco-configuration-settings:

-Optional: Confidential Containers Configuration Settings
---------------------------------------------------------
+Common GPU Operator Configuration Settings
+==========================================

 The following are the available GPU Operator configuration settings to enable Confidential Containers:

@@ -478,308 +354,73 @@ The following are the available GPU Operator configuration settings to enable Co
       Accepted values are ``kubevirt`` (default) and ``kata``.
     - ``kubevirt``

-   * - ``sandboxDevicePlugin.env``
-     - Optional list of environment variables passed to the NVIDIA Sandbox
+   * - ``kataSandboxDevicePlugin.env``
+     - Optional list of environment variables passed to the NVIDIA Kata
       Device Plugin pod. Each list item is an ``EnvVar`` object with required
       ``name`` and optional ``value`` fields.
+       Use this setting to configure ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` for the Kata sandbox device plugin.
+       Refer to the :ref:`Configuring GPU or NVSwitch Resource Types Name ` section for more details.
     - ``[]`` (empty list)
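+After installation, you can confirm how these settings landed on the cluster by inspecting the cluster-policy instance of the ClusterPolicy object.
+The following check is a minimal sketch; it assumes the default ``cluster-policy`` instance name used elsewhere in this guide and prints the ``sandboxWorkloads`` settings described above:
+
+.. code-block:: console
+
+   $ kubectl get clusterpolicies.nvidia.com/cluster-policy \
+       -o jsonpath='{.spec.sandboxWorkloads}'
+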
.. _coco-configuration-heterogeneous-clusters:

-Optional: Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types
------------------------------------------------------------------------------------------------
+Configuring GPU or NVSwitch Resource Types Name
+===============================================

-By default, the NVIDIA GPU Operator creates a single resource type for GPUs, ``nvidia.com/pgpu``.
-In clusters where all GPUs are the same model, a single resource type is sufficient.
+By default, the NVIDIA GPU Operator creates one resource type each for GPUs and NVSwitches: ``nvidia.com/pgpu`` and ``nvidia.com/nvswitch``.
+You can reference these names in your manifests to request GPU or NVSwitch resources for your workload.
+If you want to use different names, you can set the ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` environment variables in the Kata device plugin to your preferred names.
+In clusters where all GPUs are the same model, a single resource type is typically sufficient.

 In heterogeneous clusters, where you have different GPU types on your nodes, you might want to use specific GPU types for your workload.
-To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the sandbox device plugin by adding the following to your GPU Operator installation:
-``--set sandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and
-``--set sandboxDevicePlugin.env[0].value=""``.
+To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the Kata sandbox device plugin by adding the following to your GPU Operator installation:
+``--set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and
+``--set kataSandboxDevicePlugin.env[0].value=""``.

-When this variable is set to ``""``, the sandbox device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type.
+When this variable is set to ``""``, the Kata device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H200_141GB``, instead of the default ``nvidia.com/pgpu`` type.
 Use the exposed device resource types in pod specs by specifying the corresponding resource limits.

-Similarly, NVSwitches are exposed as resources of type ``nvidia.com/nvswitch`` by default.
-You can include ``--set sandboxDevicePlugin.env[0].name=NVSWITCH_ALIAS`` and
-``--set sandboxDevicePlugin.env[0].value=""`` for the device plugin environment variable when installing the GPU Operator to configure advertising behavior similar to ``P_GPU_ALIAS``.
+Similarly, you can set ``NVSWITCH_ALIAS`` to ``""`` to advertise model-specific NVSwitch resource types.

-.. _coco-run-sample-workload:
-
-Run a Sample Workload
-=====================
-
-A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX.
-
-1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the kata-qemu-nvidia-gpu-snp runtime class:
-
-   .. 
code-block:: yaml - :emphasize-lines: 7,14 - - apiVersion: v1 - kind: Pod - metadata: - name: cuda-vectoradd-kata - namespace: default - spec: - runtimeClassName: kata-qemu-nvidia-gpu-snp - restartPolicy: Never - containers: - - name: cuda-vectoradd - image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" - resources: - limits: - nvidia.com/pgpu: "1" - memory: 16Gi - - The following are Confidential Containers configurations in the sample manifest: - - * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run. +Similarly, you can set ``NVSWITCH_ALIAS`` to ``""`` to advertise model-specific NVSwitch resource types. - * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs. - If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the sandbox device plugin. - Refer to the :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` section on this page for more details. - - * If you have machines that support multi-GPU passthrough, use a pod deployment manifest that specifies 8 PGPU and 4 NVSwitch resources. - - .. code-block:: yaml - - limits: - nvidia.com/pgpu: "8" - nvidia.com/nvswitch: "4" - - .. note:: - If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, also refer to :ref:`Managing the Confidential Computing Mode ` for details on how to set the ``ppcie`` mode. - - -2. Create the pod: - - .. code-block:: console - - $ kubectl apply -f cuda-vectoradd-kata.yaml - - *Example Output:* - - .. code-block:: output - - pod/cuda-vectoradd-kata created - - - Optional: Verify the pod is running. - - .. code-block:: console - - $ kubectl get pod cuda-vectoradd-kata - - *Example Output:* - - .. code-block:: output - - NAME READY STATUS RESTARTS AGE - cuda-vectoradd-kata 1/1 Running 0 10s - -3. View the logs from the pod after the container starts: - - .. code-block:: console - - $ kubectl logs -n default cuda-vectoradd-kata - - *Example Output:* - - .. code-block:: output - - [Vector addition of 50000 elements] - Copy input data from the host memory to the CUDA device - CUDA kernel launch with 196 blocks of 256 threads - Copy output data from the CUDA device to the host memory - Test PASSED - Done - -4. Delete the pod: - - .. code-block:: console - - $ kubectl delete -f cuda-vectoradd-kata.yaml - - -.. _managing-confidential-computing-mode: - -Managing the Confidential Computing Mode -========================================= - -You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=`` option. -The default value of ``ccManager.defaultMode`` is ``on``. -You can set this option when you install NVIDIA GPU Operator or afterward by modifying the cluster-policy instance of the ClusterPolicy object. - -When you change the mode, the manager performs the following actions: - -* Evicts the other GPU Operator operands from the node. - - However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode. - -* Unbinds the GPU from the VFIO PCI device driver. -* Changes the mode and resets the GPU. -* Reschedules the other GPU Operator operands. - -The supported modes are: - -.. 
list-table:: - :widths: 15 55 30 - :header-rows: 1 - - * - Mode - - Description - - Configuration Method - * - ``on`` - - Enable Confidential Computing. - - cluster-wide default, node-level override - * - ``off`` - - Disable Confidential Computing. - - cluster-wide default, node-level override - * - ``ppcie`` - - Enable Confidential Computing on NVIDIA Hopper GPUs. - - On the NVIDIA Hopper architecture multi-GPU passthrough uses protected PCIe (PPCIE) - which claims exclusive use of the NVSwitches for a single Confidential Container - virtual machine. - If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, - set the GPU mode to ``ppcie`` mode. - - The NVIDIA Blackwell architecture uses NVLink - encryption which places the switches outside of the Trusted Computing Base (TCB), - meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case. - - node-level override - -You can set a cluster-wide default mode, and you can set the mode on individual nodes. -The mode that you set on a node has higher precedence than the cluster-wide default mode. - -Setting a Cluster-Wide Default Mode ------------------------------------- - -To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field like the following example: +The following example installs the GPU Operator with both ``P_GPU_ALIAS`` and ``NVSWITCH_ALIAS`` configured: .. code-block:: console - $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ - --type=merge \ - -p '{"spec": {"ccManager": {"defaultMode": "on"}}}' - -*Example Output:* - -.. code-block:: output - - clusterpolicy.nvidia.com/cluster-policy patched - -.. note:: - - The ``ppcie`` mode cannot be set as a cluster-wide default, it can only be set as a node label value. - -Setting a Node-Level Mode --------------------------- - -To set a node-level mode, apply the ``nvidia.com/cc.mode=`` label on the node. - -.. note:: - - The ``NODE_NAME`` environment variable was set in the :ref:`Label Nodes ` section. - If you want to set the mode for a different node, you can update the ``NODE_NAME`` environment variable and run the command again. + $ helm install --wait --timeout 10m --generate-name \ + -n gpu-operator --create-namespace \ + nvidia/gpu-operator \ + --set sandboxWorkloads.enabled=true \ + --set sandboxWorkloads.mode=kata \ + --set nfd.enabled=true \ + --set nfd.nodefeaturerules=true \ + --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS \ + --set kataSandboxDevicePlugin.env[0].value="" \ + --set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \ + --set kataSandboxDevicePlugin.env[1].value="" \ + --version=v26.3.1 + +After installing the GPU Operator, you can view the GPU or NVSwitch resource types available on a node by running the following command: .. code-block:: console - $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite - -The mode that you set on a node has higher precedence than the cluster-wide default mode. - -Verifying a Mode Change ------------------------- - -To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, -``nvidia.com/cc.mode.state``, and ``nvidia.com/cc.ready.state`` node labels: - -.. code-block:: console - - $ kubectl get node $NODE_NAME -o json | \ - jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))' - -*Example Output (CC mode disabled):* - -.. code-block:: json - - { - "nvidia.com/cc.mode": "off", - "nvidia.com/cc.mode.state": "off", - "nvidia.com/cc.ready.state": "false" - } - -*Example Output (CC mode enabled):* - -.. 
code-block:: json - - { - "nvidia.com/cc.mode": "on", - "nvidia.com/cc.mode.state": "on", - "nvidia.com/cc.ready.state": "true" - } - -* The ``nvidia.com/cc.mode`` label is the desired state. - -* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. - Its value mirrors the applied mode ``on``, ``off``, or ``ppcie``, after the transition is complete on the node. - A value of ``failed`` indicates that the last mode transition encountered an error. - -* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. - It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``. + $ kubectl get node $NODE_NAME -o json | grep nvidia.com .. note:: - It can take one to two minutes for GPU state transitions to complete and the labels to be updated. - A mode change is complete and successful when ``nvidia.com/cc.mode`` and - ``nvidia.com/cc.mode.state`` have the same value. - - -Configuring Multi-GPU Passthrough Support -========================================= - -To configure multi-GPU passthrough, you can specify the following resource limits in your manifests: - -.. code-block:: yaml - - limits: - nvidia.com/pgpu: "8" - nvidia.com/nvswitch: "4" - - -You must assign all the GPUs and NVSwitches on the node in your manifest to the same Confidential Container virtual machine. - -On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single Confidential Container. -When using NVIDIA Hopper nodes for multi-GPU passthrough, transition your node's GPU Confidential Computing mode to ``ppcie`` by applying the ``nvidia.com/cc.mode=ppcie`` label. -Refer to the :ref:`Managing the Confidential Computing Mode ` section for details. - -The NVIDIA Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB) and only requires the GPU Confidential Computing mode to be set to ``on``. - - -.. _configure-image-pull-timeouts: - -Configure Image Pull Timeouts -============================= - -The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. -Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state. - -Configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a higher timeout value than the two-minute default. -Consider setting this value to 20 minutes (``20m``) to match the default values for the NVIDIA shim configurations in Kata Containers ``create_container_timeout`` and the agent's ``image_pull_timeout``. + The ``NODE_NAME`` environment variable was set in the :ref:`Label Nodes ` section. + If you want to view the resource types for a different node, you can update the ``NODE_NAME`` environment variable and run the command again. -The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes). -This controls the time the shim allows for a container to remain in container creating state. +*Example Output:* -If you need a timeout of more than 1200 seconds, you will also need to adjust Kata Agent Policy's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull. 
-To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` annotation.
+.. code-block:: output
+
+   "nvidia.com/GH100_H200_141GB": "1"
+
+**********
 Next Steps
+**********
+
 * :doc:`Run a Sample Workload ` to verify your deployment.
+* :doc:`Configure ` additional options for your environment, including attestation, the confidential computing mode, and :ref:`multi-GPU passthrough `.
 * To help manage the lifecycle of Kata Containers, install the `Kata Lifecycle Manager `_.
   This Argo Workflows-based tool manages Kata Containers upgrades and day-two operations.
-* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information.
\ No newline at end of file
diff --git a/confidential-containers/configure-cc-mode.rst b/confidential-containers/configure-cc-mode.rst
index e730b6855..d7d1407b2 100644
--- a/confidential-containers/configure-cc-mode.rst
+++ b/confidential-containers/configure-cc-mode.rst
@@ -19,9 +19,9 @@

 .. _managing-confidential-computing-mode:

-*****************************************
+#########################################
 Managing the Confidential Computing Mode
-*****************************************
+#########################################

 You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=<mode>`` option.
 The default value of ``ccManager.defaultMode`` is ``on``.
@@ -46,7 +46,7 @@ The supported modes are:
    * - Mode
      - Description
      - Configuration Method
-   * - ``on``
+   * - ``on`` (default)
      - Enable Confidential Computing.
      - cluster-wide default, node-level override
@@ -55,9 +55,9 @@ The supported modes are:
    * - ``ppcie``
      - Enable Confidential Computing on NVIDIA Hopper GPUs.

-       On the NVIDIA Hopper architecture multi-GPU passthrough uses protected PCIe (PPCIE)
-       which claims exclusive use of the NVSwitches for a single Confidential Container
-       virtual machine.
+       On the NVIDIA Hopper architecture :ref:`multi-GPU passthrough `
+       uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single
+       Confidential Container virtual machine.
       If you are using NVIDIA Hopper GPUs for multi-GPU passthrough,
       set the GPU mode to ``ppcie``.
@@ -69,8 +69,9 @@ The supported modes are:
 You can set a cluster-wide default mode, and you can set the mode on individual nodes.
 The mode that you set on a node has higher precedence than the cluster-wide default mode.

+***********************************
 Setting a Cluster-Wide Default Mode
-====================================
+***********************************
@@ -90,8 +91,9 @@ To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field like the
    The ``ppcie`` mode cannot be set as a cluster-wide default; it can only be set as a node label value.

+*************************
 Setting a Node-Level Mode
-==========================
+*************************

 To set a node-level mode, apply the ``nvidia.com/cc.mode=<mode>`` label on the node.
@@ -109,8 +111,9 @@ Then apply the label:
 The mode that you set on a node has higher precedence than the cluster-wide default mode.
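+When you change the mode, the Confidential Computing Manager does not drain user workloads from the node.
+Before you apply a new mode, you can confirm that the node is free of user workloads.
+The following check is a minimal sketch; it assumes the ``NODE_NAME`` environment variable from the previous steps is still set:
+
+.. code-block:: console
+
+   # List every pod currently scheduled on the node. Stop or reschedule any
+   # user workloads before you change the Confidential Computing mode.
+   $ kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE_NAME
+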
+*********************** Verifying a Mode Change -======================== +*********************** To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, ``nvidia.com/cc.mode.state``, and ``nvidia.com/cc.ready.state`` node labels: diff --git a/confidential-containers/configure-image-pull-timeouts.rst b/confidential-containers/configure-image-pull-timeouts.rst deleted file mode 100644 index 8e9b2dac1..000000000 --- a/confidential-containers/configure-image-pull-timeouts.rst +++ /dev/null @@ -1,109 +0,0 @@ -.. license-header - SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. - SPDX-License-Identifier: Apache-2.0 - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. - -.. headings # #, * *, =, -, ^, " - - -.. _configure-image-pull-timeouts: - -***************************** -Configure Image Pull Timeouts -***************************** - -The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. -Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state. - -The timeout chain has three components that you might need to configure: - -* **Kubelet** ``runtimeRequestTimeout``: Controls how long kubelet waits for the container runtime to respond. Default: ``2m``. -* **Kata shim** ``create_container_timeout``: Controls how long the NVIDIA shim allows a container to remain in container creating state. Default: ``1200s`` (20 minutes). -* **Kata Agent** ``image_pull_timeout``: Controls the agent-side timeout for guest-image pull. Default: ``1200s`` (20 minutes). - -Configure the Kubelet Timeout -============================== - -Configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a higher timeout value than the two-minute default. -Set this value to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers. - -Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): - -.. code-block:: yaml - :emphasize-lines: 3 - - apiVersion: kubelet.config.k8s.io/v1beta1 - kind: KubeletConfiguration - runtimeRequestTimeout: 20m - -Restart the kubelet service to apply the change: - -.. code-block:: console - - $ sudo systemctl restart kubelet - -Configure Timeouts Beyond 20 Minutes -====================================== - -If you need a timeout of more than 1200 seconds (20 minutes), you must also adjust the Kata Agent Policy's ``image_pull_timeout`` value. - -You can set this value either through a pod annotation or by modifying the shim configuration. - -Using a Pod Annotation ------------------------ - -Add the ``io.katacontainers.config.hypervisor.kernel_params`` annotation to your pod manifest with the desired ``agent.image_pull_timeout`` value in seconds: - -.. 
code-block:: yaml - :emphasize-lines: 7 - - apiVersion: v1 - kind: Pod - metadata: - name: large-model-kata - namespace: default - annotations: - io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800" - spec: - runtimeClassName: kata-qemu-nvidia-gpu-snp - restartPolicy: Never - containers: - - name: model-server - image: "nvcr.io/nvidia/example-large-model:latest" - resources: - limits: - nvidia.com/pgpu: "1" - memory: 64Gi - -In this example, ``agent.image_pull_timeout=1800`` sets the agent-side timeout to 30 minutes (1800 seconds). - -Using the Shim Configuration ------------------------------ - -To set the timeout globally, add the ``agent.image_pull_timeout`` kernel parameter to your Kata shim configuration file. -The shim configuration files are located in ``/opt/kata/share/defaults/kata-containers/`` on the worker nodes. - -Add the parameter to the ``kernel_params`` field in the ``[hypervisor.qemu]`` section: - -.. code-block:: toml - :emphasize-lines: 2 - - [hypervisor.qemu] - kernel_params = "agent.image_pull_timeout=1800" - -.. note:: - - When setting timeouts beyond 20 minutes, ensure that all three timeout values in the chain are consistent: - the kubelet ``runtimeRequestTimeout``, the Kata shim ``create_container_timeout``, and the - agent ``image_pull_timeout`` should all be set to accommodate the expected image pull duration. diff --git a/confidential-containers/configure-multi-gpu.rst b/confidential-containers/configure-multi-gpu.rst index edc27d513..a1ec76d04 100644 --- a/confidential-containers/configure-multi-gpu.rst +++ b/confidential-containers/configure-multi-gpu.rst @@ -17,59 +17,218 @@ .. headings # #, * *, =, -, ^, " -.. _coco-multi-gpu-passthrough: +.. _coco-configure-workloads: + +############################################ +Configuring Confidential Container Workloads +############################################ + +A Confidential Container workload is a standard Kubernetes pod that runs inside a TEE-protected +virtual machine and requests one or more GPUs through the NVIDIA Kata sandbox device plugin. +Compared with a traditional GPU pod, a Confidential Container workload pod manifest differs in +three ways: + +* It selects a TEE-aware Kata runtime class instead of the default ``runc``-based runtime. +* It requests GPU and NVSwitch resources using the resource types advertised by the NVIDIA + Kata sandbox device plugin, which can be either default names or model-specific names. +* For NVSwitch-based HGX systems, it requests every GPU and NVSwitch on the node together so + that all devices reside inside the same Confidential Container virtual machine. + +This page describes each of these decisions and provides single-GPU and multi-GPU passthrough +manifest examples that you can copy and adapt to your environment. + +Before beginning, you should configure your cluster to deploy Confidential Containers workloads using the :doc:`Confidential Containers deployment ` steps. + +******************************** +Select a Container Runtime Class +******************************** + +A Confidential Container workload must set ``spec.runtimeClassName`` to a TEE-aware Kata +runtime that NVIDIA provides through the ``kata-deploy`` Helm chart. +Select the runtime class based on the CPU TEE on the target worker node: + +.. 
list-table:: + :header-rows: 1 + :widths: 30 40 30 + + * - Node TEE + - Runtime class + - Typical CPU vendor + * - AMD SEV-SNP + - ``kata-qemu-nvidia-gpu-snp`` + - AMD EPYC (Genoa or newer) + * - Intel TDX + - ``kata-qemu-nvidia-gpu-tdx`` + - Intel Xeon (Sapphire Rapids or newer) + +The ``kata-deploy`` chart also installs a ``kata-qemu-nvidia-gpu`` runtime class. +That class is intended for non-confidential Kata workloads. You should not use it for Confidential +Container workloads because it does not start the GPU in CC mode. + +.. _coco-resource-types: ***************************************** -Configuring Multi-GPU Passthrough Support +Reference GPU and NVSwitch Resource Types ***************************************** -Multi-GPU passthrough assigns all GPUs and NVSwitches on a node to a single Confidential Container virtual machine. -This configuration is required for NVSwitch (NVLink) based HGX systems running confidential workloads. +The NVIDIA Kata sandbox device plugin advertises GPUs and NVSwitches to Kubernetes as extended resources. +Your pod manifest requests those resources under ``resources.limits``. +You can use either the default resource types or model-specific resource types. -You must assign all the GPUs and NVSwitches on the node to the same Confidential Container virtual machine. -Configuring only a subset of GPUs for Confidential Computing on a single node is not supported. +By default, every passthrough GPU is advertised as ``nvidia.com/pgpu`` and every NVSwitch is advertised as ``nvidia.com/nvswitch``. +These names are stable across GPU models, which keeps manifests portable when every node in your cluster has the same GPU type. -Prerequisites -============= +A sample resource request using the default resource type is shown below: -* Complete the :doc:`Confidential Containers deployment ` steps. -* Verify that your node has multi-GPU hardware (NVSwitch-based HGX system). +.. code-block:: yaml -Set the Confidential Computing Mode -==================================== + resources: + limits: + nvidia.com/pgpu: "1" -The required CC mode depends on your GPU architecture. +In heterogeneous clusters, where worker nodes use different GPU models, you can configure the Kata sandbox device plugin to advertise resources under model-specific names by setting +``P_GPU_ALIAS=""`` (and optionally ``NVSWITCH_ALIAS=""``) on the plugin. +With this configuration, GPUs are exposed as resources such as ``nvidia.com/GH100_H200_141GB``, +which lets a workload pin itself to a specific accelerator model. -Set the ``NODE_NAME`` environment variable to the name of the node you want to configure: +Refer to :ref:`Configuring GPU or NVSwitch Resource Types Name ` +for the GPU Operator install flags that enable this behavior. -.. code-block:: console +Use the model-specific resource name in workloads that must target a specific accelerator: - $ export NODE_NAME="" +.. code-block:: yaml -**NVIDIA Hopper architecture:** + resources: + limits: + nvidia.com/GH100_H200_141GB: "1" -Multi-GPU passthrough on Hopper uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single Confidential Container. -Set the node's CC mode to ``ppcie``: +To list the GPU and NVSwitch resource types advertised on a node, run: .. 
code-block:: console - $ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite + $ kubectl get node $NODE_NAME -o json | grep nvidia.com -**NVIDIA Blackwell architecture:** +*Example Output:* -The Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB). -The ``ppcie`` mode is not required. Use ``on`` mode: +.. code-block:: output -.. code-block:: console + "nvidia.com/GH100_H200_141GB": "1" + +.. _coco-single-gpu-workload: + +********************** +Single-GPU Passthrough +********************** + +A single-GPU workload requests one GPU and runs inside its own Confidential Container virtual +machine. +This pattern is the recommended starting point for verifying a deployment and for most +independent workloads that do not require NVLink between GPUs. + +#. Create a file, such as ``cuda-vectoradd-kata.yaml``: + + .. code-block:: yaml + :emphasize-lines: 7,14 + + apiVersion: v1 + kind: Pod + metadata: + name: cuda-vectoradd-kata + namespace: default + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx + restartPolicy: Never + containers: + - name: cuda-vectoradd + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" + resources: + limits: + nvidia.com/pgpu: "1" + memory: 16Gi + + .. note:: - $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite + If you configured the Kata sandbox device plugin to use model-specific resource types, + replace ``nvidia.com/pgpu`` with the appropriate model-specific name, for example + ``nvidia.com/GH100_H200_141GB``. -Refer to :doc:`Managing the Confidential Computing Mode ` for details on verifying the mode change. +#. Create the pod: + + .. code-block:: console + + $ kubectl apply -f cuda-vectoradd-kata.yaml + +#. Verify the workload completes successfully: + + .. code-block:: console + + $ kubectl logs cuda-vectoradd-kata + + *Example Output:* + + .. code-block:: output + + [Vector addition of 50000 elements] + Copy input data from the host memory to the CUDA device + CUDA kernel launch with 196 blocks of 256 threads + Copy output data from the CUDA device to the host memory + Test PASSED + Done + +Refer to :doc:`run-sample-workload` for the end-to-end verification flow including +deletion and troubleshooting tips. + +.. _coco-multi-gpu-prereqs: +.. _coco-multi-gpu-passthrough: + +********************* +Multi-GPU Passthrough +********************* + +Multi-GPU passthrough assigns every GPU and NVSwitch on a node to a single Confidential +Container virtual machine. +This configuration is required for NVSwitch (NVLink) based HGX systems running confidential +workloads. + +.. important:: + + You must assign all the GPUs and NVSwitches on the node to the same Confidential Container + virtual machine. + Configuring only a subset of GPUs for Confidential Computing on a single node is not + supported. + +NVIDIA Hopper PPCIE Mode +======================== + +For NVIDIA Hopper GPUs, multi-GPU passthrough requires protected PCIe (PPCIE) mode, which +claims exclusive use of the NVSwitches for a single Confidential Container. +The NVIDIA Confidential Computing Manager for Kubernetes transitions GPUs into the correct +mode based on the ``cc.mode`` label that you set. + +#. Set the ``NODE_NAME`` environment variable to the node you want to configure: + + .. code-block:: console + + $ export NODE_NAME="" + +#. Apply the ``ppcie`` CC mode label to the node: + + .. 
code-block:: console + + $ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite + +Refer to :doc:`Managing the Confidential Computing Mode ` for full details +on setting the CC mode and verifying the change. + +NVIDIA Blackwell GPUs use NVLink encryption, which places the switches outside of the +Trusted Computing Base (TCB), so the default CC mode of ``on`` is sufficient and no additional +configuration is required. Run a Multi-GPU Workload ======================== -1. Create a file, such as ``multi-gpu-kata.yaml``, with a pod manifest that requests all GPUs and NVSwitches on the node: +#. Create a file, such as ``multi-gpu-kata.yaml``, with a pod manifest that requests every GPU + and NVSwitch on the node: .. code-block:: yaml :emphasize-lines: 7,14-16 @@ -80,7 +239,7 @@ Run a Multi-GPU Workload name: multi-gpu-kata namespace: default spec: - runtimeClassName: kata-qemu-nvidia-gpu-snp + runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx restartPolicy: Never containers: - name: cuda-sample @@ -88,17 +247,18 @@ Run a Multi-GPU Workload resources: limits: nvidia.com/pgpu: "8" - nvidia.com/nvswitch: "4" + nvidia.com/nvswitch: "4" # Only for NVIDIA Hopper GPUs with PPCIE mode memory: 128Gi - Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type. - .. note:: - If you configured ``P_GPU_ALIAS`` for heterogeneous clusters, replace ``nvidia.com/pgpu`` with the model-specific resource type. - Refer to :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` for details. + If you configured ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` for heterogeneous clusters, + replace ``nvidia.com/pgpu`` and ``nvidia.com/nvswitch`` with the corresponding + model-specific resource types. + Refer to :ref:`Reference GPU and NVSwitch Resource Types ` + for details. -2. Create the pod: +#. Create the pod: .. code-block:: console @@ -110,7 +270,7 @@ Run a Multi-GPU Workload pod/multi-gpu-kata created -3. Verify the pod is running: +#. Verify the pod is running: .. code-block:: console @@ -123,7 +283,7 @@ Run a Multi-GPU Workload NAME READY STATUS RESTARTS AGE multi-gpu-kata 1/1 Running 0 30s -4. Verify that all GPUs are visible inside the container: +#. Verify that all GPUs are visible inside the container: .. code-block:: console @@ -142,7 +302,7 @@ Run a Multi-GPU Workload GPU 6: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) GPU 7: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) -5. Delete the pod: +#. Delete the pod: .. code-block:: console diff --git a/confidential-containers/configure.rst b/confidential-containers/configure.rst new file mode 100644 index 000000000..f02afa7e3 --- /dev/null +++ b/confidential-containers/configure.rst @@ -0,0 +1,57 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. 
headings # #, * *, =, -, ^, " + + +.. _configure-confidential-containers: + +################################# +Configure Confidential Containers +################################# + +After deploying Confidential Containers, you can configure additional options for your environment. +Use the cards below to navigate to a specific configuration topic. + + +.. grid:: 3 + :gutter: 3 + + .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation + :link: attestation + :link-type: doc + + Configure remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. + + .. grid-item-card:: :octicon:`gear;1.5em;sd-mr-1` Managing the CC Mode + :link: configure-cc-mode + :link-type: doc + + Set the confidential computing mode on NVIDIA GPUs at the cluster or node level. + + .. grid-item-card:: :octicon:`cpu;1.5em;sd-mr-1` Configuring Workloads + :link: configure-multi-gpu + :link-type: doc + + Configure Confidential Container workloads, including runtime class selection, GPU and + NVSwitch resource types, and single- or multi-GPU passthrough. + + .. grid-item-card:: :octicon:`stack;1.5em;sd-mr-1` Multi-GPU Passthrough + :link: coco-multi-gpu-passthrough + :link-type: ref + + Assign every GPU and NVSwitch on a node to a single Confidential Container virtual + machine for NVSwitch-based HGX systems. diff --git a/confidential-containers/index.rst b/confidential-containers/index.rst index e3f522d7d..aa8ca9ade 100644 --- a/confidential-containers/index.rst +++ b/confidential-containers/index.rst @@ -16,9 +16,9 @@ .. headings # #, * *, =, -, ^, " -********************************************************** +########################################### NVIDIA Confidential Containers Architecture -********************************************************** +########################################### .. toctree:: :caption: NVIDIA Confidential Containers Architecture @@ -42,10 +42,10 @@ NVIDIA Confidential Containers Architecture :hidden: :titlesonly: - Managing the CC Mode - Multi-GPU Passthrough - Image Pull Timeouts + Configure Overview Attestation + Managing the Confidential Computing Mode + Configuring Workloads .. toctree:: :caption: Reference @@ -90,25 +90,7 @@ This is documentation for NVIDIA's implementation of Confidential Containers inc :link: run-sample-workload :link-type: doc - Verify your deployment by running a GPU workload in a confidential container. - - .. grid-item-card:: :octicon:`gear;1.5em;sd-mr-1` Managing the CC Mode - :link: configure-cc-mode - :link-type: doc - - Set the confidential computing mode on NVIDIA GPUs at cluster or node level. - - .. grid-item-card:: :octicon:`cpu;1.5em;sd-mr-1` Multi-GPU Passthrough - :link: configure-multi-gpu - :link-type: doc - - Configure multi-GPU passthrough for NVSwitch-based HGX systems. - - .. grid-item-card:: :octicon:`clock;1.5em;sd-mr-1` Image Pull Timeouts - :link: configure-image-pull-timeouts - :link-type: doc - - Tune image pull timeouts for large container images in confidential VMs. + Run a sample GPU workload in a confidential container. .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation :link: attestation @@ -116,15 +98,4 @@ This is documentation for NVIDIA's implementation of Confidential Containers inc Remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. - .. grid-item-card:: :octicon:`note;1.5em;sd-mr-1` Release Notes - :link: release-notes - :link-type: doc - - New features and known issues for each release. - - .. 
grid-item-card:: :octicon:`law;1.5em;sd-mr-1` Licensing - :link: licensing - :link-type: doc - - Licensing information for Confidential Containers documentation. diff --git a/confidential-containers/licensing.rst b/confidential-containers/licensing.rst index 43d76fff9..d30207776 100644 --- a/confidential-containers/licensing.rst +++ b/confidential-containers/licensing.rst @@ -16,9 +16,9 @@ .. headings # #, * *, =, -, ^, " -********* +######### Licensing -********* +######### While the Confidential Containers (CoCo) Reference Architecture includes some components that are open source, the NVIDIA Confidential Computing capability is a licensed feature for production use cases. To use these products, you must have a valid NVIDIA Confidential Computing license. diff --git a/confidential-containers/overview.rst b/confidential-containers/overview.rst index 2b9646695..47de85d19 100644 --- a/confidential-containers/overview.rst +++ b/confidential-containers/overview.rst @@ -17,9 +17,9 @@ .. headings # #, * *, =, -, ^, " -***************************************************** +##################################################### NVIDIA Confidential Containers Reference Architecture -***************************************************** +##################################################### NVIDIA GPUs with Confidential Computing support provide the hardware foundation for running GPU workloads inside a hardware-enforced Trusted Execution Environment (TEE). The NVIDIA Confidential Containers Reference Architecture provides a validated deployment model for cluster administrators interested in leveraging NVIDIA GPU Confidential Computing capabilities on Kubernetes platforms. @@ -31,8 +31,9 @@ Refer to the `Confidential Containers .. _confidential-containers-overview: +********** Background -========== +********** NVIDIA GPUs power the training and deployment of Frontier Models—world-class Large Language Models (LLMs) that define the state of the art in AI reasoning and capability. @@ -45,8 +46,9 @@ The Confidential Containers project leverages Kata Containers to provide the san .. _coco-use-cases: +********* Use Cases -========= +********* The target for Confidential Containers is to enable model providers (closed and open source) and Enterprises to use the advancements of Gen AI, agnostic to the deployment model (Cloud, Enterprise, or Edge). Some of the key use cases that CC and Confidential Containers enable are: @@ -61,8 +63,9 @@ The target for Confidential Containers is to enable model providers (closed and .. _coco-architecture: +********************* Architecture Overview -===================== +********************* NVIDIA's approach to the Confidential Containers architecture delivers on the key promise of Confidential Computing: confidentiality, integrity, and verifiability. Integrating open source and NVIDIA software components with the Confidential Computing capabilities of NVIDIA GPUs, the Reference Architecture for Confidential Containers is designed to be the secure and trusted deployment model for AI workloads. @@ -89,8 +92,9 @@ The components are described in more detail in the next section. .. _coco-supported-platforms-components: +*********************************************** Software Components for Confidential Containers -=============================================== +*********************************************** The following is a brief overview of the software components in NVIDIA's Reference Architecture for Confidential Containers. 
Refer to the diagram above for a visual representation of the components. @@ -160,7 +164,7 @@ A minimal hardened init system that securely bootstraps the guest environment, l .. _coco-gpu-operator-cluster-topology: GPU Operator Cluster Topology Considerations --------------------------------------------- +============================================ The GPU Operator deploys and manages components for allocating and utilizing the GPU resources on your cluster. Depending on how you configure the Operator, different components are deployed on the worker nodes. @@ -183,21 +187,22 @@ Consider the following example where node A is configured to run traditional con * Node Feature Discovery * NVIDIA GPU Feature Discovery - * NVIDIA Confidential Computing Manager for Kubernetes - * NVIDIA Sandbox Device Plugin + * NVIDIA Kata Sandbox Device Plugin * NVIDIA VFIO Manager * Node Feature Discovery This configuration can be controlled through node labelling, as described in the :doc:`Confidential Containers deployment guide `. +******************************************* Supported Features and Deployment Scenarios -=========================================== +******************************************* The following features are supported with Confidential Containers: * Support for Confidential Container workloads as - * Single-GPU passthrough (one physical GPU per pod). - * Multi-GPU passthrough on NVSwitch (NVLink) based HGX systems. + * :ref:`Single-GPU passthrough ` (one physical GPU per pod). + * :ref:`Multi-GPU passthrough ` on NVSwitch (NVLink) based HGX systems. .. note:: @@ -218,8 +223,9 @@ More information on these features can be found in the `Confidential Containers .. _coco-limitations: +**************************** Limitations and Restrictions -============================ +**************************** * NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only. * All GPUs on the host must be configured for Confidential Computing. @@ -241,7 +247,7 @@ Limitations and Restrictions Refer to the `QEMU IOMMUFD documentation `_ for more information. Security Considerations ------------------------ +======================= * Application security defects: Confidential Computing does not protect against threats within the confidential VM, including vulnerabilities in the application itself. Applications must still follow security best practices such as input validation. @@ -259,8 +265,9 @@ Security Considerations * Availability: Confidential Computing does not provide availability guarantees. Achieve availability through replication, which is standard practice in Kubernetes deployments. +********** Next Steps -========== +********** Refer to the following pages to learn more about deploying with Confidential Containers: .. grid:: 3 diff --git a/confidential-containers/prerequisites.rst b/confidential-containers/prerequisites.rst index 4ec0e6b0b..29c4128d3 100644 --- a/confidential-containers/prerequisites.rst +++ b/confidential-containers/prerequisites.rst @@ -19,15 +19,17 @@ .. _coco-prerequisites: -************* +############# Prerequisites -************* +############# + +The following prerequisites are required to configure your cluster to deploy Confidential Containers. -Complete the following prerequisites before deploying Confidential Containers. Refer to the :doc:`Supported Platforms ` page for validated hardware and software versions. 
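+As a quick preflight check before you work through the following requirements, you can ask the host kernel whether it detected CPU TEE support.
+The commands below are a minimal sketch; the exact CPU flag names and kernel log messages vary by kernel version and distribution:
+
+.. code-block:: console
+
+   # AMD hosts: check for the SEV-SNP CPU flag.
+   $ grep -m 1 -o 'sev_snp' /proc/cpuinfo
+
+   # Intel hosts: check the kernel log for TDX initialization messages.
+   $ sudo dmesg | grep -i tdx
+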
+***************** Hardware and BIOS -================= +***************** * Use a supported platform configured for Confidential Computing. For more information on machine setup, refer to :doc:`Supported Platforms `. @@ -43,13 +45,15 @@ Hardware and BIOS If the output of this command includes 0, 1, and so on, then your host is configured for IOMMU. - If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance:: + If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument for AMD CPUs, or ``intel_iommu=on`` for Intel CPUs. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance: + + .. code-block:: console ... GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau" ... - After making the change, configure the bootloader + After making the change, configure the bootloader. .. code-block:: console @@ -86,14 +90,36 @@ Hardware and BIOS If modules such as ``nvidia``, ``nvidia_uvm``, or ``nvidia_modeset`` are listed, NVIDIA GPU drivers are present and must be removed before proceeding. Refer to `Removing the Driver `_ in the NVIDIA Driver Installation Guide. +****************** Kubernetes Cluster -================== +****************** * A Kubernetes cluster with cluster administrator privileges. Refer to the :ref:`Supported Software Components ` table for supported Kubernetes versions. -* Helm installed on your cluster. - Refer to the `Helm documentation `_ for installation instructions. +* containerd version 2.2.2 installed. + Refer to the `containerd Getting Started guide `_ for installation instructions. + + To verify the installed version, run the following command: + + .. code-block:: console + + $ containerd --version + + *Example Output:* + + .. code-block:: output + + containerd containerd.io 2.2.2 ... + +* Helm installed. + Use the command below to install Helm or refer to the `Helm documentation `_ for installation instructions. + + .. code-block:: console + + $ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \ + && chmod 700 get_helm.sh \ + && ./get_helm.sh * Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster. @@ -122,7 +148,34 @@ Kubernetes Cluster $ sudo systemctl restart kubelet +* Configure image pull timeouts. The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. + Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state. + + If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers. + + Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): + + .. code-block:: yaml + :emphasize-lines: 3 + + apiVersion: kubelet.config.k8s.io/v1beta1 + kind: KubeletConfiguration + runtimeRequestTimeout: 20m + + Restart the kubelet service to apply the change: + + .. code-block:: console + + $ sudo systemctl restart kubelet + + Optionally, you can configure additional timeouts for the NVIDIA Shim and Kata Agent Policy. 
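+
+  For example, the following annotation raises the agent-side image pull timeout for a single pod.
+  This is a minimal sketch; the ``1800`` value (30 minutes) is an illustrative choice:
+
+  .. code-block:: yaml
+
+     metadata:
+       annotations:
+         io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
+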
+  The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
+  This controls the time the shim allows for a container to remain in the container-creating state.
+  If you need a timeout of more than 1200 seconds, you will also need to adjust the Kata Agent Policy's ``image_pull_timeout`` value, which controls the agent-side timeout for guest image pull.
+  To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value through the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` pod annotation, as shown in the example above.
+
+**********
 Next Steps
+**********

 After completing the prerequisites, proceed to :doc:`Deploy Confidential Containers `.
diff --git a/confidential-containers/release-notes.rst b/confidential-containers/release-notes.rst
index 5f7ddbe4f..94e430ee9 100644
--- a/confidential-containers/release-notes.rst
+++ b/confidential-containers/release-notes.rst
@@ -18,9 +18,9 @@

 .. _coco-release-notes:

-*************
+#############
 Release Notes
-*************
+#############

 This document describes the new features and known issues for the NVIDIA Confidential Containers Reference Architecture.
@@ -28,8 +28,9 @@

 .. _coco-v1.0.0:

+*****
 1.0.0
-=====
+*****

 This is the initial general availability (GA) release of the NVIDIA Confidential Containers Reference Architecture, a validated deployment model for running GPU-accelerated AI workloads inside hardware-enforced Trusted Execution Environments (TEEs).
 It is designed for organizations in regulated industries that require strong isolation and cryptographic verification to protect model intellectual property and sensitive data on untrusted infrastructure.

 The architecture combines NVIDIA GPU Confidential Computing, Kata Containers, and the NVIDIA GPU Operator to provide a secure, attestable, Kubernetes-native platform for confidential AI workloads.

 Key Features
-------------
+============

 * This release supports HGX platforms with:
@@ -66,7 +67,7 @@ Key Features

 Limitations and Restrictions
-----------------------------
+============================

 * NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only.
diff --git a/confidential-containers/run-sample-workload.rst b/confidential-containers/run-sample-workload.rst
index 12b7e433c..10f5668bb 100644
--- a/confidential-containers/run-sample-workload.rst
+++ b/confidential-containers/run-sample-workload.rst
@@ -19,15 +19,20 @@

 .. _coco-run-sample-workload:

-*********************
+#####################
 Run a Sample Workload
-*********************
+#####################

-After completing the :doc:`deployment steps `, you can verify your installation by running a sample GPU workload in a confidential container.
+After completing the :doc:`deployment steps `, verify your
+installation by running a basic single-GPU sample workload inside a Confidential Container.

-A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX.
+This page intentionally uses the simplest possible manifest so that you can confirm the
+deployment end-to-end.
+For the full set of workload configuration options, including runtime class selection,
+resource type naming, and multi-GPU passthrough, refer to
+:doc:`Configuring Confidential Container Workloads `.
 
-1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the kata-qemu-nvidia-gpu-snp runtime class:
+#. Create a file named ``cuda-vectoradd-kata.yaml`` with the following sample manifest:
 
    .. code-block:: yaml
      :emphasize-lines: 7,14
@@ -38,7 +43,7 @@ A pod manifest for a confidential container GPU workload requires that you speci
        name: cuda-vectoradd-kata
        namespace: default
      spec:
-       runtimeClassName: kata-qemu-nvidia-gpu-snp
+       runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx
        restartPolicy: Never
       containers:
       - name: cuda-vectoradd
@@ -48,33 +53,33 @@ A pod manifest for a confidential container GPU workload requires that you speci
           nvidia.com/pgpu: "1"
           memory: 16Gi
 
-   The following are Confidential Containers configurations in the sample manifest:
+   Before applying the manifest, adjust the two highlighted lines for your environment:
 
-   * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run.
+   * **Runtime class.** Use ``kata-qemu-nvidia-gpu-snp`` on AMD SEV-SNP nodes or
+     ``kata-qemu-nvidia-gpu-tdx`` on Intel TDX nodes.
+   * **GPU resource type.** The sample requests ``nvidia.com/pgpu``, which is the default
+     resource name advertised by the NVIDIA Kata sandbox device plugin.
+     If your cluster was installed with the ``P_GPU_ALIAS=""`` setting, replace it with the
+     model-specific name advertised on your node, for example ``nvidia.com/GH100_H200_141GB``.
 
-   * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs.
-     If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the sandbox device plugin.
-     Refer to the :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` for more details.
+   Refer to :doc:`Configuring Confidential Container Workloads ` for
+   guidance on each option.
 
-   * If you have machines that support multi-GPU passthrough, refer to the :doc:`Configuring Multi-GPU Passthrough ` page for a complete workload example and architecture-specific CC mode requirements.
-
-
-2. Create the pod:
+#. Create the pod:
 
    .. code-block:: console
 
      $ kubectl apply -f cuda-vectoradd-kata.yaml
-
+
   *Example Output:*
 
   .. code-block:: output
 
      pod/cuda-vectoradd-kata created
 
+#. Optional: Verify the pod is running:
 
-   Optional: Verify the pod is running.
-
-   .. code-block:: console
+   .. code-block:: console
 
      $ kubectl get pod cuda-vectoradd-kata
 
@@ -85,7 +90,7 @@ A pod manifest for a confidential container GPU workload requires that you speci
 
      NAME                  READY   STATUS    RESTARTS   AGE
      cuda-vectoradd-kata   1/1     Running   0          10s
 
-3. View the logs from the pod after the container starts:
+#. View the logs from the pod after the container starts:
 
   .. code-block:: console
 
@@ -102,17 +107,19 @@ A pod manifest for a confidential container GPU workload requires that you speci
 
      Test PASSED
      Done
 
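+#. Optional: Confirm that the pod was scheduled with the expected Kata runtime class.
+   The following is a minimal check; the ``jsonpath`` query is one way to read the field:
+
+   .. code-block:: console
+
+      $ kubectl get pod cuda-vectoradd-kata -o jsonpath='{.spec.runtimeClassName}'
+
+   *Example Output:*
+
+   .. code-block:: output
+
+      kata-qemu-nvidia-gpu-snp
+
+   On Intel TDX nodes, the output is ``kata-qemu-nvidia-gpu-tdx``.
+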
-4. Delete the pod:
+#. Delete the pod:
 
   .. code-block:: console
 
      $ kubectl delete -f cuda-vectoradd-kata.yaml
 
+**********
 Next Steps
-==========
+**********
 
-* Configure :doc:`Attestation ` with the Trustee framework to enable remote verification of your confidential environment.
-* Set up :doc:`multi-GPU passthrough ` for NVSwitch-based HGX systems.
-* Tune :doc:`image pull timeouts ` if you are pulling large container images.
+* :doc:`Configure Confidential Container workloads ` for runtime class
+  selection, resource type naming, and single- or multi-GPU passthrough patterns.
+* Configure :doc:`Attestation ` with the Trustee framework to enable remote
+  verification of your confidential environment.
+* Manage the :doc:`confidential computing mode ` on your GPUs.
diff --git a/confidential-containers/supported-platforms.rst b/confidential-containers/supported-platforms.rst
index d871b2a45..f3391f003 100644
--- a/confidential-containers/supported-platforms.rst
+++ b/confidential-containers/supported-platforms.rst
@@ -18,17 +18,18 @@
 
 .. _coco-supported-platforms:
 
-*******************
+###################
 Supported Platforms
-*******************
+###################
 
-Following are the platforms supported by the NVIDIA Confidential Containers Reference Architecture.
+The following are the platforms supported by the NVIDIA Confidential Containers Reference Architecture.
 
-Supported Hardware Platform
-===========================
+********
+Hardware
+********
 
 NVIDIA GPUs
------------
+===========
 
 .. list-table::
    :header-rows: 1
 
@@ -57,7 +58,7 @@ NVIDIA GPUs
 
 .. note::
 
-   Multi-GPU passthrough on NVIDIA Hopper HGX systems requires that you set the Confidential Computing mode to ``ppcie`` mode.
+   :ref:`Multi-GPU passthrough ` on NVIDIA Hopper HGX systems requires that you set the Confidential Computing mode to ``ppcie``.
   Refer to :doc:`Managing the Confidential Computing Mode ` for details.
 
 .. note::
 
@@ -66,7 +67,7 @@ NVIDIA GPUs
 
  Configuring only some GPUs on a node for Confidential Computing is not supported.
 
 CPU Platforms
--------------
+=============
 
 .. flat-table::
    :header-rows: 1
 
@@ -98,7 +99,7 @@ For additional resources on machine setup:
 
 .. _coco-supported-software-components:
 
 Supported Software Components
------------------------------
+=============================
 
 .. flat-table::
    :header-rows: 1