diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst index 56528dbc5..cd745286b 100644 --- a/confidential-containers/attestation.rst +++ b/confidential-containers/attestation.rst @@ -19,12 +19,13 @@ .. _attestation: -*********** +########### Attestation -*********** +########### -This page provides an overview of how to configure remote attestation for Confidential Container workloads. -Attestation cryptographically verifies the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload. + +The :doc:`Confidential Containers deployment guide ` configures your cluster to run workloads in a Confidential Container. +To strengthen workload security, configure attestation to verify the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload. Attestation is required for any feature that depends on secrets, including: @@ -35,12 +36,19 @@ Attestation is required for any feature that depends on secrets, including: When a workload requires a secret, such as a key to decrypt a container image or model, guest components collect hardware evidence from the active CPU and GPU enclaves. The evidence is sent to a remote verifier, Trustee, which evaluates the evidence against configured policies and conditionally releases the secret. +Trustee is typically deployed in a separate trusted environment that is reachable from your worker nodes over the network. + +.. note:: -For background on how attestation fits into the Confidential Containers architecture, refer to the :doc:`NVIDIA Confidential Containers Reference Architecture overview `. + This page is an educational overview of attestation with Confidential Containers, not a complete configuration guide. + The attestation workflow is fully documented in the upstream `Confidential Containers documentation `_, which is the source of truth for setup and configuration details. 
+ Attestation is not required to deploy Confidential Containers, but is required for features that rely on secrets in your cluster. + +************* Prerequisites -============= +************* * A Kubernetes cluster configured to deploy Confidential Containers workloads. Refer to the :doc:`deployment guide ` for configuration steps. @@ -50,8 +58,9 @@ Prerequisites Trustee does not require Confidential Computing hardware or a GPU. * Network connectivity from the worker nodes in your Kubernetes cluster to the Trustee instance. +********************** Configuration Workflow -====================== +********************** After you meet the prerequisites, complete the following steps to enable attestation: @@ -63,14 +72,14 @@ After configuration, the Confidential Containers runtime automatically runs the .. _provision-trustee: +***************** Provision Trustee -================= +***************** Trustee is an open-source framework used in Confidential Containers to verify attestation evidence and conditionally release secrets. For a full overview of attestation with Trustee, refer to the upstream `Trustee documentation `_. -To provision a Trustee instance, follow the upstream `Install Trustee in Docker `_ guide. -This is the recommended install method. +To provision a Trustee instance, follow the recommended upstream `Install Trustee in Docker `_ guide. .. note:: @@ -83,45 +92,50 @@ After you complete installation, Trustee is configured to use the NVIDIA Remote .. 
_configure-workloads-trustee: +*********************************** Configure Workloads for Attestation -==================================== +*********************************** -To enable attestation for your workloads, point them to the Trustee network endpoint, sometimes referred to as the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec: +To enable attestation for your workloads, point them to the Trustee network endpoint, also called the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec: .. code-block:: yaml io.katacontainers.config.hypervisor.kernel_params: "agent.aa_kbc_params=cc_kbc::http://:" -Replace ```` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes, and ```` with the port (default: ``8080``). +Replace ```` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes. +Replace ```` with the port that Trustee listens on (default: ``8080``). Refer to the upstream `Setup Confidential Containers `_ documentation for more information on configuring workloads for attestation. .. _customize-attestation: -Customize Attestation Workflows -=============================== +***************************************** +Optional: Customize Attestation Workflows +***************************************** + +Confidential Containers enables sensible default attestation policies for NVIDIA Confidential Computing GPUs. +In most cases, the default policy is appropriate and you only need to provide reference values. +For more information, refer to the upstream `Confidential Containers reference values `_ documentation. + +You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets. +Refer to the upstream documentation on `using the KBS Client Tool `_. 
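As an illustrative sketch of the KBS Client Tool workflow described above: the subcommand names, flags, endpoint, and file paths below are assumptions based on the upstream Trustee tooling and may differ between releases, so verify them against the upstream documentation before use.

```shell
# Hypothetical sketch (verify flags against your Trustee release): upload a
# secret as a Trustee "resource" so it can be released to attested workloads.
# TRUSTEE_HOST, the key path, and the resource path are placeholders.
TRUSTEE_HOST=trustee.example.internal

kbs-client --url "http://${TRUSTEE_HOST}:8080" \
    config --auth-private-key ./kbs-private.key \
    set-resource \
    --path my_repo/workload_key/key.bin \
    --resource-file ./key.bin
```

Workloads can then reference the resource path when requesting the secret through the attestation flow.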
-After Trustee is provisioned and workloads are configured, you can customize attestation workflows to enforce your desired security policies. -This can include configuring the following: +For more advanced customization, refer to the following upstream Confidential Containers documentation: -* KBS Client Tool: Configure Trustee resources and secrets by using the Key Broker Service (KBS) Client Tool. - Refer to the upstream documentation on `using the KBS Client Tool `_. -* Configure resources: Create resources, or secrets, that your workloads need. - Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources. -* Configure policies: Confidential Containers uses different policy types to secure workload at different layers. - Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies. - -Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them. +* `Resources `_: Create the resources, such as secrets, that your workloads need. +* `Policies `_: Configure the policy types that secure workloads at different layers. +* `Features `_: Browse the full list of attestation features and how to configure them. +*************** Troubleshooting -=============== +*************** If attestation does not succeed after provisioning Trustee, enable debug logging by setting the ``RUST_LOG=debug`` environment variable in the Trustee environment. Use the Trustee log to diagnose the attestation process. +********** Next Steps -========== +********** -* Refer to the :doc:`deployment guide ` for Confidential Containers setup instructions. -* Refer to the upstream `Confidential Containers Features `_ documentation for a complete list of attestation-dependent features. -* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information. 
+* Refer to the upstream `Confidential Containers Features `_ for complete documentation on attestation features.
+* If you have not already, refer to the :doc:`Confidential Containers deployment guide ` to configure your environment for confidential workloads.
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index d48fe8797..b71162738 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -19,9 +19,9 @@
 .. _confidential-containers-deploy:
 
-******************************
+##############################
 Deploy Confidential Containers
-******************************
+##############################
 
 This page describes deploying Kata Containers and the NVIDIA GPU Operator.
 These are key pieces of the NVIDIA Confidential Containers Reference Architecture used to manage GPU resources on your cluster and deploy workloads into Confidential Containers.
 
@@ -32,13 +32,13 @@ This guide assumes you are familiar with the NVIDIA GPU Operator, Kata Container
 Refer to the :doc:`NVIDIA GPU Operator ` and `Kata Containers `_ documentation for more information on these software components.
 Refer to the `Kubernetes documentation `_ for more information on Kubernetes cluster administration.
 
-
+********
 Overview
-========
+********
 
 The high-level workflow for configuring Confidential Containers is as follows:
 
-#. Configure the :ref:`Prerequisites `.
+#. Configure the :doc:`Prerequisites `.
 
 #. :ref:`Label Nodes ` that you want to use with Confidential Containers.
 
@@ -49,146 +49,19 @@ The high-level workflow for configuring Confidential Containers is as follows:
 
    This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads.
    The GPU Operator uses the node labels to determine what software components to deploy to a node.
-After installation, you can :ref:`run a sample GPU workload ` in a confidential container. -You can also configure :doc:`Attestation ` with the Trustee framework. +After installation, you can :doc:`run a sample GPU workload ` in a confidential container. +You can also configure :doc:`Attestation ` with the Trustee framework. The Trustee attestation service is typically deployed on a separate, trusted environment. After configuration, you can schedule workloads that request GPU resources and use the ``kata-qemu-nvidia-gpu-tdx`` or ``kata-qemu-nvidia-gpu-snp`` runtime classes for secure deployment. -.. _coco-prerequisites: - -Prerequisites -============= - -Hardware and BIOS ------------------ - -* Use a supported platform configured for Confidential Computing. - For more information on machine setup, refer to :doc:`Supported Platforms `. - -* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS). With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER). Enable these features in the host BIOS. - -* Configure hosts to support IOMMU. - You can check if your host is configured for IOMMU by running the following command: - - .. code-block:: console - - $ ls /sys/kernel/iommu_groups - - If the output of this command includes 0, 1, and so on, then your host is configured for IOMMU. - - If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument for AMD CPUs, or ``intel_iommu=on`` for Intel CPUs. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance: - - .. code-block:: console - - ... - GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau" - ... - - After making the change, configure the bootloader. - - .. code-block:: console - - $ sudo update-grub - - *Example Output:* - - .. code-block:: output - - Sourcing file `/etc/default/grub' - Generating grub configuration file ... 
- Found linux image: /boot/vmlinuz-5.15.0-generic - Found initrd image: /boot/initrd.img-5.15.0-generic - done - - Reboot the host after configuring the bootloader. - - .. note:: - - After configuring IOMMU, you might see QEMU warnings about PCI P2P DMA when running GPU workloads. - These are expected and can be safely ignored. - Refer to :ref:`coco-limitations` for details. - -* Ensure that no NVIDIA GPU drivers are installed on the host. - Confidential Containers uses VFIO to pass GPUs directly to the confidential VM, and host-level GPU drivers interfere with VFIO device binding. - - To check if NVIDIA GPU drivers are installed, run the following command: - - .. code-block:: console - - $ lsmod | grep nvidia - - If the output is empty, no NVIDIA GPU drivers are loaded. - If modules such as ``nvidia``, ``nvidia_uvm``, or ``nvidia_modeset`` are listed, NVIDIA GPU drivers are present and must be removed before proceeding. - Refer to `Removing the Driver `_ in the NVIDIA Driver Installation Guide. - -Kubernetes Cluster ------------------- - -* A Kubernetes cluster with cluster administrator privileges. - Refer to the :ref:`Supported Software Components ` table for supported Kubernetes versions. - -* containerd version 2.2.2 installed. - Refer to the `containerd Getting Started guide `_ for installation instructions. - - To verify the installed version, run the following command: - - .. code-block:: console - - $ containerd --version - - *Example Output:* - - .. code-block:: output - - containerd containerd.io 2.2.2 ... - -* Helm installed. - Use the command below to install Helm or refer to the `Helm documentation `_ for installation instructions. - - .. code-block:: console - - $ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \ - && chmod 700 get_helm.sh \ - && ./get_helm.sh - - -* Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster. 
- - * ``KubeletPodResourcesGet``: Enabled by default on Kubernetes v1.34 and later. - On older versions, you must enable it explicitly. - The Kata runtime uses this feature gate to query the Kubelet Pod Resources API and discover allocated GPU devices during sandbox creation. - - * ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29 and is not enabled by default. - This feature gate is required to support pod deployments that use multiple snapshotters side-by-side. - - Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): - - .. code-block:: yaml - - apiVersion: kubelet.config.k8s.io/v1beta1 - kind: KubeletConfiguration - featureGates: - KubeletPodResourcesGet: true - RuntimeClassInImageCriApi: true - - If your ``config.yaml`` already has a ``featureGates`` section, add the gates to the existing section rather than creating a duplicate. - - Restart the Kubelet service to apply the changes: - - .. code-block:: console - - $ sudo systemctl restart kubelet - .. _installation-and-configuration: -Installation -============ - .. _coco-label-nodes: +*********** Label Nodes ------------ +*********** #. Get a list of the nodes in your cluster: @@ -247,8 +120,9 @@ After labeling the node, you can continue to the next steps to install Kata Cont .. _coco-install-kata-chart: +************************************** Install the Kata Containers Helm Chart --------------------------------------- +************************************** Install Kata Containers using the ``kata-deploy`` Helm chart. The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers. @@ -342,8 +216,9 @@ The minimum required version is 3.29.0. .. 
_coco-install-gpu-operator: +******************************* Install the NVIDIA GPU Operator --------------------------------- +******************************* Install the NVIDIA GPU Operator and configure it to deploy Confidential Container components. @@ -420,6 +295,7 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe .. note:: It can take several minutes for all GPU Operator pods to be in the Running state. If you are not seeing the expected output, you can view the logs for the GPU Operator pods: + .. code-block:: console $ kubectl logs -n gpu-operator @@ -447,118 +323,10 @@ Install the NVIDIA GPU Operator and configure it to deploy Confidential Containe If you have an issue deploying the GPU Operator, refer to the :doc:`NVIDIA GPU Operator troubleshooting guide ` for guidance on troubleshooting and resolving issues. -With Kata Containers and the GPU Operator installed, you can start using your cluster to run Confidential Containers workloads. -To run a sample workload, refer to the :ref:`Run a Sample Workload ` section. - -For further configuration settings, refer to the following sections: - -* :ref:`Managing the Confidential Computing Mode ` -* :ref:`Configuring Workloads to use Multi-GPU Passthrough ` -* :ref:`Configuring GPU or NVSwitch Resource Types Name ` - -.. _coco-run-sample-workload: - -Run a Sample Workload -===================== - -A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX. - -1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the kata-qemu-nvidia-gpu-snp runtime class: - - .. 
code-block:: yaml - :emphasize-lines: 7,14 - - apiVersion: v1 - kind: Pod - metadata: - name: cuda-vectoradd-kata - namespace: default - spec: - runtimeClassName: kata-qemu-nvidia-gpu-snp - restartPolicy: Never - containers: - - name: cuda-vectoradd - image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" - resources: - limits: - nvidia.com/pgpu: "1" - memory: 16Gi - - The following are Confidential Containers configurations in the sample manifest: - - * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run. - - * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs. - If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the Kata device plugin. - Refer to the :ref:`Configuring GPU or NVSwitch Resource Types Name ` section on this page for more details. - - * If you have machines that support multi-GPU passthrough, use a pod deployment manifest that specifies 8 PGPU and 4 NVSwitch resources. - - .. code-block:: yaml - - resources: - limits: - nvidia.com/pgpu: "8" - nvidia.com/nvswitch: "4" - - .. note:: - If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, also refer to :ref:`Managing the Confidential Computing Mode ` for details on how to set the ``ppcie`` mode. - - -2. Create the pod: - - .. code-block:: console - - $ kubectl apply -f cuda-vectoradd-kata.yaml - - *Example Output:* - - .. code-block:: output - - pod/cuda-vectoradd-kata created - - -3. Verify the pod is running: - - .. code-block:: console - - $ kubectl get pod cuda-vectoradd-kata - - *Example Output:* - - .. code-block:: output - - NAME READY STATUS RESTARTS AGE - cuda-vectoradd-kata 1/1 Running 0 10s - -4. View the logs from the pod after the container starts: - - .. 
code-block:: console - - $ kubectl logs -n default cuda-vectoradd-kata - - *Example Output:* - - .. code-block:: output - - [Vector addition of 50000 elements] - Copy input data from the host memory to the CUDA device - CUDA kernel launch with 196 blocks of 256 threads - Copy output data from the CUDA device to the host memory - Test PASSED - Done - -5. Delete the pod: - - .. code-block:: console - - $ kubectl delete -f cuda-vectoradd-kata.yaml - - .. _coco-configuration-settings: Common GPU Operator Configuration Settings -=========================================== +========================================== The following are the available GPU Operator configuration settings to enable Confidential Containers: @@ -597,7 +365,7 @@ The following are the available GPU Operator configuration settings to enable Co .. _coco-configuration-heterogeneous-clusters: Configuring GPU or NVSwitch Resource Types Name ------------------------------------------------- +=============================================== By default, the NVIDIA GPU Operator creates a resource type for GPUs and NVSwitches, ``nvidia.com/pgpu`` and ``nvidia.com/nvswitch``. You can reference this name in your manifests to request GPU or NVSwitch resources for your workload. @@ -609,7 +377,7 @@ To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the Kata sa ``--set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and ``--set kataSandboxDevicePlugin.env[0].value=""``. -When this variable is set to ``""``, the Kata device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type. +When this variable is set to ``""``, the Kata device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H200_141GB``, instead of the default ``nvidia.com/pgpu`` type. Use the exposed device resource types in pod specs by specifying respective resource limits. 
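For instance, a pod on a node that advertises a model-specific type could request the GPU with resource limits like the following sketch. The resource name shown here is a placeholder; the actual name depends on the GPU model detected on your node.

```yaml
# Sketch only: request a GPU by its model-specific resource type instead of
# the default nvidia.com/pgpu. The resource name below is a placeholder that
# depends on the GPU model detected on the node.
resources:
  limits:
    nvidia.com/GH100_H200_141GB: "1"
```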
Similarly, you can set ``NVSWITCH_ALIAS`` to ``""`` to advertise model-specific NVSwitch resource types. @@ -619,17 +387,17 @@ The following example installs the GPU Operator with both ``P_GPU_ALIAS`` and `` .. code-block:: console $ helm install --wait --timeout 10m --generate-name \ - -n gpu-operator --create-namespace \ - nvidia/gpu-operator \ - --set sandboxWorkloads.enabled=true \ - --set sandboxWorkloads.mode=kata \ - --set nfd.enabled=true \ - --set nfd.nodefeaturerules=true \ - --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS \ - --set kataSandboxDevicePlugin.env[0].value="" \ - --set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \ - --set kataSandboxDevicePlugin.env[1].value="" \ - --version=v26.3.1 + -n gpu-operator --create-namespace \ + nvidia/gpu-operator \ + --set sandboxWorkloads.enabled=true \ + --set sandboxWorkloads.mode=kata \ + --set nfd.enabled=true \ + --set nfd.nodefeaturerules=true \ + --set kataSandboxDevicePlugin.env[0].name=P_GPU_ALIAS \ + --set kataSandboxDevicePlugin.env[0].value="" \ + --set kataSandboxDevicePlugin.env[1].name=NVSWITCH_ALIAS \ + --set kataSandboxDevicePlugin.env[1].value="" \ + --version=v26.3.1 After installing the GPU Operator, you can view the GPU or NVSwitch resource types available on a node by running the following command: @@ -638,6 +406,7 @@ After installing the GPU Operator, you can view the GPU or NVSwitch resource typ $ kubectl get node $NODE_NAME -o json | grep nvidia.com .. note:: + The ``NODE_NAME`` environment variable was set in the :ref:`Label Nodes ` section. If you want to view the resource types for a different node, you can update the ``NODE_NAME`` environment variable and run the command again. @@ -645,206 +414,13 @@ After installing the GPU Operator, you can view the GPU or NVSwitch resource typ .. code-block:: output - "nvidia.com/GH100_H100L_94GB": "1" - - - -.. 
_managing-confidential-computing-mode: - -Managing the Confidential Computing Mode -========================================= - -You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=`` option. -The default value of ``ccManager.defaultMode`` is ``on``. -You can set this option when you install NVIDIA GPU Operator or afterward by modifying the cluster-policy instance of the ClusterPolicy object. - -When you change the mode, the manager performs the following actions: - -* Evicts the other GPU Operator operands from the node. - - However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode. - -* Unbinds the GPU from the VFIO PCI device driver. -* Changes the mode and resets the GPU. -* Reschedules the other GPU Operator operands. - -The supported modes are: - -.. list-table:: - :widths: 15 55 30 - :header-rows: 1 - - * - Mode - - Description - - Configuration Method - * - ``on`` (default) - - Enable Confidential Computing. - - cluster-wide default, node-level override - * - ``off`` - - Disable Confidential Computing. - - cluster-wide default, node-level override - * - ``ppcie`` - - Enable Confidential Computing on NVIDIA Hopper GPUs. - - On the NVIDIA Hopper architecture multi-GPU passthrough uses protected PCIe (PPCIE) - which claims exclusive use of the NVSwitches for a single Confidential Container - virtual machine. - If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, - set the GPU mode to ``ppcie`` mode. - - The NVIDIA Blackwell architecture uses NVLink - encryption which places the switches outside of the Trusted Computing Base (TCB), - meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case. - - node-level override - -You can set a cluster-wide default mode, and you can set the mode on individual nodes. -The mode that you set on a node has higher precedence than the cluster-wide default mode. 
- -Setting a Cluster-Wide Default Mode ------------------------------------- - -To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field like the following example: - -.. code-block:: console - - $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ - --type=merge \ - -p '{"spec": {"ccManager": {"defaultMode": "on"}}}' - -*Example Output:* - -.. code-block:: output - - clusterpolicy.nvidia.com/cluster-policy patched - -.. note:: - - The ``ppcie`` mode cannot be set as a cluster-wide default, it can only be set as a node label value. - -Setting a Node-Level Mode --------------------------- - -To set a node-level mode, apply the ``nvidia.com/cc.mode=`` label on the node. - -.. note:: - - The ``NODE_NAME`` environment variable was set in the :ref:`Label Nodes ` section. - If you want to set the mode for a different node, you can update the ``NODE_NAME`` environment variable and run the command again. - -.. code-block:: console - - $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite - -The mode that you set on a node has higher precedence than the cluster-wide default mode. - -Verifying a Mode Change ------------------------- - -To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, -``nvidia.com/cc.mode.state``, and ``nvidia.com/cc.ready.state`` node labels: - -.. code-block:: console - - $ kubectl get node $NODE_NAME -o json | \ - jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))' - -*Example Output (CC mode disabled):* - -.. code-block:: json - - { - "nvidia.com/cc.mode": "off", - "nvidia.com/cc.mode.state": "off", - "nvidia.com/cc.ready.state": "false" - } - -*Example Output (CC mode enabled):* - -.. code-block:: json - - { - "nvidia.com/cc.mode": "on", - "nvidia.com/cc.mode.state": "on", - "nvidia.com/cc.ready.state": "true" - } - -* The ``nvidia.com/cc.mode`` label is the desired state. 
- -* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. - Its value mirrors the applied mode ``on``, ``off``, or ``ppcie``, after the transition is complete on the node. - A value of ``failed`` indicates that the last mode transition encountered an error. - -* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. - It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``. - -.. note:: - - It can take one to two minutes for GPU state transitions to complete and the labels to be updated. - A mode change is complete and successful when ``nvidia.com/cc.mode`` and - ``nvidia.com/cc.mode.state`` have the same value. - - -.. _coco-configuration-multi-gpu-passthrough: - -Configuring Workloads to use Multi-GPU Passthrough -=================================================== - -To configure multi-GPU passthrough, you can specify the following resource limits in your manifests: - -.. code-block:: yaml - - limits: - nvidia.com/pgpu: "8" - nvidia.com/nvswitch: "4" - - -You must assign all the GPUs and NVSwitches on the node in your manifest to the same Confidential Container virtual machine. - -On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single Confidential Container. -When using NVIDIA Hopper nodes for multi-GPU passthrough, transition your node's GPU Confidential Computing mode to ``ppcie`` by applying the ``nvidia.com/cc.mode=ppcie`` label. -Refer to the :ref:`Managing the Confidential Computing Mode ` section for details. - -The NVIDIA Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB) and only requires the GPU Confidential Computing mode to be set to ``on``. - - -.. 
_configure-image-pull-timeouts: - -Configure Image Pull Timeouts -============================= - -The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. -Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state. - -If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers. - -Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): - -.. code-block:: yaml - :emphasize-lines: 3 - - apiVersion: kubelet.config.k8s.io/v1beta1 - kind: KubeletConfiguration - runtimeRequestTimeout: 20m - -Restart the kubelet service to apply the change: - -.. code-block:: console - - $ sudo systemctl restart kubelet - -Additional timeouts to consider updating are the NVIDIA Shim and Kata Agent Policy timeouts. -The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes). -This controls the time the shim allows for a container to remain in container creating state. - -If you need a timeout of more than 1200 seconds, you will also need to adjust Kata Agent Policy's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull. -To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` annotation. - + "nvidia.com/GH100_H200_141GB": "1" +********** Next Steps -========== +********** -* Refer to the :doc:`Attestation ` page for more information on configuring attestation. +* :doc:`Run a Sample Workload ` to verify your deployment. 
+* :doc:`Configure ` additional options for your environment, including attestation, the confidential computing mode, and :ref:`multi-GPU passthrough `. * To help manage the lifecycle of Kata Containers, install the `Kata Lifecycle Manager `_. This Argo Workflows-based tool manages Kata Containers upgrades and day-two operations. -* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information. -* Licensing information is available on the :doc:`Licensing ` page. \ No newline at end of file diff --git a/confidential-containers/configure-cc-mode.rst b/confidential-containers/configure-cc-mode.rst new file mode 100644 index 000000000..d7d1407b2 --- /dev/null +++ b/confidential-containers/configure-cc-mode.rst @@ -0,0 +1,159 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _managing-confidential-computing-mode: + +######################################### +Managing the Confidential Computing Mode +######################################### + +You can set the default confidential computing mode of the NVIDIA GPUs by setting the ``ccManager.defaultMode=`` option. +The default value of ``ccManager.defaultMode`` is ``on``. +You can set this option when you install NVIDIA GPU Operator or afterward by modifying the cluster-policy instance of the ClusterPolicy object. 
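To set the option at install time, a minimal sketch of the Helm invocation follows. The chart name and namespace are assumed to match the deployment guide; combine the ``ccManager.defaultMode`` flag with the other chart values from your install command rather than running this in isolation.

```shell
# Sketch only: set the cluster-wide default Confidential Computing mode when
# installing the GPU Operator. Combine --set ccManager.defaultMode with the
# other chart values used in the deployment guide.
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set ccManager.defaultMode=on
```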
+ +When you change the mode, the manager performs the following actions: + +* Evicts the other GPU Operator operands from the node. + + However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode. + +* Unbinds the GPU from the VFIO PCI device driver. +* Changes the mode and resets the GPU. +* Reschedules the other GPU Operator operands. + +The supported modes are: + +.. list-table:: + :widths: 15 55 30 + :header-rows: 1 + + * - Mode + - Description + - Configuration Method + * - ``on`` (default) + - Enable Confidential Computing. + - cluster-wide default, node-level override + * - ``off`` + - Disable Confidential Computing. + - cluster-wide default, node-level override + * - ``ppcie`` + - Enable Confidential Computing on NVIDIA Hopper GPUs. + + On the NVIDIA Hopper architecture :ref:`multi-GPU passthrough ` + uses protected PCIe (PPCIE) which claims exclusive use of the NVSwitches for a single + Confidential Container virtual machine. + If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, + set the GPU mode to ``ppcie`` mode. + + The NVIDIA Blackwell architecture uses NVLink + encryption which places the switches outside of the Trusted Computing Base (TCB), + meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case. + - node-level override + +You can set a cluster-wide default mode, and you can set the mode on individual nodes. +The mode that you set on a node has higher precedence than the cluster-wide default mode. + +*********************************** +Setting a Cluster-Wide Default Mode +*********************************** + +To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field like the following example: + +.. code-block:: console + + $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \ + --type=merge \ + -p '{"spec": {"ccManager": {"defaultMode": "on"}}}' + +*Example Output:* + +.. 
code-block:: output + + clusterpolicy.nvidia.com/cluster-policy patched + +.. note:: + + The ``ppcie`` mode cannot be set as a cluster-wide default; it can only be set as a node label value. + +************************* +Setting a Node-Level Mode +************************* + +To set a node-level mode, apply the ``nvidia.com/cc.mode=`` label on the node. + +Set the ``NODE_NAME`` environment variable to the name of the node you want to configure: + +.. code-block:: console + + $ export NODE_NAME="" + +Then apply the label: + +.. code-block:: console + + $ kubectl label node $NODE_NAME nvidia.com/cc.mode=on --overwrite + +The mode that you set on a node has higher precedence than the cluster-wide default mode. + +*********************** +Verifying a Mode Change +*********************** + +To verify that a mode change was successful, view the ``nvidia.com/cc.mode``, +``nvidia.com/cc.mode.state``, and ``nvidia.com/cc.ready.state`` node labels: + +.. code-block:: console + + $ kubectl get node $NODE_NAME -o json | \ + jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc")))' + +*Example Output (CC mode disabled):* + +.. code-block:: json + + { + "nvidia.com/cc.mode": "off", + "nvidia.com/cc.mode.state": "off", + "nvidia.com/cc.ready.state": "false" + } + +*Example Output (CC mode enabled):* + +.. code-block:: json + + { + "nvidia.com/cc.mode": "on", + "nvidia.com/cc.mode.state": "on", + "nvidia.com/cc.ready.state": "true" + } + +* The ``nvidia.com/cc.mode`` label is the desired state. + +* The ``nvidia.com/cc.mode.state`` label reflects the mode that was last successfully applied to the GPU hardware by the Confidential Computing Manager. + Its value mirrors the applied mode (``on``, ``off``, or ``ppcie``) after the transition is complete on the node. + A value of ``failed`` indicates that the last mode transition encountered an error.
+ +* The ``nvidia.com/cc.ready.state`` label indicates whether the node is ready to run Confidential Container workloads. + It is set to ``true`` when ``cc.mode.state`` is ``on`` or ``ppcie``, and ``false`` when ``cc.mode.state`` is ``off``. + +.. note:: + + It can take one to two minutes for GPU state transitions to complete and the labels to be updated. + A mode change is complete and successful when ``nvidia.com/cc.mode`` and + ``nvidia.com/cc.mode.state`` have the same value. diff --git a/confidential-containers/configure-multi-gpu.rst b/confidential-containers/configure-multi-gpu.rst new file mode 100644 index 000000000..a1ec76d04 --- /dev/null +++ b/confidential-containers/configure-multi-gpu.rst @@ -0,0 +1,309 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _coco-configure-workloads: + +############################################ +Configuring Confidential Container Workloads +############################################ + +A Confidential Container workload is a standard Kubernetes pod that runs inside a TEE-protected +virtual machine and requests one or more GPUs through the NVIDIA Kata sandbox device plugin. 
+Compared with a traditional GPU pod, a Confidential Container workload pod manifest differs in +three ways: + +* It selects a TEE-aware Kata runtime class instead of the default ``runc``-based runtime. +* It requests GPU and NVSwitch resources using the resource types advertised by the NVIDIA + Kata sandbox device plugin, which can be either default names or model-specific names. +* For NVSwitch-based HGX systems, it requests every GPU and NVSwitch on the node together so + that all devices reside inside the same Confidential Container virtual machine. + +This page describes each of these decisions and provides single-GPU and multi-GPU passthrough +manifest examples that you can copy and adapt to your environment. + +Before beginning, you should configure your cluster to deploy Confidential Containers workloads using the :doc:`Confidential Containers deployment ` steps. + +******************************** +Select a Container Runtime Class +******************************** + +A Confidential Container workload must set ``spec.runtimeClassName`` to a TEE-aware Kata +runtime that NVIDIA provides through the ``kata-deploy`` Helm chart. +Select the runtime class based on the CPU TEE on the target worker node: + +.. list-table:: + :header-rows: 1 + :widths: 30 40 30 + + * - Node TEE + - Runtime class + - Typical CPU vendor + * - AMD SEV-SNP + - ``kata-qemu-nvidia-gpu-snp`` + - AMD EPYC (Genoa or newer) + * - Intel TDX + - ``kata-qemu-nvidia-gpu-tdx`` + - Intel Xeon (Sapphire Rapids or newer) + +The ``kata-deploy`` chart also installs a ``kata-qemu-nvidia-gpu`` runtime class. +That class is intended for non-confidential Kata workloads. You should not use it for Confidential +Container workloads because it does not start the GPU in CC mode. + +.. 
_coco-resource-types: + +***************************************** +Reference GPU and NVSwitch Resource Types +***************************************** + +The NVIDIA Kata sandbox device plugin advertises GPUs and NVSwitches to Kubernetes as extended resources. +Your pod manifest requests those resources under ``resources.limits``. +You can use either the default resource types or model-specific resource types. + +By default, every passthrough GPU is advertised as ``nvidia.com/pgpu`` and every NVSwitch is advertised as ``nvidia.com/nvswitch``. +These names are stable across GPU models, which keeps manifests portable when every node in your cluster has the same GPU type. + +A sample resource request using the default resource type is shown below: + +.. code-block:: yaml + + resources: + limits: + nvidia.com/pgpu: "1" + +In heterogeneous clusters, where worker nodes use different GPU models, you can configure the Kata sandbox device plugin to advertise resources under model-specific names by setting +``P_GPU_ALIAS=""`` (and optionally ``NVSWITCH_ALIAS=""``) on the plugin. +With this configuration, GPUs are exposed as resources such as ``nvidia.com/GH100_H200_141GB``, +which lets a workload pin itself to a specific accelerator model. + +Refer to :ref:`Configuring GPU or NVSwitch Resource Types Name ` +for the GPU Operator install flags that enable this behavior. + +Use the model-specific resource name in workloads that must target a specific accelerator: + +.. code-block:: yaml + + resources: + limits: + nvidia.com/GH100_H200_141GB: "1" + +To list the GPU and NVSwitch resource types advertised on a node, run: + +.. code-block:: console + + $ kubectl get node $NODE_NAME -o json | grep nvidia.com + +*Example Output:* + +.. code-block:: output + + "nvidia.com/GH100_H200_141GB": "1" + +.. 
_coco-single-gpu-workload: + +********************** +Single-GPU Passthrough +********************** + +A single-GPU workload requests one GPU and runs inside its own Confidential Container virtual +machine. +This pattern is the recommended starting point for verifying a deployment and for most +independent workloads that do not require NVLink between GPUs. + +#. Create a file, such as ``cuda-vectoradd-kata.yaml``: + + .. code-block:: yaml + :emphasize-lines: 7,14 + + apiVersion: v1 + kind: Pod + metadata: + name: cuda-vectoradd-kata + namespace: default + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx + restartPolicy: Never + containers: + - name: cuda-vectoradd + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" + resources: + limits: + nvidia.com/pgpu: "1" + memory: 16Gi + + .. note:: + + If you configured the Kata sandbox device plugin to use model-specific resource types, + replace ``nvidia.com/pgpu`` with the appropriate model-specific name, for example + ``nvidia.com/GH100_H200_141GB``. + +#. Create the pod: + + .. code-block:: console + + $ kubectl apply -f cuda-vectoradd-kata.yaml + +#. Verify the workload completes successfully: + + .. code-block:: console + + $ kubectl logs cuda-vectoradd-kata + + *Example Output:* + + .. code-block:: output + + [Vector addition of 50000 elements] + Copy input data from the host memory to the CUDA device + CUDA kernel launch with 196 blocks of 256 threads + Copy output data from the CUDA device to the host memory + Test PASSED + Done + +Refer to :doc:`run-sample-workload` for the end-to-end verification flow including +deletion and troubleshooting tips. + +.. _coco-multi-gpu-prereqs: +.. _coco-multi-gpu-passthrough: + +********************* +Multi-GPU Passthrough +********************* + +Multi-GPU passthrough assigns every GPU and NVSwitch on a node to a single Confidential +Container virtual machine. 
+This configuration is required for NVSwitch (NVLink) based HGX systems running confidential +workloads. + +.. important:: + + You must assign all the GPUs and NVSwitches on the node to the same Confidential Container + virtual machine. + Configuring only a subset of GPUs for Confidential Computing on a single node is not + supported. + +NVIDIA Hopper PPCIE Mode +======================== + +For NVIDIA Hopper GPUs, multi-GPU passthrough requires protected PCIe (PPCIE) mode, which +claims exclusive use of the NVSwitches for a single Confidential Container. +The NVIDIA Confidential Computing Manager for Kubernetes transitions GPUs into the correct +mode based on the ``cc.mode`` label that you set. + +#. Set the ``NODE_NAME`` environment variable to the node you want to configure: + + .. code-block:: console + + $ export NODE_NAME="" + +#. Apply the ``ppcie`` CC mode label to the node: + + .. code-block:: console + + $ kubectl label node $NODE_NAME nvidia.com/cc.mode=ppcie --overwrite + +Refer to :doc:`Managing the Confidential Computing Mode ` for full details +on setting the CC mode and verifying the change. + +NVIDIA Blackwell GPUs use NVLink encryption, which places the switches outside of the +Trusted Computing Base (TCB), so the default CC mode of ``on`` is sufficient and no additional +configuration is required. + +Run a Multi-GPU Workload +======================== + +#. Create a file, such as ``multi-gpu-kata.yaml``, with a pod manifest that requests every GPU + and NVSwitch on the node: + + .. 
code-block:: yaml + :emphasize-lines: 7,14-16 + + apiVersion: v1 + kind: Pod + metadata: + name: multi-gpu-kata + namespace: default + spec: + runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx + restartPolicy: Never + containers: + - name: cuda-sample + image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04" + resources: + limits: + nvidia.com/pgpu: "8" + nvidia.com/nvswitch: "4" # Only for NVIDIA Hopper GPUs with PPCIE mode + memory: 128Gi + + .. note:: + + If you configured ``P_GPU_ALIAS`` or ``NVSWITCH_ALIAS`` for heterogeneous clusters, + replace ``nvidia.com/pgpu`` and ``nvidia.com/nvswitch`` with the corresponding + model-specific resource types. + Refer to :ref:`Reference GPU and NVSwitch Resource Types ` + for details. + +#. Create the pod: + + .. code-block:: console + + $ kubectl apply -f multi-gpu-kata.yaml + + *Example Output:* + + .. code-block:: output + + pod/multi-gpu-kata created + +#. Verify the pod is running: + + .. code-block:: console + + $ kubectl get pod multi-gpu-kata + + *Example Output:* + + .. code-block:: output + + NAME READY STATUS RESTARTS AGE + multi-gpu-kata 1/1 Running 0 30s + +#. Verify that all GPUs are visible inside the container: + + .. code-block:: console + + $ kubectl exec multi-gpu-kata -- nvidia-smi -L + + *Example Output:* + + .. code-block:: output + + GPU 0: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 1: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 2: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 3: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 4: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 5: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 6: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + GPU 7: NVIDIA H100 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) + +#. Delete the pod: + + .. 
code-block:: console + + $ kubectl delete -f multi-gpu-kata.yaml diff --git a/confidential-containers/configure.rst b/confidential-containers/configure.rst new file mode 100644 index 000000000..f02afa7e3 --- /dev/null +++ b/confidential-containers/configure.rst @@ -0,0 +1,57 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _configure-confidential-containers: + +################################# +Configure Confidential Containers +################################# + +After deploying Confidential Containers, you can configure additional options for your environment. +Use the cards below to navigate to a specific configuration topic. + + +.. grid:: 3 + :gutter: 3 + + .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation + :link: attestation + :link-type: doc + + Configure remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. + + .. grid-item-card:: :octicon:`gear;1.5em;sd-mr-1` Managing the CC Mode + :link: configure-cc-mode + :link-type: doc + + Set the confidential computing mode on NVIDIA GPUs at the cluster or node level. + + .. 
grid-item-card:: :octicon:`cpu;1.5em;sd-mr-1` Configuring Workloads + :link: configure-multi-gpu + :link-type: doc + + Configure Confidential Container workloads, including runtime class selection, GPU and + NVSwitch resource types, and single- or multi-GPU passthrough. + + .. grid-item-card:: :octicon:`stack;1.5em;sd-mr-1` Multi-GPU Passthrough + :link: coco-multi-gpu-passthrough + :link-type: ref + + Assign every GPU and NVSwitch on a node to a single Confidential Container virtual + machine for NVSwitch-based HGX systems. diff --git a/confidential-containers/index.rst b/confidential-containers/index.rst index a5024ad2d..aa8ca9ade 100644 --- a/confidential-containers/index.rst +++ b/confidential-containers/index.rst @@ -16,20 +16,43 @@ .. headings # #, * *, =, -, ^, " -********************************************************** +########################################### NVIDIA Confidential Containers Architecture -********************************************************** +########################################### .. toctree:: :caption: NVIDIA Confidential Containers Architecture :hidden: :titlesonly: - Release Notes Overview Supported Platforms + +.. toctree:: + :caption: Install + :hidden: + :titlesonly: + + Prerequisites Deploy Confidential Containers + Run a Sample Workload + +.. toctree:: + :caption: Configure + :hidden: + :titlesonly: + + Configure Overview Attestation + Managing the Confidential Computing Mode + Configuring Workloads + +.. toctree:: + :caption: Reference + :hidden: + :titlesonly: + + Release Notes Licensing @@ -51,28 +74,28 @@ This is documentation for NVIDIA's implementation of Confidential Containers inc Learn about the validated hardware, OS, and component versions. - .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy Confidential Containers - :link: confidential-containers-deploy + .. 
grid-item-card:: :octicon:`checklist;1.5em;sd-mr-1` Prerequisites + :link: prerequisites :link-type: doc - Use this page to deploy with the NVIDIA GPU Operator on Kubernetes. + Hardware, BIOS, and Kubernetes cluster requirements. - .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation - :link: attestation + .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy Confidential Containers + :link: confidential-containers-deploy :link-type: doc - Learn about remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. - + Install Kata Containers and the NVIDIA GPU Operator on Kubernetes. - .. grid-item-card:: :octicon:`note;1.5em;sd-mr-1` Release Notes - :link: release-notes + .. grid-item-card:: :octicon:`play;1.5em;sd-mr-1` Run a Sample Workload + :link: run-sample-workload :link-type: doc - Review new features and known issues for each release. + Run a sample GPU workload in a confidential container. - .. grid-item-card:: :octicon:`law;1.5em;sd-mr-1` Licensing - :link: licensing + .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation + :link: attestation :link-type: doc - Learn about the licensing information for Confidential Containers documentation. + Remote attestation, Trustee, and the NVIDIA verifier for GPU workloads. + diff --git a/confidential-containers/licensing.rst b/confidential-containers/licensing.rst index 43d76fff9..d30207776 100644 --- a/confidential-containers/licensing.rst +++ b/confidential-containers/licensing.rst @@ -16,9 +16,9 @@ .. headings # #, * *, =, -, ^, " -********* +######### Licensing -********* +######### While the Confidential Containers (CoCo) Reference Architecture includes some components that are open source, the NVIDIA Confidential Computing capability is a licensed feature for production use cases. To use these products, you must have a valid NVIDIA Confidential Computing license. 
diff --git a/confidential-containers/overview.rst b/confidential-containers/overview.rst index 2b9646695..47de85d19 100644 --- a/confidential-containers/overview.rst +++ b/confidential-containers/overview.rst @@ -17,9 +17,9 @@ .. headings # #, * *, =, -, ^, " -***************************************************** +##################################################### NVIDIA Confidential Containers Reference Architecture -***************************************************** +##################################################### NVIDIA GPUs with Confidential Computing support provide the hardware foundation for running GPU workloads inside a hardware-enforced Trusted Execution Environment (TEE). The NVIDIA Confidential Containers Reference Architecture provides a validated deployment model for cluster administrators interested in leveraging NVIDIA GPU Confidential Computing capabilities on Kubernetes platforms. @@ -31,8 +31,9 @@ Refer to the `Confidential Containers .. _confidential-containers-overview: +********** Background -========== +********** NVIDIA GPUs power the training and deployment of Frontier Models—world-class Large Language Models (LLMs) that define the state of the art in AI reasoning and capability. @@ -45,8 +46,9 @@ The Confidential Containers project leverages Kata Containers to provide the san .. _coco-use-cases: +********* Use Cases -========= +********* The target for Confidential Containers is to enable model providers (closed and open source) and Enterprises to use the advancements of Gen AI, agnostic to the deployment model (Cloud, Enterprise, or Edge). Some of the key use cases that CC and Confidential Containers enable are: @@ -61,8 +63,9 @@ The target for Confidential Containers is to enable model providers (closed and .. 
_coco-architecture: +********************* Architecture Overview -===================== +********************* NVIDIA's approach to the Confidential Containers architecture delivers on the key promise of Confidential Computing: confidentiality, integrity, and verifiability. Integrating open source and NVIDIA software components with the Confidential Computing capabilities of NVIDIA GPUs, the Reference Architecture for Confidential Containers is designed to be the secure and trusted deployment model for AI workloads. @@ -89,8 +92,9 @@ The components are described in more detail in the next section. .. _coco-supported-platforms-components: +*********************************************** Software Components for Confidential Containers -=============================================== +*********************************************** The following is a brief overview of the software components in NVIDIA's Reference Architecture for Confidential Containers. Refer to the diagram above for a visual representation of the components. @@ -160,7 +164,7 @@ A minimal hardened init system that securely bootstraps the guest environment, l .. _coco-gpu-operator-cluster-topology: GPU Operator Cluster Topology Considerations --------------------------------------------- +============================================ The GPU Operator deploys and manages components for allocating and utilizing the GPU resources on your cluster. Depending on how you configure the Operator, different components are deployed on the worker nodes. @@ -183,21 +187,22 @@ Consider the following example where node A is configured to run traditional con * Node Feature Discovery * NVIDIA GPU Feature Discovery - * NVIDIA Confidential Computing Manager for Kubernetes - * NVIDIA Sandbox Device Plugin + * NVIDIA Kata Sandbox Device Plugin * NVIDIA VFIO Manager * Node Feature Discovery This configuration can be controlled through node labelling, as described in the :doc:`Confidential Containers deployment guide `. 
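For illustration, node labelling of this kind can be sketched as follows. The ``nvidia.com/gpu.workload.config`` label key, the ``container`` and ``vm-passthrough`` values, and the node names are assumptions based on the GPU Operator sandbox-workloads convention; confirm the exact labels in the deployment guide before applying them:

```shell
# Keep node-a running traditional containers and dedicate node-b to
# Confidential Containers via VFIO passthrough. Label key, values, and
# node names are assumptions; requires a live cluster with kubectl access.
kubectl label node node-a nvidia.com/gpu.workload.config=container --overwrite
kubectl label node node-b nvidia.com/gpu.workload.config=vm-passthrough --overwrite
```

The GPU Operator then deploys the container-oriented operands on node-a and the sandbox-oriented operands (Kata sandbox device plugin, VFIO manager) on node-b, matching the example topology above.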
+******************************************* Supported Features and Deployment Scenarios -=========================================== +******************************************* The following features are supported with Confidential Containers: * Support for Confidential Container workloads as - * Single-GPU passthrough (one physical GPU per pod). - * Multi-GPU passthrough on NVSwitch (NVLink) based HGX systems. + * :ref:`Single-GPU passthrough ` (one physical GPU per pod). + * :ref:`Multi-GPU passthrough ` on NVSwitch (NVLink) based HGX systems. .. note:: @@ -218,8 +223,9 @@ More information on these features can be found in the `Confidential Containers .. _coco-limitations: +**************************** Limitations and Restrictions -============================ +**************************** * NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only. * All GPUs on the host must be configured for Confidential Computing. @@ -241,7 +247,7 @@ Limitations and Restrictions Refer to the `QEMU IOMMUFD documentation `_ for more information. Security Considerations ------------------------ +======================= * Application security defects: Confidential Computing does not protect against threats within the confidential VM, including vulnerabilities in the application itself. Applications must still follow security best practices such as input validation. @@ -259,8 +265,9 @@ Security Considerations * Availability: Confidential Computing does not provide availability guarantees. Achieve availability through replication, which is standard practice in Kubernetes deployments. +********** Next Steps -========== +********** Refer to the following pages to learn more about deploying with Confidential Containers: .. 
grid:: 3 diff --git a/confidential-containers/prerequisites.rst b/confidential-containers/prerequisites.rst new file mode 100644 index 000000000..29c4128d3 --- /dev/null +++ b/confidential-containers/prerequisites.rst @@ -0,0 +1,181 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +.. _coco-prerequisites: + +############# +Prerequisites +############# + +The following prerequisites are required to configure your cluster to deploy Confidential Containers. + +Refer to the :doc:`Supported Platforms ` page for validated hardware and software versions. + +***************** +Hardware and BIOS +***************** + +* Use a supported platform configured for Confidential Computing. + For more information on machine setup, refer to :doc:`Supported Platforms `. + +* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS). With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER). Enable these features in the host BIOS. + +* Configure hosts to support IOMMU. + You can check if your host is configured for IOMMU by running the following command: + + .. code-block:: console + + $ ls /sys/kernel/iommu_groups + + If the output of this command includes 0, 1, and so on, then your host is configured for IOMMU. 
+ + If the host is not configured or if you are unsure, add the ``amd_iommu=on`` Linux kernel command-line argument for AMD CPUs, or ``intel_iommu=on`` for Intel CPUs. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance: + + .. code-block:: console + + ... + GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau" + ... + + After making the change, configure the bootloader. + + .. code-block:: console + + $ sudo update-grub + + *Example Output:* + + .. code-block:: output + + Sourcing file `/etc/default/grub' + Generating grub configuration file ... + Found linux image: /boot/vmlinuz-5.15.0-generic + Found initrd image: /boot/initrd.img-5.15.0-generic + done + + Reboot the host after configuring the bootloader. + + .. note:: + + After configuring IOMMU, you might see QEMU warnings about PCI P2P DMA when running GPU workloads. + These are expected and can be safely ignored. + Refer to :ref:`coco-limitations` for details. + +* Ensure that no NVIDIA GPU drivers are installed on the host. + Confidential Containers uses VFIO to pass GPUs directly to the confidential VM, and host-level GPU drivers interfere with VFIO device binding. + + To check if NVIDIA GPU drivers are installed, run the following command: + + .. code-block:: console + + $ lsmod | grep nvidia + + If the output is empty, no NVIDIA GPU drivers are loaded. + If modules such as ``nvidia``, ``nvidia_uvm``, or ``nvidia_modeset`` are listed, NVIDIA GPU drivers are present and must be removed before proceeding. + Refer to `Removing the Driver `_ in the NVIDIA Driver Installation Guide. + +****************** +Kubernetes Cluster +****************** + +* A Kubernetes cluster with cluster administrator privileges. + Refer to the :ref:`Supported Software Components ` table for supported Kubernetes versions. + +* containerd version 2.2.2 installed. + Refer to the `containerd Getting Started guide `_ for installation instructions. 
+ + To verify the installed version, run the following command: + + .. code-block:: console + + $ containerd --version + + *Example Output:* + + .. code-block:: output + + containerd containerd.io 2.2.2 ... + +* Helm installed. + Use the following command to install Helm, or refer to the `Helm documentation `_ for installation instructions. + + .. code-block:: console + + $ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \ + && chmod 700 get_helm.sh \ + && ./get_helm.sh + +* Enable the ``KubeletPodResourcesGet`` and ``RuntimeClassInImageCriApi`` Kubelet feature gates on your cluster. + + * ``KubeletPodResourcesGet``: Enabled by default on Kubernetes v1.34 and later. + On older versions, you must enable it explicitly. + The Kata runtime uses this feature gate to query the Kubelet Pod Resources API and discover allocated GPU devices during sandbox creation. + + * ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29; not enabled by default. + This feature gate is required to support pod deployments that use multiple snapshotters side by side. + + Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): + + .. code-block:: yaml + + apiVersion: kubelet.config.k8s.io/v1beta1 + kind: KubeletConfiguration + featureGates: + KubeletPodResourcesGet: true + RuntimeClassInImageCriApi: true + + If your ``config.yaml`` already has a ``featureGates`` section, add the gates to the existing section rather than creating a duplicate. + + Restart the Kubelet service to apply the changes: + + .. code-block:: console + + $ sudo systemctl restart kubelet + +* Configure image pull timeouts. The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start. + Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state.
+ + If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers. + + Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``): + + .. code-block:: yaml + :emphasize-lines: 3 + + apiVersion: kubelet.config.k8s.io/v1beta1 + kind: KubeletConfiguration + runtimeRequestTimeout: 20m + + Restart the kubelet service to apply the change: + + .. code-block:: console + + $ sudo systemctl restart kubelet + + Optionally, you can configure additional timeouts for the NVIDIA Shim and Kata Agent Policy. + The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes). + This controls how long the shim allows a container to remain in the container-creating state. + If you need a timeout of more than 1200 seconds, you must also adjust the Kata Agent Policy's ``image_pull_timeout`` value, which controls the agent-side timeout for guest image pulls. + To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value through the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` pod annotation. + +********** +Next Steps +********** + +After completing the prerequisites, proceed to :doc:`Deploy Confidential Containers `. diff --git a/confidential-containers/release-notes.rst b/confidential-containers/release-notes.rst index 5f7ddbe4f..94e430ee9 100644 --- a/confidential-containers/release-notes.rst +++ b/confidential-containers/release-notes.rst @@ -18,9 +18,9 @@ .. _coco-release-notes: -************* +############# Release Notes -************* +############# This document describes the new features and known issues for the NVIDIA Confidential Containers Reference Architecture.
@@ -28,8 +28,9 @@ This document describes the new features and known issues for the NVIDIA Confide
 
 .. _coco-v1.0.0:
 
+*****
 1.0.0
-=====
+*****
 
 This is the initial general availability (GA) release of the NVIDIA Confidential Containers Reference Architecture, a validated deployment model for running GPU-accelerated AI workloads inside hardware-enforced Trusted Execution Environments (TEEs).
 It is designed for organizations in regulated industries that require strong isolation and cryptographic verification to protect model intellectual property and sensitive data on untrusted infrastructure.
@@ -37,7 +38,7 @@ It is designed for organizations in regulated industries that require strong iso
 The architecture combines NVIDIA GPU Confidential Computing, Kata Containers, and the NVIDIA GPU Operator to provide a secure, attestable, Kubernetes-native platform for confidential AI workloads.
 
 Key Features
------------
+============
 
 * This release supports HGX platforms with:
@@ -66,7 +67,7 @@ Key Features
 
 Limitations and Restrictions
----------------------------
+============================
 
 * NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only.
diff --git a/confidential-containers/run-sample-workload.rst b/confidential-containers/run-sample-workload.rst
new file mode 100644
index 000000000..10f5668bb
--- /dev/null
+++ b/confidential-containers/run-sample-workload.rst
@@ -0,0 +1,125 @@
+.. license-header
+  SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+  SPDX-License-Identifier: Apache-2.0
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+
+.. _coco-run-sample-workload:
+
+#####################
+Run a Sample Workload
+#####################
+
+After completing the :doc:`deployment steps `, verify your installation by running a basic single-GPU sample workload inside a Confidential Container.
+
+This page intentionally uses the simplest possible manifest so that you can confirm the deployment end-to-end.
+For the full set of workload configuration options, including runtime class selection, resource type naming, and multi-GPU passthrough, refer to :doc:`Configuring Confidential Container Workloads `.
+
+#. Create a file named ``cuda-vectoradd-kata.yaml`` with the following sample manifest:
+
+   .. code-block:: yaml
+      :emphasize-lines: 7,14
+
+      apiVersion: v1
+      kind: Pod
+      metadata:
+        name: cuda-vectoradd-kata
+        namespace: default
+      spec:
+        runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx
+        restartPolicy: Never
+        containers:
+        - name: cuda-vectoradd
+          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
+          resources:
+            limits:
+              nvidia.com/pgpu: "1"
+              memory: 16Gi
+
+   Before applying the manifest, adjust the two highlighted lines for your environment:
+
+   * **Runtime class.** Use ``kata-qemu-nvidia-gpu-snp`` on AMD SEV-SNP nodes or ``kata-qemu-nvidia-gpu-tdx`` on Intel TDX nodes.
+   * **GPU resource type.** The sample requests ``nvidia.com/pgpu``, which is the default resource name advertised by the NVIDIA Kata sandbox device plugin.
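+     To confirm which resource name your node advertises (a sketch; ``<node-name>`` is a placeholder for one of your GPU worker nodes), list the node's allocatable resources:
+
+     .. code-block:: console
+
+        $ kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
+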
+     If your cluster was installed with the ``P_GPU_ALIAS=""`` setting, replace it with the model-specific name advertised on your node, for example ``nvidia.com/GH100_H200_141GB``.
+
+   Refer to :doc:`Configuring Confidential Container Workloads ` for guidance on each option.
+
+#. Create the pod:
+
+   .. code-block:: console
+
+      $ kubectl apply -f cuda-vectoradd-kata.yaml
+
+   *Example Output:*
+
+   .. code-block:: output
+
+      pod/cuda-vectoradd-kata created
+
+#. Optional: Verify that the pod is running:
+
+   .. code-block:: console
+
+      $ kubectl get pod cuda-vectoradd-kata
+
+   *Example Output:*
+
+   .. code-block:: output
+
+      NAME                  READY   STATUS    RESTARTS   AGE
+      cuda-vectoradd-kata   1/1     Running   0          10s
+
+#. View the logs from the pod after the container starts:
+
+   .. code-block:: console
+
+      $ kubectl logs -n default cuda-vectoradd-kata
+
+   *Example Output:*
+
+   .. code-block:: output
+
+      [Vector addition of 50000 elements]
+      Copy input data from the host memory to the CUDA device
+      CUDA kernel launch with 196 blocks of 256 threads
+      Copy output data from the CUDA device to the host memory
+      Test PASSED
+      Done
+
+#. Delete the pod:
+
+   .. code-block:: console
+
+      $ kubectl delete -f cuda-vectoradd-kata.yaml
+
+
+**********
+Next Steps
+**********
+
+* :doc:`Configure Confidential Container workloads ` for runtime class selection, resource type naming, and single- or multi-GPU passthrough patterns.
+* Configure :doc:`Attestation ` with the Trustee framework to enable remote verification of your confidential environment.
+* Manage the :doc:`confidential computing mode ` on your GPUs.
diff --git a/confidential-containers/supported-platforms.rst b/confidential-containers/supported-platforms.rst
index 986d170ef..f3391f003 100644
--- a/confidential-containers/supported-platforms.rst
+++ b/confidential-containers/supported-platforms.rst
@@ -18,17 +18,18 @@
 
 .. _coco-supported-platforms:
 
-*******************
+###################
 Supported Platforms
-*******************
+###################
 
 Following are the platforms supported by the NVIDIA Confidential Containers Reference Architecture.
 
-Supported Hardware Platform
-===========================
+********
+Hardware
+********
 
 NVIDIA GPUs
------------
+===========
 
 .. list-table::
    :header-rows: 1
@@ -57,8 +58,8 @@ NVIDIA GPUs
 
 .. note::
 
-   Multi-GPU passthrough on NVIDIA Hopper HGX systems requires that you set the Confidential Computing mode to ``ppcie`` mode.
-   Refer to :ref:`Managing the Confidential Computing Mode ` in the deployment guide for details.
+   :ref:`Multi-GPU passthrough ` on NVIDIA Hopper HGX systems requires that you set the Confidential Computing mode to ``ppcie``.
+   Refer to :doc:`Managing the Confidential Computing Mode ` for details.
 
 .. note::
 
@@ -66,7 +67,7 @@ NVIDIA GPUs
    Configuring only some GPUs on a node for Confidential Computing is not supported.
 
 CPU Platforms
--------------
+=============
 
 .. flat-table::
    :header-rows: 1
@@ -98,7 +99,7 @@ For additional resources on machine setup:
 
 .. _coco-supported-software-components:
 
 Supported Software Components
------------------------------
+=============================
 
 .. flat-table::
    :header-rows: 1