From 45f64f91cd59ff670f8ecaafaf5e7f2533831dc5 Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Mon, 11 May 2026 10:28:14 -0400
Subject: [PATCH 1/4] Minor nits in coco docs
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
confidential-containers/attestation.rst | 28 ++++---
.../confidential-containers-deploy.rst | 80 +++++++++----------
2 files changed, 55 insertions(+), 53 deletions(-)
diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst
index 56528dbc5..0f746d139 100644
--- a/confidential-containers/attestation.rst
+++ b/confidential-containers/attestation.rst
@@ -69,8 +69,7 @@ Provision Trustee
Trustee is an open-source framework used in Confidential Containers to verify attestation evidence and conditionally release secrets.
For a full overview of attestation with Trustee, refer to the upstream `Trustee documentation `_.
-To provision a Trustee instance, follow the upstream `Install Trustee in Docker `_ guide.
-This is the recommended install method.
+To provision a Trustee instance, follow the recommended upstream `Install Trustee in Docker `_ guide.
.. note::
@@ -92,25 +91,30 @@ To enable attestation for your workloads, point them to the Trustee network endp
io.katacontainers.config.hypervisor.kernel_params: "agent.aa_kbc_params=cc_kbc::http://:"
-Replace ```` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes, and ```` with the port (default: ``8080``).
+Replace ```` with the IP address or hostname at which your Trustee instance is reachable from the worker nodes.
+Replace ```` with the port that Trustee listens on (default: ``8080``).
Refer to the upstream `Setup Confidential Containers `_ documentation for more information on configuring workloads for attestation.
.. _customize-attestation:
-Customize Attestation Workflows
-===============================
+Optional: Customize Attestation Workflows
+=========================================
-After Trustee is provisioned and workloads are configured, you can customize attestation workflows to enforce your desired security policies.
-This can include configuring the following:
+Confidential Containers enables sensible default attestation policies for NVIDIA Confidential Computing GPUs.
+In most cases, the policy is already configured appropriately and you only need to provide reference values.
+Refer to the upstream `Confidential Containers reference values `_ documentation for more information on the reference values.
-* KBS Client Tool: Configure Trustee resources and secrets by using the Key Broker Service (KBS) Client Tool.
- Refer to the upstream documentation on `using the KBS Client Tool `_.
-* Configure resources: Create resources, or secrets, that your workloads need.
+You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets.
+Refer to the upstream documentation on `using the KBS Client Tool `_.
+
+If you choose to customize attestation workflows, refer to the following Confidential Containers documentation for more details:
+
+* Configure resources: Create resources, or secrets, that your workloads need.
Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources.
-* Configure policies: Confidential Containers uses different policy types to secure workload at different layers.
+* Configure policies: Confidential Containers uses different policy types to secure workloads at different layers.
Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies.
-
+
Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them.
Troubleshooting
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index d48fe8797..648a67c4f 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -162,7 +162,7 @@ Kubernetes Cluster
* ``RuntimeClassInImageCriApi``: Alpha since Kubernetes v1.29 and is not enabled by default.
This feature gate is required to support pod deployments that use multiple snapshotters side-by-side.
- Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
+ Add both feature gates to your kubelet configuration file (typically ``/var/lib/kubelet/config.yaml``), for example by editing it with ``sudo vi /var/lib/kubelet/config.yaml``:
.. code-block:: yaml
@@ -180,6 +180,35 @@ Kubernetes Cluster
$ sudo systemctl restart kubelet
+.. _configure-image-pull-timeouts:
+
+* Configure image pull timeouts. The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start.
+ Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state.
+
+ If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers.
+
+ Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
+
+ .. code-block:: yaml
+ :emphasize-lines: 3
+
+ apiVersion: kubelet.config.k8s.io/v1beta1
+ kind: KubeletConfiguration
+ runtimeRequestTimeout: 20m
+
+ Restart the kubelet service to apply the change:
+
+ .. code-block:: console
+
+ $ sudo systemctl restart kubelet
+
+ Optionally, you can configure additional timeouts for the NVIDIA Shim and Kata Agent Policy.
+ The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
+ This controls the time the shim allows for a container to remain in container creating state.
+ If you need a timeout of more than 1200 seconds, you will also need to adjust Kata Agent Policy's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull.
+ To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` annotation.
+
+
.. _installation-and-configuration:
Installation
@@ -461,7 +490,7 @@ For further configuration settings, refer to the following sections:
Run a Sample Workload
=====================
-A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX.
+A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for AMD-based systems or ``kata-qemu-nvidia-gpu-tdx`` for Intel-based systems.
1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the kata-qemu-nvidia-gpu-snp runtime class:
@@ -474,35 +503,37 @@ A pod manifest for a confidential container GPU workload requires that you speci
name: cuda-vectoradd-kata
namespace: default
spec:
- runtimeClassName: kata-qemu-nvidia-gpu-snp
+ runtimeClassName: kata-qemu-nvidia-gpu-snp # or kata-qemu-nvidia-gpu-tdx
restartPolicy: Never
containers:
- name: cuda-vectoradd
image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
resources:
limits:
- nvidia.com/pgpu: "1"
+ nvidia.com/pgpu: "1" # for single GPU passthrough
memory: 16Gi
The following are Confidential Containers configurations in the sample manifest:
- * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the node type where the workloads should run.
+ * Set the runtime class to ``kata-qemu-nvidia-gpu-snp`` for AMD-based systems or ``kata-qemu-nvidia-gpu-tdx`` for Intel-based systems, depending on the node type where the workloads should run.
* In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs.
If you are deploying on a heterogeneous cluster, you might want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the Kata device plugin.
Refer to the :ref:`Configuring GPU or NVSwitch Resource Types Name ` section on this page for more details.
- * If you have machines that support multi-GPU passthrough, use a pod deployment manifest that specifies 8 PGPU and 4 NVSwitch resources.
+ * If you have machines that support multi-GPU passthrough, use a pod deployment manifest that specifies 8 PGPU resources.
+ If you are using NVIDIA Hopper GPUs with PPCIE mode, also specify 4 NVSwitch resources.
.. code-block:: yaml
resources:
limits:
nvidia.com/pgpu: "8"
- nvidia.com/nvswitch: "4"
+ nvidia.com/nvswitch: "4" # Only for NVIDIA Hopper GPUs with PPCIE mode
.. note::
- If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, also refer to :ref:`Managing the Confidential Computing Mode ` for details on how to set the ``ppcie`` mode.
+ If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, you must also set the Confidential Computing mode to ``ppcie``.
+ Refer to :ref:`Managing the Confidential Computing Mode ` for details.
2. Create the pod:
@@ -807,39 +838,6 @@ Refer to the :ref:`Managing the Confidential Computing Mode `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers.
-
-Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
-
-.. code-block:: yaml
- :emphasize-lines: 3
-
- apiVersion: kubelet.config.k8s.io/v1beta1
- kind: KubeletConfiguration
- runtimeRequestTimeout: 20m
-
-Restart the kubelet service to apply the change:
-
-.. code-block:: console
-
- $ sudo systemctl restart kubelet
-
-Additional timeouts to consider updating are the NVIDIA Shim and Kata Agent Policy timeouts.
-The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
-This controls the time the shim allows for a container to remain in container creating state.
-
-If you need a timeout of more than 1200 seconds, you will also need to adjust Kata Agent Policy's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull.
-To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` annotation.
-
-
Next Steps
==========
From b4d8467c1e1713682e934ee071a412b9a63d4601 Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Mon, 11 May 2026 14:51:50 -0400
Subject: [PATCH 2/4] make attestation more clearly call out source of truth is
upstream docs
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
confidential-containers/attestation.rst | 35 ++++++++++---------
.../confidential-containers-deploy.rst | 2 +-
2 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst
index 0f746d139..55be9dbdc 100644
--- a/confidential-containers/attestation.rst
+++ b/confidential-containers/attestation.rst
@@ -23,8 +23,9 @@
Attestation
***********
-This page provides an overview of how to configure remote attestation for Confidential Container workloads.
-Attestation cryptographically verifies the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload.
+
+The :doc:`Confidential Containers deployment guide ` configures your cluster to run workloads in a Confidential Container.
+To strengthen workload security, configure attestation to verify the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload.
Attestation is required for any feature that depends on secrets, including:
@@ -35,8 +36,14 @@ Attestation is required for any feature that depends on secrets, including:
When a workload requires a secret, such as a key to decrypt a container image or model, guest components collect hardware evidence from the active CPU and GPU enclaves.
The evidence is sent to a remote verifier, Trustee, which evaluates the evidence against configured policies and conditionally releases the secret.
+Trustee is typically deployed in a separate trusted environment that is reachable from your worker nodes over the network.
+
+.. note::
-For background on how attestation fits into the Confidential Containers architecture, refer to the :doc:`NVIDIA Confidential Containers Reference Architecture overview `.
+ This page is an educational overview of attestation with Confidential Containers, not a complete configuration guide.
+ The attestation workflow is fully documented in the upstream `Confidential Containers documentation `_, which is the source of truth for setup and configuration details.
+
+ Attestation is not required to deploy Confidential Containers; it is needed only for features that rely on secret release, such as those listed above.
Prerequisites
@@ -85,7 +92,7 @@ After you complete installation, Trustee is configured to use the NVIDIA Remote
Configure Workloads for Attestation
====================================
-To enable attestation for your workloads, point them to the Trustee network endpoint, sometimes referred to as the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec:
+To enable attestation for your workloads, point them to the Trustee network endpoint, also called the Key Broker Service (KBS) endpoint, by adding the following annotation to your workload pod spec:
.. code-block:: yaml
@@ -102,20 +109,17 @@ Optional: Customize Attestation Workflows
=========================================
Confidential Containers enables sensible default attestation policies for NVIDIA Confidential Computing GPUs.
-In most cases, the policy is already configured appropriately and you only need to provide reference values.
-Refer to the upstream `Confidential Containers reference values `_ documentation for more information on the reference values.
+In most cases, the default policy is appropriate and you only need to provide reference values.
+For more information, refer to the upstream `Confidential Containers reference values `_ documentation.
You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets.
Refer to the upstream documentation on `using the KBS Client Tool `_.
-If you choose to customize attestation workflows, refer to the following Confidential Containers documentation for more details:
-
-* Configure resources: Create resources, or secrets, that your workloads need.
- Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources.
-* Configure policies: Confidential Containers uses different policy types to secure workloads at different layers.
- Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies.
+For more advanced customization, refer to the following upstream Confidential Containers documentation:
-Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them.
+* `Resources `_: Create the resources, such as secrets, that your workloads need.
+* `Policies `_: Configure the policy types that secure workloads at different layers.
+* `Features `_: Browse the full list of attestation features and how to configure them.
Troubleshooting
===============
@@ -126,6 +130,5 @@ Use the Trustee log to diagnose the attestation process.
Next Steps
==========
-* Refer to the :doc:`deployment guide ` for Confidential Containers setup instructions.
-* Refer to the upstream `Confidential Containers Features `_ documentation for a complete list of attestation-dependent features.
-* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information.
+* Refer to the upstream `Confidential Containers Features `_ for complete documentation on attestation features.
+* If you haven't already, refer to the :doc:`Confidential Containers deployment guide ` to configure your environment for confidential workloads.
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index 648a67c4f..6b79090ff 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -586,6 +586,7 @@ A pod manifest for a confidential container GPU workload requires that you speci
$ kubectl delete -f cuda-vectoradd-kata.yaml
+
.. _coco-configuration-settings:
Common GPU Operator Configuration Settings
@@ -844,5 +845,4 @@ Next Steps
* Refer to the :doc:`Attestation ` page for more information on configuring attestation.
* To help manage the lifecycle of Kata Containers, install the `Kata Lifecycle Manager `_.
This Argo Workflows-based tool manages Kata Containers upgrades and day-two operations.
-* Refer to the `NVIDIA Confidential Computing documentation `_ for additional information.
* Licensing information is available on the :doc:`Licensing ` page.
\ No newline at end of file
From 453c7a63c116999b78b8222509e49f3e22e2d478 Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Tue, 12 May 2026 21:41:32 -0400
Subject: [PATCH 3/4] remove bullet from cc.mode, revert attestation changes
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
confidential-containers/attestation.rst | 36 ++++++++-----------
.../confidential-containers-deploy.rst | 1 -
2 files changed, 14 insertions(+), 23 deletions(-)
diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst
index 55be9dbdc..77f70806b 100644
--- a/confidential-containers/attestation.rst
+++ b/confidential-containers/attestation.rst
@@ -23,9 +23,8 @@
Attestation
***********
-
-The :doc:`Confidential Containers deployment guide ` configures your cluster to run workloads in a Confidential Container.
-To strengthen workload security, configure attestation to verify the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload.
+This page provides an overview of how to configure remote attestation for Confidential Container workloads.
+Attestation cryptographically verifies the guest Trusted Execution Environment (TEE) for the CPU and GPU before secrets are released to a workload.
Attestation is required for any feature that depends on secrets, including:
@@ -36,14 +35,8 @@ Attestation is required for any feature that depends on secrets, including:
When a workload requires a secret, such as a key to decrypt a container image or model, guest components collect hardware evidence from the active CPU and GPU enclaves.
The evidence is sent to a remote verifier, Trustee, which evaluates the evidence against configured policies and conditionally releases the secret.
-Trustee is typically deployed in a separate trusted environment that is reachable from your worker nodes over the network.
-
-.. note::
-
- This page is an educational overview of attestation with Confidential Containers, not a complete configuration guide.
- The attestation workflow is fully documented in the upstream `Confidential Containers documentation `_, which is the source of truth for setup and configuration details.
- Attestation is not required to deploy Confidential Containers; it is needed only for features that rely on secret release, such as those listed above.
+For background on how attestation fits into the Confidential Containers architecture, refer to the :doc:`NVIDIA Confidential Containers Reference Architecture overview `.
Prerequisites
@@ -108,18 +101,17 @@ Refer to the upstream `Setup Confidential Containers `_ documentation.
-
-You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets.
-Refer to the upstream documentation on `using the KBS Client Tool `_.
-
-For more advanced customization, refer to the following upstream Confidential Containers documentation:
-
-* `Resources `_: Create the resources, such as secrets, that your workloads need.
-* `Policies `_: Configure the policy types that secure workloads at different layers.
-* `Features `_: Browse the full list of attestation features and how to configure them.
+After Trustee is provisioned and workloads are configured, you can customize attestation workflows to enforce your desired security policies.
+This can include configuring the following:
+
+* KBS Client Tool: Configure Trustee resources and secrets by using the Key Broker Service (KBS) Client Tool.
+ Refer to the upstream documentation on `using the KBS Client Tool `_.
+* Configure resources: Create resources, or secrets, that your workloads need.
+ Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources.
+* Configure policies: Confidential Containers uses different policy types to secure workload at different layers.
+ Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies.
+
+Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them.
Troubleshooting
===============
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index 6b79090ff..31ecb60f4 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -696,7 +696,6 @@ When you change the mode, the manager performs the following actions:
However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
-* Unbinds the GPU from the VFIO PCI device driver.
* Changes the mode and resets the GPU.
* Reschedules the other GPU Operator operands.
From bd3beb943240ec141a69955e2bf0cf3f07208a7c Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Tue, 12 May 2026 22:44:48 -0400
Subject: [PATCH 4/4] update wording for image pull timeout
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
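Note: the reworded timeout paragraph in this patch describes passing ``agent.image_pull_timeout`` through the ``io.katacontainers.config.hypervisor.kernel_params`` pod annotation. The following is a minimal sketch of such an annotation, not a tested manifest. The pod name and the 1800-second value are hypothetical; the runtime class, image, and resource name are taken from the sample workload in this guide. Whether your shim configuration permits this annotation is an assumption that is not verified here.

```yaml
# Sketch only: the pod name and the 1800-second value are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: large-image-example   # hypothetical name
  annotations:
    # Raises the agent-side guest image pull timeout above the 1200-second default.
    io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp   # or kata-qemu-nvidia-gpu-tdx
  restartPolicy: Never
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
      resources:
        limits:
          nvidia.com/pgpu: "1"
```

If a value above 1200 seconds is used, the kubelet ``runtimeRequestTimeout`` would need to be at least as long, because the kubelet can otherwise de-allocate the pod before the image pull completes.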
confidential-containers/attestation.rst | 2 +-
.../confidential-containers-deploy.rst | 12 +++++-------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst
index 77f70806b..70999ee37 100644
--- a/confidential-containers/attestation.rst
+++ b/confidential-containers/attestation.rst
@@ -108,7 +108,7 @@ This can include configuring the following:
Refer to the upstream documentation on `using the KBS Client Tool `_.
* Configure resources: Create resources, or secrets, that your workloads need.
Refer to the upstream `Confidential Containers resources `_ documentation for more information on the resources.
-* Configure policies: Confidential Containers uses different policy types to secure workload at different layers.
+* Configure policies: Confidential Containers uses different policies to secure workloads at different layers.
Refer to the upstream `Confidential Containers policy `_ documentation for more information on the policy types and configuring policies.
Refer to the upstream `Confidential Containers Features `_ documentation for a full list of attestation features and how to configure them.
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index 31ecb60f4..51df262b4 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -182,10 +182,11 @@ Kubernetes Cluster
.. _configure-image-pull-timeouts:
-* Configure image pull timeouts. The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start.
+* Increase the kubelet image pull timeout to 20 minutes to avoid timeouts on large image pulls.
Kubelet can de-allocate your pod if the image pull exceeds the configured timeout before the container transitions to the running state.
+ This is more likely to happen when you use large images.
- If you plan to use large images, increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default values for the NVIDIA shim configurations in Kata Containers.
+ Increase ``runtimeRequestTimeout`` in your `kubelet configuration `_ to ``20m`` to match the default timeout in the Kata Containers shim configurations.
Add or update the ``runtimeRequestTimeout`` field in your kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
@@ -202,13 +203,10 @@ Kubernetes Cluster
$ sudo systemctl restart kubelet
- Optionally, you can configure additional timeouts for the NVIDIA Shim and Kata Agent Policy.
- The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
- This controls the time the shim allows for a container to remain in container creating state.
- If you need a timeout of more than 1200 seconds, you will also need to adjust Kata Agent Policy's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull.
+ If you need a timeout of more than 1200 seconds (20 minutes), you must also adjust the Kata Agent's ``image_pull_timeout``, which defaults to 1200 seconds.
+ The same value, in seconds, is also used as the Confidential Data Hub's image pull API timeout.
To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params: "..."`` annotation.
-
.. _installation-and-configuration:
Installation