Minor nits in coco docs #394

Open
a-mccarthy wants to merge 2 commits into NVIDIA:main from a-mccarthy:kata-coco-nits

Conversation

@a-mccarthy
Collaborator

@a-mccarthy a-mccarthy commented May 11, 2026

  • adds some clarification text to the run-a-workload sample
  • re-adds the image pull timeout to the prerequisites
  • improves the attestation workflow section so it reads more clearly

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
@github-actions

Documentation preview

https://nvidia.github.io/cloud-native-docs/review/pr-394

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

@fitzthum fitzthum left a comment


Nice.

Most of my comments probably go beyond the level of detail needed in the doc, but maybe they can help massage things a little bit.

This page is an educational overview of attestation with Confidential Containers, not a complete configuration guide.
The attestation workflow is fully documented in the upstream `Confidential Containers documentation <https://confidentialcontainers.org/docs/attestation/>`_, which is the source of truth for setup and configuration details.

Attestation is not required to deploy Confidential Containers; it is needed only for features that rely on secret release, such as those listed above.

May want to massage this a little bit. It's true that you can start workloads without attestation, and the features above are the ones that require attestation, but the key detail is that your workload isn't really secure without using attestation (via one of these features, usually). I'm not sure we really want to get into that here, though.

Contributor


+1 on this. [We can phrase it that attestation is not required to get a pod going for evaluation purposes, but trust in the deployment can only be established via attestation and via other CoCo mechanisms, such as image signature verification and the use of a proper agent security policy; all of this is documented in the upstream repo, as pointed out]


Confidential Containers enables sensible default attestation policies for NVIDIA Confidential Computing GPUs.
In most cases, the default policy is appropriate and you only need to provide reference values.


There is a lot packed into these two sentences.

For NVIDIA devices, you don't have to do anything regarding attestation policy or reference values. Basically, NRAS handles all of this for you.

Your guest will also have a CPU, though. Attesting the CPU does require setting reference values. You can still use the default policy, but you will need to set some reference values per the link below.

Fine to keep it as is; it's correct and avoids some subtleties. But setting reference values for CPU attestation is a key step for secure workloads.


Refer to the upstream `Confidential Containers Features <https://confidentialcontainers.org/docs/features>`_ documentation for a full list of attestation features and how to configure them.
* `Resources <https://confidentialcontainers.org/docs/attestation/resources/>`_: Create the resources, such as secrets, that your workloads need.
* `Policies <https://confidentialcontainers.org/docs/attestation/policies/>`_: Configure the policy types that secure workloads at different layers.


nit: policy types -> policies


.. _configure-image-pull-timeouts:

* Configure image pull timeouts. The guest-pull mechanism pulls images inside the confidential VM, which means large images can take longer to download and delay container start.


Technically speaking, pulling images inside the guest doesn't necessarily take longer than pulling on the host. The issue is more that guest pull invalidates any kind of caching that would normally happen. Basically you have to pull the image every time you run a pod.
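For the doc itself, the timeout can usually be raised per pod through a runtime annotation. A minimal sketch, assuming the Kata Containers ``create_container_timeout`` annotation; the key and value here are illustrative, so verify them against the Kata Containers annotations reference:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: large-image-workload   # hypothetical name
     annotations:
       # Allow up to 30 minutes for the guest to pull a large image
       io.katacontainers.config.runtime.create_container_timeout: "1800"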


A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX.
A pod manifest for a confidential container GPU workload requires that you specify the ``kata-qemu-nvidia-gpu-snp`` runtime class for AMD based systems or ``kata-qemu-nvidia-gpu-tdx`` for Intel based systems.


nit: hyphen in AMD-based? (and intel too)


.. code-block:: yaml

io.katacontainers.config.hypervisor.kernel_params: "agent.aa_kbc_params=cc_kbc::http://<kbs-ip>:<kbs-port>"
Contributor


@fitzthum do we actually want to propose this? another method to provide this information is via init-data. should we not rather also abstract from this?



I am fine with either. This way is simpler; maybe better in the short term. Init-data is an important concept, though. We should cover it at some point.
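Putting the runtime class and the annotation from the excerpts together, a hedged sketch of a complete pod manifest; the pod name, image, and KBS address are placeholders, not values from the doc:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: gpu-cc-demo   # hypothetical name
     annotations:
       # Point the guest attestation agent at the Trustee KBS endpoint
       io.katacontainers.config.hypervisor.kernel_params: "agent.aa_kbc_params=cc_kbc::http://<kbs-ip>:<kbs-port>"
   spec:
     runtimeClassName: kata-qemu-nvidia-gpu-snp   # kata-qemu-nvidia-gpu-tdx on Intel-based systems
     containers:
     - name: cuda-sample
       image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image
       resources:
         limits:
           nvidia.com/gpu: 1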

In most cases, the default policy is appropriate and you only need to provide reference values.
For more information, refer to the upstream `Confidential Containers reference values <https://confidentialcontainers.org/docs/attestation/reference-values/>`_ documentation.

You can use the Key Broker Service (KBS) Client Tool to configure Trustee reference values and secrets.
Contributor


@fitzthum please comment here if you would propose adding that this tool would usually also be run in a secure environment and the tool would communicate with KBS in a secure way. If you want to omit this here, please resolve the comment.



Yeah, it might be worth saying that the configuration endpoints of the KBS are privileged and the KBS Client will use a secure channel to access them.
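For context, uploading a secret with the KBS Client Tool looks roughly like this; the resource path and key file names are illustrative, and ``--auth-private-key`` is what authorizes the privileged configuration call (check the upstream Trustee kbs-client documentation for the exact flags):

.. code-block:: shell

   # Register a secret at default/secrets/db-password so attested workloads can retrieve it
   kbs-client --url http://<kbs-ip>:<kbs-port> \
       config --auth-private-key kbs-private.key \
       set-resource --path default/secrets/db-password --resource-file db-password.txt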

This feature gate is required to support pod deployments that use multiple snapshotters side-by-side.

Add both feature gates to your Kubelet configuration (typically ``/var/lib/kubelet/config.yaml``):
Add both feature gates to your Kubelet configuration (typically ``sudo vi /var/lib/kubelet/config.yaml``):
Contributor


Not so sure. Can we just say 'typically adjust the file ...'? ``sudo vi`` is a bit specific and a bit loose. [There may be users who literally wouldn't be able to exit vim anymore]
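For reference, the gates go under ``featureGates`` in the Kubelet configuration. A sketch, assuming ``RuntimeClassInImageCriApi`` is one of the two gates the doc has in mind; substitute the exact gate names the doc lists:

.. code-block:: yaml

   # /var/lib/kubelet/config.yaml
   apiVersion: kubelet.config.k8s.io/v1beta1
   kind: KubeletConfiguration
   featureGates:
     # Assumed gate name: lets the CRI image service distinguish runtime classes,
     # which running snapshotters side-by-side relies on
     RuntimeClassInImageCriApi: true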


$ sudo systemctl restart kubelet

Optionally, you can configure additional timeouts for the NVIDIA Shim and Kata Agent Policy.
Contributor


Instead of NVIDIA shim -> Kata Shim.

Please do not use Kata Agent Policy here. This timeout is unrelated to the kata agent policy feature. We can just say 'kata agent' here.
