Merged
KbsAttestationKeySecretName and KbsAttestationCertSecretName are both
mounted into the KBS container by newKbsDeployment but were absent from
secretToKbsConfigMapper. Rotating these secrets (delete+recreate) would
run the mapper but enqueue no reconcile request, leaving the deployment
referencing the deleted secret.
Add both fields to the match expression.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
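The mapper fix described above can be pictured with a minimal, stdlib-only sketch. It is not the operator's code: the `kbsConfig` and `kbsConfigSpec` types, the `KbsAuthSecretName` field, and both helper functions are hypothetical stand-ins, and only the two attestation field names come from the commit message.

```go
package main

import "fmt"

// kbsConfigSpec is a hypothetical, trimmed-down stand-in for the real spec.
// Only the two attestation fields are named in the commit message.
type kbsConfigSpec struct {
	KbsAuthSecretName            string // assumed field, for illustration
	KbsAttestationKeySecretName  string
	KbsAttestationCertSecretName string
}

type kbsConfig struct {
	Name string
	Spec kbsConfigSpec
}

// referencesSecret reports whether the KbsConfig refers to the named secret.
// Before the fix, the two attestation fields would be missing from this match.
func referencesSecret(c kbsConfig, secretName string) bool {
	if secretName == "" {
		return false
	}
	s := c.Spec
	switch secretName {
	case s.KbsAuthSecretName,
		s.KbsAttestationKeySecretName,
		s.KbsAttestationCertSecretName:
		return true
	}
	return false
}

// mapSecretToKbsConfigs returns the names of the KbsConfigs that should be
// reconciled when the given secret changes.
func mapSecretToKbsConfigs(secretName string, configs []kbsConfig) []string {
	var requests []string
	for _, c := range configs {
		if referencesSecret(c, secretName) {
			requests = append(requests, c.Name)
		}
	}
	return requests
}

func main() {
	configs := []kbsConfig{{
		Name: "kbsconfig-sample",
		Spec: kbsConfigSpec{
			KbsAuthSecretName:            "auth-secret",
			KbsAttestationKeySecretName:  "attestation-key",
			KbsAttestationCertSecretName: "attestation-cert",
		},
	}}
	fmt.Println(mapSecretToKbsConfigs("attestation-key", configs))
	fmt.Println(mapSecretToKbsConfigs("unrelated", configs))
}
```

If a field is left out of the match, the mapper still runs for that secret but returns no requests, which is the silent failure the commit describes.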
…et-mapper reconcile: add missing attestation secrets to secret mapper
…ller
TrusteeConfig creates and owns several ConfigMaps (kbs-config,
resource-policy, rvps-reference-values, tdx-config, attestation-policy,
gpu-attestation-policy) and Secrets (auth-secret, sample-secret,
https-key, https-cert, attestation-key, attestation-cert). If any of
these is deleted, the KbsConfig controller fails to mount it, but
TrusteeConfig is never notified to recreate it.
Add Owns(&ConfigMap{}) and Owns(&Secret{}) to SetupWithManager so that
deletion of any owned resource triggers TrusteeConfig reconcile, which
recreates the missing resource. This is safe because all createOrUpdate
helpers are create-once: when a resource already exists they return nil
without calling r.Update, so no update loop can form.
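The create-once argument can be sketched with an in-memory stand-in for the API server. Everything here (`store`, `createIfMissing`, `reconcile`) is hypothetical and only illustrates why a watch on owned resources does not cause an update loop when the helpers never write to an existing object.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	objects map[string]string
	writes  int // counts writes, each of which would fire a watch event
}

// createIfMissing mimics a create-once helper: it writes only when the
// object is absent and is a no-op otherwise.
func (s *store) createIfMissing(name, data string) bool {
	if _, ok := s.objects[name]; ok {
		return false
	}
	s.objects[name] = data
	s.writes++
	return true
}

// reconcile ensures every owned resource exists and returns how many
// it had to create.
func reconcile(s *store, owned []string) int {
	created := 0
	for _, name := range owned {
		if s.createIfMissing(name, "generated") {
			created++
		}
	}
	return created
}

func main() {
	s := &store{objects: map[string]string{}}
	owned := []string{"kbs-config", "auth-secret"}

	fmt.Println("first reconcile created:", reconcile(s, owned))
	fmt.Println("second reconcile created:", reconcile(s, owned))

	delete(s.objects, "kbs-config") // simulate a user deleting the ConfigMap
	fmt.Println("after deletion created:", reconcile(s, owned))
}
```

A steady-state reconcile performs zero writes, so the watch events it triggers die out after one pass.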
Also fix four generator functions (generateHttpsKeySecret,
generateHttpsCertSecret, generateAttestationKeySecret,
generateAttestationCertSecret) that were silently discarding the error
from ctrl.SetControllerReference with _ =. Without a proper owner
reference the secrets are not tracked by Owns() and are not
garbage-collected when TrusteeConfig is deleted. Change their signatures
to return (*corev1.Secret, error) and propagate the error to callers.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
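A minimal sketch of the signature change, assuming a hypothetical `setOwner` helper in place of ctrl.SetControllerReference and a simplified `secret` type in place of corev1.Secret:

```go
package main

import (
	"errors"
	"fmt"
)

// secret is a simplified, hypothetical stand-in for corev1.Secret.
type secret struct {
	Name     string
	OwnerUID string
}

// setOwner is a hypothetical stand-in for ctrl.SetControllerReference.
// Here it fails when the owner has no UID.
func setOwner(ownerUID string, s *secret) error {
	if ownerUID == "" {
		return errors.New("owner has no UID")
	}
	s.OwnerUID = ownerUID
	return nil
}

// generateSecret returns (*secret, error) instead of discarding the
// owner-reference error with `_ =`.
func generateSecret(ownerUID, name string) (*secret, error) {
	s := &secret{Name: name}
	if err := setOwner(ownerUID, s); err != nil {
		return nil, fmt.Errorf("setting owner reference on %s: %w", name, err)
	}
	return s, nil
}

func main() {
	if _, err := generateSecret("", "https-key"); err != nil {
		fmt.Println("error:", err)
	}
	s, _ := generateSecret("uid-123", "https-key")
	fmt.Println(s.Name, "owned by", s.OwnerUID)
}
```

Returning nil alongside the error keeps a caller from accidentally creating a secret that has no owner reference.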
…letion
Add a kuttl test suite that creates a TrusteeConfig in Permissive mode,
waits for TrusteeConfig.Status.IsReady and the trustee deployment to be
ready, then deletes the trusteeconfig-sample-kbs-config ConfigMap and
asserts it is recreated within 60 seconds.
This exercises the Owns(&ConfigMap{}) watch added to TrusteeConfigReconciler:
without it the controller is never notified of the deletion, leaving the
KbsConfig controller unable to mount the missing ConfigMap and the trustee
deployment unable to recover.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
…deletion
Extend the trusteeconfig-self-healing suite with a step that deletes the
trusteeconfig-sample-auth-secret and asserts it is recreated within 60
seconds.
This exercises the Owns(&Secret{}) watch added to TrusteeConfigReconciler.
The auth secret contains the Ed25519 public key that KBS uses to verify
client tokens; without self-healing, deleting it leaves KBS unable to
authenticate any new client after the next pod restart.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
…e loop
createOrUpdateKbsConfig called r.Update on the KbsConfig on every
TrusteeConfig reconcile regardless of whether the spec had changed.
Every successful Update increments resourceVersion, which fires the
Watches(&KbsConfig{}) handler registered in SetupWithManager. That
handler enqueues another TrusteeConfig reconcile, which calls
createOrUpdateKbsConfig again, which calls r.Update again — an infinite
reconcile loop generating constant API traffic.
Fix by computing the desired spec (merged or generated) and only calling
r.Update when apiequality.Semantic.DeepEqual reports the current and
desired specs differ.
Add a kuttl test that records the KbsConfig resourceVersion after
TrusteeConfig becomes ready, waits 20 seconds for multiple reconcile
cycles to complete, and asserts the resourceVersion is unchanged. A
reconcile loop would cause continuous increments that the test detects.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
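The update guard can be sketched as follows. This uses reflect.DeepEqual from the standard library as a stand-in for apiequality.Semantic.DeepEqual, whose comparison rules may differ; `fakeClient`, `syncSpec`, and the `Replicas` field are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// kbsConfigSpec is a hypothetical, trimmed-down spec.
type kbsConfigSpec struct {
	KbsConfigMapName string
	Replicas         int // assumed field, for illustration only
}

// fakeClient is a hypothetical stand-in for the API client. Every update
// bumps resourceVersion, which is what would fire the KbsConfig watch.
type fakeClient struct {
	spec            kbsConfigSpec
	resourceVersion int
}

func (c *fakeClient) update(spec kbsConfigSpec) {
	c.spec = spec
	c.resourceVersion++
}

// syncSpec writes only when the current and desired specs differ and
// reports whether a write happened.
func syncSpec(c *fakeClient, desired kbsConfigSpec) bool {
	if reflect.DeepEqual(c.spec, desired) {
		return false
	}
	c.update(desired)
	return true
}

func main() {
	c := &fakeClient{}
	desired := kbsConfigSpec{KbsConfigMapName: "kbs-config", Replicas: 1}

	for i := 0; i < 3; i++ {
		changed := syncSpec(c, desired)
		fmt.Printf("reconcile %d: changed=%v resourceVersion=%d\n", i+1, changed, c.resourceVersion)
	}
}
```

With the guard in place, resourceVersion stops changing after the first reconcile, which is the property the kuttl test checks.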
…ation failure
configurePermissiveProfile and configureRestrictedProfile called each
createOrUpdate helper, logged any error, and returned the partially
populated spec to the caller. buildKbsConfigSpec then returned that
partial spec as if everything had succeeded.
The result: if, for example, createOrUpdateKbsConfigMap failed, the
function returned a spec where KbsConfigMapName was still the empty
string. createOrUpdateKbsConfig then created or updated a KbsConfig
referencing a ConfigMap that did not exist. The KbsConfig controller's
createConfigMapVolume returns an error for an empty configmap name, so
the trustee Deployment could never be created, yet TrusteeConfig gave
no indication that anything had gone wrong.
Change configurePermissiveProfile and configureRestrictedProfile to
return (KbsConfigSpec, error) and propagate the first failure instead of
swallowing it. Change buildKbsConfigSpec to the same signature and
propagate errors from the profile configurers and from the
HTTPS/attestation secret helpers. Update Reconcile to return the error
so that controller-runtime retries rather than committing a broken
KbsConfig.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
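The shape of this change can be sketched with a hypothetical list of steps standing in for the createOrUpdate helpers. The `step` type, `configureProfile`, and the `ResourcePolicyName` field are illustrative names, not the operator's API.

```go
package main

import (
	"errors"
	"fmt"
)

// kbsConfigSpec is a hypothetical, trimmed-down spec.
type kbsConfigSpec struct {
	KbsConfigMapName   string
	ResourcePolicyName string // assumed field, for illustration only
}

// step is a hypothetical wrapper around one createOrUpdate helper.
type step struct {
	name string
	run  func(*kbsConfigSpec) error
}

// configureProfile stops at the first failing step and returns the error,
// rather than logging it and handing back a partially populated spec.
func configureProfile(steps []step) (kbsConfigSpec, error) {
	var spec kbsConfigSpec
	for _, s := range steps {
		if err := s.run(&spec); err != nil {
			return kbsConfigSpec{}, fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return spec, nil
}

// failingSteps simulates createOrUpdateKbsConfigMap failing.
func failingSteps() []step {
	return []step{
		{"kbs-config", func(*kbsConfigSpec) error { return errors.New("create failed") }},
		{"resource-policy", func(s *kbsConfigSpec) error { s.ResourcePolicyName = "resource-policy"; return nil }},
	}
}

func main() {
	spec, err := configureProfile(failingSteps())
	fmt.Printf("spec=%+v err=%v\n", spec, err)
}
```

Because the caller receives an error rather than a half-filled spec, it can return early and let the reconcile be retried.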
When TrusteeConfig is deleted first (as in the trusteeconfig-self-healing
teardown), the owned KbsConfig is garbage-collected via owner references
before uninstall-operator.sh runs. The previous jsonpath query
'{.items[0].metadata.name}' errors on an empty list, leaving CR_NAME
empty and causing 'kubectl delete' to fail with 'no name specified'.
Replace with 'kubectl delete kbsconfig --all --ignore-not-found' which
handles both the already-deleted and still-present cases cleanly.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
TrusteeConfig.Status.IsReady was unconditionally set to true as soon as
KbsConfig was created/updated, regardless of whether the underlying
trustee deployment was actually running. It now mirrors
KbsConfig.Status.IsReady so the status accurately reflects end-to-end
readiness.
Assisted-by: AI
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
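As a rough illustration of the mirroring, assuming hypothetical status types and a `mirrorReadiness` helper that treats a missing KbsConfig as not ready:

```go
package main

import "fmt"

// Hypothetical, simplified status types.
type kbsConfigStatus struct{ IsReady bool }
type trusteeConfigStatus struct{ IsReady bool }

// mirrorReadiness copies readiness from the KbsConfig status. A nil status
// (KbsConfig not found yet) is treated as not ready, which is an assumption.
func mirrorReadiness(kbs *kbsConfigStatus) trusteeConfigStatus {
	if kbs == nil {
		return trusteeConfigStatus{IsReady: false}
	}
	return trusteeConfigStatus{IsReady: kbs.IsReady}
}

func main() {
	fmt.Println(mirrorReadiness(nil).IsReady)
	fmt.Println(mirrorReadiness(&kbsConfigStatus{IsReady: false}).IsReady)
	fmt.Println(mirrorReadiness(&kbsConfigStatus{IsReady: true}).IsReady)
}
```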
Signed-off-by: Leonardo Milleri <lmilleri@redhat.com>
…onfig-self-healing Trusteeconfig self healing
…tatus Fix KbsConfig and TrusteeConfig statuses
buildKbsContainer read KBS_IMAGE_NAME / KBS_IMAGE_NAME_MICROSERVICES
from the environment but had no fallback when the variable was empty.
An empty Image field in the container spec causes an ErrImagePull
failure at the kubelet with no actionable error from the operator.
buildAsContainer and buildRvpsContainer already fall back to their
respective defaults; apply the same pattern to the KBS container using
the existing DefaultKbsImageName constant.
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
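The fallback pattern might look like the sketch below. The `kbsImage` helper and its injected `getenv` parameter are hypothetical, the constant's value is a placeholder rather than the operator's real default, and using the same default for both environment variables is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// DefaultKbsImageName is named in the commit message; this value is a
// placeholder, not the operator's actual default image.
const DefaultKbsImageName = "example.invalid/kbs:latest"

// kbsImage picks the image from the environment and falls back to the
// default when the variable is unset or empty. getenv is injected so the
// lookup can be exercised without touching the process environment.
func kbsImage(getenv func(string) string, microservices bool) string {
	envVar := "KBS_IMAGE_NAME"
	if microservices {
		envVar = "KBS_IMAGE_NAME_MICROSERVICES"
	}
	if image := getenv(envVar); image != "" {
		return image
	}
	return DefaultKbsImageName
}

func main() {
	fmt.Println(kbsImage(os.Getenv, false))
}
```

An empty string never reaches the container spec, so a missing variable no longer surfaces only as an image pull failure at the kubelet.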
…onfig-image-fallback fix: fall back to DefaultKbsImageName when KBS env var is unset
Signed-off-by: Leonardo Milleri <lmilleri@redhat.com>
…pu-policy Update cpu attestation policy for TDX changes
Author
|
/retest |
bpradipt
approved these changes
Mar 26, 2026
Summary of changes: