Rebase from upstream #300

Merged
lmilleri merged 17 commits into openshift:main from lmilleri:rebase-25032026 on Mar 26, 2026

@lmilleri

Summary of changes:

  • Fix TrusteeConfig/KbsConfig reconciliations
  • Observability improvements
  • Fix TrusteeConfig and KbsConfig statuses
  • Fall back to the default KBS image when the env var is not set

bpradipt and others added 17 commits March 13, 2026 22:03
KbsAttestationKeySecretName and KbsAttestationCertSecretName are both
mounted into the KBS container by newKbsDeployment but were absent from
secretToKbsConfigMapper. Rotating these secrets (delete+recreate) would
run the mapper but enqueue no reconcile request, leaving the deployment
referencing the deleted secret. Add both fields to the match expression.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
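
The mapper fix above can be illustrated with a minimal, stdlib-only sketch. It is not the operator's code: `KbsConfig`, `KbsConfigSpec`, `referencesSecret` and `requestsForSecret` are hypothetical stand-ins, and only `KbsAttestationKeySecretName` and `KbsAttestationCertSecretName` are field names taken from the commit message; the other fields are assumed for illustration.

```go
package main

import "fmt"

// KbsConfigSpec is a hypothetical, trimmed-down stand-in for the operator's
// spec type. Only the two attestation fields are named in the commit message;
// the remaining fields are assumptions for illustration.
type KbsConfigSpec struct {
	KbsAuthSecretName            string
	KbsHttpsKeySecretName        string
	KbsHttpsCertSecretName       string
	KbsAttestationKeySecretName  string
	KbsAttestationCertSecretName string
}

// KbsConfig is a stand-in for the custom resource.
type KbsConfig struct {
	Name string
	Spec KbsConfigSpec
}

// referencesSecret reports whether cfg mounts the named secret. Before the
// fix, the last two comparisons were the ones missing from the match.
func referencesSecret(cfg KbsConfig, secretName string) bool {
	if secretName == "" {
		return false
	}
	s := cfg.Spec
	return secretName == s.KbsAuthSecretName ||
		secretName == s.KbsHttpsKeySecretName ||
		secretName == s.KbsHttpsCertSecretName ||
		secretName == s.KbsAttestationKeySecretName ||
		secretName == s.KbsAttestationCertSecretName
}

// requestsForSecret returns the names of the KbsConfigs that should be
// reconciled when the named secret changes.
func requestsForSecret(configs []KbsConfig, secretName string) []string {
	var out []string
	for _, c := range configs {
		if referencesSecret(c, secretName) {
			out = append(out, c.Name)
		}
	}
	return out
}

func main() {
	configs := []KbsConfig{{
		Name: "kbsconfig-sample",
		Spec: KbsConfigSpec{
			KbsAuthSecretName:            "auth-secret",
			KbsAttestationKeySecretName:  "attestation-key",
			KbsAttestationCertSecretName: "attestation-cert",
		},
	}}
	fmt.Println(requestsForSecret(configs, "attestation-key"))
	fmt.Println(requestsForSecret(configs, "unrelated-secret"))
}
```

In the real controller the mapper would return reconcile requests rather than plain names; the sketch only shows why a field absent from the match expression produces no request at all.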
…et-mapper

reconcile: add missing attestation secrets to secret mapper
…ller

TrusteeConfig creates and owns several ConfigMaps (kbs-config,
resource-policy, rvps-reference-values, tdx-config, attestation-policy,
gpu-attestation-policy) and Secrets (auth-secret, sample-secret,
https-key, https-cert, attestation-key, attestation-cert). If any of
these is deleted, the KbsConfig controller fails to mount it, but
TrusteeConfig is never notified to recreate it.

Add Owns(&ConfigMap{}) and Owns(&Secret{}) to SetupWithManager so that
deletion of any owned resource triggers TrusteeConfig reconcile, which
recreates the missing resource. This is safe because all createOrUpdate
helpers are create-once: when a resource already exists they return nil
without calling r.Update, so no update loop can form.

Also fix four generator functions (generateHttpsKeySecret,
generateHttpsCertSecret, generateAttestationKeySecret,
generateAttestationCertSecret) that were silently discarding the error
from ctrl.SetControllerReference with _ =. Without a proper owner
reference the secrets are not tracked by Owns() and are not
garbage-collected when TrusteeConfig is deleted. Change their signatures
to return (*corev1.Secret, error) and propagate the error to callers.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
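
The signature change described above can be sketched without any Kubernetes dependencies. Everything here is hypothetical: `Secret` and `Owner` stand in for `corev1.Secret` and the TrusteeConfig, `setOwner` stands in for `ctrl.SetControllerReference` (its failure condition is invented for the demo), and the secret naming scheme is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Secret and Owner are hypothetical stand-ins for corev1.Secret and the
// owning TrusteeConfig.
type Secret struct {
	Name     string
	OwnerUID string
}

type Owner struct {
	Name string
	UID  string
}

// setOwner stands in for ctrl.SetControllerReference. The "missing UID"
// failure is invented for this sketch; the real call fails for other reasons.
func setOwner(owner Owner, s *Secret) error {
	if owner.UID == "" {
		return errors.New("owner has no UID")
	}
	s.OwnerUID = owner.UID
	return nil
}

// generateHTTPSKeySecret mirrors the new (*Secret, error) shape: the
// owner-reference error is returned to the caller instead of being
// discarded with `_ =`.
func generateHTTPSKeySecret(owner Owner) (*Secret, error) {
	s := &Secret{Name: owner.Name + "-https-key"} // naming scheme assumed
	if err := setOwner(owner, s); err != nil {
		return nil, fmt.Errorf("setting owner reference on %s: %w", s.Name, err)
	}
	return s, nil
}

func main() {
	s, err := generateHTTPSKeySecret(Owner{Name: "trusteeconfig-sample", UID: "1234"})
	fmt.Println(s.Name, s.OwnerUID, err)

	_, err = generateHTTPSKeySecret(Owner{Name: "trusteeconfig-sample"})
	fmt.Println(err)
}
```

Returning `nil` together with the error keeps a secret without an owner reference from ever reaching the API server, which is the state the commit message describes as untracked by `Owns()`.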
…letion

Add a kuttl test suite that creates a TrusteeConfig in Permissive mode,
waits for TrusteeConfig.Status.IsReady and the trustee deployment to be
ready, then deletes the trusteeconfig-sample-kbs-config ConfigMap and
asserts it is recreated within 60 seconds.

This exercises the Owns(&ConfigMap{}) watch added to TrusteeConfigReconciler:
without it the controller is never notified of the deletion, leaving the
KbsConfig controller unable to mount the missing ConfigMap and the trustee
deployment unable to recover.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
…deletion

Extend the trusteeconfig-self-healing suite with a step that deletes the
trusteeconfig-sample-auth-secret and asserts it is recreated within 60
seconds.

This exercises the Owns(&Secret{}) watch added to TrusteeConfigReconciler.
The auth secret contains the Ed25519 public key that KBS uses to verify
client tokens; without self-healing, deleting it leaves KBS unable to
authenticate any new client after the next pod restart.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
…e loop

createOrUpdateKbsConfig called r.Update on the KbsConfig on every
TrusteeConfig reconcile regardless of whether the spec had changed.
Every successful Update increments resourceVersion, which fires the
Watches(&KbsConfig{}) handler registered in SetupWithManager. That
handler enqueues another TrusteeConfig reconcile, which calls
createOrUpdateKbsConfig again, which calls r.Update again — an infinite
reconcile loop generating constant API traffic.

Fix by computing the desired spec (merged or generated) and only calling
r.Update when apiequality.Semantic.DeepEqual reports the current and
desired specs differ.

Add a kuttl test that records the KbsConfig resourceVersion after
TrusteeConfig becomes ready, waits 20 seconds for multiple reconcile
cycles to complete, and asserts the resourceVersion is unchanged. A
reconcile loop would cause continuous increments that the test detects.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
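
A minimal sketch of the compare-before-update guard described above, using only the standard library. `reflect.DeepEqual` is swapped in for `apiequality.Semantic.DeepEqual`, and `fakeClient`, `syncKbsConfig` and the integer `ResourceVersion` are hypothetical simplifications rather than the operator's types.

```go
package main

import (
	"fmt"
	"reflect"
)

// KbsConfigSpec and KbsConfig are trimmed-down, hypothetical stand-ins.
type KbsConfigSpec struct {
	KbsConfigMapName string
}

type KbsConfig struct {
	Spec            KbsConfigSpec
	ResourceVersion int // simplified; the real field is an opaque string
}

// fakeClient counts Update calls and bumps the resource version on each one,
// imitating what the API server does on a successful write.
type fakeClient struct{ updates int }

func (c *fakeClient) Update(obj *KbsConfig) {
	c.updates++
	obj.ResourceVersion++
}

// syncKbsConfig writes only when the desired spec differs from the current
// one, and reports whether an update was issued.
func syncKbsConfig(c *fakeClient, current *KbsConfig, desired KbsConfigSpec) bool {
	if reflect.DeepEqual(current.Spec, desired) {
		return false // nothing changed: no write, no new watch event
	}
	current.Spec = desired
	c.Update(current)
	return true
}

func main() {
	c := &fakeClient{}
	current := &KbsConfig{Spec: KbsConfigSpec{KbsConfigMapName: "kbs-config"}, ResourceVersion: 1}
	desired := KbsConfigSpec{KbsConfigMapName: "kbs-config"}

	// Repeated reconciles with an unchanged spec issue no updates.
	for i := 0; i < 3; i++ {
		syncKbsConfig(c, current, desired)
	}
	fmt.Println(c.updates, current.ResourceVersion)
}
```

The kuttl test in this commit checks the same property from the outside: with the guard in place, the resourceVersion stays constant across reconciles.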
…ation failure

configurePermissiveProfile and configureRestrictedProfile called each
createOrUpdate helper, logged any error, and returned the partially
populated spec to the caller. buildKbsConfigSpec then returned that
partial spec as if everything had succeeded.

The result: if, for example, createOrUpdateKbsConfigMap failed, the
function returned a spec where KbsConfigMapName was still the empty
string. createOrUpdateKbsConfig then created or updated a KbsConfig
referencing a ConfigMap that did not exist. The KbsConfig controller's
createConfigMapVolume returns an error for an empty configmap name, so
the trustee Deployment could never be created — yet TrusteeConfig gave
no indication that anything had gone wrong.

Change configurePermissiveProfile and configureRestrictedProfile to
return (KbsConfigSpec, error) and propagate the first failure instead of
swallowing it. Change buildKbsConfigSpec to the same signature and
propagate errors from the profile configurers and from the HTTPS/
attestation secret helpers. Update Reconcile to return the error so that
controller-runtime retries rather than committing a broken KbsConfig.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
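
The error-propagation change above can be sketched as follows. This is an illustration under stated assumptions, not the operator's code: the `step` type, the `errCreateFailed` sentinel, the two-helper parameter list and the `KbsResourcePolicyConfigMapName` field are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// KbsConfigSpec is a hypothetical, trimmed-down spec.
type KbsConfigSpec struct {
	KbsConfigMapName               string
	KbsResourcePolicyConfigMapName string
}

// step is a hypothetical createOrUpdate helper that returns the name of the
// resource it ensured, or an error.
type step func() (string, error)

// errCreateFailed is a sentinel used only by this sketch.
var errCreateFailed = errors.New("create failed")

// configurePermissiveProfile returns the first failure instead of logging it
// and handing back a partially populated spec.
func configurePermissiveProfile(createKbsConfigMap, createResourcePolicy step) (KbsConfigSpec, error) {
	var spec KbsConfigSpec

	name, err := createKbsConfigMap()
	if err != nil {
		return KbsConfigSpec{}, fmt.Errorf("kbs config map: %w", err)
	}
	spec.KbsConfigMapName = name

	name, err = createResourcePolicy()
	if err != nil {
		return KbsConfigSpec{}, fmt.Errorf("resource policy: %w", err)
	}
	spec.KbsResourcePolicyConfigMapName = name

	return spec, nil
}

// buildKbsConfigSpec has the same (spec, error) shape and passes errors up.
func buildKbsConfigSpec(createKbsConfigMap, createResourcePolicy step) (KbsConfigSpec, error) {
	return configurePermissiveProfile(createKbsConfigMap, createResourcePolicy)
}

func main() {
	ok := func() (string, error) { return "resource-policy", nil }
	fail := func() (string, error) { return "", errCreateFailed }

	// A caller such as Reconcile would return this error so the request is
	// retried instead of writing a KbsConfig with an empty ConfigMap name.
	if _, err := buildKbsConfigSpec(fail, ok); err != nil {
		fmt.Println("reconcile would be retried:", err)
	}
}
```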
When TrusteeConfig is deleted first (as in the trusteeconfig-self-healing
teardown), the owned KbsConfig is garbage-collected via owner references
before uninstall-operator.sh runs. The previous jsonpath query
'{.items[0].metadata.name}' errors on an empty list, leaving CR_NAME
empty and causing 'kubectl delete' to fail with 'no name specified'.

Replace with 'kubectl delete kbsconfig --all --ignore-not-found' which
handles both the already-deleted and still-present cases cleanly.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
TrusteeConfig.Status.IsReady was unconditionally set to true as soon as
KbsConfig was created/updated, regardless of whether the underlying
trustee deployment was actually running. It now mirrors
KbsConfig.Status.IsReady so the status accurately reflects end-to-end
readiness.

Assisted-by: AI
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Leonardo Milleri <lmilleri@redhat.com>
…onfig-self-healing

Trusteeconfig self healing
…tatus

Fix KbsConfig and TrusteeConfig statuses
buildKbsContainer read KBS_IMAGE_NAME / KBS_IMAGE_NAME_MICROSERVICES
from the environment but had no fallback when the variable was empty.
An empty Image field in the container spec causes an ErrImagePull
failure at the kubelet with no actionable error from the operator.

buildAsContainer and buildRvpsContainer already fall back to their
respective defaults; apply the same pattern to the KBS container
using the existing DefaultKbsImageName constant.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
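
A minimal sketch of the fallback pattern described above. The value given to `DefaultKbsImageName` is a placeholder rather than the operator's real default, and `kbsImage` with its injectable `getenv` parameter is a hypothetical helper chosen so the lookup is easy to exercise.

```go
package main

import (
	"fmt"
	"os"
)

// DefaultKbsImageName uses a placeholder value here; the operator defines
// its own constant with the real default image.
const DefaultKbsImageName = "example.invalid/kbs:placeholder"

// kbsImage returns the image named by envVar, or the default when the
// variable is unset or empty. getenv is injected so callers can pass
// os.Getenv in production and a stub elsewhere.
func kbsImage(getenv func(string) string, envVar string) string {
	if img := getenv(envVar); img != "" {
		return img
	}
	return DefaultKbsImageName
}

func main() {
	// Prints the placeholder default unless KBS_IMAGE_NAME is set.
	fmt.Println(kbsImage(os.Getenv, "KBS_IMAGE_NAME"))
}
```

With this shape the container spec never carries an empty Image field, so the failure mode moves from an image pull error at the kubelet to a well-defined default.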
…onfig-image-fallback

fix: fall back to DefaultKbsImageName when KBS env var is unset
Signed-off-by: Leonardo Milleri <lmilleri@redhat.com>
…pu-policy

Update cpu attestation policy for TDX changes
@lmilleri lmilleri requested a review from bpradipt March 25, 2026 11:12
@lmilleri
Author

/retest

@lmilleri lmilleri merged commit ee0b81f into openshift:main Mar 26, 2026
6 checks passed
@lmilleri lmilleri deleted the rebase-25032026 branch April 15, 2026 11:50