
feat: BYO container registry support#99

Open
minmzzhang wants to merge 15 commits into validatedpatterns:main from minmzzhang:byo-container-registry-fresh-install

Conversation

@minmzzhang
Collaborator

@minmzzhang minmzzhang commented Feb 18, 2026

Summary

Add support for Bring-Your-Own (BYO) container registry alongside the
existing built-in Quay and embedded OCP image registry options. A single
global.registry block in values-hub.yaml centralizes registry
credentials, and charts fall back to global defaults when local values
are empty.
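
As a rough illustration of the fallback rule, the sketch below mimics in Python what Helm's `default` function does in a template: an empty or missing local value yields the global one. The function name `resolve_registry` and all sample values are assumptions made for this example, not code from the charts.

```python
from typing import Mapping

# Keys the PR description lists as shared under global.registry.
SHARED_KEYS = ("domain", "org", "user", "vaultPath", "passwordVaultKey")


def resolve_registry(local: Mapping[str, str], global_registry: Mapping[str, str]) -> dict:
    """Return per-chart registry settings, falling back to global values.

    Mirrors Helm's `default`: an empty or missing local value is replaced
    by the corresponding global value.
    """
    resolved = {}
    for key in SHARED_KEYS:
        local_value = local.get(key, "")
        resolved[key] = local_value if local_value else global_registry.get(key, "")
    return resolved


# Hypothetical values, for illustration only.
global_registry = {
    "domain": "registry.example.com",
    "org": "ztvp",
    "user": "builder",
    "vaultPath": "hub/infra/registry/",
    "passwordVaultKey": "password",
}
chart_local = {"domain": "", "org": "team-a"}

print(resolve_registry(chart_local, global_registry))
```

Here only `org` is overridden by the chart; every other key resolves to the global value.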

Registry options (configure one in values-hub.yaml)

  • Option 1: Built-in Quay Registry
  • Option 2: BYO / External Registry (quay.io, ghcr.io, etc.)
  • Option 3: Embedded OCP Image Registry

Key changes

global.registry architecture

  • Shared domain, org, user, vaultPath, passwordVaultKey in
    global.registry; per-app overrides only set option-specific flags
  • Qtodo container image derived from global.registry.domain/org inside
    the chart (no VP --set overrides needed)

Supply-chain chart

  • Unified registry.* parameters with tpl domain resolution
  • Embedded OCP automation: auto-create image namespace, pipeline SA
    system:image-builder RoleBinding, route-enabler Job (no oc CLI)
  • CronJob + sync-hook seed Job for pipeline SA token refresh to Vault
    (SPIFFE JWT) on embedded OCP
  • ArgoCD hook annotations on Jobs (Sync + HookSucceeded)

Qtodo chart

  • Unified app.images.main.registry.* with tpl domain resolution
  • Registry ExternalSecret uses global fallback

ztvp-certificates chart

  • Node-level image pull trust for kubelet (imagePullTrust.*)
  • Per-hostname ingress CA ConfigMap in openshift-config
  • Patch image.config.openshift.io/cluster additionalTrustedCA + RBAC
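
To make the image pull trust pieces concrete, here is a minimal sketch of the two objects involved: a ConfigMap in openshift-config keyed by registry hostname, and the merge-patch body that points image.config.openshift.io/cluster at it. The ConfigMap name `ztvp-registry-cas` and the helper names are assumptions for illustration; the chart's actual templates may differ.

```python
import json


def trusted_ca_configmap(name: str, ca_pem: str, registry_hosts: list[str]) -> dict:
    """Build a ConfigMap with one CA entry per registry hostname.

    OpenShift reads this ConfigMap from openshift-config. A host with a
    port is keyed as "host..port" because ":" is not valid in a key.
    """
    data = {host.replace(":", ".."): ca_pem for host in registry_hosts}
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": "openshift-config"},
        "data": data,
    }


def image_config_patch(configmap_name: str) -> str:
    """Merge-patch body that sets spec.additionalTrustedCA on the cluster image config."""
    return json.dumps({"spec": {"additionalTrustedCA": {"name": configmap_name}}})


# Hypothetical name, hosts, and certificate placeholder.
configmap = trusted_ca_configmap(
    "ztvp-registry-cas",
    "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
    ["registry.example.com", "registry.example.com:5000"],
)
print(sorted(configmap["data"]))
print(image_config_patch("ztvp-registry-cas"))
```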

ACS Central (bug fix)

  • Handle CA trust race on fresh bare-metal clusters: retry loop detects
    x509 errors, restarts Central to reload CA bundle (up to 3 retries)
  • Add apps/deployments get+patch to SA Role for rollout restart

vault-utils / ACS init RBAC

  • Sync vault-utils.sh Helm global.pattern substitution for Ansible
  • Add list/watch deployments to ACS Central cluster-init Role

Test tooling

  • scripts/gen-byo-container-registry-variants.py: generate
    values-hub.yaml variants for each registry option with full
    supply-chain stack enabled

Documentation

  • Comprehensive docs/supply-chain.md with steps for all three options
  • values-secret.yaml.template: registry-user commented out by default
    (only needed for Option 2 BYO)

Secrets model

Option             Secret source
1 - Quay           Auto-generated quay-users secret
2 - BYO            Manual registry-user in ~/values-secret-*.yaml
3 - Embedded OCP   Token refresher CronJob writes to Vault

Signed-off-by: Min Zhang <minzhang@redhat.com>

Restructure registry configuration to support three deployment states:
- Fresh install: No registry configured (both disabled by default)
- Built-in Quay: quay.enabled=true uses hub/infra/quay/ vault path
- External/BYO: externalRegistry.enabled=true uses hub/infra/registry/ path

Changes:
- Add externalRegistry.enabled flag to supply-chain and qtodo charts
- Separate vault paths for built-in Quay vs external registry
- Templates conditionally select vault path based on enabled flags
- Update supply-chain.md with BYO registry setup instructions
- Add helm template method and oc monitoring commands to supply-chain.md
- Follow VP best practice: external registry secrets in local ~/values-secret.yaml
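
The three states and their vault paths can be sketched as a small decision function. This is an illustration in Python rather than the Helm template logic itself; the commit message does not say what happens when both flags are set, so rejecting that case is an assumption of this sketch.

```python
from typing import Optional

QUAY_VAULT_PATH = "hub/infra/quay/"
EXTERNAL_VAULT_PATH = "hub/infra/registry/"


def select_vault_path(quay_enabled: bool, external_enabled: bool) -> Optional[str]:
    """Pick the vault path for the enabled registry, or None on a fresh install."""
    if quay_enabled and external_enabled:
        # Assumption: the two options are mutually exclusive.
        raise ValueError("enable either quay or externalRegistry, not both")
    if quay_enabled:
        return QUAY_VAULT_PATH
    if external_enabled:
        return EXTERNAL_VAULT_PATH
    return None  # fresh install: no registry configured


print(select_vault_path(False, False))
print(select_vault_path(True, False))
print(select_vault_path(False, True))
```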

To enable supply-chain:
1. Uncomment openshift-pipelines namespace and subscription
2. Uncomment supply-chain vault role (JWT auth)
3. Configure registry (BYO or built-in Quay) in application overrides
   - For BYO registry:
     - Set externalRegistry.enabled=true and configure registry settings
     - Add registry credentials to ~/values-secret.yaml
   - For built-in Quay:
     - Enable openshift-storage namespace
     - Enable ODF, NooBaa MCG
     - Enable Quay operator subscription, quay-registry application
4. RHTAS (signing): Enable rhtas-operator subscription and trusted-artifact-signer namespace
5. RHTPA (SBOM): Enable rhtpa-operator subscription, ODF, NooBaa, and trusted-profile-analyzer

Signed-off-by: Min Zhang <minzhang@redhat.com>
@minmzzhang
Collaborator Author

This is the same PR as #98, which was accidentally closed.

Refactor supply-chain and qtodo charts to use a single, option-agnostic
registry configuration instead of separate per-registry blocks.

Registry options (configure one in values-hub.yaml):
  - Option 1: Built-in Quay Registry
  - Option 2: BYO/External Registry (quay.io, ghcr.io, etc.)
  - Option 3: Embedded OCP Image Registry

Key changes:

Supply-chain chart:
  * Unified registry.* parameters (domain, org, user, vaultPath, passwordVaultKey)
  * Use tpl function to resolve template expressions in registry.domain values
    passed as --set parameters from the validated patterns framework
  * Embedded OCP registry automation (registry.embeddedOCP.ensureImageNamespaceRBAC):
    - Auto-create image namespace matching registry.org
    - Grant pipeline SA system:image-builder via RoleBinding
    - Enable default route on OCP image registry via Kubernetes API
      (curl-based Job using ServiceAccount token, no oc CLI dependency)
  * ArgoCD hook annotations on the route-enabler Job (Sync + HookSucceeded)
  * Rename qtodo-registry-pass to qtodo-quay-pass for clarity

Qtodo chart:
  * Unified app.images.main.registry.* parameters
  * Use tpl function in registry-external-secret.yaml for domain resolution

ztvp-certificates chart:
  * Node-level image pull trust for kubelet (imagePullTrust.*)
  * Create ConfigMap with ingress CA per registry hostname in openshift-config
  * Patch image.config.openshift.io/cluster additionalTrustedCA
  * RBAC for patching image.config.openshift.io resources

Documentation:
  * Comprehensive supply-chain.md with configuration steps for all three
    registry options, vault paths, and example overrides
  * Updated values-secret.yaml.template with registry credential examples

Signed-off-by: Min Zhang <minzhang@redhat.com>
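
For readers unfamiliar with enabling the default route through the Kubernetes API instead of the oc CLI, the request looks roughly like the sketch below. It is written in Python for readability (the Job itself is described as curl-based) and only builds the request without sending it. The helper name is an assumption; the API path and the `spec.defaultRoute` field are those of the image registry operator's cluster config.

```python
import json
import urllib.request

# Standard in-cluster API address and ServiceAccount token mount.
API_SERVER = "https://kubernetes.default.svc"
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CONFIG_PATH = "/apis/imageregistry.operator.openshift.io/v1/configs/cluster"


def build_default_route_request(token: str, api_server: str = API_SERVER) -> urllib.request.Request:
    """Build a merge-patch request that sets spec.defaultRoute to true."""
    body = json.dumps({"spec": {"defaultRoute": True}}).encode("utf-8")
    return urllib.request.Request(
        url=api_server + CONFIG_PATH,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/merge-patch+json",
        },
    )


request = build_default_route_request("example-token")
print(request.get_method(), request.full_url)
```

Inside a pod, the token would be read from `TOKEN_PATH` and the request sent with `urllib.request.urlopen` using a TLS context that trusts the ServiceAccount CA bundle. The ServiceAccount also needs RBAC permission to patch that resource.
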
@minmzzhang minmzzhang force-pushed the byo-container-registry-fresh-install branch from 19faccc to b1203c1 Compare February 18, 2026 18:08
@minmzzhang minmzzhang requested review from mlorenzofr and sabre1041 and removed request for sabre1041 February 18, 2026 18:14
…try-fresh-install

Signed-off-by: Min Zhang <minzhang@redhat.com>

# Conflicts:
#	charts/ztvp-certificates/files/extract-certificates.sh.tpl
#	values-hub.yaml
…try-fresh-install

Resolve conflict in values-hub.yaml: keep multi-option registry
configuration from BYO branch and add sync-wave annotation
(argocd.argoproj.io/sync-wave: "48") from PR validatedpatterns#109.

Signed-off-by: Min Zhang <minzhang@redhat.com>
@minmzzhang minmzzhang force-pushed the byo-container-registry-fresh-install branch 10 times, most recently from 78bb62c to 4c89939 Compare April 4, 2026 03:29
Add scripts/gen-byo-container-registry-variants.py that reads the base
values-hub.yaml (all supply-chain components commented out) and produces
up to 3 variants with the chosen registry option enabled:

  Option 1: Built-in Quay Registry
  Option 2: BYO / External Registry
  Option 3: Embedded OCP Image Registry

Each variant also enables the common supply-chain stack (OpenShift
Pipelines, ODF, NooBaa, RHTAS, RHTPA, and their namespaces,
subscriptions, vault roles).

Signed-off-by: Min Zhang <minzhang@redhat.com>
On a fresh bare-metal cluster the proxy trustedCA injection may not
have propagated to Central's mounted CA bundle by the time the
create-auth-provider Job runs.  Central caches its TLS trust pool at
startup, so all Job retries fail with "x509: certificate signed by
unknown authority" when Central tries to validate the Keycloak OIDC
discovery endpoint.

- Add retry loop in create-auth-provider Job that detects the specific
  TLS CA error, restarts Central to reload the CA bundle, then retries
  (up to 3 times)
- Add apps/deployments get+patch to the service account Role so the
  Job can run "oc rollout restart"
- Refactor script: extract wait_for_central() and escape_sed() helpers

Signed-off-by: Min Zhang <minzhang@redhat.com>
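
The control flow of the retry loop described above can be sketched as follows. The real Job is a shell script; this Python version only illustrates the logic, and it assumes that "up to 3 times" means one initial attempt plus at most three restart-and-retry cycles. The callables `create` and `restart_central` are hypothetical stand-ins for the script's steps.

```python
from typing import Callable

X509_ERROR = "x509: certificate signed by unknown authority"


def create_auth_provider_with_retry(
    create: Callable[[], str],
    restart_central: Callable[[], None],
    max_retries: int = 3,
) -> bool:
    """Try to create the auth provider, restarting Central on CA trust errors.

    `create` returns the error output of one attempt ("" on success).
    `restart_central` is expected to restart Central and wait until it is ready.
    """
    for attempt in range(max_retries + 1):
        error = create()
        if not error:
            return True
        if X509_ERROR not in error:
            # Unrelated failure: restarting Central would not help.
            return False
        if attempt < max_retries:
            restart_central()
    return False
```

Restarting only on the specific x509 message keeps unrelated failures from triggering needless Central restarts.
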
…odo image in chart

Move shared registry credentials (domain, org, user, vaultPath, passwordVaultKey)
into a single global.registry block in values-hub.yaml. Supply-chain and qtodo
charts fall back to global.registry.* when local registry values are empty.

Derive the qtodo container image from global.registry.domain/org when registry
is enabled, avoiding Validated Patterns --set overrides (Helm templates are not
available there).

- Add global.registry defaults to supply-chain and qtodo chart values
- Update templates to use | default .Values.global.registry.*
- Simplify values-hub.yaml application overrides for option-specific flags
- Rewrite gen-byo-container-registry-variants.py for the structure
- Update docs/supply-chain.md for global.registry architecture

Signed-off-by: Min Zhang <minzhang@redhat.com>
Sync common/scripts/vault-utils.sh (Helm global.pattern substitution for
Ansible) and charts/acs-central cluster-init Role (list/watch deployments)
from embedded-ocp-registry for parity across registry option branches.

Signed-off-by: Min Zhang <minzhang@redhat.com>
@minmzzhang minmzzhang force-pushed the byo-container-registry-fresh-install branch from f6d956b to f716fcd Compare April 7, 2026 15:03
Add CronJob and sync-hook seed Job for pipeline SA token refresh to Vault
(SPIFFE JWT). Extend supply-chain values, docs/supply-chain.md, and
values-hub for embedded OCP (merged with fresh-install baseline).

Signed-off-by: Min Zhang <minzhang@redhat.com>
@minmzzhang minmzzhang force-pushed the byo-container-registry-fresh-install branch from f716fcd to 78229ad Compare April 7, 2026 16:48
Two bugs in gen-byo-container-registry-variants.py:

1. The supply-chain JWT role subject regex used ns/pipeline which no
   longer matches after the namespace was changed to
   {{ $.Values.global.pattern }}-hub.  Changed to sa/pipeline which
   matches both old and new formats.

2. enable_image_pull_trust looked for the stale <registry-hostname>
   placeholder.  Changed to match by position (value line after the
   imagePullTrust.registries line) so it works regardless of the
   default value in the base file.

Signed-off-by: Min Zhang <minzhang@redhat.com>
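
The second fix (match by position rather than by placeholder text) could look something like this. It is a sketch only: it assumes the override is written as a `name:` line followed by a `value:` line, and the function name `set_value_after_name` is hypothetical, not taken from the script.

```python
def set_value_after_name(lines: list[str], name: str, new_value: str) -> list[str]:
    """Replace the value on the line that follows the first line containing `name`.

    The old value is ignored, so the edit works whatever default the base
    file carries. Indentation and any comment prefix are preserved.
    """
    result = list(lines)
    for index in range(len(lines) - 1):
        if name in lines[index]:
            following = lines[index + 1]
            prefix, separator, _old_value = following.partition("value:")
            if separator:
                result[index + 1] = f"{prefix}value: {new_value}"
            break
    return result


# Hypothetical base-file fragment.
base = [
    "- name: imagePullTrust.registries",
    "  value: <registry-hostname>",
]
print(set_value_after_name(base, "imagePullTrust.registries", "registry.example.com"))
```
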
- Comment out registry-user in values-secret.yaml.template (was active
  by default but unnecessary for minimal deployments)
- Update supply-chain.md step 2 to clarify that only Option 2 (BYO
  registry) needs the manual registry-user secret
- Option 1 (Quay) uses auto-generated quay-users secret
- Option 3 (embedded OCP) token refresher writes to Vault automatically

Signed-off-by: Min Zhang <minzhang@redhat.com>
@minmzzhang minmzzhang force-pushed the byo-container-registry-fresh-install branch from 6455f14 to 808914a Compare April 7, 2026 22:29
Collaborator

@sabre1041 sabre1041 left a comment


Hi @minmzzhang

This is really good work! I confirmed that each of the 3 options was successful. I added a number of comments inline. In addition, there are a few areas where I'd like to see further refinement:

  1. Rename references to OCP as OpenShift, since OCP is a particular subscription type
  2. qtodo image: Review how to define and manage the reference to the qtodo image. Currently, the user can define the domain and namespace. However, when the image is stored in an external registry, there is no guarantee that the name of the image will be qtodo
  3. When using the internal OpenShift registry, what can be done to improve the handling of the ServiceAccount token? The primary concern is that the CronJob that populates the token in Vault only runs (by default) every 6 hours. As a result, the secret may not be populated in Vault for up to six hours. I had to manually start the CronJob to use the pipeline.

Comment thread charts/qtodo/templates/_helpers.tpl Outdated
Comment thread charts/qtodo/templates/app-serviceaccount.yaml Outdated
Comment thread charts/qtodo/templates/app-deployment.yaml Outdated
Comment thread charts/qtodo/values.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-image-namespace.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-image-namespace.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-token-refresher.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-token-refresher.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-token-refresher.yaml Outdated
Comment thread charts/supply-chain/templates/rbac/registry-token-refresher.yaml Outdated
minmzzhang added a commit to minmzzhang/layered-zero-trust that referenced this pull request Apr 13, 2026
…efactor

Update feature YAML files and gen-feature-variants script/docs for:
- org -> repository (e.g. "ztvp/qtodo")
- embeddedOCP -> embeddedOpenShift
- Rename option-3-embedded-ocp.yaml -> option-3-embedded-openshift.yaml

Signed-off-by: Min Zhang <minzhang@redhat.com>
minmzzhang added a commit to minmzzhang/layered-zero-trust that referenced this pull request Apr 14, 2026
…efactor

Update feature YAML files and gen-feature-variants script/docs for:
- org -> repository (e.g. "ztvp/qtodo")
- embeddedOCP -> embeddedOpenShift
- Rename option-3-embedded-ocp.yaml -> option-3-embedded-openshift.yaml

Signed-off-by: Min Zhang <minzhang@redhat.com>
- Rename org -> repository throughout (global.registry and supply-chain)
- Rename embeddedOCP -> embeddedOpenShift in supply-chain templates and docs
- Scope registry image rewrite via useRegistry flag in qtodo.image helper
- Guard imagePullSecrets on vaultPath being set (not just registry.enabled)
- Add Vault auth retry loop to refresh_registry_token.sh for seed Job timing
- Extract image namespace from first path component of repository (splitList)
- Update docs/supply-chain.md with new parameter names and examples

Signed-off-by: Min Zhang <minzhang@redhat.com>
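
As a small illustration of the repository handling, the sketch below shows the "first path component" rule that the commit implements with Helm's `splitList`, written here in Python. The helper names and the `latest` tag are assumptions for the example.

```python
def image_namespace(repository: str) -> str:
    """Return the first path component of a repository such as "ztvp/qtodo"."""
    return repository.split("/")[0]


def image_reference(domain: str, repository: str, tag: str = "latest") -> str:
    """Join registry domain, repository, and tag into an image reference."""
    return f"{domain}/{repository}:{tag}"


print(image_namespace("ztvp/qtodo"))
print(image_reference("registry.example.com", "ztvp/qtodo"))
```

With `repository: ztvp/qtodo`, the image namespace is `ztvp`, and the image name no longer has to be hard-coded as `qtodo`.
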
@minmzzhang
Collaborator Author

Hi @minmzzhang

This is really good work! I confirmed that each of the 3 options was successful. I added a number of comments inline. In addition, there are a few areas where I'd like to see further refinement:

  1. Rename references to OCP as OpenShift, since OCP is a particular subscription type
  2. qtodo image: Review how to define and manage the reference to the qtodo image. Currently, the user can define the domain and namespace. However, when the image is stored in an external registry, there is no guarantee that the name of the image will be qtodo
  3. When using the internal OpenShift registry, what can be done to improve the handling of the ServiceAccount token? The primary concern is that the CronJob that populates the token in Vault only runs (by default) every 6 hours. As a result, the secret may not be populated in Vault for up to six hours. I had to manually start the CronJob to use the pipeline.

For the token refresh: on a fresh install, the seed Job (registry-token-refresher-seed) runs as an ArgoCD Sync hook on every sync to populate the token immediately. The 6-hour CronJob schedule handles the ongoing refresh, given the 48-hour token lifetime configured by registry.embeddedOpenShift.tokenRefresher.tokenDuration. However, the seed Job can fail on the first deploy if there is a startup race between Vault and JWT auth, so a retry loop was added to the seed Job to give those dependencies time to stabilize.
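
To put numbers on the schedule, here is a short worked example of how many scheduled refresh runs fall inside one token lifetime. It assumes the token is issued at time zero and the CronJob fires at every multiple of the interval after that; the function name is made up for the example.

```python
from datetime import timedelta


def refreshes_before_expiry(token_duration: timedelta, refresh_interval: timedelta) -> int:
    """Count scheduled refresh runs that fall strictly before the token expires."""
    runs = token_duration // refresh_interval  # timedelta // timedelta gives an int
    if token_duration % refresh_interval == timedelta(0):
        runs -= 1  # a run that lands exactly on the expiry time is too late
    return runs


print(refreshes_before_expiry(timedelta(hours=48), timedelta(hours=6)))
```

With a 48-hour token and a 6-hour schedule there are seven runs (at hours 6 through 42) before the token expires, so several consecutive CronJob failures can be absorbed once the seed Job has written the first token.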

minmzzhang added a commit to minmzzhang/layered-zero-trust that referenced this pull request Apr 14, 2026
…efactor

Update feature YAML files and gen-feature-variants script/docs for:
- org -> repository (e.g. "ztvp/qtodo")
- embeddedOCP -> embeddedOpenShift
- Rename option-3-embedded-ocp.yaml -> option-3-embedded-openshift.yaml

Signed-off-by: Min Zhang <minzhang@redhat.com>
Collaborator

@sabre1041 sabre1041 left a comment


Additional comments based on testing

Comment thread values-hub.yaml Outdated
Comment thread values-hub.yaml Outdated
Comment thread values-hub.yaml Outdated
Comment thread common/scripts/vault-utils.sh Outdated
Comment thread values-hub.yaml Outdated
Comment thread scripts/gen-byo-container-registry-variants.py Outdated
Comment thread scripts/gen-byo-container-registry-variants.py Outdated
Comment thread scripts/gen-byo-container-registry-variants.py Outdated
Comment thread scripts/gen-byo-container-registry-variants.py Outdated
  namespace: {{ .Values.global.namespace }}
  annotations:
    # Run after wave 0 (ConfigMaps, RBAC, ztvp ns) and wave 1 (enable-registry-default-route hook).
    argocd.argoproj.io/sync-wave: "10"
Collaborator


Is it a concern that this uses a sync wave of 10? Since there are ExternalSecrets at a lower sync wave (qtodo-registry-auth at sync wave 0), this Job is never triggered.

Collaborator Author


Updated the sync-wave order in the supply-chain chart.

- Rename OCP_DOMAIN to OPENSHIFT_DOMAIN in vault-utils.sh
- Use repository: ztvp/qtodo for all 3 registry options in values-hub.yaml
- Add sync-wave "15" to qtodo-registry-auth ExternalSecret so it runs
  after the registry-token-refresher-seed Job at wave 10, preventing a
  deadlock where the ExternalSecret blocks Argo from reaching the seed
- Update SYNC-WAVE-INVENTORY.md with full supply-chain chart internals

Signed-off-by: Min Zhang <minzhang@redhat.com>
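
The ordering problem and its fix can be illustrated with a toy model of sync waves. This Python sketch only sorts resources by wave (ties broken by name, which simplifies Argo CD's real ordering); the key point is that Argo CD waits for each wave to become healthy before moving on, so a resource that can never become healthy blocks every later wave.

```python
def sync_order(resources: dict[str, int]) -> list[str]:
    """Return resource names in ascending sync-wave order (ties broken by name)."""
    return sorted(resources, key=lambda name: (resources[name], name))


# Before: the ExternalSecret waits on a Vault secret that only the seed Job creates.
before = {"qtodo-registry-auth": 0, "registry-token-refresher-seed": 10}
# After: the ExternalSecret moves to wave 15, behind the seed Job at wave 10.
after = {"qtodo-registry-auth": 15, "registry-token-refresher-seed": 10}

print(sync_order(before))
print(sync_order(after))
```
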
Signed-off-by: Min Zhang <minzhang@redhat.com>
- Rename org -> repository: ztvp/qtodo for Options 1 and 3
- Rename Embedded OCP -> Embedded OpenShift throughout
- Rename embeddedOCP -> embeddedOpenShift in supply-chain overrides
- Update domain/org -> domain/repository in comments

Signed-off-by: Min Zhang <minzhang@redhat.com>
2 participants