This commit adds native pg_tde extension support to the operator.
**This commit only adds Vault KMS support for pg_tde. KMIP support will
be added in future releases.**
When pg_tde is enabled and Vault configuration is provided, the operator:
- appends pg_tde into shared_preload_libraries,
- mounts Vault token and CA secrets into database containers,
- runs CREATE EXTENSION in all databases,
- creates Vault provider by running pg_tde_add_global_key_provider_vault_v2,
- creates a global key by running pg_tde_create_key_using_global_key_provider,
- sets the default key by running pg_tde_set_default_key_using_global_key_provider.
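The SQL part of the list above can be sketched roughly as follows. This is only an illustration, not the operator's code: the `VaultProvider` type, the provider name and the file paths are hypothetical, and the argument order of the pg_tde functions is an assumption that should be checked against the pg_tde docs for the release in use.

```go
package main

import (
	"fmt"
	"strings"
)

// VaultProvider groups the values passed to pg_tde when registering Vault.
// The type and its field names are hypothetical.
type VaultProvider struct {
	Name      string // provider name inside pg_tde
	URL       string // Vault address (the `host` field of the CR)
	MountPath string // KV v2 mount
	TokenPath string // file holding the Vault token inside the container
	CAPath    string // file holding the CA certificate
}

// quote renders s as a SQL string literal, doubling embedded single quotes.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// setupStatements returns the SQL in the order listed in the commit message.
// The argument order of the pg_tde functions is an assumption.
func setupStatements(p VaultProvider, keyName string) []string {
	return []string{
		"CREATE EXTENSION IF NOT EXISTS pg_tde",
		fmt.Sprintf("SELECT pg_tde_add_global_key_provider_vault_v2(%s, %s, %s, %s, %s)",
			quote(p.Name), quote(p.URL), quote(p.MountPath), quote(p.TokenPath), quote(p.CAPath)),
		fmt.Sprintf("SELECT pg_tde_create_key_using_global_key_provider(%s, %s)",
			quote(keyName), quote(p.Name)),
		fmt.Sprintf("SELECT pg_tde_set_default_key_using_global_key_provider(%s, %s)",
			quote(keyName), quote(p.Name)),
	}
}

func main() {
	p := VaultProvider{
		Name:      "vault-provider", // hypothetical name
		URL:       "https://vault-service.vault-service.svc:8200",
		MountPath: "tde",
		TokenPath: "/pgconf/tde/token",  // hypothetical path
		CAPath:    "/pgconf/tde/ca.crt", // hypothetical path
	}
	for _, stmt := range setupStatements(p, "global-master-key-example") {
		fmt.Println(stmt + ";")
	}
}
```

The first two bullets (shared_preload_libraries and the secret mounts) are pod and config changes, so they are not part of this sketch.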
-> Example configuration
pg_tde:
  enabled: true
  vault:
    host: https://vault-service.vault-service.svc:8200
    mountPath: tde
    tokenSecret:
      name: vault-secret
      key: token
    caSecret:
      name: vault-secret
      key: ca.crt
Note that:
- The mount path needs to be a KV v2 secrets engine.
- caSecret is optional and can be omitted if you want to use plain HTTP.
  But in my testing I couldn't manage to make Vault work without TLS. It
  responds with HTTP 405 if I disable TLS in Vault.
- tokenSecret and caSecret can be the same secret or different ones. The
  operator doesn't assume anything about the contents of the secrets,
  since you need to set the secret keys in cr.yaml yourself.
- Using a non-root token requires more configuration. Check out the
  pg_tde docs for that. But don't forget to add these to the Vault policy:
```
path "sys/internal/ui/mounts/*" {
  capabilities = ["read"]
}
path "sys/mounts/*" {
  capabilities = ["read"]
}
```
-> API changes
pg_tde requires more configuration options than the other extensions the
operator supports. This required us to make some changes in the
extensions API. With these changes, the 'spec.extensions.builtin' section
is deprecated and all builtin extensions are moved to
'spec.extensions.<extension>' (e.g. 'spec.extensions.pg_stat_monitor').
Right now extensions can be enabled/disabled with both the old and the
new method. If the two methods are used at the same time,
'spec.extensions.builtin' takes precedence.
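To make the precedence rule concrete, here is a minimal sketch. The function name and the `false` fallback are assumptions for illustration; the real defaults may differ per extension.

```go
package main

import "fmt"

// extensionEnabled resolves the effective state of a single extension.
// builtin is the deprecated spec.extensions.builtin.<ext> value and
// current is spec.extensions.<ext>.enabled. nil means "not set".
func extensionEnabled(builtin, current *bool) bool {
	if builtin != nil {
		// The deprecated field wins when both are set.
		return *builtin
	}
	if current != nil {
		return *current
	}
	// Assumption: an extension that is not mentioned is disabled.
	return false
}

func main() {
	on, off := true, false
	fmt.Println(extensionEnabled(&off, &on)) // old field set: it takes precedence
	fmt.Println(extensionEnabled(nil, &on))  // only the new field is set
}
```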
-> Status changes
A hash is calculated from the pg_tde configuration provided by the user.
The operator uses this hash to detect that the config has changed and
that it should reconfigure pg_tde. The hash can be found in the
status.pgTDERevision field of the **PostgresCluster** object. The hash is
removed when pg_tde is disabled.
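A revision hash like this can be sketched as below. The struct names, the JSON encoding and SHA-256 are assumptions chosen for the example; the commit message doesn't say which fields or which hash function the operator actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// SecretKeyRef and VaultConfig are hypothetical types that mirror the
// example configuration in this commit message.
type SecretKeyRef struct {
	Name string `json:"name"`
	Key  string `json:"key"`
}

type VaultConfig struct {
	Host        string        `json:"host"`
	MountPath   string        `json:"mountPath"`
	TokenSecret SecretKeyRef  `json:"tokenSecret"`
	CASecret    *SecretKeyRef `json:"caSecret,omitempty"`
}

// revision hashes the JSON form of the config. Struct fields are encoded
// in declaration order, so equal configs always produce the same hash.
func revision(v VaultConfig) (string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	cfg := VaultConfig{
		Host:        "https://vault-service.vault-service.svc:8200",
		MountPath:   "tde",
		TokenSecret: SecretKeyRef{Name: "vault-secret", Key: "token"},
	}
	before, _ := revision(cfg)
	cfg.TokenSecret.Name = "vault-secret-new"
	after, _ := revision(cfg)
	fmt.Println(before != after) // a changed config yields a new revision
}
```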
The operator also communicates the status of pg_tde with conditions. The
condition with type=PGTDEEnabled can be found in both the
PerconaPGCluster and PostgresCluster statuses.
-> Disabling pg_tde
Disabling pg_tde is more complex than disabling other extensions:
- First of all, any encrypted objects must be dropped before disabling.
  Otherwise DROP EXTENSION will fail with a descriptive error message.
  **The operator won't drop anything; the user needs to do this manually.**
- The extension needs to be disabled in two steps:
  1. First set pg_tde.enabled=false without removing the vault section.
     The operator will drop the extension and restart the pods.
  2. Then you can remove pg_tde.vault. Database pods will be restarted
     again to remove the secret mounts from the containers.
- It's recommended to run CHECKPOINT before removing pg_tde.vault. Even
  though the extension is dropped, Postgres might still try to use
  encrypted objects during recovery after a restart, and it might try to
  access the token secret. CHECKPOINT helps you prevent this failure case.
-> Deleting and recreating clusters
If a cluster with pg_tde enabled is deleted but its PVCs are retained, on
recreation you'll see some errors about pg_tde in the operator logs. They
happen because the Vault provider and/or the global key already exist.
The operator handles these errors gracefully and configures pg_tde. The
same applies when pg_tde is disabled and re-enabled: since both the Vault
provider and the global key already exist, the operator handles the
"already exists" errors and configures pg_tde.
The global key name is derived from the cluster's .metadata.uid, for
example 'global-master-key-ad19534a-d778-460e-ac87-ca38ef5e6755'. This
means the key changes if the cluster is deleted and recreated. As long as
both the old key and the new key are accessible to pg_tde, this won't
cause any issues. pg_tde handles it the same way it handles key rotation.
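As a small illustration of why a recreated cluster ends up with a new key: the name is a function of the UID, and Kubernetes assigns a fresh UID on recreation. The helper name is hypothetical, and the prefix is inferred from the single example above.

```go
package main

import "fmt"

// globalKeyName derives the pg_tde global key name from the cluster UID.
// The "global-master-key-" prefix is inferred from the example above.
func globalKeyName(uid string) string {
	return "global-master-key-" + uid
}

func main() {
	oldUID := "ad19534a-d778-460e-ac87-ca38ef5e6755"
	newUID := "00000000-0000-0000-0000-000000000000" // made-up UID after recreation
	fmt.Println(globalKeyName(oldUID))
	fmt.Println(globalKeyName(oldUID) != globalKeyName(newUID))
}
```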
-> Validations
- You can't set pg_tde.enabled=true without setting pg_tde.vault.
- If you already had pg_tde.enabled=true, you can't remove the pg_tde
  section completely.
- If you already had pg_tde.enabled=true, you can't remove the
  pg_tde.vault section completely.
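The three rules can be expressed as a check over the previous and the next spec. This is a plain Go sketch with hypothetical types, only meant to show the transitions that get rejected; in the operator these rules live in the CRD validations.

```go
package main

import (
	"errors"
	"fmt"
)

// Vault and PGTDE are hypothetical, trimmed-down versions of the spec.
type Vault struct{ Host string }

type PGTDE struct {
	Enabled bool
	Vault   *Vault
}

// validate checks a change from prev to next. A nil pointer means the
// pg_tde section is absent.
func validate(prev, next *PGTDE) error {
	if next != nil && next.Enabled && next.Vault == nil {
		return errors.New("pg_tde.vault is required when pg_tde.enabled is true")
	}
	if prev != nil && prev.Enabled {
		if next == nil {
			return errors.New("pg_tde cannot be removed while it was enabled")
		}
		if next.Vault == nil {
			return errors.New("pg_tde.vault cannot be removed while pg_tde was enabled")
		}
	}
	return nil
}

func main() {
	enabled := &PGTDE{Enabled: true, Vault: &Vault{Host: "https://vault:8200"}}
	disabled := &PGTDE{Enabled: false, Vault: enabled.Vault}

	fmt.Println(validate(enabled, nil))          // rejected: section removed in one go
	fmt.Println(validate(enabled, disabled))     // accepted: step 1 of disabling
	fmt.Println(validate(disabled, &PGTDE{}))    // accepted: step 2 of disabling
}
```

Note how the accepted path matches the two-step disable procedure: first enabled=false with the vault section kept, then the vault section removed.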
---------
K8SPG-911: pg_tde improvements/fixes
- add pg version validation
- explicitly disable wal encryption
- enable pg_tde in restore job
- [e2e] read from all pods after restore
- use pg_tde binaries in patroni
- fix vault provider change
All items except the last are straightforward. Fixing the Vault provider
change required a lot of changes.
The problem with changing the Vault token in pg_tde was that pg_tde
requires both the new and the old token at the same time to perform the
change. This is not trivial to achieve on K8s, since the operator needs
to mount the new secret into the pods while somehow keeping the old
secret mounted.
To achieve this, the operator performs the provider change in two phases:
1. In the first phase, the operator keeps the old secret mounted in the
   pod and prevents a restart. Then it fetches the new secret contents
   and stores them in temporary files in the `/pgdata` directory. Then
   the operator runs pg_tde_change_global_key_provider_vault_v2.
2. In the second phase, the operator mounts the new secret and restarts
   the pods. Then it runs pg_tde_change_global_key_provider_vault_v2 with
   the standard credential paths. At the end of this phase, the temporary
   files are cleaned up.
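The difference between the two phases is mostly which credential files pg_tde is pointed at. A rough sketch, where only `/pgdata` comes from the description above; the mount directory and all file names are hypothetical:

```go
package main

import (
	"fmt"
	"path"
)

const (
	tempDir  = "/pgdata"     // temporary credentials live here during phase 1
	mountDir = "/pgconf/tde" // hypothetical mount path of the Vault secret
)

// credentialPaths holds the files passed to the provider change call.
type credentialPaths struct {
	Token string
	CA    string
}

// pathsForPhase returns the credential files used in each phase.
// File names are hypothetical.
func pathsForPhase(phase int) (credentialPaths, error) {
	switch phase {
	case 1:
		// New credentials are copied to temp files while the old secret
		// stays mounted and the pods are not restarted.
		return credentialPaths{
			Token: path.Join(tempDir, "tde-new-token"),
			CA:    path.Join(tempDir, "tde-new-ca.crt"),
		}, nil
	case 2:
		// The new secret is mounted, so the standard paths are used and
		// the temp files can be removed afterwards.
		return credentialPaths{
			Token: path.Join(mountDir, "token"),
			CA:    path.Join(mountDir, "ca.crt"),
		}, nil
	default:
		return credentialPaths{}, fmt.Errorf("unknown phase %d", phase)
	}
}

func main() {
	for phase := 1; phase <= 2; phase++ {
		p, err := pathsForPhase(phase)
		if err != nil {
			panic(err)
		}
		fmt.Printf("phase %d: token=%s ca=%s\n", phase, p.Token, p.CA)
	}
}
```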
local command=${1}
local uri=${2}
local driver=${3:-postgres}
[shfmt] reported by reviewdog 🐶
percona-postgresql-operator/e2e-tests/functions
Lines 1705 to 1711 in d0f42ba
Pull request overview
Adds native pg_tde (PostgreSQL Transparent Data Encryption) support to the operator, including Vault (KV v2) KMS integration, extension lifecycle management, and e2e coverage. This fits into the operator’s extension-management and reconciliation flow by wiring pg_tde into pod specs, Patroni config, restore jobs, and status/conditions.
Changes:
- Introduces new pg_tde API fields (Vault config, status revision + condition) and updates deep-copies/CRDs/bundles accordingly.
- Implements runtime reconciliation for pg_tde (shared_preload_libraries, CREATE/DROP EXTENSION, Vault provider + global key management, two-phase provider rotation using temp credentials on /pgdata).
- Adds/updates E2E tests for pg_tde (enable, verify encryption, backup/restore, token rotation, disable + remove config) and updates extension API usage in existing tests.
Reviewed changes
Copilot reviewed 60 out of 62 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/apis/postgres-operator.crunchydata.com/v1beta1/zz_generated.deepcopy.go | DeepCopy updates for new PGTDE/extensions fields. |
| pkg/apis/postgres-operator.crunchydata.com/v1beta1/postgrescluster_types.go | Adds pg_tde spec types, status revision, and condition constant. |
| pkg/apis/postgres-operator.crunchydata.com/v1beta1/postgrescluster_test.go | Adjusts YAML fixtures to include extensions.pg_tde. |
| pkg/apis/pgv2.percona.com/v2/zz_generated.deepcopy.go | Adds DeepCopy for new extension spec types and PGTDE. |
| pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go | Deprecates extensions.builtin, adds extensions.<ext>.enabled shape + defaults + Crunchy mapping + PG17 validation for pg_tde. |
| percona/controller/pgcluster/controller_test.go | Adds CRD validation tests for pg_tde enable/disable/transition rules. |
| percona/controller/pgbackup/controller.go | Calls pg.Default() before updating cluster during backup start. |
| internal/postgres/reconcile.go | Adds projected secret volume + mount helpers for pg_tde Vault credentials and wires into instance pods. |
| internal/pgvector/postgres.go | Fixes comments to correctly reference pgvector (no functional change). |
| internal/pgtde/postgres.go | New module: installs/drops extension, sets PG parameters, manages Vault provider/global key/default key, supports provider change. |
| internal/pgtde/postgres_test.go | New unit tests for pgtde SQL execution and provider logic. |
| internal/pgbackrest/config.go | Extends restore command to optionally preload pg_tde during restore. |
| internal/pgbackrest/config_test.go | Updates tests for new RestoreCommand signature/behavior. |
| internal/patroni/config.go | Adds Patroni bin_name overrides when pg_tde is enabled. |
| internal/patroni/config_test.go | Tests presence/absence of bin_name when pg_tde enabled/disabled. |
| internal/naming/names.go | Adds constants for pg_tde volume name/mount path/provider/key names. |
| internal/naming/annotations.go | Adds tde-installed annotation constant used for orchestration. |
| internal/controller/postgrescluster/postgres.go | Wires pgtde extension reconciliation into database reconcile and adds provider reconcile (two-phase secret rotation). |
| internal/controller/postgrescluster/pgbackrest.go | Ensures restore job enables pg_tde and mounts Vault secrets when configured. |
| internal/controller/postgrescluster/instance.go | Adds logic to hold old TDE volume during provider rotation; adds vault revision hashing helper. |
| internal/controller/postgrescluster/controller.go | Adds pg_tde parameters into PostgreSQL config until extension is fully removed; calls provider reconcile after DB reconcile. |
| e2e-tests/vars.sh | Adds VAULT_VER for Vault Helm install. |
| e2e-tests/tests/upgrade-minor/05-sleep-after-operator-update.yaml | Adjusts timeout placement/value for the sleep step. |
| e2e-tests/tests/pg-tde/00-deploy-operator.yaml | New e2e: deploy operator/client for pg_tde scenario. |
| e2e-tests/tests/pg-tde/00-assert.yaml | New e2e: asserts operator deployment readiness. |
| e2e-tests/tests/pg-tde/01-deploy-vault.yaml | New e2e: deploy Vault (TLS). |
| e2e-tests/tests/pg-tde/01-assert.yaml | New e2e: asserts Vault secret exists. |
| e2e-tests/tests/pg-tde/02-create-cluster.yaml | New e2e: enables pg_tde + Vault config on cluster. |
| e2e-tests/tests/pg-tde/02-assert.yaml | New e2e: asserts mounts/annotation/condition/revision on enable. |
| e2e-tests/tests/pg-tde/03-write-data.yaml | New e2e: creates encrypted table + writes data. |
| e2e-tests/tests/pg-tde/04-verify-encryption.yaml | New e2e: verifies extension + encryption + key verification. |
| e2e-tests/tests/pg-tde/04-assert.yaml | New e2e: asserts verification results via ConfigMaps. |
| e2e-tests/tests/pg-tde/05-create-backup.yaml | New e2e: triggers a full backup. |
| e2e-tests/tests/pg-tde/05-assert.yaml | New e2e: asserts backup job + status succeeded. |
| e2e-tests/tests/pg-tde/06-write-data.yaml | New e2e: writes additional data pre-restore. |
| e2e-tests/tests/pg-tde/07-create-restore.yaml | New e2e: triggers restore. |
| e2e-tests/tests/pg-tde/07-assert.yaml | New e2e: asserts restore succeeded + cluster ready. |
| e2e-tests/tests/pg-tde/08-read-data.yaml | New e2e: reads from primary + replicas after restore. |
| e2e-tests/tests/pg-tde/08-assert.yaml | New e2e: asserts read results from all pods. |
| e2e-tests/tests/pg-tde/09-change-vault-provider.yaml | New e2e: rotates Vault token/secret and applies CR update. |
| e2e-tests/tests/pg-tde/09-assert.yaml | New e2e: asserts new secret is mounted + revision updated + condition true. |
| e2e-tests/tests/pg-tde/10-verify-after-change.yaml | New e2e: verifies reads + key verification + temp-file cleanup. |
| e2e-tests/tests/pg-tde/10-assert.yaml | New e2e: asserts read-after-change values. |
| e2e-tests/tests/pg-tde/11-disable-pgtde.yaml | New e2e: drops encrypted objects, checkpoints, disables pg_tde while keeping vault. |
| e2e-tests/tests/pg-tde/11-assert.yaml | New e2e: asserts secrets still mounted while disabled + condition false. |
| e2e-tests/tests/pg-tde/12-remove-pgtde-config.yaml | New e2e: removes pg_tde config from spec (post-disable step). |
| e2e-tests/tests/pg-tde/12-assert.yaml | New e2e: asserts mounts removed and condition remains disabled. |
| e2e-tests/tests/custom-extensions/00-deploy-operator.yaml | Timeout placement update. |
| e2e-tests/tests/builtin-extensions/00-deploy-operator.yaml | Timeout placement update. |
| e2e-tests/tests/builtin-extensions/03-install-all-ext.yaml | Updates builtin-extensions test to new extensions.<ext>.enabled shape. |
| e2e-tests/tests/builtin-extensions/06-uninstall-all-ext.yaml | Updates builtin-extensions test to new extensions.<ext>.enabled shape. |
| e2e-tests/run-release.csv | Adds pg-tde suite to release run list. |
| e2e-tests/run-pr.csv | Adds pg-tde suite to PR run list. |
| e2e-tests/functions | Adds helpers for psql execution/URI construction and Vault deployment logic. |
| deploy/cw-bundle.yaml | CRD bundle updates for new extension schema + validations + status field. |
| deploy/crd.yaml | CRD bundle updates for new extension schema + validations + status field. |
| deploy/cr.yaml | Example CR updates for new extension schema + pg_tde example config. |
| deploy/bundle.yaml | CRD bundle updates for new extension schema + validations + status field. |
| config/crd/bases/postgres-operator.crunchydata.com_postgresclusters.yaml | Base CRD updates for pg_tde spec + validations + status revision. |
| config/crd/bases/pgv2.percona.com_perconapgclusters.yaml | Base CRD updates for extension schema changes + pg_tde PG17 validation. |
| build/crd/percona/generated/pgv2.percona.com_perconapgclusters.yaml | Generated Percona CRD updates reflecting new extension schema + validations. |
| build/crd/crunchy/generated/postgres-operator.crunchydata.com_postgresclusters.yaml | Generated Crunchy CRD updates reflecting pg_tde schema + validations. |
Comments suppressed due to low confidence (1)
percona/controller/pgbackup/controller.go:699
- Problem: startBackup now calls pg.Default() on the fetched PerconaPGCluster and then persists the whole object via Update; Default() now populates both the new spec.extensions.<ext>.enabled pointers and the deprecated spec.extensions.builtin.* pointers.
Why it matters: Persisting those defaulted builtin pointers can unintentionally make the deprecated builtin path appear "set" and override future user changes made via spec.extensions.<ext> (because SetExtensionDefaults() prefers builtin when non-nil).
Fix: Avoid persisting unrelated defaults here: either remove pg.Default() from startBackup, or switch this update to a targeted Patch that only modifies the backup annotations and spec.backups.pgbackrest.manual fields (leaving extensions fields untouched).
if cluster.Spec.Extensions.PGTDE.Enabled {
	postgresqlSection := root["postgresql"].(map[string]any)
	postgresqlSection["bin_name"] = map[string]any{
		"pg_basebackup": "pg_tde_basebackup",
		"pg_rewind":     "pg_tde_rewind",
	}
}
Problem: clusterYAML only sets Patroni postgresql.bin_name when spec.extensions.pg_tde.enabled is true, but during the pg_tde disable flow there is an intermediate reconcile where enabled=false while the extension is still installed (condition PGTDEEnabled=True until the SQL drop succeeds).
Why it matters: This can remove the pg_tde pg_basebackup/pg_rewind binary overrides too early, potentially triggering a rollout/config mismatch before pg_tde is actually removed.
Fix: Gate bin_name on (spec.extensions.pg_tde.enabled || status.conditions[type=PGTDEEnabled].status==True) (same pattern used for shared_preload_libraries in the main controller).
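The suggested gating could look roughly like this. It is a standalone sketch: the `condition` type stands in for the Kubernetes condition type, and the helper names are hypothetical.

```go
package main

import "fmt"

// condition is a minimal stand-in for a Kubernetes status condition.
type condition struct {
	Type   string
	Status string
}

// conditionTrue reports whether the named condition exists with status "True".
func conditionTrue(conds []condition, name string) bool {
	for _, c := range conds {
		if c.Type == name {
			return c.Status == "True"
		}
	}
	return false
}

// patroniBinNames keeps the pg_tde binaries configured until the extension
// is actually dropped, not only while the spec has it enabled.
func patroniBinNames(specEnabled bool, conds []condition) map[string]string {
	if !specEnabled && !conditionTrue(conds, "PGTDEEnabled") {
		return nil
	}
	return map[string]string{
		"pg_basebackup": "pg_tde_basebackup",
		"pg_rewind":     "pg_tde_rewind",
	}
}

func main() {
	// Disable in progress: spec says false, but the extension is still installed.
	conds := []condition{{Type: "PGTDEEnabled", Status: "True"}}
	fmt.Println(patroniBinNames(false, conds))
	// Fully disabled: no overrides.
	fmt.Println(patroniBinNames(false, nil) == nil)
}
```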
commit: d0f42ba