Skip to content

K8SPG-915: add leader election options to operator.yaml#1433

Open
pooknull wants to merge 11 commits intomainfrom
K8SPG-915
Open

K8SPG-915: add leader election options to operator.yaml#1433
pooknull wants to merge 11 commits intomainfrom
K8SPG-915

Conversation

@pooknull
Copy link
Contributor

@pooknull pooknull commented Feb 3, 2026

Due to the high volume of requests, we're unable to provide free service for this account. To continue using the service, please upgarde to a paid plan.

https://perconadev.atlassian.net/browse/K8SPG-915

DESCRIPTION

This PR exposes the following operator options through env vars:

  • PGO_CONTROLLER_LEASE_NAME - determines the name of the resource that leader election will use for holding the leader lock
  • PGO_CONTROLLER_LEASE_DURATION - duration that non-leader candidates will wait to force acquire leadership. This is measured against time of last observed ack
  • PGO_CONTROLLER_RENEW_DEADLINE - duration that the acting controlplane will retry refreshing leadership before giving up
  • PGO_CONTROLLER_RETRY_PERIOD - duration the LeaderElector clients should wait between tries of actions
  • PGO_CONTROLLER_LEADER_ELECTION_ENABLED - disable/enable leader election

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

@pooknull pooknull marked this pull request as ready for review February 10, 2026 11:55
Copy link
Contributor

@egegunes egegunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pooknull users should be able to disable leader election completely

gkech
gkech previously approved these changes Feb 11, 2026
Copilot AI review requested due to automatic review settings February 11, 2026 09:45
Copy link
Member

@mayankshah1607 mayankshah1607 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, lets address @egegunes comment

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds configuration knobs for controller-runtime leader election timing to the operator’s manifests and wires those env vars into the operator’s manager initialization.

Changes:

  • Expose leader election env vars (PGO_CONTROLLER_LEASE_*) in deployment manifests (including generated bundle YAMLs).
  • Parse the duration env vars in initManager and set controller-runtime leader election options (LeaseDuration, RenewDeadline, RetryPeriod).
  • Extend initManager unit test to assert the new default duration values.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
deploy/operator.yaml Adds leader election-related env vars to the main deployment manifest.
deploy/cw-operator.yaml Adds the same env vars to the CW deployment manifest.
deploy/cw-bundle.yaml Propagates env vars into the CW generated bundle.
deploy/bundle.yaml Propagates env vars into the generated bundle.
config/manager/default/manager.yaml Adds env vars to the controller manager default deployment.
cmd/postgres-operator/main.go Reads duration env vars and sets controller-runtime leader election timing options.
cmd/postgres-operator/main_test.go Asserts default leader election timing values and keeps zero-value checks working.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hors
hors previously approved these changes Feb 11, 2026
}

// K8SPG-915
getEnvDuration := func(name string, defaultDur time.Duration) (time.Duration, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to use a library like https://github.com/kelseyhightower/envconfig?tab=readme-ov-file#struct-tag-support ? It will save us a lot of this boilerplate because it supports time.Duration type as well. Example

type LeaderElectionConfig struct {
	LeaderElectionEnabled bool          `default:"true" envconfig:"PGO_LEADER_ELECTION_ENABLED"`
	LeaderElectionID      string        `default:"everest-leader-election" envconfig:"PGO_LEADER_ELECTION_ID"`
	LeaseDuration         time.Duration `default:"15s" envconfig:"PGO_CONRTOLLER_LEASE_DURATION"`
	RenewDeadline         time.Duration `default:"10s" envconfig:"PGO_CONRTOLLER_RENEW_DEADLINE"`
	RetryPeriod           time.Duration `default:"2s" envconfig:"PGO_CONRTOLLER_RETRY_PERIOD"`
}

func main() {
	leaderElectionConfig := &LeaderElectionConfig{}
	err := envconfig.Process("", leaderElectionConfig)
	if err != nil {
		// handle err
	}
    leaderElectionConfig.IntoOptions(options) // parse into controller runtime
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like an extra dependency a small feature, but I like this approach, so I implemented it without the dependency: 518e892

Copy link
Member

@mayankshah1607 mayankshah1607 Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that this is a small dependency (~400 LOC) meant for exactly such use-cases - avoiding boilerplate when parsing envvars. I think we should use it rather than re-inventing it ourselves, I don't see the benefit in avoiding it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a small dependency is not the worst thing. On the other hand, the boilerplate Andrii needed to add is not huge either ¯_(ツ)_/¯

Only advantage of using the dependency is the ability to pass durations as env vars I guess. @pooknull if you're not fundamentally against using it, let's use this thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

egegunes
egegunes previously approved these changes Feb 12, 2026
Copy link
Contributor

@egegunes egegunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but consider @mayankshah1607's suggestion

Copilot AI review requested due to automatic review settings February 12, 2026 10:59
@pooknull pooknull dismissed stale reviews from egegunes and hors via 518e892 February 12, 2026 10:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@egegunes egegunes added this to the v2.9.0 milestone Feb 12, 2026
Copilot AI review requested due to automatic review settings February 13, 2026 10:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Comment on lines +520 to +524
type envConfig struct {
LeaderElection bool `default:"true" envconfig:"PGO_CONTROLLER_LEADER_ELECTION_ENABLED"`
LeaderElectionID string `envconfig:"PGO_CONTROLLER_LEASE_NAME"`
LeaderElectionNamespace string `envconfig:"PGO_NAMESPACE"`

Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Problem: envConfig uses unexported (lowercase) struct fields, which github.com/kelseyhightower/envconfig cannot populate via reflection, so defaults/env vars may not be applied.
  2. Why it matters: This can silently disable leader election and set zero durations, potentially breaking operator behavior at runtime.
  3. Fix: Export the fields (e.g., LeaderElection, LeaseDuration, etc.) or switch to a parsing approach that does not rely on setting unexported fields.

Copilot uses AI. Check for mistakes.
Comment on lines +294 to +298
// K8SPG-915
envs := new(envConfig)
if err := envconfig.Process("", envs); err != nil {
return options, errors.Wrap(err, "parse env vars")
}
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Problem: envconfig.Process now makes initManager fail when PGO_WORKERS is set to a non-integer value (e.g., 3.14), whereas previously invalid values were logged and ignored.
  2. Why it matters: This is a behavior/compatibility change that can prevent the operator from starting if a bad value is present in an existing deployment.
  3. Fix: If fail-fast isn’t intended, parse PGO_WORKERS separately (string -> int) and keep the previous log-and-ignore behavior for invalid values, while still using envconfig for the new leader-election durations.

Copilot uses AI. Check for mistakes.
options, err := initManager(ctx)
assert.NilError(t, err)
if v == "3.14" {
assert.ErrorContains(t, err, "parse env vars: envconfig.Process: assigning PGO_WORKERS to Workers: converting '3.14' to type int. details: strconv.ParseInt: parsing \"3.14\": invalid syntax")
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Problem: The test asserts an overly specific substring for the envconfig parse error, including library-internal wording.
  2. Why it matters: Minor changes in envconfig (or Go’s strconv error text) can break the test even when behavior is still correct.
  3. Fix: Assert on a few stable substrings (e.g., "parse env vars", "PGO_WORKERS", and "invalid syntax") instead of the full error text.
Suggested change
assert.ErrorContains(t, err, "parse env vars: envconfig.Process: assigning PGO_WORKERS to Workers: converting '3.14' to type int. details: strconv.ParseInt: parsing \"3.14\": invalid syntax")
assert.ErrorContains(t, err, "parse env vars")
assert.ErrorContains(t, err, "PGO_WORKERS")
assert.ErrorContains(t, err, "invalid syntax")

Copilot uses AI. Check for mistakes.
@JNKPercona
Copy link
Collaborator

Test Name Result Time
backup-enable-disable passed 00:06:12
builtin-extensions passed 00:05:12
custom-envs passed 00:18:37
custom-extensions passed 00:14:33
custom-tls passed 00:05:15
database-init-sql passed 00:02:03
demand-backup passed 00:24:08
finalizers passed 00:03:52
init-deploy passed 00:03:00
huge-pages passed 00:02:46
monitoring passed 00:06:58
monitoring-pmm3 passed 00:08:08
one-pod passed 00:05:49
operator-self-healing passed 00:10:13
pitr passed 00:12:56
scaling passed 00:05:02
scheduled-backup passed 00:24:23
self-healing passed 00:09:09
sidecars passed 00:02:45
standby-pgbackrest passed 00:12:00
standby-streaming passed 00:09:08
start-from-backup passed 00:10:44
tablespaces passed 00:07:17
telemetry-transfer passed 00:04:24
upgrade-consistency passed 00:05:26
upgrade-minor passed 00:06:15
users passed 00:04:33
Summary Value
Tests Run 27/27
Job Duration 01:22:57
Total Test Time 03:51:00

commit: 403100a
image: perconalab/percona-postgresql-operator:PR-1433-403100a65

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants