Skip to content

OCPEDGE-2197: Promote TNF (Dual Replica) to default#2723

Open
eggfoobar wants to merge 3 commits intoopenshift:masterfrom
eggfoobar:OCPEDGE-2070-promote-TNF-to-GA
Open

OCPEDGE-2197: Promote TNF (Dual Replica) to default#2723
eggfoobar wants to merge 3 commits intoopenshift:masterfrom
eggfoobar:OCPEDGE-2070-promote-TNF-to-GA

Conversation

@eggfoobar
Copy link
Contributor

Promote DualReplica to default.

Two nodes with Fencing (DualReplica) allows deploying of a two-node OpenShift cluster with the necessary fencing tooling to prevent split-brain.

Enhancement Proposal: openshift/enhancements#1675

Duplicate of #2490 while author is on PTO

fonta-rh and others added 3 commits February 24, 2026 09:55
All features enabled in Default must also be enabled in OKD to maintain
feature parity with the community distribution. This fixes the failing
TestOKDHasAllDefaultFeatureGates unit test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: ehila <ehila@redhat.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 24, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 24, 2026

@eggfoobar: This pull request references OCPEDGE-2197 which is a valid jira issue.

Details

In response to this:

Promote DualReplica to default.

Two nodes with Fencing (DualReplica) allows deploying of a two-node OpenShift cluster with the necessary fencing tooling to prevent split-brain.

Enhancement Proposal: openshift/enhancements#1675

Duplicate of #2490 while author is on PTO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

The pull request adds DualReplica as a supported control plane topology and enables it across multiple cluster profiles. Changes include adding DualReplica enum values to controlPlaneTopology fields in multiple CRD files (Infrastructure and ControllerConfig variants), moving the DualReplica feature gate from disabled to enabled lists in Hypershift and SelfManagedHA feature gate manifests, and expanding feature gate activation conditions to include inDefault() and inOKD() profiles. Additional modifications include a table row repositioning in documentation and removal of a metadata annotation from a CRD.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: promoting DualReplica to default, matching the substantive changes across feature gates and CRDs.
Description check ✅ Passed The description directly relates to the changeset, explaining the business purpose of promoting DualReplica and referencing the enhancement proposal and related PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 24, 2026

Hello @eggfoobar! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 24, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml (2)

1046-1059: ⚠️ Potential issue | 🟡 Minor

DualReplica (and HighlyAvailableArbiter) are undocumented in the field description.

The description for status.controlPlaneTopology explains HighlyAvailable, SingleReplica, and External, but the enum now contains five values. Neither DualReplica (added here) nor HighlyAvailableArbiter (pre-existing gap) are described. Downstream consumers relying on this description for behaviour guidance won't know what DualReplica means or when to expect it.

📝 Suggested description update
             description: |-
               controlPlaneTopology expresses the expectations for operands that normally run on control nodes.
               The default is 'HighlyAvailable', which represents the behavior operators have in a "normal" cluster.
               The 'SingleReplica' mode will be used in single-node deployments
               and the operators should not configure the operand for highly-available operation
               The 'External' mode indicates that the control plane is hosted externally to the cluster and that
               its components are not visible within the cluster.
+              The 'DualReplica' mode will be used in two-node deployments with fencing tooling to prevent split-brain
+              and the operators should configure the operand accordingly.
+              The 'HighlyAvailableArbiter' mode indicates a highly-available cluster with an arbiter node.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml`
around lines 1046 - 1059, The description for status.controlPlaneTopology is
missing definitions for the enum values "DualReplica" and
"HighlyAvailableArbiter"; update the multi-line description under
controlPlaneTopology to add concise explanations for HighlyAvailableArbiter
(what arbiter means for quorum/placement and when it is used) and DualReplica
(what two-replica behavior implies for availability/placement/failure modes and
when the operator will set it), keeping the existing descriptions for
HighlyAvailable, SingleReplica and External; ensure the added wording clearly
maps each enum value to expected operator behavior and cluster topology so
downstream consumers can determine behavior from status.controlPlaneTopology.

1086-1099: ⚠️ Potential issue | 🟡 Minor

Add clarifying note that DualReplica infrastructure topology is not applicable to this field.

The infrastructureTopology field intentionally restricts its enum to HighlyAvailable and SingleReplica, excluding DualReplica even though controlPlaneTopology accepts it. The existing description already notes that External mode is not applicable; add a similar explicit note about DualReplica to clarify the intent and prevent future confusion. Test files confirm this design: they pair controlPlaneTopology: DualReplica with infrastructureTopology: HighlyAvailable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml`
around lines 1086 - 1099, Update the infrastructureTopology description to
explicitly state that the DualReplica topology is not applicable to this field:
modify the documentation under the infrastructureTopology key (where enum
contains HighlyAvailable and SingleReplica) to add a sentence similar to the
existing External note, e.g., "NOTE: DualReplica mode is not applicable for this
field." Also mention controlPlaneTopology for clarity if helpful (tests pair
controlPlaneTopology: DualReplica with infrastructureTopology: HighlyAvailable)
so future readers know DualReplica is intentionally excluded from the enum.
payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml (1)

1327-1338: ⚠️ Potential issue | 🟡 Minor

controlPlaneTopology description omits DualReplica (and the pre-existing HighlyAvailableArbiter).

The field description at Lines 1327–1333 only documents HighlyAvailable, SingleReplica, and External, leaving both HighlyAvailableArbiter and the newly added DualReplica undocumented. Since this CRD is generated from Go type source comments, the fix should be applied to the upstream controlPlaneTopology field doc comment (and re-generated) rather than directly in this YAML.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml`
around lines 1327 - 1338, The CRD description for the controlPlaneTopology enum
is missing documentation for the DualReplica and HighlyAvailableArbiter values;
update the upstream Go struct field comment for controlPlaneTopology (the doc
comment on the Go type that generates this CRD) to include short descriptions
for HighlyAvailableArbiter and DualReplica consistent with the existing entries,
then re-generate the CRD so the YAML reflects the updated comment (ensure the
comment sits immediately above the controlPlaneTopology field in the Go source
and run the project’s CRD generation script).
🧹 Nitpick comments (1)
payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml (1)

1045-1058: Clarify DualReplica semantics in the description.

Now that DualReplica is an allowed value, consider documenting its meaning alongside the other modes to avoid ambiguity for API consumers.

✏️ Suggested doc tweak
               controlPlaneTopology:
                 default: HighlyAvailable
                 description: |-
                   controlPlaneTopology expresses the expectations for operands that normally run on control nodes.
                   The default is 'HighlyAvailable', which represents the behavior operators have in a "normal" cluster.
+                  The 'DualReplica' mode represents a two-node control plane with fencing to prevent split-brain.
                   The 'SingleReplica' mode will be used in single-node deployments
                   and the operators should not configure the operand for highly-available operation
                   The 'External' mode indicates that the control plane is hosted externally to the cluster and that
                   its components are not visible within the cluster.
                 enum:
                 - HighlyAvailable
                 - HighlyAvailableArbiter
                 - SingleReplica
                 - DualReplica
                 - External
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml`
around lines 1045 - 1058, The description for the controlPlaneTopology enum is
missing semantics for the new DualReplica value; update the controlPlaneTopology
description (the enum block containing HighlyAvailable, HighlyAvailableArbiter,
SingleReplica, DualReplica, DualReplica) to add a concise explanation of
DualReplica (e.g., indicate it represents a two-node highly-available control
plane with automatic quorum/arbiter behavior or specific failover
characteristics used for small HA clusters) so API consumers understand when to
choose DualReplica; ensure the new sentence follows the same style as existing
entries and appears adjacent to the other mode descriptions in the
controlPlaneTopology CRD description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml`:
- Around line 1046-1059: The description for status.controlPlaneTopology is
missing definitions for the enum values "DualReplica" and
"HighlyAvailableArbiter"; update the multi-line description under
controlPlaneTopology to add concise explanations for HighlyAvailableArbiter
(what arbiter means for quorum/placement and when it is used) and DualReplica
(what two-replica behavior implies for availability/placement/failure modes and
when the operator will set it), keeping the existing descriptions for
HighlyAvailable, SingleReplica and External; ensure the added wording clearly
maps each enum value to expected operator behavior and cluster topology so
downstream consumers can determine behavior from status.controlPlaneTopology.
- Around line 1086-1099: Update the infrastructureTopology description to
explicitly state that the DualReplica topology is not applicable to this field:
modify the documentation under the infrastructureTopology key (where enum
contains HighlyAvailable and SingleReplica) to add a sentence similar to the
existing External note, e.g., "NOTE: DualReplica mode is not applicable for this
field." Also mention controlPlaneTopology for clarity if helpful (tests pair
controlPlaneTopology: DualReplica with infrastructureTopology: HighlyAvailable)
so future readers know DualReplica is intentionally excluded from the enum.

In
`@payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml`:
- Around line 1327-1338: The CRD description for the controlPlaneTopology enum
is missing documentation for the DualReplica and HighlyAvailableArbiter values;
update the upstream Go struct field comment for controlPlaneTopology (the doc
comment on the Go type that generates this CRD) to include short descriptions
for HighlyAvailableArbiter and DualReplica consistent with the existing entries,
then re-generate the CRD so the YAML reflects the updated comment (ensure the
comment sits immediately above the controlPlaneTopology field in the Go source
and run the project’s CRD generation script).

---

Nitpick comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml`:
- Around line 1045-1058: The description for the controlPlaneTopology enum is
missing semantics for the new DualReplica value; update the controlPlaneTopology
description (the enum block containing HighlyAvailable, HighlyAvailableArbiter,
SingleReplica, DualReplica, DualReplica) to add a concise explanation of
DualReplica (e.g., indicate it represents a two-node highly-available control
plane with automatic quorum/arbiter behavior or specific failover
characteristics used for small HA clusters) so API consumers understand when to
choose DualReplica; ensure the new sentence follows the same style as existing
entries and appears adjacent to the other mode descriptions in the
controlPlaneTopology CRD description.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between de86ee3 and 81c74f2.

⛔ Files ignored due to path filters (5)
  • config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-OKD.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • etcd/v1alpha1/zz_generated.crd-manifests/0000_25_etcd_01_pacemakerclusters.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_controllerconfigs-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
📒 Files selected for processing (11)
  • features.md
  • features/features.go
  • payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml
  • payload-manifests/crds/0000_25_etcd_01_pacemakerclusters.crd.yaml
  • payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-Default.crd.yaml
  • payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml
  • payload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yaml
  • payload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yaml
  • payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yaml
  • payload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yaml
💤 Files with no reviewable changes (1)
  • payload-manifests/crds/0000_25_etcd_01_pacemakerclusters.crd.yaml

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eggfoobar
Copy link
Contributor Author

/test verify-feature-promotion

@qodo-code-review
Copy link

ⓘ Your monthly quota for Qodo has expired. Upgrade your plan
ⓘ Paying users. Check that your Qodo account is linked with this Git user account

@eggfoobar
Copy link
Contributor Author

/hold

Holding, we need to wait for verify-feature-promotion to be in a good state

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 24, 2026
@eggfoobar
Copy link
Contributor Author

/test verify-feature-promotion

@qodo-code-review
Copy link

ⓘ Your monthly quota for Qodo has expired. Upgrade your plan
ⓘ Paying users. Check that your Qodo account is linked with this Git user account

@eggfoobar
Copy link
Contributor Author

/test verify-feature-promotion

@qodo-code-review
Copy link

ⓘ Your monthly quota for Qodo has expired. Upgrade your plan
ⓘ Paying users. Check that your Qodo account is linked with this Git user account

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 26, 2026

@eggfoobar: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-feature-promotion 81c74f2 link true /test verify-feature-promotion

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants