OCPEDGE-2197: Promote TNF (Dual Replica) to default#2723
OCPEDGE-2197: Promote TNF (Dual Replica) to default#2723eggfoobar wants to merge 3 commits intoopenshift:masterfrom
Conversation
All features enabled in Default must also be enabled in OKD to maintain feature parity with the community distribution. This fixes the failing TestOKDHasAllDefaultFeatureGates unit test. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: ehila <ehila@redhat.com>
|
@eggfoobar: This pull request references OCPEDGE-2197 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughThe pull request adds DualReplica as a supported control plane topology and enables it across multiple cluster profiles. Changes include adding DualReplica enum values to controlPlaneTopology fields in multiple CRD files (Infrastructure and ControllerConfig variants), moving the DualReplica feature gate from disabled to enabled lists in Hypershift and SelfManagedHA feature gate manifests, and expanding feature gate activation conditions to include inDefault() and inOKD() profiles. Additional modifications include a table row repositioning in documentation and removal of a metadata annotation from a CRD. 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented Comment |
|
Hello @eggfoobar! Some important instructions when contributing to openshift/api: |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml (2)
1046-1059:⚠️ Potential issue | 🟡 Minor
DualReplica(andHighlyAvailableArbiter) are undocumented in the field description.The
descriptionforstatus.controlPlaneTopologyexplainsHighlyAvailable,SingleReplica, andExternal, but the enum now contains five values. NeitherDualReplica(added here) norHighlyAvailableArbiter(pre-existing gap) are described. Downstream consumers relying on this description for behaviour guidance won't know whatDualReplicameans or when to expect it.📝 Suggested description update
description: |- controlPlaneTopology expresses the expectations for operands that normally run on control nodes. The default is 'HighlyAvailable', which represents the behavior operators have in a "normal" cluster. The 'SingleReplica' mode will be used in single-node deployments and the operators should not configure the operand for highly-available operation The 'External' mode indicates that the control plane is hosted externally to the cluster and that its components are not visible within the cluster. + The 'DualReplica' mode will be used in two-node deployments with fencing tooling to prevent split-brain + and the operators should configure the operand accordingly. + The 'HighlyAvailableArbiter' mode indicates a highly-available cluster with an arbiter node.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml` around lines 1046 - 1059, The description for status.controlPlaneTopology is missing definitions for the enum values "DualReplica" and "HighlyAvailableArbiter"; update the multi-line description under controlPlaneTopology to add concise explanations for HighlyAvailableArbiter (what arbiter means for quorum/placement and when it is used) and DualReplica (what two-replica behavior implies for availability/placement/failure modes and when the operator will set it), keeping the existing descriptions for HighlyAvailable, SingleReplica and External; ensure the added wording clearly maps each enum value to expected operator behavior and cluster topology so downstream consumers can determine behavior from status.controlPlaneTopology.
1086-1099:⚠️ Potential issue | 🟡 MinorAdd clarifying note that
DualReplicainfrastructure topology is not applicable to this field.The
infrastructureTopologyfield intentionally restricts its enum toHighlyAvailableandSingleReplica, excludingDualReplicaeven thoughcontrolPlaneTopologyaccepts it. The existing description already notes thatExternalmode is not applicable; add a similar explicit note aboutDualReplicato clarify the intent and prevent future confusion. Test files confirm this design: they paircontrolPlaneTopology: DualReplicawithinfrastructureTopology: HighlyAvailable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml` around lines 1086 - 1099, Update the infrastructureTopology description to explicitly state that the DualReplica topology is not applicable to this field: modify the documentation under the infrastructureTopology key (where enum contains HighlyAvailable and SingleReplica) to add a sentence similar to the existing External note, e.g., "NOTE: DualReplica mode is not applicable for this field." Also mention controlPlaneTopology for clarity if helpful (tests pair controlPlaneTopology: DualReplica with infrastructureTopology: HighlyAvailable) so future readers know DualReplica is intentionally excluded from the enum.payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml (1)
1327-1338:⚠️ Potential issue | 🟡 Minor
controlPlaneTopologydescription omitsDualReplica(and the pre-existingHighlyAvailableArbiter).The field description at Lines 1327–1333 only documents
HighlyAvailable,SingleReplica, andExternal, leaving bothHighlyAvailableArbiterand the newly addedDualReplicaundocumented. Since this CRD is generated from Go type source comments, the fix should be applied to the upstreamcontrolPlaneTopologyfield doc comment (and re-generated) rather than directly in this YAML.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml` around lines 1327 - 1338, The CRD description for the controlPlaneTopology enum is missing documentation for the DualReplica and HighlyAvailableArbiter values; update the upstream Go struct field comment for controlPlaneTopology (the doc comment on the Go type that generates this CRD) to include short descriptions for HighlyAvailableArbiter and DualReplica consistent with the existing entries, then re-generate the CRD so the YAML reflects the updated comment (ensure the comment sits immediately above the controlPlaneTopology field in the Go source and run the project’s CRD generation script).
🧹 Nitpick comments (1)
payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml (1)
1045-1058: ClarifyDualReplicasemantics in the description.Now that
DualReplicais an allowed value, consider documenting its meaning alongside the other modes to avoid ambiguity for API consumers.✏️ Suggested doc tweak
controlPlaneTopology: default: HighlyAvailable description: |- controlPlaneTopology expresses the expectations for operands that normally run on control nodes. The default is 'HighlyAvailable', which represents the behavior operators have in a "normal" cluster. + The 'DualReplica' mode represents a two-node control plane with fencing to prevent split-brain. The 'SingleReplica' mode will be used in single-node deployments and the operators should not configure the operand for highly-available operation The 'External' mode indicates that the control plane is hosted externally to the cluster and that its components are not visible within the cluster. enum: - HighlyAvailable - HighlyAvailableArbiter - SingleReplica - DualReplica - External🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml` around lines 1045 - 1058, The description for the controlPlaneTopology enum is missing semantics for the new DualReplica value; update the controlPlaneTopology description (the enum block containing HighlyAvailable, HighlyAvailableArbiter, SingleReplica, DualReplica, DualReplica) to add a concise explanation of DualReplica (e.g., indicate it represents a two-node highly-available control plane with automatic quorum/arbiter behavior or specific failover characteristics used for small HA clusters) so API consumers understand when to choose DualReplica; ensure the new sentence follows the same style as existing entries and appears adjacent to the other mode descriptions in the controlPlaneTopology CRD description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yaml`:
- Around line 1046-1059: The description for status.controlPlaneTopology is
missing definitions for the enum values "DualReplica" and
"HighlyAvailableArbiter"; update the multi-line description under
controlPlaneTopology to add concise explanations for HighlyAvailableArbiter
(what arbiter means for quorum/placement and when it is used) and DualReplica
(what two-replica behavior implies for availability/placement/failure modes and
when the operator will set it), keeping the existing descriptions for
HighlyAvailable, SingleReplica and External; ensure the added wording clearly
maps each enum value to expected operator behavior and cluster topology so
downstream consumers can determine behavior from status.controlPlaneTopology.
- Around line 1086-1099: Update the infrastructureTopology description to
explicitly state that the DualReplica topology is not applicable to this field:
modify the documentation under the infrastructureTopology key (where enum
contains HighlyAvailable and SingleReplica) to add a sentence similar to the
existing External note, e.g., "NOTE: DualReplica mode is not applicable for this
field." Also mention controlPlaneTopology for clarity if helpful (tests pair
controlPlaneTopology: DualReplica with infrastructureTopology: HighlyAvailable)
so future readers know DualReplica is intentionally excluded from the enum.
In
`@payload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yaml`:
- Around line 1327-1338: The CRD description for the controlPlaneTopology enum
is missing documentation for the DualReplica and HighlyAvailableArbiter values;
update the upstream Go struct field comment for controlPlaneTopology (the doc
comment on the Go type that generates this CRD) to include short descriptions
for HighlyAvailableArbiter and DualReplica consistent with the existing entries,
then re-generate the CRD so the YAML reflects the updated comment (ensure the
comment sits immediately above the controlPlaneTopology field in the Go source
and run the project’s CRD generation script).
---
Nitpick comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yaml`:
- Around line 1045-1058: The description for the controlPlaneTopology enum is
missing semantics for the new DualReplica value; update the controlPlaneTopology
description (the enum block containing HighlyAvailable, HighlyAvailableArbiter,
SingleReplica, DualReplica, DualReplica) to add a concise explanation of
DualReplica (e.g., indicate it represents a two-node highly-available control
plane with automatic quorum/arbiter behavior or specific failover
characteristics used for small HA clusters) so API consumers understand when to
choose DualReplica; ensure the new sentence follows the same style as existing
entries and appears adjacent to the other mode descriptions in the
controlPlaneTopology CRD description.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (5)
config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/*config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-OKD.crd.yamlis excluded by!**/zz_generated.crd-manifests/*etcd/v1alpha1/zz_generated.crd-manifests/0000_25_etcd_01_pacemakerclusters.crd.yamlis excluded by!**/zz_generated.crd-manifests/*machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_controllerconfigs-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/*machineconfiguration/v1/zz_generated.crd-manifests/0000_80_machine-config_01_controllerconfigs-OKD.crd.yamlis excluded by!**/zz_generated.crd-manifests/*
📒 Files selected for processing (11)
features.mdfeatures/features.gopayload-manifests/crds/0000_10_config-operator_01_infrastructures-Default.crd.yamlpayload-manifests/crds/0000_10_config-operator_01_infrastructures-OKD.crd.yamlpayload-manifests/crds/0000_25_etcd_01_pacemakerclusters.crd.yamlpayload-manifests/crds/0000_80_machine-config_01_controllerconfigs-Default.crd.yamlpayload-manifests/crds/0000_80_machine-config_01_controllerconfigs-OKD.crd.yamlpayload-manifests/featuregates/featureGate-4-10-Hypershift-Default.yamlpayload-manifests/featuregates/featureGate-4-10-Hypershift-OKD.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-Default.yamlpayload-manifests/featuregates/featureGate-4-10-SelfManagedHA-OKD.yaml
💤 Files with no reviewable changes (1)
- payload-manifests/crds/0000_25_etcd_01_pacemakerclusters.crd.yaml
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/test verify-feature-promotion |
ⓘ Your monthly quota for Qodo has expired. Upgrade your plan ⓘ Paying users. Check that your Qodo account is linked with this Git user account |
|
/hold Holding, we need to wait for |
|
/test verify-feature-promotion |
ⓘ Your monthly quota for Qodo has expired. Upgrade your plan ⓘ Paying users. Check that your Qodo account is linked with this Git user account |
|
/test verify-feature-promotion |
ⓘ Your monthly quota for Qodo has expired. Upgrade your plan ⓘ Paying users. Check that your Qodo account is linked with this Git user account |
|
@eggfoobar: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Promote DualReplica to default.
Two nodes with Fencing (DualReplica) allows deploying of a two-node OpenShift cluster with the necessary fencing tooling to prevent split-brain.
Enhancement Proposal: openshift/enhancements#1675
Duplicate of #2490 while author is on PTO