CORS-4337: allow AWS Europe Sovereign Cloud partition#2708
CORS-4337: allow AWS Europe Sovereign Cloud partition#2708tthvo wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@tthvo: This pull request references CORS-4337 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hello @tthvo! Some important instructions when contributing to openshift/api: |
📝 WalkthroughWalkthroughThis pull request adds the AWS partition value "aws-eusc" to ARN validation patterns and documentation for DNS privateZoneIAMRole and CSI driver KMSKeyARN. Changes update kubebuilder/OpenAPI regexes, inline comments, generated Swagger docs, multiple CRD variants, and related tests. No field names, types, JSON tags, or runtime logic were changed; the modifications are limited to validation patterns and descriptive text. 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by QodoSupport AWS Europe Sovereign Cloud partition in ARN validation
WalkthroughsDescription• Add support for AWS Europe Sovereign Cloud partition (aws-eusc) • Update ARN validation patterns for DNS and KMS configurations • Add comprehensive test cases for all AWS partitions • Update documentation and generated manifests Diagramflowchart LR
A["AWS Partitions<br/>aws, aws-cn, aws-us-gov, aws-eusc"] -->|Update Validation| B["DNS IAM Role ARN"]
A -->|Update Validation| C["KMS Key ARN"]
B -->|Pattern Match| D["^arn:partition:iam::account:role/.*$"]
C -->|Pattern Match| E["^arn:partition:kms:region:account:key/id$"]
B -->|Test Cases| F["All Partitions Validated"]
C -->|Test Cases| F
File Changes1. config/v1/types_dns.go
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@config/v1/types_kmsencryption.go`:
- Around line 26-36: Update the XValidation message for keyARN to accurately
describe allowed characters: reference the keyARN field and its
+kubebuilder:validation:XValidation rule and change the message text to state
that the region may contain lowercase letters, digits and hyphens and that the
key ID must be lowercase hexadecimal characters and hyphens; ensure the new
message keeps the format example
`arn:<partition>:kms:<region>:<account_id>:key/<key_id>` and mentions the
account ID must be 12 digits and the region is lowercase letters/digits/hyphens
while the key ID is lowercase hex and hyphens.
Code Review by Qodo
1. PrivateZoneIAMRole pattern undocumented
|
everettraven
left a comment
There was a problem hiding this comment.
Overall, this change seems reasonable.
Only question - is every component that relies on the input provided in these fields able to successfully handle the newly introduced partitions?
|
/hold
Ah sorry, I am the middle of checking these scenarios. So, I am holding this PR for now... Background: AWS Europe Sovereign Cloud (EUSC) requires AWS SDK v2 for out-of-the-box support (i.e. able to resolve the correct endpoint for EUSC regions). However, several cluster operators still use AWS SDK v1 (now EOL), and migrating them to SDK v2 is not feasible within the 4.22 timeline or near future. As a workaround, we're enabling EUSC support through user-provided service endpoints. I'm currently testing whether the new EUSC region honors these custom endpoints, or if only minor patches are needed to make it work. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
operator/v1/tests/clustercsidrivers.operator.openshift.io/AAA_ungated.yaml (1)
134-156: Consider adding positive test coverage for the remaining ISO-variant partitions.The validation regex includes
aws-iso-b,aws-iso-e, andaws-iso-f, but onlyaws-isohas a positive test case. Since this PR is already adding broad partition coverage, it would be a natural opportunity to close this gap.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@operator/v1/tests/clustercsidrivers.operator.openshift.io/AAA_ungated.yaml` around lines 134 - 156, Add positive test cases alongside the existing "Should be able to specify an AWS ISO KMS key ARN" case to cover the other ISO partitions: aws-iso-b, aws-iso-e, and aws-iso-f. For each new test, copy the existing initial/expected blocks (preserving apiVersion/kind/metadata/spec structure) and change the kmsKeyARN host partition portion to use arn:aws-iso-b:kms:..., arn:aws-iso-e:kms:..., and arn:aws-iso-f:kms:... respectively so the validation regex is exercised for each partition variant (match the format used in the original kmsKeyARN string).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@operator/v1/tests/clustercsidrivers.operator.openshift.io/AAA_ungated.yaml`:
- Around line 180-191: The negative test uses an ARN that matches the CRD regex,
so update the initial test case for "Should not be able to specify invalid AWS
KMS key ARN" by replacing spec.driverConfig.aws.kmsKeyARN with a deliberately
malformed ARN that fails the pattern—e.g. use an invalid partition name (not
matching arn:(aws|aws-cn|...)) or an account ID with incorrect digit count (not
12 digits); ensure you edit the initial block for the test case that sets
spec.driverConfig.aws.kmsKeyARN so the value no longer matches the regex and the
expectedError for spec.driverConfig.aws.kmsKeyARN triggers.
---
Nitpick comments:
In `@operator/v1/tests/clustercsidrivers.operator.openshift.io/AAA_ungated.yaml`:
- Around line 134-156: Add positive test cases alongside the existing "Should be
able to specify an AWS ISO KMS key ARN" case to cover the other ISO partitions:
aws-iso-b, aws-iso-e, and aws-iso-f. For each new test, copy the existing
initial/expected blocks (preserving apiVersion/kind/metadata/spec structure) and
change the kmsKeyARN host partition portion to use arn:aws-iso-b:kms:...,
arn:aws-iso-e:kms:..., and arn:aws-iso-f:kms:... respectively so the validation
regex is exercised for each partition variant (match the format used in the
original kmsKeyARN string).
operator/v1/tests/clustercsidrivers.operator.openshift.io/AAA_ungated.yaml
Show resolved
Hide resolved
According to AWS docs, ARNs in AWS European Sovereign Cloud begin with arn:aws-eusc: Thus, to support EUS Cloud, we need to update the validation to allow this new format. This commits only focuses on kmsKeyARN and privateZoneIAMRole.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
payload-manifests/crds/0000_10_config-operator_01_dnses.crd.yaml (1)
74-77: LGTM —aws-euscpartition and updated description are correct.The AWS European Sovereign Cloud is accessed using partition name
aws-eusc, so the regex addition is accurate. The documentation on lines 74–76 precisely maps to the updated pattern.One concern worth tracking (which the PR is already on hold for): the AWS European Sovereign Cloud uses a distinct domain namespace (
*.amazonaws.eu) separate from commercial AWS. Validating the ARN format here is a necessary first step, but ensure every downstream operator that consumesprivateZoneIAMRolecan resolve*.amazonaws.euendpoints — AWS SDK (Terraform AWS provider) 6.x and Terraform 1.14+ support ESC natively; older versions require manual endpoint configuration. Components still on AWS SDK v1 (EOL) will not resolve EUSC endpoints out of the box, which the PR discussion already flags.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_dnses.crd.yaml` around lines 74 - 77, CRD change adds the aws-eusc partition to the ARN regex for privateZoneIAMRole but downstream consumers may not resolve AWS European Sovereign Cloud endpoints; update the `privateZoneIAMRole` description (and any operator docs) to explicitly state the requirement that consumers support the aws-eusc (`*.amazonaws.eu`) endpoints and list the minimum SDK/provider versions (e.g., AWS SDK v2+/Terraform AWS provider 6.x, Terraform 1.14+) or add a validating webhook/annotation check to reject configurations when the operator runtime is on unsupported SDK/provider versions; reference `privateZoneIAMRole` and the updated ARN pattern in your edits so the note is colocated with the schema change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@payload-manifests/crds/0000_10_config-operator_01_dnses.crd.yaml`:
- Around line 74-77: CRD change adds the aws-eusc partition to the ARN regex for
privateZoneIAMRole but downstream consumers may not resolve AWS European
Sovereign Cloud endpoints; update the `privateZoneIAMRole` description (and any
operator docs) to explicitly state the requirement that consumers support the
aws-eusc (`*.amazonaws.eu`) endpoints and list the minimum SDK/provider versions
(e.g., AWS SDK v2+/Terraform AWS provider 6.x, Terraform 1.14+) or add a
validating webhook/annotation check to reject configurations when the operator
runtime is on unsupported SDK/provider versions; reference `privateZoneIAMRole`
and the updated ARN pattern in your edits so the note is colocated with the
schema change.
|
@tthvo: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/unhold |
|
@tthvo It might make sense for us to feature gate this change so that we can merge the validation change and add automated regression testing that helps us identify cases where this causes things to break. From there, we can update things accordingly under the same feature gate. |
Right... I am just a bit hesitant if we should (or have ever) create a feature gate for installing into a (new) region. I'll double check with @patrickdillon when he's back next week. My understand is that allowing
To answer your original question, after some testing, I'd say yes with a small adjustment to make sure the operators honour necessary custom service endpoints, which they really should. They may have missed it when adding these arn input fields previously - bug? The following API fields are updated to allow
|
|
@everettraven Thanks for having a look! I added my views and (local) test results above. Please let me know what you think 🙏 I'll check with Patrick about the feature gate when he's back. |
@everettraven I'm having trouble wrapping my mind around how this would be feature gated; and I'm not sure it makes sense. Is it even possible to isolate the changes needed here behind a feature gate without affecting the existing validation? |
We have ways to feature gate CEL-based validations. As an example: api/config/v1/types_authentication.go Line 253 in de86ee3 You can also set: to have a default ungated behavior that is overridden when specific gates are enabled. This would allow us to add regression testing for the newly valid input formats for impacted components before shipping as GA. |
Ah, cool! That is exactly what I was missing. I still don't quite get how it works--do we need to specify the validation that is overridden or would adding
I'm fine with adding a feature gate if that is the preference, but I'm not sure it amounts to any meaningful added value because testing EUS will already require creating new jobs. The testing pattern would be the same with or without the gate. I think the real value in feature gates is when we can use them to turn features on/off in a general job; when the feature requires a dedicated job I don't see much benefit. |
You would need to convert this pattern validation to a gated CEL expression, which CEL is our new preference anyways for regex-based input constraints because we can provide better error messages to users than just throwing a regex at them.
The main reason we have this preference is for situations where there are significant enough unknowns in the impacts to components across the platform with a change like this. For simple validation changes that impact a single component that we are certain won't cause issues once we GA the validation change we don't require the gate. It is basically a stop-gap for us to prevent customers being hit with an issue because we GA'd the validation change without fixing an issue in a component that could have been caught through automated regression testing requirements as part of the feature promotion process. |
|
@everettraven The point I was trying to make is that in this case feature-gate promotion and regression testing are orthogonal: all existing aws testing will be a regression test for this change. I don't think the feature gate particularly helps with that. More relevantly: after further considering Thuan's comment, I think there is a good case that this is a bug. The validation is overly specific and the validation is the only thing preventing BYO storage keys. If we feature gate this, it could be an issue in the case we need to backport the change. |
It makes sure that components that cannot support this do not see a value they cannot handle until they have been successfully fixed to support that value.
You can backport feature gates and their enablement. It is just a bit more work than a traditional bug fix. @tthvo 's comment of:
and the requirement that openshift/cluster-ingress-operator#1360 needed to be made to support this change makes me think this is a bit more than a bug because this wasn't something previously supported by at least one component. $Thing should support X often reads to me as more of a feature request than a bug. I consider a validation loosening change a bugfix only if the components impacted by the validation change has already supported the newly allowed input for as far back as the fix needs to be backported (i.e there is no risk of a user upgrading to the patch release, using the newly valid value, and a component breaking because it can't handle the new value). I'm not trying to be an unnecessary blocker here. I'm just not confident that this falls under the bug criteria here. Maybe @JoelSpeed has a differing opinion here, and if he agrees that this is reasonable to consider a bug I'm happy to concede as he has more experience in what reasonably qualifies as a bug here than I do. |
|
Looking at how much content is attached to https://issues.redhat.com/browse/OCPSTRAT-2500, I think this is a feature and should be gated. Note that there are epics in there that aren't even started yet. We should aim not to deliver a half baked feature ungated and at this point in the release cycle, are we likely to get this all merged in time? |
|
We will go ahead with creating the feature gate. To be honest, I was reluctant to start this conversation, because I think it is challenging to discuss the feature gate process without coming across as "whining" or like you're trying to get around something. I have been a long-term supporter of feature gating. I appreciate your keeping an open mind. The reason I'm speaking up here, is because I'm concerned about the maintainability of this specific code and that the semantics of this process are getting in the way of delivering value to our customers. For example, we could perform a UPI install in this region for an already-released version of openshift, perhaps if we had attached a bug for that instead the outcome would be different. More to the point, I do not think we should consider specific AWS partitions features. Keeping a hard-coded list like this is not ideal for maintainability. We can see other examples where they cause issues: #1740 To move the discussion out of this pr, I'm going to slack you with a high-level. |
This commit adds CI infrastructure support for AWS European Sovereign Cloud (EUSC) testing using the eusc-de-east-1 region. Changes: - Add cluster-secrets-aws-eusc-qe to secret bootstrap config - Add aws-eusc-qe-quota-slice boskos resource pool with 5 quota slices in eusc-de-east-1 region - Generate updated _boskos.yaml configuration The profile uses vault secret: cluster-secrets-aws-eusc-qe Region Details: - Region: eusc-de-east-1 (Brandenburg, Germany) - Availability zones: eusc-de-east-1a, eusc-de-east-1b (2 zones only) - Note: No edge zones (Local/Wavelength) available in this region Dependencies: - Installer support: openshift/installer#10303 - Ingress operator support: openshift/cluster-ingress-operator#1360 - API support (optional): openshift/api#2708 Tested installation scenarios (from installer PR openshift#10303): - ✅ Default minimal config - ✅ Ingress NLB type - ✅ BYO Private Hosted zone - ✅ BYO KMS key - ✅ BYO VPC and subnets Installation time: ~45m26s Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit adds CI infrastructure support for AWS European Sovereign Cloud (EUSC) testing using the eusc-de-east-1 region. Changes: - Add cluster-secrets-aws-eusc-qe to secret bootstrap config - Add aws-eusc-qe-quota-slice boskos resource pool with 5 quota slices in eusc-de-east-1 region - Generate updated _boskos.yaml configuration Region Details: - Region: eusc-de-east-1 (Brandenburg, Germany) - Availability zones: eusc-de-east-1a, eusc-de-east-1b (2 zones only) - Note: No edge zones (Local/Wavelength) available in this region Dependencies: - Installer support: openshift/installer#10303 - Ingress operator support: openshift/cluster-ingress-operator#1360 - API support (optional): openshift/api#2708
According to AWS docs, ARNs in AWS European Sovereign Cloud begin with
Thus, to support EUS Cloud, we need to update the validation to allow this new format.