Region normalization fix #46937
Open
dibahlfi wants to merge 4 commits into
/azp run python - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR fixes a routing correctness issue in azure-cosmos where customer-supplied region strings in preferred_locations / excluded_locations (client-level and per-request) were compared against account region names using exact string matching, causing non-canonical spellings (e.g., east-us-2, eastus2) to be silently ignored.
Changes:
- Add a single region-name normalization routine and apply it consistently across routing, refresh decisions, and bootstrap locational endpoint construction.
- Add config-time warnings (deduped across refreshes) when configured preferred/excluded regions don’t match any account regions.
- Add targeted tests validating normalization behavior and warning deduplication; update CHANGELOG with the bug fix note.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py | Implements region normalization, uses normalized lookups for endpoint selection/refresh logic, and emits deduped mismatch warnings. |
| sdk/cosmos/azure-cosmos/tests/test_location_cache.py | Adds regression tests for normalized preferred/excluded locations, non-preferred routing paths, locational endpoint construction, and warning dedupe behavior. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the bug fix in the unreleased changelog section. |
Comment on lines +41 to +45

    def _normalize_region_name(region_name: str | None) -> str:
        if region_name is None:
            return ""
        normalized = "".join(str(region_name).strip().lower().split())
        return normalized.replace("-", "").replace("_", "")
updating changelog
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
When a customer configures a Cosmos client, they pass region names as strings (preferred_locations, excluded_locations). The SDK previously did exact string comparisons against the canonical names returned by the account ("East US 2", "West US 3", ...).
A small spelling difference — "eastus2", "east-us-2", "east_us_2" — silently failed to match, the entry was dropped, and the client could end up routing all traffic through the global endpoint instead of the regional pool.
What this change does:
Region matching is now tolerant of case, surrounding/internal whitespace, hyphens, and underscores.
Equivalent inputs all resolve to the same region:
"East US 2"
"east us 2"
"eastus2"
"EASTUS2"
"east-us-2"
"east_us_2"
" EastUs2 "
Nothing beyond that is stripped or fuzzily matched: other punctuation and digits are kept, because a more aggressive rule could collapse genuinely different regions like "East US" and "East US 2" into the same key and silently route to the wrong region.
All client-supplied region-name strings (client-level preferred, client-level excluded, and request-level excluded) are normalized.
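For reviewers who want to check the rule quickly, here is a minimal standalone sketch of it. The function name `normalize_region_name` is illustrative (it mirrors the private helper quoted in the review comment, but is not imported from the SDK), and the input list is the one given above.

```python
def normalize_region_name(region_name):
    """Illustrative copy of the matching rule; not imported from azure-cosmos."""
    if region_name is None:
        return ""
    # Lowercase, then drop all whitespace (leading, trailing, and internal).
    collapsed = "".join(str(region_name).lower().split())
    # Drop hyphens and underscores only; digits and other punctuation are kept.
    return collapsed.replace("-", "").replace("_", "")


equivalent_inputs = [
    "East US 2", "east us 2", "eastus2", "EASTUS2",
    "east-us-2", "east_us_2", "  EastUs2  ",
]
keys = {normalize_region_name(name) for name in equivalent_inputs}
print(keys)  # {'eastus2'}

# Digits are preserved, so these two regions keep distinct keys.
print(normalize_region_name("East US") == normalize_region_name("East US 2"))  # False
```

Because digits survive normalization, "East US" and "East US 2" map to different keys, which is the safety property described above.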
The same matching rule is applied wherever the customer's region string is consumed. Previously some paths used exact-string comparisons; now they all share one normalization rule:
Routing: which region serves a request (preferred + excluded, client-level + per-request).
Refresh decision: whether to schedule a background refresh based on the most-preferred region.
Bootstrap fallback URL: when the global endpoint can't be reached at startup and the SDK constructs a regional URL from a preferred region.
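As a rough illustration of the shared rule on the routing path, the sketch below resolves customer-supplied names to the canonical names reported by the account. `resolve_regions` and `normalize_region_name` are hypothetical helpers written for this example, not the functions in `_location_cache.py`, and the caller is assumed to pass whichever excluded list applies (client-level or per-request).

```python
def normalize_region_name(region_name):
    """Illustrative normalization rule; not imported from azure-cosmos."""
    if region_name is None:
        return ""
    collapsed = "".join(str(region_name).lower().split())
    return collapsed.replace("-", "").replace("_", "")


def resolve_regions(account_regions, preferred, excluded=()):
    """Return canonical account region names in preferred order, minus exclusions.

    Hypothetical helper: matching is done on normalized keys, and the
    account's own spelling is what gets returned.
    """
    # Built from the account's regions at call time; no hardcoded region list.
    by_key = {normalize_region_name(r): r for r in account_regions}
    excluded_keys = {normalize_region_name(r) for r in excluded}
    resolved = []
    for name in preferred:
        key = normalize_region_name(name)
        canonical = by_key.get(key)
        # Skip unknown regions, excluded regions, and duplicates.
        if canonical is None or key in excluded_keys or canonical in resolved:
            continue
        resolved.append(canonical)
    return resolved


account = ["East US 2", "West US 3"]
print(resolve_regions(account, ["east-us-2", "westus3"]))
# ['East US 2', 'West US 3']
print(resolve_regions(account, ["east-us-2", "westus3"], excluded=["EAST_US_2"]))
# ['West US 3']
```

The lookup table is derived from whatever regions the account reports, so a region the sketch has never seen still matches as long as the normalized spellings agree.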
Misconfigured region names now produce a visible warning that names the dropped entry and the regions that were available, emitted at config time — when account metadata is processed at startup and on each background refresh.
There is no warning for per-request values: they are evaluated thousands of times, and a bad per-request value has a smaller blast radius.
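One way to keep such a warning from repeating on every background refresh is to remember which entries have already been reported. The sketch below assumes a simple in-memory set for that; `RegionConfigChecker`, its method names, and the log message wording are all hypothetical and may differ from what the PR actually implements.

```python
import logging

logger = logging.getLogger("region_config_sketch")  # hypothetical logger name


def normalize_region_name(region_name):
    """Illustrative normalization rule; not imported from azure-cosmos."""
    if region_name is None:
        return ""
    collapsed = "".join(str(region_name).lower().split())
    return collapsed.replace("-", "").replace("_", "")


class RegionConfigChecker:
    """Hypothetical helper: warns once per (setting, entry) pair."""

    def __init__(self):
        self._already_warned = set()

    def check(self, setting_name, configured, account_regions):
        """Log a warning for each unmatched entry not reported before.

        Returns the entries that were newly warned about on this call.
        """
        known_keys = {normalize_region_name(r) for r in account_regions}
        newly_warned = []
        for name in configured:
            marker = (setting_name, name)
            if normalize_region_name(name) in known_keys or marker in self._already_warned:
                continue
            self._already_warned.add(marker)
            logger.warning(
                "%s entry %r does not match any account region; available regions: %s",
                setting_name, name, sorted(account_regions),
            )
            newly_warned.append(name)
        return newly_warned


checker = RegionConfigChecker()
account = ["East US 2", "West US 3"]
first = checker.check("preferred_locations", ["east-us-2", "east us 22"], account)
second = checker.check("preferred_locations", ["east-us-2", "east us 22"], account)
print(first, second)  # ['east us 22'] []
```

The second call stands in for a background refresh: the same misconfigured entry is seen again, but no new warning is emitted.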
No public API change - preferred_locations and excluded_locations remain plain list[str].
Behavior examples
Account regions: ["East US 2", "West US 3"]
Backwards compatibility:
If the client config already uses the exact spelling Azure returns (e.g., "East US 2"), nothing changes.
New Azure regions continue to work without an SDK upgrade; nothing in this change hardcodes a region list.
What this PR does not do:
Does not add a Regions constants surface. Adding a named-constant list of well-known regions would expand the public API surface area. Normalization plus a visible warning is enough to close the failure mode without touching the public surface today. Constants are therefore deferred to a later stable release as a separate, additive change.