Fixes UnsupportedOperationException when using readManyByPartitionKey for empty pages#49311
Fixes UnsupportedOperationException when using readManyByPartitionKey for empty pages#49311FabianMeiswinkel wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a Cosmos DB readManyByPartitionKey(s) failure where empty pages with immutable response headers could throw UnsupportedOperationException when stamping continuation tokens.
Changes:
- Detects known immutable header map shapes and defensively copies before mutating continuation headers.
- Adds unit tests for immutable-map detection and the customer-reported continuation-token stamping path.
- Preserves the no-op fast path for empty headers when no continuation header needs to be changed.
Show a summary per file
| File | Description |
|---|---|
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/FeedResponse.java |
Copies known immutable header maps before setting/removing continuation tokens. |
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Utils.java |
Adds isImmutableMap helper for known immutable map implementations used by Cosmos internals. |
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/UtilsImmutableMapTests.java |
Adds unit coverage for immutable-map detection. |
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ReadManyByPartitionKeyContinuationTokenTest.java |
Adds regression coverage for immutable non-empty headers in continuation-token stamping. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 1
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
| } | ||
|
|
||
| if (Utils.isImmutableMap(this.header)) { | ||
| this.header = new HashMap<>(this.header); |
There was a problem hiding this comment.
getResponseHeaders() contract changed — silently.? It's a corner concern but the contract fo Before the fix, callers got whatever was passed in (often Collections.unmodifiableMap, i.e. read-only). After a swap, they get a mutable HashMap. External callers could now response.getResponseHeaders().put(...) and coulde pollute the FeedResponse.?
There was a problem hiding this comment.
That is fine - in vast majority of cases it is mutable today anyway - which is a problem but one overlooked when intriducing the public API at V4 day 0. And we can't relaly breka it anymore.
I am mroe concerned with this not being final anymore - that could cause some interesting edge cases - let me look at that mode.
| private void setContinuationTokenInternal(String headerName, String continuationToken) { | ||
| if (!Strings.isNullOrWhiteSpace(continuationToken)) { | ||
| boolean setting = !Strings.isNullOrWhiteSpace(continuationToken); | ||
| boolean clearing = !setting && !this.header.isEmpty() && this.header.containsKey(headerName); |
There was a problem hiding this comment.
Nit: this.header.containsKey(headerName) already checks for this.header.isEmpty()
Is this a deliberate choice?
ananth7592
left a comment
There was a problem hiding this comment.
LGTM, Except for 2 minor comments
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@sdkReviewAgent |
| return; | ||
| } | ||
|
|
||
| if (Utils.isImmutableMap(this.header)) { |
There was a problem hiding this comment.
🟡 Recommendation — Design: Stale contract comment contradicts this fix
The comment at lines 59–69 (just above, not in the diff) explicitly states:
"If a future code path introduces an immutable non-null header map, the setContinuationTokenInternal method will fail fast with UnsupportedOperationException, and the fix should be to make the upstream pipeline emit a mutable map rather than adding defensive cloning here."
This PR does exactly what that comment says not to do — and doesn't update the comment. The comment also asserts "Non-null maps are always mutable", which is factually incorrect given that five headerResponse() methods produce immutable maps via Utils.immutableMapOf(...).
Why this matters: A future maintainer reading the comment will believe immutable maps should cause a fail-fast crash and that the fix belongs upstream. If they add a new immutable producer that isImmutableMap() doesn't recognize (e.g., Collections.singletonMap() or Map.of()), they'll get the original UnsupportedOperationException and be confused about which strategy to follow.
Suggestion — consider the simpler upstream fix the comment prescribes: The only confirmed crash source is ParallelDocumentQueryExecutionContext.headerResponse() (line 285). Changing it to return a mutable HashMap is a ~3-line change:
private static Map<String, String> headerResponse(double requestCharge) {
Map<String, String> map = new HashMap<>();
map.put(HttpConstants.HttpHeaders.REQUEST_CHARGE, String.valueOf(requestCharge));
return map;
}This follows the pattern already used by addCompositeContinuationToken() at line 267 (new HashMap<>(page.getResponseHeaders())), keeps header as final (addressing your own concern about thread safety), and avoids the need for the new isImmutableMap() infrastructure entirely.
At minimum, if the downstream-defense approach ships, please update the comment at lines 59–69 to reflect the new strategy — otherwise it's a maintenance trap.
|
✅ Review complete (54:17) Posted 1 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Description
The driver stack trace when it fails is the following
26/05/29 08:30:24 WARN TaskSetManager: Lost task 3.0 in stage 42.0 (TID 108773) (10.181.218.61 executor 44): java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableMap.put(Collections.java:1459)
at azure_cosmos_spark.com.azure.cosmos.models.FeedResponse.setContinuationTokenInternal(FeedResponse.java:453)
at azure_cosmos_spark.com.azure.cosmos.models.FeedResponse.setContinuationToken(FeedResponse.java:448)
at azure_cosmos_spark.com.azure.cosmos.models.ModelBridgeInternal.setFeedResponseContinuationToken(ModelBridgeInternal.java:605)
at azure_cosmos_spark.com.azure.cosmos.implementation.RxDocumentClientImpl.lambda$buildSequentialBatchFlux$132(RxDocumentClientImpl.java:4940)
at azure_cosmos_spark.reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
at azure_cosmos_spark.reactor.core.publisher.FluxFlatMap$FlatMapMain.tryEmit(FluxFlatMap.java:547)
at azure_cosmos_spark.reactor.core.publisher.FluxFlatMap$FlatMapInner.onNext(FluxFlatMap.java:988)
at azure_cosmos_spark.reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
This can happen when for a single physical partition the requested tuples don't have any matches - the empty page returned would have immutable headers.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines