Retry OAuth token refresh on server errors #4513
Open
gkatz2 wants to merge 2 commits into stacklok:main
When a load balancer or CDN returns an HTML error page during token refresh, the workload is immediately marked as unauthenticated with no retry. Remote MCP servers become permanently broken until manually restarted, even when the OAuth server recovers seconds later.

Fixes stacklok#4512

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Greg Katz <gkatz@indeed.com>
Force-pushed from de82747 to 540a487
Codecov Report

❌ Patch coverage is …

Coverage diff:

| | main | #4513 | +/- |
|---|---|---|---|
| Coverage | 69.64% | 69.07% | -0.58% |
| Files | 491 | 502 | +11 |
| Lines | 50304 | 51973 | +1669 |
| Hits | 35036 | 35900 | +864 |
| Misses | 12580 | 13285 | +705 |
| Partials | 2688 | 2788 | +100 |
jerm-dro approved these changes on Apr 3, 2026
Summary
- When a load balancer or CDN returns an HTML error page during token refresh, `isTransientNetworkError()` treats these as permanent auth failures, immediately marking the workload as "unauthenticated" with no retry. Remote MCP servers become permanently broken until manually restarted.
- This change treats OAuth parse errors and `*oauth2.RetrieveError` with a 5xx status as transient (retry with backoff). 4xx errors remain permanent.

Fixes #4512
Type of change
Test plan
- `task test`
- `task lint-fix`

Does this introduce a user-facing change?
Remote MCP servers with OAuth authentication now survive transient token endpoint outages (5xx errors, HTML error pages from load balancers) instead of permanently breaking.
Special notes for reviewers
The `isOAuthParseError` helper uses string matching against `"oauth2: cannot parse json"` and `"oauth2: cannot parse response"` because the oauth2 library wraps these with `fmt.Errorf("%v", err)` (not `%w`), making type-based detection impossible. These strings have been stable across oauth2 v0.33.0 through v0.36.0. If they ever change, the worst case is a return to current behavior (no regression).

Generated with Claude Code