fix(mcp/oauth): discover RFC 8414 §3.1 path-aware metadata URLs#2877
Open
dgageot wants to merge 4 commits into
Open
fix(mcp/oauth): discover RFC 8414 §3.1 path-aware metadata URLs#2877dgageot wants to merge 4 commits into
dgageot wants to merge 4 commits into
Conversation
Adds TestCatalogOAuthDiscoveryLive, a subtest-per-server probe over every
oauth server in the embedded catalog. For each it verifies the structural
prerequisites docker-agent needs:
- 401 + WWW-Authenticate from the MCP endpoint
- protected-resource metadata reachable
- authorization-server metadata reachable (with openid-configuration
fallback)
- HTTPS registration_endpoint (Dynamic Client Registration)
- PKCE S256 advertised
Skipped by default because it makes real HTTPS calls to ~17 third-party
servers and is unsuitable for CI without explicit opt-in. Run with:
MCP_CATALOG_OAUTH_LIVE=1 go test -run TestCatalogOAuthDiscoveryLive \
-v -count=1 -timeout=180s ./pkg/tools/builtin/mcpcatalog
api.grafbase.com/mcp returns 404 on every plausible path (including the bare origin), and the well-known oauth-protected-resource is gone too. The entry has been dead long enough that it is now actively misleading in search results — surfaced by TestCatalogOAuthDiscoveryLive. If grafbase relaunches their MCP server we can re-add it.
When the issuer URL has a path component (e.g. https://access.stripe.com/mcp), RFC 8414 §3.1 specifies that the well-known suffix must be inserted between the origin and the path: https://access.stripe.com/.well-known/oauth-authorization-server/mcp Stripe's authorization server follows the spec; the legacy 'append the well-known suffix to the full issuer URL' form 404s. Until now getAuthorizationServerMetadata only tried the append form (and an OIDC fallback), so for stripe-remote it silently fell through to createDefaultMetadata — leaving the OAuth flow with synthesised endpoints that fail later in handshake. Refactor metadata discovery to walk an ordered candidate list: 1. RFC 8414 §3.1 path-aware oauth-authorization-server (NEW), 2. legacy append-form oauth-authorization-server (preserved for the many widely-deployed servers that ship that way), 3. path-aware openid-configuration (NEW), 4. append-form openid-configuration. A 200 wins; 404s flow through; any other status is a hard error so we don't paper over a misconfigured auth server. URLs already containing '/.well-known/' pass through untouched. Verified by: - new TestMetadataDiscoveryURLs and TestGetAuthorizationServerMetadata_* unit tests covering all four issuer shapes, - TestCatalogOAuthDiscoveryLive (live probe, opt-in) which now reports 16/16 oauth servers passing — including stripe-remote. The live probe duplicates the same candidate ordering on purpose so it remains a black-box audit independent of any future refactor of the discovery code.
The RFC 8414 §3.1 path-aware variant added in e3fed03 is a speculative guess about where an authorization server *might* publish its metadata. Several real-world deployments answer that URL with something other than 404 (e.g. a gateway that returns 403 for unknown well-known prefixes, or a 5xx from an upstream component) while still serving valid metadata at the legacy 'append-to-issuer' URL. Before this change, getAuthorizationServerMetadata short-circuited on the first non-404 status with a hard error, even though further candidates would have succeeded. That regressed every issuer with a path component whose path-aware endpoint doesn't 404. Probe semantics now: - 200 with a decodable body wins immediately. - Non-404 statuses, transport errors and JSON-decode failures are logged and the next candidate is tried. - If at least one candidate produced a non-404 status or a transport-level error and none returned 200, surface the most diagnostic failure so a misconfigured auth server is still visible. - If every candidate 404'd, fall back to default metadata exactly as before (legacy behaviour). Covered by three new tests: - TestGetAuthorizationServerMetadata_NonFatalCandidateStatus: 403 on candidate #1, 200 on candidate docker#2 → success. - TestGetAuthorizationServerMetadata_AllUnreachableSurfacesError: every candidate 500s → error surfaced. - TestGetAuthorizationServerMetadata_All404FallsBackToDefaults: legacy behaviour preserved.
6b42d22 to
974d1d3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
OAuth servers publish their authorization-server metadata at multiple URL candidates depending on their implementation and standards compliance. The RFC 8414 §3.1 path-aware URL — placing
oauth-authorization-serverafter the authority and before the resource path — is well-established but docker-agent's discovery code never tried it, silently falling back to hardcoded default metadata that broke the OAuth handshake.A live audit test (TestCatalogOAuthDiscoveryLive) that probes every OAuth server in the embedded MCP catalog revealed two real findings: Grafbase's MCP endpoint is decommissioned, and Stripe's authorization server publishes its metadata at the RFC 8414 §3.1 path-aware URL. Dropping Grafbase (count 45 → 44) and fixing the Stripe case requires updating the metadata discovery logic to walk an ordered candidate list and try multiple URL patterns before giving up.
The refactored
getAuthorizationServerMetadatafunction now walks: RFC 8414 §3.1 path-awareoauth-authorization-server, legacy "append-to-issuer"oauth-authorization-server(preserved for widely-deployed servers that ship that way), then the OIDC variants of each. A 200 wins; non-200 statuses, transport errors, and decode failures are logged and the next candidate is tried. If all fail, the most diagnostic error is surfaced; if every candidate 404'd, default metadata is returned to preserve legacy behavior.Unit tests in
oauth_test.gocover URL discovery ordering, the RFC 8414 path-aware case, backward compatibility with the append-form, and the probe-and-decide flow. After the fix, all 16 OAuth servers in the MCP catalog pass their structural prerequisites (401+WWW-Authenticate challenge, reachable protected-resource and authorization-server metadata, HTTPS registration_endpoint, and PKCE S256).To re-run the live probe:
MCP_CATALOG_OAUTH_LIVE=1 go test -run TestCatalogOAuthDiscoveryLive -v -count=1 -timeout=180s ./pkg/tools/builtin/mcpcatalog