Skip to content

fix(mcp/oauth): discover RFC 8414 §3.1 path-aware metadata URLs#2877

Open
dgageot wants to merge 4 commits into
docker:mainfrom
dgageot:mcp/oauth-discovery-rfc8414
Open

fix(mcp/oauth): discover RFC 8414 §3.1 path-aware metadata URLs#2877
dgageot wants to merge 4 commits into
docker:mainfrom
dgageot:mcp/oauth-discovery-rfc8414

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented May 22, 2026

OAuth servers publish their authorization-server metadata at multiple URL candidates depending on their implementation and standards compliance. The RFC 8414 §3.1 path-aware URL — placing oauth-authorization-server after the authority and before the resource path — is well-established but docker-agent's discovery code never tried it, silently falling back to hardcoded default metadata that broke the OAuth handshake.

A live audit test (TestCatalogOAuthDiscoveryLive) that probes every OAuth server in the embedded MCP catalog revealed two real findings: Grafbase's MCP endpoint is decommissioned, and Stripe's authorization server publishes its metadata at the RFC 8414 §3.1 path-aware URL. Dropping Grafbase (count 45 → 44) and fixing the Stripe case requires updating the metadata discovery logic to walk an ordered candidate list and try multiple URL patterns before giving up.

The refactored getAuthorizationServerMetadata function now walks: RFC 8414 §3.1 path-aware oauth-authorization-server, legacy "append-to-issuer" oauth-authorization-server (preserved for widely-deployed servers that ship that way), then the OIDC variants of each. A 200 wins; non-200 statuses, transport errors, and decode failures are logged and the next candidate is tried. If all fail, the most diagnostic error is surfaced; if every candidate 404'd, default metadata is returned to preserve legacy behavior.

Unit tests in oauth_test.go cover URL discovery ordering, the RFC 8414 path-aware case, backward compatibility with the append-form, and the probe-and-decide flow. After the fix, all 16 OAuth servers in the MCP catalog pass their structural prerequisites (401+WWW-Authenticate challenge, reachable protected-resource and authorization-server metadata, HTTPS registration_endpoint, and PKCE S256).

To re-run the live probe:

MCP_CATALOG_OAUTH_LIVE=1 go test -run TestCatalogOAuthDiscoveryLive -v -count=1 -timeout=180s ./pkg/tools/builtin/mcpcatalog

@dgageot dgageot requested a review from a team as a code owner May 22, 2026 15:25
docker-agent

This comment was marked as outdated.

docker-agent

This comment was marked as outdated.

@aheritier aheritier added area/mcp MCP protocol, MCP tool servers, integration kind/fix PR fixes a bug (maps to fix: commit prefix) labels May 22, 2026
aheritier

This comment was marked as outdated.

dgageot added 4 commits May 24, 2026 10:30
Adds TestCatalogOAuthDiscoveryLive, a subtest-per-server probe over every
oauth server in the embedded catalog. For each it verifies the structural
prerequisites docker-agent needs:

  - 401 + WWW-Authenticate from the MCP endpoint
  - protected-resource metadata reachable
  - authorization-server metadata reachable (with openid-configuration
    fallback)
  - HTTPS registration_endpoint (Dynamic Client Registration)
  - PKCE S256 advertised

Skipped by default because it makes real HTTPS calls to ~17 third-party
servers and is unsuitable for CI without explicit opt-in. Run with:

    MCP_CATALOG_OAUTH_LIVE=1 go test -run TestCatalogOAuthDiscoveryLive \
        -v -count=1 -timeout=180s ./pkg/tools/builtin/mcpcatalog
api.grafbase.com/mcp returns 404 on every plausible path (including the
bare origin), and the well-known oauth-protected-resource is gone too.
The entry has been dead long enough that it is now actively misleading
in search results — surfaced by TestCatalogOAuthDiscoveryLive.

If grafbase relaunches their MCP server we can re-add it.
When the issuer URL has a path component (e.g. https://access.stripe.com/mcp),
RFC 8414 §3.1 specifies that the well-known suffix must be inserted between
the origin and the path:

    https://access.stripe.com/.well-known/oauth-authorization-server/mcp

Stripe's authorization server follows the spec; the legacy 'append the
well-known suffix to the full issuer URL' form 404s. Until now
getAuthorizationServerMetadata only tried the append form (and an OIDC
fallback), so for stripe-remote it silently fell through to
createDefaultMetadata — leaving the OAuth flow with synthesised endpoints
that fail later in handshake.

Refactor metadata discovery to walk an ordered candidate list:

  1. RFC 8414 §3.1 path-aware oauth-authorization-server (NEW),
  2. legacy append-form oauth-authorization-server (preserved for the
     many widely-deployed servers that ship that way),
  3. path-aware openid-configuration (NEW),
  4. append-form openid-configuration.

A 200 wins; 404s flow through; any other status is a hard error so we
don't paper over a misconfigured auth server. URLs already containing
'/.well-known/' pass through untouched.

Verified by:

  - new TestMetadataDiscoveryURLs and TestGetAuthorizationServerMetadata_*
    unit tests covering all four issuer shapes,
  - TestCatalogOAuthDiscoveryLive (live probe, opt-in) which now reports
    16/16 oauth servers passing — including stripe-remote.

The live probe duplicates the same candidate ordering on purpose so it
remains a black-box audit independent of any future refactor of the
discovery code.
The RFC 8414 §3.1 path-aware variant added in e3fed03 is a speculative
guess about where an authorization server *might* publish its metadata.
Several real-world deployments answer that URL with something other
than 404 (e.g. a gateway that returns 403 for unknown well-known
prefixes, or a 5xx from an upstream component) while still serving
valid metadata at the legacy 'append-to-issuer' URL.

Before this change, getAuthorizationServerMetadata short-circuited on
the first non-404 status with a hard error, even though further
candidates would have succeeded. That regressed every issuer with a
path component whose path-aware endpoint doesn't 404.

Probe semantics now:

  - 200 with a decodable body wins immediately.
  - Non-404 statuses, transport errors and JSON-decode failures are
    logged and the next candidate is tried.
  - If at least one candidate produced a non-404 status or a
    transport-level error and none returned 200, surface the most
    diagnostic failure so a misconfigured auth server is still visible.
  - If every candidate 404'd, fall back to default metadata exactly as
    before (legacy behaviour).

Covered by three new tests:

  - TestGetAuthorizationServerMetadata_NonFatalCandidateStatus: 403 on
    candidate #1, 200 on candidate docker#2 → success.
  - TestGetAuthorizationServerMetadata_AllUnreachableSurfacesError: every
    candidate 500s → error surfaced.
  - TestGetAuthorizationServerMetadata_All404FallsBackToDefaults:
    legacy behaviour preserved.
@dgageot dgageot force-pushed the mcp/oauth-discovery-rfc8414 branch from 6b42d22 to 974d1d3 Compare May 24, 2026 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/mcp MCP protocol, MCP tool servers, integration kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants