Add mTLS app-to-app routing support (RFC draft) #535

Draft
rkoster wants to merge 44 commits into develop from feature/app-to-app-mtls-routing

rkoster commented Mar 5, 2026

Summary

Implements Phase 1 (1a + 1b) of the App-to-App mTLS Routing RFC.

Note: This PR is a draft because the RFC for App-to-App mTLS Routing has not been approved yet.

Phase 1a: mTLS Infrastructure

  • Per-domain TLS configuration via GetConfigForClient callback
  • Domain-aware client certificate validation
  • XFCC header handling (sanitize_set mode) for mTLS domains
  • Configurable XFCC format: raw (base64 cert) or envoy (compact hash+subject)
  • BOSH job properties for router.mtls_domains

Phase 1b: Authorization

  • Identity extraction from Diego instance identity certificates
  • Authorization handler enforcing mTLS authorization rules
  • RFC-0027 compliant flat options: mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs (comma-separated GUIDs), mtls_allow_any (boolean)
  • Route-registrar support for mTLS route options
  • RTR access logs emitted for denied requests (401/403)

Testing

  • Unit tests for all new handlers
  • Integration tests for end-to-end mTLS routing
  • BOSH template tests for configuration

Key Files Changed

GoRouter:

  • src/code.cloudfoundry.org/gorouter/config/config.go - MtlsDomainConfig struct
  • src/code.cloudfoundry.org/gorouter/router/router.go - GetConfigForClient callback
  • src/code.cloudfoundry.org/gorouter/handlers/clientcert.go - Domain-aware XFCC
  • src/code.cloudfoundry.org/gorouter/handlers/identity.go - XFCC parsing
  • src/code.cloudfoundry.org/gorouter/handlers/mtls_authorization.go - Authorization handler
  • src/code.cloudfoundry.org/gorouter/mbus/subscriber.go - Route message parsing
  • src/code.cloudfoundry.org/gorouter/route/pool.go - AllowedSources storage
  • src/code.cloudfoundry.org/gorouter/proxy/proxy.go - Handler wiring

Route Registrar:

  • src/code.cloudfoundry.org/route-registrar/config/config.go - AllowedSources in Options
  • src/code.cloudfoundry.org/route-registrar/messagebus/messagebus.go - NATS message format

BOSH:

  • jobs/gorouter/spec - router.mtls_domains property
  • jobs/gorouter/templates/gorouter.yml.erb - Template configuration

Configuration Example

# BOSH manifest
router:
  mtls_domains:
  - domain: "*.apps.mtls.internal"
    ca_certs: "((diego_instance_identity_ca.certificate))"
    forwarded_client_cert: sanitize_set

# Route registration
routes:
- name: my-api
  uris: ["my-api.apps.mtls.internal"]
  options:
    mtls_allowed_apps: "frontend-app-guid"
    mtls_allowed_spaces: "trusted-space-guid"

Related PRs


rkoster commented Apr 16, 2026

Latest Update: RFC-Compliant Post-Selection Authorization

Implemented a breaking change that replaces pre-selection authorization with strict post-selection enforcement, per RFC lines 475-517.

Key Changes (commit cbf0695)

Architecture:

  • ✅ Composable PostSelectionHandler interface for middleware pipeline
  • ✅ Separation of pre-selection checks (SNI, route lookup, identity) from post-selection authorization
  • ✅ Immediate 403 on authorization failure (non-retriable, per RFC)
  • ✅ Post-selection scope checking with :post-selection suffix in metrics

Implementation:

  • handlers/post_selection_pipeline.go - Infrastructure for composable checks
  • handlers/mtls_scope_auth.go - Org/space boundary enforcement
  • handlers/mtls_access_rules_auth.go - Access rules evaluation (cf:app:, cf:space:, etc.)
  • handlers/mtls_pre_auth.go - Pre-selection checks only
  • handlers/mtls_auth_error.go - Custom error type with Rule/Reason/HTTPStatus

Test Coverage:

  • +44 new tests (14 scope + 17 access rules + 13 pipeline)
  • +4 integration tests for shared route scenarios
  • All 393 tests passing

RFC Compliance

  • Intermittent 403s: expected for shared routes across scope boundaries (RFC-compliant)
  • Error messages: include "caller org X does not match selected backend org Y"
  • Strict enforcement: prevents unauthorized cross-scope access

Breaking Change

⚠️ This replaces the permissive pre-selection authorization entirely. No feature flag provided as this is a security improvement required by the RFC.

Deprecated:

  • handlers/mtls_authorization.go (old implementation with migration notes)
  • route/pool.go EndpointOrgIDs/SpaceIDs methods

Integration Test Results

All integration tests compile successfully. Shared route scenarios validate:

  • Intermittent 403s with scope=space (different spaces in same org)
  • Always succeed with scope=org (same org, different spaces)
  • Always fail with scope=org (different orgs)
  • Per-endpoint access rules with intermittent behavior
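The scenarios above follow from evaluating scope against the one backend that was selected, not against the whole pool. Here is a minimal Go sketch of that behaviour. The names `backend`, `scopeAllows` and `outcomes` are hypothetical and not taken from the PR, and treating an unknown scope as a denial is an assumption.

```go
package main

// backend is a hypothetical stand-in for a route endpoint's org/space tags.
type backend struct {
	OrgID   string
	SpaceID string
}

// scopeAllows reports whether a caller may reach the selected backend
// under the given scope ("org" or "space"). Unknown scopes are denied.
func scopeAllows(scope, callerOrg, callerSpace string, selected backend) bool {
	switch scope {
	case "org":
		return callerOrg == selected.OrgID
	case "space":
		return callerSpace == selected.SpaceID
	default:
		return false
	}
}

// outcomes evaluates the scope check once per backend in a shared route's
// pool, showing what the caller would see depending on which one is picked.
func outcomes(scope, callerOrg, callerSpace string, pool []backend) []bool {
	results := make([]bool, 0, len(pool))
	for _, b := range pool {
		results = append(results, scopeAllows(scope, callerOrg, callerSpace, b))
	}
	return results
}
```

With a pool spanning two spaces in one org, a caller in one of those spaces gets mixed results under scope=space (the intermittent 403s) and uniform success under scope=org.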

Ready for full integration test run and review.


rkoster commented Apr 16, 2026

Refactoring: AuthError for Future Extensibility

Commit: 4ff64b9

Renamed MtlsAuthError to AuthError to prepare for future authentication methods beyond mTLS, such as SPIFFE JWT tokens.

Changes

  • ✅ Renamed handlers/mtls_auth_error.go → handlers/auth_error.go
  • ✅ Updated struct, constructor functions, and all references
  • ✅ Changed error messages from "mTLS authorization denied" to "authorization denied"
  • ✅ Updated all test files

Benefits

  • 🔮 Future-proof: Ready for SPIFFE JWT token authentication
  • 🏗️ Generic design: Error type not tied to specific auth mechanism
  • 🧩 Reusable: Can be used across different authentication methods
  • Clean: Better naming convention for authorization errors

No functional changes - pure refactoring for extensibility.

rkoster force-pushed the feature/app-to-app-mtls-routing branch 3 times, most recently from 1f9b804 to 79271b7 (April 17, 2026 12:12)
rkoster added 8 commits April 20, 2026 09:16
- Add MtlsDomainConfig struct with domain-specific CA pool and forwarding modes
- Add MtlsDomains field and mtlsDomainMap for fast domain lookups
- Implement GetMtlsDomainConfig() for exact and wildcard domain matching
- Add processMtlsDomains() to validate and build CA pools per domain
- Support wildcard domains like *.apps.internal

This enables GoRouter to enforce different mTLS policies per domain.
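As an illustration of exact and wildcard matching, here is a minimal Go sketch. The type `domainConfig` and function `lookupDomain` are hypothetical stand-ins for MtlsDomainConfig and GetMtlsDomainConfig(), and matching only a single left-most label for wildcards is an assumption rather than confirmed behaviour of this PR.

```go
package main

import "strings"

// domainConfig is a hypothetical, trimmed-down stand-in for MtlsDomainConfig.
type domainConfig struct {
	Domain              string
	ForwardedClientCert string
}

// lookupDomain tries an exact match first, then replaces the left-most
// label of host with "*" and retries (so "*.apps.internal" matches
// "my-api.apps.internal" but not "a.b.apps.internal").
func lookupDomain(domains map[string]domainConfig, host string) (domainConfig, bool) {
	host = strings.ToLower(host)
	if cfg, ok := domains[host]; ok {
		return cfg, true
	}
	if i := strings.Index(host, "."); i >= 0 {
		if cfg, ok := domains["*"+host[i:]]; ok {
			return cfg, true
		}
	}
	return domainConfig{}, false
}
```

Keying the map by the configured pattern keeps each lookup to at most two map reads.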
- Add getTLSConfigForClient() callback to dynamically select TLS config
- For mTLS domains: require and verify client certs with domain CA pool
- For regular domains: use base TLS configuration
- Use SNI to determine which domain configuration to apply

This allows GoRouter to enforce client certificate validation only on
designated mTLS domains while leaving other domains unchanged.
- Add config parameter to NewClientCert() and clientCert struct
- Check if request is for an mTLS domain using GetMtlsDomainConfig()
- Use domain-specific ForwardedClientCert mode when applicable
- Update all call sites and tests to pass config

This allows different XFCC header handling policies per domain,
supporting both legacy and mTLS-secured routes simultaneously.

- Add router.mtls_domains property to gorouter job spec
- Implement ERB template validation and processing logic
- Validate required fields (domain, ca_certs) and optional forwarded_client_cert
- Support both wildcard (*.apps.internal) and exact domain matching

This completes Phase 1a, enabling operators to configure per-domain
mTLS policies via BOSH deployment manifests.

- Add AllowedSources struct to subscriber.go with app_guids list
- Add AllowedSources field to RegistryMessage for NATS route messages
- Add AllowedSourceAppGUIDs to EndpointOpts and Endpoint structs
- Update Endpoint.Equal() to compare allowed source GUIDs
- Add helper function getAllowedSourceAppGUIDs()

This enables route registrations to specify which source apps are
authorized to access endpoints on mTLS domains. Work in progress
for Phase 1b authorization enforcement.

Ensures the allowed source app GUIDs are properly propagated from
EndpointOpts to the Endpoint instance.

- Create handlers/identity.go with XFCC header parsing logic
- Add CallerIdentity struct containing app GUID
- Add CallerIdentity field to RequestInfo
- Extract app GUID from certificate OU field (format: app:<guid>)
- Parse X-Forwarded-Client-Cert header with PEM certificate

The identity handler extracts the calling application's identity from
the client certificate, enabling authorization checks in downstream
handlers.
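A minimal Go sketch of the OU-based extraction described above. The function names `appGUIDFromOUs` and `appGUIDFromPEM` are hypothetical and not taken from handlers/identity.go. Only the `app:<guid>` OU format comes from the commit message. How the PEM is carried inside the XFCC header is left out.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"strings"
)

// appGUIDFromOUs returns the GUID from the first non-empty "app:<guid>" entry.
func appGUIDFromOUs(ous []string) (string, bool) {
	for _, ou := range ous {
		if strings.HasPrefix(ou, "app:") {
			if guid := strings.TrimPrefix(ou, "app:"); guid != "" {
				return guid, true
			}
		}
	}
	return "", false
}

// appGUIDFromPEM parses a PEM-encoded certificate and extracts the app GUID
// from the OrganizationalUnit values of its subject.
func appGUIDFromPEM(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("no PEM certificate found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return "", err
	}
	guid, ok := appGUIDFromOUs(cert.Subject.OrganizationalUnit)
	if !ok {
		return "", errors.New("no app:<guid> OU in certificate subject")
	}
	return guid, nil
}
```

Keeping the OU scan separate from certificate parsing makes the identity rule testable without generating certificates.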
- Create handlers/mtls_authorization.go to enforce authorization
- Check if request is on an mTLS domain using config.IsMtlsDomain()
- Verify endpoint has AllowedSourceAppGUIDs configured
- Match caller identity app GUID against allowed sources list
- Return 403 Forbidden if caller not in allowed list
- Return 401 Unauthorized if no caller identity present
- Skip authorization for non-mTLS domains

This handler ensures only explicitly authorized apps can communicate
on mTLS-secured domains, completing the authorization enforcement
layer for Phase 1b.
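The decision order of this first-iteration handler can be sketched as a pure function. The function `authorizeStatus` is a hypothetical name. Treating an empty allowed list as a denial is an assumption, since the commit message does not say what happens when no sources are configured.

```go
package main

import "net/http"

// authorizeStatus returns the HTTP status the check would produce:
// 200 when the request may proceed, 401 when no caller identity is present,
// and 403 when the caller is not in the allowed list.
func authorizeStatus(isMtlsDomain bool, callerAppGUID string, allowed []string) int {
	if !isMtlsDomain {
		return http.StatusOK // authorization is skipped for non-mTLS domains
	}
	if callerAppGUID == "" {
		return http.StatusUnauthorized
	}
	for _, guid := range allowed {
		if guid == callerAppGUID {
			return http.StatusOK
		}
	}
	return http.StatusForbidden
}
```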
rkoster added 22 commits April 20, 2026 09:16
Set RouteEndpoint on RequestInfo before returning 401/403 responses
so that access logs are emitted to the target app's log stream.

This allows operators to see denied requests in 'cf logs <app>'
for the backend app, which is essential for debugging authorization
issues in mTLS app-to-app communication.

RFC-0027 requires options values to be only strings, numbers, or
booleans - not nested objects/arrays. Updated:

- RegistryMessageOpts: Use flat fields (mtls_allowed_apps,
  mtls_allowed_spaces, mtls_allowed_orgs, mtls_allow_any) with
  comma-separated GUIDs instead of nested MtlsAllowedSources struct

- parseCommaSeparatedGUIDs(): New helper to split comma-separated
  GUID strings into slices

- getEffectiveMtlsAllowedSources(): Parse flat options from Options
  struct while maintaining top-level MtlsAllowedSources precedence
  for route-registrar compatibility

- Tests: Updated to verify flat options parsing
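
A helper of the kind described for parseCommaSeparatedGUIDs() might look like the following Go sketch. The name `splitGUIDs` is hypothetical, and trimming whitespace and dropping empty entries are assumptions about the desired behaviour, not confirmed details of the PR.

```go
package main

import "strings"

// splitGUIDs splits a comma-separated option value such as
// "guid-a,guid-b" into a slice, trimming whitespace and skipping
// empty entries. An empty input yields a nil slice.
func splitGUIDs(value string) []string {
	var guids []string
	for _, part := range strings.Split(value, ",") {
		if guid := strings.TrimSpace(part); guid != "" {
			guids = append(guids, guid)
		}
	}
	return guids
}
```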
Replace MtlsAllowedSources model with AccessScope/AccessRules selectors,
add per-connection TLS state tracking via ConnContext, implement two-layer
RFC authorization handler (SNI/Host 421 check + scope/rules enforcement),
emit mTLS fields in RTR access logs, and rename router.mtls_domains to
router.domains in BOSH config.

…tion

BREAKING CHANGE: Replace pre-selection authorization with post-selection
enforcement for strict org/space scope and access rules checking.

This implementation follows the Cloud Foundry RFC for App-to-App mTLS
Routing (lines 475-517) which mandates post-selection authorization to
ensure proper scope boundary enforcement.

Changes:
- Add composable PostSelectionHandler interface for middleware pipeline
- Implement post-selection scope checking (org/space boundaries)
- Implement post-selection access rules evaluation (cf:app:, cf:space:, etc.)
- Separate pre-selection checks (SNI, route lookup, identity extraction)
- Return 403 immediately on authorization failure (non-retriable)
- Add MtlsAuthError type with Rule/Reason/HTTPStatus fields
- Deprecate old pre-selection authorization handlers
- Add :post-selection suffix to MtlsRule values for observability

Test Coverage:
- 14 new tests for scope authorization
- 17 new tests for access rules authorization
- 13 new tests for post-selection pipeline infrastructure
- 4 new integration tests for shared route scenarios
- All 393 tests passing (349 existing + 44 new)

RFC Compliance:
- Intermittent 403s expected for shared routes across scope boundaries
- Error messages include 'caller org X does not match selected backend org Y'
- Strict enforcement prevents unauthorized cross-scope access

Migration:
- handlers/mtls_authorization.go is deprecated with migration notes
- route/pool.go EndpointOrgIDs/SpaceIDs methods deprecated
- No feature flag - this is a breaking security improvement
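
To illustrate the composable post-selection idea, here is a minimal Go sketch. The types `caller`, `endpoint` and `check` and the functions `runChecks`, `orgScope` and `accessRules` are hypothetical. The actual PostSelectionHandler interface may look quite different, and only the `cf:app:` and `cf:space:` rule prefixes and the org mismatch message are taken from the description above.

```go
package main

import "fmt"

// caller and endpoint are hypothetical, trimmed-down stand-ins.
type caller struct{ AppID, SpaceID, OrgID string }

type endpoint struct {
	OrgID, SpaceID string
	AccessRules    []string // e.g. "cf:app:<guid>", "cf:space:<guid>"
}

// check is one composable post-selection check against the selected endpoint.
type check func(c caller, selected endpoint) error

// runChecks runs checks in order and returns the first failure, which the
// caller would turn into an immediate, non-retriable 403.
func runChecks(c caller, selected endpoint, checks ...check) error {
	for _, ck := range checks {
		if err := ck(c, selected); err != nil {
			return err
		}
	}
	return nil
}

// orgScope enforces the org boundary against the selected backend.
func orgScope(c caller, selected endpoint) error {
	if c.OrgID != selected.OrgID {
		return fmt.Errorf("caller org %s does not match selected backend org %s", c.OrgID, selected.OrgID)
	}
	return nil
}

// accessRules allows the request if any rule names the caller's app or space.
func accessRules(c caller, selected endpoint) error {
	for _, rule := range selected.AccessRules {
		if rule == "cf:app:"+c.AppID || rule == "cf:space:"+c.SpaceID {
			return nil
		}
	}
	return fmt.Errorf("caller app %s matches no access rule", c.AppID)
}
```

A request would then be evaluated as `runChecks(c, selected, orgScope, accessRules)` once the load balancer has picked `selected`.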
Rename MtlsAuthError to AuthError to prepare for future authentication
methods beyond mTLS, such as SPIFFE JWT tokens. This makes the error
type generic and reusable across different authentication mechanisms.

Changes:
- Rename handlers/mtls_auth_error.go to handlers/auth_error.go
- Rename MtlsAuthError struct to AuthError
- Rename NewMtlsAuthError to NewAuthError
- Rename NewMtlsAuthErrorWithStatus to NewAuthErrorWithStatus
- Update error message from "mTLS authorization denied" to "authorization denied"
- Update all references in handlers and proxy round tripper
- Update test files with new type names

No functional changes - this is purely a refactoring for better naming
and future extensibility.

Extract three shared helper functions from the deprecated mtls_authorization.go
handler into a dedicated mtls_helpers.go file:

- domainMatches(): checks if hostname matches domain pattern (wildcards)
- setRouteEndpointForAccessLog(): sets endpoint for access logs pre-selection
- evaluateAccessRules(): evaluates RFC access rules (cf:app:, cf:space:, etc.)

These helpers are used by both the deprecated pre-selection handler and the new
RFC-compliant post-selection handlers (MtlsScopeAuth, MtlsAccessRulesAuth).

This improves code organization by separating reusable utilities from the
deprecated handler implementation, making it easier to eventually remove the
deprecated code while preserving the shared logic.

No functional changes - purely a refactoring for better code structure.

Remove the old pre-selection mtls_authorization handler and its tests.
This handler was part of the initial iteration and has been superseded by:

- MtlsPreAuth: pre-selection checks (SNI, route lookup, identity)
- MtlsScopeAuth: post-selection scope enforcement (RFC-compliant)
- MtlsAccessRulesAuth: post-selection access rules evaluation

The shared helper functions (domainMatches, setRouteEndpointForAccessLog,
evaluateAccessRules) have been preserved in mtls_helpers.go.

This cleanup removes ~350 lines of deprecated code and 33 obsolete tests,
leaving only the production implementation.

Move helper functions from mtls_helpers.go directly into the handlers
that use them, eliminating the unnecessary shared helpers file.

Changes:
- Move domainMatches() and setRouteEndpointForAccessLog() to mtls_pre_auth.go
- Move evaluateAccessRules() to mtls_access_rules_auth.go
- Delete mtls_helpers.go (no longer needed)

Each helper function is now co-located with its single caller, improving
code organization and maintainability. All 360 handler tests passing.

Replace mTLS-specific field names with auth-method-neutral naming to
align with the RFC's identity-aware routing positioning. This prepares
the codebase for future authentication mechanisms (e.g., JWT tokens)
while maintaining the current mTLS implementation.

Structural improvements:
- Introduce AuthResult struct to group authorization outcome fields
- Remove redundant CallerApp/Space/Org flat fields from RequestInfo
  (access log now reads directly from CallerIdentity)
- Replace 6 flat fields on RequestInfo with 2 struct pointers

Naming changes:
- RequestInfo.MtlsAuth/Rule/DeniedReason → AuthResult.Outcome/Rule/DeniedReason
- AccessLogRecord.MtlsAuth/Rule/DeniedReason → AuthOutcome/AuthRule/AuthDeniedReason
- RTR log fields: mtls_auth → auth:, mtls_rule → auth_rule:,
  mtls_denied_reason → auth_denied_reason:

User-facing impact:
- RTR log field names changed from mtls_* to auth_* (acceptable since
  RFC code not yet released)
- Functionally equivalent - same authorization logic, clearer naming

All 360 handler tests passing.

- Rename mtls_app_to_app_test.go -> identity_aware_routing_test.go
- Update main test suite: 'App-to-App mTLS Routing' -> 'Identity-Aware Routing'
- Rename registerWithAllowedSources() -> registerWithAccessRules()
- Rename registerWithScopeAndAllowedSources() -> registerWithScopeAndAccessRules()
- Update parameter names: allowedSources -> accessRules
- Update test descriptions: 'mtls_allowed_sources' -> 'access rules'
- Update comments to reflect access-rules terminology

This aligns with the RFC's positioning of the feature as 'identity-aware
routing' with access rules, rather than mTLS-specific 'allowed sources'.

…ction auth

Remove EndpointOrgIDs() and EndpointSpaceIDs() which collected org/space
IDs from all endpoints in a pool. These were used by the deprecated
pre-selection authorization approach.

Post-selection authorization (RFC-compliant) checks org/space boundaries
against the SELECTED endpoint's tags, not against all endpoints in the pool,
making these aggregation methods unnecessary.

Route-registrar is used by BOSH-deployed system components (CC, UAA, etc.)
to register their routes. These system components:
- Don't have CF app identities (no Diego instance identity certs)
- Don't use mTLS domains with access control enforcement
- Are out of scope for the app-to-app identity-aware routing RFC

Only Cloud Controller → Diego → NATS should send access_scope/access_rules
for actual CF app routes. Route-registrar doesn't need these fields.

Devbox is used for local development environment setup but should not
be tracked in the repository. The files remain in the working directory
for developers who use devbox.

routing-api is a local submodule in src/code.cloudfoundry.org/routing-api,
not an external dependency. It should not be listed in go.mod.

This was incorrectly added during rebase conflict resolution.

- Export MakeEndpoint method for test access
- Fix test calls to MakeEndpoint with correct parameters
- Remove unnecessary fmt.Sprintf for string argument

This commit fixes all failing integration tests for the identity-aware
routing feature by addressing five critical issues:

1. mTLS client certificate trust chain issue
   - Tests were creating instance identity certs with a different CA than
     the one configured in GoRouter's mTLS domain settings
   - Added CreateInstanceIdentityCertWithCA() helper that accepts an
     existing CA to ensure proper trust chain
   - Updated all test cases to use the shared mtlsDomainCA

2. Authorization errors returning HTTP 502 instead of 403
   - Added custom ErrorHandler to httputil.ReverseProxy that checks for
     AuthError and returns the appropriate HTTP status code (403)
   - Previously all transport errors defaulted to 502 Bad Gateway

3. Per-endpoint access rules not working correctly
   - Authorization handler was checking pool-level access rules (first
     endpoint only) instead of the selected endpoint's rules
   - Changed to use endpoint.AccessRules to support different backends
     with different authorization requirements on the same route

4. Default-deny not enforced for routes without access rules
   - Changed enforcement logic to apply to all requests with CallerIdentity,
     regardless of AccessScope setting

5. SNI/Host header mismatch in test requests
   - Added newMtlsGetRequest() helper with custom DialTLSContext that
     connects to 127.0.0.1 while preserving hostname for TLS SNI
   - Updated all identity-aware routing tests to use this helper
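
The SNI fix in item 5 can be sketched with `http.Transport.DialTLSContext`. This is not the PR's newMtlsGetRequest() helper: the function `newLoopbackTLSClient` and its parameters are hypothetical, and it only shows how a fixed dial address can be combined with a hostname-derived SNI.

```go
package main

import (
	"context"
	"crypto/tls"
	"net"
	"net/http"
)

// newLoopbackTLSClient returns a client that always connects to dialAddr
// (for example "127.0.0.1:8443") while using the request URL's hostname as
// the TLS server name, so SNI and the Host header stay in agreement.
func newLoopbackTLSClient(dialAddr string, tlsCfg *tls.Config) *http.Client {
	transport := &http.Transport{
		DialTLSContext: func(ctx context.Context, network, hostPort string) (net.Conn, error) {
			host, _, err := net.SplitHostPort(hostPort)
			if err != nil {
				return nil, err
			}
			cfg := tlsCfg.Clone()
			if cfg == nil {
				cfg = &tls.Config{}
			}
			cfg.ServerName = host // SNI comes from the URL, not from dialAddr
			dialer := tls.Dialer{Config: cfg}
			return dialer.DialContext(ctx, network, dialAddr)
		},
	}
	return &http.Client{Transport: transport}
}
```

A test would pass a `tls.Config` carrying the instance identity certificate in `Certificates` and the router's CA in `RootCAs`.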

Test results: 20/20 integration tests passing, 17/17 unit tests passing
rkoster force-pushed the feature/app-to-app-mtls-routing branch from 5cc4170 to b875867 (April 20, 2026 09:18)
rkoster added 2 commits April 20, 2026 09:36
This prevents the subscriber's ClosedCB from firing log.Fatal when
NATS is stopped first, which was causing the test process to exit
prematurely and leading to port binding conflicts in parallel test
runs.

The cleanup order is now:
1. Terminate gorouter session
2. Stop NATS server
3. Clean up test files

This matches the fix from upstream PR #555 (commit b2bf830) which
resolved similar issues in router/router_test.go.