| title | NATS Configuration |
|---|---|
| expires_at | never |
| tags | |
Note
This section applies to non-TLS routes. By default, application endpoints are reached via TLS. Route pruning is not enabled on TLS routes.
In the context of Cloud Foundry, when an application instance crashes or is stopped (either intentionally or through scaling down), its allocated IP and port are released to the pool. The same IP and port may then be assigned to a new instance of another application, as when a new app is started, scaled up, or a crashed instance is recreated. Under normal operation, each of these events will result in a change to Gorouter's routing table. Updates to the routing table depend on a message being sent by a client to NATS (e.g. Route Emitter is responsible for sending changes to routing data for apps running on Diego), and on Gorouter fetching the message from NATS.
If Gorouter loses its connection to NATS, it will attempt to reconnect to all servers in the NATS cluster. If it is unable to reconnect to any NATS server, and so is unable to receive changes to the routing table, connections for one application may be routed to an unintended one. These are called "stale routes," or the routing table is said to be "stale."
To prevent stale routes, Gorouter is by default optimized for consistency over availability. Each route has a TTL of 120 seconds (see droplet_stale_threshold), and clients are responsible for heartbeating registration of their routes. Each time Gorouter receives a heartbeat for a route, the TTL is reset. If Gorouter does not receive a heartbeat within the TTL, the route is pruned from the routing table. If all backends for a route are pruned, Gorouter will respond with a 404 to requests for the route. If Gorouter can't reach NATS, no heartbeats arrive, so every route is pruned once its TTL expires and Gorouter will respond with a 404 to all requests. This constitutes a total application outage.
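The heartbeat and TTL behavior described above can be sketched roughly as follows. This is a minimal illustration, not Gorouter's implementation: the `RouteTable` class, its method names, and the injectable clock are all hypothetical, and only the 120-second TTL comes from the text.

```python
import time


class RouteTable:
    """Toy routing table: each (route, backend) entry expires unless refreshed."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behavior can be exercised without waiting
        self._last_seen = {}  # (route, backend) -> time of the last heartbeat

    def register(self, route, backend):
        # A registration heartbeat creates the entry or resets its TTL.
        self._last_seen[(route, backend)] = self.clock()

    def prune(self):
        # Drop every entry whose last heartbeat is older than the TTL.
        now = self.clock()
        self._last_seen = {
            key: seen
            for key, seen in self._last_seen.items()
            if now - seen <= self.ttl
        }

    def lookup(self, route):
        # Return the live backends, or None to model a 404 response.
        self.prune()
        backends = [b for (r, b) in self._last_seen if r == route]
        return backends or None
```

In this sketch, a route that keeps receiving heartbeats stays resolvable indefinitely, while a route whose heartbeats stop (for example because NATS is unreachable) disappears after the TTL and `lookup` returns `None`.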
If an operator prefers to favor availability over consistency, the configuration property suspend_pruning_if_nats_unavailable can be used to ignore route TTLs and prevent pruning when Gorouter cannot connect to NATS. This option also sets max reconnect in the NATS client to -1 (no limit), which prevents Gorouter from crashing and losing its in-memory routing table. The option is set to false by default.
Warning
There is a significant probability of routing to an incorrect backend endpoint in the case of port re-use. Suspending route pruning should be used with caution.
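The trade-off can be illustrated with a small sketch of a single pruning pass. The function and its arguments are hypothetical stand-ins chosen for illustration; only the flag name and the 120-second TTL come from the text, and the real Gorouter logic may differ.

```python
def prune_routes(last_seen, now, ttl=120.0, nats_connected=True,
                 suspend_pruning_if_nats_unavailable=False):
    """Return the route entries that survive one pruning pass.

    last_seen maps a route entry to the time of its last heartbeat.
    """
    if suspend_pruning_if_nats_unavailable and not nats_connected:
        # Availability over consistency: keep everything, even expired entries.
        return dict(last_seen)
    # Default behavior: drop entries whose heartbeat is older than the TTL.
    return {entry: seen for entry, seen in last_seen.items() if now - seen <= ttl}


# One entry whose last heartbeat arrived at t=0, inspected at t=300 with NATS down.
routes = {("app-a.example.com", "10.0.0.5:61001"): 0.0}
default_result = prune_routes(routes, now=300.0, nats_connected=False)
suspended_result = prune_routes(routes, now=300.0, nats_connected=False,
                                suspend_pruning_if_nats_unavailable=True)
```

Here `default_result` is empty (requests would receive a 404), while `suspended_result` still contains the entry. If `10.0.0.5:61001` has since been reassigned to a different application, that retained entry is exactly the stale route the warning describes.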
DropletStaleThreshold: Time after which Gorouter considers the route information as stale.
NATS PingInterval: Interval configured by the NATS client to ping configured NATS servers.
MinimumRegistrationInterval: Expected interval for Gorouter clients (e.g., Route Registrar) to send the routing info.
In a deployment with multiple NATS servers, if one of the servers becomes unhealthy, Gorouter should fail over to a healthy server (if one is available) before the DropletStaleThreshold value is reached, to avoid pruning routes. The ping interval for NATS clients is calculated by the following equation:
PingInterval = (DropletStaleThreshold - (StartResponseDelayInterval + MinimumRegistrationInterval) - (NATS Timeout * NumberOfNatsServers)) / 3
(StartResponseDelayInterval + MinimumRegistrationInterval): This part accounts for the startup delay before Gorouter accepts requests, plus the registration interval from Gorouter clients.
(NATS Timeout * NumberOfNatsServers): This part accounts for the connection timeout that may be incurred once for each configured NATS server.
The default connection timeout for NATS clients is 2 seconds.
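As a worked illustration of the equation above, the following sketch evaluates it for one set of inputs. The 120-second threshold and 2-second timeout come from this page; the start response delay (5 s), minimum registration interval (25 s), and server count (3) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def ping_interval(droplet_stale_threshold, start_response_delay_interval,
                  minimum_registration_interval, nats_timeout,
                  number_of_nats_servers):
    """Evaluate the PingInterval equation; all arguments are in seconds or counts."""
    # Startup delay plus the time clients may take to re-register their routes.
    startup_and_registration = (start_response_delay_interval
                                + minimum_registration_interval)
    # One connection timeout per configured NATS server.
    connection_budget = nats_timeout * number_of_nats_servers
    return (droplet_stale_threshold
            - startup_and_registration
            - connection_budget) / 3


# (120 - (5 + 25) - (2 * 3)) / 3 = 84 / 3
example = ping_interval(120, 5, 25, 2, 3)
print(example)  # 28.0
```

With these assumed inputs the result is 28 seconds; different assumptions for the hypothetical values would give a different interval.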
Currently, operators cannot set the value for
DropletStaleThreshold and StartResponseDelayInterval, so there is no practical
need for the above equation to calculate the ping interval. After careful
consideration of different scenarios, the interval is set to 20 seconds.