diff --git a/content/consul/v1.21.x/content/docs/manage/scale/autopilot.mdx b/content/consul/v1.21.x/content/docs/manage/scale/autopilot.mdx
index da8cb1e113..3e26460241 100644
--- a/content/consul/v1.21.x/content/docs/manage/scale/autopilot.mdx
+++ b/content/consul/v1.21.x/content/docs/manage/scale/autopilot.mdx
@@ -1,51 +1,27 @@
 ---
 layout: docs
-page_title: Consul autopilot
+page_title: Consul Autopilot
 description: >-
   Use Autopilot features to monitor the Raft cluster, introduce stable servers,
   and clean up dead servers.
 ---

-# Consul autopilot
+# Consul Autopilot

-This page describes Consul autopilot, which supports automatic, operator-friendly management of Consul
-servers. It includes cleanup of dead servers, monitoring the state of the Raft
-cluster, and stable server introduction.
+This page describes Consul Autopilot, a set of features that provide operator-friendly, automated management of Consul servers.

-To use autopilot features (with the exception of dead server cleanup), the
-[`raft_protocol`](/consul/docs/reference/agent/configuration-file/raft#raft_protocol)
-setting in the Consul agent configuration must be set to 3 or higher on all
-servers. In Consul `0.8` this setting defaults to 2; in Consul `1.0` it will
-default to 3. For more information, check the [Version Upgrade
-section](/consul/docs/upgrade/version-specific) on Raft protocol
-versions in Consul `1.0`.
+## Overview

-In this tutorial you will learn how Consul tracks the stability of servers, how
-to tune those conditions, and get some details on the other autopilot's features.
+Consul Autopilot helps you maintain the health and stability of the Consul server cluster. It includes the following features:

-- Server Stabilization
-- Dead server cleanup
-- Redundancy zones (only available in Consul Enterprise)
-- Automated upgrades (only available in Consul Enterprise)
-
-Note, in this tutorial we are using examples from a Consul `1.7` datacenter, we
-are starting with Autopilot enabled by default.
+- [Server health checking](#server-health-checking)
+- [Server stabilization time](#server-stabilization-time)
+- [Dead server cleanup](#dead-server-cleanup)
+- [Redundancy zones (only available in Consul Enterprise)](#redundancy-zones-enterprise)
+- [Automated upgrades (only available in Consul Enterprise)](#automated-upgrades-enterprise)

 ## Default configuration

-The configuration of Autopilot is loaded by the leader from the agent's
-[autopilot settings](/consul/docs/reference/agent/configuration-file/general#autopilot)
-when initially bootstrapping the datacenter. Since autopilot and its features
-are already enabled, you only need to update the configuration to disable them.
-
-All Consul servers should have Autopilot and its features either enabled or
-disabled to ensure consistency across servers in case of a failure.
-Additionally, Autopilot must be enabled to use any of the features, but the
-features themselves can be configured independently. Meaning you can enable or
-disable any of the features separately, at any time.
-
-You can check the default values using the `consul operator` CLI command or
-using the [`/v1/operator/autopilot`
-endpoint](/consul/api-docs/operator/autopilot)
+To check the default autopilot values, use the `consul operator` CLI command or the [`/v1/operator/autopilot` endpoint](/consul/api-docs/operator/autopilot).
@@ -90,36 +66,31 @@ $ curl http://127.0.0.1:8500/v1/operator/autopilot/configuration

-### Autopilot and Consul snapshots
+The following table lists the autopilot configuration parameters, their types, their default values, and their descriptions:

-Changes to the autopilot configuration are persisted in the Raft database
-maintained by the Consul servers. This means that autopilot configuration will
-be included in the Consul snapshot data. Any snapshot taken prior to autopilot
-configuration changes will contain the old configuration, and should be
-considered unsafe to restore since they will remove the change and cause
-unpredictable behaviors for the automations that might rely on the new
-configuration.
+| Autopilot setting         | Type     | Default value | Description |
+| :------------------------ | :------- | :------------ | :---------- |
+| `CleanupDeadServers`      | Boolean  | `true`        | Enables periodic dead server removal from the Raft peer set. |
+| `LastContactThreshold`    | Duration | `200ms`       | Maximum time that can elapse since a server's last contact with the current leader before Consul considers the server unhealthy. |
+| `MaxTrailingLogs`         | Integer  | `250`         | Maximum number of Raft log entries that a server can trail the leader by and still be considered healthy. |
+| `MinQuorum`               | Integer  | `0`           | Minimum number of healthy voting servers required to maintain quorum in the datacenter. |
+| `ServerStabilizationTime` | Duration | `10s`         | Time duration that a new server must remain healthy before it can become a voting member. |
+| `RedundancyZoneTag`       | String   | `""`          | Tag name used to identify redundancy zones for servers in Consul Enterprise. |
+| `DisableUpgradeMigration` | Boolean  | `false`       | Flag to disable automatic upgrade migrations in Consul Enterprise. |
+| `UpgradeVersionTag`       | String   | `""`          | Tag name used to identify server versions for automated upgrades in Consul Enterprise. |

-We recommend that you take a snapshot after any changes to the autopilot
-configuration, and consider that as the last safe point in time to roll-back in
-case a restore is needed.
+Consul servers maintain changes to the autopilot configuration in the Raft database. As a result, autopilot configurations are included in the Consul snapshot data. Take a snapshot after you change the autopilot configuration so that you have a safe restore point that includes the change.

 ## Server health checking

-An internal health check runs on the leader to track the stability of servers.
-
-A server is considered healthy if all of the following conditions are true.
+An internal health check runs on the leader to track the stability of servers. A server is considered healthy if all of the following conditions are true:

 - It has a SerfHealth status of 'Alive'.
-- The time since its last contact with the current leader is below
-  `LastContactThreshold` (that by default is `200ms`).
+- The time since its last contact with the current leader is below `LastContactThreshold`. The default value is `200ms`.
 - Its latest Raft term matches the leader's term.
-- The number of Raft log entries it trails the leader by does not exceed
-  `MaxTrailingLogs` (that by default is `250`).
+- The number of Raft log entries it trails the leader by does not exceed `MaxTrailingLogs`. The default value is `250`.
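+
+If the default thresholds are too strict for your environment, you can loosen both with the `consul operator autopilot set-config` command shown later on this page. The following is a minimal sketch; the values are illustrative, not recommendations:
+
+```shell-session
+$ consul operator autopilot set-config -last-contact-threshold=500ms -max-trailing-logs=500
+Configuration updated!
+```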
-The status of these health checks can be viewed through the
-`/v1/operator/autopilot/health` HTTP endpoint, with a top level `Healthy` field
-indicating the overall status of the datacenter:
+To return the status of these health checks, use the `/v1/operator/autopilot/health` HTTP endpoint. The top-level `Healthy` field indicates the overall status of the datacenter:

 ```shell-session
 $ curl localhost:8500/v1/operator/autopilot/health | jq .
@@ -172,21 +143,12 @@ $ curl localhost:8500/v1/operator/autopilot/health | jq .

 ## Server stabilization time

-When a new server is added to the datacenter, there is a waiting period where it
-must be healthy and stable for a certain amount of time before being promoted to
-a full, voting member. This is defined by the `ServerStabilizationTime`
-autopilot's parameter and by default is 10 seconds.
+When a new server joins the datacenter, there is an initial waiting period where it must stay healthy and stable before it can become a voting member. This duration is set by the `ServerStabilizationTime` parameter and defaults to 10 seconds.

-In case your configuration require a different amount of time for the node to
-get ready, for example in case you have some extra VM checks at startup that
-might affect node resource availability, you can tune the parameter and assign
-it a different duration.
+If your servers need more time to become ready, for example because of extra startup checks that affect resource availability, you can tune this parameter. The following example extends the waiting period to 15 seconds:

 ```shell-session
 $ consul operator autopilot set-config -server-stabilization-time=15s
-```
-
-```plaintext hideClipboard
 Configuration updated!
 ```

@@ -194,9 +156,6 @@ Use the `get-config` command to check the configuration.

 ```shell-session
 $ consul operator autopilot get-config
-```
-
-```plaintext hideClipboard
 CleanupDeadServers = true
 LastContactThreshold = 200ms
 MaxTrailingLogs = 250
@@ -207,92 +166,34 @@ DisableUpgradeMigration = false
 UpgradeVersionTag = ""
 ```

-## Dead server cleanup
-
-If autopilot is disabled, it will take 72 hours for dead servers to be
-automatically reaped or an operator must write a script to `consul force-leave`.
-If another server failure occurred it could jeopardize the quorum, even if the
-failed Consul server had been automatically replaced. Autopilot helps prevent
-these kinds of outages by quickly removing failed servers as soon as a
-replacement Consul server comes online. When servers are removed by the cleanup
-process they will enter the "left" state.
-
-With Autopilot's dead server cleanup enabled, dead servers will periodically be
-cleaned up and removed from the Raft peer set to prevent them from interfering
-with the quorum size and leader elections. The cleanup process will also be
-automatically triggered whenever a new server is successfully added to the
-datacenter.
-
-We suggest leaving the feature enabled to avoid introducing manual steps in
-the Consul management to make sure the faulty nodes are not remaining in the
-Raft pool for too long without the need for manual pruning. In test scenarios or
-in environments where you want to delegate the faulty node pruning to an
-external tool or system you can disable the dead server cleanup feature using
-the `consul operator` command.
-
-```shell-session
-$ consul operator autopilot set-config -cleanup-dead-servers=false
-```
-
-```plaintext hideClipboard
-Configuration updated!
-```
-
-Use the `get-config` command to check the configuration.
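+
+Autopilot also enforces a quorum floor when it prunes dead servers, a process described in the next section: it stops removing servers once the number of remaining servers would drop below `MinQuorum`. The following is a minimal sketch that sets the floor to three voters, a value that assumes a five-server deployment:
+
+```shell-session
+$ consul operator autopilot set-config -min-quorum=3
+Configuration updated!
+```
+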
+## Dead server cleanup
+
+When dead server cleanup is disabled, it takes 72 hours for Consul to automatically reap dead servers. The alternative is for an operator to manually run the `consul force-leave <node>` command for each dead server.

-```shell-session
-$ consul operator autopilot get-config
-```
-
-```plaintext hideClipboard
-CleanupDeadServers = false
-LastContactThreshold = 200ms
-MaxTrailingLogs = 250
-MinQuorum = 0
-ServerStabilizationTime = 10s
-RedundancyZoneTag = ""
-DisableUpgradeMigration = false
-UpgradeVersionTag = ""
-```
+In this situation, another server failure could jeopardize the cluster's quorum, because the Consul cluster still considers the missing server a member of the datacenter even if the failed Consul server was automatically replaced.
+
+Autopilot helps prevent these kinds of failures from becoming outages. It quickly removes failed servers as soon as a replacement Consul server comes online. When servers are removed by the cleanup process, they enter the "left" state and no longer count toward the datacenter's quorum.

-## Enterprise features
+Autopilot also triggers the cleanup process automatically whenever a new server successfully joins the datacenter.

-Consul Enterprise customer can take advantage of two more features of autopilot
-to further strengthen and automate Consul operations.
+We recommend leaving dead server cleanup enabled to avoid faulty nodes that require manual pruning. In test scenarios and development environments, you can disable the feature with the `consul operator autopilot set-config -cleanup-dead-servers=false` command.

-### Redundancy zones
+## Redundancy zones (Enterprise)

-Consul’s redundancy zones provide high availability in the case of server
-failure through the Enterprise feature of autopilot. Autopilot allows you to add
-read replicas to your datacenter that will be promoted to the "voting" status in
-case of voting server failure.
+Redundancy zones provide high availability in case of server failure. With Consul Enterprise, autopilot helps you create redundancy zones by adding read replicas to your datacenter that are promoted to voting status if a voting server fails.

-You can use this tutorial to implement isolated failure domains such as AWS
-Availability Zones (AZ) to obtain redundancy within an AZ without having to
-sustain the overhead of a large quorum.
+You can set up redundancy zones to implement isolated failure domains. For example, deploying a server and a read replica in each AWS Availability Zone (AZ) provides additional protection against failure within a region.

-Check [provide fault tolerance with redundancy zones](/consul/tutorials/operate-consul/redundancy-zones)
-to learn more on the functionality.
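+
+The following is a minimal sketch of enabling the feature from the CLI. It assumes that each server's agent configuration already advertises its zone in a `node_meta` entry named `zone`; the tag name is illustrative:
+
+```shell-session
+$ consul operator autopilot set-config -redundancy-zone-tag=zone
+Configuration updated!
+```
+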
+To learn more, refer to [provide fault tolerance with redundancy zones](/consul/docs/manage/scale/redundancy-zone).

-### Automated upgrades
+## Automated upgrades (Enterprise)

-Consul’s automatic upgrades provide a simplified way to upgrade existing Consul
-datacenter. This functionally is provided through the Enterprise feature of
-autopilot. Autopilot allows you to add new servers directly to the datacenter
-and waits until you have enough servers running the new version to perform a
-leadership change and demote the old servers as "non-voters".
+Automated upgrades are an Enterprise feature that helps you upgrade an existing Consul datacenter. With autopilot, you can add servers running a new Consul version directly to the datacenter. When enough servers are running the new version, autopilot performs a leadership change and demotes the servers running the old version to "non-voters".

-Check [automate upgrades with Consul Enterprise](/consul/tutorials/datacenter-operations/upgrade-automation)
-to learn more on the functionality.
+To learn more, refer to [automate upgrades with Consul Enterprise](/consul/docs/upgrade/automated).

 ## Next steps

-In this tutorial you got an overview of the autopilot features and got examples
-on how and when tune the default values.
+To learn more about the autopilot features described on this page, refer to [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).

-To learn more about the Autopilot settings you did not configure in this tutorial,
-[last_contact_threshold](/consul/docs/reference/agent/configuration-file/general#last_contact_threshold)
-and
-[max_trailing_logs](/consul/docs/reference/agent/configuration-file/general#max_trailing_logs),
-either read the agent configuration documentation or use the help flag with the
-operator autopilot `consul operator autopilot set-config -h`.
+For details about the agent configuration parameters that affect autopilot's stability checks, refer to [last_contact_threshold](/consul/docs/reference/agent/configuration-file/bootstrap#last_contact_threshold) and [max_trailing_logs](/consul/docs/reference/agent/configuration-file/bootstrap#max_trailing_logs) in the Consul agent configuration documentation.
\ No newline at end of file
diff --git a/content/consul/v1.21.x/content/docs/manage/scale/index.mdx b/content/consul/v1.21.x/content/docs/manage/scale/index.mdx
index 11879cf026..6af498fc2b 100644
--- a/content/consul/v1.21.x/content/docs/manage/scale/index.mdx
+++ b/content/consul/v1.21.x/content/docs/manage/scale/index.mdx
@@ -77,13 +77,10 @@ Consul server agents are an important part of Consul’s architecture. This sect

 Consul servers can be deployed on a few different runtimes:

-- **HashiCorp Cloud Platform (HCP) Consul (Managed)**. These Consul servers are deployed in a hosted environment managed by HCP. To get started with HCP Consul servers in Kubernetes or VM deployments, refer to the [Deploy HCP Consul tutorial](/consul/tutorials/get-started-hcp/hcp-gs-deploy).
 - **VMs or bare metal servers (Self-managed)**. To get started with Consul on VMs or bare metal servers, refer to the [Deploy Consul server tutorial](/consul/tutorials/get-started-vms/virtual-machine-gs-deploy). For a full list of configuration options, refer to [Agents Overview](/consul/docs/fundamentals/agent).
 - **Kubernetes (Self-managed)**. To get started with Consul on Kubernetes, refer to the [Deploy Consul on Kubernetes tutorial](/consul/tutorials/get-started-kubernetes/kubernetes-gs-deploy).
 - **Other container environments, including Docker, Rancher, and Mesos (Self-managed)**.

-@include 'alerts/hcp-dedicated-eol.mdx'
-
 When operating Consul at scale, self-managed VM or bare metal server deployments offer the most flexibility. Some Consul Enterprise features that can enhance fault tolerance and read scalability, such as [redundancy zones](/consul/docs/manage/scale/redundancy-zone) and [read replicas](/consul/docs/manage/scale/read-replica), are not available to server agents on Kubernetes runtimes. To learn more, refer to [Consul Enterprise feature availability by runtime](/consul/docs/enterprise#feature-availability-by-runtime).
 ### Number of Consul servers
@@ -327,7 +324,7 @@ Enterprise customers might also rely on [automated backups](/consul/docs/manage/

 We do not recommend automated scaling of Consul server nodes based on load or usage unless it is coupled by some logic that prevents the cluster from losing quorum.

-One way to improve your datacenter resiliency and to leverage automatic scaling is to use [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).
+One way to improve your datacenter resiliency and to leverage automatic scaling is to use [autopilot](/consul/docs/manage/scale/autopilot), as well as Enterprise features such as [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).

 These features provide support for read-heavy workload periods without risking the stability of the overall cluster.

diff --git a/content/consul/v1.22.x/content/docs/manage/scale/autopilot.mdx b/content/consul/v1.22.x/content/docs/manage/scale/autopilot.mdx
index da8cb1e113..f051c79bcf 100644
--- a/content/consul/v1.22.x/content/docs/manage/scale/autopilot.mdx
+++ b/content/consul/v1.22.x/content/docs/manage/scale/autopilot.mdx
@@ -1,51 +1,27 @@
 ---
 layout: docs
-page_title: Consul autopilot
+page_title: Consul Autopilot
 description: >-
   Use Autopilot features to monitor the Raft cluster, introduce stable servers,
   and clean up dead servers.
 ---

-# Consul autopilot
+# Consul Autopilot

-This page describes Consul autopilot, which supports automatic, operator-friendly management of Consul
-servers. It includes cleanup of dead servers, monitoring the state of the Raft
-cluster, and stable server introduction.
+This page describes Consul Autopilot, a set of features that provide operator-friendly, automated management of Consul servers.

-To use autopilot features (with the exception of dead server cleanup), the
-[`raft_protocol`](/consul/docs/reference/agent/configuration-file/raft#raft_protocol)
-setting in the Consul agent configuration must be set to 3 or higher on all
-servers. In Consul `0.8` this setting defaults to 2; in Consul `1.0` it will
-default to 3. For more information, check the [Version Upgrade
-section](/consul/docs/upgrade/version-specific) on Raft protocol
-versions in Consul `1.0`.
+## Overview

-In this tutorial you will learn how Consul tracks the stability of servers, how
-to tune those conditions, and get some details on the other autopilot's features.
+Consul Autopilot helps you maintain the health and stability of the Consul server cluster. It includes the following features:

-- Server Stabilization
-- Dead server cleanup
-- Redundancy zones (only available in Consul Enterprise)
-- Automated upgrades (only available in Consul Enterprise)
-
-Note, in this tutorial we are using examples from a Consul `1.7` datacenter, we
-are starting with Autopilot enabled by default.
+- [Server health checking](#server-health-checking)
+- [Server stabilization time](#server-stabilization-time)
+- [Dead server cleanup](#dead-server-cleanup)
+- [Redundancy zones (only available in Consul Enterprise)](#redundancy-zones-enterprise)
+- [Automated upgrades (only available in Consul Enterprise)](#automated-upgrades-enterprise)

 ## Default configuration

-The configuration of Autopilot is loaded by the leader from the agent's
-[autopilot settings](/consul/docs/reference/agent/configuration-file/general#autopilot)
-when initially bootstrapping the datacenter. Since autopilot and its features
-are already enabled, you only need to update the configuration to disable them.
-
-All Consul servers should have Autopilot and its features either enabled or
-disabled to ensure consistency across servers in case of a failure.
-Additionally, Autopilot must be enabled to use any of the features, but the
-features themselves can be configured independently. Meaning you can enable or
-disable any of the features separately, at any time.
-
-You can check the default values using the `consul operator` CLI command or
-using the [`/v1/operator/autopilot`
-endpoint](/consul/api-docs/operator/autopilot)
+To check the default autopilot values, use the `consul operator` CLI command or the [`/v1/operator/autopilot` endpoint](/consul/api-docs/operator/autopilot).

@@ -90,36 +66,31 @@ $ curl http://127.0.0.1:8500/v1/operator/autopilot/configuration

-### Autopilot and Consul snapshots
+The following table lists the autopilot configuration parameters, their types, their default values, and their descriptions:

-Changes to the autopilot configuration are persisted in the Raft database
-maintained by the Consul servers. This means that autopilot configuration will
-be included in the Consul snapshot data. Any snapshot taken prior to autopilot
-configuration changes will contain the old configuration, and should be
-considered unsafe to restore since they will remove the change and cause
-unpredictable behaviors for the automations that might rely on the new
-configuration.
+| Autopilot setting         | Type     | Default value | Description |
+| :------------------------ | :------- | :------------ | :---------- |
+| `CleanupDeadServers`      | Boolean  | `true`        | Enables periodic dead server removal from the Raft peer set. |
+| `LastContactThreshold`    | Duration | `200ms`       | Maximum time that can elapse since a server's last contact with the current leader before Consul considers the server unhealthy. |
+| `MaxTrailingLogs`         | Integer  | `250`         | Maximum number of Raft log entries that a server can trail the leader by and still be considered healthy. |
+| `MinQuorum`               | Integer  | `0`           | Minimum number of healthy voting servers required to maintain quorum in the datacenter. |
+| `ServerStabilizationTime` | Duration | `10s`         | Time duration that a new server must remain healthy before it can become a voting member. |
+| `RedundancyZoneTag`       | String   | `""`          | Tag name used to identify redundancy zones for servers in Consul Enterprise. |
+| `DisableUpgradeMigration` | Boolean  | `false`       | Flag to disable automatic upgrade migrations in Consul Enterprise. |
+| `UpgradeVersionTag`       | String   | `""`          | Tag name used to identify server versions for automated upgrades in Consul Enterprise. |

-We recommend that you take a snapshot after any changes to the autopilot
-configuration, and consider that as the last safe point in time to roll-back in
-case a restore is needed.
+Consul servers maintain changes to the autopilot configuration in the Raft database. As a result, autopilot configurations are included in the Consul snapshot data. Take a snapshot after you change the autopilot configuration so that you have a safe restore point that includes the change.
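+
+The following is a minimal sketch of capturing that restore point with the `consul snapshot save` command; the file name and the reported Raft index are illustrative:
+
+```shell-session
+$ consul snapshot save autopilot-backup.snap
+Saved and verified snapshot to index 1234
+```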

 ## Server health checking

-An internal health check runs on the leader to track the stability of servers.
-
-A server is considered healthy if all of the following conditions are true.
+An internal health check runs on the leader to track the stability of servers. A server is considered healthy if all of the following conditions are true:

 - It has a SerfHealth status of 'Alive'.
-- The time since its last contact with the current leader is below
-  `LastContactThreshold` (that by default is `200ms`).
+- The time since its last contact with the current leader is below `LastContactThreshold`. The default value is `200ms`.
 - Its latest Raft term matches the leader's term.
-- The number of Raft log entries it trails the leader by does not exceed
-  `MaxTrailingLogs` (that by default is `250`).
+- The number of Raft log entries it trails the leader by does not exceed `MaxTrailingLogs`. The default value is `250`.

-The status of these health checks can be viewed through the
-`/v1/operator/autopilot/health` HTTP endpoint, with a top level `Healthy` field
-indicating the overall status of the datacenter:
+To return the status of these health checks, use the `/v1/operator/autopilot/health` HTTP endpoint. The top-level `Healthy` field indicates the overall status of the datacenter:

 ```shell-session
 $ curl localhost:8500/v1/operator/autopilot/health | jq .
@@ -137,7 +108,7 @@ $ curl localhost:8500/v1/operator/autopilot/health | jq .
       "SerfStatus": "alive",
       "Version": "1.7.2",
       "Leader": false,
-      # ... 
+      # ...
       "Healthy": true,
       "Voter": true,
       # ...
@@ -172,21 +143,12 @@ $ curl localhost:8500/v1/operator/autopilot/health | jq .

 ## Server stabilization time

-When a new server is added to the datacenter, there is a waiting period where it
-must be healthy and stable for a certain amount of time before being promoted to
-a full, voting member. This is defined by the `ServerStabilizationTime`
-autopilot's parameter and by default is 10 seconds.
+When a new server joins the datacenter, there is an initial waiting period where it must stay healthy and stable before it can become a voting member. This duration is set by the `ServerStabilizationTime` parameter and defaults to 10 seconds.

-In case your configuration require a different amount of time for the node to
-get ready, for example in case you have some extra VM checks at startup that
-might affect node resource availability, you can tune the parameter and assign
-it a different duration.
+If your servers need more time to become ready, for example because of extra startup checks that affect resource availability, you can tune this parameter. The following example extends the waiting period to 15 seconds:

 ```shell-session
 $ consul operator autopilot set-config -server-stabilization-time=15s
-```
-
-```plaintext hideClipboard
 Configuration updated!
 ```

@@ -194,9 +156,6 @@ Use the `get-config` command to check the configuration.

 ```shell-session
 $ consul operator autopilot get-config
-```
-
-```plaintext hideClipboard
 CleanupDeadServers = true
 LastContactThreshold = 200ms
 MaxTrailingLogs = 250
@@ -207,92 +166,34 @@ DisableUpgradeMigration = false
 UpgradeVersionTag = ""
 ```
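+
+Autopilot also enforces a quorum floor when it prunes dead servers, a process described in the next section: it stops removing servers once the number of remaining servers would drop below `MinQuorum`. The following is a minimal sketch that sets the floor to three voters, a value that assumes a five-server deployment:
+
+```shell-session
+$ consul operator autopilot set-config -min-quorum=3
+Configuration updated!
+```
+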

-## Dead server cleanup
-
-If autopilot is disabled, it will take 72 hours for dead servers to be
-automatically reaped or an operator must write a script to `consul force-leave`.
-If another server failure occurred it could jeopardize the quorum, even if the
-failed Consul server had been automatically replaced. Autopilot helps prevent
-these kinds of outages by quickly removing failed servers as soon as a
-replacement Consul server comes online. When servers are removed by the cleanup
-process they will enter the "left" state.
-
-With Autopilot's dead server cleanup enabled, dead servers will periodically be
-cleaned up and removed from the Raft peer set to prevent them from interfering
-with the quorum size and leader elections. The cleanup process will also be
-automatically triggered whenever a new server is successfully added to the
-datacenter.
-
-We suggest leaving the feature enabled to avoid introducing manual steps in
-the Consul management to make sure the faulty nodes are not remaining in the
-Raft pool for too long without the need for manual pruning. In test scenarios or
-in environments where you want to delegate the faulty node pruning to an
-external tool or system you can disable the dead server cleanup feature using
-the `consul operator` command.
-
-```shell-session
-$ consul operator autopilot set-config -cleanup-dead-servers=false
-```
-
-```plaintext hideClipboard
-Configuration updated!
-```
-
-Use the `get-config` command to check the configuration.
-
-```shell-session
-$ consul operator autopilot get-config
-```
-
-```plaintext hideClipboard
-CleanupDeadServers = false
-LastContactThreshold = 200ms
-MaxTrailingLogs = 250
-MinQuorum = 0
-ServerStabilizationTime = 10s
-RedundancyZoneTag = ""
-DisableUpgradeMigration = false
-UpgradeVersionTag = ""
-```
+## Dead server cleanup
+
+When dead server cleanup is disabled, it takes 72 hours for Consul to automatically reap dead servers. The alternative is for an operator to manually run the `consul force-leave <node>` command for each dead server.
+
+In this situation, another server failure could jeopardize the cluster's quorum, because the Consul cluster still considers the missing server a member of the datacenter even if the failed Consul server was automatically replaced.
+
+Autopilot helps prevent these kinds of failures from becoming outages. It quickly removes failed servers as soon as a replacement Consul server comes online. When servers are removed by the cleanup process, they enter the "left" state and no longer count toward the datacenter's quorum.
+
+Autopilot also triggers the cleanup process automatically whenever a new server successfully joins the datacenter.
+
+We recommend leaving dead server cleanup enabled to avoid faulty nodes that require manual pruning. In test scenarios and development environments, you can disable the feature with the `consul operator autopilot set-config -cleanup-dead-servers=false` command.

-## Enterprise features
-
-Consul Enterprise customer can take advantage of two more features of autopilot
-to further strengthen and automate Consul operations.
-
-### Redundancy zones
+## Redundancy zones (Enterprise)

-Consul’s redundancy zones provide high availability in the case of server
-failure through the Enterprise feature of autopilot. Autopilot allows you to add
-read replicas to your datacenter that will be promoted to the "voting" status in
-case of voting server failure.
+Redundancy zones provide high availability in case of server failure. With Consul Enterprise, autopilot helps you create redundancy zones by adding read replicas to your datacenter that are promoted to voting status if a voting server fails.

-You can use this tutorial to implement isolated failure domains such as AWS
-Availability Zones (AZ) to obtain redundancy within an AZ without having to
-sustain the overhead of a large quorum.
+You can set up redundancy zones to implement isolated failure domains. For example, deploying a server and a read replica in each AWS Availability Zone (AZ) provides additional protection against failure within a region.

-Check [provide fault tolerance with redundancy zones](/consul/tutorials/operate-consul/redundancy-zones)
-to learn more on the functionality.
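+
+The following is a minimal sketch of enabling the feature from the CLI. It assumes that each server's agent configuration already advertises its zone in a `node_meta` entry named `zone`; the tag name is illustrative:
+
+```shell-session
+$ consul operator autopilot set-config -redundancy-zone-tag=zone
+Configuration updated!
+```
+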
+To learn more, refer to [provide fault tolerance with redundancy zones](/consul/docs/manage/scale/redundancy-zone).

-### Automated upgrades
+## Automated upgrades (Enterprise)

-Consul’s automatic upgrades provide a simplified way to upgrade existing Consul
-datacenter. This functionally is provided through the Enterprise feature of
-autopilot. Autopilot allows you to add new servers directly to the datacenter
-and waits until you have enough servers running the new version to perform a
-leadership change and demote the old servers as "non-voters".
+Automated upgrades are an Enterprise feature that helps you upgrade an existing Consul datacenter. With autopilot, you can add servers running a new Consul version directly to the datacenter. When enough servers are running the new version, autopilot performs a leadership change and demotes the servers running the old version to "non-voters".

-Check [automate upgrades with Consul Enterprise](/consul/tutorials/datacenter-operations/upgrade-automation)
-to learn more on the functionality.
+To learn more, refer to [automate upgrades with Consul Enterprise](/consul/docs/upgrade/automated).

 ## Next steps

-In this tutorial you got an overview of the autopilot features and got examples
-on how and when tune the default values.
+To learn more about the autopilot features described on this page, refer to [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).

-To learn more about the Autopilot settings you did not configure in this tutorial,
-[last_contact_threshold](/consul/docs/reference/agent/configuration-file/general#last_contact_threshold)
-and
-[max_trailing_logs](/consul/docs/reference/agent/configuration-file/general#max_trailing_logs),
-either read the agent configuration documentation or use the help flag with the
-operator autopilot `consul operator autopilot set-config -h`.
+For details about the agent configuration parameters that affect autopilot's stability checks, refer to [last_contact_threshold](/consul/docs/reference/agent/configuration-file/bootstrap#last_contact_threshold) and [max_trailing_logs](/consul/docs/reference/agent/configuration-file/bootstrap#max_trailing_logs) in the Consul agent configuration documentation.
\ No newline at end of file
diff --git a/content/consul/v1.22.x/content/docs/manage/scale/index.mdx b/content/consul/v1.22.x/content/docs/manage/scale/index.mdx
index 82ecc5aa59..47d66a8680 100644
--- a/content/consul/v1.22.x/content/docs/manage/scale/index.mdx
+++ b/content/consul/v1.22.x/content/docs/manage/scale/index.mdx
@@ -324,7 +324,7 @@ Enterprise customers might also rely on [automated backups](/consul/docs/manage/

 We do not recommend automated scaling of Consul server nodes based on load or usage unless it is coupled by some logic that prevents the cluster from losing quorum.

-One way to improve your datacenter resiliency and to leverage automatic scaling is to use [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).
+One way to improve your datacenter resiliency and to leverage automatic scaling is to use [autopilot](/consul/docs/manage/scale/autopilot), as well as Enterprise features such as [read replicas](/consul/docs/manage/scale/read-replica) and [redundancy zones](/consul/docs/manage/scale/redundancy-zone).

 These features provide support for read-heavy workload periods without risking the stability of the overall cluster.