90 changes: 83 additions & 7 deletions docs/enterprise/embedded-manage-nodes.mdx
@@ -1,10 +1,21 @@
import HaArchitecture from "../partials/embedded-cluster/_multi-node-ha-arch.mdx"
import ShellCommand from "../partials/embedded-cluster/_shell-command.mdx"

# Manage Multi-Node Clusters with Embedded Cluster
# Access and Manage Embedded Clusters

This topic describes how to manage nodes in clusters created with Replicated Embedded Cluster, including how to add nodes and enable high availability for multi-node clusters.

## Limitations
## Access the Cluster
Review comment from the PR author: Moved "Access the Cluster" and "Reset a Node" from the old "Using Embedded Cluster" page, which is now focused on configuring Embedded Cluster.


With Embedded Cluster, end users rarely need to use the CLI. Typical workflows, like updating the application and the cluster, can be done through the Admin Console. However, there are times when vendors or their customers need to use the CLI for development or troubleshooting.

<ShellCommand/>

## Configure Multi-Node Clusters

This section describes how to join nodes to a cluster with Embedded Cluster.

### Limitations

Multi-node clusters with Embedded Cluster have the following limitations:

@@ -18,11 +29,11 @@ Multi-node clusters with Embedded Cluster have the following limitations:

* The `join print-command` command always returns the commands for joining a node with the controller role. It does not support printing the join command for any custom node roles defined in the Embedded Cluster Config `roles` key. See [Automate Controller Node Joins](#automate-node-joins) below.

## Requirement
### Requirement

To deploy multi-node clusters with Embedded Cluster, the **Multi-node Cluster (Embedded Cluster only)** license field must be enabled for the customer. For more information about managing customer licenses, see [Create and Manage Customers](/vendor/releases-creating-customer).

## Add Nodes to a Cluster {#add-nodes}
### Add Nodes to a Cluster {#add-nodes}

This section describes how to add nodes to a cluster with Embedded Cluster.

@@ -62,7 +73,7 @@ To add a node to a cluster with Embedded Cluster:

1. Repeat these steps for each node you want to add.

## Automate Controller Node Joins {#automate-node-joins}
### Automate Controller Node Joins {#automate-node-joins}

With Embedded Cluster, you can use the command line to get the commands for joining controller nodes, rather than logging in to the Admin Console UI to get them. This is especially useful when testing multi-node Embedded Cluster installations where you need to automate the process of joining controller nodes to a cluster.

@@ -89,7 +100,7 @@ To automate controller node joins with Embedded Cluster:

1. On the node that you want to join as a controller, run each of the commands provided in the `join print-command` output to download the Embedded Cluster binary, extract the binary, and join the node to the cluster.

## High Availability for Multi-Node Clusters {#ha}
## Configure High Availability for Multi-Node Clusters {#ha}

Multi-node clusters are not highly available by default. The first node of the cluster holds important data for Kubernetes and KOTS, such that the loss of this node would be catastrophic for the cluster. Enabling high availability requires that at least three controller nodes are present in the cluster.

@@ -156,4 +167,69 @@ To enable high availability for an existing Embedded Cluster installation with t
sudo ./APP_SLUG enable-ha
```

Where `APP_SLUG` is the unique slug for the application.
Where `APP_SLUG` is the unique slug for the application.

## Reset Nodes and Remove Clusters
Review comment from the PR author: Moved the procedures on resetting nodes to this Access and Manage page.


This section describes how to reset individual nodes and how to delete an entire multi-node cluster using the Embedded Cluster [reset](/reference/embedded-cluster-reset) command.

### About the `reset` Command

Resetting a node with Embedded Cluster removes the cluster and your application from that node. This is useful for iteration, development, and recovering from mistakes, because you can reuse the machine instead of procuring a new one.

The `reset` command performs the following steps:

1. Run safety checks. For example, `reset` does not remove a controller node when there are worker nodes available. It also does not remove a node when the etcd cluster is unhealthy.
1. Drain the node and evict all the Pods gracefully.
1. Delete the node from the cluster.
1. Stop and reset k0s.
1. Remove all Embedded Cluster files.
1. Reboot the node.

For more information about the command, see [reset](/reference/embedded-cluster-reset).

### Limitations and Best Practices

Before you reset a node or remove a cluster, consider the following limitations and best practices:

* When you reset a node, OpenEBS PVCs on the node are deleted. Only PVCs created as part of a StatefulSet are recreated automatically on another node in the cluster. To recreate other PVCs, redeploy the application in the cluster.

* If you need to reset one controller node in a three-node cluster, first join a fourth controller node to the cluster before removing the target node. This ensures that you maintain a minimum of three nodes for the Kubernetes control plane. You can add and remove worker nodes as needed because they do not have any control plane components.

* When resetting a single node or deleting a test environment, you can include the `--force` flag with the `reset` command to ignore any errors.

* When removing a multi-node cluster, run `reset` on each of the worker nodes first. Then, run `reset` on controller nodes. Controller nodes also remove themselves from etcd membership.
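
The controller guidance above can be written down as a simple arithmetic pre-check. The following is a minimal sketch, not part of Embedded Cluster: `keeps_three_controllers` is a hypothetical helper name, and the caller supplies the current controller count (for example, from an inventory). It applies only when you want to keep a highly available cluster running, not when you tear down the whole cluster.

```bash
# Hypothetical helper, not an Embedded Cluster command.
# Succeeds (exit status 0) only if removing one controller still leaves at
# least three controller nodes for the Kubernetes control plane.
#   $1 = current number of controller nodes (supplied by the caller)
keeps_three_controllers() {
  local current="$1"
  local minimum=3
  [ $(( current - 1 )) -ge "$minimum" ]
}

# Example: a three-controller cluster needs a fourth node joined first.
if keeps_three_controllers 3; then
  echo "OK to reset one controller"
else
  echo "Join another controller node before resetting"
fi
```

With three controllers the check fails, which matches the recommendation to join a fourth controller before resetting the target node.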

### Reset a Node

To reset a node:

1. SSH onto the node. Ensure that the Embedded Cluster binary is still available on the machine.

1. Run the following command to remove the node and reboot the machine:

```bash
sudo ./APP_SLUG reset
```
Where `APP_SLUG` is the unique slug for the application.

### Remove a Multi-Node Cluster

To remove a multi-node cluster:

1. SSH onto a worker node.

:::note
The safety checks for the `reset` command prevent you from removing a controller node when there are still worker nodes available in the cluster.
:::

1. Remove the node and reboot the machine:

```bash
sudo ./APP_SLUG reset
```
Where `APP_SLUG` is the unique slug for the application.

1. After removing all the worker nodes in the cluster, SSH onto a controller node and run the `reset` command to remove the node.

1. Repeat the previous step on the remaining controller nodes in the cluster.
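
As a planning aid, the ordering in the procedure above can be sketched as a dry-run helper. This is an illustration under stated assumptions: `print_reset_order` and the hostnames are hypothetical, and the function only prints the `reset` command for each node in order. It does not connect to any node or run anything.

```bash
# Hypothetical dry-run helper, not an Embedded Cluster command.
# Prints the order in which to run `reset`: worker nodes first, then
# controller nodes. APP_SLUG remains a placeholder for the application slug.
#   $1 = space-separated worker hostnames
#   $2 = space-separated controller hostnames
print_reset_order() {
  local workers="$1" controllers="$2" host step=1
  for host in $workers; do
    printf '%d. %s (worker): sudo ./APP_SLUG reset\n' "$step" "$host"
    step=$(( step + 1 ))
  done
  for host in $controllers; do
    printf '%d. %s (controller): sudo ./APP_SLUG reset\n' "$step" "$host"
    step=$(( step + 1 ))
  done
}

# Example with made-up hostnames.
print_reset_order "worker-1 worker-2" "controller-1 controller-2 controller-3"
```

Listing workers before controllers mirrors the safety check that prevents `reset` from removing a controller node while worker nodes are still available.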
2 changes: 1 addition & 1 deletion docs/reference/embedded-config.mdx
@@ -77,7 +77,7 @@ For a full list of versions, see the [Embedded Cluster Release Notes](/release-n

<DoNotDowngrade/>

## roles (Beta)
## roles (Beta) {#roles}

You can optionally customize node roles in the Embedded Cluster Config using the `roles` key.

4 changes: 2 additions & 2 deletions docs/vendor/embedded-overview.mdx
@@ -71,7 +71,7 @@ For more information about creating HA multi-node clusters with Embedded Cluster

You can optionally define node roles in the Embedded Cluster Config. For multi-node clusters, roles can be useful for assigning specific application workloads to nodes. If node roles are defined, users assign one or more roles to a node when it is joined to the cluster.

For more information, see [roles](/reference/embedded-config#roles-beta) in _Embedded Cluster Config_.
For more information, see [roles](/reference/embedded-config#roles) in _Embedded Cluster Config_.

## About Configuring Embedded Cluster

@@ -145,7 +145,7 @@ Embedded Cluster has the following limitations:

* **Partial rollback support**: In Embedded Cluster 1.17.0 and later, rollbacks are supported only when rolling back to a version where there is no change to the [Embedded Cluster Config](/reference/embedded-config) compared to the currently-installed version. For example, users can roll back to release version 1.0.0 after upgrading to 1.1.0 only if both 1.0.0 and 1.1.0 use the same Embedded Cluster Config. For more information about how to enable rollbacks for your application in the KOTS Application custom resource, see [allowRollback](/reference/custom-resource-application#allowrollback) in _Application_.

* **Changing node hostnames is not supported**: After a host is added to a cluster, Kubernetes assumes that the hostname and IP address of the host will not change. If you need to change the hostname or IP address of a node, you must first remove the node from the cluster, reset it, and then rejoin it. For information about how to reset nodes with Embedded Cluster, see [Reset a Node](/vendor/embedded-using#reset-a-node). For information about the requirements for naming nodes, see [Node name uniqueness](https://kubernetes.io/docs/concepts/architecture/nodes/#node-name-uniqueness) in the Kubernetes documentation.
* **Changing node hostnames is not supported**: After a host is added to a cluster, Kubernetes assumes that the hostname and IP address of the host will not change. If you need to change the hostname or IP address of a node, you must first remove the node from the cluster, reset it, and then rejoin it. For information about how to reset nodes with Embedded Cluster, see [Reset a Node](/enterprise/embedded-manage-nodes#reset-a-node). For information about the requirements for naming nodes, see [Node name uniqueness](https://kubernetes.io/docs/concepts/architecture/nodes/#node-name-uniqueness) in the Kubernetes documentation.

:::note
If you need to change the hostname or IP address of a controller node in a three-node cluster, Replicated recommends that you join a fourth controller node to the cluster before removing the target node. This ensures that you maintain a minimum of three nodes for the Kubernetes control plane. You can add and remove worker nodes as needed because they do not have any control plane components. For information about how to remove controller nodes, see [Remove or Replace a Controller](https://docs.k0sproject.io/stable/remove_controller/) in the k0s documentation.
10 changes: 5 additions & 5 deletions docs/vendor/embedded-troubleshooting.mdx
@@ -71,7 +71,7 @@ If there are any containerd services on the host, the NVIDIA GPU Operator will g

This is the result of a known issue with v24.9.x of the NVIDIA GPU Operator. For more information about the known issue, see [container-toolkit does not modify the containerd config correctly when there are multiple instances of the containerd binary](https://github.com/NVIDIA/nvidia-container-toolkit/issues/982) in the nvidia-container-toolkit repository in GitHub.

For more information about including the GPU Operator as a Helm extension, see [NVIDIA GPU Operator](/vendor/embedded-using#nvidia-gpu-operator) in _Using Embedded Cluster_.
For more information about including the GPU Operator as a Helm extension, see [NVIDIA GPU Operator](/vendor/embedded-using#nvidia-gpu-operator) in _Configure Embedded Cluster_.

#### Solution

@@ -86,7 +86,7 @@ To troubleshoot:
```
Where `APP_SLUG` is the unique slug for the application.

For more information, see [Reset a Node](/vendor/embedded-using#reset-a-node) in _Using Embedded Cluster_.
For more information, see [Reset a Node](/enterprise/embedded-manage-nodes#reset-a-node) in _Access and Manage Embedded Clusters_.

1. Re-install with Embedded Cluster.

@@ -152,7 +152,7 @@ Reasons can include:
```
Where `APP_SLUG` is the unique slug for the application.

For more information, see [Reset a Node](/vendor/embedded-using#reset-a-node) in _Using Embedded Cluster_.
For more information, see [Reset a Node](/enterprise/embedded-manage-nodes#reset-a-node) in _Access and Manage Embedded Clusters_.

1. Reinstall the application with different CIDRs using the `--cidr` flag:

@@ -163,7 +163,7 @@ Reasons can include:
For more information, see [Embedded Cluster Install Options](/reference/embedded-cluster-install).
</TabItem>
<TabItem value="kernel" label="Incorrect kernel parameter values">
Embedded Cluster 1.19.0 and later automatically sets the `net.ipv4.conf.default.arp_filter`, `net.ipv4.conf.default.arp_ignore`, and `net.ipv4.ip_forward` kernel parameters. Additionally, host preflight checks automatically run during installation to verify that the kernel parameters were set correctly. For more information about the Embedded Cluster preflight checks, see [About Host Preflight Checks](/vendor/embedded-overview#about-host-preflight-checks) in _Using Embedded Cluster_.
Embedded Cluster 1.19.0 and later automatically sets the `net.ipv4.conf.default.arp_filter`, `net.ipv4.conf.default.arp_ignore`, and `net.ipv4.ip_forward` kernel parameters. Additionally, host preflight checks automatically run during installation to verify that the kernel parameters were set correctly. For more information about the Embedded Cluster preflight checks, see [About Host Preflight Checks](/vendor/embedded-overview#about-host-preflight-checks) in _Embedded Cluster Overview_.

If kernel parameters are not set correctly and these preflight checks fail, you might see a message such as `IP forwarding must be enabled.` or `ARP filtering must be disabled by default for newly created interfaces.`.
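
To see the current values on a host, you can read the parameters directly from `/proc/sys`, where each dot in a parameter name corresponds to a path separator. The following is a minimal sketch: `show_kernel_params` is a hypothetical helper, and the optional directory argument exists only so the function can be pointed at a test directory. Judging from the messages quoted above, IP forwarding is expected to be enabled (`1`) and ARP filtering disabled (`0`). The sketch prints values and does not judge them.

```bash
# Hypothetical helper, not an Embedded Cluster command.
# Prints the current value of each kernel parameter named above.
#   $1 = optional root directory to read from (defaults to /proc/sys)
show_kernel_params() {
  local root="${1:-/proc/sys}" param path
  for param in \
    net.ipv4.conf.default.arp_filter \
    net.ipv4.conf.default.arp_ignore \
    net.ipv4.ip_forward
  do
    # Convert the dotted parameter name into its path under the root.
    path="$root/$(echo "$param" | tr . /)"
    if [ -r "$path" ]; then
      printf '%s = %s\n' "$param" "$(cat "$path")"
    else
      printf '%s = (not readable)\n' "$param"
    fi
  done
}

show_kernel_params
```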

@@ -185,7 +185,7 @@ Reasons can include:
sudo ./APP_SLUG reset
```
Where `APP_SLUG` is the unique slug for the application.
For more information, see [Reset a Node](/vendor/embedded-using#reset-a-node) in _Using Embedded Cluster_.
For more information, see [Reset a Node](/enterprise/embedded-manage-nodes#reset-a-node) in _Access and Manage Embedded Clusters_.

1. Re-install with Embedded Cluster.
</TabItem>