Commit 37d86c1 (parent 6087231)

docs: High availability explanation page (#940)

Showing 2 changed files with 45 additions and 0 deletions.
# High availability

High availability (HA) is a core feature of {{ product }}, ensuring that
a Kubernetes cluster remains operational and resilient even when nodes or
critical components encounter failures. This capability is crucial for
maintaining continuous service for applications and workloads running in
production environments.

HA is automatically enabled in {{ product }} for clusters with three or
more nodes, regardless of the deployment method. By distributing key
components across multiple nodes, HA reduces the risk of downtime and
service interruptions, offering built-in redundancy and fault tolerance.
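The three-node threshold follows from majority-based consensus: a quorum of voting nodes must stay reachable, so a cluster only gains fault tolerance once a node can fail while a majority survives. As an illustrative sketch (not code from {{ product }}), the arithmetic looks like this:

```python
# Illustrative only: why HA needs at least three nodes. A Raft-style
# datastore requires a majority (quorum) of voting nodes to agree, so
# the number of node failures a cluster survives is (n - 1) // 2.

def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many nodes can fail while a majority remains."""
    return (n - 1) // 2

for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
# A 2-node cluster survives 0 failures; 3 nodes is the smallest
# cluster that keeps quorum after losing a node.
```

Note that two nodes are no better than one here: losing either node destroys the majority, which is why HA activates at three.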
## Key components of a highly available cluster

A highly available Kubernetes cluster exhibits the following characteristics:

### 1. **Multiple nodes for redundancy**

Having multiple nodes in the cluster ensures workload distribution and
redundancy. If one node fails, workloads will be rescheduled automatically on
other available nodes without disrupting services. This node-level redundancy
minimizes the impact of hardware or network failures.
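Workloads benefit from this redundancy most when their replicas actually land on different nodes. As a hedged example (the names and image below are hypothetical, not from the {{ product }} documentation), a Deployment can combine multiple replicas with pod anti-affinity so the scheduler spreads them across nodes:

```yaml
# Hypothetical manifest: spread three replicas across distinct nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to co-schedule two replicas on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx      # placeholder image
```

With this in place, the loss of one node takes down at most one replica, and the scheduler recreates it on a surviving node.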
### 2. **Control plane redundancy**

The control plane manages the cluster's state and operations. For high
availability, the control plane components, such as the API server,
scheduler, and controller-manager, are distributed across multiple nodes.
This prevents a single point of failure from rendering the cluster
inoperable.
### 3. **Highly available datastore**

By default, {{ product }} uses **dqlite** to manage the Kubernetes
cluster state. Dqlite leverages the Raft consensus algorithm for leader
election and voting, ensuring reliable data replication and failover
capabilities. When a leader node fails, a new leader is elected seamlessly
without administrative intervention. This mechanism allows the cluster to
remain operational even in the event of node failures. More details on
replication and leader elections can be found in
the [dqlite replication documentation][Dqlite-replication].
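The failover behaviour described above can be sketched as a toy model (illustrative only; dqlite's real Raft implementation is considerably more involved): when the leader fails, any surviving voter may win the election, but only if the survivors still form a majority.

```python
import random

# Toy model of Raft-style failover, NOT dqlite's actual code:
# a new leader can only be elected while a majority of the
# original voting nodes is still alive.

def elect_leader(voters: set, failed: set):
    """Return a new leader, or None if quorum is lost."""
    alive = voters - failed
    if len(alive) <= len(voters) // 2:  # majority gone: no quorum
        return None
    return random.choice(sorted(alive))  # any survivor may win

voters = {"node-1", "node-2", "node-3"}
print(elect_leader(voters, failed={"node-1"}))            # a surviving node
print(elect_leader(voters, failed={"node-1", "node-2"}))  # None: quorum lost
```

In a three-node cluster, losing one node still leaves a two-node majority, so a new leader emerges without intervention; losing two nodes loses quorum, and writes stop until nodes return.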
<!-- LINKS -->
[Dqlite-replication]: https://dqlite.io/docs/explanation/replication
The second changed file adds the new page to an existing page list:

```diff
@@ -18,6 +18,7 @@ channels
 clustering
 ingress
 epa
+high-availability
 security
 cis
```