High Availability Clusters
Set up multi-control-plane K3s clusters for fault tolerance
Overview
PodWarden supports multi-control-plane clusters using K3s embedded etcd. With 3 or more control plane nodes, your cluster remains operational even if one control plane goes down.
Prerequisites
- Minimum 3 hosts available for control plane roles — odd numbers required (3 or 5)
- All control plane hosts must be able to reach each other on ports 6443 (K3s API), 2379, and 2380 (etcd)
- All hosts joined to PodWarden (discovered via Tailscale or manually added)
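The port requirement above can be spot-checked from one control plane host against another. The following is a minimal sketch, not a PodWarden feature: `check_cp_ports` is a hypothetical helper name, and it assumes bash (for the `/dev/tcp` redirection) and the coreutils `timeout` command are installed. Ports 2379 and 2380 only accept connections once etcd is running on the target, so a failure against a host that has not been provisioned yet does not necessarily indicate a firewall problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PodWarden: probe the K3s API and etcd
# ports on a peer host using bash's /dev/tcp pseudo-device.
check_cp_ports() {
  local host="$1" port failed=0
  for port in 6443 2379 2380; do
    # Open (and immediately drop) a TCP connection, giving up after 3 seconds.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} unreachable"
      failed=1
    fi
  done
  # Non-zero exit status if any port could not be reached.
  return "$failed"
}
```

Calling `check_cp_ports <peer-address>` from each control plane host, once per peer, prints one line per port and exits non-zero if any port is unreachable.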
Setting Up an HA Cluster
Step 1: Create the cluster
Create a new cluster as normal — select a host and provision it as a control plane. PodWarden automatically initializes the cluster with embedded etcd, which is HA-ready from day one.
Step 2: Add additional control planes
- Navigate to your cluster's detail page
- Click Add Control Plane in the Control Planes section
- Select an available host — PodWarden shows network compatibility status for each candidate
- Click Add as Control Plane to start provisioning
Repeat to add a third control plane. Three is the minimum for true HA because etcd quorum requires a majority of members to agree: with only two control planes, both must be up, so losing either one costs etcd its quorum.
Step 3: Verify
The Control Planes section on the cluster detail page shows all CPs with their status. All should show Active. You can also verify directly via kubectl:
kubectl get nodes
# All control planes show role: control-plane,etcd,master
How It Works
| Mechanism | Behavior |
|---|---|
| Etcd quorum | With 3 CPs, the cluster tolerates 1 failure. With 5, it tolerates 2. |
| Worker failover | Workers automatically connect to any available control plane. |
| Kubeconfig | PodWarden tries all CPs when fetching kubeconfig — if the primary is down, it falls back to others. |
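The fallback behavior in the Kubeconfig row can be pictured as trying each control plane in order and using the first one whose API port answers. This is only an illustration of the idea, not PodWarden's actual implementation; the function names `probe_api` and `first_reachable_cp` are hypothetical, and the sketch assumes bash and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; PodWarden's internal logic may differ.

# Succeeds if the K3s API port (6443) on the given host accepts a TCP connection.
probe_api() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/6443" 2>/dev/null
}

# Prints the first host whose API port answers; returns 1 if none do.
first_reachable_cp() {
  local host
  for host in "$@"; do
    if probe_api "$host"; then
      echo "$host"
      return 0
    fi
  done
  return 1
}
```

Passing the control plane addresses in preference order, for example `first_reachable_cp <cp1> <cp2> <cp3>`, prints the first one that responds, so a client only fails when every control plane is unreachable.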
Network Considerations
All control planes must share at least one common network. Etcd is sensitive to latency — ideally all CPs are co-located or on the same LAN.
| Configuration | Works? | Notes |
|---|---|---|
| All CPs on same LAN | Yes | Best performance — lowest etcd latency |
| All CPs on Tailscale | Yes | Works across sites |
| Mixed LAN + Tailscale | Yes (with warning) | Adds latency to etcd — PodWarden warns at setup |
Quorum Reference
| Control planes | Tolerated failures | Required for quorum |
|---|---|---|
| 1 | 0 | 1 of 1 (no HA) |
| 3 | 1 | 2 of 3 |
| 5 | 2 | 3 of 5 |
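Always use an odd number. An even count (2 or 4) does not improve fault tolerance: 4 control planes need 3 for quorum, so they tolerate only 1 failure, the same as 3. An even count can also lose quorum entirely if a network partition splits the nodes into equal halves.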
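The numbers in the table follow from the majority rule: quorum is floor(n / 2) + 1 members, and the tolerated failures are whatever remains. The sketch below, using hypothetical function names, prints both values for one to five control planes so the even counts can be compared directly.

```shell
#!/usr/bin/env bash
# Majority quorum for n etcd members: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority remains: n - quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the quorum table for 1 to 5 control planes.
for n in 1 2 3 4 5; do
  echo "control planes: $n  quorum: $(quorum_size "$n")  tolerated failures: $(tolerated_failures "$n")"
done
```

For 4 control planes this reports a quorum of 3 and 1 tolerated failure, the same tolerance as 3 control planes.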
Limitations
- Minimum 3 control planes for HA — a single CP cluster has no redundancy
- Odd numbers only: an even count adds no fault tolerance and can lose quorum when a network partition splits the nodes evenly
- Adding a CP to a cluster that was created before HA support requires reprovisioning
Related Docs
- Disaster Recovery — Etcd snapshots and recovery procedures
- Node Management — Cordon, drain, and node-level operations
- Networking — Network considerations for multi-host clusters
- Infrastructure Canvas — Visual topology with control plane placement