
Setting Up a Kubernetes Cluster with k3s

Step-by-step guide for a highly available Kubernetes cluster with three servers. WireGuard mesh, k3s, Longhorn, and first workloads.

Author: Marvin Strauch
Published: May 18, 2026
Reading time: ~46 min
Words: 8,000
Difficulty: Advanced
Stack: Kubernetes · k3s · WireGuard · Longhorn · Cluster · High Availability

This guide shows you how to set up a highly available Kubernetes cluster with three servers. You will connect three dataforest Seeds via an encrypted WireGuard mesh network, install k3s as a lightweight Kubernetes distribution, set up Longhorn for replicated storage volumes, and deploy a first application. By the end, you will have a production-ready cluster that automatically compensates for individual server failures, performs rolling updates without downtime, and replicates data across all nodes. Plan for 60 to 90 minutes for the entire setup.

Why a Kubernetes Cluster?

A single server is sufficient for many applications. However, once high availability, uninterrupted updates, or horizontal scaling are required, a single server reaches its limits. If it goes down, all services running on it are offline simultaneously. Updates require maintenance windows. Traffic spikes can only be absorbed through vertical scaling (more CPU, more RAM), which quickly hits physical constraints.

Kubernetes solves these problems by connecting multiple servers into a cluster. Applications are distributed as containers across the available nodes. If a node fails, Kubernetes automatically moves the affected workloads to the remaining nodes. Rolling updates replace containers incrementally, ensuring at least one healthy instance is running at all times. Horizontal scaling adds more replicas instead of upgrading a single server.

For applications that do not require high availability and where occasional brief outages are acceptable, a Kubernetes cluster is not strictly necessary. A deployment platform on a single server offers a much simpler solution. You can find an overview of these options on the Self-Hosted Deployment page.

Architecture Overview

The cluster consists of three layers that build upon each other.

WireGuard Mesh forms the network layer. All three servers are connected via encrypted point-to-point tunnels. All communication between nodes runs through this private network. External attackers can neither read nor manipulate cluster-internal traffic. You can find more background on WireGuard in the VPN guide.

k3s is a certified, lightweight Kubernetes distribution from Rancher (SUSE). It packages the entire Kubernetes stack into a single binary and requires significantly fewer resources than a standard Kubernetes cluster.

Kubernetes distinguishes between the Control Plane (the management layer) and Workloads (your applications). The Control Plane consists of the API Server (accepts kubectl commands), the Scheduler (decides which node a Pod runs on), the Controller Manager (ensures the desired state is maintained), and etcd (a distributed database that stores the entire cluster state: Deployments, Services, Secrets, configurations).

In a standard Kubernetes setup, the Control Plane and Workloads run on separate servers (control-plane nodes and worker nodes). k3s simplifies this: in this tutorial, all three nodes run as server nodes. Each server node runs both the Control Plane and your Workloads. All three nodes hold a copy of etcd. etcd uses a consensus algorithm (Raft) that requires a majority of nodes to confirm write operations. With three nodes, the majority is two. This means: if one node fails, the remaining two can still make decisions. If two nodes fail, the majority is lost and the cluster cannot accept new changes (running workloads on the remaining node are not affected, but new deployments or scaling operations are not possible).
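The majority rule is plain integer arithmetic. The following sketch computes the quorum and the number of tolerated failures for a given number of etcd members. The function names are illustrative and not part of k3s or etcd.

```shell
#!/bin/sh
# Raft quorum: a strict majority of the voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that may fail while a majority remains.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Four members tolerate no more failures than three, which is why etcd clusters are usually sized with an odd number of members.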

Longhorn provides replicated storage volumes. When a Pod needs persistent data (database, uploads), Longhorn creates a volume and replicates it across multiple nodes. If a node fails, the data remains available on the other nodes.

Traefik acts as the Ingress Controller and is already integrated into k3s. It receives incoming HTTP/HTTPS traffic and routes it to the appropriate services based on routing rules.

Figure: Kubernetes Cluster Architecture

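| Node        | Public IP | WireGuard IP | Role   |
|-------------|-----------|--------------|--------|
| seed-k8s-01 | (your IP) | 10.222.0.1   | Server |
| seed-k8s-02 | (your IP) | 10.222.0.2   | Server |
| seed-k8s-03 | (your IP) | 10.222.0.3   | Server |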

Prerequisites

  • 3 Seeds in the dataforest Cloud. Minimum: Plan entry-c4-m8-s80 (4 CPU, 8 GB RAM, 80 GB SSD). Recommendation for comfortable operation: Plan entry-c8-m16-s320 (8 CPU, 16 GB RAM, 320 GB SSD). Kubernetes itself, k3s, Longhorn, and the overlay network already consume resources. With the larger plan, enough capacity remains for your actual workloads.
  • SSH access to all three Seeds
  • kubectl on your local machine (official installation guide)
  • Optional: A domain with DNS access to test HTTPS Ingress with Let's Encrypt
  • Basic knowledge: Docker, Linux terminal, SSH

The three Seeds can be created via the dataforest Cloud UI or through the Public API. Using the API, the first node looks like this:

bash
curl -X POST "https://api.dataforest.net/api/v1/public/seeds" \
  -H "Authorization: Bearer <API-Token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "seed-k8s-01",
    "plan": "lines/entry/models/entry-c4-m8-s80",
    "location": "fra01",
    "project_id": "<Project-ID>",
    "ssh_keys": ["<SSH-Key-ID>"],
    "source": {
      "type": "image",
      "ref": "images/debian/versions/debian-v13"
    },
    "enable_ipv4": true
  }'

Repeat the call with the names seed-k8s-02 and seed-k8s-03. You can find your API token and project ID in the team settings of the Cloud UI. Available SSH key IDs can be retrieved with GET /sshkeys.
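The three calls differ only in the name, so they can be wrapped in a loop. This is a minimal sketch that reuses the endpoint and fields from the request above. The environment variable names API_TOKEN, PROJECT_ID, and SSH_KEY_ID are assumptions made for this example. Without API_TOKEN, the script only prints the request bodies.

```shell
#!/bin/sh
# Build the JSON body for one Seed; only the name changes between nodes.
seed_payload() {
  project_id="${PROJECT_ID:-<Project-ID>}"
  ssh_key_id="${SSH_KEY_ID:-<SSH-Key-ID>}"
  cat <<EOF
{
  "name": "$1",
  "plan": "lines/entry/models/entry-c4-m8-s80",
  "location": "fra01",
  "project_id": "$project_id",
  "ssh_keys": ["$ssh_key_id"],
  "source": { "type": "image", "ref": "images/debian/versions/debian-v13" },
  "enable_ipv4": true
}
EOF
}

for name in seed-k8s-01 seed-k8s-02 seed-k8s-03; do
  if [ -n "${API_TOKEN:-}" ]; then
    # Send the body on stdin (-d @-) to the same endpoint as above.
    seed_payload "$name" | curl -fsS -X POST \
      "https://api.dataforest.net/api/v1/public/seeds" \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d @-
  else
    # Dry run: no token set, so only show what would be sent.
    seed_payload "$name"
  fi
done
```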

Preparing the Seeds

Perform the following steps on all three nodes. Connect via SSH and work as root.

Update the System

bash
apt update && apt upgrade -y

A fresh system with up-to-date packages avoids compatibility issues when installing k3s and Longhorn.

Disable Swap

The kubelet process calculates available resources, schedules Pods based on memory requests, and enforces limits. If the operating system swaps memory to disk in the background, these calculations become unreliable. Pods could appear to use more memory than is available, and under actual memory pressure, the system responds with extreme slowdown instead of a controlled Pod restart. Since Kubernetes 1.34, a stable swap mode (LimitedSwap) allows Burstable Pods controlled swap access. For a cluster setup like this one, disabled swap remains the simplest and safest option.

bash
swapoff -a
sed -i.bak '/\sswap\s/d' /etc/fstab

The first command disables swap immediately. The second removes the swap entries from /etc/fstab so that swap is not automatically re-enabled after a reboot, and keeps the previous version as /etc/fstab.bak.
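To confirm the result, you can inspect the kernel's swap table. The sketch below assumes a Linux system where /proc/swaps lists active swap devices below a header line. The function name is illustrative.

```shell
#!/bin/sh
# Succeeds if the given swap table (default: /proc/swaps) lists no active
# swap device. The first line of the file is a column header.
swap_is_off() {
  table="${1:-/proc/swaps}"
  [ -r "$table" ] || return 2
  awk 'NR > 1 { active = 1 } END { exit active + 0 }' "$table"
}

if swap_is_off; then
  echo "swap is disabled"
else
  echo "swap is still active, or /proc/swaps could not be read"
fi
```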

Load Kernel Modules

Kubernetes needs two kernel modules on every node: overlay for the overlay filesystem (the container runtime uses it to stack image layers) and br_netfilter so that bridged traffic is processed by iptables.

bash
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

The file under /etc/modules-load.d/ ensures the modules are loaded automatically after a reboot. The modprobe commands load them immediately into the running kernel.

Set Sysctl Parameters

Kubernetes networking requires the kernel to correctly forward packets between network bridges and filter them through iptables rules:

bash
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

sysctl --system

net.bridge.bridge-nf-call-iptables ensures that traffic flowing through a Linux bridge passes through iptables rules. Without this setting, NetworkPolicies and Service routing would not work. net.ipv4.ip_forward allows the kernel to forward packets between network interfaces, which is required for routing between Pods on different nodes.
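A quick way to verify the three settings is to read them back from /proc/sys. The following sketch uses an illustrative helper name. Note that the net.bridge keys only exist while the br_netfilter module is loaded, so a missing key points to the previous step.

```shell
#!/bin/sh
# Compare a sysctl key against an expected value by reading it from procfs.
# The optional third argument overrides the procfs root (useful for testing).
sysctl_has_value() {
  path="${3:-/proc/sys}/$(printf '%s' "$1" | tr . /)"
  [ -r "$path" ] && [ "$(cat "$path")" = "$2" ]
}

for key in net.bridge.bridge-nf-call-iptables \
           net.bridge.bridge-nf-call-ip6tables \
           net.ipv4.ip_forward; do
  if sysctl_has_value "$key" 1; then
    echo "ok       $key"
  else
    echo "NOT SET  $key"
  fi
done
```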

Configure the Firewall

Debian 13 does not come with iptables pre-installed. Install it:

bash
apt install iptables

The cluster requires the following ports. Each port serves a specific function:

| Port      | Protocol | Purpose                                                          |
|-----------|----------|------------------------------------------------------------------|
| 51820     | UDP      | WireGuard tunnel between nodes                                   |
| 6443      | TCP      | Kubernetes API Server (kubectl communication, node registration) |
| 9345      | TCP      | k3s Supervisor API (nodes joining the cluster)                   |
| 10250     | TCP      | Kubelet API (API Server communicates with kubelets on each node) |
| 2379-2380 | TCP      | etcd client and peer communication (cluster state)               |
| 8472      | UDP      | VXLAN (Flannel overlay network between Pods)                     |
| 80        | TCP      | HTTP Ingress (incoming web traffic)                              |
| 443       | TCP      | HTTPS Ingress (incoming web traffic, TLS)                        |

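Open these ports for cluster-internal traffic (WireGuard subnet) and the publicly accessible ports. ACCEPT rules only restrict access if the INPUT chain rejects everything else. With the default ACCEPT policy of a fresh system, all ports remain reachable regardless of these rules: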

bash
# WireGuard: must be reachable from all other nodes
iptables -A INPUT -p udp --dport 51820 -j ACCEPT

# Kubernetes API Server: public for kubectl access
iptables -A INPUT -p tcp --dport 6443 -j ACCEPT

# k3s Supervisor: only from the WireGuard network
iptables -A INPUT -s 10.222.0.0/24 -p tcp --dport 9345 -j ACCEPT

# Kubelet API: only from the WireGuard network
iptables -A INPUT -s 10.222.0.0/24 -p tcp --dport 10250 -j ACCEPT

# etcd: only from the WireGuard network
iptables -A INPUT -s 10.222.0.0/24 -p tcp --dport 2379:2380 -j ACCEPT

# Flannel VXLAN: only from the WireGuard network
iptables -A INPUT -s 10.222.0.0/24 -p udp --dport 8472 -j ACCEPT

# Ingress: publicly accessible
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

To persist these rules across reboots, install iptables-persistent:

bash
apt install iptables-persistent
netfilter-persistent save

During installation, the package asks whether the current rules should be saved. Confirm with Yes. After future changes, save again with netfilter-persistent save.
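For the port rules to act as an allowlist, the INPUT chain needs a default-deny policy plus a few baseline rules. The sketch below only prints such rules for review and applies nothing. Several values are assumptions that are not part of this guide: SSH on port 22, ICMP allowed so that ping tests keep working, and 10.42.0.0/16 and 10.43.0.0/16 as the k3s default Pod and Service networks.

```shell
#!/bin/sh
# Print (not apply) baseline rules for a default-deny INPUT chain.
# Assumptions: SSH listens on port 22; 10.42.0.0/16 and 10.43.0.0/16 are the
# k3s default Pod and Service networks.
print_baseline_rules() {
  cat <<'EOF'
iptables -I INPUT 1 -i lo -j ACCEPT
iptables -I INPUT 2 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -s 10.42.0.0/16 -j ACCEPT
iptables -A INPUT -s 10.43.0.0/16 -j ACCEPT
iptables -P INPUT DROP
EOF
}

print_baseline_rules
```

The policy change comes last on purpose: setting DROP before the SSH rule exists would cut off your own session. Keep a second console open while testing, and run netfilter-persistent save only after you have confirmed that SSH still works.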

Setting Up the WireGuard Mesh

Kubernetes requires that all Pods can communicate with each other without NAT and that nodes can reach all Pods directly. On dedicated servers without a managed VLAN, a WireGuard mesh fulfills this requirement: it spans an encrypted overlay network across the public IPs of your nodes.

WireGuard has been part of the Linux kernel since version 5.6 (March 2020). According to University of Amsterdam benchmarks, the pure encryption overhead is below 0.5 ms per hop. In practice, factors like routing and system load add up, so you can expect low single-digit milliseconds.

In this tutorial, k3s uses VXLAN as the Flannel backend (--flannel-backend=vxlan) and routes it through the WireGuard interface (--flannel-iface=wg0). The WireGuard mesh encrypts all traffic between nodes, so a second encryption layer at the Flannel level is unnecessary. Both control-plane communication (API server, embedded etcd) and Pod-to-Pod traffic run through the mesh. That is why it must be in place before k3s is installed.

Install WireGuard

Run on all three nodes:

bash
apt install wireguard

The package installs the management tools wg and wg-quick. The WireGuard kernel module has been built into the kernel since Linux 5.6 and is available by default in Debian 13.

Generate Key Pairs

Each node needs its own key pair. The private key stays on the respective node. The public keys are shared with the other nodes.

Run on each of the three nodes:

bash
(umask 077; wg genkey > /etc/wireguard/private.key)
chmod 600 /etc/wireguard/private.key
wg pubkey < /etc/wireguard/private.key > /etc/wireguard/public.key

Note down the public key of each node:

bash
cat /etc/wireguard/public.key

You will need these three public keys in the next step for configuring the peer sections.

Create the Configuration

In a mesh network, each node knows the other two as peers. Create the file /etc/wireguard/wg0.conf on each node with the corresponding content.

Node 1 (seed-k8s-01): /etc/wireguard/wg0.conf

ini
[Interface]
PrivateKey = <PRIVATE_KEY_NODE_1>
Address = 10.222.0.1/24
ListenPort = 51820
MTU = 1420

[Peer]
PublicKey = <PUBLIC_KEY_NODE_2>
AllowedIPs = 10.222.0.2/32
Endpoint = <PUBLIC_IP_NODE_2>:51820

[Peer]
PublicKey = <PUBLIC_KEY_NODE_3>
AllowedIPs = 10.222.0.3/32
Endpoint = <PUBLIC_IP_NODE_3>:51820

Node 2 (seed-k8s-02): /etc/wireguard/wg0.conf

ini
[Interface]
PrivateKey = <PRIVATE_KEY_NODE_2>
Address = 10.222.0.2/24
ListenPort = 51820
MTU = 1420

[Peer]
PublicKey = <PUBLIC_KEY_NODE_1>
AllowedIPs = 10.222.0.1/32
Endpoint = <PUBLIC_IP_NODE_1>:51820

[Peer]
PublicKey = <PUBLIC_KEY_NODE_3>
AllowedIPs = 10.222.0.3/32
Endpoint = <PUBLIC_IP_NODE_3>:51820

Node 3 (seed-k8s-03): /etc/wireguard/wg0.conf

ini
[Interface]
PrivateKey = <PRIVATE_KEY_NODE_3>
Address = 10.222.0.3/24
ListenPort = 51820
MTU = 1420

[Peer]
PublicKey = <PUBLIC_KEY_NODE_1>
AllowedIPs = 10.222.0.1/32
Endpoint = <PUBLIC_IP_NODE_1>:51820

[Peer]
PublicKey = <PUBLIC_KEY_NODE_2>
AllowedIPs = 10.222.0.2/32
Endpoint = <PUBLIC_IP_NODE_2>:51820

Explanation of the parameters:

  • PrivateKey: The private key of this node (from /etc/wireguard/private.key).
  • Address: The WireGuard IP of this node in the private subnet 10.222.0.0/24.
  • ListenPort: The UDP port on which WireGuard accepts connections.
  • MTU = 1420: WireGuard adds a header to each packet (60 bytes over IPv4, 80 bytes over IPv6). The standard Ethernet MTU is 1500 bytes. Subtracting the larger of the two overheads, 1500 minus 80, gives 1420 bytes of usable packet size within the tunnel, which is safe for both IP versions. An MTU value that is too high leads to packet fragmentation and performance issues.
  • PublicKey: The public key of the respective peer node.
  • AllowedIPs: Defines which IP addresses are reachable via this peer. In a mesh setup, this is the individual WireGuard IP of the peer (/32).
  • Endpoint: The public IP and port of the peer node, through which the tunnel is established.
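Since the three files differ only in the node number and the peer list, they can also be generated. The following is a minimal sketch, not an official WireGuard tool. The variable names PUBKEY_n and ENDPOINT_n are assumptions for this example, and the placeholders from the configurations above are printed when they are unset.

```shell
#!/bin/sh
# Render wg0.conf for one node of the three-node mesh.
# Usage: render_wg_conf <node-number> <private-key>
# Peer data comes from PUBKEY_n / ENDPOINT_n (hypothetical variable names).
render_wg_conf() {
  self="$1"
  printf '[Interface]\nPrivateKey = %s\nAddress = 10.222.0.%s/24\n' "$2" "$self"
  printf 'ListenPort = 51820\nMTU = 1420\n'
  for peer in 1 2 3; do
    # A node never lists itself as a peer.
    [ "$peer" = "$self" ] && continue
    eval "pubkey=\${PUBKEY_$peer:-}"
    eval "endpoint=\${ENDPOINT_$peer:-}"
    [ -n "$pubkey" ] || pubkey="<PUBLIC_KEY_NODE_$peer>"
    [ -n "$endpoint" ] || endpoint="<PUBLIC_IP_NODE_$peer>"
    printf '\n[Peer]\nPublicKey = %s\n' "$pubkey"
    printf 'AllowedIPs = 10.222.0.%s/32\nEndpoint = %s:51820\n' "$peer" "$endpoint"
  done
}

render_wg_conf 1 '<PRIVATE_KEY_NODE_1>'
```

Review the output before redirecting it to /etc/wireguard/wg0.conf, and make sure the file is only readable by root because it contains the private key.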

Start WireGuard

Enable and start the tunnel on all three nodes:

bash
systemctl enable --now wg-quick@wg0

This command starts the WireGuard interface immediately and configures automatic startup after a reboot.

Validate the Connection

Test reachability from each node to the other two. On Node 1:

bash
ping -c 3 10.222.0.2
ping -c 3 10.222.0.3

On Node 2:

bash
ping -c 3 10.222.0.1
ping -c 3 10.222.0.3

On Node 3:

bash
ping -c 3 10.222.0.1
ping -c 3 10.222.0.2

All pings must return responses. If a ping fails, do not proceed with the k3s installation. The cluster only works if all nodes can communicate over the WireGuard mesh.
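A plain ping uses small packets and does not reveal MTU problems. The sketch below works out two numbers from the tunnel MTU of 1420: the largest ICMP payload that fits into one unfragmented packet, and the MTU that should remain once VXLAN runs on top of the tunnel. The overhead of 50 bytes for VXLAN is a commonly used figure and is treated as an assumption here.

```shell
#!/bin/sh
WG_MTU=1420          # MTU from wg0.conf
ICMP_OVERHEAD=28     # 20 bytes IPv4 header + 8 bytes ICMP header
VXLAN_OVERHEAD=50    # assumed extra headers added by VXLAN encapsulation

# Largest ICMP payload that fits into one unfragmented packet on wg0.
max_ping_payload() { echo $(( $1 - $ICMP_OVERHEAD )); }

# MTU left for Pod traffic once VXLAN runs on top of the tunnel.
vxlan_mtu() { echo $(( $1 - $VXLAN_OVERHEAD )); }

echo "largest unfragmented ping payload: $(max_ping_payload "$WG_MTU") bytes"
echo "expected MTU on the VXLAN interface: $(vxlan_mtu "$WG_MTU") bytes"
echo "try: ping -c 3 -M do -s $(max_ping_payload "$WG_MTU") 10.222.0.2"
```

The -M do option of the Linux ping command forbids fragmentation. If the suggested ping succeeds but a payload one byte larger fails, the tunnel MTU behaves as configured.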

Troubleshooting WireGuard

If a ping fails, check the following in order:

Show tunnel status:

bash
wg show

Under latest handshake you can see whether a connection to the peer exists. If this entry is missing, no handshake has occurred yet.

Port reachable? Check on the target node whether WireGuard is listening on the correct port:

bash
ss -ulnp | grep 51820

If there is no output, WireGuard is not started or the port is misconfigured.

Check the firewall: Make sure UDP port 51820 is not blocked:

bash
iptables -L INPUT -n | grep 51820

Check public keys: A common mistake is swapping public and private keys or using the wrong public key in the peer configuration. Compare the output of cat /etc/wireguard/public.key on the respective node with the PublicKey entry in the peer sections of the other nodes.

Endpoint IP correct? The Endpoint value must be the public IP of the peer node, not the WireGuard IP.
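The handshake check can also be scripted. wg show wg0 latest-handshakes prints one line per peer with the public key and the Unix timestamp of the last handshake (0 if there has been none). The sketch below lists peers without a recent handshake. The threshold of 180 seconds is an assumption, and an idle tunnel may exceed it without being broken, because handshakes only happen when traffic flows.

```shell
#!/bin/sh
# Read "public-key<TAB>unix-timestamp" lines on stdin and print the keys of
# peers whose last handshake is missing (timestamp 0) or older than the
# given maximum age in seconds.
# Usage: ... | stale_peers <now-as-unix-time> [max-age-seconds]
stale_peers() {
  awk -v now="$1" -v max="${2:-180}" \
    '$2 == 0 || now - $2 > max { print $1 }'
}

if command -v wg >/dev/null 2>&1; then
  wg show wg0 latest-handshakes 2>/dev/null | stale_peers "$(date +%s)"
fi
```

Empty output means every peer has completed a handshake within the threshold. Run a ping first so that the tunnel has a reason to handshake.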

Installing k3s

k3s is installed in two phases. Node 1 initializes the cluster with embedded etcd. Nodes 2 and 3 then join as additional server nodes. All three nodes are equal server nodes (not agents). This means: each node runs the full Kubernetes control plane and can assume the leader role in case of a failure.

Generate a Token

All nodes use a shared token to authenticate with each other. Generate a secure token on Node 1:

bash
openssl rand -hex 32

Note down the output. You will use this value as K3S_TOKEN during installation on all three nodes.

Node 1: Initialize the Cluster

Node 1 creates the cluster. Run the following command on seed-k8s-01:

bash
curl -sfL https://get.k3s.io | K3S_TOKEN=<YOUR_TOKEN> sh -s - server \
  --cluster-init \
  --node-ip=10.222.0.1 \
  --node-external-ip=<PUBLIC_IP_NODE_1> \
  --flannel-iface=wg0 \
  --flannel-backend=vxlan \
  --tls-san=<PUBLIC_IP_NODE_1> \
  --tls-san=10.222.0.1

Explanation of each flag:

  • K3S_TOKEN: The shared secret that nodes use to authenticate with the cluster. Without a valid token, no node can join.
  • --cluster-init: Enables embedded etcd and starts a new HA cluster. Without this flag, k3s would use SQLite as the database, which does not support high availability.
  • --node-ip=10.222.0.1: Tells k3s which IP address to use for internal cluster communication. By specifying the WireGuard IP, all cluster traffic runs through the encrypted tunnel.
  • --node-external-ip=<PUBLIC_IP_NODE_1>: The public IP address of this node. Used for Ingress traffic so that external requests can reach the node.
  • --flannel-iface=wg0: Instructs the Flannel overlay network to use the WireGuard interface for communication between Pods on different nodes. Without this flag, Flannel would use the default interface (eth0) and send traffic unencrypted over the public network.
  • --flannel-backend=vxlan: Uses VXLAN as the overlay protocol. k3s also offers wireguard-native as a backend. Since the connection between nodes is already encrypted by the WireGuard mesh, a second WireGuard layer would be redundant and only add overhead.
  • --tls-san=<PUBLIC_IP_NODE_1> and --tls-san=10.222.0.1: Adds these IP addresses as Subject Alternative Names to the API Server's TLS certificate. Without these entries, kubectl from your local machine would receive a certificate error because the IP is not included in the certificate.
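Instead of passing every flag on the command line, k3s can read the same options from /etc/rancher/k3s/config.yaml, where each key is the flag name without the leading dashes. The sketch below prints such a file for Node 1. The function name is illustrative, and the two arguments are the same placeholders used in the command above.

```shell
#!/bin/sh
# Print a k3s config file equivalent to the flags for the first server node.
# Usage: render_k3s_config <public-ip-node-1> <token>
render_k3s_config() {
  cat <<EOF
token: "$2"
cluster-init: true
node-ip: "10.222.0.1"
node-external-ip: "$1"
flannel-iface: "wg0"
flannel-backend: "vxlan"
tls-san:
  - "$1"
  - "10.222.0.1"
EOF
}

# Review the output, then save it as /etc/rancher/k3s/config.yaml on the node.
render_k3s_config '<PUBLIC_IP_NODE_1>' '<YOUR_TOKEN>'
```

With the file in place, the installation shrinks to curl -sfL https://get.k3s.io | sh -s - server. The file keeps the token out of the shell history, so restrict its permissions to root.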

Wait until the node is ready. This takes 30 to 60 seconds:

bash
kubectl get nodes

The output should show one node with status Ready:

text
NAME          STATUS   ROLES                       AGE   VERSION
seed-k8s-01   Ready    control-plane,etcd,master   45s   v1.31.x+k3s1

If the status shows NotReady, wait another 30 seconds and try again. k3s needs a moment to start all system Pods.

Nodes 2 and 3: Join the Cluster

Run on seed-k8s-02:

bash
curl -sfL https://get.k3s.io | K3S_TOKEN=<YOUR_TOKEN> sh -s - server \
  --server https://10.222.0.1:6443 \
  --node-ip=10.222.0.2 \
  --node-external-ip=<PUBLIC_IP_NODE_2> \
  --flannel-iface=wg0 \
  --flannel-backend=vxlan \
  --tls-san=<PUBLIC_IP_NODE_2> \
  --tls-san=10.222.0.2

And on seed-k8s-03:

bash
curl -sfL https://get.k3s.io | K3S_TOKEN=<YOUR_TOKEN> sh -s - server \
  --server https://10.222.0.1:6443 \
  --node-ip=10.222.0.3 \
  --node-external-ip=<PUBLIC_IP_NODE_3> \
  --flannel-iface=wg0 \
  --flannel-backend=vxlan \
  --tls-san=<PUBLIC_IP_NODE_3> \
  --tls-san=10.222.0.3

The key difference: --server https://10.222.0.1:6443 points to the WireGuard IP of Node 1, not its public IP. All cluster communication runs through the encrypted WireGuard network. Nodes 2 and 3 do not use --cluster-init but instead join an existing cluster.

Wait 30 to 60 seconds after each installation. etcd needs time to add the new members to the quorum.
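Rather than waiting a fixed time, you can poll until the nodes are ready. This is a generic retry sketch with illustrative function names. It only defines the helpers and runs nothing against the cluster by itself.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts are used up.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1
}

# Succeeds once exactly three nodes report the status Ready.
three_nodes_ready() {
  kubectl get nodes --no-headers 2>/dev/null |
    awk '$2 == "Ready" { n++ } END { exit (n == 3) ? 0 : 1 }'
}

# Example call (up to 12 attempts, 10 seconds apart):
#   wait_until 12 10 three_nodes_ready && echo "all three nodes are Ready"
```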

Validate the Cluster

Check the cluster status on any node:

bash
kubectl get nodes

The output should show three nodes with status Ready:

text
NAME          STATUS   ROLES                       AGE     VERSION
seed-k8s-01   Ready    control-plane,etcd,master   5m      v1.31.x+k3s1
seed-k8s-02   Ready    control-plane,etcd,master   2m      v1.31.x+k3s1
seed-k8s-03   Ready    control-plane,etcd,master   90s     v1.31.x+k3s1

All three nodes carry the roles control-plane, etcd, and master. This confirms that every node runs the control plane and is an etcd member, so the cluster keeps working if any single node fails.

Copy the Kubeconfig to Your Local Machine

To manage the cluster from your local machine, you need the kubeconfig file. This contains the connection details and credentials for the API Server. On every k3s server, it is located at /etc/rancher/k3s/k3s.yaml.

Copy the file from Node 1:

bash
mkdir -p ~/.kube
scp root@<PUBLIC_IP_NODE_1>:/etc/rancher/k3s/k3s.yaml ~/.kube/config-k8s-cluster
chmod 600 ~/.kube/config-k8s-cluster

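The file contains 127.0.0.1 as the server address since it is intended for local use on the node. Replace it with the public IP of Node 1. The -i.bak form works with both GNU sed and the sed shipped with macOS and leaves a backup copy next to the file:

bash
sed -i.bak 's/127\.0\.0\.1/<PUBLIC_IP_NODE_1>/' ~/.kube/config-k8s-cluster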

Set the environment variable so that kubectl uses this configuration:

bash
export KUBECONFIG=~/.kube/config-k8s-cluster

Test the access:

bash
kubectl get nodes

You should see the same output with three Ready nodes as on the server. If the connection fails, check whether port 6443 on Node 1 is reachable and whether the --tls-san flags include the public IP.

For permanent use, add the export to your shell configuration (e.g., ~/.bashrc or ~/.zshrc).

Troubleshooting k3s

Node does not join the cluster:

First check WireGuard connectivity. From Node 2 or 3, ping 10.222.0.1 must work. Then check whether the required ports are open:

bash
# Run on Node 1: is k3s listening on port 6443 (API Server)?
ss -tlnp | grep 6443

# Run on Node 1: is k3s listening on port 9345 (Supervisor API)?
ss -tlnp | grep 9345

Token mismatch: Compare the token on all nodes. The value must match exactly, including upper and lower case.

Check logs:

bash
journalctl -u k3s -f

Common error messages and their causes:

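  • certificate signed by unknown authority: The client does not trust the cluster's certificate authority, typically because the kubeconfig or the node's data stems from a different or earlier installation. A missing --tls-san entry instead produces an error stating that the certificate is not valid for the IP you are connecting to.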
  • etcd cluster is not healthy: The WireGuard connection between nodes is unstable. Check wg show for active handshakes.
  • connection refused on port 6443: k3s has not fully started on Node 1 yet. Wait 60 seconds and try again.

Setting Up Persistent Storage

Kubernetes distinguishes between ephemeral containers and persistent data. By default, all files inside a container are lost when the Pod is restarted. For databases, uploads, or configuration files, the cluster needs a storage system that stores data independently of the individual Pod and ideally independently of the individual node.

Storage Options Overview

In Kubernetes, storage is requested via PersistentVolumeClaims (PVC) and provisioned via StorageClasses. A StorageClass defines which backend creates the volumes. Depending on the environment, there are different options:

  • Cloud provider storage: With managed cloud platforms, the provider provisions block storage volumes (comparable to virtual hard drives) that automatically attach to Pods and can be moved between nodes. This is the simplest option but requires the provider to offer a CSI driver (Container Storage Interface).
  • local-path-provisioner (pre-installed in k3s): Creates volumes directly on the local disk of the node. No overhead, no replication. If the node fails, the data is unavailable. Suitable for development environments and applications that store their state externally.
  • Distributed Block Storage: Software that builds on the local disks of all nodes and forms a distributed, replicated storage system from them. Longhorn, Piraeus/LINSTOR, and Rook-Ceph fall into this category. The advantage: data is automatically replicated across multiple nodes. The disadvantage: additional resource consumption and lower IOPS than native SSDs.

In this tutorial, we use Longhorn because it was specifically designed for Kubernetes clusters with local disks, can be installed with a single Helm command, and includes a web UI for management.

How Longhorn Works

When a Pod creates a PersistentVolumeClaim, Longhorn reserves storage space on the local SSDs of the nodes and creates a block device. This block device is attached via iSCSI to the node where the Pod runs. Simultaneously, Longhorn replicates the data synchronously to other nodes (configurable, in this tutorial to 2 of 3 nodes). Each replica is a complete copy of the volume.

If the node where a Pod with a Longhorn volume runs fails, Kubernetes starts the Pod on another node. Longhorn detects that a replica of the volume exists on this node (or a reachable node) and attaches the volume there. The data is immediately available without requiring a complete rebuild.

Install Longhorn

Prerequisites on All Nodes

Longhorn uses iSCSI internally to attach volumes to nodes. The open-iscsi package must therefore be installed on each of the three nodes, together with nfs-common:

bash
apt install open-iscsi nfs-common
systemctl enable --now iscsid

open-iscsi provides the iSCSI initiator through which Longhorn attaches volumes to the node where the Pod runs. nfs-common is needed for ReadWriteMany volumes and backups. The enable --now command starts the iscsid service immediately and enables it permanently.

Repeat this step on seed-k8s-01, seed-k8s-02, and seed-k8s-03.

Install Helm

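Helm is the standard package manager for Kubernetes. It installs complex applications (consisting of many YAML manifests) as so-called "Charts" with a single command. Install it on the machine from which you manage the cluster. If you run Helm directly on a node, point it at the k3s kubeconfig first (export KUBECONFIG=/etc/rancher/k3s/k3s.yaml).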

bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify the installation:

bash
helm version

Install Longhorn via Helm

Add the official Longhorn repository and install the chart:

bash
helm repo add longhorn https://charts.longhorn.io
helm repo update
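helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultReplicaCount=2 \
  --set persistence.defaultClassReplicaCount=2

The individual parameters in detail:

  • --namespace longhorn-system installs all Longhorn components into a dedicated namespace, isolated from the rest of the cluster.
  • --create-namespace creates the namespace if it does not already exist.
  • --set defaultSettings.defaultReplicaCount=2 sets the default replica count for volumes created through the Longhorn UI.
  • --set persistence.defaultClassReplicaCount=2 sets the replica count of the longhorn StorageClass, which PersistentVolumeClaims use (the chart default is 3). Each volume is then stored on two of the three nodes. This provides failure tolerance (one node may fail) without tripling storage consumption.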
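The effect of the replica count on disk usage is simple multiplication. The sketch below is a rough estimate under the assumption that every node contributes the same amount of disk to Longhorn. It ignores the space used by the operating system and the free-space reserve Longhorn keeps, so real capacity is lower.

```shell
#!/bin/sh
# Raw disk space consumed by one volume: every replica is a full copy.
# Usage: raw_usage_gb <volume-size-gb> <replica-count>
raw_usage_gb() {
  echo $(( $1 * $2 ))
}

# Rough upper bound for the total volume size the cluster can hold.
# Usage: max_volume_gb <nodes> <gb-per-node> <replica-count>
max_volume_gb() {
  echo $(( $1 * $2 / $3 ))
}

echo "a 10 GB volume with 2 replicas uses $(raw_usage_gb 10 2) GB of raw disk"
echo "3 nodes x 80 GB with 2 replicas hold at most $(max_volume_gb 3 80 2) GB of volumes"
echo "3 nodes x 80 GB with 3 replicas hold at most $(max_volume_gb 3 80 3) GB of volumes"
```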

Wait until all Pods are ready:

bash
kubectl -n longhorn-system get pods

This process can take 2 to 5 minutes. All Pods should reach the Running status.

Configure the Default StorageClass

k3s ships with its own StorageClass called local-path that stores data only locally on a single node. Longhorn registers its own StorageClass called longhorn during installation.

There are two ways to use Longhorn storage:

  1. Explicitly per PVC: Each PersistentVolumeClaim specifies storageClassName: longhorn. This is more explicit and documents in the manifest which storage is being used.
  2. As the default StorageClass: Longhorn becomes the default. PVCs without an explicit storageClassName automatically use Longhorn.

In this tutorial, we set Longhorn as the default so that PVCs without an explicit specification automatically receive replicated storage:

bash
kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

The local-path StorageClass remains available. If a PVC explicitly specifies storageClassName: local-path, local storage without replication is still used. This makes sense for temporary data or caches where replication would be unnecessary overhead.
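After the two patch commands, exactly one StorageClass should carry the default marker. kubectl get storageclass shows it as "(default)" next to the class name. The sketch below filters that output. The function name is illustrative.

```shell
#!/bin/sh
# Read `kubectl get storageclass --no-headers` output on stdin and print the
# names of all classes marked "(default)".
default_storageclasses() {
  awk '$2 == "(default)" { print $1 }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get storageclass --no-headers 2>/dev/null | default_storageclasses
fi
```

The expected output is the single line longhorn. If local-path also appears, repeat the second patch command.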

Validation: Create a Test Volume

Create a file test-pvc.yaml:

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Apply the manifest and check the status:

bash
kubectl apply -f test-pvc.yaml
kubectl get pvc

The STATUS column should show Bound after a few seconds. This means: Longhorn has successfully created a 1 GiB volume and replicated it across two nodes.

Clean up the test volume:

bash
kubectl delete pvc test-pvc

Optional: Longhorn UI

Longhorn includes a web interface where you can view volumes, snapshots, and the health of nodes. For quick access without Ingress:

bash
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80

The UI is then accessible at http://localhost:8080 on the machine where the command runs. If you ran it on a node, open an SSH tunnel to that port first.

Troubleshooting Longhorn

PVC stays in Pending status:

bash
kubectl describe pvc test-pvc

Common causes:

  • The iscsid service is not running on one or more nodes.
  • The StorageClass is not correctly marked as default.
  • Longhorn Pods have not fully started yet.

Longhorn Pods in CrashLoopBackOff:

bash
kubectl -n longhorn-system logs <pod-name>
df -h
free -m

Common causes:

  • Not enough free disk space on the node (Longhorn requires at least 25% free space).
  • Not enough RAM for the Longhorn components.

First Workload: Stateless Application

With a working cluster and storage system, it is time for the first workload. You will deploy a simple web application with two instances, an internal service, and optional HTTPS access via a custom domain.

Create a Namespace

bash
kubectl create namespace demo

Namespaces group related resources and isolate them from each other. All following manifests go into the demo namespace.

Create the Deployment

Create a file deployment.yaml:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80

The individual fields in detail:

  • apiVersion: apps/v1 specifies the API group that provides Deployments.
  • kind: Deployment ensures that Kubernetes automatically maintains the desired number of Pods.
  • replicas: 2 creates two identical Pods. If one fails, the other continues running.
  • selector.matchLabels connects the Deployment to its Pods via the label app: web.
  • template is the template for each Pod. All Pods receive the label app: web.
  • image: nginx:alpine uses the official nginx image in the slim Alpine variant.
  • containerPort: 80 documents which port the application listens on.

Create the Service

Create a file service.yaml:

yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: demo
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80

A Service is a stable abstraction layer within the cluster. It receives a fixed cluster IP and a DNS name through which other Pods can reach the application. The Service automatically distributes incoming traffic across all Pods with matching labels. Other Pods in the cluster can reach this application via http://web.demo.svc.cluster.local or simply http://web (within the same namespace).

Apply and Verify

bash
kubectl apply -f deployment.yaml -f service.yaml
kubectl -n demo get pods -o wide

The -o wide flag shows additional columns, including which node each Pod is running on. With replicas: 2, the Pods should be distributed across different nodes. Kubernetes tries by default to spread Pods across available nodes.

Test Internal Access

bash
kubectl -n demo exec -it deploy/web -- curl -s http://web

This command runs curl inside one of the web Pods and calls the Service. You should receive the default nginx welcome page as HTML.

HTTPS Access with a Custom Domain

If you want to point a domain to your cluster, three steps are required: configure Traefik for Let's Encrypt, set DNS records, and create an Ingress.

Step 1: Configure the Let's Encrypt resolver. k3s allows Traefik configuration via a HelmChartConfig resource. Create a file /var/lib/rancher/k3s/server/manifests/traefik-config.yaml on Node 1:

yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--certificatesresolvers.le.acme.email=your-email@example.com"
      - "--certificatesresolvers.le.acme.storage=/data/acme.json"
      - "--certificatesresolvers.le.acme.tlschallenge=true"

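Replace the email address with your own. Files in /var/lib/rancher/k3s/server/manifests/ are automatically applied by k3s. Traefik restarts with the new configuration within seconds. Unless persistence is enabled for Traefik, /data is a temporary directory, so certificates are requested again after the Traefik Pod is recreated.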

Step 2: Set DNS records. Create three A records for your domain, one for the public IP address of each node.

Step 3: Create an Ingress. Create a file ingress.yaml:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: demo
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls.certresolver: le
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80

Replace app.example.com with your actual domain.

bash
kubectl apply -f ingress.yaml

Traefik automatically obtains a Let's Encrypt certificate for the domain. The first request may take a few seconds while the certificate is issued in the background.

Access via IP (Without a Domain)

In this tutorial, we test access via the public IP for simplicity. For production applications, we recommend configuring a custom domain and using the Ingress with automatic HTTPS certificates (as described in the previous section).

Without a configured domain, you can test the Service directly via the public IP of any node:

bash
curl -s http://<PUBLIC_IP_NODE_1>

ServiceLB (the load balancer integrated into k3s) opens the Service ports on all nodes. Regardless of which of the three IPs you call, the traffic reaches your Pods.

Demonstrate a Rolling Update

A rolling update incrementally replaces the running Pods with a new version. During the update, Pods are always reachable.

bash
kubectl -n demo set image deployment/web nginx=nginx:stable-alpine
kubectl -n demo rollout status deployment/web

The first command changes the image tag. Kubernetes then starts new Pods with the updated image and only terminates the old ones once the new ones are ready. rollout status shows progress in real time. The result: zero-downtime update.
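If a new image misbehaves, the same mechanism works in reverse. The sketch below wraps the three kubectl rollout subcommands involved; rollback_web is a hypothetical helper name.

```bash
# Roll the web Deployment back to its previous revision.
# rollback_web is a hypothetical helper name.
rollback_web() {
  kubectl -n demo rollout history deployment/web   # list the recorded revisions
  kubectl -n demo rollout undo deployment/web      # return to the previous revision
  kubectl -n demo rollout status deployment/web    # wait until the rollback has finished
}
```

The rollback is itself a rolling update, so Pods are replaced incrementally in the same way.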


Second Workload: Stateful Application (PostgreSQL)

A database is the classic test for persistent storage. You will deploy PostgreSQL, write data, and prove that the data survives a Pod restart.

Create a PersistentVolumeClaim

Create a file pvc.yaml:

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

ReadWriteOnce means the volume can be mounted read-write by a single node at a time; multiple Pods can only share it if they run on that node. For a single database instance, this is the correct mode. Longhorn automatically creates two replicas of this volume on different nodes.

Create the PostgreSQL Deployment

Create a file postgres-deployment.yaml:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          env:
            - name: POSTGRES_PASSWORD
              value: "changeme"
            - name: POSTGRES_DB
              value: "demo"
            - name: PGDATA
              value: "/var/lib/postgresql/data/pgdata"
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-data

The key points:

  • replicas: 1, since multiple PostgreSQL instances cannot share one data directory, and a ReadWriteOnce volume can only be attached to one node at a time. For high availability, there are specialized operators which are beyond the scope of this guide.
  • env sets the password and the initial database name. In a production environment, the password belongs in a Kubernetes Secret.
  • PGDATA sets the actual data directory to a subdirectory (pgdata). Longhorn volumes contain a lost+found directory at the root, and PostgreSQL refuses to initialize a data directory that is not empty. The subdirectory avoids this issue.
  • volumeMounts mounts the Longhorn volume at /var/lib/postgresql/data.
  • volumes references the previously created PersistentVolumeClaim.
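For the Secret mentioned in the list, a minimal sketch could look as follows. The Secret name postgres-credentials and both helper names are hypothetical choices; the Deployment would then reference the key through valueFrom.secretKeyRef instead of a literal value. Note that the postgres image only applies POSTGRES_PASSWORD when it initializes an empty data directory.

```bash
# Generate a random password: 24 random bytes, base64-encoded (32 characters).
gen_password() {
  head -c 24 /dev/urandom | base64
}

# Store the password in a Secret in the demo namespace.
# The Secret name postgres-credentials is a hypothetical choice for this sketch.
create_postgres_secret() {
  kubectl -n demo create secret generic postgres-credentials \
    --from-literal=POSTGRES_PASSWORD="$(gen_password)"
}

# Example call:
# create_postgres_secret
```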

Create the PostgreSQL Service

Create a file postgres-service.yaml:

yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: demo
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432

Other Pods in the cluster can now reach the database at postgres.demo.svc.cluster.local:5432.

Apply and Verify

bash
kubectl apply -f pvc.yaml -f postgres-deployment.yaml -f postgres-service.yaml
kubectl -n demo get pods,pvc

Wait until the Pod shows Running and the PVC shows Bound.
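Instead of polling by hand, kubectl can block until both conditions are met. A minimal sketch; wait_for_postgres is a hypothetical helper name and the 180-second timeout is an arbitrary assumption.

```bash
# Block until the PVC is Bound and the Deployment has finished rolling out.
# wait_for_postgres is a hypothetical helper; the timeout of 180s is an assumption.
wait_for_postgres() {
  kubectl -n demo wait --for=jsonpath='{.status.phase}'=Bound \
    pvc/postgres-data --timeout=180s &&
  kubectl -n demo rollout status deployment/postgres --timeout=180s
}

# Example call:
# wait_for_postgres && echo "PostgreSQL is up"
```

The function returns a non-zero exit code if either step times out, which makes it usable in scripts.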

Write Data

bash
kubectl -n demo exec -it deploy/postgres -- psql -U postgres -d demo -c "
  CREATE TABLE test (id SERIAL PRIMARY KEY, message TEXT, created_at TIMESTAMP DEFAULT NOW());
  INSERT INTO test (message) VALUES ('Kubernetes works');
"

This command runs psql inside the PostgreSQL Pod, creates a table, inserts a record, and exits.

Persistence Test: Delete the Pod

The decisive test: does the database survive a Pod restart?

bash
kubectl -n demo delete pod -l app=postgres

Kubernetes immediately detects that the desired state (1 Pod) is no longer met and starts a new Pod. Observe the process:

bash
kubectl -n demo get pods -w

Once the new Pod is Running, check the data:

bash
kubectl -n demo exec -it deploy/postgres -- psql -U postgres -d demo -c "SELECT * FROM test;"

The table and the record are present. The Longhorn volume survived the Pod restart and was automatically attached to the new Pod.


Testing Resilience

A Kubernetes cluster is only as good as its behavior during failures. This section proves that the cluster delivers the promised fault tolerance.

Pod Self-Healing

bash
kubectl -n demo delete pod -l app=web
kubectl -n demo get pods -w

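Within seconds, Kubernetes creates new Pods to restore the desired state (replicas: 2). Because this command deletes both Pods at once, requests can fail briefly until a replacement is Ready. If only a single Pod is deleted, end users see no interruption, since the Service only routes traffic to Ready Pods.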
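The effect of a Pod deletion on clients can be measured by sending requests while the Pods are being replaced. A minimal sketch: probe is a hypothetical helper name, and the URL, the request count, and the one-second pause are assumptions you can adjust.

```bash
# Send a number of requests to a URL and report how many failed.
# probe is a hypothetical helper; URL and request count are arguments.
probe() (
  url="$1"
  count="$2"
  failed=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # --fail returns non-zero on HTTP errors; --max-time bounds each request.
    curl -s --fail --max-time 2 -o /dev/null "$url" || failed=$((failed + 1))
    i=$((i + 1))
    sleep 1
  done
  echo "requests=$count failed=$failed"
)

# Example: run in one terminal while deleting a Pod in another.
# probe "http://<PUBLIC_IP_NODE_1>" 30
```

To delete only one of the two Pods during the measurement, `kubectl -n demo delete "$(kubectl -n demo get pods -l app=web -o name | head -n 1)"` selects the first match.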

Rolling Updates

As demonstrated in the previous section, the kubectl set image command replaces Pods incrementally. New Pods start before old ones are terminated, so there is no window without a reachable instance.

Node Maintenance with drain

For planned maintenance (updates, hardware replacement), you can take a node out of the cluster without downtime:

bash
kubectl drain seed-k8s-03 --ignore-daemonsets --delete-emptydir-data

The flags in detail:

  • --ignore-daemonsets leaves DaemonSet Pods (such as the Longhorn manager or the svclb Pods of ServiceLB) running on the node. These are managed by their DaemonSets and cannot be moved to another node.
  • --delete-emptydir-data allows deletion of Pods with temporary emptyDir volumes.

Verify that all workload Pods have moved to the remaining nodes:

bash
kubectl get pods -n demo -o wide

After maintenance, make the node available again:

bash
kubectl uncordon seed-k8s-03

The node is now available again for new Pod assignments.

What Happens When Two of Three Nodes Fail?

As described in the architecture section, etcd uses the Raft consensus algorithm. A write operation (e.g., creating a new Deployment) is only confirmed once the majority of etcd members agree. With three nodes, the majority is two.

If one node fails, two nodes remain. Two out of three is a majority. The cluster continues to operate fully: Pods are rescheduled, Deployments can be created, scaling is possible.

If two nodes fail, one node remains. One out of three is not a majority, so etcd cannot confirm new write operations. The control plane is effectively frozen, and API requests may time out:

  • Pods on the remaining node continue running and answering requests.
  • New Pods cannot be scheduled.
  • Configuration changes (new Deployments, scaling) are not possible.
  • As soon as a second node returns, quorum is restored and the cluster operates normally again.

The number of tolerated failures is floor((n - 1) / 2), where n is the number of server nodes. Three nodes tolerate one failure, five nodes tolerate two. A fourth node raises the required majority to three without tolerating a second failure, which is why clusters with embedded etcd are usually run with an odd number of server nodes.
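The quorum arithmetic can be checked with a few lines of shell. This is a sketch using integer division; the function names quorum and tolerated_failures are illustrative only.

```bash
# Majority needed in an etcd cluster with n members: floor(n / 2) + 1
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that may fail while a majority remains: n - quorum
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated=$(tolerated_failures "$n")"
done
```

The loop shows that three and four members both tolerate exactly one failure, so an even member count adds load without adding fault tolerance.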


kubectl Basics

kubectl is the central tool for managing the cluster. Here is an overview of the most important commands as a reference.

View Resources

bash
kubectl get pods                    # All Pods in the default namespace
kubectl get pods -n demo            # Pods in the "demo" namespace
kubectl get pods -A                 # Pods in all namespaces
kubectl get all -n demo             # Pods, Services, Deployments at a glance

Details and Troubleshooting

bash
kubectl describe pod <pod-name> -n demo    # Detailed information and events
kubectl logs <pod-name> -n demo            # Application output
kubectl logs <pod-name> -n demo -f         # Real-time log stream

describe lists recent events from the scheduler, the kubelet, and the controllers under "Events". If a Pod is not starting, the reasons appear here (missing images, insufficient resources, volume issues).
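Two further commands are often useful when a Pod does not start or keeps restarting. The wrapper names recent_events and previous_logs are hypothetical; the kubectl calls inside them are standard.

```bash
# Show the events of a namespace sorted by time, so the latest appear at the bottom.
recent_events() {
  kubectl -n "$1" get events --sort-by=.lastTimestamp
}

# Show the logs of the previous container instance, useful after a crash.
previous_logs() {
  kubectl -n "$1" logs "$2" --previous
}

# Example calls:
# recent_events demo
# previous_logs demo <pod-name>
```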

Exec into a Pod

bash
kubectl exec -it <pod-name> -n demo -- /bin/sh

This opens a shell inside the container. Useful for debugging, connectivity tests, or manual inspections.

Scaling

bash
kubectl scale deployment web -n demo --replicas=3

This changes the desired number of Pods immediately. Kubernetes then starts or stops Pods until the actual state matches.

Delete Resources

bash
kubectl delete -f deployment.yaml       # Delete everything defined in the file
kubectl delete pod <pod-name> -n demo   # Delete a single Pod
kubectl delete namespace demo           # Delete entire namespace including all resources

Namespaces

bash
kubectl create namespace production
kubectl get namespaces

Namespaces separate environments (e.g., staging and production) or different applications from each other.

Labels and Selectors

bash
kubectl get pods -n demo -l app=web          # Only Pods with label app=web
kubectl get pods -n demo -l app=postgres     # Only Pods with label app=postgres

Labels are key-value pairs that Kubernetes uses to link related resources. Services and Deployments use label selectors to find their Pods; an Ingress references its Service by name.

kubectl vs. Helm vs. Raw YAML

Approach | Use Case
kubectl apply -f | Individual manifests, simple applications, learning
Helm Charts | Complex applications with many resources, configurable via Values
Raw YAML in Git | GitOps workflows where a repository describes the desired cluster state

Useful Shortcut

Add to your shell configuration:

bash
echo "alias k=kubectl" >> ~/.bashrc
source ~/.bashrc

From now on, k get pods is enough instead of kubectl get pods.


Backup and Maintenance

A cluster without a backup strategy is a cluster waiting for data loss. Three independent layers protect your Kubernetes cluster.

etcd Snapshots

etcd stores the entire cluster state: Deployments, Services, Secrets, ConfigMaps. k3s automatically creates periodic snapshots (by default every 12 hours, keeping the five most recent) on each server node:

bash
ls -la /var/lib/rancher/k3s/server/db/snapshots/

For a manual snapshot before maintenance:

bash
k3s etcd-snapshot save --name manual-backup

The snapshot is then located in the same directory. Copy critical snapshots to an external system as well, since a snapshot stored only on the node is lost together with the node.
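Copying the newest snapshot off the node can be scripted. A minimal sketch: latest_snapshot is a hypothetical helper name, and the host backup.example.com and the path /srv/k3s-snapshots are placeholders that are not part of this guide. It parses ls output, which is acceptable here because snapshot file names contain no whitespace.

```bash
# Print the path of the most recently modified file in a directory.
# latest_snapshot is a hypothetical helper; the directory is passed as an argument.
latest_snapshot() (
  dir="$1"
  newest="$(ls -t "$dir" | head -n 1)"
  # Print nothing and return non-zero if the directory is empty.
  [ -n "$newest" ] && echo "$dir/$newest"
)

# Example (run as root on a server node; host and target path are placeholders):
# scp "$(latest_snapshot /var/lib/rancher/k3s/server/db/snapshots)" \
#   root@backup.example.com:/srv/k3s-snapshots/
```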

Longhorn Snapshots and Backups

Longhorn creates snapshots at the volume level that can be managed via the Longhorn UI or through kubectl. For a complete backup strategy, Longhorn can export volumes to S3-compatible storage. The configuration is done in the Longhorn UI under "Settings > Backup Target".

dataforest Cloud Backups

For complete server backups, the dataforest Cloud offers an add-on option that backs up entire Seeds at the infrastructure level. These backups are independent of Kubernetes and capture the complete system including all local data.

Updating the k3s Version

The k3s install script overwrites the systemd unit on every run. Flags passed during the initial curl | sh (--cluster-init, --node-ip, --flannel-iface, etc.) are lost if not provided again. The safest approach is to store all configuration in a file that the install script does not touch.

Create the file /etc/rancher/k3s/config.yaml on each node with the respective configuration. Example for Node 1:

yaml
cluster-init: true
token: <YOUR_TOKEN>
node-ip: 10.222.0.1
node-external-ip: <PUBLIC_IP_NODE_1>
flannel-iface: wg0
flannel-backend: vxlan
tls-san:
  - <PUBLIC_IP_NODE_1>
  - 10.222.0.1

For Node 2 and 3, replace cluster-init: true with server: https://10.222.0.1:6443 and adjust node-ip, node-external-ip, and tls-san accordingly.

Once the configuration is in config.yaml, update k3s node by node:

bash
kubectl drain seed-k8s-01 --ignore-daemonsets --delete-emptydir-data
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=stable sh -
kubectl uncordon seed-k8s-01

drain moves all workloads to the remaining nodes. The install command must be run on the node being updated (here seed-k8s-01); it installs the latest stable k3s version and automatically reads the configuration from config.yaml. uncordon makes the node available again. Wait until the node reports Ready before repeating the process for seed-k8s-02 and seed-k8s-03.

Check the cluster version after the update:

bash
kubectl get nodes

All nodes should display the same version.
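The comparison can be automated for use in a script. A minimal sketch; same_version is a hypothetical helper name that reads the kubelet version each node reports.

```bash
# Succeed only if every node reports the same kubelet version.
# same_version is a hypothetical helper name.
same_version() (
  versions="$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
    | sort -u)"
  echo "$versions"
  # Exactly one distinct version means all nodes match.
  [ "$(printf '%s\n' "$versions" | grep -c .)" -eq 1 ]
)

# Example call:
# same_version && echo "all nodes are on the same version"
```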

Expanding the WireGuard Mesh

If you add a fourth node, its WireGuard peer configuration must be added to all existing nodes. Each node needs a [Peer] block with the public key and endpoint address of the new node. Conversely, the new node receives the peer blocks of all existing nodes.
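The block for the new node has the same shape on every existing node, so it can be generated once. A minimal sketch: peer_block is a hypothetical helper name, and the address 10.222.0.4, the port 51820, and the /32 AllowedIPs entry are assumptions that must match the convention of your existing peer blocks.

```bash
# Print a [Peer] block for a new mesh member.
# Arguments: public key, public endpoint (host:port), WireGuard address of the new node.
peer_block() {
  printf '[Peer]\nPublicKey = %s\nEndpoint = %s\nAllowedIPs = %s/32\n' "$1" "$2" "$3"
}

# Example with placeholder values (address and port are assumptions):
# peer_block '<PUBLIC_KEY_NODE_4>' '<PUBLIC_IP_NODE_4>:51820' 10.222.0.4
```

The output can be appended to the WireGuard configuration of each existing node, followed by a reload of the interface.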


Next Steps

The cluster is running, storage is set up, workloads are deployed. For production operation, there are additional building blocks that build on this foundation:

Helm Charts for complex applications: Instead of manually configuring each application via YAML, community charts provide ready-made packages for databases, message queues, monitoring stacks, and more. A single helm install command deploys a fully configured application.

GitOps with Flux or ArgoCD: A Git repository becomes the single source of truth for the cluster state. Changes are reviewed via pull requests and automatically applied to the cluster.

Monitoring with kube-prometheus-stack: Prometheus collects metrics from all nodes and Pods. Grafana visualizes them in dashboards. Alertmanager notifies on issues. The kube-prometheus-stack installs everything together via Helm chart.

Cert-Manager as an alternative: Traefik's built-in ACME resolver is sufficient for simple setups. Cert-Manager manages certificates as standalone Kubernetes resources, independent of the ingress controller, and supports wildcard certificates via DNS-01 challenges with automatic renewal.

Horizontal Pod Autoscaler: Automatically scales Deployments based on CPU or memory utilization, measured relative to the resource requests set on the containers. More traffic means more Pods, less traffic means fewer Pods and lower resource consumption.

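Adding more nodes: The cluster can be expanded with additional nodes at any time. More nodes mean more compute capacity. Control-plane fault tolerance only grows with an odd number of server nodes: five server nodes tolerate two simultaneous failures, while four tolerate no more than three do. A fourth node is therefore best joined as an agent (worker) node.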

Learn more about the benefits of your own Kubernetes cluster on our solutions page.

Summary

This guide set up the following components:

  • Three servers connected via an encrypted WireGuard mesh (10.222.0.0/24)
  • k3s as the Kubernetes distribution with a highly available control plane (embedded etcd, 3 server nodes)
  • Longhorn as replicated storage (2 replicas per volume)
  • Traefik as ingress controller with optional Let's Encrypt integration
  • A stateless application (nginx, 2 replicas) and a stateful application (PostgreSQL with persistent volume)

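The cluster tolerates the failure of one server: the control plane and the replicated nginx Pods stay available, while single-replica workloads such as PostgreSQL are unavailable until they have been restarted on another node. Data remains available on the Longhorn replicas. Rolling updates deploy new versions without downtime.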
