High Availability Kubernetes
After you have created a Kubernetes cluster using the getting started guide, we can take a look at a more complex example that involves highly available control plane nodes and dedicated worker nodes. The result will be similar to a Typhoon cluster, but this version will be a little more “vanilla” and will run on libvirt VMs, which is not covered in their documentation as of the time of writing.
Architecture
This documentation will walk you through creating 5 VMs with the following properties:
| VM Name | Hostname | IP | Role | Runtime |
|---|---|---|---|---|
| virt-node1-ha | node1-cp | 192.168.122.31 | Control Plane | CRI-O |
| virt-node2-ha | node2-cp | 192.168.122.32 | Control Plane | CRI-O |
| virt-node3-ha | node3-cp | 192.168.122.33 | Control Plane | CRI-O |
| virt-node4-ha | node4-worker | 192.168.122.34 | Worker | CRI-O |
| virt-node5-ha | node5-worker | 192.168.122.35 | Worker | CRI-O |
Control Plane
Each control plane node will be responsible for running the regular Kubernetes control plane components, as well as HAProxy and Keepalived. The api-server will have a VIP of 192.168.122.30. HAProxy and Keepalived can be installed through various methods, but this article will show you how to install them using Quadlet through Ignition.
Bootstrap Sequence
We will install the necessary systemd files to use kubeadm to bootstrap a cluster. Essentially, this means that node1-cp will run `kubeadm init ...`, node2-cp and node3-cp will run `kubeadm join --control-plane ...`, and the worker nodes will run `kubeadm join ...`. This is a very common pattern for bootstrapping a Kubernetes cluster.
Typically, one node will call `kubeadm init ...` and then `kubeadm token create --print-join-command`, and the printed join command is copied to the other nodes. The problem with this approach is that it requires manual intervention at install time, which somewhat defeats the purpose of Ignition. Luckily, we can provide all the information ahead of time so nothing needs to be copied between nodes and the cluster installation can still be secure and hands-off.
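For contrast, the manual flow we are avoiding looks roughly like this (a sketch; the angle-bracket values are placeholders that you would normally copy between nodes by hand):

# On the first control plane node:
kubeadm init --control-plane-endpoint <VIP>:6444 --upload-certs
kubeadm token create --print-join-command   # prints a join command to copy
# On every other node, paste the printed command, e.g.:
kubeadm join <VIP>:6444 --token <token> --discovery-token-ca-cert-hash sha256:<hash>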
In order to do this, there are a few things that we need to pre-generate so that we can pass them to the `kubeadm` commands via Ignition. These are:
- /etc/kubernetes/pki/ca.crt
- /etc/kubernetes/pki/ca.key
- /etc/kubernetes/pki/sa.key
- /etc/kubernetes/pki/sa.pub
- /etc/kubernetes/pki/front-proxy-ca.crt
- /etc/kubernetes/pki/front-proxy-ca.key
- /etc/kubernetes/pki/etcd/ca.crt
- /etc/kubernetes/pki/etcd/ca.key
We will also generate an initial join token and a hash of the CA, and provide them in an EnvironmentFile called `/etc/kubernetes/certs.conf`. Fortunately, this can all be created via a script, called `generate-k8s-certs.sh`. Before we talk about the script, we should talk about the overall file structure.
Files
Building the configuration files and VMs is no small feat, so we should create a dedicated workspace for this.
mkdir -p butane certs ign scripts
Now we can get started looking at our butane configs.
Butane Structure
All of our butane files will live in the `butane` directory. Each VM will have its own butane file with the same name as the VM, and it will depend on one or more other butane files. In order to get dependencies correct, the base butane files are prefixed with two digits and built in numerical order. This way, a butane file can depend on any file sorted before it, but not after. Files named `00_base*` should not depend on any other butane config. The final butane configs for the VMs should only depend on these base configs.
> tree butane/
butane/
├── 00_base-k8s-token.yaml
├── 00_base-k8s.yaml
├── 10_base-ha.yaml
├── 20_base-ha-cp.yaml
├── 30_base-ha-init.yaml
├── 31_base-ha-join-cp.yaml
├── 32_base-ha-join.yaml
├── node1-ha.yaml
├── node2-ha.yaml
├── node3-ha.yaml
├── node4-ha.yaml
└── node5-ha.yaml
This already seems a bit unwieldy, so let’s create a `Makefile` to help build everything.
Makefile
Makefile
# Options are stable | alpha | beta
CHANNEL := stable
# See https://www.flatcar.org/docs/latest/installing/vms/qemu/#gentoo
TOKEN_FILE = butane/00_base-k8s-token.yaml
BUTANE = $(TOKEN_FILE) $(shell find butane -type f -not -name 'node*' | sort -h) $(shell find butane -type f -name "node*.yaml")
IGN = $(patsubst butane/%.yaml,ign/%.ign,$(BUTANE))
ifeq ($(shell command -v podman 2> /dev/null),)
CR=docker
else
CR=podman
endif
# Remove files that we can easily recreate
clean:
rm -f flatcar_production_qemu_image.img.sig config.json ign/* /var/lib/libvirt/images/flatcar/*.ign
# Remove files that take longer to recreate/download
cleanclean: clean
rm -f flatcar_production_qemu_image.img.fresh Flatcar_Image_Signing_Key.asc $(TOKEN_FILE)
.PHONY: cleanclean clean
Flatcar_Image_Signing_Key.asc:
curl -L -O https://www.flatcar.org/security/image-signing-key/Flatcar_Image_Signing_Key.asc
verify-gpg: Flatcar_Image_Signing_Key.asc
gpg --import --keyid-format LONG Flatcar_Image_Signing_Key.asc
.PHONY: verify-gpg
# Download and verify the image that we'll use as a base
flatcar_production_qemu_image.img.fresh:
wget "https://${CHANNEL}.release.flatcar-linux.net/amd64-usr/current/flatcar_production_qemu_image.img"
wget "https://${CHANNEL}.release.flatcar-linux.net/amd64-usr/current/flatcar_production_qemu_image.img.sig"
gpg --verify flatcar_production_qemu_image.img.sig
cp -f flatcar_production_qemu_image.img flatcar_production_qemu_image.img.fresh
# Create vm disks from the base image
%.qcow2: flatcar_production_qemu_image.img.fresh
mkdir -p /var/lib/libvirt/images/flatcar
qemu-img create -f qcow2 -F qcow2 -b $(shell pwd)/flatcar_production_qemu_image.img.fresh /var/lib/libvirt/images/flatcar/$@
# Convert butane configs to ignition configs
ign/%.ign:
mkdir -p ign
$(CR) run --rm -i -v $(shell pwd)/butane:/config/butane:ro -v $(shell pwd)/ign:/config/ign quay.io/coreos/butane:latest --pretty -d /config < butane/$*.yaml > $@
# Create VMs
virt-node%: flatcar_production_qemu_image-%.qcow2 $(IGN)
cp -fv ign/*.ign /var/lib/libvirt/images/flatcar
virt-install --connect qemu:///system --import --name $@ --ram 4096 --vcpus 4 --os-variant=generic --network network=default,model=virtio --disk path=/var/lib/libvirt/images/flatcar/flatcar_production_qemu_image-$*.qcow2,format=qcow2,bus=virtio --vnc --qemu-commandline="-fw_cfg name=opt/org.flatcar-linux/config,file=/var/lib/libvirt/images/flatcar/node$*.ign" --noautoconsole
# Don't delete intermediate files
.PRECIOUS: %.qcow2 ign/%.ign
# Stop and Remove VMs
rm-virt-node%:
virsh destroy virt-node$* || :
virsh undefine virt-node$*
# Create token butane file
butane/00_base-k8s-token.yaml:
./scripts/generate-k8s-certs.sh
# Create all ignition configs
new-cluster: clean butane/00_base-k8s-token.yaml $(IGN)
# Create all VMs
ha: virt-node1-ha virt-node2-ha virt-node3-ha virt-node4-ha virt-node5-ha
# Stop and Remove all VMs
rm-ha: rm-virt-node1-ha rm-virt-node2-ha rm-virt-node3-ha rm-virt-node4-ha rm-virt-node5-ha
.PHONY: new-cluster ha rm-ha
This Makefile is a bit long, but we’ll go through each piece in due time. First, let’s look at these lines:
TOKEN_FILE = butane/00_base-k8s-token.yaml
BUTANE = $(TOKEN_FILE) $(shell find butane -type f -not -name 'node*' | sort -h) $(shell find butane -type f -name "node*.yaml")
IGN = $(patsubst butane/%.yaml,ign/%.ign,$(BUTANE))
This specifies the special token file that will contain all of our certificates, as well as the order in which the butane files should be built. It then uses path substitution to generate the names of the resulting Ignition files.
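As a concrete illustration of that mapping, here is the same transformation written as a throwaway shell loop (not part of the Makefile):

# Show how each butane config maps to its Ignition counterpart
for f in butane/*.yaml; do
  echo "$f -> ign/$(basename "${f%.yaml}").ign"
done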
# Convert butane configs to ignition configs
ign/%.ign:
mkdir -p ign
$(CR) run --rm -i -v $(shell pwd)/butane:/config/butane:ro -v $(shell pwd)/ign:/config/ign quay.io/coreos/butane:latest --pretty -d /config < butane/$*.yaml > $@
# ...
# Create token butane file
butane/00_base-k8s-token.yaml:
./scripts/generate-k8s-certs.sh
These lines show us how to build each Ignition file using a container. We also have to specify how to build our token file, which is done with a shell script.
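If you ever need to convert a single butane config by hand, the invocation mirrors the Makefile rule above (swap podman for docker if that is what you have); `00_base-k8s.yaml` is a good test because it has no merge dependencies:

mkdir -p ign
podman run --rm -i \
  -v "$(pwd)/butane:/config/butane:ro" \
  -v "$(pwd)/ign:/config/ign" \
  quay.io/coreos/butane:latest --pretty -d /config \
  < butane/00_base-k8s.yaml > ign/00_base-k8s.ign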
Token File
Generating the token file is a fun exercise using `openssl`, YAML, and bash heredocs. In case you actually value your time, the script is provided here in full. As always, please make sure you understand what commands you are running before running a script you found on the internet.
scripts/generate-k8s-certs.sh
#!/usr/bin/env bash
set -e
# Set file paths
cert_dir="./certs"
output_yaml="butane/00_base-k8s-token.yaml"
# Create cert directory
mkdir -p "$cert_dir"
# Generate the token
token=$(echo "$(tr -dc 'a-z0-9' < /dev/urandom | head -c 6).$(tr -dc 'a-z0-9' < /dev/urandom | head -c 16)")
encoded_token=$(echo -n "$token" | base64)
# Generate Kubernetes CA (used for cluster signing)
openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/ca.key" -out "$cert_dir/ca.crt" -days 365 -nodes -subj "/CN=k8s-ca"
# Generate Front Proxy CA (used for API server aggregation)
openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/front-proxy-ca.key" -out "$cert_dir/front-proxy-ca.crt" -days 365 -nodes -subj "/CN=front-proxy-ca"
# Generate Service Account Signing Key
openssl genrsa -out "$cert_dir/sa.key" 2048
openssl rsa -in "$cert_dir/sa.key" -pubout -out "$cert_dir/sa.pub"
# Generate API server certificate (signed by Kubernetes CA)
openssl req -new -newkey rsa:2048 -keyout "$cert_dir/apiserver.key" -out "$cert_dir/apiserver.csr" -nodes -subj "/CN=kube-apiserver"
openssl x509 -req -in "$cert_dir/apiserver.csr" -CA "$cert_dir/ca.crt" -CAkey "$cert_dir/ca.key" -CAcreateserial -out "$cert_dir/apiserver.crt" -days 365
# Generate etcd CA (if using external etcd)
openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/etcd-ca.key" -out "$cert_dir/etcd-ca.crt" -days 365 -nodes -subj "/CN=etcd-ca"
indent=" "
# Encode certificates for YAML
ca_crt=$(sed "s/^/${indent}/" "$cert_dir/ca.crt")
ca_key=$(sed "s/^/${indent}/" "$cert_dir/ca.key")
front_proxy_ca_crt=$(sed "s/^/${indent}/" "$cert_dir/front-proxy-ca.crt")
front_proxy_ca_key=$(sed "s/^/${indent}/" "$cert_dir/front-proxy-ca.key")
sa_key=$(sed "s/^/${indent}/" "$cert_dir/sa.key")
sa_pub=$(sed "s/^/${indent}/" "$cert_dir/sa.pub")
etcd_ca_crt=$(sed "s/^/${indent}/" "$cert_dir/etcd-ca.crt")
etcd_ca_key=$(sed "s/^/${indent}/" "$cert_dir/etcd-ca.key")
# Compute CA hash
ca_hash="sha256:$(openssl x509 -pubkey -in "$cert_dir/ca.crt" | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //')"
encoded_base64_ca_hash=$(echo -n "$ca_hash" | base64 -w 0)
# Write the header to the output YAML file
cat > "$output_yaml" <<-EOF
---
# This is generated using generate-k8s-certs.sh
variant: flatcar
version: 1.1.0
storage:
files:
- path: /etc/kubernetes/pki/ca.crt
contents:
inline: |
$ca_crt
- path: /etc/kubernetes/pki/ca.key
contents:
inline: |
$ca_key
- path: /etc/kubernetes/pki/front-proxy-ca.crt
contents:
inline: |
$front_proxy_ca_crt
- path: /etc/kubernetes/pki/front-proxy-ca.key
contents:
inline: |
$front_proxy_ca_key
- path: /etc/kubernetes/pki/sa.key
contents:
inline: |
$sa_key
- path: /etc/kubernetes/pki/sa.pub
contents:
inline: |
$sa_pub
- path: /etc/kubernetes/pki/etcd/ca.crt
contents:
inline: |
$etcd_ca_crt
- path: /etc/kubernetes/pki/etcd/ca.key
contents:
inline: |
$etcd_ca_key
- path: /etc/kubernetes/certs.conf
contents:
inline: |
K8S_TOKEN='$token'
K8S_HASH='$ca_hash'
EOF
echo "Kubernetes certificates have been generated successfully!"
echo "YAML file '$output_yaml' has been successfully overwritten!"
Now we can save this script as `scripts/generate-k8s-certs.sh` (and make it executable), run `make butane/00_base-k8s-token.yaml`, and check out our SSL goodies. Treat this file like a password: if it ever becomes public (for anything other than a test cluster), make sure you regenerate the certificates. It is usually recommended not to provide sensitive information via Ignition configuration files; this is only for demo purposes.
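A couple of quick sanity checks on what the script produced (paths as defined in the script above):

openssl x509 -in certs/ca.crt -noout -subject -enddate    # inspect the generated CA
grep -E "K8S_(TOKEN|HASH)" butane/00_base-k8s-token.yaml  # confirm the token and hash were written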
Butane Content
Now that we have the basic structure in place, we can get started filling out the content of our butane files. We will first look at the “base” configs, then look at node specific configs.
Base
butane/00_base-k8s.yaml
variant: flatcar
version: 1.1.0
storage:
links:
- path: /etc/extensions/docker-flatcar.raw
target: /dev/null
overwrite: true
- path: /etc/extensions/containerd-flatcar.raw
target: /dev/null
overwrite: true
- target: /opt/extensions/crio/crio-v1.31.3-x86-64.raw
path: /etc/extensions/crio.raw
hard: false
- target: /opt/extensions/kubernetes/kubernetes-v1.31.3-x86-64.raw
path: /etc/extensions/kubernetes.raw
hard: false
files:
- path: /etc/flatcar/enabled-sysext.conf
contents:
inline: |
zfs
podman
mode: 0644
- path: /etc/sysupdate.d/noop.conf
contents:
source: https://github.com/flatcar/sysext-bakery/releases/download/latest/noop.conf
- path: /opt/extensions/crio/crio-v1.31.3-x86-64.raw
mode: 0644
contents:
source: https://github.com/flatcar/sysext-bakery/releases/download/latest/crio-v1.31.3-x86-64.raw
- path: /etc/sysupdate.crio.d/crio.conf
contents:
source: https://github.com/flatcar/sysext-bakery/releases/download/latest/crio.conf
- path: /etc/sysupdate.kubernetes.d/kubernetes-v1.31.conf
contents:
source: https://github.com/flatcar/sysext-bakery/releases/download/latest/kubernetes-v1.31.conf
- path: /opt/extensions/kubernetes/kubernetes-v1.31.3-x86-64.raw
contents:
source: https://github.com/flatcar/sysext-bakery/releases/download/latest/kubernetes-v1.31.3-x86-64.raw
- path: /opt/bin/cilium.tar
mode: 0755
contents:
source: https://github.com/cilium/cilium-cli/releases/download/v0.16.24/cilium-linux-amd64.tar.gz
compression: gzip
systemd:
units:
- name: systemd-sysupdate.timer
enabled: true
- name: locksmithd.service
# NOTE: To coordinate the node reboot in this context, we recommend to use Kured.
mask: true
- name: systemd-sysupdate.service
dropins:
- name: kubernetes.conf
contents: |
[Service]
ExecStartPre=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/kubernetes.raw > /tmp/kubernetes"
ExecStartPre=/usr/lib/systemd/systemd-sysupdate -C kubernetes update
ExecStartPost=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/kubernetes.raw > /tmp/kubernetes-new"
ExecStartPost=/usr/bin/sh -c "if ! cmp --silent /tmp/kubernetes /tmp/kubernetes-new; then touch /run/reboot-required; fi"
- name: crio.conf
contents: |
[Service]
ExecStartPre=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/crio.raw > /tmp/crio"
ExecStartPre=/usr/lib/systemd/systemd-sysupdate -C crio update
ExecStartPost=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/crio.raw > /tmp/crio-new"
ExecStartPost=/usr/bin/sh -c "if ! cmp --silent /tmp/crio /tmp/crio-new; then touch /run/reboot-required; fi"
- name: cilium-setup.service
enabled: true
contents: |
[Unit]
Description=Download and install Cilium
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'tar -C /opt/bin -xf /opt/bin/cilium.tar'
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
- name: zfs-setup.service
enabled: true
contents: |
[Unit]
Description=Load zfs kernel modules
After=local-fs-pre.target
Wants=local-fs-pre.target
[Service]
Type=oneshot
ExecStart=/usr/bin/modprobe zfs
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
This base config does a few things. It disables the `docker` and `containerd` sysexts, installs the `kubernetes` and `crio` sysexts, and enables the `zfs` and `podman` sysexts. It also creates a systemd unit that unpacks the Cilium CLI (the tarball itself is fetched by Ignition). The CLI may only be needed on the first node, but we install it on all hosts to make day 2 operations easier. We also mask `locksmithd` to prepare for using Kured.
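Once a node built from this config boots, a quick way to confirm that the sysexts merged and the CLI tarball was unpacked (run on the node) might be:

systemd-sysext status          # should list the crio, kubernetes, podman, and zfs extensions
systemctl status crio kubelet  # kubelet will not be healthy until kubeadm has configured it
ls -l /opt/bin/cilium          # unpacked by cilium-setup.service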
butane/10_base-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/00_base-k8s.ign
- local: ign/00_base-k8s-token.ign
storage:
files:
- path: /etc/kubernetes/kubeadm-env.conf
mode: 0644
contents:
inline: |
K8S_APISERVER_URL="192.168.122.30"
- path: /etc/containers/policy.json
contents:
inline: |
{
"default": [
{
"type": "insecureAcceptAnything"
}
]
}
We extend the first config by including it here, along with the token file. We also set any environment variables that are shared across hosts, in this case `K8S_APISERVER_URL`, and set up a default container image policy for podman/CRI-O.
butane/20_base-ha-cp.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/10_base-ha.ign
storage:
files:
- path: /etc/keepalived/check_apiserver.sh
mode: 0755
contents:
inline: |
#!/bin/bash
curl --silent --max-time 2 --insecure https://localhost:6443/healthz -o /dev/null || exit 1
- path: /etc/containers/systemd/keepalived.container
contents:
inline: |
[Unit]
Description=Keepalived Container
Requires=crio.service
After=crio.service
[Container]
Image=docker.io/osixia/keepalived:latest
AutoUpdate=registry
AddCapability=NET_ADMIN
AddCapability=NET_BROADCAST
AddCapability=NET_RAW
PodmanArgs=--privileged
Exec=--copy-service
Volume=/etc/keepalived/keepalived.conf:/container/service/keepalived/assets/keepalived.conf:ro
Volume=/etc/keepalived/check_apiserver.sh:/etc/keepalived/check_apiserver.sh:ro
Network=host
[Service]
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target default.target
- path: /etc/containers/systemd/haproxy.container
contents:
inline: |
[Unit]
Description=HAProxy Container
Requires=keepalived.service
After=keepalived.service
[Container]
Image=docker.io/library/haproxy:alpine
AutoUpdate=registry
Network=host
Volume=/etc/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
[Service]
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target default.target
- path: /etc/haproxy/haproxy.cfg
contents:
inline: |
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log stdout format raw local0
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 35s
timeout server 35s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
bind 192.168.122.30:6444
mode tcp
option tcplog
default_backend apiserverbackend
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserverbackend
mode tcp
balance roundrobin
server node1-cp 192.168.122.31:6443 check verify none
server node2-cp 192.168.122.32:6443 check verify none
server node3-cp 192.168.122.33:6443 check verify none
systemd:
units:
- name: ip_vs-setup.service
enabled: true
contents: |
[Unit]
Description=Load ip_vs kernel modules
After=local-fs-pre.target
Wants=local-fs-pre.target
[Service]
Type=oneshot
ExecStart=/usr/bin/modprobe ip_vs
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Here we define the services that should run on each of the control plane nodes. We specify Keepalived and HAProxy containers that run via Quadlet. These need to be running before Kubernetes, so it is easier to define them here. We also specify the HAProxy config, since it will be the same across all nodes, but we do not specify the Keepalived config: that differs per node because it includes the node’s IP address and priority. Finally, we make sure to run `modprobe ip_vs`, which is required when running Keepalived.
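On a booted control plane node you can check that the Quadlet-generated services are up and that the VIP is where you expect (a quick sketch; the VIP only appears on whichever node currently holds MASTER):

systemctl status keepalived.service haproxy.service
ip addr show eth0 | grep 192.168.122.30
curl -ks https://192.168.122.30:6444/healthz   # through HAProxy, once the api-server is up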
butane/30_base-ha-init.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/20_base-ha-cp.ign
storage:
files:
systemd:
units:
- name: kubeadm.service
enabled: false
contents: |
[Unit]
StartLimitInterval=200
StartLimitBurst=5
Description=Kubeadm service
Requires=haproxy.service
After=haproxy.service
ConditionPathExists=!/etc/kubernetes/kubelet.conf
[Service]
EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
EnvironmentFile=/etc/kubernetes/certs.conf
# We skip kube-proxy because we install cilium
ExecStartPre=/usr/bin/kubeadm init --token ${K8S_TOKEN} --cri-socket=unix:///var/run/crio/crio.sock --skip-phases=addon/kube-proxy --control-plane-endpoint ${K8S_APISERVER_URL}:6444 --upload-certs
ExecStartPre=/usr/bin/mkdir /home/core/.kube
ExecStartPre=/usr/bin/cp /etc/kubernetes/admin.conf /home/core/.kube/config
ExecStart=/usr/bin/chown -R core:core /home/core/.kube
[Install]
WantedBy=multi-user.target
- name: cilium-install-cluster.service
enabled: true
contents: |
[Unit]
Description=Install Cilium as Kubernetes CNI
Requires=kubeadm.service
After=kubeadm.service
[Service]
Environment=KUBECONFIG='/home/core/.kube/config'
ExecStart=/opt/bin/cilium install --set kubeProxyReplacement=true --namespace=kube-system
[Install]
WantedBy=multi-user.target
This service defines what to do when initializing the Kubernetes cluster, and it only needs to run on one node. Essentially we are calling `kubeadm init ...` and `cilium install ...`, but it is worth pointing out the dependencies here. We specify that kubeadm should only start after the haproxy service (which Quadlet generates from the haproxy container definition), because HAProxy is load-balancing the api-server VIP that Keepalived manages. We also do not install Cilium until there is a cluster to install it into. So cilium can’t start until kubeadm starts, kubeadm can’t start until haproxy starts, and haproxy can’t start until keepalived starts. This ordering is important for an HA control plane.
In the `kubeadm init ...` command, we also use variables that are defined in `EnvironmentFile=/etc/kubernetes/kubeadm-env.conf` and `EnvironmentFile=/etc/kubernetes/certs.conf`. This ensures we use the same secrets across all nodes, and since we pre-generated them, no information needs to be copied from node to node.
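If the first node does not come up as expected, the unit logs are the first place to look; the unit names come straight from the configs above:

journalctl -u kubeadm.service -u cilium-install-cluster.service -f
systemctl status kubeadm.service cilium-install-cluster.service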
butane/31_base-ha-join-cp.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/20_base-ha-cp.ign
storage:
files:
systemd:
units:
- name: kubeadm.service
enabled: true
contents: |
[Unit]
StartLimitInterval=200
StartLimitBurst=5
Description=Kubeadm service
Requires=crio.service
After=crio.service
ConditionPathExists=!/etc/kubernetes/kubelet.conf
[Service]
Restart=always
RestartSec=30
EnvironmentFile=/etc/kubernetes/certs.conf
EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
ExecStartPre=/usr/bin/kubeadm config images pull
# Wait until first control plane is up
ExecStartPre=/usr/bin/sh -c "while ! nc -z ${K8S_APISERVER_URL} 6444; do sleep 5; done"
ExecStart=/usr/bin/kubeadm join ${K8S_APISERVER_URL}:6444 --token ${K8S_TOKEN} --discovery-token-ca-cert-hash ${K8S_HASH} --ignore-preflight-errors=FileAvailable--etc-kubernetes-pki-ca.crt --cri-socket=unix:///var/run/crio/crio.sock --control-plane --certificate-key ${K8S_CERT_KEY}
[Install]
WantedBy=multi-user.target
This config specifies how to join the cluster as a control plane node using the information in the same EnvironmentFiles. We also specify an `ExecStartPre` that essentially just sleeps until the first node’s api-server is available. Note that the join command references `${K8S_CERT_KEY}`, which the certificate script above does not write to `certs.conf`; if you want the control-plane join to download the certificates uploaded by `--upload-certs`, pre-generate a key (for example with `openssl rand -hex 32`), pass the same key to `kubeadm init` via `--certificate-key`, and add it to `certs.conf` as `K8S_CERT_KEY` so the variable is defined when this unit runs.
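You can reproduce that wait check from your workstation to see when the endpoint becomes reachable (assuming nc is installed locally):

nc -z 192.168.122.30 6444 && echo "api-server endpoint is reachable"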
butane/32_base-ha-join.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/10_base-ha.ign
systemd:
units:
- name: kubeadm.service
enabled: true
contents: |
[Unit]
StartLimitInterval=200
StartLimitBurst=5
Description=Kubeadm service
Requires=crio.service
After=crio.service
ConditionPathExists=!/etc/kubernetes/kubelet.conf
[Service]
Restart=always
RestartSec=30
EnvironmentFile=/etc/kubernetes/certs.conf
EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
ExecStartPre=/usr/bin/kubeadm config images pull
ExecStartPre=/usr/bin/sh -c "while ! nc -z ${K8S_APISERVER_URL} 6444; do sleep 5; done"
ExecStart=/usr/bin/kubeadm join ${K8S_APISERVER_URL}:6444 --token ${K8S_TOKEN} --discovery-token-ca-cert-hash ${K8S_HASH} --ignore-preflight-errors=FileAvailable--etc-kubernetes-pki-ca.crt --cri-socket=unix:///var/run/crio/crio.sock
[Install]
WantedBy=multi-user.target
This is similar to the previous config, but we join as a worker node rather than a control plane node.
Node
butane/node1-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/30_base-ha-init.ign
storage:
files:
- path: /etc/hostname
overwrite: true
contents:
inline: node1-cp
- path: /etc/systemd/network/00-eth0.network
contents:
inline: |
[Match]
Name=eth0
[Network]
Address=192.168.122.31/24
Gateway=192.168.122.1
DNS=192.168.122.1
- path: /etc/keepalived/keepalived.conf
mode: 0644
contents:
inline: |
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 101
authentication {
auth_type PASS
auth_pass kubevip
}
virtual_ipaddress {
192.168.122.30/24
}
track_script {
check_apiserver
}
}
Here we define the first node. We include everything from the `ign/30_base-ha-init.ign` config, and define the hostname and a static IP. We also provide the Keepalived config that designates this node as the `MASTER`.
butane/node2-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/31_base-ha-join-cp.ign
storage:
files:
- path: /etc/hostname
overwrite: true
contents:
inline: node2-cp
- path: /etc/systemd/network/00-eth0.network
contents:
inline: |
[Match]
Name=eth0
[Network]
Address=192.168.122.32/24
Gateway=192.168.122.1
DNS=192.168.122.1
- path: /etc/keepalived/keepalived.conf
mode: 0644
contents:
inline: |
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
authentication {
auth_type PASS
auth_pass kubevip
}
virtual_ipaddress {
192.168.122.30/24
}
track_script {
check_apiserver
}
}
butane/node3-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/31_base-ha-join-cp.ign
storage:
files:
- path: /etc/hostname
overwrite: true
contents:
inline: node3-cp
- path: /etc/systemd/network/00-eth0.network
contents:
inline: |
[Match]
Name=eth0
[Network]
Address=192.168.122.33/24
Gateway=192.168.122.1
DNS=192.168.122.1
- path: /etc/keepalived/keepalived.conf
mode: 0644
contents:
inline: |
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 99
authentication {
auth_type PASS
auth_pass kubevip
}
virtual_ipaddress {
192.168.122.30/24
}
track_script {
check_apiserver
}
}
These two files are similar in that they define the other two control plane nodes. The main differences are the hostnames, IPs, and the priority in the keepalived config; the differing priorities determine which node holds the VIP at any given time.
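Later, once the cluster is running (see the Running section below), you can confirm the priority ordering with a rough failover test from the libvirt host; only do this against a test cluster:

ping -c 2 192.168.122.30      # VIP answered while node1-cp (priority 101) is MASTER
virsh destroy virt-node1-ha   # hard-stop the current MASTER
sleep 30
ping -c 2 192.168.122.30      # keepalived should move the VIP to node2-cp (priority 100)
virsh start virt-node1-ha     # node1-cp should reclaim the VIP once it is back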
butane/node4-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/32_base-ha-join.ign
storage:
files:
- path: /etc/hostname
overwrite: true
contents:
inline: node4-worker
- path: /etc/systemd/network/00-eth0.network
contents:
inline: |
[Match]
Name=eth0
[Network]
Address=192.168.122.34/24
Gateway=192.168.122.1
DNS=192.168.122.1
butane/node5-ha.yaml
variant: flatcar
version: 1.1.0
ignition:
config:
merge:
- local: ign/32_base-ha-join.ign
storage:
files:
- path: /etc/hostname
overwrite: true
contents:
inline: node5-worker
- path: /etc/systemd/network/00-eth0.network
contents:
inline: |
[Match]
Name=eth0
[Network]
Address=192.168.122.35/24
Gateway=192.168.122.1
DNS=192.168.122.1
These final two configs specify the worker nodes. Each inherits `ign/32_base-ha-join.ign`, so we only need to specify the hostname and IP address.
Running
Now that we have all of our files defined, we can actually spin up the cluster. We start by downloading the VM image, generating the configs, and running the VMs.
NOTE: You may want to run these make commands as root, depending on how your permissions are set up with libvirt, but again, make sure you understand what the commands are doing before you run them. There is no lifeguard at the pool.
Running `make verify-gpg` should download Flatcar’s GPG key and import it. Then, running `make new-cluster` should generate our token file and all of the Ignition files from the butane configs. Finally, running `make ha` will create and run each VM. This involves downloading a base VM image, verifying it with GPG, cloning the base image to create a working image, and copying the image (and Ignition files) to `/var/lib/libvirt/images/flatcar`.
After that, we should have a running Kubernetes cluster. We can shut everything down by running `make rm-ha`, or connect to a node with `virsh console virt-node1-ha`. Once we are on the first node, we can run commands like `kubectl get nodes` or `cilium status` to see the status of the cluster. Do note, however, that it may take some time for everything to settle. Each node will be downloading things from the internet and automatically registering itself, so for a few minutes you may not see all the nodes, or you may see Cilium in an error state.
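For example, from the console of the first node (the kubeconfig was copied into place by kubeadm.service):

export KUBECONFIG=/home/core/.kube/config
kubectl get nodes -o wide
/opt/bin/cilium status --wait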
Once everything has settled, you can create an example deployment by running `kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/controllers/nginx-deployment.yaml`.
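That upstream example creates a Deployment named nginx-deployment with three replicas, so something like the following should show it settling (names come from the example manifest):

kubectl get deployment nginx-deployment
kubectl get pods -l app=nginx -o wide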
That’s it! You now have a fully functional, highly available Kubernetes cluster running on your VMs!