High Availability Kubernetes

    After you have created a Kubernetes cluster using the getting started guide, we can take a look at a more complex example that involves highly available control plane nodes and dedicated worker nodes. The result will be similar to a Typhoon cluster, but this version will be a little more “vanilla” and will run on libvirt VMs, a setup that is not mentioned in their documentation as of the time of writing.

    Architecture

    This documentation will walk you through creating 5 VMs with the following properties:

    VM Name        Hostname      IP              Role           Runtime
    virt-node1-ha  node1-cp      192.168.122.31  Control Plane  CRI-O
    virt-node2-ha  node2-cp      192.168.122.32  Control Plane  CRI-O
    virt-node3-ha  node3-cp      192.168.122.33  Control Plane  CRI-O
    virt-node4-ha  node4-worker  192.168.122.34  Worker         CRI-O
    virt-node5-ha  node5-worker  192.168.122.35  Worker         CRI-O

    Control Plane

    Each control plane node will be responsible for running the regular Kubernetes control plane components, as well as HAProxy and Keepalived. The API server will have a VIP of 192.168.122.30. HAProxy and Keepalived can be installed through various methods, but this article will show you how to install them using Quadlet through Ignition.

    Bootstrap Sequence

    We will install the necessary systemd files to use kubeadm to bootstrap a cluster. Essentially, node1-cp will run kubeadm init ..., node2-cp and node3-cp will run kubeadm join --control-plane ..., and the worker nodes will run kubeadm join .... This is a very common pattern for bootstrapping a Kubernetes cluster.

    Typically, one node will run kubeadm init ..., then kubeadm token create --print-join-command, whose output is then copied to the other nodes. The problem with this approach is that it requires manual intervention at install time, which sort of defeats the purpose of Ignition. Luckily, we can provide all the information ahead of time so nothing needs to be copied between nodes and the cluster installation can still be secure and hands-off.
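
    For reference, the manual flow we are avoiding looks roughly like this (a sketch; the exact join command is printed by kubeadm itself):

    # On the first control plane node:
    kubeadm init ...
    # Print a fresh join command (token + CA hash) to copy to the other nodes:
    kubeadm token create --print-join-command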

    In order to do this, there are a few things that we need to pre-generate so that we can pass them to the kubeadm commands via Ignition. These are:

    • /etc/kubernetes/pki/ca.crt
    • /etc/kubernetes/pki/ca.key
    • /etc/kubernetes/pki/sa.key
    • /etc/kubernetes/pki/sa.pub
    • /etc/kubernetes/pki/front-proxy-ca.crt
    • /etc/kubernetes/pki/front-proxy-ca.key
    • /etc/kubernetes/pki/etcd/ca.crt
    • /etc/kubernetes/pki/etcd/ca.key

    We will also generate an initial join token, the hash of the CA certificate, and a certificate key, and provide them in an EnvironmentFile called /etc/kubernetes/certs.conf.
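
    For illustration, the generated EnvironmentFile will end up looking something like this (the values below are made-up placeholders, not real credentials):

    # /etc/kubernetes/certs.conf (placeholder values)
    K8S_TOKEN='abcdef.0123456789abcdef'
    K8S_HASH='sha256:<64-character hex digest of the CA public key>'
    K8S_CERT_KEY='<64-character hex string>'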

    Fortunately, this can all be created via a script, called generate-k8s-certs.sh. Before we talk about the script, we should talk about the overall file structure.

    Files

    Building the configuration files and VMs is not a small feat, so we should create a dedicated workspace for this.

    mkdir -p butane certs ign scripts
    

    Now we can get started looking at our butane configs.

    Butane Structure

    All of our butane files will live in the butane directory. Each VM will have its own butane file with the same name as the VM, and it will depend on one or more other butane files. In order to get dependencies correct, the base butane files are prefixed with two digits and are built in numerical order. This way, a butane file can depend on any file sorted before it, but not after. Files named 00_base* should not depend on any other butane config. The final butane configs for the VMs should only depend on these base configs.

    > tree butane/
    butane/
    ├── 00_base-k8s-token.yaml
    ├── 00_base-k8s.yaml
    ├── 10_base-ha.yaml
    ├── 20_base-ha-cp.yaml
    ├── 30_base-ha-init.yaml
    ├── 31_base-ha-join-cp.yaml
    ├── 32_base-ha-join.yaml
    ├── node1-ha.yaml
    ├── node2-ha.yaml
    ├── node3-ha.yaml
    ├── node4-ha.yaml
    └── node5-ha.yaml
    

    This already seems a bit unwieldy, so let’s create a Makefile to help build everything.

    Makefile

    Makefile
    # Options are stable | alpha | beta
    CHANNEL := stable
    
    # Make sure to follow the host prerequisites at https://www.flatcar.org/docs/latest/installing/vms/qemu/#gentoo
    
    TOKEN_FILE = butane/00_base-k8s-token.yaml
    BUTANE = $(TOKEN_FILE) $(shell find butane -type f -not -name 'node*' | sort -h) $(shell find butane -type f -name "node*.yaml")
    IGN = $(patsubst butane/%.yaml,ign/%.ign,$(BUTANE))
    
    ifeq ($(shell command -v podman 2> /dev/null),)
        CR=docker
    else
        CR=podman
    endif
    
    # Remove files that we can easily recreate
    clean:
    	rm -f flatcar_production_qemu_image.img.sig config.json ign/* /var/lib/libvirt/images/flatcar/*.ign
    
    # Remove files that take longer to recreate/download
    cleanclean: clean
    	rm -f flatcar_production_qemu_image.img.fresh Flatcar_Image_Signing_Key.asc $(TOKEN_FILE)
    
    .PHONY: cleanclean clean
    
    Flatcar_Image_Signing_Key.asc:
    	curl -L -O https://www.flatcar.org/security/image-signing-key/Flatcar_Image_Signing_Key.asc
    
    verify-gpg: Flatcar_Image_Signing_Key.asc
    	gpg --import --keyid-format LONG Flatcar_Image_Signing_Key.asc
    
    .PHONY: verify-gpg
    
    # Download and verify the image that we'll use as a base
    flatcar_production_qemu_image.img.fresh:
    	wget "https://${CHANNEL}.release.flatcar-linux.net/amd64-usr/current/flatcar_production_qemu_image.img"
    	wget "https://${CHANNEL}.release.flatcar-linux.net/amd64-usr/current/flatcar_production_qemu_image.img.sig"
    	gpg --verify flatcar_production_qemu_image.img.sig
    	cp -f flatcar_production_qemu_image.img flatcar_production_qemu_image.img.fresh
    
    # Create vm disks from the base image
    %.qcow2: flatcar_production_qemu_image.img.fresh
    	mkdir -p /var/lib/libvirt/images/flatcar
    	qemu-img create -f qcow2 -F qcow2 -b $(shell pwd)/flatcar_production_qemu_image.img.fresh /var/lib/libvirt/images/flatcar/$@
    
    # Convert butane configs to ignition configs
    ign/%.ign:
    	mkdir -p ign
    	$(CR) run --rm -i -v $(shell pwd)/butane:/config/butane:ro -v $(shell pwd)/ign:/config/ign quay.io/coreos/butane:latest --pretty -d /config < butane/$*.yaml > $@
    
    # Create VMs
    virt-node%: flatcar_production_qemu_image-%.qcow2 $(IGN)
    	cp -fv ign/*.ign /var/lib/libvirt/images/flatcar
    	virt-install --connect qemu:///system --import --name $@  --ram 4096  --vcpus 4 --os-variant=generic --network network=default,model=virtio --disk path=/var/lib/libvirt/images/flatcar/flatcar_production_qemu_image-$*.qcow2,format=qcow2,bus=virtio --vnc --qemu-commandline="-fw_cfg name=opt/org.flatcar-linux/config,file=/var/lib/libvirt/images/flatcar/node$*.ign" --noautoconsole
    
    # Don't delete intermediate files
    .PRECIOUS: %.qcow2 ign/%.ign
    
    # Stop and Remove VMs
    rm-virt-node%:
    	virsh destroy virt-node$* || :
    	virsh undefine virt-node$*
    
    # Create token butane file
    butane/00_base-k8s-token.yaml:
    	./scripts/generate-k8s-certs.sh
    
    # Create all ignition configs
    new-cluster: clean butane/00_base-k8s-token.yaml $(IGN)
    
    # Create all VMs
    ha: virt-node1-ha virt-node2-ha virt-node3-ha virt-node4-ha virt-node5-ha
    
    # Stop and Remove all VMs
    rm-ha: rm-virt-node1-ha rm-virt-node2-ha rm-virt-node3-ha rm-virt-node4-ha rm-virt-node5-ha
    
    .PHONY: new-cluster ha rm-ha
    

    This Makefile is maybe a bit long, but we’ll go through each piece in due time. First, we should look at these lines:

    TOKEN_FILE = butane/00_base-k8s-token.yaml
    BUTANE = $(TOKEN_FILE) $(shell find butane -type f -not -name 'node*' | sort -h) $(shell find butane -type f -name "node*.yaml")
    IGN = $(patsubst butane/%.yaml,ign/%.ign,$(BUTANE))
    

    These lines specify our special token file, which will contain all of our certificates and related secrets, as well as the order in which our butane files should be built. Then, path substitution is used to generate the names of the resulting Ignition files.
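
    If you want to double-check the build order, you can reproduce the two find invocations by hand from the project root:

    # Base configs, sorted numerically, built first...
    find butane -type f -not -name 'node*' | sort -h
    # ...followed by the per-node configs
    find butane -type f -name "node*.yaml"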

    # Convert butane configs to ignition configs
    ign/%.ign:
    	mkdir -p ign
    	$(CR) run --rm -i -v $(shell pwd)/butane:/config/butane:ro -v $(shell pwd)/ign:/config/ign quay.io/coreos/butane:latest --pretty -d /config < butane/$*.yaml > $@
    
    # ...
    
    # Create token butane file
    butane/00_base-k8s-token.yaml:
    	./scripts/generate-k8s-certs.sh
    

    These lines show how to build each Ignition file using a container. We also have to specify how to build our token file using a shell script.
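
    If you have the butane binary installed locally, the container invocation is roughly equivalent to the following (run from the project root, so the local: references in the configs resolve):

    butane --pretty -d . < butane/node1-ha.yaml > ign/node1-ha.ign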

    Token File

    Generating the token file is a fun exercise using openssl, yaml, and bash heredocs. In case you actually value your time, the script is provided here in full. As always, please make sure you understand what commands you are running before running a script you found on the internet.

    scripts/generate-k8s-certs.sh
    #!/usr/bin/env bash
    
    set -e
    
    # Set file paths
    cert_dir="./certs"
    output_yaml="butane/00_base-k8s-token.yaml"
    
    # Create cert directory
    mkdir -p "$cert_dir"
    
    # Generate a bootstrap token in the format kubeadm expects: [a-z0-9]{6}.[a-z0-9]{16}
    token="$(tr -dc 'a-z0-9' < /dev/urandom | head -c 6).$(tr -dc 'a-z0-9' < /dev/urandom | head -c 16)"
    
    # Generate a certificate key for kubeadm's --upload-certs flow
    cert_key=$(openssl rand -hex 32)
    
    # Generate Kubernetes CA (used for cluster signing)
    openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/ca.key" -out "$cert_dir/ca.crt" -days 365 -nodes -subj "/CN=k8s-ca"
    
    # Generate Front Proxy CA (used for API server aggregation)
    openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/front-proxy-ca.key" -out "$cert_dir/front-proxy-ca.crt" -days 365 -nodes -subj "/CN=front-proxy-ca"
    
    # Generate Service Account Signing Key
    openssl genrsa -out "$cert_dir/sa.key" 2048
    openssl rsa -in "$cert_dir/sa.key" -pubout -out "$cert_dir/sa.pub"
    
    # Generate API server certificate (signed by Kubernetes CA)
    openssl req -new -newkey rsa:2048 -keyout "$cert_dir/apiserver.key" -out "$cert_dir/apiserver.csr" -nodes -subj "/CN=kube-apiserver"
    openssl x509 -req -in "$cert_dir/apiserver.csr" -CA "$cert_dir/ca.crt" -CAkey "$cert_dir/ca.key" -CAcreateserial -out "$cert_dir/apiserver.crt" -days 365
    
    # Generate etcd CA (if using external etcd)
    openssl req -x509 -newkey rsa:2048 -keyout "$cert_dir/etcd-ca.key" -out "$cert_dir/etcd-ca.crt" -days 365 -nodes -subj "/CN=etcd-ca"
    
    indent="          "
    
    # Encode certificates for YAML
    ca_crt=$(sed "s/^/${indent}/" "$cert_dir/ca.crt")
    ca_key=$(sed "s/^/${indent}/" "$cert_dir/ca.key")
    front_proxy_ca_crt=$(sed "s/^/${indent}/" "$cert_dir/front-proxy-ca.crt")
    front_proxy_ca_key=$(sed "s/^/${indent}/" "$cert_dir/front-proxy-ca.key")
    sa_key=$(sed "s/^/${indent}/" "$cert_dir/sa.key")
    sa_pub=$(sed "s/^/${indent}/" "$cert_dir/sa.pub")
    etcd_ca_crt=$(sed "s/^/${indent}/" "$cert_dir/etcd-ca.crt")
    etcd_ca_key=$(sed "s/^/${indent}/" "$cert_dir/etcd-ca.key")
    
    # Compute the CA hash in the format expected by --discovery-token-ca-cert-hash
    ca_hash="sha256:$(openssl x509 -pubkey -in "$cert_dir/ca.crt" | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //')"
    
    # Write the header to the output YAML file
    cat > "$output_yaml" <<-EOF
    ---
    # This is generated using generate-k8s-certs.sh
    variant: flatcar
    version: 1.1.0
    storage:
      files:
        - path: /etc/kubernetes/pki/ca.crt
          contents:
            inline: |
    $ca_crt
        - path: /etc/kubernetes/pki/ca.key
          contents:
            inline: |
    $ca_key
        - path: /etc/kubernetes/pki/front-proxy-ca.crt
          contents:
            inline: |
    $front_proxy_ca_crt
        - path: /etc/kubernetes/pki/front-proxy-ca.key
          contents:
            inline: |
    $front_proxy_ca_key
        - path: /etc/kubernetes/pki/sa.key
          contents:
            inline: |
    $sa_key
        - path: /etc/kubernetes/pki/sa.pub
          contents:
            inline: |
    $sa_pub
        - path: /etc/kubernetes/pki/etcd/ca.crt
          contents:
            inline: |
    $etcd_ca_crt
        - path: /etc/kubernetes/pki/etcd/ca.key
          contents:
            inline: |
    $etcd_ca_key
        - path: /etc/kubernetes/certs.conf
          contents:
            inline: |
              K8S_TOKEN='$token'
              K8S_HASH='$ca_hash'
              K8S_CERT_KEY='$cert_key'
    EOF
    
    echo "Kubernetes certificates have been generated successfully!"
    echo "YAML file '$output_yaml' has been successfully overwritten!"
    

    Now we can save this script as scripts/generate-k8s-certs.sh, run make butane/00_base-k8s-token.yaml, and check out our SSL goodies. Treat this file like a password: if it is ever made public (for anything other than a test cluster), make sure you regenerate it. It is usually recommended not to provide sensitive information via Ignition configuration files; this is only for demo purposes.
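
    Before moving on, you can sanity-check what the script produced:

    # Confirm the CA subject and validity window
    openssl x509 -in certs/ca.crt -noout -subject -dates
    # Peek at the token, hash, and certificate key that kubeadm will consume
    grep "K8S_" butane/00_base-k8s-token.yaml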

    Butane Content

    Now that we have the basic structure in place, we can get started filling out the content of our butane files. We will first look at the “base” configs, then look at node specific configs.

    Base


    butane/00_base-k8s.yaml
    variant: flatcar
    version: 1.1.0
    storage:
      links:
        - path: /etc/extensions/docker-flatcar.raw
          target: /dev/null
          overwrite: true
        - path: /etc/extensions/containerd-flatcar.raw
          target: /dev/null
          overwrite: true
        - target: /opt/extensions/crio/crio-v1.31.3-x86-64.raw
          path: /etc/extensions/crio.raw
          hard: false
        - target: /opt/extensions/kubernetes/kubernetes-v1.31.3-x86-64.raw
          path: /etc/extensions/kubernetes.raw
          hard: false
      files:
        - path: /etc/flatcar/enabled-sysext.conf
          contents:
            inline: |
              zfs
              podman          
          mode: 0644
        - path: /etc/sysupdate.d/noop.conf
          contents:
            source: https://github.com/flatcar/sysext-bakery/releases/download/latest/noop.conf
        - path: /opt/extensions/crio/crio-v1.31.3-x86-64.raw
          mode: 0644
          contents:
            source: https://github.com/flatcar/sysext-bakery/releases/download/latest/crio-v1.31.3-x86-64.raw
        - path: /etc/sysupdate.crio.d/crio.conf
          contents:
            source: https://github.com/flatcar/sysext-bakery/releases/download/latest/crio.conf
        - path: /etc/sysupdate.kubernetes.d/kubernetes-v1.31.conf
          contents:
            source: https://github.com/flatcar/sysext-bakery/releases/download/latest/kubernetes-v1.31.conf
        - path: /opt/extensions/kubernetes/kubernetes-v1.31.3-x86-64.raw
          contents:
            source: https://github.com/flatcar/sysext-bakery/releases/download/latest/kubernetes-v1.31.3-x86-64.raw
        - path: /opt/bin/cilium.tar
          mode: 0755
          contents:
            source: https://github.com/cilium/cilium-cli/releases/download/v0.16.24/cilium-linux-amd64.tar.gz
            compression: gzip
    systemd:
      units:
        - name: systemd-sysupdate.timer
          enabled: true
        - name: locksmithd.service
          # NOTE: To coordinate the node reboot in this context, we recommend to use Kured.
          mask: true
        - name: systemd-sysupdate.service
          dropins:
            - name: kubernetes.conf
              contents: |
                [Service]
                ExecStartPre=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/kubernetes.raw > /tmp/kubernetes"
                ExecStartPre=/usr/lib/systemd/systemd-sysupdate -C kubernetes update
                ExecStartPost=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/kubernetes.raw > /tmp/kubernetes-new"
                ExecStartPost=/usr/bin/sh -c "if ! cmp --silent /tmp/kubernetes /tmp/kubernetes-new; then touch /run/reboot-required; fi"            
            - name: crio.conf
              contents: |
                [Service]
                ExecStartPre=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/crio.raw > /tmp/crio"
                ExecStartPre=/usr/lib/systemd/systemd-sysupdate -C crio update
                ExecStartPost=/usr/bin/sh -c "readlink --canonicalize /etc/extensions/crio.raw > /tmp/crio-new"
                ExecStartPost=/usr/bin/sh -c "if ! cmp --silent /tmp/crio /tmp/crio-new; then touch /run/reboot-required; fi"            
        - name: cilium-setup.service
          enabled: true
          contents: |
            [Unit]
            Description=Download and install Cilium
            After=network-online.target
            Wants=network-online.target
    
            [Service]
            Type=oneshot
            ExecStart=/bin/sh -c 'tar -C /opt/bin -xf /opt/bin/cilium.tar'
            RemainAfterExit=true
    
            [Install]
            WantedBy=multi-user.target        
        - name: zfs-setup.service
          enabled: true
          contents: |
            [Unit]
            Description=Load zfs kernel modules
            After=local-fs-pre.target
            Wants=local-fs-pre.target
    
            [Service]
            Type=oneshot
            ExecStart=/usr/bin/modprobe zfs
            RemainAfterExit=true
    
            [Install]
            WantedBy=multi-user.target        
    

    This base config does a few things: it disables the docker and containerd sysexts, installs the kubernetes and crio sysexts, and enables the zfs and podman sysexts. It also creates a systemd unit to unpack the Cilium CLI. This CLI may only be needed on the first node, but we install it on all hosts to make day-2 operations easier. We also disable locksmithd to prepare for using Kured.
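
    Once a node has booted, you can verify that the sysexts were merged (run these on the node itself):

    systemd-sysext status    # should list crio, kubernetes, zfs, and podman
    crio --version           # confirms the CRI-O sysext is active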

    butane/10_base-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/00_base-k8s.ign
          - local: ign/00_base-k8s-token.ign
    storage:
      files:
        - path: /etc/kubernetes/kubeadm-env.conf
          mode: 0644
          contents:
            inline: |
              K8S_APISERVER_URL="192.168.122.30"
        - path: /etc/containers/policy.json
          contents:
            inline: |
              {
                "default": [
                  {
                    "type": "insecureAcceptAnything"
                  }
                ]
              }          
    

    We extend the first config by including it here, along with the token file. We also set any environment variables that are shared across hosts, in this case K8S_APISERVER_URL, and set up a default podman/crio image policy.

    butane/20_base-ha-cp.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/10_base-ha.ign
    storage:
      files:
        - path: /etc/keepalived/check_apiserver.sh
          mode: 0755
          contents:
            inline: |
              #!/bin/bash
              curl --silent --max-time 2 --insecure https://localhost:6443/healthz -o /dev/null || exit 1          
        - path: /etc/containers/systemd/keepalived.container
          contents:
            inline: |
              [Unit]
              Description=Keepalived Container
              Requires=crio.service
              After=crio.service
    
              [Container]
              Image=docker.io/osixia/keepalived:latest
              AutoUpdate=registry
              AddCapability=NET_ADMIN
              AddCapability=NET_BROADCAST
              AddCapability=NET_RAW
              PodmanArgs=--privileged
              Exec=--copy-service
              Volume=/etc/keepalived/keepalived.conf:/container/service/keepalived/assets/keepalived.conf:ro
              Volume=/etc/keepalived/check_apiserver.sh:/etc/keepalived/check_apiserver.sh:ro
              Network=host
    
              [Service]
              Restart=always
              RestartSec=30
    
              [Install]
              WantedBy=multi-user.target default.target          
        - path: /etc/containers/systemd/haproxy.container
          contents:
            inline: |
              [Unit]
              Description=HAProxy Container
              Requires=keepalived.service
              After=keepalived.service
    
              [Container]
              Image=docker.io/library/haproxy:alpine
              AutoUpdate=registry
              Network=host
              Volume=/etc/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    
              [Service]
              Restart=always
              RestartSec=30
    
              [Install]
              WantedBy=multi-user.target default.target          
        - path: /etc/haproxy/haproxy.cfg
          contents:
            inline: |
              # /etc/haproxy/haproxy.cfg
              #---------------------------------------------------------------------
              # Global settings
              #---------------------------------------------------------------------
              global
                  log stdout format raw local0
                  daemon
    
              #---------------------------------------------------------------------
              # common defaults that all the 'listen' and 'backend' sections will
              # use if not designated in their block
              #---------------------------------------------------------------------
              defaults
                  mode                    http
                  log                     global
                  option                  httplog
                  option                  dontlognull
                  option http-server-close
                  option forwardfor       except 127.0.0.0/8
                  option                  redispatch
                  retries                 1
                  timeout http-request    10s
                  timeout queue           20s
                  timeout connect         5s
                  timeout client          35s
                  timeout server          35s
                  timeout http-keep-alive 10s
                  timeout check           10s
    
              #---------------------------------------------------------------------
              # apiserver frontend which proxies to the control plane nodes
              #---------------------------------------------------------------------
              frontend apiserver
                  # Bind on all addresses so haproxy can also start on nodes
                  # that do not currently hold the VIP
                  bind *:6444
                  mode tcp
                  option tcplog
                  default_backend apiserverbackend
    
              #---------------------------------------------------------------------
              # round robin balancing for apiserver
              #---------------------------------------------------------------------
              backend apiserverbackend
                  mode tcp
                  balance     roundrobin
                  
                  server node1-cp 192.168.122.31:6443 check verify none
                  server node2-cp 192.168.122.32:6443 check verify none
                  server node3-cp 192.168.122.33:6443 check verify none          
    systemd:
      units:
        - name: ip_vs-setup.service
          enabled: true
          contents: |
            [Unit]
            Description=Load ip_vs kernel modules
            After=local-fs-pre.target
            Wants=local-fs-pre.target
    
            [Service]
            Type=oneshot
            ExecStart=/usr/bin/modprobe ip_vs
            RemainAfterExit=true
    
            [Install]
            WantedBy=multi-user.target        
    

    Here we define the services that should run on each of the control plane nodes. We specify keepalived and haproxy containers that run via Quadlet. These need to be running before Kubernetes, so it’s easier to define them here. We also specify the haproxy config, since it will be the same across all nodes, but we don’t specify the keepalived config; that will differ per node because it includes the node’s IP address and priority. Finally, we make sure to run modprobe ip_vs, which is required when running keepalived.
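
    On a booted control plane node you can confirm that Quadlet generated and started both services, and see whether this node currently holds the VIP:

    systemctl status keepalived.service haproxy.service
    ip -4 addr show dev eth0    # the current MASTER also carries 192.168.122.30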

    butane/30_base-ha-init.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/20_base-ha-cp.ign
    systemd:
      units:
        - name: kubeadm.service
          enabled: false
          contents: |
            [Unit]
            StartLimitInterval=200
            StartLimitBurst=5
            Description=Kubeadm service
            Requires=haproxy.service
            After=haproxy.service
            ConditionPathExists=!/etc/kubernetes/kubelet.conf
            [Service]
            EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
            EnvironmentFile=/etc/kubernetes/certs.conf
            # We skip kube-proxy because we install cilium
            ExecStartPre=/usr/bin/kubeadm init --token ${K8S_TOKEN} --cri-socket=unix:///var/run/crio/crio.sock --skip-phases=addon/kube-proxy --control-plane-endpoint ${K8S_APISERVER_URL}:6444 --upload-certs --certificate-key ${K8S_CERT_KEY}
            ExecStartPre=/usr/bin/mkdir /home/core/.kube
            ExecStartPre=/usr/bin/cp /etc/kubernetes/admin.conf /home/core/.kube/config
            ExecStart=/usr/bin/chown -R core:core /home/core/.kube
            [Install]
            WantedBy=multi-user.target        
        - name: cilium-install-cluster.service
          enabled: true
          contents: |
            [Unit]
            Description=Install Cilium as Kubernetes CNI
            Requires=kubeadm.service
            After=kubeadm.service
            [Service]
            Environment=KUBECONFIG='/home/core/.kube/config'
            ExecStart=/opt/bin/cilium install --set kubeProxyReplacement=true --namespace=kube-system
            [Install]
            WantedBy=multi-user.target        
    

    This service defines what to do when initializing the Kubernetes cluster, which only needs to happen on one node. Essentially we are calling kubeadm init ... and cilium install ..., but it’s worth pointing out the dependencies here. We specify that kubeadm should only start after the haproxy service (which Quadlet generates from the haproxy container definition), because haproxy is load-balancing our API server VIP, which is managed by keepalived. We also don’t install Cilium into the cluster until there is a cluster to install it into. So cilium can’t start until kubeadm starts, kubeadm can’t start until haproxy starts, and haproxy can’t start until keepalived starts. This ordering is important for an HA control plane. Note that kubeadm.service itself is not enabled here; it is pulled in at boot by cilium-install-cluster.service via Requires=.
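
    You can verify that systemd sees the same ordering chain that the unit files declare:

    systemctl list-dependencies --after kubeadm.service | grep -E "haproxy|keepalived"
    systemctl list-dependencies --after cilium-install-cluster.service | grep kubeadm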

    In the kubeadm init ... command, we also specify variables that are defined in EnvironmentFile=/etc/kubernetes/kubeadm-env.conf and EnvironmentFile=/etc/kubernetes/certs.conf. This ensures we are using the same secrets across all nodes, and since we pre-generated them, no information needs to be copied from node to node.

    butane/31_base-ha-join-cp.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/20_base-ha-cp.ign
    systemd:
      units:
        - name: kubeadm.service
          enabled: true
          contents: |
            [Unit]
            StartLimitInterval=200
            StartLimitBurst=5
            Description=Kubeadm service
            Requires=crio.service
            After=crio.service
            ConditionPathExists=!/etc/kubernetes/kubelet.conf
            [Service]
            Restart=always
            RestartSec=30
            EnvironmentFile=/etc/kubernetes/certs.conf
            EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
            ExecStartPre=/usr/bin/kubeadm config images pull
            # Wait until first control plane is up
            ExecStartPre=/usr/bin/sh -c "while ! nc -z ${K8S_APISERVER_URL} 6444; do sleep 5; done"
            ExecStart=/usr/bin/kubeadm join ${K8S_APISERVER_URL}:6444 --token ${K8S_TOKEN} --discovery-token-ca-cert-hash ${K8S_HASH} --ignore-preflight-errors=FileAvailable--etc-kubernetes-pki-ca.crt --cri-socket=unix:///var/run/crio/crio.sock --control-plane --certificate-key ${K8S_CERT_KEY}
            [Install]
            WantedBy=multi-user.target        
    

    This config specifies how to join the cluster as a control plane node using the information in the same EnvironmentFiles. We also specify an ExecStartPre that simply sleeps until the first node’s API server is reachable through the VIP.
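
    The same readiness check is easy to run by hand if you are watching a node join:

    nc -z 192.168.122.30 6444 && echo "apiserver VIP reachable" || echo "still waiting"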

    butane/32_base-ha-join.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/10_base-ha.ign
    systemd:
      units:
        - name: kubeadm.service
          enabled: true
          contents: |
            [Unit]
            StartLimitInterval=200
            StartLimitBurst=5
            Description=Kubeadm service
            Requires=crio.service
            After=crio.service
            ConditionPathExists=!/etc/kubernetes/kubelet.conf
            [Service]
            Restart=always
            RestartSec=30
            EnvironmentFile=/etc/kubernetes/certs.conf
            EnvironmentFile=/etc/kubernetes/kubeadm-env.conf
            ExecStartPre=/usr/bin/kubeadm config images pull
            ExecStartPre=/usr/bin/sh -c "while ! nc -z ${K8S_APISERVER_URL} 6444; do sleep 5; done"
            ExecStart=/usr/bin/kubeadm join ${K8S_APISERVER_URL}:6444 --token ${K8S_TOKEN} --discovery-token-ca-cert-hash ${K8S_HASH} --ignore-preflight-errors=FileAvailable--etc-kubernetes-pki-ca.crt --cri-socket=unix:///var/run/crio/crio.sock
            [Install]
            WantedBy=multi-user.target        
    

    This is similar to the previous config, but here we join as a worker node rather than as a control plane node.

    Node


    butane/node1-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/30_base-ha-init.ign
    storage:
      files:
        - path: /etc/hostname
          overwrite: true
          contents:
            inline: node1-cp
        - path: /etc/systemd/network/00-eth0.network
          contents:
            inline: |
              [Match]
              Name=eth0
    
              [Network]
              Address=192.168.122.31/24
              Gateway=192.168.122.1
              DNS=192.168.122.1          
        - path: /etc/keepalived/keepalived.conf
          mode: 0644
          contents:
            inline: |
              ! /etc/keepalived/keepalived.conf
              ! Configuration File for keepalived
              global_defs {
                  router_id LVS_DEVEL
              }
              vrrp_script check_apiserver {
                script "/etc/keepalived/check_apiserver.sh"
                interval 3
                weight -2
                fall 10
                rise 2
              }
              vrrp_instance VI_1 {
                  state MASTER
                  interface eth0
                  virtual_router_id 51
                  priority 101
                  authentication {
                      auth_type PASS
                      auth_pass kubevip
                  }
                  virtual_ipaddress {
                      192.168.122.30/24
                  }
                  track_script {
                      check_apiserver
                  }
              }          
    

    Here we define the first node. We include everything from the ign/30_base-ha-init.ign config, as well as defining our hostname and static IP. We also provide a keepalived config that gives this node the highest priority, so it starts out as the MASTER.

    butane/node2-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/31_base-ha-join-cp.ign
    storage:
      files:
        - path: /etc/hostname
          overwrite: true
          contents:
            inline: node2-cp
        - path: /etc/systemd/network/00-eth0.network
          contents:
            inline: |
              [Match]
              Name=eth0
    
              [Network]
              Address=192.168.122.32/24
              Gateway=192.168.122.1
              DNS=192.168.122.1          
        - path: /etc/keepalived/keepalived.conf
          mode: 0644
          contents:
            inline: |
              ! /etc/keepalived/keepalived.conf
              ! Configuration File for keepalived
              global_defs {
                  router_id LVS_DEVEL
              }
              vrrp_script check_apiserver {
                script "/etc/keepalived/check_apiserver.sh"
                interval 3
                weight -2
                fall 10
                rise 2
              }
              vrrp_instance VI_1 {
                  state MASTER
                  interface eth0
                  virtual_router_id 51
                  priority 100
                  authentication {
                      auth_type PASS
                      auth_pass kubevip
                  }
                  virtual_ipaddress {
                      192.168.122.30/24
                  }
                  track_script {
                      check_apiserver
                  }
              }          
    

    butane/node3-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/31_base-ha-join-cp.ign
    storage:
      files:
        - path: /etc/hostname
          overwrite: true
          contents:
            inline: node3-cp
        - path: /etc/systemd/network/00-eth0.network
          contents:
            inline: |
              [Match]
              Name=eth0
    
              [Network]
              Address=192.168.122.33/24
              Gateway=192.168.122.1
              DNS=192.168.122.1          
        - path: /etc/keepalived/keepalived.conf
          mode: 0644
          contents:
            inline: |
              ! /etc/keepalived/keepalived.conf
              ! Configuration File for keepalived
              global_defs {
                  router_id LVS_DEVEL
              }
              vrrp_script check_apiserver {
                script "/etc/keepalived/check_apiserver.sh"
                interval 3
                weight -2
                fall 10
                rise 2
              }
              vrrp_instance VI_1 {
                  state MASTER
                  interface eth0
                  virtual_router_id 51
                  priority 99
                  authentication {
                      auth_type PASS
                      auth_pass kubevip
                  }
                  virtual_ipaddress {
                      192.168.122.30/24
                  }
                  track_script {
                      check_apiserver
                  }
              }          
    

    These two files define the other two control plane nodes. The main differences from node1 are the hostnames, the IPs, and the priorities in the keepalived configs.

    butane/node4-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/32_base-ha-join.ign
    storage:
      files:
        - path: /etc/hostname
          overwrite: true
          contents:
            inline: node4-worker
        - path: /etc/systemd/network/00-eth0.network
          contents:
            inline: |
              [Match]
              Name=eth0
    
              [Network]
              Address=192.168.122.34/24
              Gateway=192.168.122.1
              DNS=192.168.122.1          
    

    butane/node5-ha.yaml
    variant: flatcar
    version: 1.1.0
    ignition:
      config:
        merge:
          - local: ign/32_base-ha-join.ign
    storage:
      files:
        - path: /etc/hostname
          overwrite: true
          contents:
            inline: node5-worker
        - path: /etc/systemd/network/00-eth0.network
          contents:
            inline: |
              [Match]
              Name=eth0
    
              [Network]
              Address=192.168.122.35/24
              Gateway=192.168.122.1
              DNS=192.168.122.1          
    

    These final two configs specify the worker nodes. Each inherits ign/32_base-ha-join.ign, so we only need to specify the hostname and IP addresses.

    Running

    Now that we have all of our files defined, we can actually spin up the cluster. We start by downloading the VM image, generating the configs, and running the VMs.

    NOTE: You may want to run these make commands as root, depending on how your permissions are set up with libvirt, but again, make sure you understand what the commands are doing before you run them. There is no lifeguard at the pool.

    Running make verify-gpg should download Flatcar’s image signing key and import it. Then, running make new-cluster should generate our token file and all the Ignition files from the butane configs. Finally, running make ha will create and run each VM. This involves downloading a base VM image, verifying it with GPG, cloning the base image to create a working image, and copying the image (and Ignition files) to /var/lib/libvirt/images/flatcar.
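
    In short, the whole sequence (possibly under sudo, per the note above) is:

    make verify-gpg      # import Flatcar's image signing key
    make new-cluster     # generate the token file and all Ignition configs
    make ha              # create disks and boot all five VMs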

    After that, we should have a running Kubernetes cluster. We can shut everything down by running make rm-ha, or connect to a node via virsh console virt-node1-ha. Once we are on the first node, we can run commands like kubectl get nodes or cilium status to see the status of the cluster. Do note, however, that it may take some time for everything to settle. Each node will be downloading things from the internet and automatically registering itself, so for a few minutes you may not see all the nodes, or you may see cilium in an error state.
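
    A few useful commands while you wait for things to converge (run on the first node):

    kubectl get nodes -o wide    # all five nodes should eventually report Ready
    cilium status --wait         # blocks until Cilium reports itself healthy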

    Once everything has settled, you can create an example deployment by running kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/controllers/nginx-deployment.yaml.

    That’s it! You now have a fully functional, highly available Kubernetes cluster running on your VMs!