USBAkimbo/public-home-infra
What is this?

  • This is the public fork of my home infra repo that I've been working on since 2024-04 (about 7 months so far)
  • This contains Terraform for Proxmox, Ansible to configure Ubuntu VMs, and (primarily) Kubernetes on Talos Linux for apps
  • I've also got some self-hosted runners, CCTV, lots of Ansible configs, etc.
  • This will inevitably rot over time, so please note that today is 2024-11-16; if this hasn't been updated in more than 6 months, take everything here with a large grain of salt
  • Also note that Terraform for Proxmox is currently broken because I can't be bothered to update the provider as I don't use it - Talos VMs are too quick and simple to create
  • I heavily use the K8s parts, so you'll find they're up to date and hopefully useful
  • Enjoy

GitHub Actions self-hosted runner

  • For this, I followed the GitHub Actions self-hosted runner installer for Linux
  • My runner is an Ubuntu 22.04 LXC container that I manually created on my Proxmox host
  • Note that the IP of the LXC container will need firewall access to the hosts you want to configure
  • I followed these docs to install the runner as a service (a rough sketch follows this list)
  • This runner is only really used for Terraform in Proxmox as Terraform Cloud has to be logged into manually
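
For reference, the runner tarball ships a svc.sh helper for running the runner as a service; a minimal sketch, assuming the runner was extracted to /opt/actions-runner (the path is an assumption, use wherever yours lives):

# Hypothetical install path
cd /opt/actions-runner
sudo ./svc.sh install   # register the runner as a systemd service
sudo ./svc.sh start     # start it
sudo ./svc.sh status    # confirm it's running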

Terraform user account for Proxmox

  • You need a Terraform user account on your Proxmox host so the provider can authenticate
  • Follow the docs for this (a minimal sketch follows this list)
  • If you're using a self-hosted runner and Terraform Cloud, you need to do a terraform login on the runner
  • This isn't ideal and I haven't found a better solution yet
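
If you just want the gist, the Proxmox side boils down to something like this (the role name, user name, password and privilege list are illustrative; the provider docs have the authoritative list):

# Create a role with the privileges Terraform needs
pveum role add TerraformProv -privs "Datastore.AllocateSpace Datastore.Audit VM.Allocate VM.Clone VM.Config.CDROM VM.Config.CPU VM.Config.Disk VM.Config.Memory VM.Config.Network VM.Config.Options"

# Create the user and grant the role at the root of the permission tree
pveum user add terraform@pve --password 'changeme'
pveum aclmod / -user terraform@pve -role TerraformProv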

Cloud-init Ubuntu 22.04 image creation commands

# Download the Ubuntu 22.04 (Jammy) cloud image
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Bake the QEMU guest agent into the image
apt install guestfs-tools
virt-customize -a jammy-server-cloudimg-amd64.img --install qemu-guest-agent

# Create the VM, import the image as its boot disk, attach a cloud-init drive, then convert to a template
qm create 9000 --name ci-template --memory 2048 --net0 virtio,bridge=vmbr2 --scsihw virtio-scsi-pci
qm set 9000 --scsi0 local-lvm:0,import-from=/root/jammy-server-cloudimg-amd64.img
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --boot order=scsi0
qm set 9000 --serial0 socket --vga serial0
qm template 9000

Required GitHub Actions secrets

  • Set the following secrets so your actions work

Secret                  Description
ANSIBLE_VAULT_PASSWORD  Password to decrypt your Ansible Vault encrypted files
PM_API_URL              Proxmox API URL
PM_USER                 Proxmox Terraform user
PM_PASSWORD             Proxmox Terraform user password
SSH_HOST_KEY            Ansible SSH private key, base64 encoded (the workflow also runs dos2unix on the decoded file)
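
To produce SSH_HOST_KEY, base64 encode your Ansible SSH private key (the key path is an example):

base64 -w 0 ~/.ssh/id_ed25519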

Create a new Ansible Vault secret

ansible-vault encrypt_string 'your text here'
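
This prompts for the vault password interactively; you can also point it at the password file and name the variable so the output can be pasted straight into your vars (the variable name here is a placeholder):

ansible-vault encrypt_string --vault-password-file vault-pass 'your text here' --name 'my_secret'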

Ansible command to patch Proxmox hosts

cd ansible
ansible-playbook -i hosts --vault-password-file vault-pass actions-playbooks/ansible-proxmox-patches.yml

Ansible command to configure my parents' server

cd ansible
ansible-playbook -i hosts --vault-password-file vault-pass actions-playbooks/ansible-parents-server.yml

Talos Linux on Proxmox for K8s

Talos Linux install

  • Go to https://factory.talos.dev/
  • Use the wizard to select the following
    • siderolabs/qemu-guest-agent
    • siderolabs/iscsi-tools
    • siderolabs/util-linux-tools
  • This will give you an ID, such as 88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b
  • Download the metal ISO and store it on each Proxmox node
  • On each Proxmox node, create a new VM with, for example, 8 CPUs, 16GB of RAM and a 500GB disk (will also be used for Longhorn data)
  • Connect the VM NIC to your K8s network
  • Boot the installer and follow the docs to install
  • These were my commands
# Install Talosctl on your admin machine
curl -sL https://talos.dev/install | sh

# Create a working folder and define vars
mkdir talos && cd talos

export NODE1="10.10.3.11"
export NODE2="10.10.3.12"
export NODE3="10.10.3.13"
export NODE4="10.10.3.14"
export TALOSCONFIG="$HOME/.talos/config"   # default location; the talosconfig is moved there below

# Generate cluster config
talosctl gen config talos-proxmox-cluster https://$NODE1:6443 --output-dir _out

# Move Talos config to the default location talosctl reads
mkdir -p ~/.talos && mv _out/talosconfig ~/.talos/config

# Create a copy of the kubeconfig in the _out folder (needed for backup)
# Note: run this once the cluster is up, i.e. after the bootstrap step below
talosctl kubeconfig _out

# Edit the controlplane.yaml to allow control planes to be used as workers too
sed -i 's/# allowSchedulingOnControlPlanes: true/allowSchedulingOnControlPlanes: true/' _out/controlplane.yaml

# Edit the control plane and worker config install section to use the custom iso as per the docs
  install:
    disk: /dev/sda # The disk used for installations.
    image: factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2

# Once you've updated the worker and controlplane, update them in KeePass
# Ensure that you backup all files in the _out folder

# Set the default endpoint and node for talosctl
talosctl config endpoint $NODE1
talosctl config node $NODE1

# Apply config to each node (this will reboot each node)
talosctl apply-config --insecure --nodes $NODE1 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE2 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE3 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE4 --file _out/worker.yaml

# Bootstrap the cluster to start it using $NODE1
talosctl bootstrap

# Export kubeconfig to default location `~/.kube/config`
# This will use $NODE1 for connections as multiple are not supported
talosctl kubeconfig
  • Give it some time and your cluster should be up
  • Install kubectl on your admin machine
  • Configure the alias k=kubectl in your shell
  • Check that you can see the nodes with k get nodes
  • You should see something like this
# k get nodes
NAME         STATUS   ROLES           AGE   VERSION
p-h-k8s-01   Ready    control-plane   38h   v1.29.3
p-h-k8s-02   Ready    control-plane   38h   v1.29.3
p-h-k8s-03   Ready    control-plane   20m   v1.29.3
p-h-k8s-04   Ready    worker          20m   v1.29.3

Important! Backup configs

  • Encrypt and backup all the files in the _out folder
  • These files are very important so don't lose them
  • They also contain secrets, so don't store them in Git

Upgrade Talos nodes

  • To upgrade a node from (for example) 1.7.0 to 1.7.4, re-install talosctl using the above link
  • Then go to the factory link above and build a new image (making sure to select the qemu agent and the other extensions)
  • Then run the upgrade 1 node at a time
talosctl upgrade --nodes $NODE1 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE2 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE3 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE4 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
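
Between nodes, give the cluster time to settle; a quick sanity check, assuming talosctl and kubectl are configured as above:

talosctl health --nodes $NODE1
kubectl get nodes -o wide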

Kubernetes config

Install Helm

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Install MetalLB

  • Required for ingress to work so you can connect to myapp.mydomain.com, which is running in your cluster
  • Kubes is cloud-native and expects an auto-provisioned external load balancer
  • This is THE project for running a load balancer in the cluster, and it works extremely well
# https://metallb.universe.tf/installation/

# Install (but check for an update first)
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.8/config/manifests/metallb-native.yaml

# Apply configs (wait for pods to be ready first)
k apply -f kubernetes/manual/metal-lb
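
The configs in kubernetes/manual/metal-lb aren't reproduced here; for reference, a typical L2 setup looks roughly like this (the address range is an example, pick a free range on your K8s network):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.3.50-10.10.3.99   # example range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool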

Install Nginx Ingress Controller

  • Required so myapp.mydomain.com can be routed to the correct K8s service
# https://kubernetes.github.io/ingress-nginx/deploy/#quick-start
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace

# Patch nginx to apply SSL passthrough to work for ArgoCD
kubectl -n ingress-nginx patch deployment/ingress-nginx-controller --patch-file kubernetes/manual/nginx-ingress-controller/enable-ssl.yml
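
The patch file isn't shown here; conceptually it adds the --enable-ssl-passthrough flag to the controller's args. If you don't have the file to hand, an equivalent JSON patch one-liner (assuming the controller is the first container in the pod spec):

kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type json -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-ssl-passthrough"}]'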

Install cert-manager

  • Required for automated SSL certs for myapp.mydomain.com
# https://cert-manager.io/docs/installation/helm/
helm upgrade --install cert-manager cert-manager --repo https://charts.jetstack.io --namespace cert-manager --create-namespace --version v1.16.1 --set installCRDs=true

# See the repo for the secrets and issuer configs
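
The issuer configs live in the repo; for reference, a Let's Encrypt ClusterIssuer is sketched below (the name and email are placeholders, and my actual issuer may differ):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com   # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx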

ArgoCD config

  • Required for automated Git CI/CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

# Edit the secret in the argo folder, then run
kubectl apply -f kubernetes/manual/argocd
  • Then get the admin login
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d && echo
  • You should now be able to log in and change the password
  • Argo uses an ApplicationSet to deploy everything in the kubernetes/apps folder
  • It also deploys Helm charts by "faking" a manifest: an umbrella chart that pulls the real chart in as a dependency (sketched below)
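
A minimal sketch of that umbrella-chart trick (the chart name, version and repo URL are placeholders):

# Chart.yaml
apiVersion: v2
name: myapp
version: 1.0.0
dependencies:
  - name: myapp-upstream-chart      # the real chart you want deployed
    version: 1.2.3
    repository: https://charts.example.com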

Allow containers with NET_ADMIN in default namespace

  • This allows containers like Gluetun to get net admin for WireGuard
# https://www.talos.dev/latest/kubernetes-guides/configuration/pod-security/
# This will allow containers in the "default" namespace to use NET_ADMIN
kubectl label ns default pod-security.kubernetes.io/enforce=privileged
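
With the namespace relaxed to privileged, a pod can then request the capability in its securityContext; an illustrative container snippet:

securityContext:
  capabilities:
    add:
      - NET_ADMIN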

Install Longhorn

  • Required for replicated local cluster storage
  • "But why not just use NFS or something?"
  • Because SQLite locks up and the DB can corrupt on NFS shares
  • Low storage latency is required for some apps to function properly too
  • Each node has 500GB disks and Longhorn uses that storage under /var/lib/longhorn
  • Read the docs
  • https://longhorn.io/docs/latest/advanced-resources/os-distro-specific/talos-linux-support/
  • Ensure the Talos controlplane.yml and worker.yml files are updated with the correct image and machine config as described in the docs
  • https://longhorn.io/docs/latest/deploy/install/install-with-kubectl/
  • The manifest in the kubernetes/manual/longhorn folder in my repo has the installer along with my tweaks
    • 1 replica of each volume per node
    • Prefer to schedule the storage on the node that the app is running on
# Install using manifest in my repo
kubectl apply -f kubernetes/manual/longhorn

# Also set the pod security for the namespace otherwise Longhorn won't start
kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged
  • Once the above is done, you should now have a reachable Longhorn web UI and a storage class
  • You want local storage, so Longhorn should keep 1 replica of a volume on each node
  • To set this, go into the Longhorn UI --> Settings --> General --> Replica Node Level Soft Anti-Affinity --> Tick enabled
  • This prevents Longhorn from doing something like "I need 4 replicas, so 2 will go on node 1 and 2 will go on node 2"
  • Also set Replica Auto Balance to best-effort to ensure even distribution of replicas across nodes
  • Then set numberOfReplicas in longhorn.yml to match the number of nodes in your cluster
  • Then set dataLocality: "best-effort" (see the sketch after this list)
  • These 2 settings ensure that at least 1 replica will be on each node
  • And best-effort means it will try to use the local replica
  • If you want to limit which nodes your replicas run on, label the nodes and edit the DaemonSet to select that label with a nodeSelector
  • Note that this will not apply to existing volumes - you need to fix them in the web UI. Longhorn is buggy, so if they fail to replicate, back them up, delete them, then restore them (or back up the files in the volume and restore another way)
  • The storage class can be treated like any other storage class, so just create your PVCs for your deployments using the Longhorn storage class
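
A sketch of what those settings look like in the storage class, plus a PVC that uses it (the replica count assumes a 4-node cluster like mine; the PVC name and size are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "4"          # match your node count
  dataLocality: "best-effort"    # prefer the replica on the app's node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi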

Longhorn backups via NFS

  • Longhorn can auto backup volumes to NFS
  • You absolutely 100% need this as the cluster could be trashed, but TrueNAS won't be
  • Go into the Longhorn UI --> Settings --> General --> Backup Target
  • Set the NFS backup location to nfs://10.10.3.2:/mnt/ssd/longhorn-backups
  • Then go to the "Recurring Job" page and create a recurring backup job that runs daily
  • Ensure you select the default tag to backup all volumes
  • Note that this does block level backups, not file level, so if you need to restore you must do it via Longhorn
  • I'm not sure I like this, so I may move to file-level backups using a cron job that mounts the volumes read-only and rsyncs them to my NAS
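
The recurring job can also be declared as a CR instead of clicking through the UI; a sketch (the schedule and retention are examples):

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"   # daily at 03:00
  task: backup
  groups:
    - default         # the default group covers all volumes
  retain: 7
  concurrency: 2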

Install cadvisor

  • Required to scrape container metrics
# Install Kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

# Deploy cadvisor
# https://github.com/google/cadvisor/releases
VERSION=v0.49.1
kustomize build "https://github.com/google/cadvisor/deploy/kubernetes/base?ref=${VERSION}" | kubectl apply -f -

# Allow pods to run as privileged
kubectl label ns cadvisor pod-security.kubernetes.io/enforce=privileged

Kubeseal config

kubeseal < path/to/secret.yml > path/to/sealedsecret.yml
  • Your sealed secret will be in the same folder and can be applied by Argo and committed to the repo
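
One way to produce secret.yml without ever applying the plaintext secret to the cluster (the secret name, namespace and value are placeholders):

kubectl create secret generic my-secret --namespace default --from-literal=password=hunter2 --dry-run=client -o yaml > path/to/secret.yml
kubeseal < path/to/secret.yml > path/to/sealedsecret.yml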

Backup Kubeseal cert

  • Kubeseal works by generating a key pair and storing it in the K8s cluster
  • The CLI grabs the kubeseal public key and encrypts secrets with it
  • If your cluster breaks, your sealedsecrets will not be readable as the private key is lost
  • To prevent this, backup the key
kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o yaml >main.key

echo "---" >> main.key
kubectl get secret -n kube-system sealed-secrets-key -o yaml >>main.key
  • Save main.key to KeePass under Kubeseal Private Key as an attachment
  • You can restore using this if required
# To restore: save main.key from KeePass back to a file, then apply it
kubectl apply -f main.key
# Restart the controller so it picks up the restored key
kubectl delete pod -n kube-system -l name=sealed-secrets-controller

GitHub Actions Runner Controller

  • Self-hosted GitHub runners in Kubes
# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller
# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/using-actions-runner-controller-runners-in-a-workflow
# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/authenticating-to-the-github-api#deploying-using-personal-access-token-classic-authentication

# Apply secret
kubectl create ns arc-systems
kubectl create ns arc-runners
kubectl apply -f kubernetes/manual/arc-runners/sealedsecret.yml

# Install the runner controller (watches for jobs)
NAMESPACE="arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Install the runner scale set (the containers that run jobs)
INSTALLATION_NAME="arc-runner-set"
NAMESPACE="arc-runners"
GITHUB_PAT="token"
helm upgrade "${INSTALLATION_NAME}" \
    --install \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    --set githubConfigSecret.github_token="${GITHUB_PAT}" \
    --values kubernetes/manual/arc-runners/values.yaml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
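
Once the scale set is up, workflows target it via runs-on using the installation name; a minimal illustrative workflow:

# .github/workflows/arc-test.yml
name: ARC test
on: workflow_dispatch
jobs:
  test:
    runs-on: arc-runner-set   # matches INSTALLATION_NAME above
    steps:
      - run: echo "Hello from an ARC runner"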

Namespace stuck terminating

# Set var
NAMESPACE=longhorn-system

# List all resources
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n $NAMESPACE

# Patch the matching CRDs and set their finalizers to null
# (the grep matches the namespace name; adjust the pattern if the CRDs are named differently, e.g. *.longhorn.io)
for crd in $(kubectl get crd -o name | grep $NAMESPACE); do kubectl patch $crd -p '{"metadata":{"finalizers":[]}}' --type=merge; done;

# Hard namespace kill
kubectl proxy &
kubectl get namespace $NAMESPACE -o json | jq '.spec = {"finalizers":[]}' > temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
rm temp.json
pkill -f "kubectl proxy"

Unifi Network Application MongoDB config

  • For Unifi's MongoDB to work, run the deployment as normal
  • Once the DB is up, exec into it
k exec -it unifi-db-xxxx -- mongosh
  • Now create the DB config
db.getSiblingDB("unifi").createUser({user: "unifi", pwd: "unifi", roles: [{role: "readWrite", db: "unifi"}]});
db.getSiblingDB("unifi_stat").createUser({user: "unifi", pwd: "unifi", roles: [{role: "readWrite", db: "unifi_stat"}]});
  • Once this is done, Unifi should just start
  • My initial attempt to restore my config backup from my old Unifi Controller didn't work - the upload just kept failing
  • You can run through the setup to create a new instance, then once you're logged in, go to the settings and restore
  • This restore also failed for me via ingress for some weird reason
  • To get around this I just did a k port-forward unifi-blah 8443 to access it
  • Restoring through that worked

Frigate basic auth

Force deployment to run on a specific node with node labels

kubectl label nodes p-h-k8s-04 frigate=allowed
kubectl label nodes p-h-k8s-04 immich=allowed
kubectl label nodes p-h-k8s-04 download=allowed
kubectl label nodes p-h-k8s-04 zabbixdb=allowed
  • Then use it in your deployment, e.g. for Frigate
spec:
  template:
    spec:
      nodeSelector:
        frigate: "allowed"

Zabbix monitoring

  • https://www.zabbix.com/integrations/kubernetes#kubernetes_http
  • My install is different from the docs
  • I've used Argo to deploy Zabbix without the Zabbix proxy - it's just the Zabbix server, web and DB, with a daemonset for the agents
  • A read-only service account is required for this (a sketch follows this list)
  • To get the service account for Zabbix, run the following command to create a token that lasts 1 year
k create token zabbix-readonly --duration=8766h
  • This token is used in the Zabbix UI with the following
    • Create a host entry for your control plane node(s)
    • Apply the templates as-per the docs
    • This will monitor pretty much everything it can find in the cluster
  • This also needs kube-state-metrics, which is already installed via the ApplicationSet
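
The service account itself isn't defined above; a minimal sketch binds a zabbix-readonly account to the built-in view cluster role (the namespace is an assumption, and the Zabbix docs ship a more precise custom role):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: zabbix-readonly
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zabbix-readonly
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: zabbix-readonly
    namespace: default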

Quick run krr in a container

  • Ensure you have Prometheus installed and that it's collected a few weeks of data so the recommendations are accurate
  • Run an Ubuntu container
k run --rm -it --image ubuntu:22.04 krr-temp -- bash
  • And then run this
# Install requirements
apt update && apt upgrade -y && apt install python3 python3-pip curl git gpg nano -y

# Clone and cd to repo
git clone https://github.com/robusta-dev/krr && cd krr

# Install pip requirements
pip install -r requirements.txt

# Copy your kubeconfig in
mkdir -p ~/.kube
nano ~/.kube/config

# Run krr and point at the prometheus helm chart service
python3 krr.py simple -p http://prometheus-server
