What is this?

This is the public fork of my home infra repo that I've been working on since 2024-04 (about 7 months so far)
This contains Terraform for Proxmox, Ansible to configure Ubuntu VMs and primarily Kubernetes using Talos Linux for apps
I've also got some self hosted runners, CCTV, lots of Ansible configs, etc
This will inevitably rot over time, so please note that today is 2024-11-16 and if this hasn't been updated in more than 6 months, you should take everything here with a large grain of salt
Also note that Terraform for Proxmox is currently broken because I can't be bothered to update the provider as I don't use it - Talos VMs are too quick and simple to create
I heavily use the K8s parts, so you'll find they're up to date and hopefully useful
Enjoy

GitHub Actions self-hosted runner

For this, I followed the GitHub Actions self-hosted runner installer for Linux
My runner is an Ubuntu 22.04 LXC container that I manually created on my Proxmox host
Note that the IP of the LXC container will need firewall access to the hosts you want to configure
I followed these docs to install the runner as a service
This runner is only really used for Terraform in Proxmox as Terraform Cloud has to be logged into manually

Terraform user account for Proxmox

You need a Terraform user account on your Proxmox host for it to work
Follow the docs for this
If you're using a self-hosted runner and Terraform Cloud, you need to do a terraform login on the runner
This isn't ideal and I haven't found a better solution yet
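
One thing that may be worth trying, assuming the Terraform CLI on the runner is 1.2 or newer: Terraform can read an API token from an environment variable named TF_TOKEN_<hostname>, with dots in the hostname replaced by underscores and hyphens by double underscores. The sketch below only derives that variable name. The TFC_TOKEN secret is hypothetical and is not one of the secrets used in this repo.

```shell
# Build the environment variable name Terraform checks for a given hostname.
# Hyphens become double underscores, dots become single underscores.
tf_token_var() {
  printf 'TF_TOKEN_%s\n' "$(printf '%s' "$1" | sed -e 's/-/__/g' -e 's/\./_/g')"
}

# Hypothetical usage in a workflow step (TFC_TOKEN would be a GitHub Actions secret):
# export "$(tf_token_var app.terraform.io)=${TFC_TOKEN}"
tf_token_var app.terraform.io
```

I haven't tested this against my own runner, so treat it as a starting point rather than a fix.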

Cloud init Ubuntu 22.04 image creation commands

Read the docs as always

wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
apt install guestfs-tools
virt-customize -a jammy-server-cloudimg-amd64.img --install qemu-guest-agent
qm create 9000 --name ci-template --memory 2048 --net0 virtio,bridge=vmbr2 --scsihw virtio-scsi-pci
qm set 9000 --scsi0 local-lvm:0,import-from=/root/jammy-server-cloudimg-amd64.img
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --boot order=scsi0
qm set 9000 --serial0 socket --vga serial0
qm template 9000
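
Once the template exists, creating a VM from it could look roughly like the sketch below. The VM ID, name, user and disk size are made-up values, and the script only prints the commands unless DRY_RUN=0 is set, so it is safe to run anywhere.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

TEMPLATE_ID=9000      # the template created above
NEW_ID=101            # hypothetical VM ID
NEW_NAME=ubuntu-test  # hypothetical VM name

# Full clone of the template, then set cloud-init user and networking
run qm clone "$TEMPLATE_ID" "$NEW_ID" --name "$NEW_NAME" --full
run qm set "$NEW_ID" --ciuser ubuntu --ipconfig0 ip=dhcp
# The cloud image disk is small, so grow it before first boot
run qm resize "$NEW_ID" scsi0 +20G
run qm start "$NEW_ID"
```

Run it with DRY_RUN=0 on the Proxmox host to actually create the VM.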

Required GitHub Actions secrets

Set the following secrets so your actions work

Secret	Description
ANSIBLE_VAULT_PASSWORD	Password to decrypt your Ansible Vault encrypted files
PM_API_URL	Proxmox API URL
PM_USER	Proxmox Terraform user
PM_PASSWORD	Proxmox Terraform user password
SSH_HOST_KEY	Ansible SSH private key, base64 encoded (the workflow also runs dos2unix on the decoded file)
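
A minimal sketch of how the SSH_HOST_KEY value could be produced and decoded again. The function names and the throwaway demo file are made up for illustration, and tr -d '\r' stands in for the dos2unix step; the real workflow in the repo may do this differently.

```shell
# Encode a key file as one base64 line, ready to paste into the secret.
encode_key() {
  base64 < "$1" | tr -d '\n'
}

# Decode the secret into a key file, stripping carriage returns
# and restricting permissions so SSH will accept it.
decode_key() {
  printf '%s' "$1" | base64 -d | tr -d '\r' > "$2"
  chmod 600 "$2"
}

# Demo with a throwaway file standing in for a real private key
demo_dir="$(mktemp -d)"
printf 'line one\r\nline two\r\n' > "$demo_dir/key"
decode_key "$(encode_key "$demo_dir/key")" "$demo_dir/key.out"
cat "$demo_dir/key.out"
```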

Create a new Ansible Vault secret

ansible-vault encrypt_string 'your text here'

Ansible command to patch Proxmox hosts

cd ansible
ansible-playbook -i hosts --vault-password-file vault-pass actions-playbooks/ansible-proxmox-patches.yml

Ansible command to configure parents server

cd ansible
ansible-playbook -i hosts --vault-password-file vault-pass actions-playbooks/ansible-parents-server.yml

Talos Linux on Proxmox for K8s

Talos Linux is used to run the K8s cluster
A custom Talos ISO is also used
See the docs for this
This is required to support the qemu guest agent and the required components for Longhorn

Talos Linux install

Go to https://factory.talos.dev/
Use the wizard to select the following
- siderolabs/qemu-guest-agent
- siderolabs/iscsi-tools
- siderolabs/util-linux-tools
This will give you an ID, such as 88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b
Download the metal ISO and store it on each Proxmox node
On each Proxmox node, create a new VM with, for example, 8 CPUs, 16GB of RAM and a 500GB disk (will also be used for Longhorn data)
Connect the VM NIC to your K8s network
Boot the installer and follow the docs to install
These were my commands

# Install Talosctl on your admin machine
curl -sL https://talos.dev/install | sh

# Define vars
mkdir talos && cd talos

export NODE1="10.10.3.11"
export NODE2="10.10.3.12"
export NODE3="10.10.3.13"
export NODE4="10.10.3.14"
export TALOSCONFIG="_out/talosconfig"

# Generate cluster config
talosctl gen config talos-proxmox-cluster https://$NODE1:6443 --output-dir _out

# Move Talos config to home folder path
mv _out/talosconfig ~/.talos/config

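# Create a copy of the kubeconfig in the _out folder (needed for backup)
# Note: this only works once the nodes are configured and the cluster is bootstrapped, so run it after the bootstrap step
talosctl kubeconfig _out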

# Edit the controlplane.yaml to allow control planes to be used as workers too
sed -i 's/# allowSchedulingOnControlPlanes: true/allowSchedulingOnControlPlanes: true/' _out/controlplane.yaml

# Edit the control plane and worker config install section to use the custom iso as per the docs
  install:
    disk: /dev/sda # The disk used for installations.
    image: factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2

# Once you've updated the worker and controlplane, update them in KeePass
# Ensure that you backup all files in the _out folder

# Set config endpoint for Talos
talosctl config endpoint $NODE1
talosctl config node $NODE1

# Apply config to each node (this will reboot each node)
talosctl apply-config --insecure --nodes $NODE1 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE2 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE3 --file _out/controlplane.yaml
talosctl apply-config --insecure --nodes $NODE4 --file _out/worker.yaml

# Bootstrap the cluster to start it using $NODE1
talosctl bootstrap

# Export kubeconfig to default location `~/.kube/config`
# This will use $NODE1 for connections as multiple are not supported
talosctl kubeconfig

Give it some time and your cluster should be up
Install kubectl on your admin machine
Configure alias k=kubectl
Check if you can see the nodes with k get nodes
You should see something like this

# k get nodes
NAME         STATUS   ROLES           AGE   VERSION
p-h-k8s-01   Ready    control-plane   38h   v1.29.3
p-h-k8s-02   Ready    control-plane   38h   v1.29.3
p-h-k8s-03   Ready    control-plane   20m   v1.29.3
p-h-k8s-04   Ready    worker          20m   v1.29.3

Important! Backup configs

Encrypt and backup all the files in the _out folder
These files are very important so don't lose them
They also contain secrets, so don't store them in Git

Upgrade Talos nodes

To upgrade a node from (for example) 1.7.0 to 1.7.4, re-install talosctl using the above link
Then go to the factory link above and build a new image (making sure to select the qemu agent and the other extensions)
Then run the upgrade 1 node at a time

talosctl upgrade --nodes $NODE1 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE2 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE3 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
talosctl upgrade --nodes $NODE4 --image factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.8.2
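
The four commands above can also be written as a loop, which makes it harder to forget to bump the version on one line. This is only a sketch: the variable and function names are mine, and it just prints the commands unless DRY_RUN=0 is set.

```shell
DRY_RUN="${DRY_RUN:-1}"
SCHEMATIC_ID="88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b"
TALOS_VERSION="v1.8.2"
NODES="10.10.3.11 10.10.3.12 10.10.3.13 10.10.3.14"

# Installer image is the factory schematic ID plus the Talos version tag
IMAGE="factory.talos.dev/installer/${SCHEMATIC_ID}:${TALOS_VERSION}"

upgrade_all() {
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "talosctl upgrade --nodes $node --image $IMAGE"
    else
      # Stop at the first failure so a bad image does not reach every node
      talosctl upgrade --nodes "$node" --image "$IMAGE" || return 1
    fi
  done
}

upgrade_all
```

Check that each node is back and Ready before moving on if you adapt this for real use.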

Kubernetes config

The below is all in logical order as if you're starting from scratch
Also see the API reference docs (these should be easier to find ffs)
I have a rule as well - anything that could break the cluster DOES NOT go in ArgoCD

Install Helm

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Install Metal LB

Required for ingress to work so you can connect to myapp.mydomain.com which is running in your cluster
Kubes is cloud native and expects an auto-provisioned external load balancer
This is THE project for running a load balancer in the cluster and it works extremely well

# https://metallb.universe.tf/installation/

# Install (but check for an update first)
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.8/config/manifests/metallb-native.yaml

# Apply configs (wait for pods to be ready first)
k apply -f kubernetes/manual/metal-lb
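
For reference, a MetalLB layer 2 config usually boils down to an address pool plus an advertisement. The sketch below writes an example manifest to a temp file. The pool name and IP range are assumptions, so pick a free range on your own K8s network; the real configs are in the kubernetes/manual/metal-lb folder.

```shell
metallb_manifest="$(mktemp)"

# Example pool and L2 advertisement (names and range are placeholders)
cat > "$metallb_manifest" <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.3.200-10.10.3.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
EOF

echo "Wrote $metallb_manifest"
# kubectl apply -f "$metallb_manifest"
```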

Install Nginx Ingress Controller

Required so myapp.mydomain.com can be routed to the correct K8s service

# https://kubernetes.github.io/ingress-nginx/deploy/#quick-start
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace

# Patch nginx to apply SSL passthrough to work for ArgoCD
kubectl -n ingress-nginx patch deployment/ingress-nginx-controller --patch-file kubernetes/manual/nginx-ingress-controller/enable-ssl.yml

Install cert-manager

Required for automated SSL certs for myapp.mydomain.com

# https://cert-manager.io/docs/installation/helm/
helm upgrade --install cert-manager cert-manager --repo https://charts.jetstack.io --namespace cert-manager --create-namespace --version v1.16.1 --set installCRDs=true

# See the repo for the secrets and issuer configs
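
If you just want to see the general shape of an issuer, here is a sketch of a Let's Encrypt staging ClusterIssuer using the HTTP-01 solver through the nginx ingress class. The issuer name, email and solver choice are assumptions and not necessarily what the repo uses.

```shell
issuer_manifest="$(mktemp)"

# Example ClusterIssuer against the Let's Encrypt staging endpoint
cat > "$issuer_manifest" <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
EOF

echo "Wrote $issuer_manifest"
# kubectl apply -f "$issuer_manifest"
```

Start with staging so you don't hit the production rate limits while testing.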

ArgoCD config

Required for automated Git CI/CD

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

# Edit the secret in the argo folder, then run
kubectl apply -f kubernetes/manual/argocd

Then get the admin login

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d && echo

You should now be able to log in and change the password
Argo uses an ApplicationSet to deploy everything in the kubernetes/apps folder
It also deploys Helm charts by "faking" a manifest that includes the actual helm chart as a dependency
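
To illustrate those two ideas, the sketch below writes an example ApplicationSet using the Git directory generator, and an example wrapper Chart.yaml that pulls in an upstream chart as a dependency. The repo URL, chart name, version and chart repository are placeholders, and the real manifests in the repo may differ.

```shell
appset_manifest="$(mktemp)"
chart_file="$(mktemp)"

# One Argo Application per folder under kubernetes/apps
cat > "$appset_manifest" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example/home-infra.git
        revision: HEAD
        directories:
          - path: kubernetes/apps/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/home-infra.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: default
      syncPolicy:
        automated: {}
EOF

# Wrapper chart: an app folder containing this Chart.yaml deploys the upstream chart
cat > "$chart_file" <<'EOF'
apiVersion: v2
name: my-app
version: 0.1.0
dependencies:
  - name: upstream-chart
    version: 1.2.3
    repository: https://charts.example.com
EOF

echo "Wrote $appset_manifest and $chart_file"
```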

Allow containers with NET_ADMIN in default namespace

This allows containers like Gluetun to get net admin for WireGuard

# https://www.talos.dev/latest/kubernetes-guides/configuration/pod-security/
# This will allow containers in the "default" namespace to use NET_ADMIN
kubectl label ns default pod-security.kubernetes.io/enforce=privileged
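
With the namespace labelled, a container can then request the capability in its securityContext. A minimal sketch follows; the pod name is hypothetical and a real Gluetun deployment needs more config than this.

```shell
pod_manifest="$(mktemp)"

# Example pod that asks for NET_ADMIN only, rather than running fully privileged
cat > "$pod_manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vpn-example
  namespace: default
spec:
  containers:
    - name: gluetun
      image: qmcgaw/gluetun
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
EOF

echo "Wrote $pod_manifest"
# kubectl apply -f "$pod_manifest"
```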

Install Longhorn

Required for replicated local cluster storage
"But why not just use NFS or something?"
Because SQLite locks up and the DB can corrupt on NFS shares
Low storage latency is required for some apps to function properly too
Each node has 500GB disks and Longhorn uses that storage under /var/lib/longhorn
Read the docs
https://longhorn.io/docs/latest/advanced-resources/os-distro-specific/talos-linux-support/
Ensure the Talos controlplane.yml and worker.yml files are updated with the correct image and machine config as described in the docs
https://longhorn.io/docs/latest/deploy/install/install-with-kubectl/
The manifest in the kubernetes/manual/longhorn folder in my repo has the installer along with my tweaks
- 1 replica of each volume per node
- Prefer to schedule the storage on the node that the app is running on

# Install using manifest in my repo
kubectl apply -f kubernetes/manual/longhorn

# Also set the pod security for the namespace otherwise Longhorn won't start
kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged

Once the above is done, you should now have a reachable Longhorn web UI and a storage class
You want local storage, so Longhorn should keep 1 replica of a volume on each node
To set this, go into the Longhorn UI --> Settings --> General --> Backup Target --> Replica Node Level Soft Anti-Affinity --> Tick enabled
This prevents Longhorn from doing something like "I need 4 replicas so 2 will go on node 1 and 2 will go on node 2"
Also set Replica Auto Balance to best-effort to ensure even distribution of replicas across nodes
Then modify longhorn.yml and numberOfReplicas to match the number of nodes in your cluster
Then set dataLocality: "best-effort"
These 2 settings ensure that at least 1 replica will be on each node
And best-effort means it will try to use the local replica
If you want to limit what nodes your replicas will run on, you can label the nodes and then edit the daemonset deployment to node select your label
Note that this will not apply to existing volumes - you need to fix them in the web UI (Longhorn is buggy, so if they fail to replicate, back them up, delete them then restore them, or backup the files in the volume and restore another way)
The storage class can be treated like any other storage class, so just create your PVCs for your deployments using the Longhorn storage class
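
As a rough illustration of the settings above, the sketch below writes an example storage class with numberOfReplicas and dataLocality set, plus a PVC that uses it. The names and sizes are made up, and the replica count assumes a 4 node cluster; the actual manifest is in kubernetes/manual/longhorn.

```shell
longhorn_manifest="$(mktemp)"

# Example storage class (one replica per node on a 4 node cluster) and a PVC using it
cat > "$longhorn_manifest" <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "4"
  dataLocality: "best-effort"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-local
  resources:
    requests:
      storage: 5Gi
EOF

echo "Wrote $longhorn_manifest"
# kubectl apply -f "$longhorn_manifest"
```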

Longhorn backups via NFS

Longhorn can auto backup volumes to NFS
You absolutely 100% need this as the cluster could be trashed, but TrueNAS won't be
Go into the Longhorn UI --> Settings --> General --> Backup Target
Set the NFS backup location to nfs://10.10.3.2:/mnt/ssd/longhorn-backups
Then go to the "Recurring Job" page and create a recurring backup job that runs daily
Ensure you select the default tag to backup all volumes
Note that this does block level backups, not file level, so if you need to restore you must do it via Longhorn
Not sure if I like this so I will consider file level backups using a cron job that mounts the volumes as read only and does an rsync to my NAS
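
The recurring job can also be defined as a manifest instead of through the UI, which keeps it in Git. This is a sketch assuming a Longhorn version that ships the longhorn.io/v1beta2 RecurringJob resource; the job name, schedule and retention count are example values.

```shell
backup_job_manifest="$(mktemp)"

# Example daily backup at 03:00 for every volume in the default group
cat > "$backup_job_manifest" <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"
  task: backup
  groups:
    - default
  retain: 7
  concurrency: 2
EOF

echo "Wrote $backup_job_manifest"
# kubectl apply -f "$backup_job_manifest"
```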

Install cadvisor

Required to scrape container metrics

# Install Kustomize
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

# Deploy cadvisor
# https://github.com/google/cadvisor/releases
VERSION=v0.49.1
kustomize build "https://github.com/google/cadvisor/deploy/kubernetes/base?ref=${VERSION}" | kubectl apply -f -

# Allow pods to run as privileged
kubectl label ns cadvisor pod-security.kubernetes.io/enforce=privileged

Kubeseal config

https://github.com/bitnami-labs/sealed-secrets?tab=readme-ov-file#installation
https://github.com/bitnami-labs/sealed-secrets/releases
Install the controller with ArgoCD
Install the CLI from the docs
Create your secret.yml as you normally would, but don't commit it to the repo
Encrypt your secrets using kubeseal

kubeseal < path/to/secret.yml > path/to/sealedsecret.yml

Your sealed secret will be in the same folder and can be applied by Argo and committed to the repo
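
As a quick example of the input side, the sketch below writes a plain Secret manifest to a temp file. The secret name and values are placeholders. Set the namespace before sealing, because by default a sealed secret is tied to the name and namespace it was sealed for.

```shell
secret_file="$(mktemp)"

# Plain secret, never commit this file
cat > "$secret_file" <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: myapp-credentials
  namespace: default
type: Opaque
stringData:
  username: admin
  password: change-me
EOF

echo "Wrote $secret_file"
# Seal it (needs access to the cluster); --format yaml gives YAML instead of the default JSON
# kubeseal --format yaml < "$secret_file" > sealedsecret.yml
```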

Backup Kubeseal cert

Kubeseal works by generating a key pair and storing it in the K8s cluster
The CLI grabs the kubeseal public key and encrypts secrets with it
If your cluster breaks, your sealedsecrets will not be readable as the private key is lost
To prevent this, backup the key

kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o yaml >main.key

echo "---" >> main.key
kubectl get secret -n kube-system sealed-secrets-key -o yaml >>main.key

Save main.key to KeePass under Kubeseal Private Key as an attachment
You can restore using this if required

# Save main.key to a file
kubectl apply -f main.key
kubectl delete pod -n kube-system -l name=sealed-secrets-controller

GitHub Actions Runner Controller

Self hosted GitHub runners in Kubes

# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller
# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/using-actions-runner-controller-runners-in-a-workflow
# https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/authenticating-to-the-github-api#deploying-using-personal-access-token-classic-authentication

# Apply secret
kubectl create ns arc-systems
kubectl create ns arc-runners
kubectl apply -f kubernetes/manual/arc-runners/sealedsecret.yml

# Install the runner controller (watches for jobs)
NAMESPACE="arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Install the runner scale set (the containers that run jobs)
INSTALLATION_NAME="arc-runner-set"
NAMESPACE="arc-runners"
GITHUB_PAT="token"
helm upgrade "${INSTALLATION_NAME}" \
    --install \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    --set githubConfigSecret.github_token="${GITHUB_PAT}" \
    --values kubernetes/manual/arc-runners/values.yaml \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Useful issues page on uninstall if you need it, but just try a k delete ns <namespace> --force first
actions/actions-runner-controller#2781

Namespace stuck terminating

Do you have a namespace that just says terminating and it won't die?
It's probably because of the CRDs
First, check the docs of the thing you're trying to remove
Failing that, you should be able to clean them up with the below commands
These 2 articles were useful for Longhorn
https://avasdream.engineer/kubernetes-longhorn-stuck-terminating
https://longhorn.io/docs/latest/deploy/uninstall/#uninstalling-longhorn-using-kubectl

# Set var
NAMESPACE=longhorn-system

# List all resources
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n $NAMESPACE

# Patch all CRDs whose name matches the namespace and set their finalizers to an empty list
for crd in $(kubectl get crd -o name | grep $NAMESPACE); do kubectl patch $crd -p '{"metadata":{"finalizers":[]}}' --type=merge; done;

# Hard namespace kill
kubectl proxy &
kubectl get namespace $NAMESPACE -o json | jq '.spec = {"finalizers":[]}' > temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
rm temp.json
pkill -f "kubectl proxy"

Unifi Network Application MongoDB config

For Unifi's MongoDB to work, run the deployment as normal
Once the DB is up, exec into it

k exec -it unifi-db-xxxx -- mongosh

Now create the DB config

db.getSiblingDB("unifi").createUser({user: "unifi", pwd: "unifi", roles: [{role: "readWrite", db: "unifi"}]});
db.getSiblingDB("unifi_stat").createUser({user: "unifi", pwd: "unifi", roles: [{role: "readWrite", db: "unifi_stat"}]});

Once this is done, Unifi should just start
My initial attempt to restore my config backup from my old Unifi Controller failed as the upload just kept failing
You can run through the setup to create a new instance, then once you're logged in you can go to the settings and restore
This restore also failed for me via ingress for some weird reason
To get around this I just did a k port-forward unifi-blah 8443 to access it
Restoring through that worked

Frigate basic auth

Useful docs on basic auth from Kubes

Force deployment to run on a specific node with node labels

First you need to label the node

kubectl label nodes p-h-k8s-04 frigate=allowed
kubectl label nodes p-h-k8s-04 immich=allowed
kubectl label nodes p-h-k8s-04 download=allowed
kubectl label nodes p-h-k8s-04 zabbixdb=allowed

Then use it in your deployment

spec:
  template:
    spec:
      nodeSelector:
        key: value
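
Using the frigate=allowed label from above, a full deployment could look something like this sketch. The deployment name, app label and image tag are example values and the real Frigate deployment needs a lot more config.

```shell
deploy_manifest="$(mktemp)"

# Example deployment pinned to nodes labelled frigate=allowed
cat > "$deploy_manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frigate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frigate
  template:
    metadata:
      labels:
        app: frigate
    spec:
      nodeSelector:
        frigate: allowed
      containers:
        - name: frigate
          image: ghcr.io/blakeblackshear/frigate:stable
EOF

echo "Wrote $deploy_manifest"
# kubectl apply -f "$deploy_manifest"
```

If no node carries the label, the pod will sit in Pending until one does.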

Zabbix monitoring

https://www.zabbix.com/integrations/kubernetes#kubernetes_http
My install is different to the docs
I've used Argo to deploy Zabbix without the Zabbix proxy - it's just the Zabbix server, web and DB with a daemonset for the agents
There's a service account that's required with this
To get the service account for Zabbix, run the following command to create a token that lasts 1 year

k create token zabbix-readonly --duration=8766h

This token is used in the Zabbix UI with the following
- Create a host entry for your control plane node(s)
- Apply the templates as per the docs
- This will monitor pretty much everything it can find in the cluster
This also needs the kube-state-metrics which is already installed via the ApplicationSet controller

Quick run krr in a container

Ensure you have Prom installed and it's collected a few weeks of data to be accurate
Run an Ubuntu container

k run --rm -it --image ubuntu:22.04 krr-temp -- bash

And then run this

# Install requirements
apt update && apt upgrade -y && apt install python3 python3-pip curl git gpg nano -y

# Clone and cd to repo
git clone https://github.com/robusta-dev/krr && cd krr

# Install pip requirements
pip install -r requirements.txt

# Copy your kubeconfig in
mkdir -p ~/.kube/
nano ~/.kube/config

# Run krr and point at the prometheus helm chart service
python3 krr.py simple -p http://prometheus-server

USBAkimbo/public-home-infra