

Updating doc with architecture changes
ricsanfre committed Feb 3, 2024
1 parent 44c2752 commit e6a0008
Showing 12 changed files with 198 additions and 170 deletions.
249 changes: 135 additions & 114 deletions design/picluster-architecture.drawio

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions docs/_docs/ansible-instructions.md
@@ -2,7 +2,7 @@
title: Quick Start Instructions
permalink: /docs/ansible/
description: Quick Start guide to deploy our Raspberry Pi Kuberentes Cluster using cloud-init, ansible playbooks and ArgoCD
last_modified_at: "24-06-2023"
last_modified_at: "06-11-2023"
---

These are the instructions to quickly deploy the Kubernetes Pi-cluster using the following tools:
@@ -115,9 +115,9 @@ Ansible Playbook used for doing the basic OS configuration (`setup_picluster.yml
<br>
LVM configuration is done by the `setup_picluster.yml` Ansible playbook and the variables used in the configuration can be found in `vars/centralized_san/centralized_san_target.yml`: `storage_volumegroups` and `storage_volumes` variables. Sizes of the different LUNs can be tweaked to fit the size of the SSD disk used. I used a 480GB disk, so I was able to create LUNs of 100GB for each of the nodes.
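As a rough illustration of what those two variables express (the structure, names and sizes below are assumptions for this sketch, not the repository's actual schema — check `vars/centralized_san/centralized_san_target.yml` for the real content):

```yml
# Hypothetical sketch: one volume group on the SSD, one LUN-backing volume per node.
# Names, devices and sizes are illustrative only.
storage_volumegroups:
  - name: vg_iscsi
    pvs: /dev/sda3          # assumed data partition of the 480GB SSD

storage_volumes:
  - name: lun_node1
    vg: vg_iscsi
    size: 100G              # tweak to fit the SSD size
  - name: lun_node2
    vg: vg_iscsi
    size: 100G
  # ... one entry per cluster node
```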

- **Dedicated disks** setup assumes that all cluster nodes (`node1-5`) have a SSD disk attached that has been partitioned during server first boot (part of the cloud-init configuration) reserving 30Gb for the root partition and the rest of available disk for creating a Linux partition mounted as `/storage`
- **Dedicated disks** setup assumes that all cluster nodes (`node1-6`) have a SSD disk attached that has been partitioned during server first boot (part of the cloud-init configuration) reserving 30Gb for the root partition and the rest of available disk for creating a Linux partition mounted as `/storage`

Final `node1-5` disk configuration is:
Final `node1-6` disk configuration is:

- /dev/sda1: Boot partition
- /dev/sda2: Root filesystem
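The formatting and mounting of that storage partition can be sketched in cloud-init terms as below. This is only an illustration (not the repository's actual `user-data`); the `/dev/sda3` device and the ext4 filesystem are assumptions, and the creation/resizing of the partitions themselves happens earlier in the first-boot sequence.

```yml
#cloud-config
# Illustrative sketch: format the remaining SSD space and mount it as /storage.
# Assumed layout: /dev/sda1 boot, /dev/sda2 root (~30GB), /dev/sda3 rest of the disk.
fs_setup:
  - label: storage
    filesystem: ext4
    device: /dev/sda3
    overwrite: false
mounts:
  - [ /dev/sda3, /storage, ext4, "defaults,nofail", "0", "0" ]
```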
@@ -219,7 +219,7 @@ Once `gateway` is up and running the rest of the nodes can be installed and conn

#### Install Raspberry PI nodes

Install Operating System on Raspberry Pi nodes `node1-5`
Install Operating System on Raspberry Pi nodes `node1-6`

Follow the installation procedure indicated in ["Ubuntu OS Installation"](/docs/ubuntu/rpi/) using the corresponding cloud-init configuration files (`user-data` and `network-config`) depending on the storage setup selected. Since DHCP is used, there is no need to change the default `/boot/network-config` file located in the Ubuntu image.

@@ -230,7 +230,7 @@ Follow the installation procedure indicated in ["Ubuntu OS Installation"](/docs/
{: .table .table-white .border-dark }


In the above user-data files, the `hostname` field needs to be changed for each node (node1-node5).
In the above user-data files, the `hostname` field needs to be changed for each node (node1-node6).
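For reference, the only per-node difference is the cloud-config `hostname` value; a minimal sketch (the rest of the real `user-data` content is omitted here):

```yml
#cloud-config
# Illustrative: only the field that changes per node is shown
hostname: node1       # use node2 ... node6 on the other nodes
```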


{{site.data.alerts.warning}}**About SSH keys**
Expand Down Expand Up @@ -285,7 +285,7 @@ All Ansible vault credentials (vault.yml) are also stored in Hashicorp Vault

## Configuring OS level backup (restic)

Automate backup tasks at OS level with restic in all nodes (`node1-node5` and `gateway`) running the command:
Automate backup tasks at OS level with restic in all nodes (`node1-node6` and `gateway`) running the command:

```shell
make configure-os-backup
31 changes: 17 additions & 14 deletions docs/_docs/architecture.md
@@ -2,7 +2,7 @@
title: Lab Architecture
permalink: /docs/architecture/
description: Homelab architecture of our Pi Kuberentes cluster. Cluster nodes, firewall, and Ansible control node. Networking and cluster storage design.
last_modified_at: "02-07-2023"
last_modified_at: "03-02-2024"
---


@@ -12,14 +12,12 @@ The home lab I am building is shown in the following picture


A K3S cluster is composed of the following **cluster nodes**:
- 3 master nodes (`node1`, `node2` and `node3`), running on Raspberry Pi 4B (4GB)
- 3 master nodes (`node2`, `node3` and `node4`), running on Raspberry Pi 4B (4GB)
- 5 worker nodes:
- `node4` running on Raspberry Pi 4B (4GB)
- `node5` running on Raspberry Pi 4B (8GB)
- `node5` and `node6` running on Raspberry Pi 4B (8GB)
- `node-hp-1`, `node-hp-2` and `node-hp-3` running on HP Elitedesk 800 G3 (16GB)


A couple of **LAN switches** (8 Gigabit ports + 5 Gigabit ports) used to provide L2 connectivity to the cluster nodes. L3 connectivity and internet access is provided by a router/firewall (`gateway`) running on Raspberry Pi 4B (2GB).
A couple of **LAN switches** (8 Gigabit ports + 5 Gigabit ports) used to provide L2 connectivity to the cluster nodes. L3 connectivity and internet access is provided by a router/firewall (`gateway`) running on Raspberry Pi 4B (2GB).

`gateway`, **cluster firewall/router**, is connected to the LAN switch using its Gigabit Ethernet port. It is also connected to my home network using its WIFI interface, so it can route and filter traffic coming in/out of the cluster. With this architecture my lab network can be isolated from my home network.
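As a rough sketch of what that routing/filtering role involves (illustrative commands only — the repository automates this with Ansible, and the interface roles are assumed: `wlan0` home-network uplink, `eth0` lab LAN):

```shell
# Illustrative only: enable packet forwarding and NAT traffic leaving through the WiFi uplink
sudo sysctl -w net.ipv4.ip_forward=1
sudo nft add table ip nat
sudo nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; policy accept ; }'
sudo nft add rule ip nat postrouting oifname "wlan0" masquerade
```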

@@ -29,7 +27,12 @@ A K3S cluster is composed of the following **cluster nodes**:
- NTP
- DHCP

A load balancer is needed for providing high availability to the Kubernetes API. In this case a network load balancer, [HAProxy](https://www.haproxy.org/), will be deployed in the `gateway` server.
`node1`, running on Raspberry Pi 4B (4GB), for providing **kubernetes external services**:
- Secret Management (Vault)
- Kubernetes API Load Balancer
- Backup server

A load balancer is needed for providing high availability to the Kubernetes API. In this case a network load balancer, [HAProxy](https://www.haproxy.org/), will be deployed in the `node1` server.
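A minimal sketch of that Layer-4 load balancer configuration is shown below. This is not the project's generated `haproxy.cfg`; the master node IP addresses are assumed from the cluster addressing used elsewhere in these docs.

```
# Illustrative haproxy.cfg fragment: TCP load balancing of the Kubernetes API (port 6443)
frontend k8s_api
    bind :6443
    mode tcp
    default_backend k8s_masters

backend k8s_masters
    mode tcp
    balance roundrobin
    option tcp-check
    server node2 10.0.0.12:6443 check
    server node3 10.0.0.13:6443 check
    server node4 10.0.0.14:6443 check
```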

For automating the OS installation of x86 nodes, a **PXE server** will be deployed in `gateway` node.
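Since dnsmasq already provides DHCP/DNS in this setup, the PXE piece can be sketched as a few extra dnsmasq options (illustrative only; the real PXE setup also needs the installer files served over TFTP/HTTP, and UEFI clients need a different boot file than the legacy example below):

```
# Illustrative dnsmasq options for PXE booting the x86 nodes
enable-tftp
tftp-root=/srv/tftp
dhcp-boot=pxelinux.0
```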

@@ -56,17 +59,17 @@ For building the cluster, using bare metal servers instead of virtual machines,
I have used the following hardware components to assemble Raspberry PI components of the cluster.

- [4 x Raspberry Pi 4 - Model B (4 GB)](https://www.tiendatec.es/raspberry-pi/gama-raspberry-pi/1100-raspberry-pi-4-modelo-b-4gb-765756931182.html) and [1 x Raspberry Pi 4 - Model B (8 GB)](https://www.tiendatec.es/raspberry-pi/gama-raspberry-pi/1231-raspberry-pi-4-modelo-b-8gb-765756931199.html) as ARM-based cluster nodes (1 master node and 5 worker nodes).
- [1 x Raspberry Pi 4 - Model B (2 GB)](https://www.tiendatec.es/raspberry-pi/gama-raspberry-pi/1099-raspberry-pi-4-modelo-b-2gb-765756931175.html) as router/firewall for the lab environment connected via wifi to my home network and securing the access to my lab network.
- [2 x Raspberry Pi 4 - Model B (2 GB)](https://www.tiendatec.es/raspberry-pi/gama-raspberry-pi/1099-raspberry-pi-4-modelo-b-2gb-765756931175.html) as router/firewall for the lab environment connected via wifi to my home network and securing the access to my lab network.
- [4 x SanDisk Ultra 32 GB microSDHC Memory Cards](https://www.amazon.es/SanDisk-SDSQUA4-064G-GN6MA-microSDXC-Adaptador-Rendimiento-dp-B08GY9NYRM/dp/B08GY9NYRM) (Class 10) for installing Raspberry Pi OS for enabling booting from USB (update Raspberry PI firmware and modify USB partition)
- [4 x Samsung USB 3.1 32 GB Fit Plus Flash Disk](https://www.amazon.es/Samsung-FIT-Plus-Memoria-MUF-32AB/dp/B07HPWKS3C)
- [1 x Kingston A400 SSD Disk 480GB](https://www.amazon.es/Kingston-SSD-A400-Disco-s%C3%B3lido/dp/B01N0TQPQB)
- [4 x Kingston A400 SSD Disk 240GB](https://www.amazon.es/Kingston-SSD-A400-Disco-s%C3%B3lido/dp/B01N5IB20Q)
- [5 x Startech USB 3.0 to SATA III Adapter](https://www.amazon.es/Startech-USB3S2SAT3CB-Adaptador-3-0-2-5-negro/dp/B00HJZJI84) for connecting SSD disk to USB 3.0 ports.
- [5 x Kingston A400 SSD Disk 240GB](https://www.amazon.es/Kingston-SSD-A400-Disco-s%C3%B3lido/dp/B01N5IB20Q)
- [6 x Startech USB 3.0 to SATA III Adapter](https://www.amazon.es/Startech-USB3S2SAT3CB-Adaptador-3-0-2-5-negro/dp/B00HJZJI84) for connecting SSD disk to USB 3.0 ports.
- [1 x GeeekPi Pi Rack Case](https://www.amazon.es/GeeekPi-Raspberry-Ventilador-refrigeraci%C3%B3n-disipador/dp/B07Z4GRQGH/ref=sr_1_11). It comes with a stack for 4 x Raspberry Pi’s, plus heatsinks and fans.
- [1 x SSD Rack Case](https://www.aliexpress.com/i/33008511822.html)
- [1 x ANIDEES AI CHARGER 6+](https://www.tiendatec.es/raspberry-pi/raspberry-pi-alimentacion/796-anidees-ai-charger-6-cargador-usb-6-puertos-5v-60w-12a-raspberry-pi-4712909320214.html). 6 port USB power supply (60 W and max 12 A)
- [1 x ANKER USB Charging Hub](https://www.amazon.es/Anker-Cargador-USB-6-Puertos/dp/B00PTLSH9G/). 6 port USB power supply (60 w and max 12 A)
- [6 x USB-C charging cable with ON/OFF switch](https://www.aliexpress.com/item/33049198504.html).
- [7 x USB-C charging cable with ON/OFF switch](https://www.aliexpress.com/item/33049198504.html).


#### x86-based old refurbished mini PC
@@ -127,7 +130,7 @@ x86 mini PCs has their own integrated disk (SSD disk or NVME). For Raspberry PIs

`gateway` uses local storage attached directly to USB 3.0 port (Flash Disk) for hosting the OS, avoiding the use of less reliable SDCards.

For having better cluster performance `node1-node5` will use SSDs attached to USB 3.0 port. SSD disk will be used to host OS (boot from USB) and to provide the additional storage required per node for deploying the Kubernetes distributed storage solution (Ceph or Longhorn).
For having better cluster performance `node1-node6` will use SSDs attached to USB 3.0 port. SSD disk will be used to host OS (boot from USB) and to provide the additional storage required per node for deploying the Kubernetes distributed storage solution (Ceph or Longhorn).

![pi-cluster-HW-2.0](/assets/img/pi-cluster-2.0.png)

@@ -136,11 +139,11 @@

As a cheaper alternative architecture, instead of using dedicated SSD disks for each cluster node, a single SSD disk can be used to configure a SAN service.

Each cluster node `node1-node5` can use local storage attached directly to USB 3.0 port (USB Flash Disk) for hosting the OS, avoiding the use of less reliable SDCards.
Each cluster node `node1-node6` can use local storage attached directly to USB 3.0 port (USB Flash Disk) for hosting the OS, avoiding the use of less reliable SDCards.

As additional storage (required by the distributed storage solution), an iSCSI SAN can be deployed instead of attaching additional USB Flash Disks to each of the nodes.

A SAN (Storage Access Network) can be configured using `gateway` as iSCSI Storage Server, providing additional storage (LUNs) to `node1-node5`.
A SAN (Storage Access Network) can be configured using `gateway` as iSCSI Storage Server, providing additional storage (LUNs) to `node1-node6`.

As storage device, an SSD disk was attached to the `gateway` node. This SSD disk was used as well to host the OS.

4 changes: 2 additions & 2 deletions docs/_docs/backup.md
@@ -465,7 +465,7 @@ Velero CLI need to be installed joinly with kubectl. `velero` uses kubectl confi
{{site.data.alerts.important}} k3s config file is located in `/etc/rancher/k3s/k3s.yaml` and it needs to be copied into `$HOME/kube/config` in the server where `kubectl` and `velero` are going to be executed.
{{site.data.alerts.end}}

This will be installed in `node1`
This will be installed in `pimaster`

- Step 1: Download latest stable velero release from https://github.com/vmware-tanzu/velero/releases
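A hedged sketch of that manual installation (the version and architecture below are placeholders, not pinned recommendations):

```shell
# Illustrative only: install the velero CLI on the control node
VERSION=v1.12.1            # placeholder: use the latest stable release
ARCH=linux-amd64           # placeholder: match the control node architecture
wget https://github.com/vmware-tanzu/velero/releases/download/${VERSION}/velero-${VERSION}-${ARCH}.tar.gz
tar -xzf velero-${VERSION}-${ARCH}.tar.gz
sudo cp velero-${VERSION}-${ARCH}/velero /usr/local/bin/velero
velero version --client-only
```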

Expand Down Expand Up @@ -696,7 +696,7 @@ Installation using `Helm` (Release 3):

#### GitOps installation (ArgoCD)

As alternative, for GitOps deployment (ArgoCD), instead of putting minio credentiasl into helm values in plain text, a Secret can be used to store the credentials.
As alternative, for GitOps deployment (ArgoCD), instead of putting minio credentials into helm values in plain text, a Secret can be used to store the credentials.

```yml
apiVersion: v1
2 changes: 1 addition & 1 deletion docs/_docs/basic-os-configuration.md
@@ -52,7 +52,7 @@ Raspberry PI does not have by default a RTC (real-time clock) keeping the time w
Even when NTP is used to synchronize the time and date, at boot the system takes as current time the time of the first installation, which could cause problems at boot time when the OS detects that a mount point was created in the future and asks for manual execution of fsck.

{{site.data.alerts.note}}
I have detected this behaviour with my Raspberry PIs when mounting the iSCSI LUNs in `node1-node5` and after rebooting the server, the server never comes up.
I have detected this behaviour with my Raspberry PIs when mounting the iSCSI LUNs in `node1-node6` and after rebooting the server, the server never comes up.
{{site.data.alerts.end}}

As a side effect, the NTP synchronization will also take longer since NTP adjusts the time in small steps.
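One common mitigation is to let chrony step the clock at startup instead of slewing it slowly; a sketch of the relevant `/etc/chrony/chrony.conf` directive (illustrative, not necessarily the configuration applied by this project):

```
# Step the clock if the offset is larger than 1 second, but only during the first 3 updates after boot
makestep 1.0 3
```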
13 changes: 9 additions & 4 deletions docs/_docs/gateway.md
@@ -2,15 +2,15 @@
title: Cluster Gateway
permalink: /docs/gateway/
description: How to configure a Raspberry Pi as router/firewall of our Kubernetes Cluster providing connectivity and basic services (DNS, DHCP, NTP, SAN).
last_modified_at: "18-06-2023"
last_modified_at: "03-02-2024"
---

One of the Raspberry Pi (2GB), **gateway**, is used as router and firewall for the home lab, isolating the Raspberry Pi cluster from my home network.
It will also provide DNS, NTP and DHCP services to my lab network. In case of using the centralized SAN storage architectural option, `gateway` also provides SAN services.

This Raspberry Pi (gateway) is connected to my home network using its WIFI interface (wlan0) and to the LAN Switch using the eth interface (eth0).

In order to ease the automation with Ansible, OS installed on **gateway** is the same as the one installed in the nodes of the cluster (**node1-node5**): Ubuntu 22.04 64 bits.
In order to ease the automation with Ansible, OS installed on **gateway** is the same as the one installed in the nodes of the cluster: Ubuntu 22.04 64 bits.


## Storage Configuration
@@ -529,6 +529,7 @@ For automating configuration tasks, ansible role [**ricsanfre.dnsmasq**](https:/
dhcp-host=e4:5f:01:2f:49:05,10.0.0.13
dhcp-host=e4:5f:01:2f:54:82,10.0.0.14
dhcp-host=e4:5f:01:d9:ec:5c,10.0.0.15
dhcp-host=d8:3a:dd:0d:be:c8,10.0.0.16
# Adding additional DHCP hosts
# Ethernet Switch
@@ -542,6 +543,7 @@ For automating configuration tasks, ansible role [**ricsanfre.dnsmasq**](https:/
host-record=node3.picluster.ricsanfre.com,10.0.0.13
host-record=node4.picluster.ricsanfre.com,10.0.0.14
host-record=node5.picluster.ricsanfre.com,10.0.0.15
host-record=node6.picluster.ricsanfre.com,10.0.0.16
# Adding additional DNS
# NTP Server
@@ -554,16 +556,19 @@

Additional DNS records can be added for the different services exposed by the cluster. For example:

- S3 service DNS name pointing to `node1`
- S3/Vault service DNS name pointing to `node1`
```
# S3 Server
host-record=s3.picluster.ricsanfre.com,10.0.0.11
# Vault server
host-record=vault.picluster.ricsanfre.com,10.0.0.11
```
- Monitoring DNS service pointing to Ingress Controller IP address (from MetalLB pool)
```
# Monitoring
host-record=monitoring.picluster.ricsanfre.com,10.0.0.100
```

{{site.data.alerts.end}}

- Step 3. Restart dnsmasq service
@@ -743,7 +748,7 @@ Check time synchronization with Chronyc
## iSCSI configuration. Centralized SAN
`gateway` has to be configured as iSCSI Target to export LUNs mounted by `node1-node5`
`gateway` has to be configured as iSCSI Target to export LUNs mounted by `node1-node6`
iSCSI configuration in `gateway` has been automated by developing a couple of Ansible roles: **ricsanfre.storage** for managing LVM and **ricsanfre.iscsi_target** for configuring an iSCSI target.
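Roughly, the storage part of that automation boils down to creating a volume group on the SSD and one logical volume per node to back each exported LUN. The commands below are only a hand-made illustration of the idea (device name, VG/LV names and sizes are assumptions; the Ansible roles are the source of truth):

```shell
# Illustrative only: LVM layout backing the iSCSI LUNs exported by gateway
sudo pvcreate /dev/sda3                          # assumed data partition of gateway's SSD
sudo vgcreate vg_iscsi /dev/sda3
for node in node1 node2 node3 node4 node5 node6; do
  sudo lvcreate -L 60G -n "lv_${node}" vg_iscsi  # size per LUN depends on SSD capacity
done
```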
4 changes: 2 additions & 2 deletions docs/_docs/index.md
@@ -10,7 +10,7 @@ last_modified_at: "21-01-2024"
## Scope
The scope of this project is to create a kubernetes cluster at home using ARM/x86 bare metal nodes (**Raspberry Pis** and low cost refurbished **mini PCs**) and to automate its deployment and configuration applying **IaC (infrastructure as a code)** and **GitOps** methodologies with tools like [Ansible](https://docs.ansible.com/), [cloud-init](https://cloudinit.readthedocs.io/en/latest/) and [Argo CD](https://argo-cd.readthedocs.io/en/stable/).

As part of the project, the goal is to use a lightweight Kubernetes flavor based on [K3S](https://k3s.io/) and deploy cluster basic services such as: 1) distributed block storage for POD's persistent volumes, [LongHorn](https://longhorn.io/), 2) backup/restore solution for the cluster, [Velero](https://velero.io/) and [Restic](https://restic.net/), 3) service mesh architecture, [Linkerd](https://linkerd.io/), and 4) observability platform based on metrics monitoring solution, [Prometheus](https://prometheus.io/), logging and analytics solution, EFḰ+LG stack ([Elasticsearch](https://www.elastic.co/elasticsearch/)-[Fluentd](https://www.fluentd.org/)/[Fluentbit](https://fluentbit.io/)-[Kibana](https://www.elastic.co/kibana/) + [Loki](https://grafana.com/oss/loki/)-[Grafana](https://grafana.com/oss/grafana/)), and distributed tracing solution, [Tempo](https://grafana.com/oss/tempo/).
As part of the project, the goal is to use a lightweight Kubernetes flavor based on [K3S](https://k3s.io/) and deploy cluster basic services such as: 1) distributed block storage for POD's persistent volumes, [LongHorn](https://longhorn.io/), 2) backup/restore solution for the cluster, [Velero](https://velero.io/) and [Restic](https://restic.net/), 3) service mesh architecture, [Linkerd](https://linkerd.io/), and 4) observability platform based on metrics monitoring solution, [Prometheus](https://prometheus.io/), logging and analytics solution, EFK+LG stack ([Elasticsearch](https://www.elastic.co/elasticsearch/)-[Fluentd](https://www.fluentd.org/)/[Fluentbit](https://fluentbit.io/)-[Kibana](https://www.elastic.co/kibana/) + [Loki](https://grafana.com/oss/loki/)-[Grafana](https://grafana.com/oss/grafana/)), and distributed tracing solution, [Tempo](https://grafana.com/oss/tempo/).


## Design Principles
@@ -233,7 +233,7 @@ There is another list of services that I have decided to run outside the kuberen

Minio backup service is hosted in a VM running in Public Cloud, using [Oracle Cloud Infrastructure (OCI) free tier](https://www.oracle.com/es/cloud/free/).

Vault service is running in `gateway` node; since the Vault kubernetes authentication method needs access to the Kubernetes API, I won't host the Vault service in Public Cloud.
Vault service is running in one of the cluster nodes, `node1`; since the Vault kubernetes authentication method needs access to the Kubernetes API, I won't host the Vault service in Public Cloud.


## What I have built so far

