Edit for clarity and unpublish some unhelpful posts
kencx committed Jan 22, 2024
1 parent 72432b2 commit 1f9faa9
Showing 7 changed files with 103 additions and 90 deletions.
8 changes: 4 additions & 4 deletions content/posts/auto-generate-resume.md
@@ -1,7 +1,7 @@
---
title: "Automating my Resume"
date: 2023-08-21
lastmod: 2023-08-21
lastmod: 2024-01-22
draft: false
toc: true
tags:
@@ -20,12 +20,12 @@ There's a [bunch](https://github.com/xitanggg/open-resume)
[of](https://github.com/AmruthPillai/Reactive-Resume)
[resume](https://github.com/topics/resume-builder) building sites out there,
mostly catered for people in Tech, but many involve creating an account on their
sites and using one of their templates. No thank you.
sites and using one of their templates. That's not really my thing.

I also have an existing resume in LaTeX, which I could just use to build a PDF
automatically with Github Actions, and call it a day, but there's no fun in
that. Instead, I *have* to over-engineer an entire resume pipeline that
automatically builds it in three different formats:
that. Instead, I wanted to over-engineer a full pipeline that automatically
builds a resume in three different formats:

1. pdf
2. html to host a static site (because why not?)
60 changes: 34 additions & 26 deletions content/posts/automated-testing-of-restic-backups.md
@@ -1,7 +1,7 @@
---
title: "Automated Testing of Restic Backups"
date: 2023-08-09
lastmod: 2023-08-09
lastmod: 2024-01-22
draft: false
toc: true
tags:
@@ -17,22 +17,21 @@ rule](https://www.backblaze.com/blog/the-3-2-1-backup-strategy/) by being
automatic, redundant and offsite.

However, a backup is only as good as its ability to be restored successfully. It
would be disastrous if you tried to restore a backup snapshot to find that your
files have been unknowningly corrupted or loss during the backup process.
can be disastrous to attempt a restore after data loss, only to realise that the
data was unknowingly corrupted or lost during the backup process.

Which is why my paranoid self also runs automated restore testing as part of the
daily backup process.

After restic back ups any new data and prunes old snapshots, we run some
restoration tests:
A good safeguard is to run automated restore tests as part of the
backup process. After restic backs up any new data and prunes old snapshots, it
also performs the following:

- Check a subset of all data with [restic
check](https://restic.readthedocs.io/en/stable/045_working_with_repos.html#checking-integrity-and-consistency)
(1% in my case)
- Restore a series of test files and compare them with the original

While complete checks and restores would be more representative of the integrity
of the backups, they are also very unfeasible for obvious reasons[^1].
of the backups, they are largely infeasible[^1].

## Restic check

@@ -55,39 +54,48 @@ $ autorestic exec -av -- check --read-data-subset=1%
## Restoring Test Files

In addition to restic's native integrity checks, we also run explicit checks by
restoring a test file after the backup process.

Before every backup, a `generate-restore-test-files` script is executed to
create a test file with random contents in a specified test directory. The test
directory stores the last 5 generated test files and is included in the backup.
restoring a test file after the backup process. This involves creating a test
file with random content in a specified test directory before the backup:

```bash
#!/bin/bash

# generate-restore-test-files.sh
dd if=/dev/random of="$RESTORE_DIR/test-$(date +%Y-%m-%d)" count=10 >/dev/null 2>&1
TEST_DIR="$HOME/restore-test"
dd if=/dev/random of="$TEST_DIR/test-$(date +%Y-%m-%d)" count=10 >/dev/null 2>&1

# delete any files older than 5 days
cd "$TEST_DIR" && \
find . -type f ! -newerct "$(date --date='-5 days' '+%Y/%m/%d %H:%M:%S')" -delete
```

After every backup, all files in the test directory are restored from the latest
backup snapshot to a separate temporary directory. These restored files are `diff`-ed
with the originals found in the test directory. The backup will fail if any of
the files are different.
Test files older than 5 days are discarded, so the directory holds roughly the
last 5 generated files. During the backup, restic restores the files in this
directory to a temporary directory.

```bash
# backup.sh

...
RESTORE_DIR="$HOME/restore-test"
TMP_DIR="$(mktemp -d)"
autorestic restore -v --include "$RESTORE_DIR" --to "$TMP_DIR"
```
These restored files are `diff`-ed with the originals found in the test
directory. The backup will fail if any of the files are different.

```bash
RESTORED_FILES="$(cd "$RESTORE_DIR" && find . -type f -printf '%f\n')"

for file in $RESTORED_FILES; do
diff "$RESTORE_DIR/$file" "${TMP_DIR}${RESTORE_DIR}/$file"
done
```

The backup process is run with systemd timers. An extract of the
If there are any differences, `diff` returns an exit code of `1`, causing the
script to fail. Otherwise, the backup passes and the script cleans up any
temporary directories.
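
For completeness, here is a rough sketch of how that failure handling and
cleanup could be wired up — the `set -euo pipefail`, the explicit `exit` and the
`trap`-based cleanup are assumptions for illustration, not the exact contents of
my `backup.sh`:

```bash
# Hypothetical sketch: variables reuse the snippets above
set -euo pipefail

cleanup() {
    rm -rf "$TMP_DIR"
}
# always remove the temporary restore directory, pass or fail
trap cleanup EXIT

for file in $RESTORED_FILES; do
    if ! diff -q "$RESTORE_DIR/$file" "${TMP_DIR}${RESTORE_DIR}/$file"; then
        echo "restore test failed: $file differs" >&2
        exit 1
    fi
done
```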

## Systemd Timer

The backup process is scheduled with systemd timers. An extract of the
`backup.service` file is as follows:

```
@@ -108,6 +116,6 @@ role.
## References
- [Preparing for the worst](https://tomm.org/2022/preparing-for-the-worst)

[^1]: High bandwidth costs for checks on remote backup repositories, the need
for disk space to perform the restores to, very time-consuming depending on
your network speeds etc.
[^1]: Full checks and restores incur high bandwidth costs on remote backup
repositories and require enough local disk space to restore to, making them
expensive and time-consuming.
11 changes: 3 additions & 8 deletions content/posts/hubble.md
@@ -1,7 +1,7 @@
---
title: "Hubble Homelab"
date: 2022-07-25T16:30:00+08:00
lastmod: 2022-09-01
lastmod: 2024-01-22
draft: false
toc: true
images:
@@ -10,9 +10,6 @@ tags:
- selfhosted
---

After the [Planck]({{< ref "/posts/selfhosting.md" >}})[^1], I wanted a dedicated
server for learning and working with DevOps concepts and tools.

## Hubble

Hubble is an Intel HP Elitedesk 800 G2 Mini NUC (i5-6500T, 8GB DDR4). It has more than 1
@@ -183,7 +180,7 @@ were still ongoing since they began five hours ago. Not good.

Without thinking, I decided to go for the easiest solution: turn it off and on again and
hope for the best (In hindsight, NEVER DO THIS). On boot, I checked for data loss.
Everything seemed normal[^2] and all files were supposedly there. It was then I also
Everything seemed normal[^1] and all files were supposedly there. It was then I also
realised I needed a better way to check for data loss and backup integrity.

Next, I tried to identify the root cause:
@@ -241,7 +238,5 @@ Let's see where we'll be in another six months.
>At the time of writing (Jul 2022), Hubble has remained online and stable for
>more than two months without maintenance, while I took a break.
[^1]: Not to be confused with the [Planck]({{< ref "/posts/keyboards/planck.md" >}})
keyboard that I use.
[^2]: Except that I discovered that the static route from the Proxmox host to NFS server
[^1]: Except that I discovered that the static route from the Proxmox host to the NFS server
disappeared on reboot. I had forgotten to set up a permanent route.
2 changes: 1 addition & 1 deletion content/posts/hugo-serve.md
@@ -2,7 +2,7 @@
title: "hugo serve"
date: 2021-11-18T16:30:33+08:00
lastmod: 2021-11-18
draft: false
draft: true
toc: false
images:
tags:
29 changes: 14 additions & 15 deletions content/posts/keyboards/planck.md
@@ -1,6 +1,7 @@
---
title: "Keyboards - Planck"
date: 2022-01-21T17:10:11+08:00
lastmod: 2024-01-22
draft: false
toc: true
tags:
@@ -9,35 +10,32 @@ tags:

{{< figure src="https://imgs.xkcd.com/comics/borrow_your_laptop.png" caption="relevant xkcd 1806" link="https://xkcd.com/1806" class="center" >}}

I have been using the Planck keyboard for almost a year now, at this time of writing.

{{< figure src="/posts/keyboards/images/planck.png" caption="The Planck Rev 6" alt="The Planck Rev 6" class="center" width="350px">}}

Specs:
At the time of writing, I have been using the [Planck
keyboard](https://olkb.com/collections/planck) for almost a year. Specs ([bill of materials](#bill-of-materials)):
- 67g Tangerines linear switches
- Black DSA blank keycaps
- Lubed and filmed with Krytox 205g0 and Deskeys

A [bill of materials](#bill-of-materials) is included at the end of this post.

{{< figure src="/posts/keyboards/images/planck.png" caption="The Planck Rev 6" alt="The Planck Rev 6" class="center" width="350px">}}

## Features

I got the Planck because I wanted to try out a 40% ortholinear keyboard. Why? I
just thought it might be fun.

The Planck has a 4x12 layout with a maximum of just 48 keys. It is fully
programmable with [QMK firmware](https://github.com/qmk/qmk_firmware) and to top
it off, its fully hotswappable. Of course, soldering is fun too, but there are
other opportunities for that.
programmable with [QMK firmware](https://github.com/qmk/qmk_firmware) and fully
hotswappable[^1].

As far as 40% keyboards go, the Planck is a classic choice. *Layers*[^1] make up
As far as 40% keyboards go, the Planck is a classic choice. Layers[^2] make up
for the lack of number and function rows, and you can create some cool key
combos based on your workflow.

However, the ortholinear layout does take some time getting used to, as opposed
to the staggered layout. My WPM fell sharply in my first 3 weeks, but I
adapted quickly as I was already practicing touch typing. I also switched to
the Planck around the time I was fully writing my undergraduate thesis helped me
to the staggered layout. My WPM fell sharply in my first 3 weeks, but I adapted
quickly as I was already practicing touch typing. I also switched to the Planck
around the time I was writing my undergraduate thesis, which helped me
practice.

{{< figure src="/posts/keyboards/images/monkeytype.png" caption="You can clearly see the steep drop, followed by consistently low tries. From [monkeytype.com](https://monkeytype.com)" alt="My drop in WPM" class="center" >}}
Expand All @@ -48,7 +46,7 @@ a little better now - I can pinpoint `$` as the 4th symbol, although I still
occasionally mix up the positions of `%, ^, &` and `*`.

I also discovered that I use my index finger to hit the spacebar as opposed to
my thumbs, and this is considered weird. To me, it seems natural, granted I've
my thumbs and this is considered weird. To me, it seems natural, granted I've
been doing it all my life. I did consider forcing myself to relearn this but I
didn't see a point since my keyboard was already so tiny.

@@ -124,4 +122,5 @@ You also need the following optional items:
- Switch puller
- Switch opener (or use a screwdriver)

[^1]: Layers are activated by holding down the *raise* or *lower* keys and pressing the desired key.
[^1]: Soldering is fun, but there are other opportunities for that.
[^2]: Layers are activated by holding down the *raise* or *lower* keys and pressing the desired key.
81 changes: 46 additions & 35 deletions content/posts/monitoring-backups-with-prometheus.md
@@ -1,7 +1,7 @@
---
title: "Monitoring Backups With Prometheus"
date: 2023-08-10
lastmod: 2023-08-10
lastmod: 2024-01-22
draft: false
toc: true
tags:
@@ -10,52 +10,39 @@ tags:
- prometheus
---

I previously wrote about [running automated restore tests]({{< ref "automated-testing-of-restic-backups.md" >}})
for daily restic backups. But we didn't dicuss how we will be alerted should any
backups fail. Some possible methods of sending notifications are:
I previously wrote about [running automated restore tests]({{< ref
"automated-testing-of-restic-backups.md" >}}) when performing daily restic
backups. If the backup script fails, it should send out an alert or
notification. Some possible methods of alerting include:

- systemd's `OnFailure` key to run a script on failure
- via webhooks (eg. with [uptime-kuma](https://github.com/louislam/uptime-kuma)
or Gotify)
- webhooks (eg. with [uptime-kuma](https://github.com/louislam/uptime-kuma) or
Gotify)
- Prometheus and AlertManager

I decided to go with the last option because I've never written a Prometheus
exporter before and also wanted to monitor some backup metrics anyway.
exporter before and wanted to try it out. It would also provide some backup
metrics that might be useful.

## Prometheus Exporter

The `backup-exporter` script generates metrics after every backup that are
consumed by [Node exporter's
My Prometheus exporter is a Python
[script](https://github.com/kencx/homelab/blob/master/ansible/roles/autorestic/files/backup-exporter)
that generates text metrics which are then consumed by [Node exporter's
textfile-collector](https://github.com/prometheus/node_exporter#textfile-collector).
These metrics are exposed to Prometheus, where they are consumed by Grafana and
AlertManager.
These metrics are exposed to Prometheus, where they are then consumed by Grafana
and AlertManager.
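
For context, a minimal sketch of how the textfile collector fits in — the
directory path below is an assumption for illustration, not necessarily the one
used in my setup; node_exporter simply scrapes any `*.prom` files in the
directory it is pointed at:

```bash
# Hypothetical example: node_exporter picks up *.prom files from this directory
node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile

# the exporter script's output just needs to end up in that directory
cp restic.prom /var/lib/node_exporter/textfile/restic.prom
```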

Extending on the previous `backup.service`:

```
# /etc/systemd/system/backup.service
[Service]
Type=oneshot
ExecStartPre=/usr/bin/generate-restore-test-files.sh
ExecStart=/usr/bin/autorestic-backup.sh
ExecStartPost=/usr/bin/backup-exporter -l /var/log/autorestic.log -e restic.prom
```

{{< alert type="note" >}}
The complete `backup-exporter` script is found
[here](https://github.com/kencx/homelab/blob/master/ansible/roles/autorestic/files/backup-exporter).
{{< /alert >}}

Because restic does not output any metrics or logs in a machine-readable format,
`backup-exporter` is a custom Python script to parse the log output of restic:
Because restic does not output any metrics or logs in a machine-readable format
(such as `json`), the script reads and parses the log output of restic directly:

```bash
# autorestic.log
Files: 0 new, 0 changed, 11621 unmodified
Dirs: 0 new, 2 changed, 1338 unmodified
Added to the repository: 724 B (862 B stored)
processed 11621 files, 14.647 GiB in 0:02
```

```python
files = re.compile(r"Files:.*?(\d+) new.*?(\d+) changed.*?(\d+) unmodified")
dirs = re.compile(r"Dirs:.*?(\d+) new.*?(\d+) changed.*?(\d+) unmodified")
@@ -78,10 +65,20 @@ restic_repo_total_files{location="archives",backend="remote"} 11621
restic_repo_duration_seconds{location="archives",backend="remote"} 355
```

These above metrics are repeated for each separate autorestic location and
These generated metrics are repeated for each separate autorestic location and
backend. Alongside these repository-specific metrics, there are also two general
metrics that indicate whether the backup passed and when it was last run:

```python
def add_general_metrics(success):
    num = 0 if success else 1
    m = """
restic_backup_success {num}
restic_backup_latest_datetime {timestamp}
""".format(num=num, timestamp=datetime.datetime.now().timestamp())
    return m.strip()
```

```
restic_backup_success 0
restic_backup_latest_datetime 1691533984.343583
@@ -90,16 +87,30 @@ restic_backup_latest_datetime 1691533984.343583
Should a backup fail without any logs/stats to parse, the script will only
generate the general metrics.

## Systemd

This custom script runs after every backup by extending `backup.service` to
include an `ExecStartPost` entry:

```
# /etc/systemd/system/backup.service
[Service]
Type=oneshot
ExecStartPre=/usr/bin/generate-restore-test-files.sh
ExecStart=/usr/bin/autorestic-backup.sh
ExecStartPost=/usr/bin/backup-exporter -l /var/log/autorestic.log -e restic.prom
```
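
As a quick usage sketch (unit and file names are assumed from the snippets above
rather than taken from the repo), the chain can be exercised by hand:

```bash
# run the backup service once and check that it succeeded
sudo systemctl start backup.service
systemctl status backup.service

# the textfile collector directory should now contain the restic_* metrics
# (directory path is an assumption, as above)
cat /var/lib/node_exporter/textfile/restic.prom
```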

## Grafana Dashboard

{{< figure src="/posts/images/backup-grafana-dashboard.png" caption="Grafana dashboard for backups" class="center" >}}

## AlertManager

AlertManager is configured to send a Telegram notification if:
Finally, we configure AlertManager to send a Telegram notification if:

- A backup fails
- A backup has not been successfully completed in the past 26 hours (timestamp
- A backup has not been successfully completed in the past 26 hours (i.e. timestamp
metric is too old).

```yml
@@ -116,8 +127,8 @@ groups:
summary: 'Backup failed at {{ with query "restic_backup_latest_datetime" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
```
A 2 hour grace period is given to account for certain days where the backup
might take longer than the previous day's, which would cause a false-negative.
A 2-hour grace period accounts for days when a backup takes longer than the
previous day's, which would otherwise trigger a false alert.
## References