Make changes to check for broken links (#4)
vincerubinetti authored Mar 13, 2024
1 parent 83767e7 commit c03c54c
Showing 6 changed files with 252 additions and 145 deletions.
36 changes: 36 additions & 0 deletions .github/workflows/check.yaml
@@ -0,0 +1,36 @@
name: Check links

on:
  # when someone makes a change directly to main branch
  push:
    branches:
      - main
  # when someone requests a change to main branch
  pull_request:
    branches:
      - main

  # run periodically
  schedule:
    - cron: "0 0 * * *"
  # run manually
  workflow_dispatch:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - if: runner.debug == '1'
        uses: mxschmitt/action-tmate@v3

      - name: Get this repo's code
        uses: actions/checkout@v4

      - name: Set up Bun
        uses: oven-sh/setup-bun@v1

      - name: Install packages
        run: bun install glob@v9 yaml@v2

      - name: Run check script
        run: bun ./check.js
15 changes: 10 additions & 5 deletions .github/workflows/deploy.yaml
@@ -15,25 +15,30 @@ env:

jobs:
  encode:
    name: Encode and deploy
    runs-on: ubuntu-latest
    steps:
      - if: runner.debug == '1'
        uses: mxschmitt/action-tmate@v3

      - name: Get this repo's code
        uses: actions/checkout@v3
        uses: actions/checkout@v4
        with:
          path: redirects-repo # save in separate sub-folder

      - name: Get website repo's code
        uses: actions/checkout@v3
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository_owner }}/${{ env.website_repo }} # assume same user/org
          path: website-repo # save in separate sub-folder

      - name: Set up Bun
        uses: oven-sh/setup-bun@v1

      - name: Install packages
        run: npm install glob@v9 yaml@v2
        run: bun install glob@v9 yaml@v2

      - name: Run encode script
        run: node ./redirects-repo/encode.js
        run: bun ./redirects-repo/encode.js

      - name: Commit result to website repo
        if: ${{ github.event_name == 'push' }}
45 changes: 24 additions & 21 deletions README.md
@@ -12,9 +12,12 @@ _Counterpart to the [redirects-website repo](../../../redirects-website)._

1. Add/change/remove redirect entries in one or more [`.yaml` files in the top folder](../../blob/main/redirects.yaml).
Note: the `from` field is **case-insensitive**.
2. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
3. Changes should take effect automatically within a minute or so.
1. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
1. Changes should take effect automatically within a minute or so.
Verify that no errors occurred in the automatic process here: [![Encode and deploy](../../actions/workflows/deploy.yaml/badge.svg)](../../actions/workflows/deploy.yaml)
1. Verify that none of your redirect links are reported broken in the automatic process here: [![Check links](../../actions/workflows/check.yaml/badge.svg)](../../actions/workflows/check.yaml).
Note that this is only a **rough check**.
 There _may be false positives or false negatives_, as it simply checks the [status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) of the link, which the third party may choose inappropriately.

You can do this [directly on github.com](../../edit/main/redirects.yaml) (tip: press <kbd>.</kbd> right now), or locally with git.
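
The rough link check mentioned in the last step above can be illustrated with a small sketch. This is only an illustration under assumptions: the status list mirrors the one in `check.js` from this commit, while `BROKEN_STATUSES` and `classifyStatus` are hypothetical names that do not appear in the repo.

```javascript
// Status codes treated as "probably broken", mirroring the list in check.js.
const BROKEN_STATUSES = new Set([
  400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
]);

// Hypothetical helper: classify a status code as "broken" or "ok".
function classifyStatus(status) {
  return BROKEN_STATUSES.has(status) ? "broken" : "ok";
}

console.log(classifyStatus(404)); // broken
console.log(classifyStatus(200)); // ok
console.log(classifyStatus(403)); // ok (403 is not in the list)
```

Because only the status code is inspected, a site that answers `200` with a "page not found" message passes the check, and a working site that answers automated requests with one of the listed codes is reported as broken.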

@@ -96,19 +99,19 @@ After the one-time setup, **all you have to do is edit the `.yaml` files, and ev
Adding/removing/changing a link goes like this:

1. You change one or more of the `.yaml` files in the _redirects repo_.
2. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
3. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
4. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
5. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
1. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
1. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
1. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
1. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
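
The combining step described above can be sketched roughly as follows. This is a minimal illustration, not the actual `encode.js`: `combineLists` is a hypothetical helper, the input is assumed to be already parsed from YAML, and the encoding step is left out because its details are not shown here. The normalization of `from` follows what `core.js` in this commit does.

```javascript
// Hypothetical helper: merge several already-parsed redirect lists into one,
// normalizing "from" the way core.js does (lower case, no leading slashes).
function combineLists(lists) {
  const combined = [];
  for (const list of lists) {
    for (const { from, to } of list) {
      combined.push({ from: from.toLowerCase().replace(/^\/+/, ""), to });
    }
  }
  return combined;
}

// Two made-up lists standing in for two parsed .yaml files.
const example = combineLists([
  [{ from: "/Chatroom", to: "https://zoom.us/j/12345abcdef" }],
  [{ from: "calendar", to: "https://example.com/calendar" }],
]);
console.log(example.map((entry) => entry.from).join(", ")); // chatroom, calendar
```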

Then, a user visiting a link goes like this:

1. They navigate to a link on the website, e.g. `/chatroom`.
2. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
1. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
This file immediately runs some scripts:
3. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
4. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
5. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
1. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
1. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
1. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
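
The lookup in the steps above can be sketched as a small function. This is an assumption-laden illustration rather than the real `redirect.js`: `findRedirect` is a hypothetical name, decoding is skipped, the list is assumed to hold lower-case `from` values without leading slashes, and stripping trailing slashes is an extra assumption.

```javascript
// Hypothetical lookup: find the target url for a visited path.
// Assumes "from" values are already lower case with no leading slash.
function findRedirect(list, pathname) {
  // normalize the visited path: lower case, strip leading and trailing slashes
  const key = pathname.toLowerCase().replace(/^\/+|\/+$/g, "");
  const match = list.find((entry) => entry.from === key);
  return match ? match.to : null;
}

const list = [{ from: "chatroom", to: "https://zoom.us/j/12345abcdef" }];
console.log(findRedirect(list, "/ChatRoom")); // https://zoom.us/j/12345abcdef
console.log(findRedirect(list, "/missing")); // null
```

In a browser, a script along these lines would then navigate to the returned url, for example by assigning it to `window.location.href`; a `null` result would leave the visitor on the 404 page.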

## Setup

@@ -117,10 +120,10 @@ Then, a user visiting a link goes like this:
1. [Use the _redirects repo_ (this repo) as a template](https://github.com/CU-DBMI/redirects/generate).
**Do not fork**, because you cannot make forks private.
_Name it `redirects` and make it private_.
2. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
1. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
_Name it `redirects-website` and make it public_.
3. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.
1. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.

If you ever need to pull in updates from these templates, [see the instructions here](https://stackoverflow.com/questions/56577184/github-pull-changes-from-a-template-repository).

@@ -129,8 +132,8 @@ If you ever need to pull in updates from these templates, [see the instructions
To allow your _redirects repo_ to automatically write to your _website repo_, you need to "connect" them with a deploy key:

1. [Generate an SSH key pair](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#generating-a-new-ssh-key).
2. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
3. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.
1. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
1. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.

### Set up analytics

@@ -155,17 +158,17 @@ e.g. `your-domain.com/some-link`
In summary:

1. Purchase a domain name from a reputable service.
2. Point your domain name provider to GitHub Pages using an `A` record.
1. Point your domain name provider to GitHub Pages using an `A` record.
This is slightly different for each company; they should have their own instructions on how to do it.
3. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.
1. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.

#### GitHub user/org site

e.g. `your-org.github.io/some-link`

1. Name your _website repo_ `your-org.github.io` to match your GitHub user/organization name.
2. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.
1. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.

[About GitHub user/org sites](https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages#types-of-github-pages-sites).

@@ -188,8 +191,8 @@ In your _website repo_:
If you already have a website being hosted with GitHub Pages that you want to incorporate this approach into:

1. Skip templating the _website repo_.
2. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
3. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
1. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
1. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
If an existing page and a redirect have same name/path, the redirect won't happen since the user won't get a [`404`](https://en.wikipedia.org/wiki/HTTP_404).

If your existing website is built and hosted in a different way, this approach would require modification[^3] and might not be appropriate for you.
28 changes: 28 additions & 0 deletions check.js
@@ -0,0 +1,28 @@
import { addError, getList, onExit } from "./core";

onExit();

// check list of redirects for broken links
async function checkList(list) {
  return await Promise.all(
    // for each redirect
    list.map(async ({ to }) => {
      try {
        // do simple request to target url
        const response = await fetch(to);
        if (
          // only fail on certain status codes that might indicate link is "broken"
          // select as desired from https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
          [
            400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
          ].includes(response.status)
        )
          throw Error(response.status);
      } catch (error) {
        addError(`"to: ${to}" may be a broken link\n(${error})`);
      }
    })
  );
}

await checkList(getList());
124 changes: 124 additions & 0 deletions core.js
@@ -0,0 +1,124 @@
import { readFileSync } from "fs";
import { resolve } from "path";
import { globSync } from "glob";
import { parse } from "yaml";

// if running in github actions debug mode, do extra logging
export const verbose = !!process.env.RUNNER_DEBUG;

// get full list of redirects
export function getList() {
  // get yaml files that match glob pattern
  const files = globSync("*.y?(a)ml", { cwd: __dirname });

  log("Files", files.join(" "));

  // start combined list of redirects
  const list = [];

  // keep track of duplicate entries
  const duplicates = {};

  // go through each yaml file
  for (const file of files) {
    // load file contents
    const contents = readFileSync(resolve(__dirname, file), "utf8");

    // try to parse as yaml
    let data;
    try {
      data = parse(contents);
    } catch (error) {
      addError(`Couldn't parse ${file}. Make sure it is valid YAML.`);
      continue;
    }

    // check if top level is list
    if (!Array.isArray(data)) {
      addError(`${file} is not a list`);
      continue;
    }

    // go through each entry
    for (let [index, entry] of Object.entries(data)) {
      index = Number(index) + 1;
      const trace = `${file} entry ${index}`;

      // check if dict (typeof null is also "object", so rule it out explicitly)
      if (typeof entry !== "object" || entry === null) {
        addError(`${trace} is not a dict`);
        continue;
      }

      // check "from" field
      if (!(typeof entry.from === "string" && entry.from.trim())) {
        addError(`${trace} "from" field invalid`);
        continue;
      }

      // check "to" field
      if (!(typeof entry.to === "string" && entry.to.trim()))
        addError(`${trace} "to" field invalid`);

      // normalize "from" field. lower case, remove leading slashes.
      entry.from = entry.from.toLowerCase().replace(/^(\/+)/, "");

      // add to combined list
      list.push(entry);

      // add to duplicate list. record source file and entry number for logging.
      duplicates[entry.from] ??= [];
      duplicates[entry.from].push({ ...entry, file, index });
    }
  }

  // check that any redirects exist
  if (!list.length) addError("No redirects");

  if (verbose) log("Combined redirects list", list);

  // trigger errors for duplicates
  for (const [from, entries] of Object.entries(duplicates)) {
    const count = entries.length;
    if (count <= 1) continue;
    const duplicates = entries
      .map(({ file, index }) => `\n ${file} entry ${index}`)
      .join("");
    addError(`"from: ${from}" appears ${count} time(s): ${duplicates}`);
  }

  return list;
}

// collect (caught) errors to report at end
const errors = [];

// add error
export function addError(error) {
  errors.push(error);
}

// when script finished, report all errors together
export function onExit() {
  process.on("exit", () => {
    if (errors.length) {
      errors.forEach(logError);
      logError(`${errors.length} error(s)`);
      process.exit(1);
    } else {
      process.exitCode = 0;
      log("No errors!");
    }
  });
}

// formatted normal log
export function log(message, data) {
  console.info("\x1b[1m\x1b[96m" + message + "\x1b[0m");
  if (data) console.log(data);
}

// formatted error log
export function logError(message) {
  console.error("\x1b[1m\x1b[91m" + message + "\x1b[0m");
}