Make changes to check for broken links #4

Merged 5 commits on Mar 13, 2024
36 changes: 36 additions & 0 deletions .github/workflows/check.yaml
@@ -0,0 +1,36 @@
name: Check links

on:
  # when someone makes a change directly to main branch
  push:
    branches:
      - main
  # when someone requests a change to main branch
  pull_request:
    branches:
      - main

  # run periodically
  schedule:
    - cron: "0 0 * * *"
  # run manually
  workflow_dispatch:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - if: runner.debug == '1'
        uses: mxschmitt/action-tmate@v3

      - name: Get this repo's code
        uses: actions/checkout@v4

      - name: Set up Bun
        uses: oven-sh/setup-bun@v1

      - name: Install packages
        run: bun install glob@v9 yaml@v2

      - name: Run check script
        run: bun ./check.js
15 changes: 10 additions & 5 deletions .github/workflows/deploy.yaml
@@ -15,25 +15,30 @@ env:

jobs:
encode:
name: Encode and deploy
runs-on: ubuntu-latest
steps:
- if: runner.debug == '1'
uses: mxschmitt/action-tmate@v3

- name: Get this repo's code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
path: redirects-repo # save in separate sub-folder

- name: Get website repo's code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
repository: ${{ github.repository_owner }}/${{ env.website_repo }} # assume same user/org
path: website-repo # save in separate sub-folder

- name: Set up Bun
uses: oven-sh/setup-bun@v1

- name: Install packages
run: npm install glob@v9 yaml@v2
run: bun install glob@v9 yaml@v2

- name: Run encode script
run: node ./redirects-repo/encode.js
run: bun ./redirects-repo/encode.js

- name: Commit result to website repo
if: ${{ github.event_name == 'push' }}
45 changes: 24 additions & 21 deletions README.md
@@ -12,9 +12,12 @@ _Counterpart to the [redirects-website repo](../../../redirects-website)._

1. Add/change/remove redirect entries in one or more [`.yaml` files in the top folder](../../blob/main/redirects.yaml).
Note: the `from` field is **case-insensitive**.
2. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
3. Changes should take effect automatically within a minute or so.
1. Commit the changes to the `main` branch, either directly or with a pull request (recommended so the automatic process can catch errors before the changes go live).
1. Changes should take effect automatically within a minute or so.
Verify that no errors occurred in the automatic process here: [![Encode and deploy](../../actions/workflows/deploy.yaml/badge.svg)](../../actions/workflows/deploy.yaml)
1. Verify that none of your redirect links are reported broken in the automatic process here: [![Check links](../../actions/workflows/check.yaml/badge.svg)](../../actions/workflows/check.yaml).
Note that this is only a **rough check**.
There _may be false positives or false negatives_, as it simply checks the [status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) of the link, which the third party may choose inappropriately.

You can do this [directly on github.com](../../edit/main/redirects.yaml) (tip: press <kbd>.</kbd> right now), or locally with git.

@@ -96,19 +99,19 @@ After the one-time setup, **all you have to do is edit the `.yaml` files, and ev
Adding/removing/changing a link goes like this:

1. You change one or more of the `.yaml` files in the _redirects repo_.
2. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
3. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
4. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
5. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
1. `deploy.yaml` tells [GitHub Actions](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions) that any time someone commits a change to the repo, it should automatically run the `encode.js` script.
1. The `encode.js` script combines all of your `.yaml` files into one, and encodes it[^1].
1. `deploy.yaml` then tells GitHub to take the result of the `encode.js` script and commit it to the `redirect.js` script in the _website repo_.
1. In the _website repo_, GitHub Pages detects a change in the `redirect.js` script, and updates the website.
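
The combine-and-encode step described above could be sketched roughly as follows. This is not the actual `encode.js`, which this diff does not show. The function names are hypothetical, and base64-encoded JSON is only an assumed stand-in for whatever encoding the real script uses.

```javascript
// Hypothetical sketch: merge several parsed redirect lists and encode them.
// Assumption: base64-encoded JSON stands in for the real encoding scheme.

// flatten one list per .yaml file into a single combined list
function combineLists(lists) {
  return lists.flat();
}

// turn the combined list into an opaque string
function encodeList(list) {
  return Buffer.from(JSON.stringify(list), "utf8").toString("base64");
}

// reverse of encodeList, as the website side would need
function decodeList(encoded) {
  return JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
}

const combined = combineLists([
  [{ from: "chatroom", to: "https://zoom.us/j/12345abcdef" }],
  [{ from: "docs", to: "https://example.com/docs" }],
]);
const encoded = encodeList(combined);
console.log(decodeList(encoded).length);
```

Whatever the real scheme is, the encoded form has to be decodable by the website side, so a round trip like this makes a convenient sanity check.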

Then, a user visiting a link goes like this:

1. They navigate to a link on the website, e.g. `/chatroom`.
2. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
1. `chatroom.html` isn't a file in the _website repo_, and thus isn't a page on the website, so GitHub loads [`404.html`](https://en.wikipedia.org/wiki/HTTP_404) for the user instead (but preserves the `/chatroom` url).
This file immediately runs some scripts:
3. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
4. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
5. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
1. The analytics code snippet sends[^2] stats like url, IP, date, time, location, etc. off to Google Analytics or whoever.
1. The `redirect.js` script decodes the redirect lists previously encoded from the _redirects repo_, finds the long url corresponding to "chatroom" (**case-insensitive**), and navigates there instead.
1. They arrive at the intended destination, e.g. `zoom.us/j/12345abcdef`, with virtually no perceptible delay.
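
The lookup step in the flow above might look something like this. It is a minimal sketch, not the real `redirect.js`. `findRedirect` is a hypothetical name, and `list` is assumed to be the already-decoded array of `{ from, to }` entries.

```javascript
// Hypothetical sketch of the lookup step; not the actual redirect.js.
// `list` is assumed to be the decoded array of { from, to } entries.
function findRedirect(list, pathname) {
  // normalize the path the same way "from" fields are normalized:
  // lower case, leading slashes removed
  const key = pathname.toLowerCase().replace(/^\/+/, "");
  const match = list.find((entry) => entry.from === key);
  return match ? match.to : null;
}

const list = [{ from: "chatroom", to: "https://zoom.us/j/12345abcdef" }];
const target = findRedirect(list, "/ChatRoom");
// in a browser, the 404 page could then navigate with something like:
// if (target) window.location.replace(target);
console.log(target);
```

Lower-casing both sides is what makes the match case-insensitive, which is why `/ChatRoom` and `/chatroom` resolve to the same entry.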

## Setup

@@ -117,10 +120,10 @@ Then, a user visiting a link goes like this:
1. [Use the _redirects repo_ (this repo) as a template](https://github.com/CU-DBMI/redirects/generate).
**Do not fork**, because you cannot make forks private.
_Name it `redirects` and make it private_.
2. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
1. [Use the _website repo_ as a template](https://github.com/CU-DBMI/redirects-website/generate).
_Name it `redirects-website` and make it public_.
3. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.
1. [Enable GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) on your copied _website repo_ with the default settings.
1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-org.github.io/redirects-website`.

If you ever need to pull in updates from these templates, [see the instructions here](https://stackoverflow.com/questions/56577184/github-pull-changes-from-a-template-repository).

@@ -129,8 +132,8 @@ If you ever need to pull in updates from these templates, [see the instructions
To allow your _redirects repo_ to automatically write to your _website repo_, you need to "connect" them with a deploy key:

1. [Generate an SSH key pair](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#generating-a-new-ssh-key).
2. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
3. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.
1. In your _redirects repo_, [create a new repository actions secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `DEPLOY_KEY`, and paste the private SSH key.
1. In your _website repo_, [create a new deploy key](https://docs.github.com/en/developers/overview/managing-deploy-keys#setup-2) with write/push access named `DEPLOY_KEY`, and paste the public SSH key.

### Set up analytics

@@ -155,17 +158,17 @@ e.g. `your-domain.com/some-link`
In summary:

1. Purchase a domain name from a reputable service.
2. Point your domain name provider to GitHub Pages using an `A` record.
1. Point your domain name provider to GitHub Pages using an `A` record.
This is slightly different for each company; they should have their own instructions on how to do it.
3. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
4. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.
1. Set the custom domain field in the "Pages" settings of your _website repo_ (automatically creates a `CNAME` file in the repo).
1. After a minute or so, GitHub should tell you that your site is now being hosted at `your-domain.com`.

#### GitHub user/org site

e.g. `your-org.github.io/some-link`

1. Name your _website repo_ `your-org.github.io` to match your GitHub user/organization name.
2. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.
1. In your _redirects repo_, change `redirects-website` in `deploy.yaml` to the same name.

[About GitHub user/org sites](https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages#types-of-github-pages-sites).

@@ -188,8 +191,8 @@ In your _website repo_:
If you already have a website being hosted with GitHub Pages that you want to incorporate this approach into:

1. Skip templating the _website repo_.
2. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
3. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
1. Instead, copy its [`redirect.js` script](https://github.com/CU-DBMI/redirects-website/blob/main/redirect.js) into the **top folder** of your existing website repo, and modify `baseurl` in it as appropriate.
1. Include the script in your 404 page in the [same way it is done here](https://github.com/CU-DBMI/redirects-website/blob/main/404.html).
If an existing page and a redirect have the same name/path, the redirect won't happen since the user won't get a [`404`](https://en.wikipedia.org/wiki/HTTP_404).

If your existing website is built and hosted in a different way, this approach would require modification[^3] and might not be appropriate for you.
28 changes: 28 additions & 0 deletions check.js
@@ -0,0 +1,28 @@
import { addError, getList, onExit } from "./core";

onExit();

// check list of redirects for broken links
async function checkList(list) {
  return await Promise.all(
    // for each redirect
    list.map(async ({ to }) => {
      try {
        // do simple request to target url
        const response = await fetch(to);
        if (
          // only fail on certain status codes that might indicate link is "broken"
          // select as desired from https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
          [
            400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
          ].includes(response.status)
        )
          throw Error(response.status);
      } catch (error) {
        addError(`"to: ${to}" may be a broken link\n(${error})`);
      }
    })
  );
}

await checkList(getList());
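
To exercise the status-code logic without network access, the same idea can be sketched with the fetch function passed in as a parameter. `findBrokenLinks` and `fakeFetch` are hypothetical names that are not part of this PR. The status list is copied from `checkList` above.

```javascript
// Sketch only: same idea as checkList, but with the fetch function injected
// so the logic can be tried without real network requests.
const BROKEN_STATUSES = [
  400, 404, 405, 406, 408, 409, 410, 421, 500, 501, 502, 503, 504,
];

async function findBrokenLinks(list, fetchFn = fetch) {
  const results = await Promise.all(
    list.map(async ({ to }) => {
      try {
        const response = await fetchFn(to);
        // report only status codes that suggest the link is broken
        return BROKEN_STATUSES.includes(response.status)
          ? { to, reason: String(response.status) }
          : null;
      } catch (error) {
        // network-level failures (DNS, refused connection, etc.) also count
        return { to, reason: String(error) };
      }
    })
  );
  // drop the links that looked fine
  return results.filter(Boolean);
}

// hypothetical fake fetch that returns canned status codes
const fakeFetch = async (url) => ({ status: url.endsWith("/gone") ? 404 : 200 });

findBrokenLinks(
  [{ to: "https://example.com/ok" }, { to: "https://example.com/gone" }],
  fakeFetch
).then((broken) => console.log(broken));
```

A status such as 403 is not in the list, so a site that blocks automated requests would not be flagged, while a site that returns 200 for a "not found" page would pass. This is one way the rough check can be wrong in either direction.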
124 changes: 124 additions & 0 deletions core.js
@@ -0,0 +1,124 @@
import { readFileSync } from "fs";
import { resolve } from "path";
import { globSync } from "glob";
import { parse } from "yaml";

// if running in github actions debug mode, do extra logging
export const verbose = !!process.env.RUNNER_DEBUG;

// get full list of redirects
export function getList() {
  // get yaml files that match glob pattern
  const files = globSync("*.y?(a)ml", { cwd: __dirname });

  log("Files", files.join(" "));

  // start combined list of redirects
  const list = [];

  // keep track of duplicate entries
  const duplicates = {};

  // go through each yaml file
  for (const file of files) {
    // load file contents
    const contents = readFileSync(resolve(__dirname, file), "utf8");

    // try to parse as yaml
    let data;
    try {
      data = parse(contents);
    } catch (error) {
      addError(`Couldn't parse ${file}. Make sure it is valid YAML.`);
      continue;
    }

    // check if top level is list
    if (!Array.isArray(data)) {
      addError(`${file} is not a list`);
      continue;
    }

    // go through each entry
    for (let [index, entry] of Object.entries(data)) {
      index = Number(index) + 1;
      const trace = `${file} entry ${index}`;

      // check if dict (typeof null is also "object", so rule it out explicitly)
      if (typeof entry !== "object" || entry === null) {
        addError(`${trace} is not a dict`);
        continue;
      }

      // check "from" field
      if (!(typeof entry.from === "string" && entry.from.trim())) {
        addError(`${trace} "from" field invalid`);
        continue;
      }

      // check "to" field
      if (!(typeof entry.to === "string" && entry.to.trim()))
        addError(`${trace} "to" field invalid`);

      // normalize "from" field. lower case, remove leading slashes.
      entry.from = entry.from.toLowerCase().replace(/^(\/+)/, "");

      // add to combined list
      list.push(entry);

      // add to duplicate list. record source file and entry number for logging.
      duplicates[entry.from] ??= [];
      duplicates[entry.from].push({ ...entry, file, index });
    }
  }

  // check that any redirects exist
  if (!list.length) addError("No redirects");

  if (verbose) log("Combined redirects list", list);

  // trigger errors for duplicates
  for (const [from, entries] of Object.entries(duplicates)) {
    const count = entries.length;
    if (count <= 1) continue;
    // renamed from "duplicates" to avoid shadowing the outer variable
    const locations = entries
      .map(({ file, index }) => `\n ${file} entry ${index}`)
      .join("");
    addError(`"from: ${from}" appears ${count} time(s): ${locations}`);
  }

  return list;
}

// collect (caught) errors to report at end
const errors = [];

// add error
export function addError(error) {
  errors.push(error);
}

// when script finished, report all errors together
export function onExit() {
  process.on("exit", () => {
    if (errors.length) {
      errors.forEach(logError);
      logError(`${errors.length} error(s)`);
      process.exit(1);
    } else {
      process.exitCode = 0;
      log("No errors!");
    }
  });
}

// formatted normal log
export function log(message, data) {
  console.info("\x1b[1m\x1b[96m" + message + "\x1b[0m");
  if (data) console.log(data);
}

// formatted error log
export function logError(message) {
  console.error("\x1b[1m\x1b[91m" + message + "\x1b[0m");
}
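
The normalization and duplicate detection in `getList` can be tried in isolation with a small sketch like the one below. `normalizeFrom` and `findDuplicateKeys` are hypothetical helper names that are not part of `core.js`. The normalization rule itself mirrors the one used above.

```javascript
// Sketch: normalize "from" fields and report keys that collide afterwards.
// Helper names are hypothetical; the rule mirrors getList in core.js.
function normalizeFrom(from) {
  // lower case, remove leading slashes
  return from.toLowerCase().replace(/^\/+/, "");
}

function findDuplicateKeys(entries) {
  // group entries by their normalized "from" value
  const groups = new Map();
  for (const entry of entries) {
    const key = normalizeFrom(entry.from);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  // keep only the keys that occur more than once
  return [...groups]
    .filter(([, group]) => group.length > 1)
    .map(([key]) => key);
}

console.log(
  findDuplicateKeys([
    { from: "/Chatroom", to: "https://example.com/a" },
    { from: "chatroom", to: "https://example.com/b" },
    { from: "docs", to: "https://example.com/c" },
  ])
);
```

Because entries are compared after normalization, `/Chatroom` and `chatroom` count as the same redirect, which matches the case-insensitive behavior described in the README.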