Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pa11y runs are too slow #3752

Closed
4 tasks done
alexsobledotgov opened this issue Sep 7, 2023 · 26 comments
Closed
4 tasks done

Pa11y runs are too slow #3752

alexsobledotgov opened this issue Sep 7, 2023 · 26 comments

Comments

@alexsobledotgov
Copy link
Contributor

alexsobledotgov commented Sep 7, 2023

Problem

  • Right now the CI takes a very long time to run: ~15-20 minutes for each commit.
  • The pa11y CI step is the longest and takes about ~15 minutes to run.
  • This CI step scans every single page of the 18F website every time it runs. It's re-scanning the entire website unnecessarily, over and over and over.

As of 4/12/24: There's been a lot of activity on this ticket! We're moving it forward with the following acceptance criteria:

Acceptance criteria

  • Adjust the pa11y-limited npm script to scan appropriate pages for each Pull Request. (Will be up to the developer to determine pa11y-ci's capabilities here in terms of what sub-set of pages can be reasonably scanned.)
  • Use pa11y-limited script for pull request CI pipeline
  • Find a way to run the complete scan on some decided-upon schedule (once a week, once a month, etc) outside of the git workflow, and notify a maintainer of any errors. Note: as an example, Tock also runs a job on a daily schedule. Feel free to reuse the Tock configuration. As for notifications, we do an audit every week which lets us catch CI errors, but that probably won't work for TLC.
  • Document how the notifications of broken scheduled runs work — who gets notified and how.
@echappen
Copy link
Contributor

Path forward for this ticket may be determined by outcome of TLC-crew #474

@juliaklindpaintner
Copy link
Member

Working on an ADR to search for tools to address this

@echappen
Copy link
Contributor

Issue that blocks this work is here!

@juliaklindpaintner
Copy link
Member

Thanks @echappen! Can you link to the ADR as well? And then I think we decided we could move this one to done as the ticket names a specific solution. Let me know if you disagree or think the content of the ticket should be captured in the other ticket.

@echappen
Copy link
Contributor

echappen commented Nov 17, 2023

To recap the story of TLC’s pa11y work that came out of this issue:

  1. This issue was incorporated into this TLC issue to explore any and all ways to make pa11y faster (without proposing a solution)
  2. Here is the old proposed ADR for looking into above issue
  3. Discussion from that led to a new issue for looking into pa11y alternatives

In the interest of keeping the underlying problem ("pa11y runs are too slow") top of mind, I'm going to remove this issue from Blocked, but replace it with the issue in no 3 (TLC Crew 501) in the Blocked column.

@echappen
Copy link
Contributor

echappen commented Nov 17, 2023

Actually, I think the easiest thing would be to do what John suggested: keep this issue in Blocked, update the title to "Pa11y runs are too slow" and remove the proposed solution (or at least downgrade it as one option to explore, of many). @alexsoble as the original poster, can you make those changes?

@echappen
Copy link
Contributor

Tagged the wrong Alex—should be @alexsobledotgov

@alexsobledotgov alexsobledotgov changed the title speed up CI process -- pa11y CI should only scan pages where content was changed Pa11y runs are too slow Nov 17, 2023
@alexsobledotgov
Copy link
Contributor Author

@echappen Yep! updated the title, created a "problem" section and a list with a few potential solutions. Good suggestion/catch here @jskinne3 !

@cantsin
Copy link
Member

cantsin commented Dec 11, 2023

Let's investigate these CI slowdowns and see if we have a quick fix here. I think we should put someone on this issue so we can at least figure out better paths forward since we are probably sticking with pa11y for the immediate future. We should probably timebox the investigation too.

@echappen
Copy link
Contributor

Per our discussion at TLC Planning on 12/11/23, let's try this:

  1. Adjust the pa11y-limited npm script to scan appropriate pages for each Pull Request. (Will be up to the developer to determine pa11y-ci's capabilities here in terms of what sub-set of pages can be reasonably scanned.)
  2. Use pa11y-limited script for pull request CI pipeline
  3. Find a way to run the complete scan on some decided-upon schedule (once a week, once a month, etc) outside of the git workflow, and notify a maintainer of any errors.

@cantsin
Copy link
Member

cantsin commented Dec 12, 2023

For bullet point 3 above, as an example, Tock also runs a job on a daily schedule. Feel free to reuse the Tock configuration. As for notifications, we do an audit every week which lets us catch CI errors, but that probably won't work for TLC.

@kbighorse
Copy link
Contributor

validated concurrency problem:

concurrency 1:

⇒  time npm run test:pa11y-ci
...
✔ 224/224 URLs passed
npm run test:pa11y-ci  10.56s user 1.98s system 2% cpu 8:22.21 total

concurrency 2:

⇒  time npm run test:pa11y-ci
...
✘ 149/224 URLs passed
npm run test:pa11y-ci  4.27s user 0.99s system 2% cpu 3:40.19 total

explored the pa11y-ci API at https://github.com/pa11y/pa11y-ci:

  • accepts a sitemap file and traverses it. supports exceptions.
  • does not support a whitelist of files

did some desk research at https://opensource.com/article/23/2/automated-accessibility-testing`

recommendations:

  • investigate concurrency problem for a resolution
  • more aggressive exceptions to the sitemap since a whitelist approach is currently not supported.
  • work backward from actual a11y errors and preserve those for regressions of common a11y issues

@kbighorse kbighorse removed their assignment Feb 8, 2024
@beechnut
Copy link
Contributor

I'm coming to this thread after waiting 22 minutes for a pa11y scan. Changing a single boolean value (#3822) has now taken about an hour.

Where did we leave off with this? With @echappen's last comment, it looks like the path forward had been decided — use pa11y-limited for PR CI runs, and schedule full runs to be daily, like Tock.

Is there any compelling reason that someone shouldn't go ahead implement those changes? (I'm going to regret asking permission instead of forgiveness, aren't I?)

@echappen
Copy link
Contributor

@beechnut Tasks 1 and 2 should be straightforward. No. 3 gets fuzzy over who should be notified of job failures. TLC crew members are too transient for this. Maybe TLC leads could be notified of failures, and then they can assign whoever’s on the crew at that time to fix them.

@juliaklindpaintner
Copy link
Member

@beechnut Thanks for raising this again. We'll discuss with @adunkman again this afternoon and consider prioritizing for next increment — when you'll be joining us!

@beechnut
Copy link
Contributor

@caleywoods For limited runs, I was thinking about a Ruby script that:

  • Gets the files changed in this commit (git diff-tree --no-commit-id --name-only -r $(git rev-parse HEAD))
  • Resolves them to the right site URLs (Can we require in the resolver? If not, I have other ideas.)
  • Runs pa11y on that list of URLs only

@caleywoods
Copy link
Contributor

caleywoods commented Apr 16, 2024

@caleywoods For limited runs, I was thinking about a Ruby script that:

  • Gets the files changed in this commit (git diff-tree --no-commit-id --name-only -r $(git rev-parse HEAD))
  • Resolves them to the right site URLs (Can we require in the resolver? If not, I have other ideas.)
  • Runs pa11y on that list of URLs only

@beechnut I agree, I was also thinking about git diff --name-only during PR's for this. Kimball's comment says that Pa11y doesn't support a whitelist of files but I think it actually does if we create a temporary fake sitemap file in our script and point it to that. From reading Kimball's other comments it sounds like maybe the approach he was considering was actually launching concurrent instances of Pa11y against each URL that needs it.

For the complete/full scan I think that's easy enough to run on a schedule via GH actions similar to a cron just in the way that the provided Tock scheduled job is working. For the other point(s) I think we can tackle those after we take care of these first two.

Edit: Example of wrapping multiple instances of pa11y with Promise.all() that might be useful for us if we'd like to take that approach over writing a temporary fake sitemap file.

@beechnut
Copy link
Contributor

I have good news: on a local installation,pa11y-ci successfully ran given a list of HTML page file paths:

$ pa11y-ci 404.html _site/500/index.html

Running Pa11y on 2 URLs:
 > file:///Users/path/to/18f.gsa.gov/404.html - 0 errors
 > file:///Users/path/to/18f.gsa.gov/_site/500/index.html - 0 errors

✔ 2/2 URLs passed

So we may not even have to run the server to do the checks, if we're running the checks on the built _site pages.

It doesn't say so particularly clearly in the docs, but pa11y-ci does accept a list of paths as the last command-line argument, and pa11y says it accepts that, plus patterns or globs.


The problem I keep running into is that _site is gitignored — so how would we check which pages in _site are different, when we don't have last commit's _site as a basis for comparison?

I'm averse to getting overly clever, but, here's what's on my mind as possible ways to get the changed files/URLs given this constraint.

  1. Build both the last commit's site and this commit's site in a way where we can then diff the two to get the file paths.
  2. Diff the source files and store the list. Register a Jekyll hook for posts and pages, which checks whether the source file is one of the listed diffed files. If it is, keep track of the output HTML file path in another list, and later run pa11y checks on that list.

@cantsin
Copy link
Member

cantsin commented Apr 16, 2024

I typically don't like stateful solutions but honestly I prefer 2 above -- if I read you right, we'd store a file of sha1 (or whatever) hashes of _site paths and commit that file and then only test these paths that don't match the hashes? Downside is that we'd have to update that file somehow but that can be done in a separate (automated?) step, maybe.

The first option seems a bit too fiddly for me. 🤷🏻

@beechnut
Copy link
Contributor

beechnut commented Apr 16, 2024

One clarification: I don't see a need to hash anything — git diffing does everything we need. Otherwise, yes: we diff the source files to get new/changed source files, use Jekyll hooks on those new/changed source files to identify the resulting site files, and then run pa11y only on the new/changed site files.

Agreed, the first is quite fiddly. The second feels fiddly to me as well, but hopefully manageably so.

beechnut added a commit that referenced this issue Apr 22, 2024
This work attempts to solve the problem of overly long CI runs, in which single PRs were taking 25+ minutes, because each PR scans every page with pa11y-ci. See #3752.

This commit adds a Jekyll::Hook which keeps track of which destination files (built HTML files) should be checked with pa11y-ci for each PR. The files to check are based on which files changed in the last commit.

The design of the code lets you swap out different "differs", which are the objects responsible for getting the list of files. This design makes the code easier to test, and makes it more adaptable.

One problem I'm seeing so far is that we only check what files were changed *in the last commit*. This means that, if you're ever running CI locally, you'd have to commit your changes to have them picked up by the "differ". We should probably change this in a future commit, and have a "differ" for local development and a "differ" for CI. To summarize, the differs should differ.

We also have yet to implement the case where a stylesheet or something site-wide changes.
@beechnut
Copy link
Contributor

@caleywoods I just pushed a draft branch with the hook. The commit message explains where I left off.

@beechnut
Copy link
Contributor

@caleywoods I just added site-wide checking to the same branch.

@caleywoods
Copy link
Contributor

#3829 Satisfies this I believe.

@beechnut
Copy link
Contributor

@caleywoods #3829 satisfies the first two points of the acceptance criteria, and we need #3832 to check off the last two.

@caleywoods
Copy link
Contributor

@caleywoods #3829 satisfies the first two points of the acceptance criteria, and we need #3832 to check off the last two.

Thanks, I was coming back to adjust my comment and you beat me to it!

@cantsin
Copy link
Member

cantsin commented Jul 26, 2024

Chatted with @caleywoods:

The work is done 🎉

Associated wiki entry that covers everything in detail: https://github.com/18F/18f.gsa.gov/wiki/CircleCI-scheduled-nightly-pa11y-scan

We have some refinements we could do, but these are likely lower priority:

  • Figure out how to escape the body text of the GitHub issue. This involves editing the repository_dispatch GitHub workflow file. This is a “nice to have” feature because the output of the pa11y scan can have HTML and if the body isn’t wrapped in backticks it causes the HTML to be rendered which isn’t very helpful. Here’s an example, the ID shown there should actually say

    ID

  • Implement some logic in the workflow file to find a previous failed pa11y scan issue (using the gh issue tool and scanning for the tags nightly-pa11y-scan-failed,type:accessibility and compare the issue body of the previous issue to the new one coming in and take the appropriate action. The goal of this was only have one issue open at a time that represents a unique failure. We should always have an issue open for each unique failure but not have multiple issues open for the same failure that occurs multiple days in a row.

Closing in favor of #4027

@cantsin cantsin closed this as completed Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants