Merge pull request #751 from Kobzol/rustc-ci

Kobzol · web-flow · commit 46e9f0b971dd · 2024-06-13T22:44:08.000+02:00
Add a link to CI documentation in rustc-dev-guide
diff --git a/src/infra/docs/rustc-ci.md b/src/infra/docs/rustc-ci.md
@@ -1,276 +1,8 @@
 # How the Rust CI works
 
-Rust CI ensures that the master branch of rust-lang/rust is always in a valid state.
+Continuous integration (CI) workflows on the `rust-lang/rust` repository ensure that the `master` branch
+is always in a valid state.
 
-A developer submitting a pull request to rust-lang/rust experiences the following:
+The CI infrastructure is described in detail in the [rustc-dev-guide][rustc-dev-guide].
 
-- A small subset of tests and checks are run on each commit to catch common errors.
-- When the PR is ready and approved, the "bors" tool enqueues a full CI run.
-- The PR is either queued for a full run on its own or "rolled up" with other changes.
-- Eventually a CI run containing the changes from the PR is performed and either passes or fails with an error the developer must address.
-
-## Which jobs we run
-
-The `rust-lang/rust` repository uses GitHub Actions to test [all the
-platforms][platforms] we support. We currently have two kinds of jobs running
-for each commit we want to merge to master:
-
-- Dist jobs build a full release of the compiler for that platform, including
-  all the tools we ship through rustup. Those builds are then uploaded to the
-  `rust-lang-ci2` S3 bucket and are available to be locally installed with the
-  [rustup-toolchain-install-master] tool. The same builds are also used for
-  actual releases: our release process basically consists of copying those
-  artifacts from `rust-lang-ci2` to the production endpoint and signing them.
-- Non-dist jobs run our full test suite on the platform, and the test suite of
-  all the tools we ship through rustup. How much we test depends on
-  the platform (for example some tests are run only on Tier 1 platforms), and
-  some quicker platforms are grouped together on the same builder to avoid
-  wasting CI resources.
-
-All the builds except those on macOS and Windows are executed inside that
-platform’s custom [Docker container]. This has a lot of advantages for us:
-
-- The build environment is consistent regardless of changes to the
-  underlying image (switching from the trusty image to xenial was painless for
-  us).
-- We can use ancient build environments to ensure maximum binary compatibility,
-  for example [using older CentOS releases][dist-x86_64-linux] on our Linux builders.
-- We can avoid reinstalling tools (like QEMU or the Android emulator) every
-  time thanks to Docker image caching.
-- Users can run the same tests in the same environment locally by just running
-  `src/ci/docker/run.sh image-name`, which is great for debugging failures.
-
-The Docker images prefixed with `dist-` are used for building artifacts, while those without that prefix run tests and checks.
-
-We also run tests for less common architectures (mainly Tier 2 and Tier 3
-platforms) in CI. Since those platforms are not x86, we either run
-everything inside QEMU or just cross-compile if we don’t want to run the tests
-for that platform.
-
-These jobs run on a special pool of builders set up and maintained for us by GitHub.
-
-Almost all build steps shell out to separate scripts. This keeps the CI fairly platform-independent (i.e., we are not
-overly reliant on GitHub Actions). GitHub Actions is only relied on for bootstrapping the CI process and for orchestrating
-the scripts that drive the process.
-
-[platforms]: https://doc.rust-lang.org/nightly/rustc/platform-support.html
-[rustup-toolchain-install-master]: https://github.com/kennytm/rustup-toolchain-install-master
-[Docker container]: https://github.com/rust-lang/rust/tree/master/src/ci/docker
-[dist-x86_64-linux]: https://github.com/rust-lang/rust/blob/master/src/ci/docker/host-x86_64/dist-x86_64-linux/Dockerfile
-
-## Merging PRs serially with bors
-
-CI services usually test the last commit of a branch merged with the last
-commit in master, and while that’s great to check if the feature works in
-isolation, it doesn’t provide any guarantee that the code is going to work once it’s
-merged. Breakages like these usually happen when another, incompatible PR is
-merged after the build happened.
-
-To ensure a master that works all the time we forbid manual merges: instead all
-PRs have to be approved through our bot, [bors] (the software behind it is
-called [homu]). All the approved PRs are put [in a queue][homu-queue] (sorted
-by priority and creation date) and are automatically tested one at a time. If
-all the builders are green, the PR is merged; otherwise the failure is recorded
-and the PR will have to be re-approved.
-
-Bors doesn’t interact with CI services directly, but it works by pushing the
-merge commit it wants to test to a branch called `auto`, and detecting the
-outcome of the build by listening for either Commit Statuses or Check Runs.
-Since the merge commit is based on the latest master and only one can be tested
-at the same time, when the results are green, master is fast-forwarded to that
-merge commit.
-
-The `auto` branch and other branches used by bors live on a fork of rust-lang/rust: 
-[rust-lang-ci/rust]. This was originally done due to some security limitations in GitHub 
-Actions. These limitations have been addressed, but we've not yet done the work of removing 
-the use of the fork.
-
-Unfortunately, testing a single PR at a time, combined with our long CI (~3
-hours for a full run)[^1], means we can’t merge too many PRs in a single day, and a
-single failure greatly impacts our throughput for the day. The maximum number
-of PRs we can merge in a day is around 8.
-
-The long CI run times and the need for a large builder pool are largely due to the
-fact that full release artifacts are built in the `dist-` builders. This is worth it
-because these release artifacts:
-
-- allow perf testing even at a later date
-- allow bisection when bugs are discovered later
-- ensure release quality: since we're always releasing, we can catch problems early
-
-Bors [runs on ECS](https://github.com/rust-lang/simpleinfra/blob/master/terraform/bors/app.tf) and uses an SQLite database stored on a volume.
-
-[^1]: As of January 2023, the bottlenecks are the `dist-x86_64-linux` and `dist-x86_64-linux-alt` runners because of their use of [BOLT] and [PGO] optimization tooling.
-
-[bors]: https://github.com/bors
-[homu]: https://github.com/rust-lang/homu
-[homu-queue]: https://bors.rust-lang.org/queue/rust
-[rust-lang-ci/rust]: https://github.com/rust-lang-ci/rust
-[BOLT]: https://github.com/facebookincubator/BOLT
-[PGO]: https://en.wikipedia.org/wiki/Profile-guided_optimization
-
-### Rollups
-
-Some PRs don’t need the full test suite to be executed: trivial changes like
-typo fixes or README improvements *shouldn’t* break the build, and testing
-every single one of them for 2 to 3 hours is a big waste of time. To solve this
-we do a "rollup", a PR where we merge all the trivial PRs so they can be tested
-together. Rollups are created manually by a team member using the "create a rollup" button on the [bors queue]. The team member uses their
-judgment to decide whether a PR is risky. Rollups are the best tool we have at
-the moment to keep the queue in a manageable state.
-
-[bors queue]: https://bors.rust-lang.org/queue/rust
-
-### Try builds
-
-Sometimes we need a working compiler build before approving a PR, usually for
-[benchmarking][perf] or [checking the impact of the PR across the
-ecosystem][crater]. Bors supports creating them by pushing the merge commit to
-a separate branch (`try`), and they basically work the same as normal builds,
-without the actual merge at the end. Any number of try builds can happen at the
-same time, even if there is a normal PR in progress.
-
-You can see the CI configuration for try builds [here](https://github.com/rust-lang/rust/blob/9d46c7a3e69966782e163877151c1f0cea8b630a/src/ci/github-actions/ci.yml#L728-L741).
-
-If you want to perform a try build with a different configuration (e.g. try to
-perform a compiler build for a different architecture), you can temporarily change
-the `try` CI job in your PR:
-
-1) Open `src/ci/github-actions/ci.yml`
-2) Find the CI job that you want to run (e.g. `dist-aarch64-linux`)
-3) Copy-paste the entry of the CI job
-4) Find the `try:` job in the file
-5) Replace the `dist-x86_64-linux` job in the matrix with the copied entry from step 3)
-6) Run `python3 x.py run src/tools/expand-yaml-anchors`
-7) Push your changes and start a try build with `@bors try`
-
-[perf]: https://perf.rust-lang.org
-[crater]: https://github.com/rust-lang/crater
-
-## Which branches we test
-
-Our builders are defined in [`src/ci/github-actions/ci.yml`].
-
-[`src/ci/github-actions/ci.yml`]: https://github.com/rust-lang/rust/blob/master/src/ci/github-actions/ci.yml
-
-### PR builds
-
-All the commits pushed in a PR run a limited set of tests: a job containing a
-bunch of lints plus a cross-compile check build to Windows mingw (without
-producing any artifacts) and the `x86_64-gnu-llvm-##` non-dist builder (where
-`##` is the *system* LLVM version we are currently testing). Those two
-builders are enough to catch most of the common errors introduced in a PR, but
-they don’t cover other platforms at all. Unfortunately it would take too many
-resources to run the full test suite for each commit on every PR.
-
-Additionally, if the PR changes certain tools (or certain platform-specific
-parts of std, to check for Miri breakage), the `x86_64-gnu-tools` non-dist
-builder is run.
-
-### The `try` branch
-
-On the main rust repo, `try` builds produce just a Linux toolchain using the
-`dist-x86_64-linux` image.
-
-### The `auto` branch
-
-This branch is used by bors to run all the tests on a PR before merging it, so
-all the builders are enabled for it. Bors will repeatedly force-push to it
-(every time a new commit is tested).
-
-### The `master` branch
-
-Since all the commits to `master` are fast-forwarded from the `auto` branch (if
-they pass all the tests there) we don’t need to build or test anything. A quick
-job is executed on each push to update toolstate (see the toolstate description
-below).
-
-### Other branches
-
-Other branches are just disabled and don’t run any kind of builds, since all
-the in-progress branches will eventually be tested in a PR.
-
-## Caching
-
-The main rust repository doesn’t use the native GitHub Actions caching tools.
-All our caching is uploaded to an S3 bucket we control
-(`rust-lang-ci-sccache2`), and it’s used mainly for two things:
-
-### Docker images caching
-
-The Docker images we use to run most of the Linux-based builders take a *long*
-time to fully build. To speed up the build, we cache the exported images on the
-S3 bucket (with `docker save`/`docker load`).
-
-Since we test multiple diverging branches (`master`, `beta` and `stable`), we
-can’t rely on a single cache for the images; otherwise builds on one branch would
-overwrite the cache for the others. Instead, we store each image under a custom
-hash, computed from the host’s Docker version and the contents of all
-the Dockerfiles and related scripts.
-
-### LLVM caching with sccache
-
-We build some C/C++ code during the build and we rely on [sccache] to cache
-intermediate LLVM artifacts. Sccache is a distributed ccache developed by
-Mozilla, and it can use an object storage bucket as the storage backend, like
-we do with our S3 bucket.
-
-[sccache]: https://github.com/mozilla/sccache
-
-## Custom tooling around CI
-
-Over the years, we have developed some custom tooling to improve our CI experience.
-
-### Rust Log Analyzer to show the error message in PRs
-
-The build logs for `rust-lang/rust` are huge, and it’s not practical to find
-what caused the build to fail by looking at the logs. To improve the
-developers’ experience we developed a bot called [Rust Log Analyzer][rla] (RLA)
-that receives the build logs on failure and extracts the error message
-automatically, posting it on the PR.
-
-The bot is not hardcoded to look for error strings, but was trained with a
-bunch of build failures to recognize which lines are common between builds and
-which are not. While the generated snippets can be weird sometimes, the bot is
-pretty good at identifying the relevant lines even if it’s an error we've never
-seen before.
-
-[rla]: https://github.com/rust-lang/rust-log-analyzer
-
-### Toolstate to support allowed failures
-
-The `rust-lang/rust` repo doesn’t only test the compiler on its CI, but also a
-variety of tools and documentation. Some documentation is pulled in via git
-submodules. If we blocked merging rustc PRs on the documentation being fixed,
-we would be stuck in a chicken-and-egg problem, because the documentation's CI
-would not pass since updating it would need the not-yet-merged version of
-rustc to test against (and we usually require CI to be passing).
-
-To avoid the problem, submodules are allowed to fail, and their status is
-recorded in [rust-toolstate]. When a submodule breaks, a bot automatically
-pings the maintainers so they know about the breakage, and it records the
-failure on the toolstate repository. The release process will then ignore
-broken tools on nightly, removing them from the shipped nightlies.
-
-While tool failures are allowed most of the time, they’re automatically
-forbidden a week before a release: we don’t care if tools are broken on nightly
-but they must work on beta and stable, so they also need to work on nightly a
-few days before we promote nightly to beta.
-
-More information is available in the [toolstate documentation].
-
-### GitHub Actions Templating
-
-GitHub Actions does not natively support templating, which can cause configurations to be large and difficult to change. We use YAML anchors for templating and a custom tool, [`expand-yaml-anchors`], to expand [the template] into the CI configuration that [GitHub uses][ci config].
-
-This templating language is fairly straightforward:
-
-- `&` indicates a template section
-- `*` expands the indicated template in place
-- `<<` merges YAML dictionaries
-
-[rust-toolstate]: https://rust-lang-nursery.github.io/rust-toolstate
-[toolstate documentation]: ../toolstate.md
-[`expand-yaml-anchors`]: https://github.com/rust-lang/rust/tree/master/src/tools/expand-yaml-anchors
-[the template]: https://github.com/rust-lang/rust/blob/736c675d2ab65bcde6554e1b73340c2dbc27c85a/src/ci/github-actions/ci.yml
-[ci config]: https://github.com/rust-lang/rust/blob/master/.github/workflows/ci.yml
+[rustc-dev-guide]: https://rustc-dev-guide.rust-lang.org/tests/ci.html