 # How the Rust CI works
 
-Rust CI ensures that the master branch of rust-lang/rust is always in a valid state.
+Continuous integration (CI) workflows on the `rust-lang/rust` repository ensure that the `master` branch
+is always in a valid state.
 
-A developer submitting a pull request to rust-lang/rust, experiences the following:
+The CI infrastructure is described in detail in the [rustc-dev-guide][rustc-dev-guide].
 
-- A small subset of tests and checks are run on each commit to catch common errors.
-- When the PR is ready and approved, the "bors" tool enqueues a full CI run.
-- The full run either queues the specific PR or the PR is "rolled up" with other changes.
-- Eventually a CI run containing the changes from the PR is performed and either passes or fails with an error the developer must address.
-
-## Which jobs we run
-
-The `rust-lang/rust` repository uses GitHub Actions to test [all the
-platforms][platforms] we support. We currently have two kinds of jobs running
-for each commit we want to merge to master:
-
-- Dist jobs build a full release of the compiler for that platform, including
- all the tools we ship through rustup; Those builds are then uploaded to the
- `rust-lang-ci2` S3 bucket and are available to be locally installed with the
- [rustup-toolchain-install-master] tool; The same builds are also used for
- actual releases: our release process basically consists of copying those
- artifacts from `rust-lang-ci2` to the production endpoint and signing them.
-- Non-dist jobs run our full test suite on the platform, and the test suite of
- all the tools we ship through rustup; The amount of stuff we test depends on
- the platform (for example some tests are run only on Tier 1 platforms), and
- some quicker platforms are grouped together on the same builder to avoid
- wasting CI resources.
-
-All the builds except those on macOS and Windows are executed inside that
-platform’s custom [Docker container]. This has a lot of advantages for us:
-
-- The build environment is consistent regardless of the changes of the
- underlying image (switching from the trusty image to xenial was painless for
- us).
-- We can use ancient build environments to ensure maximum binary compatibility,
- for example [using older CentOS releases][dist-x86_64-linux] on our Linux builders.
-- We can avoid reinstalling tools (like QEMU or the Android emulator) every
- time thanks to Docker image caching.
-- Users can run the same tests in the same environment locally by just running
- `src/ci/docker/run.sh image-name`, which is awesome to debug failures.
-
-The docker images prefixed with `dist-` are used for building artifacts while those without that prefix run tests and checks.
-
-We also run tests for less common architectures (mainly Tier 2 and Tier 3
-platforms) in CI. Since those platforms are not x86 we either run
-everything inside QEMU or just cross-compile if we don’t want to run the tests
-for that platform.
-
-These builders are running on a special pool of builders set up and maintained for us by GitHub.
-
-Almost all build steps shell out to separate scripts. This keeps the CI fairly platform independent (i.e., we are not
-overly reliant on GitHub Actions). GitHub Actions is only relied on for bootstrapping the CI process and for orchestrating
-the scripts that drive the process.
-
-[platforms]: https://doc.rust-lang.org/nightly/rustc/platform-support.html
-[rustup-toolchain-install-master]: https://github.com/kennytm/rustup-toolchain-install-master
-[Docker container]: https://github.com/rust-lang/rust/tree/master/src/ci/docker
-[dist-x86_64-linux]: https://github.com/rust-lang/rust/blob/master/src/ci/docker/host-x86_64/dist-x86_64-linux/Dockerfile
-
-## Merging PRs serially with bors
-
-CI services usually test the last commit of a branch merged with the last
-commit in master, and while that’s great to check if the feature works in
-isolation it doesn’t provide any guarantee the code is going to work once it’s
-merged. Breakages like these usually happen when another, incompatible PR is
-merged after the build happened.
-
-To ensure a master that works all the time we forbid manual merges: instead all
-PRs have to be approved through our bot, [bors] (the software behind it is
-called [homu]). All the approved PRs are put [in a queue][homu-queue] (sorted
-by priority and creation date) and are automatically tested one at the time. If
-all the builders are green the PR is merged, otherwise the failure is recorded
-and the PR will have to be re-approved again.
-
-Bors doesn’t interact with CI services directly, but it works by pushing the
-merge commit it wants to test to a branch called `auto`, and detecting the
-outcome of the build by listening for either Commit Statuses or Check Runs.
-Since the merge commit is based on the latest master and only one can be tested
-at the same time, when the results are green master is fast-forwarded to that
-merge commit.
-
-The `auto` branch and other branches used by bors live on a fork of rust-lang/rust:
-[rust-lang-ci/rust]. This was originally done due to some security limitations in GitHub
-Actions. These limitations have been addressed, but we've not yet done the work of removing
-the use of the fork.
-
-Unfortunately testing a single PR at the time, combined with our long CI (~3
-hours for a full run)[^1], means we can’t merge too many PRs in a single day, and a
-single failure greatly impacts our throughput for the day. The maximum number
-of PRs we can merge in a day is around 8.
-
-The large CI run times and requirement for a large builder pool is largely due to the
-fact that full release artifacts are built in the `dist-` builders. This is worth it
-because these release artifacts:
-
-- allow perf testing even at a later date
-- allow bisection when bugs are discovered later
-- ensure release quality since if we're always releasing, we can catch problems early
-
-Bors [runs on ecs](https://github.com/rust-lang/simpleinfra/blob/master/terraform/bors/app.tf) and uses a sqlite database running in a volume as storage.
-
-[^1]: As of January 2023, the bottleneck are the `dist-x86_64-linux` and `dist-x86_64-linux-alt` runners because of their usage of [BOLT] and [PGO] optimization tooling.
-
-[bors]: https://github.com/bors
-[homu]: https://github.com/rust-lang/homu
-[homu-queue]: https://bors.rust-lang.org/queue/rust
-[rust-lang-ci/rust]: https://github.com/rust-lang-ci/rust
-[BOLT]: https://github.com/facebookincubator/BOLT
-[PGO]: https://en.wikipedia.org/wiki/Profile-guided_optimization
-
-### Rollups
-
-Some PRs don’t need the full test suite to be executed: trivial changes like
-typo fixes or README improvements *shouldn’t* break the build, and testing
-every single one of them for 2 to 3 hours is a big waste of time. To solve this
-we do a "rollup", a PR where we merge all the trivial PRs so they can be tested
-together. Rollups are created manually by a team member using the "create a rollup" button on the [bors queue]. The team member uses their
-judgment to decide if a PR is risky or not, and are the best tool we have at
-the moment to keep the queue in a manageable state.
-
-[bors queue]: https://bors.rust-lang.org/queue/rust
-
-### Try builds
-
-Sometimes we need a working compiler build before approving a PR, usually for
-[benchmarking][perf] or [checking the impact of the PR across the
-ecosystem][crater]. Bors supports creating them by pushing the merge commit on
-a separate branch (`try`), and they basically work the same as normal builds,
-without the actual merge at the end. Any number of try builds can happen at the
-same time, even if there is a normal PR in progress.
-
-You can see the CI configuration for try builds [here](https://github.com/rust-lang/rust/blob/9d46c7a3e69966782e163877151c1f0cea8b630a/src/ci/github-actions/ci.yml#L728-L741).
-
-If you want to perform a try build with a different configuration (e.g. try to
-perform a compiler build for a different architecture), you can temporarily change
-the `try` CI job in your PR:
-
-1) Open `src/ci/github-actions/ci.yml`
-2) Find the CI job that you want to run (e.g. `dist-aarch64-linux`)
-3) Copy-paste the entry of the CI job
-4) Find the `try:` job in the file
-5) Replace the `dist-x86_64-linux` job in the matrix with the copied entry from step 3)
-6) Run `python3 x.py run src/tools/expand-yaml-anchors`
-7) Push your changes and start a try build with `@bors try`
-
-[perf]: https://perf.rust-lang.org
-[crater]: https://github.com/rust-lang/crater
-
-## Which branches we test
-
-Our builders are defined in [`src/ci/github-actions/ci.yml`].
-
-[`src/ci/github-actions/ci.yml`]: https://github.com/rust-lang/rust/blob/master/src/ci/github-actions/ci.yml
-
-### PR builds
-
-All the commits pushed in a PR run a limited set of tests: a job containing a
-bunch of lints plus a cross-compile check build to Windows mingw (without
-producing any artifacts) and the `x86_64-gnu-llvm-##` non-dist builder (where
-`##` is the *system* LLVM version we are currently testing). Those two
-builders are enough to catch most of the common errors introduced in a PR, but
-they don’t cover other platforms at all. Unfortunately it would take too many
-resources to run the full test suite for each commit on every PR.
-
-Additionally, if the PR changes certain tools (or certain platform-specific
-parts of std to check for miri breakage), the `x86_64-gnu-tools` non-dist
-builder is run.
-
-### The `try` branch
-
-On the main rust repo, `try` builds produce just a Linux toolchain using the
-`dist-x86_64-linux` image.
-
-### The `auto` branch
-
-This branch is used by bors to run all the tests on a PR before merging it, so
-all the builders are enabled for it. bors will repeatedly force-push on it
-(every time a new commit is tested).
-
-### The `master` branch
-
-Since all the commits to `master` are fast-forwarded from the `auto` branch (if
-they pass all the tests there) we don’t need to build or test anything. A quick
-job is executed on each push to update toolstate (see the toolstate description
-below).
-
-### Other branches
-
-Other branches are just disabled and don’t run any kind of builds, since all
-the in-progress branches will eventually be tested in a PR.
-
-## Caching
-
-The main rust repository doesn’t use the native GitHub Actions caching tools.
-All our caching is uploaded to an S3 bucket we control
-(`rust-lang-ci-sccache2`), and it’s used mainly for two things:
-
-### Docker images caching
-
-The Docker images we use to run most of the Linux-based builders take a *long*
-time to fully build. To speed up the build, we cache the exported images on the
-S3 bucket (with `docker save`/`docker load`).
-
-Since we test multiple, diverged branches (`master`, `beta` and `stable`) we
-can’t rely on a single cache for the images, otherwise builds on a branch would
-override the cache for the others. Instead we store the images identifying them
-with a custom hash, made from the host’s Docker version and the contents of all
-the Dockerfiles and related scripts.
-
-### LLVM caching with sccache
-
-We build some C/C++ stuff during the build and we rely on [sccache] to cache
-intermediate LLVM artifacts. Sccache is a distributed ccache developed by
-Mozilla, and it can use an object storage bucket as the storage backend, like
-we do with our S3 bucket.
-
-[sccache]: https://github.com/mozilla/sccache
-
-## Custom tooling around CI
-
-During the years we developed some custom tooling to improve our CI experience.
-
-### Rust Log Analyzer to show the error message in PRs
-
-The build logs for `rust-lang/rust` are huge, and it’s not practical to find
-what caused the build to fail by looking at the logs. To improve the
-developers’ experience we developed a bot called [Rust Log Analyzer][rla] (RLA)
-that receives the build logs on failure and extracts the error message
-automatically, posting it on the PR.
-
-The bot is not hardcoded to look for error strings, but was trained with a
-bunch of build failures to recognize which lines are common between builds and
-which are not. While the generated snippets can be weird sometimes, the bot is
-pretty good at identifying the relevant lines even if it’s an error we've never
-seen before.
-
-[rla]: https://github.com/rust-lang/rust-log-analyzer
-
-### Toolstate to support allowed failures
-
-The `rust-lang/rust` repo doesn’t only test the compiler on its CI, but also a
-variety of tools and documentation. Some documentation is pulled in via git
-submodules. If we blocked merging rustc PRs on the documentation being fixed,
-we would be stuck in a chicken-and-egg problem, because the documentation's CI
-would not pass since updating it would need the not-yet-merged version of
-rustc to test against (and we usually require CI to be passing).
-
-To avoid the problem, submodules are allowed to fail, and their status is
-recorded in [rust-toolstate]. When a submodule breaks, a bot automatically
-pings the maintainers so they know about the breakage, and it records the
-failure on the toolstate repository. The release process will then ignore
-broken tools on nightly, removing them from the shipped nightlies.
-
-While tool failures are allowed most of the time, they’re automatically
-forbidden a week before a release: we don’t care if tools are broken on nightly
-but they must work on beta and stable, so they also need to work on nightly a
-few days before we promote nightly to beta.
-
-More information is available in the [toolstate documentation].
-
-### GitHub Actions Templating
-
-GitHub Actions does not natively support templating which can cause configurations to be large and difficult to change. We use YAML anchors for templating and a custom tool, [`expand-yaml-anchors`], to expand [the template] into the CI configuration that [GitHub uses][ci config].
-
-This templating language is fairly straightforward:
-
-- `&` indicates a template section
-- `*` expands the indicated template in place
-- `<<` merges yaml dictionaries
-
-[rust-toolstate]: https://rust-lang-nursery.github.io/rust-toolstate
-[toolstate documentation]: ../toolstate.md
-[`expand-yaml-anchors`]: https://github.com/rust-lang/rust/tree/master/src/tools/expand-yaml-anchors
-[the template]: https://github.com/rust-lang/rust/blob/736c675d2ab65bcde6554e1b73340c2dbc27c85a/src/ci/github-actions/ci.yml
-[ci config]: https://github.com/rust-lang/rust/blob/master/.github/workflows/ci.yml
+[rustc-dev-guide]: https://rustc-dev-guide.rust-lang.org/tests/ci.html