CI speed optimization #493

Merged: 16 commits, Feb 1, 2024


matthiasbeyer
Member

@matthiasbeyer matthiasbeyer commented Oct 26, 2023

This PR experiments with CI speedup optimizations.

I didn't realize that CI in this project did not even use caching... 👀


Closes #482
... probably? 👀

@matthiasbeyer
Member Author

Some more changes, let's see how fast we are now. 😆

@matthiasbeyer
Member Author

Ok, I got 30 secs faster CI now... that's not nearly enough IMO. Could even be just random noise.

@matthiasbeyer
Member Author

I wonder why the tests for 1.70.0 are so much slower than the ones for 1.73.0 ... nextest did not actually make anything faster, probably because building is the slow part, not running. But why does the job for 1.70.0 take 2:20 min, and the 1.73.0 one only 58 s? I don't know.

So overall we're now slower than before. 😢

Let's see what happens if we execute tests right away and not wait for the check phase.

@matthiasbeyer
Member Author

4 min -> 3 min now.

Not sure whether this is actually worth it... but I guess I am ready for review here. Just a quick squash of the fixup commit.

@matthiasbeyer matthiasbeyer marked this pull request as ready for review October 26, 2023 12:41
@polarathene
Collaborator

> I wonder why the tests for 1.70.0 are so much slower than the ones for 1.73.0

1.73.0 being the latest stable, right? I already mentioned that GitHub likely caches this themselves, as they're known to do with other high-traffic sources for CI. It is far cheaper for them to pull from the real location once every N (e.g. a minute) than for every CI job running within that N on their infrastructure to do so.

Older toolchains are less likely to have that optimization? 🤷‍♂️ (just an assumption, that there is something different in the requests that differentiates this)

I believe it's a common practice, even for ISPs, to cache popular content from vendors like YouTube / Netflix (pretty sure I have read of partnerships to do so).


> nextest did not actually make anything faster.

I also mentioned that observation; the tests did not seem to take much of the time, so I didn't see it providing much benefit.

> So overall we're now slower than before

Keep in mind that when using the actions/cache action, a PR will have access to the main branch cache (once CI has run on that branch). However, PRs maintain their own scoped cache IIRC, and that is not accessible to another PR, nor to main I think 🤷‍♂️

Should be in the docs, I could be recalling that wrong. Caching makes a big difference for docker-mailserver builds.

@polarathene
Collaborator

As mentioned:

image

Additionally keep in mind what is being cached, and the cache key generation:

image

This also confirms that they're using the official GitHub cache action, just extending it to better suit cargo:

image


We can see that the action is correctly saving and restoring cache quickly, but pay attention to this:

image

Your current PR is using the action and then setting a toolchain, which is against the action's advice.
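
A minimal sketch of the step order the rust-cache docs ask for, assuming the workflow uses `dtolnay/rust-toolchain` and `Swatinem/rust-cache` (neither is confirmed in this thread; the job name and test command are illustrative too):

```yaml
# Hypothetical job, not this repo's actual msrv.yml.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Select the toolchain first ...
      - uses: dtolnay/rust-toolchain@master
        with:
          toolchain: 1.73.0
      # ... so the cache action keys on the rustc version that will
      # actually build the project.
      - uses: Swatinem/rust-cache@v2
      - run: cargo test
```

With the order reversed, the cache key would presumably be derived from whatever toolchain the runner image ships by default, not the pinned one.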

Collaborator

@polarathene polarathene left a comment


Just for clarification: you define multiple separate jobs to run, and these all run on separate runners, just like a matrix, AFAIK?

Each separate runner is then building the project again. You could instead try to minimize the number of jobs?


Here is an example of managing a check status, which I think can be used to display multiple statuses from the same job, if that suits you:

This makes it visible:

image

Bit more verbose workflow, but could help.


Alternatively, as suggested before, start one job per toolchain, then have subsequent jobs express a dependency on that initial job so they can bring in its cache.

If you have multiple jobs running in parallel, then initially none of them has that cache and all need to do the same work.

Once you have some cache, depending on how the cache key is defined, jobs can fall back to an earlier cache they can use.


As per prior comment with images from the rust-cache docs, you may need to wait until the Cargo.lock is committed for the most benefit in cache usage.
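
The dependency-plus-fallback idea above could look roughly like this. It is a sketch only: the job names, cached paths, and key layout are assumptions, and it uses plain `actions/cache` to make the key and fallback explicit. Since `Cargo.lock` is not committed yet, the key hashes `Cargo.toml` instead.

```yaml
# Hypothetical workflow fragment, not this repo's actual configuration.
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@master
        with:
          toolchain: 1.73.0
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          # Exact key: changes whenever any Cargo.toml changes.
          key: ${{ runner.os }}-1.73.0-${{ hashFiles('**/Cargo.toml') }}
          # Prefix fallback: restore the newest cache for this OS and toolchain.
          restore-keys: |
            ${{ runner.os }}-1.73.0-
      - run: cargo check --all-targets

  test:
    # Starts only after `check` has finished and saved its cache.
    needs: check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@master
        with:
          toolchain: 1.73.0
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-1.73.0-${{ hashFiles('**/Cargo.toml') }}
          restore-keys: |
            ${{ runner.os }}-1.73.0-
      - run: cargo test
```

The trade-off is that `test` now waits for `check`, so this only pays off if the restored cache saves more time than the serialization costs.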

.github/workflows/msrv.yml (6 review threads, all resolved; 5 marked outdated)
Signed-off-by: Matthias Beyer <[email protected]>
The comment says it all.

Signed-off-by: Matthias Beyer <[email protected]>
There's no point in running it with two versions, and MSRV should
suffice.

Signed-off-by: Matthias Beyer <[email protected]>
We do not get any benefit from running CI with the nightly toolchain.
We still run it with beta, so that should catch some errors if there are
any.

Signed-off-by: Matthias Beyer <[email protected]>
Signed-off-by: Matthias Beyer <[email protected]>
If stable updates and introduces a new lint for example, that could
break our CI "randomly".
Thus, we no longer depend on a mutable version of the toolchain, but
on a fixed one (that now has to be updated regularly, of course).

Signed-off-by: Matthias Beyer <[email protected]>
Same as with the mutable "stable" being removed from our CI: we run beta
toolchain CI stuff, but we should no longer allow a changing beta
version to break our CI "randomly".

Signed-off-by: Matthias Beyer <[email protected]>
This is another attempt at increasing the CI speed. It does so by
removing the dedicated example checking phase, by putting all checks in
one phase (examples, tests) and running that phase only for MSRV.

Signed-off-by: Matthias Beyer <[email protected]>
Signed-off-by: Matthias Beyer <[email protected]>
The test phase is the slowest, because it actually produces binaries
for running tests.
So do not depend on the "check" phase, so that the test phase starts as
soon as possible, to speed up overall CI time.

Signed-off-by: Matthias Beyer <[email protected]>
The nextest runner does not help enough to be viable. Installing it
takes more time than the runner saves, so go back to the normal test
runner with this patch.

This reverts commit 284de85.
Signed-off-by: Matthias Beyer <[email protected]>
This saves a bit of execution time.
There's no point in running MSRV and stable, because there shouldn't be
a difference. So we opt for stable because it tends to be faster.

Signed-off-by: Matthias Beyer <[email protected]>
@matthiasbeyer matthiasbeyer merged commit 292d331 into rust-cli:master Feb 1, 2024
6 checks passed
@matthiasbeyer matthiasbeyer deleted the ci-optimization branch February 1, 2024 06:53

Successfully merging this pull request may close these issues.

Investigate ways to make CI faster
2 participants