incremental compilation without a worker #684
Conversation
Yeah, that worked out better - I've pushed an updated version and updated the description above to cover the new flag.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules.

We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately, it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes and, as a bonus, means we're not limited to saving the cache in /tmp.

This PR adds a `--@rules_rust//:experimental_incremental_base` flag to specify the path where incremental build products are stored - if it is not provided, the rules function as they normally do. The default behaviour is to incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
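Under this scheme, opting in might look like the following `.bazelrc` fragment - the cache path here is purely illustrative, and only the flag name comes from this PR:

```
# Hypothetical .bazelrc fragment: store incremental build products under a
# fixed local path. Omitting the flag keeps the rules' normal behaviour.
build --@rules_rust//:experimental_incremental_base=/home/user/.cache/rust-incremental
```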
Sorry for being stupid, you can't glob the incremental cache and model it as input to actions? Maybe even teach rustc to create deterministic output into the cache. Bazel could handle everything without hacks.
Looks reasonable to me, certainly as something marked experimental :)
> Sorry for being stupid, you can't glob the incremental cache and model it as input to actions? Maybe even teach rustc to create deterministic output into the cache. Bazel could handle everything without hacks.
No, Bazel doesn't really have a concept of volatile files which are both inputs and outputs of the same action, as these would need to be.
```diff
@@ -6,6 +6,7 @@ package(default_visibility = ["//visibility:private"])
 bzl_library(
     name = "docs_deps",
     srcs = [
+        "@bazel_skylib//rules:common_settings",
```
Remove?
Without this the docs build was breaking, but it may have been resolved by the # buildifier: disable=module-docstring below - will check
```diff
     Returns:
         (IncrementalPrefixInfo): a list of prefixes to include and exclude
     """
+    values = ctx.build_setting_value.split(",")
```
Remove, or update the loop to iterate over `values`?
whoops, good catch. Will get this fixed up and rebased on main after #703 is merged
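The review comment above concerns parsing that comma-separated build setting. As a plain-Python sketch (Starlark shares this subset of Python; the leading-`-` exclusion marker and the returned structure are my assumptions for illustration, not necessarily what the rule implements):

```python
# Hypothetical sketch of parsing a comma-separated prefix setting into
# include/exclude lists. A leading "-" marking an excluded prefix is an
# assumption for illustration; the real rule may use a different syntax.
def parse_prefixes(setting_value):
    include = []
    exclude = []
    for value in setting_value.split(","):
        value = value.strip()
        if value.startswith("-"):
            exclude.append(value[1:])
        elif value:
            include.append(value)
    return {"include": include, "exclude": exclude}
```

In Starlark this would return a provider such as `IncrementalPrefixInfo` rather than a dict; the loop over `values` is the part the review comment asks to keep consistent with the split above.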
I've spent some time today learning more about Bazel workers, and I think I came full circle :) I'm also sorry for not knowing this already - I could probably have saved you all some time. If you want to follow along, I've read:
My summary: the approach in this PR is actually not using Bazel's worker support, but rather reimplementing it for the rustc specifics. The defining point of Bazel workers is that workers are long-running programs (servers, actually) that receive requests in proto or JSON and return responses. If rules support workers, whether the worker is used or not is controlled by Bazel's execution strategy (for example, via a `--strategy` flag).

The main reason why this matters (to me at least) is the dynamic strategy (https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html). With that you can tell Bazel to use both the worker and the remote execution service, and whichever is faster wins. So in the case of a clean build, the worker is probably going to be slower because of the limited parallelism of the host machine, but for an incremental build it can be faster, because maybe the remote execution workers are not as performant for single-core workloads as the host machine. In our experience the dynamic strategy can speed up incremental builds enough to be worth the added complexity. Unfortunately, the dynamic strategy is not usable with the approach implemented in this PR.

I think it's desirable to implement rustc worker support using the Bazel conventions. Rustc is a bit unique as a worker since it doesn't need to stay running: it stores all its data into a directory, and the next invocation can read it. That is why it was possible to write this PR. But I can see a future improvement to rustc to support a server mode that keeps the data in memory and doesn't spend action execution time on serializing and deserializing. We would then also be able to benefit from the dynamic strategy.

What it all means is that we will need a rustc wrapper. It will need to parse and write JSON. It can be multithreaded (but workers don't have to be in general; that is only relevant for multiplex workers, https://docs.bazel.build/versions/2.0.0/multiplex-worker.html, and we need to investigate whether multiplex workers have benefits for rustc workloads).
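For reference, the JSON dialect of the persistent-worker protocol such a rustc wrapper would speak is newline-delimited requests on stdin and responses on stdout. A minimal sketch - the request/response field names follow Bazel's worker protocol, while treating each request as one subprocess invocation is my simplification (a real wrapper would invoke rustc and preserve the incremental directory between requests):

```python
import json
import subprocess
import sys

def handle_request(request):
    # Run the command described by one work request. In a real wrapper this
    # would be a rustc invocation reusing the incremental cache directory,
    # not an arbitrary command.
    result = subprocess.run(
        request["arguments"], capture_output=True, text=True
    )
    return {
        "exitCode": result.returncode,
        "output": result.stdout + result.stderr,
        "requestId": request.get("requestId", 0),
    }

def worker_loop(stdin=sys.stdin, stdout=sys.stdout):
    # One JSON work request per line in, one JSON response per line out.
    for line in stdin:
        response = handle_request(json.loads(line))
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```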
I propose the following (and I'm prepared to invest my time into that unless there is somebody driven to work on it):
What do you all think? I saw that @Ubehebe has some experience with workers and was contributing to this repo recently - feel free to give us feedback :) Also CCing @nikhilm, @mfarrugi, @djmarcin, @scentini, and @UebelAndre, just to be sure all voices are heard.
Actually, let me create a mailing list thread for this discussion - hope it works better for everybody (speak up if not).
Here's the link: https://groups.google.com/g/rules_rust/c/E3MJtgamlD4
If you have the motivation to drive this forward with a different approach, no objections from me - I'll be happy to see incremental compilation arrive in any form.
This is considerably less invasive than #667, #517 and #421 - there
is no extra worker to bootstrap, and no toolchains have to be plumbed
into the rules.
We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on #667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.
This PR adds a `--@rules_rust//:experimental_incremental_base` flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.
A separate flag allows you to control which crates are incrementally compiled.
The default is:

This will incrementally compile any crates in the local workspace, but exclude
other repositories like `@raze__*`, and anything in the //vendored package. This
behaviour is similar to cargo, which only incrementally compiles the local
workspace.
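The include/exclude semantics described above can be illustrated with a small sketch. The function name and the exact matching rule (simple label-prefix matching) are my assumptions, not the rule's actual implementation:

```python
# Hypothetical illustration of the documented default: crates in the local
# workspace ("//...") are compiled incrementally, while external repositories
# (e.g. @raze__*) and excluded packages such as //vendored are not.
def is_incremental(label, include, exclude):
    if any(label.startswith(prefix) for prefix in exclude):
        return False
    return any(label.startswith(prefix) for prefix in include)
```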