
incremental compilation without a worker #684


Open: dae wants to merge 1 commit into main from the incremental-noworker branch

Conversation

dae (Contributor) commented Apr 8, 2021

This is considerably less invasive than #667, #517 and #421 - there
is no extra worker to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on #667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.
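
For reference, Bazel exposes this per action through execution requirements. A minimal sketch of that mechanism (with placeholder variable names; this is not the PR's actual implementation) might look like:

    # Sketch only: opting a single action out of the sandbox.
    # rustc, rustc_args, compile_inputs and outputs are placeholders.
    ctx.actions.run(
        executable = rustc,
        arguments = [rustc_args],
        inputs = compile_inputs,
        outputs = outputs,
        mnemonic = "Rustc",
        execution_requirements = {
            # Run outside the sandbox so rustc sees stable source paths and
            # can reuse its incremental cache between builds.
            "no-sandbox": "1",
        },
    )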

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

A separate flag allows you to control which crates are incrementally compiled.
The default is:

--@rules_rust//:experimental_incremental_prefixes=//,-//vendored

This will incrementally compile any crates in the local workspace, but exclude
other repositories like @raze__*, and anything in the //vendored package. This
behaviour is similar to cargo, which only incrementally compiles the local
workspace.
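
As an illustration only, both flags could be kept in a user.bazelrc so they apply to every build; the cache path below is an arbitrary example, and neither line is a default shipped with the rules:

    # user.bazelrc (illustrative values)
    build --@rules_rust//:experimental_incremental_base=/home/me/.cache/rules_rust_incremental
    build --@rules_rust//:experimental_incremental_prefixes=//,-//vendored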

google-cla bot added the cla: yes label on Apr 8, 2021
dae mentioned this pull request on Apr 8, 2021
dae force-pushed the incremental-noworker branch 2 times, most recently from eb91a9b to a6efa0b on April 8, 2021 at 13:29
dae (Contributor, Author) commented Apr 8, 2021

On second thought, tags are not the best idea, as downstream consumers of a crate may not want to build it incrementally. Maybe we could have a separate flag with a comma-separated list of crates to incrementally compile? These could be placed in a user.bazelrc to avoid having to enter them each time. Thoughts?

dae force-pushed the incremental-noworker branch 2 times, most recently from f469d1e to f516c27 on April 9, 2021 at 01:49
dae (Contributor, Author) commented Apr 9, 2021

Yeah, that worked out better - I've pushed an updated version and updated the description above to cover the new flag.

dae force-pushed the incremental-noworker branch from f516c27 to c23df4d on April 9, 2021 at 02:12
dae force-pushed the incremental-noworker branch from c23df4d to 5c1f16f on April 9, 2021 at 02:21
tschuett commented Apr 9, 2021

Sorry for being stupid, but can't you glob the incremental cache and model it as an input to actions? Maybe even teach rustc to create deterministic output in the cache. Bazel could then handle everything without hacks.

illicitonion (Collaborator) left a comment:

Looks reasonable to me, certainly as something marked experimental :)

Sorry for being stupid, but can't you glob the incremental cache and model it as an input to actions? Maybe even teach rustc to create deterministic output in the cache. Bazel could then handle everything without hacks.

No, Bazel doesn't really have a concept of volatile files which are both inputs and outputs of the same action, as these would need to be.

@@ -6,6 +6,7 @@ package(default_visibility = ["//visibility:private"])
bzl_library(
name = "docs_deps",
srcs = [
"@bazel_skylib//rules:common_settings",
illicitonion (Collaborator): Remove?

dae (Contributor, Author):
Without this the docs build was breaking, but it may have been resolved by the # buildifier: disable=module-docstring below - will check

Returns:
(IncrementalPrefixInfo): a list of prefixes to include and exclude
"""
values = ctx.build_setting_value.split(",")
illicitonion (Collaborator):
Remove, or update loop to iterate over values?

dae (Contributor, Author):
Whoops, good catch. Will get this fixed up and rebased on main after #703 is merged.
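
For reference, a corrected version of the flag parsing shown in the snippet above might look roughly like the sketch below; the provider, field, and rule names are assumptions for illustration, not necessarily those used in the PR:

    # Sketch only: parsing the prefixes flag into include/exclude lists.
    IncrementalPrefixInfo = provider(fields = ["include", "exclude"])

    def _incremental_prefixes_impl(ctx):
        includes = []
        excludes = []
        values = ctx.build_setting_value.split(",")
        for value in values:  # iterate over the parsed values, not the raw string
            value = value.strip()
            if not value:
                continue
            if value.startswith("-"):
                excludes.append(value[1:])  # "-//vendored" excludes that prefix
            else:
                includes.append(value)
        return [IncrementalPrefixInfo(include = includes, exclude = excludes)]

    incremental_prefixes = rule(
        implementation = _incremental_prefixes_impl,
        build_setting = config.string(flag = True),
    )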

hlopko (Member) commented Apr 30, 2021

I've spent some time today learning more about Bazel workers, and I think I've come full circle :) I'm also sorry for not knowing this already; I could probably have saved you all some time. If you want to follow along, I've read:

My summary:

The approach in this PR is actually not using Bazel's worker support, but rather reimplementing it for the rustc specifics. The defining point of Bazel workers is that workers are long-running programs (servers, actually) that receive requests in proto or JSON and return responses. If rules support workers, whether a worker is used or not is controlled by Bazel's execution strategy. For example, by passing --strategy=Rustc=worker,local we tell Bazel that (some, read on) Rustc actions can use a worker. When rules register actions, they don't know whether a worker will be used or not. Actions can specify their execution requirements, and there they can say whether they support workers. If not, a worker is not used. If yes, a worker may be used, depending on the execution strategy.
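
For reference, this is roughly how an action advertises worker support to Bazel under those conventions; it is a generic sketch with placeholder names, not rules_rust's current code:

    # Sketch only: an action that Bazel may run as a persistent worker.
    args = ctx.actions.args()
    args.add_all(rustc_flags)  # placeholder for the real rustc command line
    # Workers receive their arguments via an @flagfile rather than argv.
    args.use_param_file("@%s", use_always = True)

    ctx.actions.run(
        executable = worker_wrapper,  # placeholder for a future process_wrapper
        arguments = [args],
        inputs = compile_inputs,
        outputs = outputs,
        mnemonic = "Rustc",
        execution_requirements = {
            # Bazel only uses a worker if this is set and the execution
            # strategy allows it (e.g. --strategy=Rustc=worker,local).
            "supports-workers": "1",
        },
    )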

The main reason why it matters (to me at least) is the dynamic strategy (https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html). With that you can tell Bazel to use both the worker and the remote execution service, and whichever is faster wins. In the case of a clean build, the worker is probably going to be slower because of the limited parallelism of the host machine, but for an incremental build it can be faster, because the remote execution workers may not be as performant for single-core workloads as the host machine. In our experience the dynamic strategy can speed up incremental builds enough to be worth the added complexity. Unfortunately, the dynamic strategy is not usable with the approach implemented in this PR.

I think it's desirable to implement Rustc worker support using the Bazel conventions. Rustc is a bit unique as a worker since it doesn't need to stay running: it stores all its data in a directory, and the next invocation can read it. That is why it was possible to write this PR. But I can see a future improvement to rustc to support a server mode that keeps the data in memory and doesn't spend action execution time on serializing and deserializing. Then we would also be able to benefit from the dynamic strategy.

What it all means is that we will need a rustc wrapper. It will need to parse and write JSON. It can be multithreaded (but workers don't have to be in general; that is only relevant for https://docs.bazel.build/versions/2.0.0/multiplex-worker.html, and we need to investigate whether multiplex workers have benefits for rustc workloads).

I propose the following (and I'm prepared to invest my time in this unless somebody else is driven to work on it):

  • solve the bootstrap problem
  • rewrite process_wrapper in rust
  • extend the process wrapper to support the worker mode
  • modify the rules to always support worker mode

What do you all think?

(I saw that @Ubehebe has some experience with workers and was contributing to this repo recently; feel free to give us feedback :) Also CCing @nikhilm, @mfarrugi, @djmarcin, @scentini, and @UebelAndre just to be sure all voices are heard.)

hlopko (Member) commented Apr 30, 2021

Actually, let me create a mailing list thread for this discussion; I hope that works better for everybody (speak up if not).

dae (Contributor, Author) commented May 4, 2021

If you have the motivation to drive this forward with a different approach, no objections from me - I'll be happy to see incremental compilation arrive in any form.
