
incremental compilation without a worker #684


Open: dae wants to merge 1 commit into main from the incremental-noworker branch

Conversation

dae (Contributor) commented Apr 8, 2021

This is considerably less invasive than #667, #517 and #421 - there
is no extra worker to bootstrap, and no toolchains have to be plumbed
into the rules.

We had been using a worker to get around sandboxing restrictions, but
as @hlopko pointed out on #667, that turns out not to be necessary -
even normal rules have access to /tmp. Unfortunately it seems that rustc
does not expect the source files to change location, and it will
consistently crash when used in a sandboxed context. So the approach
this PR takes is to disable sandboxing on targets that are being compiled
incrementally. This fixes the compiler crashes, and as a bonus, means
we're not limited to saving the cache in /tmp.
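
For reference, Bazel exposes this per action through execution requirements. A minimal sketch of that mechanism (with placeholder variable names; this is not the PR's actual implementation) might look like:

    # Sketch only: opting a single action out of the sandbox.
    # rustc, rustc_args, compile_inputs and outputs are placeholders.
    ctx.actions.run(
        executable = rustc,
        arguments = [rustc_args],
        inputs = compile_inputs,
        outputs = outputs,
        mnemonic = "Rustc",
        execution_requirements = {
            # Run outside the sandbox so rustc sees stable source paths and
            # can reuse its incremental cache between builds.
            "no-sandbox": "1",
        },
    )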

This PR adds a --@rules_rust//:experimental_incremental_base flag to
specify the path where incremental build products are stored - if not
provided, the rules will function as they normally do.

A separate flag allows you to control which crates are incrementally compiled.
The default is:

--@rules_rust//:experimental_incremental_prefixes=//,-//vendored

This will incrementally compile any crates in the local workspace, but exclude
other repositories like @raze__*, and anything in the //vendored package. This
behaviour is similar to cargo, which only incrementally compiles the local
workspace.
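
As an illustration only, both flags could be kept in a user.bazelrc so they apply to every build; the cache path below is an arbitrary example, and neither line is a default shipped with the rules:

    # user.bazelrc (illustrative values)
    build --@rules_rust//:experimental_incremental_base=/home/me/.cache/rules_rust_incremental
    build --@rules_rust//:experimental_incremental_prefixes=//,-//vendored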

google-cla bot added the cla: yes label on Apr 8, 2021
dae mentioned this pull request on Apr 8, 2021
dae force-pushed the incremental-noworker branch 2 times, most recently from eb91a9b to a6efa0b on April 8, 2021 at 13:29
dae (Contributor, Author) commented Apr 8, 2021

On second thought, tags are not the best idea, as downstream consumers of a crate may not want to build it incrementally. Maybe we could have a separate flag with a comma-separated list of crates to incrementally compile? These could be placed in a user.bazelrc to avoid having to enter them each time. Thoughts?

dae force-pushed the incremental-noworker branch 2 times, most recently from f469d1e to f516c27 on April 9, 2021 at 01:49
dae (Contributor, Author) commented Apr 9, 2021

Yeah, that worked out better - I've pushed an updated version and updated the description above to cover the new flag.

dae force-pushed the incremental-noworker branch from f516c27 to c23df4d on April 9, 2021 at 02:12
dae force-pushed the incremental-noworker branch from c23df4d to 5c1f16f on April 9, 2021 at 02:21
tschuett commented Apr 9, 2021

Sorry for being stupid, but can't you glob the incremental cache and model it as an input to actions? Maybe even teach rustc to create deterministic output in the cache. Bazel could then handle everything without hacks.

illicitonion (Collaborator) left a comment:

Looks reasonable to me, certainly as something marked experimental :)

Sorry for being stupid, but can't you glob the incremental cache and model it as an input to actions? Maybe even teach rustc to create deterministic output in the cache. Bazel could then handle everything without hacks.

No, Bazel doesn't really have a concept of volatile files which are both inputs and outputs of the same action, as these would need to be.

@@ -6,6 +6,7 @@ package(default_visibility = ["//visibility:private"])
bzl_library(
name = "docs_deps",
srcs = [
"@bazel_skylib//rules:common_settings",
illicitonion (Collaborator): Remove?

dae (Contributor, Author):
Without this the docs build was breaking, but it may have been resolved by the # buildifier: disable=module-docstring below - will check

Returns:
(IncrementalPrefixInfo): a list of prefixes to include and exclude
"""
values = ctx.build_setting_value.split(",")
illicitonion (Collaborator):
Remove, or update loop to iterate over values?

dae (Contributor, Author):
Whoops, good catch. Will get this fixed up and rebased on main after #703 is merged.
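
For reference, a corrected version of the flag parsing shown in the snippet above might look roughly like the sketch below; the provider, field, and rule names are assumptions for illustration, not necessarily those used in the PR:

    # Sketch only: parsing the prefixes flag into include/exclude lists.
    IncrementalPrefixInfo = provider(fields = ["include", "exclude"])

    def _incremental_prefixes_impl(ctx):
        includes = []
        excludes = []
        values = ctx.build_setting_value.split(",")
        for value in values:  # iterate over the parsed values, not the raw string
            value = value.strip()
            if not value:
                continue
            if value.startswith("-"):
                excludes.append(value[1:])  # "-//vendored" excludes that prefix
            else:
                includes.append(value)
        return [IncrementalPrefixInfo(include = includes, exclude = excludes)]

    incremental_prefixes = rule(
        implementation = _incremental_prefixes_impl,
        build_setting = config.string(flag = True),
    )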

hlopko (Member) commented Apr 30, 2021

I've spent some time today learning more about Bazel workers, and I think I've come full circle :) I'm also sorry for not knowing this already; I could probably have saved you all some time. If you want to follow along, I've read:

My summary:

The approach in this PR is actually not using Bazel's worker support, but rather reimplementing it for the rustc specifics. The defining point of Bazel workers is that workers are long-running programs (servers, actually) that receive requests in proto or JSON and return responses. If rules support workers, whether a worker is used or not is controlled by Bazel's execution strategy. For example, by passing --strategy=Rustc=worker,local we tell Bazel that (some, read on) Rustc actions can use a worker. When rules register actions, they don't know whether a worker will be used or not. Actions can specify their execution requirements, and there they can say whether they support workers. If not, a worker is not used. If yes, a worker may be used, depending on the execution strategy.
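
For reference, this is roughly how an action advertises worker support to Bazel under those conventions; it is a generic sketch with placeholder names, not rules_rust's current code:

    # Sketch only: an action that Bazel may run as a persistent worker.
    args = ctx.actions.args()
    args.add_all(rustc_flags)  # placeholder for the real rustc command line
    # Workers receive their arguments via an @flagfile rather than argv.
    args.use_param_file("@%s", use_always = True)

    ctx.actions.run(
        executable = worker_wrapper,  # placeholder for a future process_wrapper
        arguments = [args],
        inputs = compile_inputs,
        outputs = outputs,
        mnemonic = "Rustc",
        execution_requirements = {
            # Bazel only uses a worker if this is set and the execution
            # strategy allows it (e.g. --strategy=Rustc=worker,local).
            "supports-workers": "1",
        },
    )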

The main reason why it matters (to me at least) is the dynamic strategy (https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html). With that you can tell Bazel to use both the worker and the remote execution service, and whichever is faster wins. In the case of a clean build, the worker is probably going to be slower because of the limited parallelism of the host machine, but for an incremental build it can be faster, because the remote execution workers may not be as performant for single-core workloads as the host machine. In our experience the dynamic strategy can speed up incremental builds enough to be worth the added complexity. Unfortunately, the dynamic strategy is not usable with the approach implemented in this PR.

I think it's desirable to implement Rustc worker support using the Bazel conventions. Rustc is a bit unique as a worker since it doesn't need to stay running: it stores all its data in a directory, and the next invocation can read it. That is why it was possible to write this PR. But I can see a future improvement to rustc to support a server mode that keeps the data in memory and doesn't spend action execution time on serializing and deserializing. Then we would also be able to benefit from the dynamic strategy.

What it all means is that we will need a rustc wrapper. It will need to parse and write JSON. It can be multithreaded (but workers don't have to be in general; that is only relevant for https://docs.bazel.build/versions/2.0.0/multiplex-worker.html, and we need to investigate whether multiplex workers have benefits for rustc workloads).

I propose the following (and I'm prepared to invest my time in this unless somebody else is driven to work on it):

  • solve the bootstrap problem
  • rewrite process_wrapper in rust
  • extend the process wrapper to support the worker mode
  • modify the rules to always support worker mode

What do you all think?

(I saw that @Ubehebe has some experience with workers and was contributing to this repo recently; feel free to give us feedback :) Also CCing @nikhilm, @mfarrugi, @djmarcin, @scentini, and @UebelAndre just to be sure all voices are heard.)

hlopko (Member) commented Apr 30, 2021

Actually, let me create a mailing list thread for this discussion; I hope that works better for everybody (speak up if not).

dae (Contributor, Author) commented May 4, 2021

If you have the motivation to drive this forward with a different approach, no objections from me - I'll be happy to see incremental compilation arrive in any form.
