
Commit ba40f07

1 parent 5942456 commit ba40f07


6 files changed, +1188 -0 lines changed


worker/BUILD

+42
@@ -0,0 +1,42 @@
# load("@rules_proto//proto:defs.bzl", "proto_library")
# load("@io_bazel_rules_rust//proto:proto.bzl", "rust_proto_library")
# load("@io_bazel_rules_rust//proto:toolchain.bzl", "PROTO_COMPILE_DEPS", "rust_proto_toolchain")
load("@io_bazel_rules_rust//rust:rust.bzl", "rust_library", "rust_binary")

rust_library(
    name = "rustc_worker",
    srcs = [
        "src/lib.rs",
        "src/worker_protocol.rs",
    ],
    deps = [
        "@io_bazel_rules_rust//proto/raze:protobuf",
    ],
)

rust_binary(
    name = "rustc-worker",
    srcs = ["src/main.rs"],
    deps = [
        ":rustc_worker",
        "@io_bazel_rules_rust//proto/raze:protobuf",
    ],
)

# rust_proto_toolchain(name = "default-proto-toolchain-impl")
#
# toolchain(
#     name = "default-proto-toolchain",
#     toolchain = ":default-proto-toolchain-impl",
#     toolchain_type = "@io_bazel_rules_rust//proto:toolchain",
# )
#
# proto_library(
#     name = "worker_protocol_proto",
#     srcs = ["src/worker_protocol.proto"],
# )
#
# rust_proto_library(
#     name = "worker_protocol",
#     deps = [":worker_protocol_proto"],
# )

worker/README.md

+115
@@ -0,0 +1,115 @@
rustc-worker
============

rustc-worker is an implementation of [Bazel Persistent
Workers](https://docs.bazel.build/versions/master/persistent-workers.html) for
Rust. It is meant to be used with
[rules_rust](https://github.com/bazelbuild/rules_rust). It speeds up building
Rust code with Bazel by giving `rustc` a shared, incremental cache.

In a default Bazel execution, each rustc compiler invocation runs in a
sandbox, which means that Bazel builds of Rust code only benefit from Bazel
caching at crate boundaries. Each rebuild of a crate has to start from
scratch.

In worker mode, a thin wrapper around rustc creates a directory for rustc to
cache its [incremental compilation
artifacts](https://blog.rust-lang.org/2018/02/15/Rust-1.24.html), so that
rebuilding a crate can re-use the unchanged parts of the crate.
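
As a rough illustration of what the wrapper adds to each rustc call, here is a minimal sketch using only the standard library. It mirrors the argument handling in `handle_request` in `src/lib.rs`, but `rustc_with_incremental` is a hypothetical helper name, not part of the worker's API. It only builds the `std::process::Command` and does not run it.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a rustc invocation that
/// stores incremental compilation artifacts in `cache_dir`.
/// `args` are the arguments the caller would normally pass to rustc.
fn rustc_with_incremental<I, S>(rustc: &Path, args: I, cache_dir: &Path) -> Command
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    // `--codegen incremental=<dir>` is the long form of `-C incremental=<dir>`.
    // Building it as an OsString keeps non-UTF-8 paths intact.
    let mut flag = OsString::from("incremental=");
    flag.push(cache_dir);

    let mut cmd = Command::new(rustc);
    cmd.args(args);
    cmd.arg("--codegen");
    cmd.arg(flag);
    cmd
}
```

Because the cache directory is the only thing added, every compilation request for the same crate points rustc at the same artifacts, which is where the speedup comes from.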

This is _NOT_ a full persistent worker in the style of the
Java/TypeScript/Scala workers, since `rustc` does not offer a "service" mode
where the same compiler process can accept multiple compilation requests and
re-use state such as existing parse trees. Some of the work from
[rust-analyzer](https://rust-analyzer.github.io/) could enable such behavior in
the future.

Here are the improvements I see on my ThinkPad X230 when rebuilding the `ninja`
binary from [ninja-rs](https://github.com/nikhilm/ninja-rs) (`bazel` branch)
after a comment-only change to `build/src/lib.rs`. All times are for debug
builds, since that is the standard developer workflow where incremental builds
matter.

```
cargo build (incremental by default)  1.65s
bazel build (without worker)          2.47s
bazel build (with worker)             1.2s
```

Bazel with the worker is possibly slightly faster than Cargo because it does
not pay a startup cost on every build.

## How to use

This currently requires a special branch of `rules_rust` until the changes are
accepted and merged upstream.

Assuming you are already using `rules_rust`, you will need to make the
following changes to your `WORKSPACE` file.

1. Change your `rules_rust` repository to point to the branch, like this. This
should replace any existing entry for the rules.

```
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
git_repository(
    name = "io_bazel_rules_rust",
    branch = "persistentworker",
    remote = "https://github.com/nikhilm/rules_rust",
)
```

2. Add a repository for the rustc-worker binary for your platform.

```
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "rustc_worker",
    urls = ["https://github.com/nikhilm/rustc-worker/releases/download/v0.1.0/rustc-worker-linux"],
    sha256 = "0e2be6198d556e4972c52ee4bcd76be2c2d8fd74c58e0bf9be195c798e2a9a4e",
    executable = True,
)
```

That's it! Bazel 0.27 and higher will use workers by default when available. You can simply build any Rust targets as usual with Bazel.

If you want to try this out but don't have a Rust project handy, you can:

```
git clone https://github.com/nikhilm/ninja-rs
cd ninja-rs
git checkout bazel
bazel build ninja
```

## Design

Incrementality works like this:

1. On startup, the worker creates a [temporary directory](https://github.com/nikhilm/rustc-worker/blob/b840ea9f9276c47b97591d274823da54e4cbd75b/src/lib.rs#L20) uniquely identified by a hash of the path to `rustc` (actually a wrapper from rules\_rust) and the name of the Bazel workspace. This is the incremental cache. This ensures the cache is shared among all instances of rustc workers within the same workspace, but not with other workspaces.
2. Bazel takes care of spawning multiple workers for parallelism. They all share the same cache. Since rustc operates at the crate level, and Bazel's design means that each crate has only one compilation artifact in the workspace, we can be reasonably sure that multiple `rustc` invocations never try to build the same crate at the same time. I'm not sure whether this matters.
3. The worker invokes `rustc` for each compilation request with `--codegen incremental=/path/to/cache`.
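
The cache-directory naming can be sketched as follows. This follows `Worker::new` in `src/lib.rs` from this commit, which combines a hash of the rustc wrapper path with the compilation mode passed on the worker's command line. Note that the code keys on the compilation mode, while step 1 mentions the workspace name. `incremental_cache_dir` is a hypothetical helper that only computes the path and does not create the directory.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical helper: computes an incremental cache directory under `base`
/// named `rustc-worker-<hash of rustc wrapper path>-<compilation mode>`.
fn incremental_cache_dir(base: &Path, rustc: &Path, compilation_mode: &str) -> PathBuf {
    // DefaultHasher::new() always starts from the same keys, so a given path
    // hashes to the same value every time the same binary runs. The standard
    // library does not promise the value is stable across Rust releases.
    let mut hasher = DefaultHasher::new();
    rustc.hash(&mut hasher);
    base.join(format!("rustc-worker-{}-{}", hasher.finish(), compilation_mode))
}
```

Two workers started with the same wrapper path and compilation mode resolve to the same directory and therefore share one cache. A different wrapper path (for example, another workspace) or a different mode (such as `opt` versus `fastbuild`) gets its own directory.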

## Updating the worker protocol

The worker protocol is described in a [protocol
buffer](https://github.com/bazelbuild/bazel/blob/07e152e508d9926f1ec87cdf33c9970ee2f18a41/src/main/protobuf/worker_protocol.proto).
This protocol changes very rarely, so to simplify the build process, we
vendor the generated code in the tree. This avoids the need for worker
consumers (via Bazel) to build `protoc` and `protobuf-codegen`. If you need to
update this:

1. Make sure `protoc` is installed for your operating system and is on your `PATH`.
2. `cargo install protobuf-codegen --version 2.8.2`.
3. `protoc --rust_out src/ src/worker_protocol.proto`.
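
On the wire, each serialized `WorkRequest` and `WorkResponse` is framed with a base-128 varint length prefix. `main_loop` in `src/lib.rs` reads that prefix with `read_raw_varint32` and writes it with `write_raw_varint32`. The sketch below shows the same framing using only the standard library. It treats payloads as opaque bytes and does no protobuf decoding, and all of its function names are hypothetical.

```rust
use std::io::{self, Read, Write};

/// Writes `value` as a base-128 varint: 7 bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
fn write_varint32<W: Write>(w: &mut W, mut value: u32) -> io::Result<()> {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            return w.write_all(&[byte]);
        }
        w.write_all(&[byte | 0x80])?;
    }
}

/// Reads a varint32 length prefix. Returns Ok(None) on a clean EOF before the
/// first byte; an EOF in the middle of the varint is an error.
fn read_varint32<R: Read>(r: &mut R) -> io::Result<Option<u32>> {
    let mut result: u32 = 0;
    for i in 0..5u32 {
        let mut buf = [0u8; 1];
        if r.read(&mut buf)? == 0 {
            return if i == 0 {
                Ok(None)
            } else {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated varint"))
            };
        }
        result |= u32::from(buf[0] & 0x7f) << (7 * i);
        if buf[0] & 0x80 == 0 {
            return Ok(Some(result));
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint32 longer than 5 bytes"))
}

/// Frames one message: the length prefix followed by the payload bytes.
/// Assumes the payload is smaller than 4 GiB.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    write_varint32(w, payload.len() as u32)?;
    w.write_all(payload)
}

/// Reads one framed message, or Ok(None) at a clean EOF.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    match read_varint32(r)? {
        None => Ok(None),
        Some(len) => {
            let mut payload = vec![0u8; len as usize];
            r.read_exact(&mut payload)?;
            Ok(Some(payload))
        }
    }
}
```

If a read loop gets `None` at a clean EOF, it can stop when Bazel closes the worker's stdin instead of reporting an error.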

## TODO

- [ ] Tests
- [ ] How to build with Bazel to bootstrap in rules\_rust.
- [ ] Submit PR for rules\_rust.

## Contributing

If you plan to make any major changes, please file an issue first to discuss what you want to do.

worker/src/lib.rs

+101
@@ -0,0 +1,101 @@
use protobuf::CodedInputStream;
use protobuf::CodedOutputStream;
use protobuf::Message;
use protobuf::ProtobufResult;
use std::collections::hash_map::DefaultHasher;
use std::hash::Hash;
use std::hash::Hasher;
use std::io;
use std::io::BufRead;
use std::path::PathBuf;

mod worker_protocol;
use worker_protocol::WorkRequest;
use worker_protocol::WorkResponse;

pub struct Worker {
    program_path: PathBuf,
    incremental_dir: PathBuf,
}

impl Worker {
    pub fn new<C: Into<String>>(
        program_path: PathBuf,
        rustc: PathBuf,
        compilation_mode: C,
    ) -> io::Result<Self> {
        // The incremental cache directory includes the rustc wrapper's hash to discriminate
        // between multiple workspaces having the same name (usually __main__).
        let mut cache_path = std::env::temp_dir();
        let mut hasher = DefaultHasher::new();
        rustc.hash(&mut hasher);

        cache_path.push(format!(
            "rustc-worker-{}-{}",
            hasher.finish(),
            compilation_mode.into()
        ));
        std::fs::create_dir_all(&cache_path)?;
        Ok(Worker {
            program_path,
            incremental_dir: cache_path,
        })
    }

    fn handle_request(&self, request: WorkRequest) -> ProtobufResult<WorkResponse> {
        // Run rustc with Bazel's arguments plus `--codegen incremental=<cache dir>`.
        let mut incremental_arg = std::ffi::OsString::from("incremental=");
        incremental_arg.push(&self.incremental_dir);
        let mut cmd = std::process::Command::new(&self.program_path);
        cmd.args(request.get_arguments());
        cmd.arg("--codegen");
        cmd.arg(incremental_arg);
        let output = cmd.output()?;
        Ok(WorkResponse {
            request_id: request.request_id,
            // `code()` is `None` if rustc was killed by a signal; report a
            // failure instead of panicking the worker.
            exit_code: output.status.code().unwrap_or(1),
            // The `output` field is a UTF-8 string; replace invalid bytes
            // instead of panicking.
            output: String::from_utf8_lossy(&output.stderr).into_owned(),
            ..Default::default()
        })
    }

    pub fn main_loop<R: io::Read, W: io::Write>(
        &self,
        reader: &mut R,
        writer: &mut W,
    ) -> ProtobufResult<()> {
        let mut stream = CodedInputStream::new(reader);
        loop {
            // Bazel closes stdin when it shuts the worker down; stop cleanly
            // instead of failing on the next length read.
            if stream.eof()? {
                return Ok(());
            }
            // Each WorkRequest is prefixed with its length as a varint32.
            let msg_len = stream.read_raw_varint32()?;
            let limit = stream.push_limit(msg_len as u64)?;
            let mut message = WorkRequest::default();
            message.merge_from(&mut stream)?;
            stream.pop_limit(limit);

            let response = self.handle_request(message)?;
            // Write the response back with the same length-prefixed framing.
            let mut output_stream = CodedOutputStream::new(writer);
            output_stream.write_raw_varint32(response.compute_size())?;
            response.write_to_with_cached_sizes(&mut output_stream)?;
            output_stream.flush()?;
            writer.flush()?;
        }
    }

    pub fn once_with_response_file<P: AsRef<std::path::Path>>(
        &self,
        response_file_path: P,
    ) -> io::Result<std::process::ExitStatus> {
        // Non-worker mode: each line of the response file is one rustc argument.
        let file = std::io::BufReader::new(std::fs::File::open(response_file_path)?);

        let mut cmd = std::process::Command::new(&self.program_path);
        for line in file.lines() {
            cmd.arg(line?);
        }
        cmd.status()
    }
}

#[cfg(test)]
mod test {
    #[test]
    fn test_eof() {}
}

worker/src/main.rs

+42
@@ -0,0 +1,42 @@
use protobuf::ProtobufResult;

fn main() -> ProtobufResult<()> {
    let mut args = std::env::args_os().peekable();
    // Always discard the executable name.
    args.next().unwrap();

    let program = std::fs::canonicalize(args.next().expect("program name"))?;
    let rustc_path = std::fs::canonicalize(args.next().expect("rustc path"))?;
    let compilation_mode = args
        .next()
        .expect("compilation mode")
        .into_string()
        .expect("compilation mode must be valid utf-8");
    // TODO: program and rustc_path will combine when this is merged into rules_rust.
    let worker = rustc_worker::Worker::new(program, rustc_path, compilation_mode)?;

    // If started as a persistent worker.
    if let Some(arg) = args.peek() {
        if arg == "--persistent_worker" {
            let stdin = std::io::stdin();
            let stdout = std::io::stdout();
            let mut stdin_locked = stdin.lock();
            let mut stdout_locked = stdout.lock();
            return worker.main_loop(&mut stdin_locked, &mut stdout_locked);
        }
    }

    // Spawn process as normal.
    // The process wrapper does not support response files.
    let response_file_arg = args
        .next()
        .expect("response file argument")
        .into_string()
        .expect("response file path is valid utf-8");
    // The response file has to be the last (and only) argument left.
    assert!(args.peek().is_none(), "iterator should be consumed!");
    assert!(
        response_file_arg.starts_with('@'),
        "expected an @<response file> argument"
    );
    let response_file_path = &response_file_arg[1..];
    let status = worker.once_with_response_file(response_file_path)?;
    // `code()` is `None` if rustc was killed by a signal; exit with a failure.
    std::process::exit(status.code().unwrap_or(1));
}

worker/src/worker_protocol.proto

+62
@@ -0,0 +1,62 @@
// Copyright 2015 The Bazel Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto3";

package blaze.worker;

option java_package = "com.google.devtools.build.lib.worker";

// An input file.
message Input {
  // The path in the file system where to read this input artifact from. This is
  // either a path relative to the execution root (the worker process is
  // launched with the working directory set to the execution root), or an
  // absolute path.
  string path = 1;

  // A hash-value of the contents. The format of the contents is unspecified and
  // the digest should be treated as an opaque token.
  bytes digest = 2;
}

// This represents a single work unit that Blaze sends to the worker.
message WorkRequest {
  repeated string arguments = 1;

  // The inputs that the worker is allowed to read during execution of this
  // request.
  repeated Input inputs = 2;

  // To support multiplex worker, each WorkRequest must have an unique ID. This
  // ID should be attached unchanged to the WorkResponse.
  int32 request_id = 3;
}

// The worker sends this message to Blaze when it finished its work on the
// WorkRequest message.
message WorkResponse {
  int32 exit_code = 1;

  // This is printed to the user after the WorkResponse has been received and is
  // supposed to contain compiler warnings / errors etc. - thus we'll use a
  // string type here, which gives us UTF-8 encoding.
  string output = 2;

  // To support multiplex worker, each WorkResponse must have an unique ID.
  // Since worker processes which support multiplex worker will handle multiple
  // WorkRequests in parallel, this ID will be used to determined which
  // WorkerProxy does this WorkResponse belong to.
  int32 request_id = 3;
}
