build: Build all cargo executables at once #20221

Merged
benesch merged 4 commits into MaterializeInc:main from pr-build-speedup on Jul 1, 2023

Conversation

@def- (Contributor) commented Jun 28, 2023

Proof from the build logs that it works: https://buildkite.com/materialize/tests/builds/58594

[2023-06-29T10:49:58Z] Common cargo-build for: ['environmentd', 'clusterd', 'environmentd', 'clusterd', 'sqllogictest', 'testdrive']
[2023-06-29T10:49:58Z] $ env 'RUSTFLAGS=-Clink-arg=-Wl,--compress-debug-sections=zlib -Csymbol-mangling-version=v0 -Clink-arg=-fuse-ld=lld -L/opt/x-tools/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/sysroot/lib' cargo build --target x86_64-unknown-linux-gnu --bin environmentd --bin clusterd --bin environmentd --bin clusterd --bin sqllogictest --bin testdrive --release
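For context, the core of the change is to collect the binaries needed by all images and issue a single cargo build with one --bin flag per target, as the invocation above shows; cargo then compiles the shared dependency graph only once. A minimal, illustrative sketch of that idea (plain Python with made-up names, not the actual mzbuild code):

import subprocess
from typing import List


def common_cargo_build(bins: List[str], target: str, release: bool = True) -> None:
    # One cargo invocation for every requested executable. Duplicate bin
    # names (e.g. environmentd appearing for several images) are harmless,
    # as the log above shows; cargo still builds each binary once.
    cmd = ["cargo", "build", "--target", target]
    for b in bins:
        cmd += ["--bin", b]
    if release:
        cmd.append("--release")
    subprocess.run(cmd, check=True)


common_cargo_build(
    ["environmentd", "clusterd", "sqllogictest", "testdrive"],
    target="x86_64-unknown-linux-gnu",
)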

Runtimes were 10 min for x86-64 and 16 min for aarch64, which is faster than I expected, though these builds were not entirely from scratch.

Fixes https://github.com/MaterializeInc/database-issues/issues/6024

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@def- force-pushed the pr-build-speedup branch 4 times, most recently from 5edea55 to 345f875, on June 29, 2023 08:09
@def- changed the title from "Pr build speedup" to "build: Build all cargo executables at once" on Jun 29, 2023
@def- force-pushed the pr-build-speedup branch 2 times, most recently from f500a91 to c8310af, on June 29, 2023 10:48
@def- requested review from benesch and philip-stoev on June 29, 2023 12:16
@def- marked this pull request as ready for review on June 29, 2023 12:17
@def- requested a review from a team as a code owner on June 29, 2023 12:17
@def- force-pushed the pr-build-speedup branch 2 times, most recently from a8fa067 to 311a874, on June 29, 2023 13:55
@benesch (Contributor) left a comment

Nice! This came out much cleaner than I was expecting.

@philip-stoev (Contributor) commented

I defer to @benesch on this one and am removing myself as a reviewer.

I think we can remove the Antithesis stuff at this time. When/if we decide to re-engage them, most likely all of that directory will need to be rewritten anyway; e.g., the materialized container does not have CRDB in it at all, so the Dockerfile is invalid for the current Mz.

@philip-stoev removed their request for review on June 30, 2023 06:30
@def- force-pushed the pr-build-speedup branch from 311a874 to fea9288 on June 30, 2023 09:11
@def- (Contributor, Author) commented Jun 30, 2023

Another test run to prove that everything still works after the slight refactoring: https://buildkite.com/materialize/tests/builds/58655#01890b9c-c4ef-4862-ae02-1ac9ebc9ffd0

@def- requested review from benesch and danhhz on June 30, 2023 10:06
@def- force-pushed the pr-build-speedup branch from fea9288 to 8246734 on June 30, 2023 15:28
def generate_cargo_build_command(
    bins: List[str],
    examples: List[str],
    rd: RepositoryDetails,

nit: I think it would be a bit cleaner to push the rd extraction into this function.
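To make the nit concrete, the suggestion is roughly that the function derive rd from the batch of builds itself instead of having the caller extract it and pass it alongside. A hypothetical sketch with simplified stand-in types (not the real mzbuild classes or signature):

from typing import List, Set


class RepositoryDetails:
    # Simplified stand-in for mzbuild's RepositoryDetails.
    release_mode: bool = True


class CargoBuild:
    # Simplified stand-in for mzbuild's CargoBuild pre-image.
    def __init__(self, rd: RepositoryDetails, bins: Set[str], examples: Set[str]):
        self.rd = rd
        self.bins = bins
        self.examples = examples

    @classmethod
    def generate_cargo_build_command(cls, builds: List["CargoBuild"]) -> List[str]:
        # Pull the repository details out of the batch here, rather than
        # making the caller do the extraction and pass rd separately.
        rds = {build.rd for build in builds}
        assert len(rds) == 1, "all builds in a batch share one repository"
        rd = next(iter(rds))

        bins = sorted({b for build in builds for b in build.bins})
        examples = sorted({e for build in builds for e in build.examples})

        cmd = ["cargo", "build"]
        for b in bins:
            cmd += ["--bin", b]
        for e in examples:
            cmd += ["--example", e]
        if rd.release_mode:
            cmd.append("--release")
        return cmd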

@benesch (Contributor) commented Jun 30, 2023

Gonna noodle on a refactor, one second.

@benesch force-pushed the pr-build-speedup branch from 8246734 to 1ae4655 on June 30, 2023 18:17
@benesch requested a review from a team as a code owner on June 30, 2023 18:17
@benesch (Contributor) commented Jun 30, 2023

I pushed up two changes:

  • A refactor to the common_cargo_build code to generalize it to apply to any PreImage subclass. Now any other PreImage can choose to declare a prepare_batch in order to perform any operations that are more efficient when performed in bulk (a rough sketch of this hook follows after this comment).
  • Parallelization of docker push commands.

I've not tested any of this. Going to test by pushing up some dummy changes to this PR.
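
A rough sketch of the hook shape described in the first bullet, using simplified stand-ins rather than the actual mzbuild classes:

from typing import Dict, List, Type


class PreImage:
    # Simplified stand-in for mzbuild's PreImage base class.

    @classmethod
    def prepare_batch(cls, instances: List["PreImage"]) -> None:
        # Subclasses may override this to do work that is cheaper when
        # performed once for a whole batch. The default does nothing.
        pass

    def run(self) -> None:
        # Per-image work, unchanged.
        pass


class CargoBuild(PreImage):
    # Simplified stand-in for mzbuild's CargoBuild pre-image.
    def __init__(self, bins: List[str]) -> None:
        self.bins = bins

    @classmethod
    def prepare_batch(cls, instances: List["PreImage"]) -> None:
        # Gather every binary needed across the batch and build them with a
        # single cargo invocation so shared dependencies compile only once.
        bins = sorted({b for i in instances if isinstance(i, cls) for b in i.bins})
        print(f"Common cargo build for: {bins}")
        # ... run one `cargo build --bin a --bin b ...` here ...


def prepare_all(pre_images: List[PreImage]) -> None:
    # The build driver groups pre-images by type and calls prepare_batch
    # once per subclass before running each instance individually.
    by_type: Dict[Type[PreImage], List[PreImage]] = {}
    for p in pre_images:
        by_type.setdefault(type(p), []).append(p)
    for ty, instances in by_type.items():
        ty.prepare_batch(instances)
    for p in pre_images:
        p.run()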

@benesch force-pushed the pr-build-speedup branch from 5548132 to e5c3cf6 on June 30, 2023 19:22
@benesch (Contributor) commented Jun 30, 2023

I'm worried that the change to build all binaries and examples in parallel causes OOMs. The aarch64 build failed in my test: https://buildkite.com/materialize/tests/builds/58695#01890da6-8704-41e7-bbbd-d0983ebaf30b

The presentation of that failure wasn't obviously an OOM, but OOMing while doing parallel links is a known problem: rust-lang/cargo#9157

@def- (Contributor, Author) commented Jun 30, 2023

Unfortunately the "agent lost" seems to be a common way OoMs in CI manifest. On a 64 GB system, that seems crazy. Locally even with cargo build -j32 I haven't been able to go above 40 GB, maybe something is different on aarch64 though and the timing ends up with more link jobs at the same time. I see two potential solutions:

  1. If it's the actual linking step that goes OoM and we use a very parallel linker like mold (ci: Use mold linker to speed up building #20159), then we could wrap it in a small shell script so that only one linker process runs at a time:
#!/bin/bash

LOCK_FILE="/tmp/linker.lock"

# Spin until we can atomically create the lock file (noclobber makes the
# redirect fail if the file already exists).
while ! (set -o noclobber; (echo $$ > "$LOCK_FILE") 2> /dev/null); do
  sleep 1
done

cleanup() {
  rm -f "$LOCK_FILE"
}

trap cleanup EXIT HUP INT QUIT TERM

# Run the real linker (not exec, so the EXIT trap still releases the lock),
# forwarding the arguments we were invoked with.
/usr/local/bin/realmoldbinaryhere "$@"
  2. If it is the rustc processes, then we can split the cargo build into chunks of at most 6 executables, so instead of Common cargo build for: ['metabase-smoketest', 'clusterd', 'environmentd', 'kgen', 'clusterd', 'environmentd', 'mz', 'clusterd', 'sqllogictest', 'testdrive', 'persistcli', 'persistcli'] we'd have
Common cargo build for: ['metabase-smoketest', 'clusterd', 'environmentd', 'kgen', 'clusterd', 'environmentd']
Common cargo build for: ['mz', 'clusterd', 'sqllogictest', 'testdrive', 'persistcli', 'persistcli']

Based on my local experiments it actually seems to be the rustc processes that use the memory, so we'll probably have to go with (2). I'll put up a change for that and see whether it stays OK in CI.

@benesch (Contributor) commented Jun 30, 2023

Before you spend too long on this, Dennis, I think we might just be using the wrong EC2 instance type. We're currently using compute-optimized instances, which may simply not have the right CPU:memory ratio (1:2) for compiling Rust. We should consider using general-purpose instances (1:4) instead.

Do you have 32 cores locally? If not, -j 32 is probably not representative of what compiles look like on the CI machines.

@def- (Contributor, Author) commented Jun 30, 2023

Only 16 threads locally, so not that realistic.

The implementation was not too much effort:

diff --git a/misc/python/materialize/mzbuild.py b/misc/python/materialize/mzbuild.py
index 73afd171be..9e2b7e816d 100644
--- a/misc/python/materialize/mzbuild.py
+++ b/misc/python/materialize/mzbuild.py
@@ -41,6 +41,7 @@ from typing import (
     Optional,
     Sequence,
     Set,
+    Tuple,
     cast,
 )

@@ -381,9 +382,21 @@ class CargoBuild(CargoPreImage):
             examples.update(build.examples)
         assert rd

-        ui.say(f"Common cargo build for: {', '.join(bins | examples)}")
-        cargo_build = cls.generate_cargo_build_command(rd, list(bins), list(examples))
-        spawn.runv(cargo_build, cwd=rd.root)
+        def chunks(
+            list1: List[str], list2: List[str], n: int
+        ) -> Iterator[Tuple[List[str], List[str]]]:
+            for i in range(0, len(list1) + len(list2), n):
+                chunk1 = list1[i : i + n]
+                chunk2 = list2[max(0, i - len(list1)) : max(0, i - len(list1) + n)]
+                yield (chunk1, chunk2)
+
+        # Workaround for linking OoMs, see https://github.com/rust-lang/cargo/issues/9157
+        for (bins_chunk, examples_chunk) in chunks(list(bins), list(examples), 6):
+            ui.say(f"Common cargo build for: {', '.join(bins_chunk + examples_chunk)}")
+            cargo_build = cls.generate_cargo_build_command(
+                rd, bins_chunk, examples_chunk
+            )
+            spawn.runv(cargo_build, cwd=rd.root)

     def build(self) -> None:
         cargo_profile = "release" if self.rd.release_mode else "debug"

My guess is that we are still pretty CPU-constrained during builds, so going down to half the CPUs would hurt us. Doubling memory would of course be fine, but we'd probably be utilizing less than half of it in most cases. The move would be c6g.8xlarge -> m6g.8xlarge for aarch64, which seems about 16% more expensive. If that's acceptable, I have no problem with switching to general-purpose instances for the builders.

@def- force-pushed the pr-build-speedup branch 2 times, most recently from 7829100 to 20c719f, on June 30, 2023 23:29
@def- (Contributor, Author) left a comment

# We don't use `docker push --quiet`, as that disables progress
# output entirely.
push = subprocess.Popen(
    f"docker push {shlex.quote(image)} | cat",

Assuming our Docker version is new enough, NO_COLOR=1 in the environment should work too: moby/buildkit#2954

Reply from a Contributor:

I suspect, though I did not check, that NO_COLOR=1 disables escape codes related to coloring, but not to rendering the progress bars that update in place. That's a different set of escape codes.

spawn.runv(["docker", "push", dep.spec()])
images_to_push.append(dep.spec())

# Push all Docker images in parallel to minimize build time.
@def- (Contributor, Author) commented:

That was easier than expected. I was looking for a way to have a single docker push call upload multiple images and went in the wrong direction :)
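
For reference, the shape of the parallel push being discussed is roughly the following (a sketch, not the exact mzbuild code; error handling is simplified):

import shlex
import subprocess
from typing import List


def push_images(images: List[str]) -> None:
    # Start one `docker push` per image so the uploads overlap. Piping
    # through `cat` makes docker's stdout a non-TTY, which suppresses the
    # in-place progress bars while keeping plain progress lines; pipefail
    # keeps a failed push from being masked by `cat`.
    pushes = [
        subprocess.Popen(
            ["/bin/bash", "-o", "pipefail", "-c",
             f"docker push {shlex.quote(image)} | cat"]
        )
        for image in images
    ]
    failures = [image for image, p in zip(images, pushes) if p.wait() != 0]
    if failures:
        raise RuntimeError(f"docker push failed for: {failures}")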

The ARM build is stubbornly slower than the x86_64 build. Since we're
not actively running Materialize on ARM in production right now, stop
running the ARM build on PRs, and switch to running it only on main.
This way we'll still quickly catch bugs that cause us to stop compiling
on ARM, but we'll spend less money and waste less developer time in
doing so.
@benesch (Contributor) commented Jul 1, 2023

I pushed up one last change to get the relatively slower ARM builds out of PR builds. Merging now. We'll see how this fares over the next few days.

@benesch merged commit 1d08b9b into MaterializeInc:main on Jul 1, 2023
@benesch (Contributor) commented Jul 1, 2023

Thanks so much for turning this around so quickly, @def-!

@def- deleted the pr-build-speedup branch on July 1, 2023 07:07
@def- (Contributor, Author) commented Jul 1, 2023

Thanks for the two additional changes, @benesch!
