Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Future of CI after Cirrus pricing change #1392

Open
13 of 17 tasks
real-or-random opened this issue Aug 4, 2023 · 37 comments
Open
13 of 17 tasks

ci: Future of CI after Cirrus pricing change #1392

real-or-random opened this issue Aug 4, 2023 · 37 comments
Labels

Comments

@real-or-random
Copy link
Contributor

real-or-random commented Aug 4, 2023

Roadmap (keeping this up to date):

I think the natural way forward for us is:

Possible follow-ups:

Other related PRs:


Corresponding Bitcoin Core issue: bitcoin/bitcoin#28098

Cirrus CI will cap the community cluster, see cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci. As with Core, the pricing model makes it totally unreasonable to pay for compute credits (multiple thousand USD / month).

The plan in Bitcoin Core is to move native Windows+macOS tasks to GitHub Actions, and move Linux tasks to persistent workers (=self-hosted). If I read the Bitcoin Core IRC meeting notes correctly, @MarcoFalke said these workers will also be available for libsecp256k1.

But the devil is in the details:

For macOS, we need to take also #1153 into account. It seems that GitHub-hosted macOS runners are on x86_64. The good news is that Valgrind should work again then, but the (very) bad is that this will reduce our number of native ARM tasks to zero. We still have some QEMU tasks, but we can't even the run the Valgrind cttimetests on them (maybe this would now work with MSan?!) @MarcoFalke Are the self-hosted runners only x86_64?

For Linux tasks, the meeting notes say that the main reason for using persistent workers is that some tasks require a very specific environment (e.g., the USDT ASan job). I don't think we have such requirements, so I tend to think that moving everything to GitHub Actions is a bit cleaner for us. With a persistent worker, Cirrus CI anyway acts only as a "coordination layer" between the worker and GitHub. Yet another way is to the self-hosted runners with GitHub Actions, see my comment bitcoin/bitcoin#28098 (comment)).

@maflcko
Copy link
Contributor

maflcko commented Aug 4, 2023

Are the self-hosted runners only x86_64?

There is one aarch64 one. (It is required because GitHub doesn't offer aarch64 Linux boxes, and Google Cloud doesn't offer an aarch64 CPU that can run armhf 32-bit binaries)

@real-or-random
Copy link
Contributor Author

Ok, then it probably makes sense to do what I suggested in #1153, namely move ARM tasks to Linux, and reduce the number of our macOS tasks.

@maflcko
Copy link
Contributor

maflcko commented Aug 4, 2023

moving everything to GitHub Actions is a bit cleaner for us

Sounds interesting. I wonder how (and if) docker images can be cached, along with ccache, etc...

@real-or-random
Copy link
Contributor Author

real-or-random commented Aug 4, 2023

moving everything to GitHub Actions is a bit cleaner for us

Sounds interesting. I wonder how (and if) docker images can be cached, along with ccache, etc...

Yeah, we'll need to see.

And I agree that "in the short run it seems easier to stick to Cirrus for now, because the diff is a lot smaller (just replace container: in the yml with persistent_worker:, etc)" (bitcoin/bitcoin#28098 (comment)). We should probably do this first, and then see if we're interested in moving to GitHub Actions fully.

edit: I updated the roadmap above.

@hebasto
Copy link
Member

hebasto commented Aug 5, 2023

For macOS, we need to take also #1153 into account. It seems that GitHub-hosted macOS runners are on x86_64. The good news is that Valgrind should work again then...

For such a case, it is good to see some progress in #1274 :)

@hebasto
Copy link
Member

hebasto commented Aug 7, 2023

moving everything to GitHub Actions is a bit cleaner for us

Sounds interesting. I wonder how (and if) docker images can be cached, along with ccache, etc...

See #1396.

@hebasto
Copy link
Member

hebasto commented Aug 7, 2023

There are open PRs for all of the mentioned items. It would be more productive, if we somehow prioritise them to spend our time until Sept. 1st more effectively.

@maflcko
Copy link
Contributor

maflcko commented Aug 8, 2023

It would be more productive, if we somehow prioritise them to spend our time until Sept. 1st more effectively.

I'd say the Windows/macOS ones are probably easier, since they don't require write permission and don't have to deal with docker image caching.

@real-or-random
Copy link
Contributor Author

Yes, we should in principle proceed in the order of the list above. But it doesn't need to be very strict. For example, if it turns out that #1396 is ready by Sep 1st, we can skip "Move Linux tasks to the Bitcoin Core persistent workers".

real-or-random added a commit that referenced this issue Aug 9, 2023
a2f7ccd ci: Run "Windows (VS 2022)" job on GitHub Actions (Hennadii Stepanov)

Pull request description:

  This PR solves one item in #1392.

  In response to upcoming [limiting free usage of Cirrus CI](https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/), suggesting to move (partially?) CI tasks/jobs from Cirrus CI to [GitHub Actions](https://docs.github.com/actions) (GHA).

  Here is example from my personal repo: https://github.com/hebasto/secp256k1/actions/runs/5806269046.

  For security concerns, see:
  - bitcoin/bitcoin#28098 (comment)
  - bitcoin/bitcoin#28098 (comment)

  I'm suggesting the repository "Actions permissions" as follows:

  ![image](https://github.com/bitcoin-core/secp256k1/assets/32963518/bd18d489-784f-48ba-b599-ed1c4dfc34fa)

  ![image](https://github.com/bitcoin-core/secp256k1/assets/32963518/632280e0-9c26-42eb-a0ed-24f9a8142faa)

  ---

  See build logs in my personal repo: https://github.com/hebasto/secp256k1/actions/runs/5692587475.

ACKs for top commit:
  real-or-random:
    utACK a2f7ccd

Tree-SHA512: b6329a29391146e3cdee9a56f6151b6672aa45837dfaacb708ba4209719801ed029a6928d638d314b71c7533d927d771b3eca4b9e740cfcf580a40ba07970ae4
@hebasto
Copy link
Member

hebasto commented Aug 14, 2023

  • Move Linux tasks to the Bitcoin Core persistent workers

It seems reasonable to split this task in two ones, depending on the underlying architecture: x86_64 and arm64, because the GitHub hosted runners lack support for arm64.

@real-or-random
Copy link
Contributor Author

real-or-random commented Aug 15, 2023

@hebasto Hm, we currently don't have native Linux arm64 jobs, so we can't "move" them over. We could add some (see #1163 and #1394 (comment)).

I tend to think that is also acceptable to wait for github/roadmap#528, it's currently planned for the end of the year. Then we could move macOS back to ARM. Until that happens, perhaps we can add a QEMU jobs that run the ctimetests on MSan (clang-only) at least. Note to self: We need apt-get install libclang-rt-dev:arm64 and this works with

HOST="aarch64-linux-gnu" CC="clang --target=aarch64-linux-gnu" WRAPPER_CMD="qemu-aarch64"

(The real tests fail with msan enabled on qemu. I think this is because the stack will explode.)

I updated the list above with optional items.

@maflcko
Copy link
Contributor

maflcko commented Aug 15, 2023

qemu-arm is a bit slower than native aarch64. You can use the already existing persistent worker, if you want:

https://github.com/bitcoin/bitcoin/blob/cd43a8444ba44f86ddbb313a03a2782482beda89/.cirrus.yml#L210-L212

(Currently not set up for this repo, but should be some time this week)

@real-or-random
Copy link
Contributor Author

Sure, that's an easy option. I just think we're currently playing around with the idea to move everything to GHA, if it's feasible for this repo.

real-or-random added a commit that referenced this issue Aug 21, 2023
…ons job

e10878f ci, gha: Drop `driver-opts.network` input for `setup-buildx-action` (Hennadii Stepanov)
4ad4914 ci, gha: Add `retry_builder` Docker image builder (Hennadii Stepanov)
6617a62 ci: Remove "x86_64: Linux (Debian stable)" task from Cirrus CI (Hennadii Stepanov)
03c9e65 ci, gha: Add "x86_64: Linux (Debian stable)" GitHub Actions job (Hennadii Stepanov)
ad3e65d ci: Remove GCC build files and sage to reduce size of Docker image (Tim Ruffing)

Pull request description:

  Solves one item in #1392 partially.

ACKs for top commit:
  real-or-random:
    ACK e10878f

Tree-SHA512: 1e685b1a6a41b4be97b9b5bb0fe546c3f1f7daac9374146ca05ab29803d5945a038294ce3ab77489bd971ffce9789ece722e0e0f268b6a7e6483a3aa782d532d
real-or-random added a commit that referenced this issue Aug 22, 2023
…tHub Actions

fc3dea2 ci: Move "ppc64le: Linux..." from Cirrus to GitHub Actions (Hennadii Stepanov)
7782dc8 ci: Move "ARM64: Linux..." from Cirrus to GitHub Actions (Hennadii Stepanov)
0a16de6 ci: Move "ARM32: Linux..." from Cirrus to GitHub Actions (Hennadii Stepanov)
ea33914 ci: Move "s390x (big-endian): Linux..." from Cirrus to GitHub Actions (Hennadii Stepanov)
880be8a ci: Move "i686: Linux (Debian stable)" from Cirrus to GiHub Actions (Hennadii Stepanov)

Pull request description:

  Move more non-x86_64 tasks from Cirrus CI to GitHub Actions.

  Solves one item in #1392 partially.

ACKs for top commit:
  real-or-random:
    ACK fc3dea2 but still waiting for Cirrus

Tree-SHA512: 9a910b3ee500aa34fc4db827f8b2a50bcfb637a9e59f4ad32545634772b397ce80b31a18723f4605dc42aa19a5632292943102099f7720f87de1da454da068b0
@hebasto
Copy link
Member

hebasto commented Aug 24, 2023

While it worked on macOS Catalina back in time, it seems a couple of suppression for /usr/lib/libSystem.B.dylib and /usr/lib/dyld are needed.

Branch (POC) -- https://github.com/hebasto/secp256k1/tree/230824-valgrind
CI -- https://github.com/hebasto/secp256k1/actions/runs/5967987235

@real-or-random
Copy link
Contributor Author

Oh thanks for checking. Have you tried the supplied suppression file (https://github.com/LouisBrunner/valgrind-macos/blob/main/darwin19.supp)? If it doesn't solve the problem, we could try to upstream the additional suppressions, see also LouisBrunner/valgrind-macos#15.

@hebasto
Copy link
Member

hebasto commented Aug 25, 2023

Have you tried the supplied suppression file (LouisBrunner/valgrind-macos@main/darwin19.supp)?

Yes, I have. It does not change the outcome.

UPD. I used https://github.com/LouisBrunner/valgrind-macos/blob/main/darwin22.supp as we run Ventura.

@real-or-random
Copy link
Contributor Author

Do you think maintaining the suppressions is a problem? I don't think it's a big deal.

UPD. I used LouisBrunner/valgrind-macos@main/darwin22.supp as we run Ventura.

Okay, sure, I got confused and looked at the wrong file.

@hebasto
Copy link
Member

hebasto commented Aug 25, 2023

Do you think maintaining the suppressions is a problem? I don't think it's a big deal.

You mean, in this repository?

@real-or-random
Copy link
Contributor Author

Do you think maintaining the suppressions is a problem? I don't think it's a big deal.

You mean, in this repository?

Yes... I don't think it will be a lot of work, but I guess we should still submit it upstream first. If they merge it quickly, then it's easiest for us. I can take care if you don't have the bandwidth.

@hebasto
Copy link
Member

hebasto commented Aug 25, 2023

While it worked on macOS Catalina back in time, it seems a couple of suppression for /usr/lib/libSystem.B.dylib and /usr/lib/dyld are needed.

FWIW, it works with no additional suppressions on macos-12.

@hebasto
Copy link
Member

hebasto commented Aug 25, 2023

I can take care if you don't have the bandwidth.

It would be nice because I have no x86_64 macOS Ventura available.

@real-or-random
Copy link
Contributor Author

FWIW, it works with no additional suppressions on macos-12.

Oh ok, should we then just use this for now?

I can take care if you don't have the bandwidth.

It would be nice because I have no x86_64 macOS Ventura available.

I don't have any macOS available. ;)

@hebasto
Copy link
Member

hebasto commented Aug 26, 2023

FWIW, it works with no additional suppressions on macos-12.

Oh ok, should we then just use this for now?

Done in #1412.

@hebasto
Copy link
Member

hebasto commented Aug 26, 2023

Do you think maintaining the suppressions is a problem? I don't think it's a big deal.

You mean, in this repository?

Yes... I don't think it will be a lot of work, but I guess we should still submit it upstream first.

See LouisBrunner/valgrind-macos#96 as a first step.

real-or-random added a commit that referenced this issue Aug 29, 2023
c223d7e ci: Switch macOS from Ventura to Monterey and add Valgrind (Hennadii Stepanov)

Pull request description:

  This PR switches the macOS native job from Ventura to Monterey, which allows to support Valgrind.

  Both runners--`macos-12` and `macos-13`--have the same clang compilers installed:
  - https://github.com/actions/runner-images/blob/main/images/macos/macos-12-Readme.md
  - https://github.com/actions/runner-images/blob/main/images/macos/macos-13-Readme.md

  But Valgrind works fine on macOS Monterey, but not on Ventura.

  See: #1392 (comment).

  The Homebrew's Valgrind package is cached once it has been built (as it was before #1152). Therefore, the `actions/cache@*` action is needed to be added to the list of the allowed actions.

  #1412 (comment):
  > By the way, this solves #1151.

ACKs for top commit:
  real-or-random:
    ACK c223d7e I tested that a cttest failure makes CI fail: https://github.com/real-or-random/secp256k1/actions/runs/6010365844

Tree-SHA512: 5e72d89fd4d82acbda8adeda7106db0dad85162cca03abe8eae9a40393997ba36a84ad7b12c4b32aec5e9230f275738ef12169994cd530952e2b0b963449b231
real-or-random added a commit that referenced this issue Sep 4, 2023
2635068 ci/gha: Let MSan continue checking after errors in all jobs (Tim Ruffing)
e78c7b6 ci/Dockerfile: Reduce size of Docker image further (Tim Ruffing)
2f0d3bb ci/Dockerfile: Warn if `ulimit -n` is too high when running Docker (Tim Ruffing)
4b8a647 ci/gha: Add ARM64 QEMU jobs for clang and clang-snapshot (Tim Ruffing)
6ebe7d2 ci/Dockerfile: Always use versioned clang packages (Tim Ruffing)

Pull request description:

  Solves one item in #1392.

  This PR also has a few tweaks to the Dockerfile, see individual commits.

  ---

  I'll follow up soon with a PR for ARM64/gcc. This will rely on Cirrus CI.

ACKs for top commit:
  hebasto:
    ACK 2635068.

Tree-SHA512: d290bdd8e8e2a2a2b6ccb1b25ecdc9662c51dab745068a98044b9abed75232d13cb9d2ddc2c63c908dcff6a12317f0c7a35db3288c57bc3b814793f7fce059fd
@real-or-random
Copy link
Contributor Author

real-or-random commented Sep 5, 2023

  • Add a task for ctimetest on ARM64/Linux/Valgrind on Cirrus CI using free minutes or the self-hosted runner

Hm, it appears that Cirrus' "Dockerfile as a CI environment" feature won't work with persistent workers (see #1418). Now that I think about it, that's somewhat expected (e.g., where should the built images be pushed?).

Alternatives:

I think we should do one of the last two?

@maflcko
Copy link
Contributor

maflcko commented Sep 5, 2023

A persistent worker will persist the docker image itself, after the first run on the hardware. I think all you need to do is call

podman image --file $docker_file --name --env $bla --name $bla_image_name && podman container kill $ci_bla_name && podman run -it --rm --name $ci_bla_name $bla_image_name ./ci.sh

Alternatively it may be possible to find a sponsor to cover the cost (if it is not too high) on cirrus directly, while native arm64 isn't on GHA.

I can look at the llvm issue next week, if time permits.

@real-or-random
Copy link
Contributor Author

real-or-random commented Sep 5, 2023

A persistent worker will persist the docker image itself, after the first run on the hardware.

Thanks for chiming in. Wouldn't we also need to make sure that images get pruned from time to time? Or does podman handle this automatically?

podman image --file $docker_file --name --env $bla --name $bla_image_name && podman container kill $ci_bla_name && podman run -it --rm --name $ci_bla_name $bla_image_name ./ci.sh

I assume the first step performs the caching automatically, rebuildung layers only as necessary? Sorry, I'm not familiar with podman, I have only used Docker so far.

Alternatively it may be possible to find a sponsor to cover the cost (if it is not too high) on cirrus directly, while native arm64 isn't on GHA.

Right, yeah, I'm just not sure if I want to spend time on this.

I can look at the llvm issue next week, if time permits.

Ok sure, but I recommend not spending too much time on it. It also won't help with GCC (I added a note above).

@maflcko
Copy link
Contributor

maflcko commented Sep 5, 2023

Thanks for chiming in. Wouldn't we also need to make sure that images get pruned from time to time? Or does podman handle this automatically?

Yeah, you can also run podman image prune, if you want. Pull requests to bitcoin-core/gui should already run it on the same machines, but that seems fragile to rely on.

See:

https://github.com/bitcoin-core/gui/blob/9d3b216e009a53ffcecd57e7f10df15cccd5fd6d/ci/test/04_install.sh#L30

I assume the first step performs the caching automatically, rebuildung layers only as necessary? Sorry, I'm not familiar with podman, I have only used Docker so far.

Yes, it is the same. You should be able to use docker as well, if you want, which is podman-docker.

Right, yeah, I'm just not sure if I want to spend time on this.

If you mean reaching out to a sponsor, I am happy to reach out, if there is a cost estimate.

@real-or-random
Copy link
Contributor Author

Okay, then I think this approach is probably simpler than I expected. I'm not sure if I have the time this week, but I'll look into that soon. (Or @hebasto, if you want to give it a try, feel free to go ahead, of course. My plan was to simply "abuse" the existing Dockerfile to avoid maintaining a second one, at the cost of a somewhat larger image. The existing file should build fine except that debian won't let you install an arm64 cross-compiler on arm64. So we'd need to add some check to skip these packages when we're on arm64, see https://github.com/bitcoin-core/secp256k1/pull/1163/files#diff-751ef1d9fd31c5787e12221f590262dcf7d96cfb166d456e06bd0ccab115b60d .)

If you mean reaching out to a sponsor, I am happy to reach out, if there is a cost estimate.

Okay, thanks, but let's first try docker/podman then.

@maflcko
Copy link
Contributor

maflcko commented Sep 21, 2023

Anything left to be done here?

@real-or-random
Copy link
Contributor Author

The migration is done, but there are still a few unticked checkboxes. (And I've just added two.) None of them are crucial, but I plan to work on them soon, so I'd like to keep this open for now. We could also close this issue here and add a new tracking issue, or open separate issues for the remaining items, if people think that makes tracking easier.

@maflcko
Copy link
Contributor

maflcko commented Oct 3, 2023

https://github.blog/2023-10-02-introducing-the-new-apple-silicon-powered-m1-macos-larger-runner-for-github-actions/

"With today’s launch, our macOS larger runners will be priced at $0.16/minute for XL and $0.12/minute for large."

@real-or-random
Copy link
Contributor Author

github.blog/2023-10-02-introducing-the-new-apple-silicon-powered-m1-macos-larger-runner-for-github-actions

"With today’s launch, our macOS larger runners will be priced at $0.16/minute for XL and $0.12/minute for large."

This is a price decrease for private repos, and GHA remains free for public repos.

@maflcko
Copy link
Contributor

maflcko commented Oct 3, 2023

Are large runners available for public repos?

@real-or-random
Copy link
Contributor Author

Are large runners available for public repos?

Ha, okay, you're right. No, "larger runners" are always billed per minute, i.e., they're not free for public repos. And it seems that they're not planning to provide M1 "standard runners". At least github/roadmap#528 (comment) has been closed now. That means we should stick to the Cirrus runners for ARM.

@real-or-random
Copy link
Contributor Author

And it seems that they're not planning to provide M1 "standard runners"

That has changed now: https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants