How are official Corretto Linux binaries built? #129

Open
airbnb-gps opened this issue May 17, 2023 · 5 comments
Labels
question Further information is requested

Comments

@airbnb-gps

From forensic breadcrumbs left in the ELF binaries, it looks like the canonical Linux ELF binaries of Corretto are built on a Red Hat-flavored Linux distro (possibly RHEL/CentOS 7, based on the presence of GCC 4.8). These are then repackaged into .deb archives. (Which is fine: a .deb is a glorified archive, an ar container wrapping tarballs, and just a medium for expressing file installs.)
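One way to read such breadcrumbs, as a minimal sketch: GCC normally records an identification string in the ELF `.comment` section, so scanning the raw bytes of a binary for `GCC: (...)` strings reveals which compiler builds contributed objects. The function names and the sample string below are illustrative assumptions, not values taken from an actual Corretto binary.

```python
import re

# GCC writes strings such as "GCC: (GNU) 4.8.5 ..." into the ELF .comment
# section. They are NUL-terminated, so match up to the next NUL byte.
_GCC_IDENT = re.compile(rb"GCC: \([^)\x00]*\)[^\x00]*")


def compiler_idents(data: bytes) -> list[str]:
    """Return the distinct GCC identification strings found in raw bytes."""
    found = {
        match.group(0).decode("ascii", "replace").strip()
        for match in _GCC_IDENT.finditer(data)
    }
    return sorted(found)


def compiler_idents_in_file(path: str) -> list[str]:
    """Convenience wrapper: scan a file on disk (hypothetical helper)."""
    with open(path, "rb") as fh:
        return compiler_idents(fh.read())


# Illustrative input only: a fake byte blob with one duplicated ident string.
sample = (
    b"\x7fELF\x00junk\x00"
    b"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)\x00"
    b"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)\x00"
)
print(compiler_idents(sample))
```

Pointing `compiler_idents_in_file` at a shared library from the JDK image (for example a `libjvm.so`) would list every compiler that left a marker. A raw byte scan is only a heuristic: stripped binaries may carry no `.comment` section at all.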

But if I look at GitHub Actions CI in this repo, I only see references to building on Ubuntu 20.04.

It certainly seems like the official Corretto binaries are built using a different mechanism from the transparent/reproducible GitHub Actions configs in this repo.

Are details of Corretto's official binary building public? If I wanted to reproduce them, how would I go about that?

@navyxliu

navyxliu commented Aug 1, 2023

Hi @airbnb-gps,

The workflow you observed is not from the Corretto project; it comes from the upstream project. OpenJDK builds Linux artifacts on Ubuntu 20.04 so that it can verify every commit and pull request.

Internally, we use the same Gradle scripts in this repo to build Corretto artifacts. There are additional steps, such as signing, that we are not ready to disclose.

We offer access to both release builds and nightly builds. Is that good enough for your study?
https://downloads.corretto.aws/#/downloads?build=nightly&version=17

@navyxliu navyxliu added the question Further information is requested label Aug 1, 2023
@airbnb-gps
Author

Yeah, I figured that you use the Corretto-only Gradle build system. I was just lost on what the OS build environment is like.

A pattern I've seen with distributable Linux binaries is:

  1. Compatibility with libc.so.6 and other ABIs is hard.
  2. Build binaries on an old Linux distro, maybe with an ancient GCC version, to achieve ABI compatibility.
  3. Ship ELF binaries that work on every modern Linux machine.
  4. Implicitly leave a lot of performance opportunity on the table because of the old compiler and the failure to leverage modern libc or kernel features.
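To make the ABI point concrete, here is a rough sketch of how to estimate the newest glibc a binary requires: dynamically linked ELF files reference versioned symbols such as `GLIBC_2.17`, and the highest version that appears sets the minimum glibc the binary can run against. The helper name is hypothetical, and scanning raw bytes is an approximation. Tools such as `readelf --version-info` or `objdump -T` give the authoritative answer.

```python
import re

# Versioned glibc symbols show up as strings like "GLIBC_2.2.5" or "GLIBC_2.17".
# "GLIBC_PRIVATE" has no digits and is deliberately not matched.
_GLIBC_VER = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_requirement(data: bytes):
    """Return the highest GLIBC_x.y version seen as a tuple, or None."""
    versions = {
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in _GLIBC_VER.finditer(data)
    }
    return max(versions) if versions else None


# Illustrative input only: version strings as they might appear in .dynstr.
sample = b"\x00GLIBC_2.2.5\x00GLIBC_2.17\x00GLIBC_2.4\x00GLIBC_PRIVATE\x00"
print(max_glibc_requirement(sample))
```

Versions are compared as integer tuples rather than strings, so `2.17` correctly ranks above `2.4`. A binary built on an old distro reports a low maximum here, which is exactly what makes it portable across newer systems.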

As part of evaluating point 4, we are building Corretto from source in-house, using a modern compiler toolchain and a build environment that matches our deploy environment. I wanted to first reproduce the official Corretto builds from source so I could establish a trusted baseline and isolate the factors contributing to performance changes. But without being able to reproduce the Corretto build environment, I'm out of luck. Hence this issue.

@navyxliu

navyxliu commented Aug 2, 2023

Hi,

I understand. It's possible to bump the toolchain, but it's really hard to use a newer glibc, because the generic Linux binary would then break on older systems. Yes, we do leave some performance on the table.

I am not sure it is a lot, though. HotSpot isn't an ordinary application: most of the runtime is spent in JIT-generated code, so the code quality of the C++ compiler matters less than it does for ordinary C++ programs. In addition, HotSpot developers have implemented fundamental facilities such as atomics and threads themselves rather than using the standard libraries. Improvements in glibc and GCC may have less impact than you expect.

That being said, I don't have quantitative data to back up my guess. If you really want to find out, I recommend evaluating the OpenJDK tip. Evaluation and analysis shared with the OpenJDK mailing list will benefit the entire Java community.

@airbnb-gps
Author

Yeah, I thought the C/C++ within OpenJDK wouldn't be as performance critical due to JIT dominating execution time. But we have just enough performance profiles with the hotspot frames and other C code in the stack to make us venture down this path of exploring optimizing the statically compiled hotspot code. At our scale, a <1% gain is meaningful.

In our case, we don't have the glibc ABI compatibility problem since we're compiling a binary on the same distro version that we're running it on.

@lutkerd
Contributor

lutkerd commented Aug 3, 2023

What distro and glibc version are you targeting? What GCC version are you testing with?

The Amazon Linux 2 and Amazon Linux 2023 versions target those platforms and will use a much newer glibc and GCC 10.x. There are further GCC improvements for aarch64 that we are looking at for a future release.
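For anyone answering the glibc question for their own hosts, a small sketch: Python's standard `platform.libc_ver()` reports the C library the interpreter is linked against, which can then be compared with the version a binary requires. The helper names and the version numbers in the usage lines are illustrative assumptions, not figures for any particular distro.

```python
import platform


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "2.17" into a comparable tuple (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def host_glibc_version():
    """Return the host glibc version as a tuple, or None if it is not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None  # e.g. musl-based or non-Linux systems
    return parse_version(version)


def runs_on(required: tuple, available: tuple) -> bool:
    """A binary needing `required` can load if the host glibc is at least that."""
    return required <= available


# Hypothetical numbers: a binary needing 2.17 on a host that provides 2.26.
print(runs_on(parse_version("2.17"), parse_version("2.26")))
print(host_glibc_version())
```

When the build host and the deploy host are the same distro version, `required` can never exceed `available`, which is why the compatibility constraint disappears in that setup.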

3 participants