Problems with dynamically linked binaries (e.g. no OpenSSL 3 compatibility) #84

Closed
Jackenmen opened this issue Jul 5, 2022 · 18 comments · Fixed by #105
Labels
good first issue Good for newcomers

Comments

@Jackenmen

Jackenmen commented Jul 5, 2022

Some of the newer OSes (notably Ubuntu 22.04 LTS Jammy Jellyfish) only ship with OpenSSL 3.0 and do not have an OpenSSL 1.1 package. The executables here are built on Ubuntu 20.04, which ships with OpenSSL 1.1, so trying to run them on such a system results in an error:

/home/ubuntu/.cargo/bin/cargo-install-update: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory

This is problematic because different OSes will have different versions of libraries available. I feel like linking more things statically would be helpful here. To support the many operating systems currently in use, it would probably make sense to limit the number of libraries that you link against dynamically.

For starters, statically linking OpenSSL would help with a bunch of packages already. This isn't necessarily that easy since you would need a static build of OpenSSL but I think it might be achievable by either building a custom Docker image with a static OpenSSL build (and other static libs in the future) or using some existing work such as:

All of these are musl-based as statically linking with glibc is usually advised against because of NSS issues and licensing.

I'd like to also mention Python's manylinux images as the Python packaging community seems to have solved this problem of supporting mainstream distros with a single build: https://peps.python.org/pep-0600/#core-definition
They approached this problem a bit differently and the specification allows you to dynamically link against glibc as well as any dynamic libraries that are available in ALL mainstream distros with that (or newer) glibc version. Anything else still would need to be statically linked. Considering that you can't really give separate builds for people with different glibc versions like Python does, you would presumably have to choose a single minimum glibc version for which you want to build. Maybe it's not that great of an idea for solving this issue but I wanted to at least mention how the Python packaging community approached it. The upside of this would be that you would still be able to build against glibc and just would have to limit yourself to only dynamically linking glibc and the few libraries that are available in ALL mainstream distros with that minimum glibc version.

@alsuren
Collaborator

alsuren commented Jul 5, 2022

Hi. Thanks for reporting this, and thanks for the write-up and suggestions.

Disclaimer: I mostly use macOS for day-to-day things, and only use Linux on Raspberry Pi/CI boxes/prod (for which ubuntu-latest seems to work fine). As GitHub Actions switches over to 22.04, this is likely to cause more and more problems, so it is good to start thinking about solutions now.

I'd prefer to avoid shipping statically linked musl binaries by default, until they improve their malloc implementation. I wonder if it would be possible to create dynamic builds for x86_64-unknown-linux-gnu and static builds for x86_64-unknown-linux-musl, then run ldd at install time on all installed x86_64-unknown-linux-gnu binaries, to check for missing dependencies, and fetch a statically linked x86_64-unknown-linux-musl fallback only in those cases. I have no idea whether it's reasonable to assume that ldd is available on all Linux systems. Do you think that this could be a workable approach?
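The ldd check proposed above could be sketched roughly as follows. This is only an illustration, not code from cargo-quickinstall: the helper names `run_ldd`, `missing_libraries` and `needs_musl_fallback` are hypothetical, and it assumes glibc-style `ldd` output, where an unresolved dependency is printed as `libssl.so.1.1 => not found`.

```rust
use std::path::Path;
use std::process::Command;

/// Runs `ldd <binary>` and returns its stdout as text.
/// Assumption: `ldd` is on PATH, which the thread only verifies for some distros.
pub fn run_ldd(binary: &Path) -> std::io::Result<String> {
    let out = Command::new("ldd").arg(binary).output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Returns the names of the shared libraries that `ldd` reported as missing.
/// Lines without `=>` (for example the vdso entry) are ignored.
pub fn missing_libraries(ldd_text: &str) -> Vec<String> {
    ldd_text
        .lines()
        .filter_map(|line| {
            let (name, resolution) = line.split_once("=>")?;
            if resolution.trim() == "not found" {
                Some(name.trim().to_string())
            } else {
                None
            }
        })
        .collect()
}

/// Hypothetical policy: fetch the static musl build if anything is missing.
pub fn needs_musl_fallback(ldd_text: &str) -> bool {
    !missing_libraries(ldd_text).is_empty()
}
```

Keeping the parsing separate from the subprocess call means the decision logic can be checked against captured `ldd` output without installing anything.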

The manylinux idea is an interesting one. I come from a Python background, so I'd come across the concept, but I'd never looked into how it works in any detail. After a quick skim-read, it looks promising. That said, I am aggressively optimising for maintainability in cargo-quickinstall, because I really want to keep my headspace free for working on cargo-quickbuild at the moment (cargo-quick/cargo-quick#14). I feel like if I can avoid introducing non-first-class-Rust concepts into cargo-quickinstall, that would probably help keep things simple. If Rust gains this concept upstream, or someone comes along with a simple implementation and is willing to help maintain it when people inevitably uncover bugs, I could be convinced that this is a reasonable approach.

@Jackenmen
Author

I'd prefer to avoid shipping statically linked musl binaries by default until they improve their malloc implementation.

That's fair, taking a significant performance hit due to musl isn't ideal. I hear that musl 1.2.1 (from 2020) featured a new malloc implementation but I imagine it's still not great?

I wonder if it would be possible to create dynamic builds for x86_64-unknown-linux-gnu and static builds for x86_64-unknown-linux-musl, then run ldd at install time on all installed x86_64-unknown-linux-gnu binaries, to check for missing dependencies, and fetch a statically linked x86_64-unknown-linux-musl fallback only in those cases.

That does seem kind of complicated, and I presume it makes things harder for other projects that support quickinstall, such as cargo-binstall. Is linking glibc dynamically while statically linking other libraries (such as libssl) a no-go? It would require a compatible glibc version to be present on the system (so the build should be done with an old enough glibc; Alma/Rocky Linux 8 is a good candidate, or CentOS 7 if you want something with an even older glibc), but it wouldn't require other libraries. I'm not familiar with how linking is done in Rust (I'm just a consumer of Rust-written programs, not a developer) so I don't know if this is hard or even achievable.

I have no idea whether it's reasonable to assume that ldd is available on all linux systems.

It's probably reasonable? It might be better to check the ELF headers rather than invoke a subprocess, at least assuming there's some Rust library that aids with that.

I tried running ldd on a bunch of popular distros and it seems to have succeeded on all of them:
https://cirrus-ci.com/build/6030028193398784

Do you think that this could be a workable approach?

I personally think that dynamically linking glibc while statically linking the rest of the libraries is a better approach, but I don't know how doable it is. The approach you're proposing seems likely to need the fallback often, especially if the build is done on a system that ships with an old enough version of glibc (because otherwise a lot of OSes will not be able to use the glibc-based version no matter what else is done about it) and therefore probably also old versions of other libraries.

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

I'd prefer to avoid shipping statically linked musl binaries by default until they improve their malloc implementation.

I think we could use mimalloc here, as it performs very well when used together with musl.

Considering that you can't really give separate builds for people with different glibc versions like Python does

With zig-cc, I think we might be able to do it.

I wonder if it would be possible to create dynamic builds for x86_64-unknown-linux-gnu and static builds for x86_64-unknown-linux-musl, then run ldd at install time on all installed x86_64-unknown-linux-gnu binaries, to check for missing dependencies, and fetch a statically linked x86_64-unknown-linux-musl fallback only in those cases.

@alsuren cargo-binstall is doing exactly this.

I added that part in src/target.rs

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

Regarding the openssl problem, many binary crates might already support having it vendored in.

The openssl crate already supports a "vendored" feature, so we just need to turn that one on.

Another alternative would be to use rustls, which many already support (crates_io_api, reqwest, hyper, etc).
That will make things a lot easier.

The con is that we have to manually enable these features, which takes some effort, but perhaps we can maintain a per-crate list of feature flags to enable?
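A per-crate feature list like the one suggested here could be as simple as a lookup table consulted when building the `cargo install` command line. The sketch below is an assumption about how that might look, not something that exists in this repository: `cargo_install_args` is a hypothetical helper, and the crate and feature names used with it are placeholders.

```rust
use std::collections::HashMap;

/// Builds the arguments for `cargo install <crate>`, adding `--features`
/// only when the (hypothetical) per-crate table has an entry for it.
pub fn cargo_install_args(krate: &str, features: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    let mut args = vec!["install".to_string(), krate.to_string()];
    if let Some(list) = features.get(krate) {
        if !list.is_empty() {
            // cargo accepts a comma-separated feature list.
            args.push("--features".to_string());
            args.push(list.join(","));
        }
    }
    args
}
```

Crates without an entry are installed exactly as before, so the table only has to cover the packages that are known to need vendoring.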

I'm not familiar with the internal design of this repository, so I don't know how feasible my idea is.

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

I wonder if it would be possible to create dynamic builds for x86_64-unknown-linux-gnu and static builds for x86_64-unknown-linux-musl, then run ldd at install time on all installed x86_64-unknown-linux-gnu binaries, to check for missing dependencies, and fetch a statically linked x86_64-unknown-linux-musl fallback only in those cases.

@alsuren Actually, it would be awesome if we could unify our efforts.

cargo-binstall already does that and it also supports writing the installed package to metadata so that we can implement updating in the future.

We also plan on implementing batch installation, enabling cargo-binstall to install multiple packages at once efficiently.

We can introduce a quick-install only mode so that cargo-binstall can act as a quickinstall client and only pull from quickinstall so that we don't have to implement the same functionalities twice.

@alsuren
Collaborator

alsuren commented Jul 6, 2022

We can introduce a quick-install only mode so that cargo-binstall can act as a quickinstall client and only pull from quickinstall so that we don't have to implement the same functionalities twice.

In my readme, I recommend using binstall for desktop use, but I currently still use quickinstall locally, because I prefer to avoid interactive prompts, and I don't like how the versioned binaries pollute my tab completion. I'm on my phone, so I've not looked, but I'm guessing these can all be turned off with feature flags.

The thing that is most important to me is that cargo install cargo-quickinstall && cargo quickinstall $crate is lightning fast. This is why I would choose to shell out to ldd rather than import a crate for elf parsing. One way to keep this speed while getting the sophistication of binstall would be for quickinstall to bootstrap itself on first run, by downloading a precompiled binstall, and then shelling out to binstall each time after that.

I would even be fine with quickinstall preferring upstream releases in this case, as long as I could pass a flag to ensure that there is no interactivity required when falling back to quickinstall's packages (probably already possible).

If we wanted to use this approach to close this issue, what other pieces of the puzzle do we need?

  1. Make sure the precompiled binstall tarball we use is statically linked (this is allowed to be as slow as you like, because it's still orders of magnitude faster than compiling from scratch)
  2. Start compiling some statically linked binaries for binstall to fall back on when the dynamically linked ones turn out to be trash.

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

because I prefer to avoid interactive prompts,

That can be done using --no-confirm, which disables the prompts.

and I don't like how the versioned binaries pollute my tab completion.

Hmm, that is indeed a problem.

We can add a flag to disable this.

This is why I would choose to shell out to ldd rather than import a crate for elf parsing.

We also try to use rustc to detect the target by default.

On failure, we switch to using guess_host_triple, which uses the uname syscall to guess the host target.

And then if we are running on Linux, we use ldd to decide between glibc and musl.
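The detection order described in this comment (ask rustc first, then guess, then consult ldd on Linux) can be illustrated with two small parsers. This is not cargo-binstall's actual code: the function and type names are made up for the example, and the libc check is a heuristic that assumes the text printed by `ldd --version` mentions either "musl" or "GLIBC"/"GNU libc".

```rust
/// Extracts the host triple from the output of `rustc -vV`,
/// which contains a line of the form `host: <triple>`.
pub fn host_from_rustc_verbose(output: &str) -> Option<&str> {
    output
        .lines()
        .find_map(|line| line.strip_prefix("host: "))
        .map(str::trim)
}

/// Which C library the system appears to use.
#[derive(Debug, PartialEq, Eq)]
pub enum Libc {
    Glibc,
    Musl,
    Unknown,
}

/// Heuristic (an assumption, see above): classify the libc from the
/// combined stdout/stderr text of `ldd --version`.
pub fn libc_from_ldd_version(text: &str) -> Libc {
    let lower = text.to_lowercase();
    if lower.contains("musl") {
        Libc::Musl
    } else if lower.contains("glibc") || lower.contains("gnu libc") {
        Libc::Glibc
    } else {
        Libc::Unknown
    }
}
```

A caller would try `host_from_rustc_verbose` first and only fall back to the libc heuristic when rustc is unavailable or the result needs refining on Linux.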

One way to keep this speed while getting the sophistication of binstall would be for quickinstall to bootstrap itself on first run, by downloading a precompiled binstall, and then shelling out to binstall each time after that.

cargo-binstall already provides prebuilt artifacts for x86-64, armv7 and arm64 on Linux, macOS x86-64 and M1, and x86-64 Windows.

Note that x86-64 supports both musl and glibc, but due to some issues in cross, armv7 and arm64 currently only provide musl builds.
But once cross-rs/cross#860 is done, we will re-enable the glibc build for armv7 and arm64.

Make sure the precompiled binstall tarball we use is statically linked (this is allowed to be as slow as you like, because it's still orders of magnitude faster than compiling from scratch)

And yes!

Our musl builds are completely statically linked.
Our glibc builds only depend on glibc and cargo-binstall always uses rustls instead of openssl, so no openssl linking hell.

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

@alsuren Check out the installation section of the cargo-binstall README.md. It provides prebuilt artifacts for almost all targets you will need, and the instructions for Linux are to download the musl build, which is statically linked.

@alsuren
Collaborator

alsuren commented Jul 6, 2022

I'm just thinking that if we implemented 2. (which I guess we want anyway?) then we would be able to use our own statically linked build of binstall, without needing to add any new code to parse your releases page (or unpack zip files rather than tarballs). This would also allow me to easily say "I need binstall that is at least this new" and automatically update it as I depend on newer features later. (I would probably want to keep a mode that doesn't depend on binstall, and keep my --dry-run mode, because they are still useful for quickly bootstrapping throw-away CI boxes)

If this sounds like a reasonable approach, I think that work can start on building static musl packages straight away, and binstall can work on using them in parallel with us working on making quickinstall bootstrap itself via binstall. @jack1142 this is pure CI hackery in bash/yaml with no Rust code. Do you fancy having a go? If so, I can try writing up what the broad-strokes approach would be, later this week.

@NobodyXu: crazy thought: does binstall have a mode where it can add itself to metadata (crates2.json or whatever it's called) so that it can be upgraded with cargo install later? If so, that would be pretty sweet, and I could just unpack the tarball and tell it to do its thing. This would allow me to close at least one other ticket in the process.

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

crazy thought: does binstall have a mode where it can add itself to metadata (crates2.json or whatever it's called) so that it can be upgraded with cargo install later? If so, that would be pretty sweet, and I could just unpack the tarball and tell it to do its thing. This would allow me to close at least one other ticket in the process.

It is possible for binstall to install itself, so yes, that can be done.

And we are planning a new command to support updating these crates via cargo-binstall: cargo-bins/cargo-binstall#176

@NobodyXu
Member

NobodyXu commented Jul 6, 2022

I'm just thinking that if we implemented 2. (which I guess we want anyway?) then we would be able to use our own statically linked build of binstall, without needing to add any new code to parse your releases page (or unpack zip files rather than tarballs). This would also allow me to easily say "I need binstall that is at least this new" and automatically update it as I depend on newer features later. (I would probably want to keep a mode that doesn't depend on binstall, and keep my --dry-run mode, because they are still useful for quickly bootstrapping throw-away CI boxes)

Looks good to me.

@alsuren alsuren added the good first issue Good for newcomers label Jul 10, 2022
@NobodyXu
Member

P.S. The code used for detecting targets at runtime in cargo-binstall will soon be available as a library: cargo-bins/cargo-binstall#307

@NobodyXu
Member

NobodyXu commented Sep 5, 2022

then run ldd at install time on all installed x86_64-unknown-linux-gnu binaries

@alsuren P.S. detect-targets 0.1.2 has been released, which does exactly this on Linux to detect the presence of glibc and musl libc.

@SichangHe

I believe this problem I had with cargo-update is related.

@NobodyXu
Member

I think we can try to list all features of the binary crate, then find all features whose names start with vendor and enable them.

That might fix the problem.

@alsuren
Collaborator

alsuren commented Dec 18, 2022

I think we can try to list all features of the binary crate, then find all features whose names start with vendor and enable them.

I resisted this idea for ages, to limit the scope of cargo-quickinstall (and to avoid having to re-build my archive of packages with whatever new scheme I pick). Maybe it is worth doing though, because it's our top issue.

Interestingly, cargo install will let us specify features of transitive dependencies, like this:

$ cargo install cargo-update --features=openssl-sys/vendored

If you name a crate that isn't in the dependency tree, you get an error back almost immediately.

$ cargo install bat --features=openssl-sys/vendored
    Updating crates.io index
  Installing bat v0.22.1
error: failed to compile `bat v0.22.1`, intermediate artifacts can be found at `/var/folders/0b/_8p88w416s15l8f02rn0m9hr0000gn/T/cargo-installeif3yN`

Caused by:
  package `bat v0.22.1` does not have a dependency named `openssl-sys`

If all of our bug reports are currently about openssl then we could just try compiling with --features=openssl-sys/vendored enabled and then fall back to installing without that flag when it fails.
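The try-then-fall-back idea can be sketched as below. It is a minimal illustration rather than the project's build script: `install_with_fallback` and `run_cargo` are hypothetical names, and the command runner is passed in as a closure so that the control flow can be exercised without actually invoking cargo.

```rust
use std::process::Command;

/// The flag tried first, taken from the comment above.
const VENDORED: &str = "--features=openssl-sys/vendored";

/// Tries `cargo install <crate> --features=openssl-sys/vendored` first and,
/// if that fails for any reason, retries without the flag.
/// `run` receives the cargo arguments and returns true on success.
pub fn install_with_fallback<F>(krate: &str, mut run: F) -> bool
where
    F: FnMut(&[&str]) -> bool,
{
    if run(&["install", krate, VENDORED]) {
        return true;
    }
    run(&["install", krate])
}

/// A runner that invokes the real `cargo` binary found on PATH.
pub fn run_cargo(args: &[&str]) -> bool {
    Command::new("cargo")
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

In real use this would be called as `install_with_fallback("cargo-update", run_cargo)`. Because a crate without `openssl-sys` in its dependency tree fails almost immediately, the extra attempt should cost little.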

In theory, if we end up with a bunch of packages that are similar to openssl then we could tee stderr to a file and look for the `` does not have a dependency named `aaaa` `` error message, to remove that package from the list.
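Picking the offending package name out of the captured stderr could look something like this. The helper name `missing_dependency` is hypothetical, and the sketch assumes the message keeps the exact wording shown in the `bat` example above, which cargo does not guarantee across versions.

```rust
/// Returns the dependency named in cargo's
/// "does not have a dependency named `<name>`" error, if that error is present.
pub fn missing_dependency(stderr: &str) -> Option<&str> {
    const MARKER: &str = "does not have a dependency named `";
    // Skip past the marker, then read up to the closing backtick.
    let start = stderr.find(MARKER)? + MARKER.len();
    let rest = &stderr[start..];
    let end = rest.find('`')?;
    Some(&rest[..end])
}
```

The returned name can then be dropped from the list of `<dep>/vendored` features before retrying the install.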

We can deal with that later though.

(Annoyingly, the openssl-sys?/vendored syntax from https://doc.rust-lang.org/cargo/reference/features.html#dependency-features still fails if openssl-sys is not a dep. Maybe we could propose fixing that upstream?)

@NobodyXu
Member

Interestingly, cargo install will let us specify features of transitive dependencies, like this:

That's something I didn't know, and it would make fixing the problem much easier, especially when some crates might not have these vendored* features.

(Annoyingly, the openssl-sys?/vendored syntax from https://doc.rust-lang.org/cargo/reference/features.html#dependency-features still fails if openssl-sys is not a dep. Maybe we could propose fixing that upstream?)

That's a good idea!

@NobodyXu
Member

NobodyXu commented Dec 20, 2022

I just realized that many *-sys crates support changing their linking behavior based on environment variables.

For example, we can use OPENSSL_STATIC to force openssl-sys to link statically.

libgit2-sys uses CARGO_FEATURE_VENDORED to detect the vendored feature. I don't know whether we can explicitly set that environment variable to force vendoring; if not, we would have to enable that feature.

Using environment variables would make this much easier.
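Setting such a variable for a single build could be sketched as follows. `static_openssl_install` is a hypothetical helper that only constructs the command. The value `1` is an assumption, and static linking will presumably only succeed where static OpenSSL libraries are actually available to the build.

```rust
use std::process::Command;

/// Builds (but does not run) `cargo install <crate>` with OPENSSL_STATIC set,
/// so that openssl-sys is asked to link OpenSSL statically.
pub fn static_openssl_install(krate: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("install").arg(krate).env("OPENSSL_STATIC", "1");
    cmd
}
```

The caller decides when to run it, for example with `static_openssl_install("cargo-update").status()`, and the variable affects only that child process rather than the whole CI job.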
