add LTO support #7856
Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking. A new user would have to dive into the docs to learn that LTO is on by default and how to turn it off (and, last but not least, to learn what LTO is and what its tradeoffs are). |
This may be true but LTO is only enabled by default in Release mode, where you'd expect longer compile/link times, would you not? If this is the case, I think having LTO by default in Release mode is nice to have. Having said that however, as a counterexample, Rust disables LTO by default even in Release mode (link) so clearly the defaults for LTO are debatable. |
No. |
Is Lines 238 to 240 in 90f175b
|
I too would have been under that impression. Could you elaborate where I'm not sure "universally sensible" defaults exist, or at least I'd be happy to hear opinions on this. Though I guess if it's too complex, that discussion merits its own issue somewhere. |
The line is between codegen and linking, LTO doesn't impact the former. |
I think I can make a reasonable defense of keeping it opt-out for release builds. First of all, it has always been the case that the trade-off for release builds is to wait longer for compilation in exchange for better runtime performance, even by a few orders of magnitude. Whether this is to blame on codegen optimization or linking is beside the point.
The point about new users is important to consider. However, consider further that for new users who start a small, new project, LTO in release modes will in fact be the correct choice. Only when their project grows quite large will it become worthwhile to consider disabling it, and that would happen after the user has spent a long time working on the project, by that time graduating from new user to experienced user, having had plenty of time to learn about the option to disable LTO. New users who join an existing project will use the already existing build script, so they won't have to think about LTO at all. In all cases, new users don't have to think about this feature.
I also want to point out that we already have this equivalent problem in Zig code currently, because we don't split large Zig projects up into multiple compilation units. So we will probably eventually need some kind of feature that is resource-aware and will split up a release build into an arbitrary number of compilation units in order to satisfy a memory resource budget or provide better caching. This same feature clearly could be used to limit LTO in order to satisfy the same constraints.
Thus I argue the defaults set forth in this PR are the best way forward. |
I'm not a Zig user, but I have had a fair number of bad experiences with LLVM's LTO in Rust, so I'm here advising caution. I've seen perf regressions from turning on LTO ranging from 10% on large codebases to 5x on microbenchmarks. The biggest failure mode seems to be around not emitting a memcpy when LTO is on. When LTO causes a regression, the inlining decisions seem the same as far as I can tell, but other things fall over.
I've also seen huge problems with old versions of perf on Rust LTO binaries. Specifically, the version CentOS 7 ships and when using
Is there large-scale testing that can be done to see if Zig programs will suffer from these problems? |
Looking at the Rust source code, it seems like |
See the actual usage of that type: https://github.com/rust-lang/cargo/blob/b52fc0a8270d671e33dafe8dec355624727e8534/src/cargo/core/compiler/lto.rs#L48 Rust has had a lot of debate about the default release settings, and it's entirely unclear how much of that is relevant to Zig: rust-lang/rust#57968 (comment) |
Thanks @saethlin - those are two things I'll be on the lookout for. I'm going to make the call here to proceed with this, at least for now, and see how it goes. Of course, if we run into trouble, this decision can be re-evaluated. But I think this is a worthwhile idea to try out at this point. |
The CLI gains -flto and -fno-lto options to override the default.
However, the cool thing about this is that the defaults are great! In
general when you use build-exe in release mode, Zig will enable LTO if
it would work and it would help.
zig cc supports detecting and honoring the -flto and -fno-lto flags as
well. The linkWithLld functions are improved to all be the same with
regards to copying the artifact instead of trying to pass single objects
through LLD with -r. There is possibly a future improvement here as
well; see the respective TODOs.
stage1 is updated to support outputting LLVM bitcode instead of machine
code when lto is enabled. This allows LLVM to optimize across the Zig and
C/C++ code boundary.
closes #2845
Here's an example. It's the example from the Clang LTO documentation, except I've replaced main.c with main.zig:
main.zig
a.c
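The listing for a.c is missing here. As a rough reconstruction, and assuming the file matches the example in the LLVM link-time optimization documentation (an assumption, not confirmed by this page), the C side looks roughly like the sketch below. The prototypes stand in for the a.h header of that example, and foo4 is defined in C only so the sketch is self-contained; in the PR's version, foo4 would be exported from main.zig instead.

```c
#include <stdio.h>

/* Declarations that a.h would carry in the LLVM documentation example. */
int foo1(void);
void foo2(void);
void foo4(void);

/* File-local state: foo2 is the only function that ever writes it. */
static signed int i = 0;

void foo2(void) {
    i = -1;
}

/* Stand-in (assumption): in the PR, foo4 is exported from main.zig. */
void foo4(void) {
    printf("Hi\n");
}

/* Only reachable from foo1, and only when i is negative. */
static int foo3(void) {
    foo4();
    return 10;
}

int foo1(void) {
    int data = 0;

    if (i < 0)
        data = foo3();

    data = data + 42;
    return data;
}
```

If nothing in the whole program calls foo2, then i is never negative, so the foo3 branch in foo1 is dead and foo4 has no remaining callers. A linker that can see across both files is in a position to exploit that; a per-file compile cannot, since it has to assume other files may call foo2.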
The interesting thing to note here is that there was no LTO explicitly opted into. It happened automatically. And you can see here that in the main function, there is no call to foo1 and there is no exported foo4. If we didn't have LTO, the call to foo1 could not have been inlined. For example, here's what happens if we force-disable LTO:
Now you can see we are forced to call foo and return its result.