Skip to content

add LTO support #7856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 24, 2021
Merged

add LTO support #7856

merged 3 commits into from
Jan 24, 2021

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Jan 23, 2021

The CLI gains -flto and -fno-lto options to override the default.
However, the cool thing about this is that the defaults are great! In
general when you use build-exe in release mode, Zig will enable LTO if
it would work and it would help.

zig cc supports detecting and honoring the -flto and -fno-lto flags as
well. The linkWithLld functions are improved to all be the same with
regards to copying the artifact instead of trying to pass single objects
through LLD with -r. There is possibly a future improvement here as
well; see the respective TODOs.

stage1 is updated to support outputting LLVM bitcode instead of machine
code when lto is enabled. This allows LLVM to optimize across the Zig and
C/C++ code boundary.

closes #2845


Here's an example. It's the example from the Clang LTO documentation, except I've replaced main.c with main.zig:

main.zig

const std = @import("std");

export fn foo4() void {
    _ = std.c.printf("Hi\n");
}

extern fn foo1() c_int;

pub fn main() u8 {
    return @intCast(u8, foo1());
}

a.c

int foo1(void);
void foo2(void);
void foo4(void);

static signed int i = 0;

void foo2(void) {
  i = -1;
}

static int foo3() {
  foo4();
  return 10;
}

int foo1(void) {
  int data = 0;

  if (i < 0)
    data = foo3();

  data = data + 42;
  return data;
}
$ ./zig build-exe main.zig a.c -OReleaseFast -lc 
$ ./main
$ echo $?
42
$ objdump -d main -Mintel | grep -A7 '<main'
0000000000201530 <main>:
  201530:	48 c7 c0 ff ff ff ff 	mov    rax,0xffffffffffffffff
  201537:	66 0f 1f 84 00 00 00 	nop    WORD PTR [rax+rax*1+0x0]
  20153e:	00 00 
  201540:	48 83 7c c2 08 00    	cmp    QWORD PTR [rdx+rax*8+0x8],0x0
  201546:	48 8d 40 01          	lea    rax,[rax+0x1]
  20154a:	75 f4                	jne    201540 <main+0x10>
  20154c:	48 63 cf             	movsxd rcx,edi
  20154f:	48 89 35 7a 22 00 00 	mov    QWORD PTR [rip+0x227a],rsi        # 2037d0 <argv>
  201556:	48 89 0d 7b 22 00 00 	mov    QWORD PTR [rip+0x227b],rcx        # 2037d8 <argv+0x8>
  20155d:	48 89 15 7c 22 00 00 	mov    QWORD PTR [rip+0x227c],rdx        # 2037e0 <environ.0>
  201564:	48 89 05 7d 22 00 00 	mov    QWORD PTR [rip+0x227d],rax        # 2037e8 <environ.0+0x8>
  20156b:	b8 2a 00 00 00       	mov    eax,0x2a
  201570:	c3                   	ret    

The interesting thing to note here is that there was no LTO explicitly opted into. It happened automatically. And you can see here that in the main function, there is no call to foo1 and there is no exported foo4. If we didn't have LTO, the call to foo1 could not have been inlined. For example, here's what happens if we force-disable LTO:

$ ./zig build-exe main.zig a.c -OReleaseFast -lc -fno-lto
$ objdump -d main -Mintel | grep -A7 '<main'
00000000002015d0 <main>:
  2015d0:	50                   	push   rax
  2015d1:	48 c7 c0 ff ff ff ff 	mov    rax,0xffffffffffffffff
  2015d8:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
  2015df:	00 
  2015e0:	48 83 7c c2 08 00    	cmp    QWORD PTR [rdx+rax*8+0x8],0x0
  2015e6:	48 8d 40 01          	lea    rax,[rax+0x1]
  2015ea:	75 f4                	jne    2015e0 <main+0x10>
  2015ec:	48 63 cf             	movsxd rcx,edi
  2015ef:	48 89 35 fa 22 00 00 	mov    QWORD PTR [rip+0x22fa],rsi        # 2038f0 <argv>
  2015f6:	48 89 0d fb 22 00 00 	mov    QWORD PTR [rip+0x22fb],rcx        # 2038f8 <argv+0x8>
  2015fd:	48 89 15 fc 22 00 00 	mov    QWORD PTR [rip+0x22fc],rdx        # 203900 <environ.0>
  201604:	48 89 05 fd 22 00 00 	mov    QWORD PTR [rip+0x22fd],rax        # 203908 <environ.0+0x8>
  20160b:	e8 90 ff ff ff       	call   2015a0 <foo1>
  201610:	0f b6 c0             	movzx  eax,al

Now you can see we are forced to call foo and return its result.

@LemonBoy
Copy link
Contributor

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

@kubkon
Copy link
Member

kubkon commented Jan 23, 2021

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

This may be true but LTO is only enabled by default in Release mode, where you'd expect longer compile/link times, would you not? If this is the case, I think having LTO by default in Release mode is nice to have. Having said that however, as a counterexample, Rust disables LTO by default even in Release mode (link) so clearly the defaults for LTO are debatable.

@LemonBoy
Copy link
Contributor

would you not?

No.

@Mouvedia
Copy link

Mouvedia commented Jan 23, 2021

Is PrepareForLTO relevant for this PR?

zig/src/zig_llvm.cpp

Lines 238 to 240 in 90f175b

PMBuilder->PrepareForLTO = false;
PMBuilder->PrepareForThinLTO = false;
PMBuilder->PerformThinLTO = false;

@rohlem
Copy link
Contributor

rohlem commented Jan 23, 2021

[I]n Release mode [you would] expect longer compile/link times, would you not?

No.

I too would have been under that impression. Could you elaborate where ReleaseSafe and ReleaseFast should draw the line of "too expensive"? Anything that takes longer than Debug (including incremental compilation), or maybe twice that time?
Would we still want a "hard boiled" build mode that defaults to going the extra step regardless of build time?

I'm not sure "universally sensible" defaults exist, or at least I'd be happy to hear opinions on this. Though I guess if it's too complex, that discussion merits its own issue somewhere.

@LemonBoy
Copy link
Contributor

The line is between codegen and linking, LTO doesn't impact the former.

@andrewrk
Copy link
Member Author

I think I can make a reasonable defense of keeping it opt out for release builds. First of all it has always been the case that the trade-off for release builds is to wait longer for compilation in exchange for better runtime performance, even by a few orders of magnitude. Whether this is to blame on codegen optimization or linking is beside the point.

The point about new users is important to consider, however consider further that for new users who start a small, new project, LTO in release modes will in fact be the correct choice for them. Only when their project grows quite large will it start to become worthwhile to consider disabling it, and that would happen after the user spend a long time working on the project, by that time graduating from new user to experienced user, having had plenty of time to learn about the option to disable LTO.

New users who join an existing project will use the already existing build script so they won't have to think about LTO at all.

In all cases, new users don't have to think about this feature.

I also want to point out that we already have this equivalent problem in zig code currently because we don't split large zig projects up into multiple compilation units. So probably we will eventually need some kind of feature that is resource-aware and will split up a release build into an arbitrary number of compilation units in order to satisfy a memory resource budget or provide better caching. This same feature clearly could be used to limit LTO in order to satisfy the same constraints.

Thus I argue the defaults set forth in this PR are the best way forward.

@saethlin
Copy link

saethlin commented Jan 23, 2021

I'm not a Zig user, but I have a fair bit of bad experiences with LLVM's LTO from my experience in Rust, so I'm here advising caution.

I've seen perf regressions from turning on LTO between 10% on large codebases and 5x on microbenchmarks. The biggest failure mode seems to be around not emitting a memcpy when LTO is on. When LTO causes a regression, the inlining decisions seem the same as far as I can tell but other things fall over.

I've also seen huge problems with old versions of perf on Rust LTO binaries. Specifically, the version CentOS 7 ships and when using perf record --call-graph dwarf. I've rarely seen a call graph come out of that that makes sense when it does run. Often that version of perf will segfault or crash with odd error messages. The situation is significantly better in recent builds of perf, but the call stacks are still sometimes obviously wrong.

Is there large-scale testing that can be done to see if Zig programs will suffer from these problems?

The CLI gains -flto and -fno-lto options to override the default.
However, the cool thing about this is that the defaults are great! In
general when you use build-exe in release mode, Zig will enable LTO if
it would work and it would help.

zig cc supports detecting and honoring the -flto and -fno-lto flags as
well. The linkWithLld functions are improved to all be the same with
regards to copying the artifact instead of trying to pass single objects
through LLD with -r. There is possibly a future improvement here as
well; see the respective TODOs.

stage1 is updated to support outputting LLVM bitcode instead of machine
code when lto is enabled. This allows LLVM to optimize across the Zig and
C/C++ code boundary.

closes #2845
@komuw
Copy link

komuw commented Jan 24, 2021

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

This may be true but LTO is only enabled by default in Release mode, where you'd expect longer compile/link times, would you not? If this is the case, I think having LTO by default in Release mode is nice to have. Having said that however, as a counterexample, Rust disables LTO by default even in Release mode (link) so clearly the defaults for LTO are debatable.

Looking at the Rust source code, it seems like lto = false does not mean disable LTO all-together; instead it means enable thin LTO. Whereas lto = true means enable fat LTO: https://github.com/rust-lang/cargo/blob/b52fc0a8270d671e33dafe8dec355624727e8534/src/cargo/core/profiles.rs#L766-L773
So Rust has thin LTO enabled by default in release mode.

@saethlin
Copy link

So Rust has thin LTO enabled by default in release mode.

See the actual usage of that type: https://github.com/rust-lang/cargo/blob/b52fc0a8270d671e33dafe8dec355624727e8534/src/cargo/core/compiler/lto.rs#L48

Rust has had a lot of debate about the default release settings, and it's entirely unclear how much of that is relevant to Zig: rust-lang/rust#57968 (comment)

@andrewrk
Copy link
Member Author

Thanks @saethlin - those are two things I'll be on the look out for. I'm going to make the call here to proceed with this at least for now and see how it goes. Of course if we run into trouble, this decision can be re-evaluated. But I think this is a worthwhile idea to try out at this point.

@andrewrk andrewrk merged commit 15278b7 into master Jan 24, 2021
@andrewrk andrewrk deleted the lto branch January 24, 2021 19:09
@mikdusan mikdusan added the release notes This PR should be mentioned in the release notes. label Jan 26, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LTO across zig/C boundary
8 participants