add LTO support #7856

andrewrk · 2021-01-23T07:24:33Z

The CLI gains -flto and -fno-lto options to override the default.
However, the cool thing about this is that the defaults are great! In
general when you use build-exe in release mode, Zig will enable LTO if
it would work and it would help.

zig cc supports detecting and honoring the -flto and -fno-lto flags as
well. The linkWithLld functions are improved to all be the same with
regards to copying the artifact instead of trying to pass single objects
through LLD with -r. There is possibly a future improvement here as
well; see the respective TODOs.

stage1 is updated to support outputting LLVM bitcode instead of machine
code when lto is enabled. This allows LLVM to optimize across the Zig and
C/C++ code boundary.

closes #2845

Here's an example. It's the example from the Clang LTO documentation, except I've replaced main.c with main.zig:

main.zig

const std = @import("std");

export fn foo4() void {
    _ = std.c.printf("Hi\n");
}

extern fn foo1() c_int;

pub fn main() u8 {
    return @intCast(u8, foo1());
}

a.c

int foo1(void);
void foo2(void);
void foo4(void);

static signed int i = 0;

void foo2(void) {
  i = -1;
}

static int foo3() {
  foo4();
  return 10;
}

int foo1(void) {
  int data = 0;

  if (i < 0)
    data = foo3();

  data = data + 42;
  return data;
}

$ ./zig build-exe main.zig a.c -OReleaseFast -lc 
$ ./main
$ echo $?
42
$ objdump -d main -Mintel | grep -A7 '<main'
0000000000201530 <main>:
  201530:	48 c7 c0 ff ff ff ff 	mov    rax,0xffffffffffffffff
  201537:	66 0f 1f 84 00 00 00 	nop    WORD PTR [rax+rax*1+0x0]
  20153e:	00 00 
  201540:	48 83 7c c2 08 00    	cmp    QWORD PTR [rdx+rax*8+0x8],0x0
  201546:	48 8d 40 01          	lea    rax,[rax+0x1]
  20154a:	75 f4                	jne    201540 <main+0x10>
  20154c:	48 63 cf             	movsxd rcx,edi
  20154f:	48 89 35 7a 22 00 00 	mov    QWORD PTR [rip+0x227a],rsi        # 2037d0 <argv>
  201556:	48 89 0d 7b 22 00 00 	mov    QWORD PTR [rip+0x227b],rcx        # 2037d8 <argv+0x8>
  20155d:	48 89 15 7c 22 00 00 	mov    QWORD PTR [rip+0x227c],rdx        # 2037e0 <environ.0>
  201564:	48 89 05 7d 22 00 00 	mov    QWORD PTR [rip+0x227d],rax        # 2037e8 <environ.0+0x8>
  20156b:	b8 2a 00 00 00       	mov    eax,0x2a
  201570:	c3                   	ret

The interesting thing to note here is that there was no LTO explicitly opted into. It happened automatically. And you can see here that in the main function, there is no call to foo1 and there is no exported foo4. If we didn't have LTO, the call to foo1 could not have been inlined. For example, here's what happens if we force-disable LTO:

$ ./zig build-exe main.zig a.c -OReleaseFast -lc -fno-lto
$ objdump -d main -Mintel | grep -A7 '<main'
00000000002015d0 <main>:
  2015d0:	50                   	push   rax
  2015d1:	48 c7 c0 ff ff ff ff 	mov    rax,0xffffffffffffffff
  2015d8:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
  2015df:	00 
  2015e0:	48 83 7c c2 08 00    	cmp    QWORD PTR [rdx+rax*8+0x8],0x0
  2015e6:	48 8d 40 01          	lea    rax,[rax+0x1]
  2015ea:	75 f4                	jne    2015e0 <main+0x10>
  2015ec:	48 63 cf             	movsxd rcx,edi
  2015ef:	48 89 35 fa 22 00 00 	mov    QWORD PTR [rip+0x22fa],rsi        # 2038f0 <argv>
  2015f6:	48 89 0d fb 22 00 00 	mov    QWORD PTR [rip+0x22fb],rcx        # 2038f8 <argv+0x8>
  2015fd:	48 89 15 fc 22 00 00 	mov    QWORD PTR [rip+0x22fc],rdx        # 203900 <environ.0>
  201604:	48 89 05 fd 22 00 00 	mov    QWORD PTR [rip+0x22fd],rax        # 203908 <environ.0+0x8>
  20160b:	e8 90 ff ff ff       	call   2015a0 <foo1>
  201610:	0f b6 c0             	movzx  eax,al

Now you can see we are forced to call foo and return its result.

LemonBoy · 2021-01-23T08:02:30Z

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

kubkon · 2021-01-23T11:21:38Z

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

This may be true but LTO is only enabled by default in Release mode, where you'd expect longer compile/link times, would you not? If this is the case, I think having LTO by default in Release mode is nice to have. Having said that however, as a counterexample, Rust disables LTO by default even in Release mode (link) so clearly the defaults for LTO are debatable.

LemonBoy · 2021-01-23T14:07:56Z

would you not?

No.

Mouvedia · 2021-01-23T14:17:42Z

Is PrepareForLTO relevant for this PR?

zig/src/zig_llvm.cpp

Lines 238 to 240 in 90f175b

    
           PMBuilder->PrepareForLTO = false; 
        
           PMBuilder->PrepareForThinLTO = false; 
        
           PMBuilder->PerformThinLTO = false;

rohlem · 2021-01-23T16:22:47Z

[I]n Release mode [you would] expect longer compile/link times, would you not?

No.

I too would have been under that impression. Could you elaborate where ReleaseSafe and ReleaseFast should draw the line of "too expensive"? Anything that takes longer than Debug (including incremental compilation), or maybe twice that time?
Would we still want a "hard boiled" build mode that defaults to going the extra step regardless of build time?

I'm not sure "universally sensible" defaults exist, or at least I'd be happy to hear opinions on this. Though I guess if it's too complex, that discussion merits its own issue somewhere.

LemonBoy · 2021-01-23T16:39:23Z

The line is between codegen and linking, LTO doesn't impact the former.

andrewrk · 2021-01-23T19:16:27Z

I think I can make a reasonable defense of keeping it opt out for release builds. First of all it has always been the case that the trade-off for release builds is to wait longer for compilation in exchange for better runtime performance, even by a few orders of magnitude. Whether this is to blame on codegen optimization or linking is beside the point.

The point about new users is important to consider, however consider further that for new users who start a small, new project, LTO in release modes will in fact be the correct choice for them. Only when their project grows quite large will it start to become worthwhile to consider disabling it, and that would happen after the user spend a long time working on the project, by that time graduating from new user to experienced user, having had plenty of time to learn about the option to disable LTO.

New users who join an existing project will use the already existing build script so they won't have to think about LTO at all.

In all cases, new users don't have to think about this feature.

I also want to point out that we already have this equivalent problem in zig code currently because we don't split large zig projects up into multiple compilation units. So probably we will eventually need some kind of feature that is resource-aware and will split up a release build into an arbitrary number of compilation units in order to satisfy a memory resource budget or provide better caching. This same feature clearly could be used to limit LTO in order to satisfy the same constraints.

Thus I argue the defaults set forth in this PR are the best way forward.

saethlin · 2021-01-23T22:24:23Z

I'm not a Zig user, but I have a fair bit of bad experiences with LLVM's LTO from my experience in Rust, so I'm here advising caution.

I've seen perf regressions from turning on LTO between 10% on large codebases and 5x on microbenchmarks. The biggest failure mode seems to be around not emitting a memcpy when LTO is on. When LTO causes a regression, the inlining decisions seem the same as far as I can tell but other things fall over.

I've also seen huge problems with old versions of perf on Rust LTO binaries. Specifically, the version CentOS 7 ships and when using perf record --call-graph dwarf. I've rarely seen a call graph come out of that that makes sense when it does run. Often that version of perf will segfault or crash with odd error messages. The situation is significantly better in recent builds of perf, but the call stacks are still sometimes obviously wrong.

Is there large-scale testing that can be done to see if Zig programs will suffer from these problems?

The CLI gains -flto and -fno-lto options to override the default. However, the cool thing about this is that the defaults are great! In general when you use build-exe in release mode, Zig will enable LTO if it would work and it would help. zig cc supports detecting and honoring the -flto and -fno-lto flags as well. The linkWithLld functions are improved to all be the same with regards to copying the artifact instead of trying to pass single objects through LLD with -r. There is possibly a future improvement here as well; see the respective TODOs. stage1 is updated to support outputting LLVM bitcode instead of machine code when lto is enabled. This allows LLVM to optimize across the Zig and C/C++ code boundary. closes #2845

komuw · 2021-01-24T10:18:16Z

Nice, but making it opt-out at high optimization levels is not so nice. LTO is slower and more resource-intensive than regular linking, a new user would have to dive into the docs to know LTO is on my default and how to turn it off (and, last but not least, know what LTO is and it's tradeoffs).

This may be true but LTO is only enabled by default in Release mode, where you'd expect longer compile/link times, would you not? If this is the case, I think having LTO by default in Release mode is nice to have. Having said that however, as a counterexample, Rust disables LTO by default even in Release mode (link) so clearly the defaults for LTO are debatable.

Looking at the Rust source code, it seems like lto = false does not mean disable LTO all-together; instead it means enable thin LTO. Whereas lto = true means enable fat LTO: https://github.com/rust-lang/cargo/blob/b52fc0a8270d671e33dafe8dec355624727e8534/src/cargo/core/profiles.rs#L766-L773
So Rust has thin LTO enabled by default in release mode.

saethlin · 2021-01-24T18:05:22Z

So Rust has thin LTO enabled by default in release mode.

See the actual usage of that type: https://github.com/rust-lang/cargo/blob/b52fc0a8270d671e33dafe8dec355624727e8534/src/cargo/core/compiler/lto.rs#L48

Rust has had a lot of debate about the default release settings, and it's entirely unclear how much of that is relevant to Zig: rust-lang/rust#57968 (comment)

andrewrk · 2021-01-24T19:09:28Z

Thanks @saethlin - those are two things I'll be on the look out for. I'm going to make the call here to proceed with this at least for now and see how it goes. Of course if we run into trouble, this decision can be re-evaluated. But I think this is a worthwhile idea to try out at this point.

andrewrk added 2 commits January 22, 2021 23:35

stage2: add missing frexpl.c to mingw c source file list

3647784

stage2: wasm arch does not support -mred-zone flags

ab4f3ae

andrewrk force-pushed the lto branch from 90f175b to 0d4b6ac Compare January 24, 2021 19:09

andrewrk merged commit 15278b7 into master Jan 24, 2021

andrewrk deleted the lto branch January 24, 2021 19:09

mikdusan added the release notes This PR should be mentioned in the release notes. label Jan 26, 2021

andrewrk added this to the 0.9.0 milestone Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add LTO support #7856

add LTO support #7856

andrewrk commented Jan 23, 2021 •

edited

Loading

LemonBoy commented Jan 23, 2021

kubkon commented Jan 23, 2021

LemonBoy commented Jan 23, 2021

Mouvedia commented Jan 23, 2021 •

edited

Loading

rohlem commented Jan 23, 2021 •

edited

Loading

LemonBoy commented Jan 23, 2021

andrewrk commented Jan 23, 2021

saethlin commented Jan 23, 2021 •

edited

Loading

komuw commented Jan 24, 2021

saethlin commented Jan 24, 2021

andrewrk commented Jan 24, 2021

add LTO support #7856

add LTO support #7856

Conversation

andrewrk commented Jan 23, 2021 • edited Loading

main.zig

a.c

LemonBoy commented Jan 23, 2021

kubkon commented Jan 23, 2021

LemonBoy commented Jan 23, 2021

Mouvedia commented Jan 23, 2021 • edited Loading

rohlem commented Jan 23, 2021 • edited Loading

LemonBoy commented Jan 23, 2021

andrewrk commented Jan 23, 2021

saethlin commented Jan 23, 2021 • edited Loading

komuw commented Jan 24, 2021

saethlin commented Jan 24, 2021

andrewrk commented Jan 24, 2021

andrewrk commented Jan 23, 2021 •

edited

Loading

Mouvedia commented Jan 23, 2021 •

edited

Loading

rohlem commented Jan 23, 2021 •

edited

Loading

saethlin commented Jan 23, 2021 •

edited

Loading