Investigate use of precompiled headers #5032

asl · 2024-11-25T07:56:00Z

In case the compile-time impact is high, how does this change compare to enabling LTO in the build? What I remember from Tofino days, we had rather significant speedup by enabling LTO, presumably exactly because it allows inlining more functions. The advantage is that LTO can be enabled only for release builds so normal development build speeds are not affected.

On a slight tangent, there was an idea of speeding up compilation by using PCH (precompiled header in GCC/clang) for ir-generated.h. As far as I know, it was never tried and it was not completely clear if it was doable, but maybe it is worth investigating for the compilation speed (I don't think C++ modules will "save" us in in any reasonably close future giving their current state and more importantly the tool support state and requirements for P4C build).

Originally posted by @vlstill in #5030 (comment)

The text was updated successfully, but these errors were encountered:

asl · 2024-11-25T07:58:51Z

@vlstill @fruffy @ChrisDodd
So, I tried to create PCH for IR headers (excluding ir-inline.h). The times do look promising.

For midend/removeComplexExpressions.cpp:

W/o PCH:

Executed in    4.12 secs    fish           external
   usr time    3.91 secs    0.16 millis    3.91 secs
   sys time    0.17 secs    8.69 millis    0.16 secs

With PCH:

Executed in    2.45 secs    fish           external
   usr time    3.06 secs    0.11 millis    3.06 secs
   sys time    0.85 secs    2.16 millis    0.85 secs

asl · 2024-11-25T08:12:57Z

And the differences for ninja frontend are (compilation is done in 10 threads):

Executed in   91.60 secs    fish           external
   usr time  550.26 secs    0.06 millis  550.26 secs
   sys time   20.98 secs    2.26 millis   20.97 secs

vs

Executed in   79.62 secs    fish           external
   usr time  371.67 secs    0.07 millis  371.67 secs
   sys time   17.33 secs    1.44 millis   17.33 secs

asl · 2024-11-25T09:39:17Z

Here is full compilation after .def change (w/o Tofino of course):

w/o PCH:

Executed in  317.24 secs    fish           external
   usr time   39.13 mins    0.10 millis   39.13 mins
   sys time    1.66 mins    3.04 millis    1.66 mins

with PCH:

Executed in  247.53 secs    fish           external
   usr time   28.21 mins    0.08 millis   28.21 mins
   sys time    1.50 mins    1.76 millis    1.50 mins

So, the wall-time difference is "just" 30 seconds, but this is due to 10 cores used to build. Note the 25% reduction in usr time, so on less beefy systems / VMs, the decrease in build time should more pronounced.

This is with clang on Mac OS, YMMV, need to check gcc on Linux :)

asl · 2024-11-25T09:42:25Z

One important thing: if one is switching between the branches frequently and rely on ccache, then the overall compile time might increase as PCH is effectively forces re-compilation in such a case.

asl added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Nov 25, 2024

asl linked a pull request Nov 25, 2024 that will close this issue

Enable PCHs for IR headers #5033

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate use of precompiled headers #5032

Investigate use of precompiled headers #5032

asl commented Nov 25, 2024 •

edited

Loading

asl commented Nov 25, 2024

asl commented Nov 25, 2024

asl commented Nov 25, 2024 •

edited

Loading

asl commented Nov 25, 2024

Investigate use of precompiled headers #5032

Investigate use of precompiled headers #5032

Comments

asl commented Nov 25, 2024 • edited Loading

asl commented Nov 25, 2024

asl commented Nov 25, 2024

asl commented Nov 25, 2024 • edited Loading

asl commented Nov 25, 2024

asl commented Nov 25, 2024 •

edited

Loading

asl commented Nov 25, 2024 •

edited

Loading