Commit 7c3ce02

Introduce a minimum CGU size in non-incremental builds.
Because tiny CGUs make compilation less efficient *and* result in worse generated code.

We don't do this when the number of CGUs is explicitly given, because there are times when the requested number is very important, as described in some comments within the commit. So the commit also introduces a `CodegenUnits` type that distinguishes between default values and user-specified values.

This change has a roughly neutral effect on walltimes across the rustc-perf benchmarks; there are some speedups and some slowdowns. But it has significant wins for most other metrics on numerous benchmarks, including instruction counts, cycles, binary size, and max-rss. It also reduces parallelism, which is good for reducing jobserver competition when multiple rustc processes are running at the same time. It's smaller benchmarks that benefit the most; larger benchmarks already have CGUs that are all larger than the minimum size.

Here are some example before/after CGU sizes for opt builds.

- html5ever
  - CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572, 1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
  - CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]
- libc
  - CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
  - CGUs: 1, mean size: 424.0, sizes: [424]
- tt-muncher
  - CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
  - CGUs: 1, mean size: 9075.0, sizes: [9075]

Note that CGUs of size 100,000+ aren't unusual in larger programs.
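
As a rough illustration of the merging described in the message, the Rust sketch below repeatedly folds the smallest CGU into the second smallest until every CGU meets the minimum size or only one CGU remains. It is a simplification and not the compiler's code: `merge_small_cgus` and `MIN_CGU_SIZE` are hypothetical names, the threshold of 1000 is taken from the `NON_INCR_MIN_CGU_SIZE` constant in this commit, and the size of a merged CGU is modeled as a plain sum, which need not match rustc's size estimates (the html5ever and tt-muncher "after" totals in the message are not plain sums of the "before" sizes).

```rust
/// Hypothetical stand-in for the commit's `NON_INCR_MIN_CGU_SIZE`.
const MIN_CGU_SIZE: usize = 1000;

/// Merge the two smallest CGUs while more than one CGU remains and at least
/// one is below `MIN_CGU_SIZE`. Assumption: merged size is the plain sum.
fn merge_small_cgus(mut sizes: Vec<usize>) -> Vec<usize> {
    while sizes.len() > 1 && sizes.iter().any(|&s| s < MIN_CGU_SIZE) {
        // Sort largest first so the two smallest sit at the back.
        sizes.sort_by_key(|&s| std::cmp::Reverse(s));
        let smallest = sizes.pop().unwrap();
        *sizes.last_mut().unwrap() += smallest;
    }
    // Return the result largest first, like the lists in the message.
    sizes.sort_by_key(|&s| std::cmp::Reverse(s));
    sizes
}

fn main() {
    // "Before" sizes for libc, copied from the commit message.
    let libc = vec![163, 93, 58, 53, 37, 8, 2, 2, 2, 2, 2, 2];
    println!("{:?}", merge_small_cgus(libc)); // prints "[424]"
}
```

Under these assumptions the libc input collapses into a single CGU of size 424, which agrees with the "after" figure quoted for libc.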
1 parent 95d8589 commit 7c3ce02

5 files changed, +65 -18 lines changed

compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs

+1 -1
@@ -1385,7 +1385,7 @@ fn vcall_visibility_metadata<'ll, 'tcx>(
     let trait_def_id = trait_ref_self.def_id();
     let trait_vis = cx.tcx.visibility(trait_def_id);

-    let cgus = cx.sess().codegen_units();
+    let cgus = cx.sess().codegen_units().as_usize();
     let single_cgu = cgus == 1;

     let lto = cx.sess().lto();

compiler/rustc_codegen_ssa/src/back/write.rs

+3 -3
@@ -646,10 +646,10 @@ fn produce_final_output_artifacts(
         // rlib.
         let needs_crate_object = crate_output.outputs.contains_key(&OutputType::Exe);

-        let keep_numbered_bitcode = user_wants_bitcode && sess.codegen_units() > 1;
+        let keep_numbered_bitcode = user_wants_bitcode && sess.codegen_units().as_usize() > 1;

         let keep_numbered_objects =
-            needs_crate_object || (user_wants_objects && sess.codegen_units() > 1);
+            needs_crate_object || (user_wants_objects && sess.codegen_units().as_usize() > 1);

         for module in compiled_modules.modules.iter() {
             if let Some(ref path) = module.object {
@@ -1923,7 +1923,7 @@ impl<B: ExtraBackendMethods> OngoingCodegen<B> {

         // FIXME: time_llvm_passes support - does this use a global context or
         // something?
-        if sess.codegen_units() == 1 && sess.opts.unstable_opts.time_llvm_passes {
+        if sess.codegen_units().as_usize() == 1 && sess.opts.unstable_opts.time_llvm_passes {
             self.backend.print_pass_timings()
         }

compiler/rustc_monomorphize/src/partitioning.rs

+32 -6
@@ -113,6 +113,7 @@ use rustc_middle::query::Providers;
 use rustc_middle::ty::print::{characteristic_def_id_of_type, with_no_trimmed_paths};
 use rustc_middle::ty::{self, visit::TypeVisitableExt, InstanceDef, TyCtxt};
 use rustc_session::config::{DumpMonoStatsFormat, SwitchWithOptPath};
+use rustc_session::CodegenUnits;
 use rustc_span::symbol::Symbol;

 use crate::collector::UsageMap;
@@ -322,7 +323,7 @@ fn merge_codegen_units<'tcx>(
     cx: &PartitioningCx<'_, 'tcx>,
     codegen_units: &mut Vec<CodegenUnit<'tcx>>,
 ) {
-    assert!(cx.tcx.sess.codegen_units() >= 1);
+    assert!(cx.tcx.sess.codegen_units().as_usize() >= 1);

     // A sorted order here ensures merging is deterministic.
     assert!(codegen_units.is_sorted_by(|a, b| Some(a.name().as_str().cmp(b.name().as_str()))));
@@ -331,11 +332,32 @@ fn merge_codegen_units<'tcx>(
     let mut cgu_contents: FxHashMap<Symbol, Vec<Symbol>> =
         codegen_units.iter().map(|cgu| (cgu.name(), vec![cgu.name()])).collect();

-    // Merge the two smallest codegen units until the target size is
-    // reached.
-    while codegen_units.len() > cx.tcx.sess.codegen_units() {
-        // Sort small cgus to the back
+    // Having multiple CGUs can drastically speed up compilation. But for
+    // non-incremental builds, tiny CGUs slow down compilation *and* result in
+    // worse generated code. So we don't allow CGUs smaller than this (unless
+    // there is just one CGU, of course). Note that CGU sizes of 100,000+ are
+    // common in larger programs, so this isn't all that large.
+    const NON_INCR_MIN_CGU_SIZE: usize = 1000;
+
+    // Repeatedly merge the two smallest codegen units as long as:
+    // - we have more CGUs than the upper limit, or
+    // - (Non-incremental builds only) the user didn't specify a CGU count, and
+    // there are multiple CGUs, and some are below the minimum size.
+    //
+    // The "didn't specify a CGU count" condition is because when an explicit
+    // count is requested we observe it as closely as possible. For example,
+    // the `compiler_builtins` crate sets `codegen-units = 10000` and it's
+    // critical they aren't merged. Also, some tests use explicit small values
+    // and likewise won't work if small CGUs are merged.
+    while codegen_units.len() > cx.tcx.sess.codegen_units().as_usize()
+        || (cx.tcx.sess.opts.incremental.is_none()
+            && matches!(cx.tcx.sess.codegen_units(), CodegenUnits::Default(_))
+            && codegen_units.len() > 1
+            && codegen_units.iter().any(|cgu| cgu.size_estimate() < NON_INCR_MIN_CGU_SIZE))
+    {
+        // Sort small cgus to the back.
         codegen_units.sort_by_cached_key(|cgu| cmp::Reverse(cgu.size_estimate()));
+
         let mut smallest = codegen_units.pop().unwrap();
         let second_smallest = codegen_units.last_mut().unwrap();

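The new `while` condition in `merge_codegen_units` combines four checks, and it can be easier to follow as a standalone predicate. The sketch below is a hedged restatement, not compiler code: `should_merge` is a hypothetical helper, the `CodegenUnits` enum is a local copy of the one added in `session.rs`, and CGUs are reduced to a slice of size estimates.

```rust
/// Local copy of the enum introduced by this commit, for illustration only.
#[derive(Clone, Copy)]
enum CodegenUnits {
    User(usize),
    Default(usize),
}

impl CodegenUnits {
    fn as_usize(self) -> usize {
        match self {
            CodegenUnits::User(n) | CodegenUnits::Default(n) => n,
        }
    }
}

const NON_INCR_MIN_CGU_SIZE: usize = 1000;

/// Hypothetical helper mirroring the loop condition: keep merging while the
/// CGU count exceeds the limit, or (non-incremental, default limit only)
/// while several CGUs exist and at least one is below the minimum size.
fn should_merge(limit: CodegenUnits, incremental: bool, sizes: &[usize]) -> bool {
    sizes.len() > limit.as_usize()
        || (!incremental
            && matches!(limit, CodegenUnits::Default(_))
            && sizes.len() > 1
            && sizes.iter().any(|&s| s < NON_INCR_MIN_CGU_SIZE))
}

fn main() {
    // "Before" sizes for tt-muncher, copied from the commit message.
    let sizes = [8508, 350, 198, 34, 7];
    // Default limit, non-incremental: small CGUs trigger merging.
    println!("{}", should_merge(CodegenUnits::Default(16), false, &sizes)); // true
    // Same limit, but user-specified: small CGUs are left alone.
    println!("{}", should_merge(CodegenUnits::User(16), false, &sizes)); // false
    // Incremental build: the minimum size is not enforced.
    println!("{}", should_merge(CodegenUnits::Default(256), true, &sizes)); // false
    // A user-specified limit is still an upper bound.
    println!("{}", should_merge(CodegenUnits::User(4), false, &sizes)); // true
}
```

The second and fourth calls show why the provenance matters: an explicit count such as the `codegen-units = 10000` used by `compiler_builtins` still caps the number of CGUs, but never causes small CGUs to be merged below that cap.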
@@ -918,9 +940,13 @@ fn debug_dump<'a, 'tcx: 'a>(
                 let symbol_hash_start = symbol_name.rfind('h');
                 let symbol_hash = symbol_hash_start.map_or("<no hash>", |i| &symbol_name[i..]);
                 let size = item.size_estimate(tcx);
+                let kind = match item.instantiation_mode(tcx) {
+                    InstantiationMode::GloballyShared { .. } => "root",
+                    InstantiationMode::LocalCopy => "inlined",
+                };
                 let _ = with_no_trimmed_paths!(writeln!(
                     s,
-                    " - {item} [{linkage:?}] [{symbol_hash}] (size={size})"
+                    " - {item} [{linkage:?}] [{symbol_hash}] ({kind}, size: {size})"
                 ));
             }

compiler/rustc_session/src/session.rs

+27 -6
@@ -234,6 +234,27 @@ pub enum MetadataKind {
     Compressed,
 }

+#[derive(Clone, Copy)]
+pub enum CodegenUnits {
+    /// Specified by the user. In this case we try fairly hard to produce the
+    /// number of CGUs requested.
+    User(usize),
+
+    /// A default value, i.e. not specified by the user. In this case we take
+    /// more liberties about CGU formation, e.g. avoid producing very small
+    /// CGUs.
+    Default(usize),
+}
+
+impl CodegenUnits {
+    pub fn as_usize(self) -> usize {
+        match self {
+            CodegenUnits::User(n) => n,
+            CodegenUnits::Default(n) => n,
+        }
+    }
+}
+
 impl Session {
     pub fn miri_unleashed_feature(&self, span: Span, feature_gate: Option<Symbol>) {
         self.miri_unleashed_features.lock().push((span, feature_gate));
@@ -1104,7 +1125,7 @@ impl Session {

         // If there's only one codegen unit and LTO isn't enabled then there's
         // no need for ThinLTO so just return false.
-        if self.codegen_units() == 1 {
+        if self.codegen_units().as_usize() == 1 {
             return config::Lto::No;
         }

@@ -1206,19 +1227,19 @@ impl Session {

     /// Returns the number of codegen units that should be used for this
     /// compilation
-    pub fn codegen_units(&self) -> usize {
+    pub fn codegen_units(&self) -> CodegenUnits {
         if let Some(n) = self.opts.cli_forced_codegen_units {
-            return n;
+            return CodegenUnits::User(n);
         }
         if let Some(n) = self.target.default_codegen_units {
-            return n as usize;
+            return CodegenUnits::Default(n as usize);
         }

         // If incremental compilation is turned on, we default to a high number
         // codegen units in order to reduce the "collateral damage" small
         // changes cause.
         if self.opts.incremental.is_some() {
-            return 256;
+            return CodegenUnits::Default(256);
         }

         // Why is 16 codegen units the default all the time?
@@ -1271,7 +1292,7 @@ impl Session {
         // As a result 16 was chosen here! Mostly because it was a power of 2
         // and most benchmarks agreed it was roughly a local optimum. Not very
         // scientific.
-        16
+        CodegenUnits::Default(16)
     }

     pub fn teach(&self, code: &DiagnosticId) -> bool {
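
Taken together, the `session.rs` hunks mean that `Session::codegen_units` now reports both a count and where the count came from. The following sketch restates that resolution order outside the compiler. The `Inputs` struct and its field names are hypothetical stand-ins for the `Session` fields read in the diff (the `u64` type for the target default is also an assumption), and the enum is a local copy with extra derives added so the results can be compared.

```rust
/// Local copy of the enum from this commit; `Debug` and `PartialEq` are
/// added here only so the example can compare results.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CodegenUnits {
    User(usize),
    Default(usize),
}

/// Hypothetical, flattened view of the inputs consulted by the real method.
struct Inputs {
    cli_forced: Option<usize>,   // stands in for opts.cli_forced_codegen_units
    target_default: Option<u64>, // stands in for target.default_codegen_units
    incremental: bool,           // stands in for opts.incremental.is_some()
}

/// Same precedence as the diff: CLI value, then target default, then 256 for
/// incremental builds, then 16.
fn codegen_units(inputs: &Inputs) -> CodegenUnits {
    if let Some(n) = inputs.cli_forced {
        return CodegenUnits::User(n);
    }
    if let Some(n) = inputs.target_default {
        return CodegenUnits::Default(n as usize);
    }
    if inputs.incremental {
        return CodegenUnits::Default(256);
    }
    CodegenUnits::Default(16)
}

fn main() {
    let forced = Inputs { cli_forced: Some(1), target_default: Some(8), incremental: true };
    let plain = Inputs { cli_forced: None, target_default: None, incremental: false };
    println!("{:?}", codegen_units(&forced)); // User(1)
    println!("{:?}", codegen_units(&plain)); // Default(16)
}
```

Only a value given on the command line is tagged `User`; a target-provided default is still `Default`, so it remains subject to the minimum-size merging in non-incremental builds.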

src/doc/rustc/src/codegen-options/index.md

+2 -2
@@ -31,8 +31,8 @@ Supported values can also be discovered by running `rustc --print code-models`.

 ## codegen-units

-This flag controls how many code generation units the crate is split into. It
-takes an integer greater than 0.
+This flag controls the maximum number of code generation units the crate is
+split into. It takes an integer greater than 0.

 When a crate is split into multiple codegen units, LLVM is able to process
 them in parallel. Increasing parallelism may speed up compile times, but may
