
Enforce the compiler-builtins partitioning scheme #135395

Open — wants to merge 3 commits into master (showing changes from 2 commits)
9 changes: 9 additions & 0 deletions compiler/rustc_mir_transform/src/cross_crate_inline.rs
@@ -34,6 +34,15 @@ fn cross_crate_inlinable(tcx: TyCtxt<'_>, def_id: LocalDefId) -> bool {
return true;
}

// compiler-builtins only defines intrinsics (which are handled above by checking
// contains_extern_indicator) and helper functions used by those intrinsics. The helper
// functions should always be inlined into intrinsics that use them. This check does not
// guarantee that we get the optimizations we want, but it makes them *much* easier.
// See https://github.com/rust-lang/rust/issues/73135
if tcx.is_compiler_builtins(rustc_span::def_id::LOCAL_CRATE) {
return true;
}
A reviewer (Member) commented:
Is there a way to make inlining inside the crate more likely without causing MIR for all functions in compiler-builtins to get encoded in the crate metadata?

The author (Member Author) replied:

I think what you're pointing out here is that these functions are not reachable as MIR, so we don't need to encode MIR for them. The problem as I see it is that our notion of reachable uses this worklist/visited algorithm that tracks items in a path-independent way:

```rust
while let Some(search_item) = self.worklist.pop() {
    if !scanned.insert(search_item) {
        continue;
    }
    self.propagate_node(&self.tcx.hir_node_by_def_id(search_item), search_item);
```

Also, we already have an issue for the inverse inefficiency (emitting object code when we only need MIR): #119214

I put a hack in this place specifically because, past the first few checks, the compiler is designed around this function returning either true or false. I'm not aware of anywhere else we could make a small, localized change to get the behavior we want.

The only other place I could think of putting a hack is `MonoItem::instantiation_mode`, but that doesn't work: we get linker errors because the instantiation mode needs to agree with `exported_symbols`, and those disagree because `exported_symbols` is based on `reachable_set`. I really think the inaccuracy of the `reachable_set` analysis is the root problem here, and it's net better to implement this in a non-invasive way that will be fixed automatically if `reachable_set` gets improved.
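The path-independent worklist described above can be modeled in isolation. This is a hedged, standalone sketch — plain integers stand in for `DefId`s and an explicit `edges` map stands in for `propagate_node`'s traversal; it is not the real rustc pass:

```rust
use std::collections::{HashMap, HashSet};

// Sketch of the reachability worklist: once an item is scanned it is never
// revisited, so the result records only *whether* an item is reachable, not
// *how* it was reached (e.g. "only as an inlining candidate" is
// indistinguishable from "as an exported symbol").
fn reachable_set(roots: &[u32], edges: &HashMap<u32, Vec<u32>>) -> HashSet<u32> {
    let mut scanned = HashSet::new();
    let mut worklist: Vec<u32> = roots.to_vec();
    while let Some(search_item) = worklist.pop() {
        if !scanned.insert(search_item) {
            continue; // already visited via some other path
        }
        if let Some(succs) = edges.get(&search_item) {
            worklist.extend(succs.iter().copied());
        }
    }
    scanned
}

fn main() {
    let mut edges = HashMap::new();
    edges.insert(1, vec![2, 3]);
    edges.insert(2, vec![3]);
    let set = reachable_set(&[1], &edges);
    assert_eq!(set.len(), 3); // 1, 2, 3 are reachable; path information is lost
}
```

Because the visited set is path-independent, there is nowhere in an analysis of this shape to record "reachable only as MIR" versus "reachable as object code".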

@saethlin (Member Author) commented on Jan 14, 2025:
Also, if I back up to my merge-base, `x build library`, then `ar x` the stage1-std `libcompiler_builtins.rlib` and run `du -sch *`, I get:

```
808K	lib.rmeta
5.7M	total
```

Then with my changes:

```
968K	lib.rmeta
4.1M	total
```

So even though it's not perfect, this PR is still a net win.
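For reference, the decision chain that the new check slots into can be sketched as a standalone function. The flags below are illustrative stand-ins for the real queries (`contains_extern_indicator`, `is_compiler_builtins`, and the later attribute/MIR-size heuristics) — this is not rustc API:

```rust
// Hypothetical model of the early checks in `cross_crate_inlinable`.
struct Item {
    has_extern_indicator: bool, // stand-in for contains_extern_indicator
    in_compiler_builtins: bool, // stand-in for is_compiler_builtins(LOCAL_CRATE)
    marked_inline: bool,        // stand-in for the remaining heuristics
}

fn cross_crate_inlinable(item: &Item) -> bool {
    // Intrinsics and other extern-indicated items were already handled
    // before this PR.
    if item.has_extern_indicator {
        return true;
    }
    // New in this PR: every function in compiler-builtins is either an
    // intrinsic or a helper for one, so always allow cross-crate inlining.
    if item.in_compiler_builtins {
        return true;
    }
    // ...the rest of the original heuristics follow here...
    item.marked_inline
}

fn main() {
    let helper = Item {
        has_extern_indicator: false,
        in_compiler_builtins: true,
        marked_inline: false,
    };
    assert!(cross_crate_inlinable(&helper));
}
```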


if tcx.has_attr(def_id, sym::rustc_intrinsic) {
// Intrinsic fallback bodies are always cross-crate inlineable.
// To ensure that the MIR inliner doesn't cluelessly try to inline fallback
61 changes: 40 additions & 21 deletions compiler/rustc_monomorphize/src/partitioning.rs
@@ -165,7 +165,8 @@
// estimates.
{
let _prof_timer = tcx.prof.generic_activity("cgu_partitioning_merge_cgus");
merge_codegen_units(cx, &mut codegen_units);
let cgu_contents = merge_codegen_units(cx, &mut codegen_units);
rename_codegen_units(cx, &mut codegen_units, cgu_contents);
debug_dump(tcx, "MERGE", &codegen_units);
}

@@ -200,7 +201,6 @@
I: Iterator<Item = MonoItem<'tcx>>,
{
let mut codegen_units = UnordMap::default();
let is_incremental_build = cx.tcx.sess.opts.incremental.is_some();
let mut internalization_candidates = UnordSet::default();

// Determine if monomorphizations instantiated in this crate will be made
@@ -227,20 +227,8 @@
}
}

let characteristic_def_id = characteristic_def_id_of_mono_item(cx.tcx, mono_item);
let is_volatile = is_incremental_build && mono_item.is_generic_fn();

let cgu_name = match characteristic_def_id {
Some(def_id) => compute_codegen_unit_name(
cx.tcx,
cgu_name_builder,
def_id,
is_volatile,
cgu_name_cache,
),
None => fallback_cgu_name(cgu_name_builder),
};

let cgu_name =
compute_codegen_unit_name(cx.tcx, cgu_name_builder, mono_item, cgu_name_cache);
let cgu = codegen_units.entry(cgu_name).or_insert_with(|| CodegenUnit::new(cgu_name));

let mut can_be_internalized = true;
@@ -321,7 +309,7 @@
fn merge_codegen_units<'tcx>(
cx: &PartitioningCx<'_, 'tcx>,
codegen_units: &mut Vec<CodegenUnit<'tcx>>,
) {
) -> UnordMap<Symbol, Vec<Symbol>> {
assert!(cx.tcx.sess.codegen_units().as_usize() >= 1);

// A sorted order here ensures merging is deterministic.
@@ -331,6 +319,13 @@ fn merge_codegen_units<'tcx>(
let mut cgu_contents: UnordMap<Symbol, Vec<Symbol>> =
codegen_units.iter().map(|cgu| (cgu.name(), vec![cgu.name()])).collect();

// When compiling compiler_builtins, we do not want to put multiple intrinsics in a CGU.
// There may be mergeable CGUs under this constraint, but just skipping over merging is much
// simpler.
if cx.tcx.is_compiler_builtins(LOCAL_CRATE) {
return cgu_contents;
}

// If N is the maximum number of CGUs, and the CGUs are sorted from largest
// to smallest, we repeatedly find which CGU in codegen_units[N..] has the
// greatest overlap of inlined items with codegen_units[N-1], merge that
@@ -421,6 +416,14 @@ fn merge_codegen_units<'tcx>(
// Don't update `cgu_contents`, that's only for incremental builds.
}

cgu_contents
}
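Under simplified assumptions — merging by size rather than by inlined-item overlap, and a plain struct instead of `CodegenUnit` — the merge phase with its new compiler-builtins early-out might look like:

```rust
struct Cgu {
    name: String,
    size: usize,
}

// Sketch: fold the smallest CGUs together until we are at the requested
// count, except for compiler-builtins, where CGUs are never merged so no
// two intrinsics end up in the same object file.
fn merge_cgus(mut cgus: Vec<Cgu>, max: usize, is_compiler_builtins: bool) -> Vec<Cgu> {
    if is_compiler_builtins {
        return cgus; // one intrinsic (plus its helpers) per CGU
    }
    // Largest first; repeatedly fold the smallest CGU into the next-smallest.
    // (The real partitioner merges by inlined-item overlap instead.)
    cgus.sort_by(|a, b| b.size.cmp(&a.size));
    while cgus.len() > max {
        let small = cgus.pop().unwrap();
        let last = cgus.last_mut().unwrap();
        last.size += small.size;
        last.name = format!("{}-{}", last.name, small.name);
    }
    cgus
}

fn main() {
    let cgus = vec![
        Cgu { name: "a".into(), size: 30 },
        Cgu { name: "b".into(), size: 20 },
        Cgu { name: "c".into(), size: 10 },
    ];
    assert_eq!(merge_cgus(cgus, 2, false).len(), 2);
}
```

Skipping the merge for compiler-builtins can leave mergeable CGUs on the table, but as the comment in the diff notes, bailing out entirely is far simpler than merging under the one-intrinsic-per-CGU constraint.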

fn rename_codegen_units<'tcx>(
cx: &PartitioningCx<'_, 'tcx>,
codegen_units: &mut Vec<CodegenUnit<'tcx>>,
cgu_contents: UnordMap<Symbol, Vec<Symbol>>,
) {
let cgu_name_builder = &mut CodegenUnitNameBuilder::new(cx.tcx);

// Rename the newly merged CGUs.
@@ -678,13 +681,26 @@ fn characteristic_def_id_of_mono_item<'tcx>(
}
}

fn compute_codegen_unit_name(
tcx: TyCtxt<'_>,
fn compute_codegen_unit_name<'tcx>(
tcx: TyCtxt<'tcx>,
name_builder: &mut CodegenUnitNameBuilder<'_>,
def_id: DefId,
volatile: bool,
mono_item: MonoItem<'tcx>,
cache: &mut CguNameCache,
) -> Symbol {
// When compiling compiler_builtins, we do not want to put multiple intrinsics in a CGU.
// Using the symbol name as the CGU name puts every GloballyShared item in its own CGU, but in
// an optimized build we actually want every item in the crate that isn't an intrinsic to get
// LocalCopy so that it is easy to inline away. In an unoptimized build, this CGU naming
// strategy probably generates more CGUs than we strictly need. But it is simple.
if tcx.is_compiler_builtins(LOCAL_CRATE) {
let name = mono_item.symbol_name(tcx);
return Symbol::intern(name.name);
}

let Some(def_id) = characteristic_def_id_of_mono_item(tcx, mono_item) else {
return fallback_cgu_name(name_builder);
};

// Find the innermost module that is not nested within a function.
let mut current_def_id = def_id;
let mut cgu_def_id = None;
@@ -712,6 +728,9 @@ fn compute_codegen_unit_name(

let cgu_def_id = cgu_def_id.unwrap();

let is_incremental_build = tcx.sess.opts.incremental.is_some();
let volatile = is_incremental_build && mono_item.is_generic_fn();

*cache.entry((cgu_def_id, volatile)).or_insert_with(|| {
let def_path = tcx.def_path(cgu_def_id);

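A minimal sketch of the naming strategy now used by `compute_codegen_unit_name`, with simplified types (`&str` in place of `SymbolName`/`Symbol`, an `Option<&str>` in place of the characteristic `DefId`); the `cgu-` prefix is illustrative, not the real naming scheme:

```rust
// Sketch: inside compiler-builtins, each mono item's symbol name *is* its
// CGU name, so every GloballyShared item lands alone in its own CGU.
// Elsewhere, the characteristic module (or a fallback) names the CGU.
fn compute_cgu_name(
    is_compiler_builtins: bool,
    symbol_name: &str,
    characteristic_module: Option<&str>,
) -> String {
    if is_compiler_builtins {
        // One CGU per symbol: no two intrinsics can share a CGU.
        return symbol_name.to_string();
    }
    match characteristic_module {
        Some(module) => format!("cgu-{module}"),
        None => "cgu-fallback".to_string(),
    }
}

fn main() {
    assert_eq!(compute_cgu_name(true, "__addsf3", Some("float::add")), "__addsf3");
    assert_eq!(compute_cgu_name(false, "__addsf3", Some("float::add")), "cgu-float::add");
}
```

As the diff's comment says, per-symbol naming likely produces more CGUs than strictly necessary in unoptimized builds, but the simplicity is the point.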
13 changes: 0 additions & 13 deletions library/Cargo.toml
@@ -11,19 +11,6 @@ exclude = [
"windows_targets"
]

[profile.release.package.compiler_builtins]
# For compiler-builtins we always use a high number of codegen units.
# The goal here is to place every single intrinsic into its own object
# file to avoid symbol clashes with the system libgcc if possible. Note
# that this number doesn't actually produce this many object files, we
# just don't create more than this number of object files.
#
# It's a bit of a bummer that we have to pass this here, unfortunately.
# Ideally this would be specified through an env var to Cargo so Cargo
# knows how many CGUs are for this specific crate, but for now
# per-crate configuration isn't specifiable in the environment.
codegen-units = 10000

# These dependencies of the standard library implement symbolication for
# backtraces on most platforms. Their debuginfo causes both linking to be slower
# (more data to chew through) and binaries to be larger without really all that