-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[lld] "error: Never resolved function from blockaddress" when linking bitcode files containing cross-function blockaddress references #52787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
smaller reproducer: target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define void @repro() {
br label %label
label:
call void @fun()
ret void
}
define void @fun() noinline {
call void @f(i8* blockaddress(@repro, %label))
ret void
}
declare void @f(i8*) $ llvm-as d.ll
$ ld.lld d.bc
ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM14.0.0git' Reader: 'LLVM 14.0.0git') |
cc @dexonsmith (author of 908d809) |
Gosh, that was a while ago. In case it helps, I can tell you I was fixing bugs I found while implementing use-list-order serialization (see e.g. the successor 5a511b5).
This could work, but I think it should be possible to fix BitcodeReader/IRMover to do the right thing without adding backedges. I think you need something like this:
I haven't looked carefully at the stack — why is IRMover being called? will the blocks be moved back? — but I think it's possible we wouldn't need to serialize anything new. If we did need something new serialized, maybe it could be limited to a bit per basic block saying "is this basic block referenced by a But I'd much prefer to avoid having the function content in bitcode depend on what references it. Seems like the IRMover and BitcodeReader should be able to track this somehow. E.g., maybe the caller of IRMover should tell BitcodeReader that the basic blocks were moved to another function. |
Funnily enough, given my reduced test case above, reordering the definition of (for my reduced case above) It seems like There's a comment in 852 if (!BasicBlockFwdRefs.count(F))
853 // Already materialized.
854 continue; so it seems like F technically was already materialized, but it's no longer referred to by |
I think what's going on is that LTO uses IRMover is moving Functions from one Module to another (to merge them). If the Function of the BlockAddress has already been moved, it appears "empty" so a So right now, in
This is tricky; you could have 2 different Functions with BlockAddresses that refer to each other. How is importing them one by one supposed to work? |
I haven't had a chance to investigate deeply after @dexonsmith's comments, but I think my biggest question is: is it guaranteed that no other modifications (e.g. LTO passes) will happen to a moved function until we have materialized all functions that reference it? If that is guaranteed, then we can probably keep track of some information about moved functions to resolve blockrefs pointing into moved functions. If that is not guaranteed though, then I think we would need to figure out a way to eagerly materialize all of the dependents of a function, which sounds a little trickier to do. |
IRMover has a ValueMapper that keeps track. The problem is that BitcodeReader doesn't have access to this mapping; it's just being told to materialize Functions one by one from bitcode.
Right, for a Function, we can generally call Function::hasAddressTaken() to find such dependencies. The issue is that that method will assert if we haven't yet fully materialized the whole Module, which we haven't done yet during LTO. |
So what's happening is that we have something like:
That splicing is making f fail to materialize for the second function in the function_list since materializing (i.e. lazy parsing LLVM bitcode into memory) is recursive for
|
I tried that; doesn't work; we can't determine whether a function has its address taken until a module has been fully materialized. To support lazy parsing, we don't want to fully parse the body of every function to try to find blockaddress constants. |
I think I have a fix for this; just running the test suite and writing more tests. |
/branch llvmbot/llvm-project/issue52787 |
IRLinker builds a work list of functions to materialize, then moves them from a source module to a destination module one at a time. This is a problem for blockaddress Constants, since they need not refer to the function they are used in; IPSCCP is quite good at sinking these constants deep into other functions when passed as arguments. This would lead to curious errors during LTO: ld.lld: error: Never resolved function from blockaddress ... based on the ordering of function definitions in IR. The problem was that IRLinker would basically do: for function f in worklist: materialize f splice f from source module to destination module in one pass, with Functions being lazily added to the running worklist. This confuses BitcodeReader, which cannot disambiguate whether a blockaddress is referring to a function which has not yet been parsed ("materialized") or is simply empty because its body was spliced out. This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs list incorrectly, as it will never re-materialize an already materialized (but spliced out) function. Because of the possibility that blockaddress Constants may appear in Functions other than the ones they reference, this patch adds a new bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of Functions that contain BlockAddress Constants that refer back to this Function, rather then the Function they are scoped in. We then materialize those functions when materializing `f` from the example loop above. This might over-materialize Functions should the user of BitcodeReader ultimately decide not to link those Functions, but we can at least now we can avoid this ordering related issue with blockaddresses. Fixes: llvm#52787 Fixes: ClangBuiltLinux/linux#1215 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D120781 (cherry picked from commit 23ec578)
/pull-request llvmbot#167 |
@dexonsmith Is this safe to backport: 23ec578 |
@nickdesaulniers Does this change mean that LLVM 14.0.0 won't be able to read bitcode from 14.0.1+x ? |
I suspect so; 14.0.1 could produce new bitcode, I assume 14.0.0 would not recognize it. |
Sorry I missed the question — no, I don't think it's safe to backport to a dot release, since it adds new bitcode records. |
No worries; there's only 2 optional configs that expose this easily (at the moment). Those folks can use ToT if they really need full LTO to work for those modules. |
Hi, it seems that the fix only adds |
The original fix (commit 23ec578) of #52787 only adds `Function`s that have `Instruction`s that directly use `BlockAddress`es into the bitcode (`FUNC_CODE_BLOCKADDR_USERS`). However, in either @rickyz's original reproducing code: ``` void f(long); __attribute__((noinline)) static void fun(long x) { f(x + 1); } void repro(void) { fun(({ label: (long)&&label; })); } ``` ``` ... define dso_local void @repro() #0 { entry: br label %label label: ; preds = %entry tail call fastcc void @fun() ret void } define internal fastcc void @fun() unnamed_addr #1 { entry: tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3 ret void } ... ``` or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g., `i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s (e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and then used by `Instruction`s. This case is not handled by the original fix. This patch adds *indirect* users of `BlockAddress`es, i.e., the `Instruction`s using some `Constant`s which further use the `BlockAddress`es, into the bitcode as well, by doing depth-first searches. Fixes: #52787 Fixes: 23ec578 ("[Bitcode] materialize Functions early when BlockAddress taken") Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124878
IRLinker builds a work list of functions to materialize, then moves them from a source module to a destination module one at a time. This is a problem for blockaddress Constants, since they need not refer to the function they are used in; IPSCCP is quite good at sinking these constants deep into other functions when passed as arguments. This would lead to curious errors during LTO: ld.lld: error: Never resolved function from blockaddress ... based on the ordering of function definitions in IR. The problem was that IRLinker would basically do: for function f in worklist: materialize f splice f from source module to destination module in one pass, with Functions being lazily added to the running worklist. This confuses BitcodeReader, which cannot disambiguate whether a blockaddress is referring to a function which has not yet been parsed ("materialized") or is simply empty because its body was spliced out. This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs list incorrectly, as it will never re-materialize an already materialized (but spliced out) function. Because of the possibility that blockaddress Constants may appear in Functions other than the ones they reference, this patch adds a new bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of Functions that contain BlockAddress Constants that refer back to this Function, rather then the Function they are scoped in. We then materialize those functions when materializing `f` from the example loop above. This might over-materialize Functions should the user of BitcodeReader ultimately decide not to link those Functions, but we can at least now we can avoid this ordering related issue with blockaddresses. Fixes: llvm/llvm-project#52787 Fixes: ClangBuiltLinux/linux#1215 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D120781
The original fix (commit 23ec5782c3cc) of llvm/llvm-project#52787 only adds `Function`s that have `Instruction`s that directly use `BlockAddress`es into the bitcode (`FUNC_CODE_BLOCKADDR_USERS`). However, in either @rickyz's original reproducing code: ``` void f(long); __attribute__((noinline)) static void fun(long x) { f(x + 1); } void repro(void) { fun(({ label: (long)&&label; })); } ``` ``` ... define dso_local void @repro() #0 { entry: br label %label label: ; preds = %entry tail call fastcc void @fun() ret void } define internal fastcc void @fun() unnamed_addr #1 { entry: tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3 ret void } ... ``` or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g., `i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s (e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and then used by `Instruction`s. This case is not handled by the original fix. This patch adds *indirect* users of `BlockAddress`es, i.e., the `Instruction`s using some `Constant`s which further use the `BlockAddress`es, into the bitcode as well, by doing depth-first searches. Fixes: llvm/llvm-project#52787 Fixes: 23ec5782c3cc ("[Bitcode] materialize Functions early when BlockAddress taken") Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124878
This bug was originally discovered at ClangBuiltLinux/linux#1215 by attempting to build Linux with Clang+LTO. The following is a copy of the analysis from ClangBuiltLinux/linux#1215 (comment):
Here's a hand-minimized repro:
$ clang -O2 -flto -c repro.c -o repro.i
$ ld.lld repro.i
ld.lld: error: Never resolved function from blockaddress (...)
Relevant part of the LLVM IR, generated with:
$ clang -O2 -flto -c repro.c -fno-discard-value-names -S
We can see clang figured out that the first argument to
fun()
is always the same, so it inlined the value of the argument (which is the address oflabel
inrepro
).Searching LLVM code for the source of the error message gives
llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
Line 853 in 4b55329
(there is another error with the same string, but building lld with a change to the error message shows that this is the one that triggers). Reading through this code, we see that BitcodeReader lazily reads an LLVM bitcode file. The reader appears to initially return placeholder objects that need to be "materialized" on demand.
The code has some special handling of blockaddress IR constants, as a blockaddress that appears within a function F may reference a basic block from a different function G (as is the case in the IR for the bug repro).
When materializing a function with a
blockaddress(Fn, Fn_BB)
, the code needs to separately handle the cases whereFn
either has or has not yet been materialized. That logic appears here:llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
Lines 2901 to 2927 in 4b55329
If
Fn
has been materialized (theFn->empty()
check), then the reader creates a BlockAddress referring to the appropriate Function/BasicBlock object. IfFn
has not been materialized, then the reader creates a placeholder BasicBlock for the BlockAddress and addsFn
to a queue of functions to be materialized. WhenFn
is materialized, it will make sure to use the precreated BasicBlock object.After adding some debug prints, it appears that lld's use of BitcodeReader breaks this lazy blockaddress handling. In the repro case, the linker does materialize
repro
before it attempts to materializefun
. However, before it attempts to materializefun
, it steals the basic blocks out ofrepro
's Function object here:llvm-project/llvm/lib/Linker/IRMover.cpp
Lines 1116 to 1117 in 4b55329
This causes the
Fn->empty()
check mentioned above to pass, which makes the code think thatFn
(repro
) has not yet been materialized. That code path enqueuesrepro
to be materialized a second time. When that happens, this code:llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
Lines 162 to 163 in 4b55329
detects (using
IsMaterializable()
instead ofempty()
) thatrepro
has already been materialized, and errors out.This looks like a fairly old bug in lld. Fixing it looks tricky - it seems that BitcodeReader requires that materialized Functions aren't modified while there are still functions that will be materialized later, and lld violates that assumption. Maintaining this invariant seems like it might conflict with the goal of lazy reading. Maybe one idea is to maintain some reverse dependency information in bitcode files. Then materializing a function F could trigger immediate materialization of all functions with a blockaddress pointing into F.
The text was updated successfully, but these errors were encountered: