
Commit 2172b4e

[LinkerWrapper] Support relocatable linking for offloading
Summary:
The standard GPU compilation process embeds each intermediate object file into the host file at the `.llvm.offloading` section so it can be linked later. We also use a special section called something like `omp_offloading_entries` to store all the globals that need to be registered by the runtime. The linker-wrapper's job is to link the embedded device code stored at this section and then emit code to register the linked image and the kernels and globals in the offloading entry section.

One downside to RDC linking is that it can become quite big for very large projects that wish to make use of static linking. This patch changes the support for relocatable linking via `-r` to support a kind of "partial" RDC compilation for offloading languages.

This primarily requires manually editing the embedded data in the output object file for the relocatable link. We need to rename the output section to make it distinct from the input sections that will be merged. We then delete the old embedded object code so it won't be linked further. We then need to rename the old offloading section so that it is private to the module. A runtime solution could also be done to defer entries that don't belong to the given GPU executable, but this is easier.

Note that this works only with the ELF method for handling offloading entries, not with COFF linking, though COFF could be made to work similarly.

Given this support, the following compilation path should produce two distinct images for OpenMP offloading.

```
$ clang foo.c -fopenmp --offload-arch=native -c
$ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang main.c merged.o -fopenmp --offload-arch=native
$ ./a.out
```

Or similarly for HIP, to effectively perform non-RDC mode compilation for a subset of files.

```
$ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c
$ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc
$ ./a.out
```

One question is whether this should be the default behaviour of `-r` when run through the linker-wrapper, or a special option. Standard `-r` behavior is still possible if used without invoking the linker-wrapper, and it is guaranteed to be correct.
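The section edits described in the summary can be pictured as a list of `llvm-objcopy` arguments applied to the relocatable output. The following is a minimal standalone sketch of that list, written without LLVM types. The function name `buildObjcopyArgs` and the example suffix are hypothetical; the flag names and section names are the ones that appear in the diff of `ClangLinkerWrapper.cpp` in this commit.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: builds the objcopy-style argument list for one
// relocatable output file. `suffix` is supplied by the caller; in the patch
// it is derived from a hash of the linker output.
std::vector<std::string> buildObjcopyArgs(const std::string &output,
                                          const std::string &suffix) {
  std::vector<std::string> args = {"llvm-objcopy", output};

  // Drop the embedded device code so a later link does not link it again.
  args.push_back("--remove-section");
  args.push_back(".llvm.offloading");

  for (const char *prefix : {"omp", "cuda", "hip"}) {
    const std::string section = std::string(prefix) + "_offloading_entries";

    // Give the entry section a name private to this link unit.
    args.push_back("--rename-section");
    args.push_back(section + "=" + section + suffix);

    // Rename the section's boundary symbols to match the new section name.
    for (const char *bound : {"__start_", "__stop_"}) {
      args.push_back("--redefine-sym");
      args.push_back(bound + section + "=" + bound + section + suffix);
    }
  }
  return args;
}
```

Calling `buildObjcopyArgs("merged.o", "_deadbeef")` yields 22 arguments: the program and file name, one section removal, and for each of the three languages one section rename plus two symbol renames.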
1 parent 6fecfbc commit 2172b4e

5 files changed, +77 -14 lines changed


clang/test/Driver/linker-wrapper.c

+2 -1
@@ -181,5 +181,6 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: --linker-path=/usr/bin/ld.lld -- -r --whole-archive %t.a --no-whole-archive \
 // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK
 
-// RELOCATABLE-LINK-NOT: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu
+// RELOCATABLE-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu
 // RELOCATABLE-LINK: /usr/bin/ld.lld{{.*}}-r
+// RELOCATABLE-LINK: llvm-objcopy{{.*}}a.out --remove-section .llvm.offloading
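The updated check lines expect three commands in order: a device link with `clang`, the host `ld.lld -r` job, and an `llvm-objcopy` step. As a rough illustration of how such ordered patterns behave, here is a simplified sketch that mimics only two FileCheck features (literal text with `{{...}}` regex blocks, and in-order matching). It is not FileCheck itself, and the helper names `toRegex` and `checksPassInOrder` are made up for this note.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Translates a FileCheck-style pattern into an ECMAScript regex: text inside
// {{...}} is kept as a regex, everything else is matched literally.
std::string toRegex(const std::string &pattern) {
  static const std::string special = R"re(\^$.|?*+()[]{})re";
  std::string out;
  std::size_t pos = 0;
  while (pos < pattern.size()) {
    if (pattern.compare(pos, 2, "{{") == 0) {
      std::size_t end = pattern.find("}}", pos + 2);
      if (end != std::string::npos) {
        out += pattern.substr(pos + 2, end - pos - 2);
        pos = end + 2;
        continue;
      }
    }
    if (special.find(pattern[pos]) != std::string::npos)
      out += '\\';
    out += pattern[pos++];
  }
  return out;
}

// Returns true if every pattern matches, with each search starting where the
// previous match ended (a simplification of consecutive CHECK directives).
bool checksPassInOrder(const std::vector<std::string> &patterns,
                       const std::string &output) {
  auto begin = output.cbegin();
  for (const std::string &pattern : patterns) {
    const std::regex re(toRegex(pattern));
    std::smatch match;
    if (!std::regex_search(begin, output.cend(), match, re))
      return false;
    begin = match[0].second;
  }
  return true;
}
```

Fed the three `RELOCATABLE-LINK` patterns and an output that lists a `clang ... -o x.img --target=x86_64-unknown-linux-gnu` line, then a `/usr/bin/ld.lld ... -r` line, then an `llvm-objcopy a.out --remove-section .llvm.offloading` line, `checksPassInOrder` returns true; it returns false if the `clang` line is absent, which is the situation the old `RELOCATABLE-LINK-NOT` line used to require.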

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

+64 -7
@@ -241,6 +241,63 @@ Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
   return *Path;
 }
 
+/// Returns the hashed value for a constant string.
+std::string getHash(StringRef Str) {
+  llvm::MD5 Hasher;
+  llvm::MD5::MD5Result Hash;
+  Hasher.update(Str);
+  Hasher.final(Hash);
+  return llvm::utohexstr(Hash.low(), /*LowerCase=*/true);
+}
+
+/// Renames offloading entry sections in a relocatable link so they do not
+/// conflict with a later link job.
+Error relocateOffloadSection(const ArgList &Args, StringRef Output) {
+  Expected<std::string> ObjcopyPath =
+      findProgram("llvm-objcopy", {getMainExecutable("llvm-objcopy")});
+  if (!ObjcopyPath)
+    return ObjcopyPath.takeError();
+
+  // Use the linker output file to get a unique hash. This creates a unique
+  // identifier to rename the sections to that is deterministic to the contents.
+  auto BufferOrErr = DryRun ? MemoryBuffer::getMemBuffer("")
+                            : MemoryBuffer::getFileOrSTDIN(Output);
+  if (!BufferOrErr)
+    return createStringError(inconvertibleErrorCode(), "Failed to open %s",
+                             Output.str().c_str());
+  std::string Suffix = "_" + getHash((*BufferOrErr)->getBuffer());
+
+  SmallVector<StringRef> ObjcopyArgs = {
+      *ObjcopyPath,
+      Output,
+  };
+
+  // Remove the old .llvm.offloading section to prevent further linking.
+  ObjcopyArgs.emplace_back("--remove-section");
+  ObjcopyArgs.emplace_back(".llvm.offloading");
+  for (StringRef Prefix : {"omp", "cuda", "hip"}) {
+    auto Section = (Prefix + "_offloading_entries").str();
+    // Rename the offloading entires to make them private to this link unit.
+    ObjcopyArgs.emplace_back("--rename-section");
+    ObjcopyArgs.emplace_back(
+        Args.MakeArgString(Section + "=" + Section + Suffix));
+
+    // Rename the __start_ / __stop_ symbols appropriately to iterate over the
+    // newly renamed section containing the offloading entries.
+    ObjcopyArgs.emplace_back("--redefine-sym");
+    ObjcopyArgs.emplace_back(Args.MakeArgString("__start_" + Section + "=" +
+                                                "__start_" + Section + Suffix));
+    ObjcopyArgs.emplace_back("--redefine-sym");
+    ObjcopyArgs.emplace_back(Args.MakeArgString("__stop_" + Section + "=" +
+                                                "__stop_" + Section + Suffix));
+  }
+
+  if (Error Err = executeCommands(*ObjcopyPath, ObjcopyArgs))
+    return Err;
+
+  return Error::success();
+}
+
 /// Runs the wrapped linker job with the newly created input.
 Error runLinker(ArrayRef<StringRef> Files, const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("Execute host linker");
@@ -265,6 +322,11 @@ Error runLinker(ArrayRef<StringRef> Files, const ArgList &Args) {
     LinkerArgs.push_back(Arg);
   if (Error Err = executeCommands(LinkerPath, LinkerArgs))
     return Err;
+
+  if (Args.hasArg(OPT_relocatable))
+    if (Error Err = relocateOffloadSection(Args, ExecutableName))
+      return Err;
+
   return Error::success();
 }
 
@@ -910,7 +972,8 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
   case OFK_OpenMP:
     if (Error Err = offloading::wrapOpenMPBinaries(
             M, BuffersToWrap,
-            offloading::getOffloadEntryArray(M, "omp_offloading_entries")))
+            offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
+            /*Suffix=*/"", /*Relocatable=*/Args.hasArg(OPT_relocatable)))
       return std::move(Err);
     break;
   case OFK_Cuda:
@@ -1356,12 +1419,6 @@ Expected<SmallVector<SmallVector<OffloadFile>>>
 getDeviceInput(const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("ExtractDeviceCode");
 
-  // If the user is requesting a reloctable link we ignore the device code. The
-  // actual linker will merge the embedded device code sections so they can be
-  // linked when the executable is finally created.
-  if (Args.hasArg(OPT_relocatable))
-    return SmallVector<SmallVector<OffloadFile>>{};
-
   StringRef Root = Args.getLastArgValue(OPT_sysroot_EQ);
   SmallVector<StringRef> LibraryPaths;
   for (const opt::Arg *Arg : Args.filtered(OPT_library_path, OPT_libpath))
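The suffix computed in `relocateOffloadSection` is a function of the linker output's bytes alone, so the same output always gets the same renamed sections and different outputs get different ones (barring hash collisions). The sketch below illustrates that property without LLVM. Note the substitution: it uses 64-bit FNV-1a purely as a stand-in, whereas the patch takes the low 64 bits of an MD5 digest; `fnv1a64` and `makeSuffix` are hypothetical names.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// 64-bit FNV-1a hash. This is a stand-in for illustration only; the patch
// itself hashes with llvm::MD5.
std::uint64_t fnv1a64(const std::string &data) {
  std::uint64_t hash = 14695981039346656037ull; // FNV offset basis
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 1099511628211ull; // FNV prime
  }
  return hash;
}

// Returns "_" followed by the lowercase hex form of the hash of `contents`,
// without zero padding.
std::string makeSuffix(const std::string &contents) {
  std::ostringstream out;
  out << '_' << std::hex << fnv1a64(contents);
  return out.str();
}
```

With this stand-in, `makeSuffix("abc")` returns the same string on every call and differs from `makeSuffix("abd")`. The patch's dry-run path hashes an empty buffer, which in this sketch corresponds to `makeSuffix("")`.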

llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h

+3 -1
@@ -20,10 +20,12 @@ using EntryArrayTy = std::pair<GlobalVariable *, GlobalVariable *>;
 /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
 /// symbols holding the `__tgt_offload_entry` array.
 /// \param Suffix An optional suffix appended to the emitted symbols.
+/// \param Relocatable Indicate if we need to change the offloading section.
 llvm::Error wrapOpenMPBinaries(llvm::Module &M,
                                llvm::ArrayRef<llvm::ArrayRef<char>> Images,
                                EntryArrayTy EntryArray,
-                               llvm::StringRef Suffix = "");
+                               llvm::StringRef Suffix = "",
+                               bool Relocatable = false);
 
 /// Wraps the input fatbinary image into the module \p M as global symbols and
 /// registers the images with the CUDA runtime.

llvm/lib/Frontend/Offloading/OffloadWrapper.cpp

+7 -4
@@ -112,7 +112,8 @@ PointerType *getBinDescPtrTy(Module &M) {
 ///
 /// Global variable that represents BinDesc is returned.
 GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
-                              EntryArrayTy EntryArray, StringRef Suffix) {
+                              EntryArrayTy EntryArray, StringRef Suffix,
+                              bool Relocatable) {
   LLVMContext &C = M.getContext();
   auto [EntriesB, EntriesE] = EntryArray;
 
@@ -129,7 +130,8 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
                                      GlobalVariable::InternalLinkage, Data,
                                      ".omp_offloading.device_image" + Suffix);
     Image->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
-    Image->setSection(".llvm.offloading");
+    Image->setSection(Relocatable ? ".llvm.offloading.relocatable"
+                                  : ".llvm.offloading");
     Image->setAlignment(Align(object::OffloadBinary::getAlignment()));
 
     StringRef Binary(Buf.data(), Buf.size());
@@ -582,8 +584,9 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
 
 Error offloading::wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images,
                                      EntryArrayTy EntryArray,
-                                     llvm::StringRef Suffix) {
-  GlobalVariable *Desc = createBinDesc(M, Images, EntryArray, Suffix);
+                                     llvm::StringRef Suffix, bool Relocatable) {
+  GlobalVariable *Desc =
+      createBinDesc(M, Images, EntryArray, Suffix, Relocatable);
   if (!Desc)
     return createStringError(inconvertibleErrorCode(),
                              "No binary descriptors created.");

llvm/lib/Object/OffloadBinary.cpp

+1 -1
@@ -83,7 +83,7 @@ Error extractFromObject(const ObjectFile &Obj,
       if (!NameOrErr)
         return NameOrErr.takeError();
 
-      if (!NameOrErr->equals(".llvm.offloading"))
+      if (!NameOrErr->starts_with(".llvm.offloading"))
         continue;
     }
 
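The one-line change in `extractFromObject` widens the section-name test from equality to a prefix match. A small standalone sketch of the difference follows, using `std::string` in place of `llvm::StringRef`; the function names `matchesExactly` and `matchesPrefix` are invented for this note.

```cpp
#include <string>

// Old behaviour: only the exact section name is accepted.
bool matchesExactly(const std::string &name) {
  return name == ".llvm.offloading";
}

// New behaviour: any section name that begins with the prefix is accepted.
bool matchesPrefix(const std::string &name) {
  static const std::string prefix = ".llvm.offloading";
  return name.compare(0, prefix.size(), prefix) == 0;
}
```

The name `.llvm.offloading.relocatable`, which `createBinDesc` now assigns when `Relocatable` is set, passes `matchesPrefix` but not `matchesExactly`. A prefix test also accepts any other name that happens to start with `.llvm.offloading`.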