
Commit af382e0

[LinkerWrapper] Support relocatable linking for offloading
Summary:
The standard GPU compilation process embeds each intermediate object file into the host file at the `.llvm.offloading` section so it can be linked later. We also use a special section, named along the lines of `omp_offloading_entries`, to store all the globals that need to be registered by the runtime. The linker-wrapper's job is to link the embedded device code stored in this section and then emit code to register the linked image and the kernels and globals in the offloading entry section.

One downside to RDC linking is that it can become quite big for very large projects that wish to make use of static linking. This patch changes the support for relocatable linking via `-r` to support a kind of "partial" RDC compilation for offloading languages.

This primarily requires manually editing the embedded data in the output object file for the relocatable link. We need to rename the output section to make it distinct from the input sections that will be merged. We then delete the old embedded object code so it won't be linked further. We then need to rename the old offloading section so that it is private to the module. A runtime solution could also be done to defer entries that don't belong to the given GPU executable, but this is easier. Note that this does not work with COFF linking, only with the ELF method for handling offloading entries; COFF could be made to work similarly.

Given this support, the following compilation path should produce two distinct images for OpenMP offloading.

```
$ clang foo.c -fopenmp --offload-arch=native -c
$ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang main.c merged.o -fopenmp --offload-arch=native
$ ./a.out
```

Or similarly for HIP to effectively perform non-RDC mode compilation for a subset of files.

```
$ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c
$ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc
$ ./a.out
```

One question is whether this should be the default behaviour of `-r` when run through the linker-wrapper or a special option. Standard `-r` behavior is still possible if used without invoking the linker-wrapper, and it is guaranteed to be correct.
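The section and symbol renaming described in the summary can be sketched outside of LLVM. The following is a minimal illustration in standard C++, not the patch's implementation: `buildObjcopyArgs` is a hypothetical helper, and `suffix` stands in for the content-derived hash that the patch appends to make the entry sections private to one relocatable output. The `llvm-objcopy` options used (`--remove-section`, `--rename-section`, `--redefine-sym`) are the ones the patch itself passes.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): assembles the llvm-objcopy
// argument list for one relocatable output file. `suffix` is assumed to be a
// unique, content-derived string such as "_<hash>".
std::vector<std::string> buildObjcopyArgs(const std::string &output,
                                          const std::string &suffix) {
  std::vector<std::string> args = {"llvm-objcopy", output};

  // Drop the embedded device code so a later link does not link it again.
  args.push_back("--remove-section");
  args.push_back(".llvm.offloading");

  for (const char *prefix : {"omp", "cuda", "hip"}) {
    const std::string section = std::string(prefix) + "_offloading_entries";

    // Give the entry section a name that is private to this link unit.
    args.push_back("--rename-section");
    args.push_back(section + "=" + section + suffix);

    // Keep the __start_/__stop_ bounds pointing at the renamed section.
    for (const char *bound : {"__start_", "__stop_"}) {
      args.push_back("--redefine-sym");
      args.push_back(bound + section + "=" + bound + section + suffix);
    }
  }
  return args;
}
```

Calling `buildObjcopyArgs("merged.o", "_abc")` yields a list that starts with `llvm-objcopy merged.o --remove-section .llvm.offloading` and continues with one rename and two symbol redefinitions per offloading kind.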
1 parent 7155c1e commit af382e0


6 files changed

+119
-16
lines changed


clang/test/Driver/linker-wrapper-image.c

+8
@@ -9,6 +9,8 @@
 // RUN: -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=OPENMP,OPENMP-ELF
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run -r --host-triple=x86_64-unknown-linux-gnu \
+// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=OPENMP-ELF,OPENMP-REL
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=OPENMP,OPENMP-COFF

@@ -19,6 +21,8 @@
 // OPENMP-COFF: @__start_omp_offloading_entries = weak_odr hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OA"
 // OPENMP-COFF-NEXT: @__stop_omp_offloading_entries = weak_odr hidden constant [0 x %struct.__tgt_offload_entry] zeroinitializer, section "omp_offloading_entries$OZ"
 
+// OPENMP-REL: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading.relocatable", align 8
+
 // OPENMP: @.omp_offloading.device_image = internal unnamed_addr constant [[[SIZE:[0-9]+]] x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 8
 // OPENMP-NEXT: @.omp_offloading.device_images = internal unnamed_addr constant [1 x %__tgt_device_image] [%__tgt_device_image { ptr getelementptr inbounds ([[[BEGIN:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr getelementptr inbounds ([[[END:[0-9]+]] x i8], ptr @.omp_offloading.device_image, i64 1, i64 0), ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }]
 // OPENMP-NEXT: @.omp_offloading.descriptor = internal constant %__tgt_bin_desc { i32 1, ptr @.omp_offloading.device_images, ptr @__start_omp_offloading_entries, ptr @__stop_omp_offloading_entries }
@@ -42,6 +46,8 @@
 // RUN: -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=CUDA,CUDA-ELF
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run -r --host-triple=x86_64-unknown-linux-gnu \
+// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=CUDA,CUDA-ELF
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=CUDA,CUDA-COFF

@@ -140,6 +146,8 @@
 // RUN: -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=HIP,HIP-ELF
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu -r \
+// RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=HIP,HIP-ELF
 // RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-windows-gnu \
 // RUN: --linker-path=/usr/bin/ld -- %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=HIP,HIP-COFF

clang/test/Driver/linker-wrapper.c

+29-3
@@ -176,10 +176,36 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
 // RUN: --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
-// RUN: llvm-ar rcs %t.a %t.o
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
-// RUN: --linker-path=/usr/bin/ld.lld -- -r --whole-archive %t.a --no-whole-archive \
+// RUN: --linker-path=/usr/bin/ld.lld -- -r %t.o \
 // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK
 
-// RELOCATABLE-LINK-NOT: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu
+// RELOCATABLE-LINK: clang{{.*}} -o {{.*}}.img --target=x86_64-unknown-linux-gnu
 // RELOCATABLE-LINK: /usr/bin/ld.lld{{.*}}-r
+// RELOCATABLE-LINK: llvm-objcopy{{.*}}a.out --remove-section .llvm.offloading
+
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%t.elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx90a \
+// RUN: --image=file=%t.elf.o,kind=hip,triple=amdgcn-amd-amdhsa,arch=gfx90a
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: --linker-path=/usr/bin/ld.lld -- -r %t.o \
+// RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK-HIP
+
+// RELOCATABLE-LINK-HIP: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa
+// RELOCATABLE-LINK-HIP: clang-offload-bundler{{.*}} -type=o -bundle-align=4096 -targets=host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--gfx90a -input=/dev/null -input={{.*}} -output={{.*}}
+// RELOCATABLE-LINK-HIP: /usr/bin/ld.lld{{.*}}-r
+// RELOCATABLE-LINK-HIP: llvm-objcopy{{.*}}a.out --remove-section .llvm.offloading
+
+// RUN: clang-offload-packager -o %t.out \
+// RUN: --image=file=%t.elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_89 \
+// RUN: --image=file=%t.elf.o,kind=cuda,triple=nvptx64-nvidia-cuda,arch=sm_89
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: --linker-path=/usr/bin/ld.lld -- -r %t.o \
+// RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK-CUDA
+
+// RELOCATABLE-LINK-CUDA: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda
+// RELOCATABLE-LINK-CUDA: fatbinary{{.*}} -64 --create {{.*}}.fatbin --image=profile=sm_89,file={{.*}}.img
+// RELOCATABLE-LINK-CUDA: /usr/bin/ld.lld{{.*}}-r
+// RELOCATABLE-LINK-CUDA: llvm-objcopy{{.*}}a.out --remove-section .llvm.offloading

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

+71-7
@@ -241,6 +241,70 @@ Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
   return *Path;
 }
 
+/// Returns the hashed value for a constant string.
+std::string getHash(StringRef Str) {
+  llvm::MD5 Hasher;
+  llvm::MD5::MD5Result Hash;
+  Hasher.update(Str);
+  Hasher.final(Hash);
+  return llvm::utohexstr(Hash.low(), /*LowerCase=*/true);
+}
+
+/// Renames offloading entry sections in a relocatable link so they do not
+/// conflict with a later link job.
+Error relocateOffloadSection(const ArgList &Args, StringRef Output) {
+  llvm::Triple Triple(
+      Args.getLastArgValue(OPT_host_triple_EQ, sys::getDefaultTargetTriple()));
+  if (Triple.isOSWindows())
+    return createStringError(
+        inconvertibleErrorCode(),
+        "Relocatable linking is not supported on COFF targets");
+
+  Expected<std::string> ObjcopyPath =
+      findProgram("llvm-objcopy", {getMainExecutable("llvm-objcopy")});
+  if (!ObjcopyPath)
+    return ObjcopyPath.takeError();
+
+  // Use the linker output file to get a unique hash. This creates a unique
+  // identifier to rename the sections to that is deterministic to the contents.
+  auto BufferOrErr = DryRun ? MemoryBuffer::getMemBuffer("")
+                            : MemoryBuffer::getFileOrSTDIN(Output);
+  if (!BufferOrErr)
+    return createStringError(inconvertibleErrorCode(), "Failed to open %s",
+                             Output.str().c_str());
+  std::string Suffix = "_" + getHash((*BufferOrErr)->getBuffer());
+
+  SmallVector<StringRef> ObjcopyArgs = {
+      *ObjcopyPath,
+      Output,
+  };
+
+  // Remove the old .llvm.offloading section to prevent further linking.
+  ObjcopyArgs.emplace_back("--remove-section");
+  ObjcopyArgs.emplace_back(".llvm.offloading");
+  for (StringRef Prefix : {"omp", "cuda", "hip"}) {
+    auto Section = (Prefix + "_offloading_entries").str();
+    // Rename the offloading entires to make them private to this link unit.
+    ObjcopyArgs.emplace_back("--rename-section");
+    ObjcopyArgs.emplace_back(
+        Args.MakeArgString(Section + "=" + Section + Suffix));
+
+    // Rename the __start_ / __stop_ symbols appropriately to iterate over the
+    // newly renamed section containing the offloading entries.
+    ObjcopyArgs.emplace_back("--redefine-sym");
+    ObjcopyArgs.emplace_back(Args.MakeArgString("__start_" + Section + "=" +
+                                                "__start_" + Section + Suffix));
+    ObjcopyArgs.emplace_back("--redefine-sym");
+    ObjcopyArgs.emplace_back(Args.MakeArgString("__stop_" + Section + "=" +
+                                                "__stop_" + Section + Suffix));
+  }
+
+  if (Error Err = executeCommands(*ObjcopyPath, ObjcopyArgs))
+    return Err;
+
+  return Error::success();
+}
+
 /// Runs the wrapped linker job with the newly created input.
 Error runLinker(ArrayRef<StringRef> Files, const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("Execute host linker");
@@ -265,6 +329,11 @@ Error runLinker(ArrayRef<StringRef> Files, const ArgList &Args) {
     LinkerArgs.push_back(Arg);
   if (Error Err = executeCommands(LinkerPath, LinkerArgs))
     return Err;
+
+  if (Args.hasArg(OPT_relocatable))
+    if (Error Err = relocateOffloadSection(Args, ExecutableName))
+      return Err;
+
   return Error::success();
 }

@@ -910,7 +979,8 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
   case OFK_OpenMP:
     if (Error Err = offloading::wrapOpenMPBinaries(
             M, BuffersToWrap,
-            offloading::getOffloadEntryArray(M, "omp_offloading_entries")))
+            offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
+            /*Suffix=*/"", /*Relocatable=*/Args.hasArg(OPT_relocatable)))
       return std::move(Err);
     break;
   case OFK_Cuda:
@@ -1356,12 +1426,6 @@ Expected<SmallVector<SmallVector<OffloadFile>>>
 getDeviceInput(const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("ExtractDeviceCode");
 
-  // If the user is requesting a reloctable link we ignore the device code. The
-  // actual linker will merge the embedded device code sections so they can be
-  // linked when the executable is finally created.
-  if (Args.hasArg(OPT_relocatable))
-    return SmallVector<SmallVector<OffloadFile>>{};
-
   StringRef Root = Args.getLastArgValue(OPT_sysroot_EQ);
   SmallVector<StringRef> LibraryPaths;
   for (const opt::Arg *Arg : Args.filtered(OPT_library_path, OPT_libpath))

llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h

+3-1
@@ -20,10 +20,12 @@ using EntryArrayTy = std::pair<GlobalVariable *, GlobalVariable *>;
 /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
 /// symbols holding the `__tgt_offload_entry` array.
 /// \param Suffix An optional suffix appended to the emitted symbols.
+/// \param Relocatable Indicate if we need to change the offloading section.
 llvm::Error wrapOpenMPBinaries(llvm::Module &M,
                                llvm::ArrayRef<llvm::ArrayRef<char>> Images,
                                EntryArrayTy EntryArray,
-                               llvm::StringRef Suffix = "");
+                               llvm::StringRef Suffix = "",
+                               bool Relocatable = false);
 
 /// Wraps the input fatbinary image into the module \p M as global symbols and
 /// registers the images with the CUDA runtime.

llvm/lib/Frontend/Offloading/OffloadWrapper.cpp

+7-4
@@ -112,7 +112,8 @@ PointerType *getBinDescPtrTy(Module &M) {
 ///
 /// Global variable that represents BinDesc is returned.
 GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
-                              EntryArrayTy EntryArray, StringRef Suffix) {
+                              EntryArrayTy EntryArray, StringRef Suffix,
+                              bool Relocatable) {
   LLVMContext &C = M.getContext();
   auto [EntriesB, EntriesE] = EntryArray;

@@ -129,7 +130,8 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
                                      GlobalVariable::InternalLinkage, Data,
                                      ".omp_offloading.device_image" + Suffix);
     Image->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
-    Image->setSection(".llvm.offloading");
+    Image->setSection(Relocatable ? ".llvm.offloading.relocatable"
+                                  : ".llvm.offloading");
     Image->setAlignment(Align(object::OffloadBinary::getAlignment()));
 
     StringRef Binary(Buf.data(), Buf.size());
@@ -582,8 +584,9 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
 
 Error offloading::wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images,
                                      EntryArrayTy EntryArray,
-                                     llvm::StringRef Suffix) {
-  GlobalVariable *Desc = createBinDesc(M, Images, EntryArray, Suffix);
+                                     llvm::StringRef Suffix, bool Relocatable) {
+  GlobalVariable *Desc =
+      createBinDesc(M, Images, EntryArray, Suffix, Relocatable);
   if (!Desc)
     return createStringError(inconvertibleErrorCode(),
                              "No binary descriptors created.");

llvm/lib/Object/OffloadBinary.cpp

+1-1
@@ -83,7 +83,7 @@ Error extractFromObject(const ObjectFile &Obj,
       if (!NameOrErr)
         return NameOrErr.takeError();
 
-      if (!NameOrErr->equals(".llvm.offloading"))
+      if (!NameOrErr->starts_with(".llvm.offloading"))
         continue;
     }

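The switch from an exact comparison to a prefix match in this section-name check means that the `.llvm.offloading.relocatable` name introduced in `OffloadWrapper.cpp` passes it as well. A minimal sketch of the difference in standard C++ follows; `matchesExact` and `matchesPrefix` are hypothetical stand-ins for the old and new conditions, not LLVM functions.

```cpp
#include <string_view>

// Hypothetical stand-in for the old check: only the exact section name passes.
bool matchesExact(std::string_view name) {
  return name == ".llvm.offloading";
}

// Hypothetical stand-in for the new check: any name beginning with the
// offloading section name passes, including ".llvm.offloading.relocatable".
bool matchesPrefix(std::string_view name) {
  constexpr std::string_view prefix = ".llvm.offloading";
  // substr clamps the length, so names shorter than the prefix are safe.
  return name.substr(0, prefix.size()) == prefix;
}
```

Under these definitions, `".llvm.offloading.relocatable"` is rejected by `matchesExact` and accepted by `matchesPrefix`, while `".llvm.offloading"` is accepted by both.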
