-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[HIP] Allow partial linking for -fgpu-rdc
#81700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Yaxun (Sam) Liu (yxsamliu) Changes
There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with Previously, clang emits an external symbol Fixes: #77018 Full diff: https://github.com/llvm/llvm-project/pull/81700.diff 6 Files Affected:
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 5b43272bfa62f4..7d23f944732dbf 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -760,10 +760,10 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// to contain the fat binary but will be populated somewhere else,
// e.g. by lld through link script.
FatBinStr = new llvm::GlobalVariable(
- CGM.getModule(), CGM.Int8Ty,
- /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
- "__hip_fatbin", nullptr,
- llvm::GlobalVariable::NotThreadLocal);
+ CGM.getModule(), CGM.Int8Ty,
+ /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, nullptr,
+ "__hip_fatbin_" + CGM.getContext().getCUIDHash(), nullptr,
+ llvm::GlobalVariable::NotThreadLocal);
cast<llvm::GlobalVariable>(FatBinStr)->setSection(FatbinConstantName);
}
@@ -816,8 +816,8 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// thread safety of the loaded program. Therefore we can assume sequential
// execution of constructor functions here.
if (IsHIP) {
- auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage :
- llvm::GlobalValue::LinkOnceAnyLinkage;
+ auto Linkage = CudaGpuBinary ? llvm::GlobalValue::InternalLinkage
+ : llvm::GlobalValue::ExternalLinkage;
llvm::BasicBlock *IfBlock =
llvm::BasicBlock::Create(Context, "if", ModuleCtorFunc);
llvm::BasicBlock *ExitBlock =
@@ -826,11 +826,11 @@ llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// of HIP ABI.
GpuBinaryHandle = new llvm::GlobalVariable(
TheModule, PtrTy, /*isConstant=*/false, Linkage,
- /*Initializer=*/llvm::ConstantPointerNull::get(PtrTy),
- "__hip_gpubin_handle");
- if (Linkage == llvm::GlobalValue::LinkOnceAnyLinkage)
- GpuBinaryHandle->setComdat(
- CGM.getModule().getOrInsertComdat(GpuBinaryHandle->getName()));
+ /*Initializer=*/
+ CudaGpuBinary ? llvm::ConstantPointerNull::get(PtrTy) : nullptr,
+ CudaGpuBinary
+ ? "__hip_gpubin_handle"
+ : "__hip_gpubin_handle_" + CGM.getContext().getCUIDHash());
GpuBinaryHandle->setAlignment(CGM.getPointerAlign().getAsAlign());
// Prevent the weak symbol in different shared libraries being merged.
if (Linkage != llvm::GlobalValue::InternalLinkage)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index c984260b082cd1..218066bced6c19 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -919,7 +919,15 @@ void CodeGenModule::Release() {
llvm::ConstantArray::get(ATy, UsedArray), "__clang_gpu_used_external");
addCompilerUsedGlobal(GV);
}
-
+ if (LangOpts.HIP) {
+ // Emit a unique ID so that host and device binaries from the same
+ // compilation unit can be associated.
+ auto *GV = new llvm::GlobalVariable(
+ getModule(), Int8Ty, false, llvm::GlobalValue::ExternalLinkage,
+ llvm::Constant::getNullValue(Int8Ty),
+ "__hip_cuid_" + getContext().getCUIDHash());
+ addCompilerUsedGlobal(GV);
+ }
emitLLVMUsed();
if (SanStats)
SanStats->finish();
diff --git a/clang/lib/Driver/ToolChains/HIPUtility.cpp b/clang/lib/Driver/ToolChains/HIPUtility.cpp
index f692458b775de2..4bd6926ec6e463 100644
--- a/clang/lib/Driver/ToolChains/HIPUtility.cpp
+++ b/clang/lib/Driver/ToolChains/HIPUtility.cpp
@@ -9,13 +9,24 @@
#include "HIPUtility.h"
#include "CommonArgs.h"
#include "clang/Driver/Compilation.h"
+#include "clang/Driver/Options.h"
+#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/Object/Archive.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Path.h"
+#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Triple.h"
+#include <deque>
+#include <set>
+using namespace clang;
using namespace clang::driver;
using namespace clang::driver::tools;
using namespace llvm::opt;
+using llvm::dyn_cast;
#if defined(_WIN32) || defined(_WIN64)
#define NULL_FILE "nul"
@@ -36,6 +47,146 @@ static std::string normalizeForBundler(const llvm::Triple &T,
: T.normalize();
}
+// Collect undefined __hip_fatbin* and __hip_gpubin_handle* symbols from all
+// input object or archive files.
+class HIPUndefinedFatBinSymbols {
+public:
+ HIPUndefinedFatBinSymbols(const Compilation &C)
+ : C(C), DiagID(C.getDriver().getDiags().getCustomDiagID(
+ DiagnosticsEngine::Error,
+ "Error collecting HIP undefined fatbin symbols: %0")),
+ Quiet(C.getArgs().hasArg(options::OPT__HASH_HASH_HASH)),
+ Verbose(C.getArgs().hasArg(options::OPT_v)){
+ populateSymbols();
+ if (Verbose) {
+ for (auto Name : FatBinSymbols)
+ llvm::errs() << "Found undefined HIP fatbin symbol: " << Name << "\n";
+ for (auto Name : GPUBinHandleSymbols)
+ llvm::errs() << "Found undefined HIP gpubin handle symbol: " << Name
+ << "\n";
+ }
+ }
+
+ const std::set<std::string> &getFatBinSymbols() const {
+ return FatBinSymbols;
+ }
+
+ const std::set<std::string> &getGPUBinHandleSymbols() const {
+ return GPUBinHandleSymbols;
+ }
+
+private:
+ const Compilation &C;
+ unsigned DiagID;
+ bool Quiet;
+ bool Verbose;
+ std::set<std::string> FatBinSymbols;
+ std::set<std::string> GPUBinHandleSymbols;
+ const std::string FatBinPrefix = "__hip_fatbin";
+ const std::string GPUBinHandlePrefix = "__hip_gpubin_handle";
+
+ void populateSymbols() {
+ std::deque<const Action *> WorkList;
+ std::set<const Action *> Visited;
+
+ for (const auto &Action : C.getActions()) {
+ WorkList.push_back(Action);
+ }
+
+ while (!WorkList.empty()) {
+ const Action *CurrentAction = WorkList.front();
+ WorkList.pop_front();
+
+ if (!CurrentAction || !Visited.insert(CurrentAction).second)
+ continue;
+
+ if (const auto *IA = dyn_cast<InputAction>(CurrentAction)) {
+ std::string ID = IA->getId().str();
+ if (!ID.empty()) {
+ ID = llvm::utohexstr(llvm::MD5Hash(ID), /*LowerCase=*/true);
+ FatBinSymbols.insert(Twine(FatBinPrefix + "_" + ID).str());
+ GPUBinHandleSymbols.insert(
+ Twine(GPUBinHandlePrefix + "_" + ID).str());
+ continue;
+ }
+ const char *Filename = IA->getInputArg().getValue();
+ auto BufferOrErr = llvm::MemoryBuffer::getFile(Filename);
+ // Input action could be options to linker, therefore ignore it
+ // if cannot read it.
+ if (!BufferOrErr)
+ continue;
+
+ processInput(BufferOrErr.get()->getMemBufferRef());
+ } else
+ WorkList.insert(WorkList.end(), CurrentAction->getInputs().begin(),
+ CurrentAction->getInputs().end());
+ }
+ }
+
+ void processInput(const llvm::MemoryBufferRef &Buffer) {
+ // Try processing as object file first.
+ auto ObjFileOrErr = llvm::object::ObjectFile::createObjectFile(Buffer);
+ if (ObjFileOrErr) {
+ processSymbols(**ObjFileOrErr);
+ return;
+ }
+
+ // Then try processing as archive files.
+ llvm::consumeError(ObjFileOrErr.takeError());
+ auto ArchiveOrErr = llvm::object::Archive::create(Buffer);
+ if (ArchiveOrErr) {
+ llvm::Error Err = llvm::Error::success();
+ llvm::object::Archive &Archive = *ArchiveOrErr.get();
+ for (auto &Child : Archive.children(Err)) {
+ auto ChildBufOrErr = Child.getMemoryBufferRef();
+ if (ChildBufOrErr)
+ processInput(*ChildBufOrErr);
+ else
+ errorHandler(ChildBufOrErr.takeError());
+ }
+
+ if (Err)
+ errorHandler(std::move(Err));
+ return;
+ }
+
+ // Ignore other files.
+ llvm::consumeError(ArchiveOrErr.takeError());
+ }
+ void processSymbols(const llvm::object::ObjectFile &Obj) {
+ for (const auto &Symbol : Obj.symbols()) {
+ auto FlagOrErr = Symbol.getFlags();
+ if (!FlagOrErr) {
+ errorHandler(FlagOrErr.takeError());
+ continue;
+ }
+
+ // Filter only undefined symbols
+ if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) {
+ continue;
+ }
+
+ auto NameOrErr = Symbol.getName();
+ if (!NameOrErr) {
+ errorHandler(NameOrErr.takeError());
+ continue;
+ }
+ llvm::StringRef Name = *NameOrErr;
+
+ if (Name.starts_with(FatBinPrefix))
+ FatBinSymbols.insert(Name.str());
+ else if (Name.starts_with(GPUBinHandlePrefix))
+ GPUBinHandleSymbols.insert(Name.str());
+ }
+ }
+
+ void errorHandler(llvm::Error Err) {
+ if (Quiet)
+ return;
+ C.getDriver().Diag(DiagID) << llvm::toString(std::move(Err));
+ }
+};
+
// Construct a clang-offload-bundler command to bundle code objects for
// different devices into a HIP fat binary.
void HIP::constructHIPFatbinCommand(Compilation &C, const JobAction &JA,
@@ -130,26 +281,91 @@ void HIP::constructGenerateObjFileFromHIPFatBinary(
auto HostTriple =
C.getSingleOffloadToolChain<Action::OFK_Host>()->getTriple();
+ HIPUndefinedFatBinSymbols Symbols(C);
+
+ std::string PrimaryHipFatbinSymbol;
+ std::string
+ PrimaryGpuBinHandleSymbol;
+ bool FoundPrimaryHipFatbinSymbol =
+ false;
+ bool FoundPrimaryGpuBinHandleSymbol =
+ false;
+
+ std::vector<std::string>
+ AliasHipFatbinSymbols;
+ std::vector<std::string>
+ AliasGpuBinHandleSymbols;
+
+ // Iterate through symbols to find the primary ones and collect others for
+ // aliasing
+ for (const auto &Symbol : Symbols.getFatBinSymbols()) {
+ if (!FoundPrimaryHipFatbinSymbol) {
+ PrimaryHipFatbinSymbol = Symbol;
+ FoundPrimaryHipFatbinSymbol = true;
+ } else {
+ AliasHipFatbinSymbols.push_back(Symbol);
+ }
+ }
+
+ for (const auto &Symbol : Symbols.getGPUBinHandleSymbols()) {
+ if (!FoundPrimaryGpuBinHandleSymbol) {
+ PrimaryGpuBinHandleSymbol = Symbol;
+ FoundPrimaryGpuBinHandleSymbol = true;
+ } else {
+ AliasGpuBinHandleSymbols.push_back(Symbol);
+ }
+ }
+
// Add MC directives to embed target binaries. We ensure that each
// section and image is 16-byte aligned. This is not mandatory, but
// increases the likelihood of data to be aligned with a cache block
// in several main host machines.
ObjStream << "# HIP Object Generator\n";
ObjStream << "# *** Automatically generated by Clang ***\n";
- if (HostTriple.isWindowsMSVCEnvironment()) {
- ObjStream << " .section .hip_fatbin, \"dw\"\n";
- } else {
- ObjStream << " .protected __hip_fatbin\n";
- ObjStream << " .type __hip_fatbin,@object\n";
- ObjStream << " .section .hip_fatbin,\"a\",@progbits\n";
+ if (FoundPrimaryGpuBinHandleSymbol) {
+ // Define the first gpubin handle symbol
+ if (HostTriple.isWindowsMSVCEnvironment()) {
+ ObjStream << " .section .hip_gpubin_handle,\"dw\"\n";
+ } else {
+ ObjStream << " .protected " << PrimaryGpuBinHandleSymbol << "\n";
+ ObjStream << " .type " << PrimaryGpuBinHandleSymbol << ",@object\n";
+ ObjStream << " .section .hip_gpubin_handle,\"aw\"\n";
+ }
+ ObjStream << " .globl " << PrimaryGpuBinHandleSymbol << "\n";
+ ObjStream << " .p2align 3\n"; // Align 8
+ ObjStream << PrimaryGpuBinHandleSymbol << ":\n";
+ ObjStream << " .zero 8\n"; // Size 8
+
+ // Generate alias directives for other gpubin handle symbols
+ for (const auto &AliasSymbol : AliasGpuBinHandleSymbols) {
+ ObjStream << " .globl " << AliasSymbol << "\n";
+ ObjStream << " .set " << AliasSymbol << "," << PrimaryGpuBinHandleSymbol
+ << "\n";
+ }
+ }
+ if (FoundPrimaryHipFatbinSymbol) {
+ // Define the first fatbin symbol
+ if (HostTriple.isWindowsMSVCEnvironment()) {
+ ObjStream << " .section .hip_fatbin,\"dw\"\n";
+ } else {
+ ObjStream << " .protected " << PrimaryHipFatbinSymbol << "\n";
+ ObjStream << " .type " << PrimaryHipFatbinSymbol << ",@object\n";
+ ObjStream << " .section .hip_fatbin,\"a\",@progbits\n";
+ }
+ ObjStream << " .globl " << PrimaryHipFatbinSymbol << "\n";
+ ObjStream << " .p2align " << llvm::Log2(llvm::Align(HIPCodeObjectAlign))
+ << "\n";
+ // Generate alias directives for other fatbin symbols
+ for (const auto &AliasSymbol : AliasHipFatbinSymbols) {
+ ObjStream << " .globl " << AliasSymbol << "\n";
+ ObjStream << " .set " << AliasSymbol << "," << PrimaryHipFatbinSymbol
+ << "\n";
+ }
+ ObjStream << PrimaryHipFatbinSymbol << ":\n";
+ ObjStream << " .incbin ";
+ llvm::sys::printArg(ObjStream, BundleFile, /*Quote=*/true);
+ ObjStream << "\n";
}
- ObjStream << " .globl __hip_fatbin\n";
- ObjStream << " .p2align " << llvm::Log2(llvm::Align(HIPCodeObjectAlign))
- << "\n";
- ObjStream << "__hip_fatbin:\n";
- ObjStream << " .incbin ";
- llvm::sys::printArg(ObjStream, BundleFile, /*Quote=*/true);
- ObjStream << "\n";
if (HostTriple.isOSLinux() && HostTriple.isOSBinFormatELF())
ObjStream << " .section .note.GNU-stack, \"\", @progbits\n";
ObjStream.flush();
diff --git a/clang/test/CodeGenCUDA/device-stub.cu b/clang/test/CodeGenCUDA/device-stub.cu
index d7a7b1bb9fe956..60304647bd4c54 100644
--- a/clang/test/CodeGenCUDA/device-stub.cu
+++ b/clang/test/CodeGenCUDA/device-stub.cu
@@ -50,21 +50,19 @@
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
// RUN: -fgpu-rdc -fcuda-include-gpubinary %t -o - -x hip \
// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,LNX,RDC,HIP,HIPEF
-// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - -x hip\
+// RUN: %clang_cc1 -cuid=123 -triple x86_64-linux-gnu -emit-llvm %s -o - -x hip\
// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefixes=ALL,LNX,NORDC,HIP,HIPNEF
// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -aux-triple amdgcn -emit-llvm %s \
// RUN: -fcuda-include-gpubinary %t -o - -x hip\
// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,WIN
-// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -aux-triple amdgcn -emit-llvm %s \
+// RUN: %clang_cc1 -cuid=123 -triple x86_64-pc-windows-msvc -aux-triple amdgcn -emit-llvm %s \
// RUN: -o - -x hip\
// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,WIN,HIP,HIPNEF
#include "Inputs/cuda.h"
-// HIPNEF: $__hip_gpubin_handle = comdat any
-
#ifndef NOGLOBALS
// NORDC-DAG: @device_var = internal global i32
// RDC-DAG: @device_var = global i32
@@ -161,7 +159,7 @@ __device__ void device_use() {
// * constant unnamed string with GPU binary
// CUDA: @[[FATBIN:.*]] = private constant{{.*}} c"GPU binary would be here.",
// HIPEF: @[[FATBIN:.*]] = private constant{{.*}} c"GPU binary would be here.",{{.*}}align 4096
-// HIPNEF: @[[FATBIN:__hip_fatbin]] = external constant i8, section ".hip_fatbin"
+// HIPNEF: @[[FATBIN:__hip_fatbin_[0-9a-f]+]] = external constant i8, section ".hip_fatbin"
// CUDANORDC-SAME: section ".nv_fatbin", align 8
// CUDARDC-SAME: section "__nv_relfatbin", align 8
// * constant struct that wraps GPU binary
@@ -177,7 +175,7 @@ __device__ void device_use() {
// HIP-SAME: section ".hipFatBinSegment"
// * variable to save GPU binary handle after initialization
// CUDANORDC: @__[[PREFIX]]_gpubin_handle = internal global ptr null
-// HIPNEF: @__[[PREFIX]]_gpubin_handle = linkonce hidden global ptr null
+// HIPNEF: @__[[PREFIX]]_gpubin_handle_{{[0-9a-f]+}} = external hidden global ptr, align 8
// * constant unnamed string with NVModuleID
// CUDARDC: [[MODULE_ID_GLOBAL:@.*]] = private constant
// CUDARDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32
diff --git a/clang/test/CodeGenCUDA/host-used-device-var.cu b/clang/test/CodeGenCUDA/host-used-device-var.cu
index 7cb31aff84264e..5328660c9dc9df 100644
--- a/clang/test/CodeGenCUDA/host-used-device-var.cu
+++ b/clang/test/CodeGenCUDA/host-used-device-var.cu
@@ -1,9 +1,9 @@
// REQUIRES: amdgpu-registered-target
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -x hip %s \
// RUN: -std=c++17 -O3 -mllvm -amdgpu-internalize-symbols -emit-llvm -o - \
-// RUN: | FileCheck -check-prefix=DEV %s
+// RUN: -cuid=123 | FileCheck -check-prefix=DEV %s
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -x hip %s \
-// RUN: -std=c++17 -O3 -emit-llvm -o - | FileCheck -check-prefix=HOST %s
+// RUN: -std=c++17 -O3 -emit-llvm -o - -cuid=123 | FileCheck -check-prefix=HOST %s
// Negative tests.
@@ -187,6 +187,7 @@ public:
// DEV-SAME: {{^[^@]*}} @_ZL2u3
// DEV-SAME: {{^[^@]*}} @_ZZ4fun1vE11static_var1
// DEV-SAME: {{^[^@]*}} @_ZZZN21TestStaticVarInLambda3funEvENKUlPcE_clES0_E4var2
+// DEV-SAME: {{^[^@]*}} @__hip_cuid_{{[0-9a-f]+}}
// DEV-SAME: {{^[^@]*}} @constexpr_var2b
// DEV-SAME: {{^[^@]*}} @inline_var
// DEV-SAME: {{^[^@]*}} @u1
diff --git a/clang/test/Driver/hip-toolchain-rdc.hip b/clang/test/Driver/hip-toolchain-rdc.hip
index 1827531f9cab7a..d19d8ccd6cb29e 100644
--- a/clang/test/Driver/hip-toolchain-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-rdc.hip
@@ -1,7 +1,7 @@
// REQUIRES: x86-registered-target
// REQUIRES: amdgpu-registered-target
-// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: %clang -### --target=x86_64-linux-gnu -v \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
// RUN: --hip-device-lib=lib1.bc --hip-device-lib=lib2.bc \
// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \
@@ -12,7 +12,7 @@
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,LNX %s
-// RUN: %clang -### --target=x86_64-pc-windows-msvc \
+// RUN: %clang -### --target=x86_64-pc-windows-msvc -v \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
// RUN: --hip-device-lib=lib1.bc --hip-device-lib=lib2.bc \
// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \
@@ -23,15 +23,31 @@
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,MSVC %s
-// check code object alignment in dumped llvm-mc input
-// LNX: .protected __hip_fatbin
-// LNX: .type __hip_fatbin,@object
-// LNX: .section .hip_fatbin,"a",@progbits
-// MSVC: .section .hip_fatbin, "dw"
-// CHECK: .globl __hip_fatbin
-// CHECK: .p2align 12
-// CHECK: __hip_fatbin:
-// CHECK: .incbin "[[BUNDLE:.*hipfb]]"
+// check HIP fatbin and gpubin handle symbols and code object alignment in dumped llvm-mc input
+// CHECK: Found undefined HIP fatbin symbol: __hip_fatbin_[[ID1:[0-9a-f]+]]
+// CHECK: Found undefined HIP fatbin symbol: __hip_fatbin_[[ID2:[0-9a-f]+]]
+// CHECK: Found undefined HIP gpubin handle symbol: __hip_gpubin_handle_[[ID1]]
+// CHECK: Found undefined HIP gpubin handle symbol: __hip_gpubin_handle_[[ID2]]
+// LNX: .protected __hip_gpubin_handle_[[ID1]]
+// LNX: .type __hip_gpubin_handle_[[ID1]]
+// LNX-LABEL: .section .hip_gpubin_handle,"aw"
+// MSVC-LABEL: .section .hip_gpubin_handle,"dw"
+// CHECK: .globl __hip_gpubin_handle_[[ID1]]
+// CHECK-NEXT: .p2align 3
+// CHECK-NEXT:__hip_gpubin_handle_[[ID1]]:
+// CHECK-NEXT: .zero 8
+// CHECK-NEXT: .globl __hip_gpubin_handle_[[ID2]]
+// CHECK-NEXT: .set __hip_gpubin_handle_[[ID2]],__hip_gpubin_handle_[[ID1]]
+// LNX: .protected __hip_fatbin_[[ID1]]
+// LNX: .type __hip_fatbin_[[ID1]],@object
+// LNX-LABEL: .section .hip_fatbin,"a",@progbits
+// MSVC-LABEL: .section .hip_fatbin,"dw"
+// CHECK: .globl __hip_fatbin_[[ID1]]
+// CHECK-NEXT: .p2align 12
+// CHECK-NEXT: .globl __hip_fatbin_[[ID2]]
+// CHECK-NEXT: .set __hip_fatbin_[[ID2]],__hip_fatbin_[[ID1]]
+// CHECK-NEXT: __hip_fatbin_[[ID1]]:
+// CHECK-NEXT: .incbin "[[BUNDLE:.*hipfb]]"
// LNX: .section .note.GNU-stack, "", @progbits
// MSVC-NOT: .note.GNU-stack
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
3181170
to
6006975
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Please wait for @jhuber6's to double check the partial linking mechanics details.
// Input action could be options to linker, therefore ignore it | ||
// if cannot read it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment could use some editing. therefore, ignore an error if we fail to read the file
.
This makes me ask -- what if the argument is an input file, and we do fail to read it. How do we tell apart the linker options from the input file? Relying on a failure to read it does not seem to be a good way to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the argument is an input file and it fails to be read, the error will be captured by the linker and error will be emitted there.
will fix the comment.
// Ignore other files. | ||
llvm::consumeError(ArchiveOrErr.takeError()); | ||
} | ||
void processSymbols(const llvm::object::ObjectFile &Obj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit -- add an empty line to separate functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
} | ||
|
||
// Filter only undefined symbols | ||
if (!(FlagOrErr.get() & llvm::object::SymbolRef::SF_Undefined)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: remove {}
around single-statement body.
Applies here and in a handful of other places throughout the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense overall, though it's very complicated. Generally we just need to make sure these things are private to one group of files. There's a lot more to parse here compared to the linker-wrapper
.
Do any of these tests check when called with -r
? I'm assuming that's the difference here when we decide whether or not to ghoup files by cuid.
6006975
to
d84645f
Compare
Added a test for |
`-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `-emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: llvm#77018
d84645f
to
1e9c29c
Compare
Recent compiler patch llvm/llvm-project#81700 introduced changes to certain hip external symbols clang emits. This patch makes the tests "16_assembly_to_executable" and "17_llvm_ir_to_executable" follow these changes. I expect comments on whether this is an unintended compiler behaviour that needs fix in compiler or this is the proper fix. Change-Id: I3943f5365600f683a65c98c6d17c14a5e4192351
`-fgpu-rdc` mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO. There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with `ld -r` or generate a static library by using the `--emit-static-lib` option of clang. This avoids linking all device code together, therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emits an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits an unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking. `__hip_gpubin_handle` is also uniquefied and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: llvm#77018 Change-Id: I0ebf263b742b554939e5b758e5ec761e00763738
-fgpu-rdc
mode allows device functions call device functions in different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for AMDGPU backend since it only supports LTO.There are use cases that objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, whereas partially link the host code with
ld -r
or generate a static library by using the-emit-static-lib
option of clang. This avoids linking all device code together, therefore decreases the linking time for-fgpu-rdc
.Previously, clang emits an external symbol
__hip_fatbin
for all objects for-fgpu-rdc
. With this patch, clang emits an unique external symbol__hip_fatbin_{cuid}
for the fat binary for each object. When a group of objects are linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fab binary symbols only to avoid repeated linking.__hip_gpubin_handle
is also uniquefied and merged to avoid repeated registering. Symbol__hip_cuid_{cuid}
is introduced to facilitate debugging and tooling.Fixes: #77018