You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
`-fgpu-rdc` mode allows device functions call device functions
in different TU. However, currently all device objects
have to be linked together since only one fat binary
is supported. This is time consuming for AMDGPU backend
since it only supports LTO.
There are use cases that objects can be divided into groups
in which device functions are self-contained but host functions
are not. It is desirable to link/optimize/codegen the device
code and generate a fatbin for each group, whereas partially
link the host code with `ld -r` or generate a static library
by using the `-emit-static-lib` option of clang. This avoids
linking all device code together, therefore decreases
the linking time for `-fgpu-rdc`.
Previously, clang emits an external symbol `__hip_fatbin`
for all objects for `-fgpu-rdc`. With this patch, clang
emits an unique external symbol `__hip_fatbin_{cuid}`
for the fat binary for each object. When a group of objects
are linked together to generate a fatbin, the symbols
are merged by alias and point to the same fat binary.
Each group has its own fat binary. One executable or
shared library can have multiple fat binaries. Device
linking is done for undefined fab binary symbols only
to avoid repeated linking. `__hip_gpubin_handle` is also
uniquefied and merged to avoid repeated registering.
Symbol `__hip_cuid_{cuid}` is introduced to facilitate
debugging and tooling.
Fixes: #77018
0 commit comments