Shared object support #15

n-eiling · 2023-02-17T20:28:32Z

Adds support for launching kernels from shared objects that are loaded at runtime using dlopen. As this is how PyTorch uses CUDA, this should enable PyTorch support in Cricket.
This involved adding support for decoding the fatbinary metadata that precedes the embedded cubin ELF in binaries compiled by nvcc. Cricket can now extract the cubin from a binary and send it via RPC to the server, where it is loaded through the driver API function cuModuleLoadData and executed.

This also makes LD_PRELOADing on the server side unnecessary, as we now also extract and send cubins for normal applications.

This is work in progress. Addresses #6

nravic · 2023-02-20T16:42:48Z

Just saw this! Thanks for taking it on haha, was about to start this weekend. I'd love to help with this effort, let me know if there's anything I can do.

jin-zhengnan · 2023-03-23T05:43:53Z

@n-eiling When will the entire test be completed? I am looking forward to this!

n-eiling · 2023-03-23T08:13:00Z

I updated my todo list. There are still some open issues that need addressing. CUDA relies on the .nv.info section for information regarding kernel parameter sizes and offsets. I used to parse this information using cuobjdump, but cuobjdump only supports files, not in-memory ELFs.

n-eiling · 2023-07-19T07:34:32Z

I will merge this because the branch has diverged quite a bit and the original PR feature is working well. For PyTorch, I still have some issues with the cudnnBackend API, which I will work on in a different branch.

KangYingjie0 · 2024-12-11T02:11:21Z

padding = ((8 - (size_t)(input + input_read)) % 8); Maybe change this to padding = (8 - (size_t)(input + input_read) % 8); ? @n-eiling Also, if the size is exactly divisible by 8, do you still need to add padding?

Isn't this what the original code achieves? Your suggestion doesn't work when the difference is exactly divisible by 8, while the original code does not add padding in this case.

LOGE(LOG_ERROR, "cannot find kernel %s kernel_info_t") This log call is missing the argument for %s. @n-eiling, I would also like to know: does cuGetExportTable work correctly in this project? I see you have done a lot of work on it.

Thanks for catching the error! I fixed it. cuGetExportTable is part of the interface between runtime and driver API. I experimented a lot with getting the runtime API working while only implementing the driver API in Cricket. cuGetExportTable exchanges some pretty deep data structures between the APIs and I did not manage to figure out all the memory I need to copy. So it currently does not work correctly.

I tested the functionality of cuGetExportTable: it takes a char[16] identifier through exportTableId and returns the matching hidden function information (in libcuda.so) through ppExportTable. If ppExportTable is only a function pointer, the hidden function is called through it. I'm not entirely sure, though; ppExportTable could also contain more complex information.

n-eiling added 7 commits February 15, 2023 14:16

add perf outputs to gitignore

66642d8

Signed-off-by: Niklas Eiling <[email protected]>

fix various errors in the Makefiles that lead to building on a non-cl…

403ec5f

…ean directory not always working Signed-off-by: Niklas Eiling <[email protected]>

add test program for cuda code loaded using libdl

0e13cbf

Signed-off-by: Niklas Eiling <[email protected]>

when the client dlopens libraries containing cuda kernels, also open …

cb391b3

…them at the server. Signed-off-by: Niklas Eiling <[email protected]>

add decoding of fatbinary data

905fefe

Signed-off-by: Niklas Eiling <[email protected]>

add decoding of embedded fatbinaries

0997d44

Signed-off-by: Niklas Eiling <[email protected]>

add temporary test code that launches a kernel on the server from an …

4ff4b5c

…elf retrieved via RPC. Signed-off-by: Niklas Eiling <[email protected]>

n-eiling added the enhancement (New feature or request) and doing labels Feb 17, 2023

n-eiling self-assigned this Feb 17, 2023

n-eiling changed the title ~~WIP: Share object support~~ WIP: Shared object support Feb 17, 2023

add registry for tranferred cubins and kernel functions so Cricket is…

e72c11c

… able to identify them when launching kernels Signed-off-by: Niklas Eiling <[email protected]>

n-eiling added 12 commits February 21, 2023 11:02

fix segfault on cleanup because CUDA accesses nonexisting fatcubinHandle

15eb759

Signed-off-by: Niklas Eiling <[email protected]>

code cleanup. fix wrong passing of dimensions

ff493f6

Signed-off-by: Niklas Eiling <[email protected]>

use an infinite timeout for kernel calls

98832e2

Signed-off-by: Niklas Eiling <[email protected]>

remove timeout for cudaDeviceSynchronize

6eeef6c

Signed-off-by: Niklas Eiling <[email protected]>

make cpu_utils_contains_kernel return the right value

74179ef

Signed-off-by: Niklas Eiling <[email protected]>

add cudaRegisterVar client function

36ead03

Signed-off-by: Niklas Eiling <[email protected]>

add gdb commands file for debugging client apps

babb70c

Signed-off-by: Niklas Eiling <[email protected]>

fix cpu_utils_contains_kernel and cpu_utils_parameter_info returning …

d1f6173

…the wrong value Signed-off-by: Niklas Eiling <[email protected]>

make cpu_utils_launch_child also redirect stderr of child processes t…

9cb6aaf

…o pipe Signed-off-by: Niklas Eiling <[email protected]>

reduce debugging output verbosity and add some NULL checks

ac36e85

Signed-off-by: Niklas Eiling <[email protected]>

make dlopen return a handle to the main program if it is called with …

07ed931

…a NULL filename Signed-off-by: Niklas Eiling <[email protected]>

fix ci error by making tests/cpu/cubin/main.cpp compile

e9b2c1c

Signed-off-by: Niklas Eiling <[email protected]>

n-eiling added 3 commits March 24, 2023 12:30

parse kernel parameter infos from in-memory elf using libbfd

dec25d1

Signed-off-by: Niklas Eiling <[email protected]>

fix cpu-server not using the new name of elf_symbol_address

09b34f6

Signed-off-by: Niklas Eiling <[email protected]>

add possibility to dump elfs

701d4bd

Signed-off-by: Niklas Eiling <[email protected]>

n-eiling added 12 commits June 21, 2023 10:41

add server side cudnn lrn implementations, fix some function names

26e19bd

Signed-off-by: Niklas Eiling <[email protected]>

add basic cuBLAS support

15fc3a2

Signed-off-by: Niklas Eiling <[email protected]>

implement cudnn tensor functions

762cada

Signed-off-by: Niklas Eiling <[email protected]>

implement three more cudnn tensor APIs

5d381a7

Signed-off-by: Niklas Eiling <[email protected]>

add cublas and cudnn functions to support mnistCUDNN sample

6da2f8d

Signed-off-by: Niklas Eiling <[email protected]>

fix faulty if statement when intercepting dlopen calls

8a911ca

Signed-off-by: Niklas Eiling <[email protected]>

improve logging for unloading of modules

122b721

Signed-off-by: Niklas Eiling <[email protected]>

improve docs/pytorch.md

e8813ea

Signed-off-by: Niklas Eiling <[email protected]>

improve cublas implementation, add cudnnBackend implementation

e5dbebf

Signed-off-by: Niklas Eiling <[email protected]>

improve debug output for cuModuleLoad

ce21d8a

Signed-off-by: Niklas Eiling <[email protected]>

add support for cuModuleLoadData

481dec9

Signed-off-by: Niklas Eiling <[email protected]>

cublas: remove usage of new APIs if we compile for CUDA 10

fbf7dad

Signed-off-by: Niklas Eiling <[email protected]>

n-eiling removed a link to an issue Jul 14, 2023

Can this project support pytorch、tensorflow? #6

Open

n-eiling added 3 commits July 17, 2023 14:37

fix using logger function before initialization

bf3a15e

Signed-off-by: Niklas Eiling <[email protected]>

fix no output on weird shells, e.g. ssh

f30d9b0

Signed-off-by: Niklas Eiling <[email protected]>

remove md5

07db2ba

Signed-off-by: Niklas Eiling <[email protected]>

n-eiling force-pushed the share-object-support branch 5 times, most recently from 78444ec to 36498a8 Compare July 18, 2023 15:01

remove cuda 10 support, add cudnn CI test

088b6fc

Signed-off-by: Niklas Eiling <[email protected]>

n-eiling force-pushed the share-object-support branch from 36498a8 to 088b6fc Compare July 18, 2023 15:07

n-eiling marked this pull request as ready for review July 19, 2023 07:33

n-eiling merged commit bcc5c93 into master Jul 19, 2023

n-eiling changed the title ~~WIP: Shared object support~~ Shared object support Jul 19, 2023

nravic mentioned this pull request Aug 8, 2023

[CED-13] Explore Cricket integration w/ Cedana cedana/cedana#62

Closed

mkroening deleted the share-object-support branch November 13, 2023 22:08
