
Shared object support #15

Merged: 83 commits, Jul 19, 2023

@n-eiling (Member) commented Feb 17, 2023

Adds support for launching kernels from shared objects loaded at runtime using dlopen. Since this is how PyTorch uses CUDA, this should enable PyTorch support in Cricket.
This required decoding the fatbinary metadata that precedes the embedded cubin ELF in binaries compiled by nvcc. Cricket can now extract the cubin from a binary and send it via RPC to the server, which loads it using the driver API function cuModuleLoadData and then executes its kernels.

  • decode fatbinary
  • extract cubin
  • send cubin to server
  • add a registry for transferred cubins and kernel functions so Cricket can identify them when launching kernels
  • switch the old kernel launching functionality over to always use the new registry instead of relying on kernel locations being the same on client and server
  • use libelf to read kernel info instead of relying on cuobjdump, which does not support in-memory ELFs
  • read parameter info using libelf
  • enable reading CUDA ELFs with debugging info and compressed ELFs
  • Test with minimal PyTorch (some features deactivated, no kernel compression)
  • Test with default PyTorch (no kernel compression)
  • Fix compression and test with default PyTorch (with kernel compression)
  • fix CI
  • large-scale test / verification
    • YOLOv5
  • cuDNN implementation
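The registry item in the list above can be illustrated with a small sketch. It assumes, hypothetically, that the client keys each kernel by the address of its host-side stub and forwards a launch as a (cubin id, kernel name) pair instead of an address; KernelRegistry, KernelRef and their methods are invented for illustration and are not Cricket's actual code, which is written in C. On the server, such a pair could plausibly be resolved with cuModuleLoadData and cuModuleGetFunction, so client and server addresses never need to match.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KernelRef:
    """Location-independent kernel identity: which cubin, which symbol."""
    cubin_id: int
    name: str


@dataclass
class KernelRegistry:
    """Hypothetical client-side registry (illustrative, not Cricket's code)."""
    cubins: list[bytes] = field(default_factory=list)
    kernels: dict[int, KernelRef] = field(default_factory=dict)

    def add_cubin(self, image: bytes) -> int:
        """Store an extracted cubin image (to be sent to the server) and return its id."""
        self.cubins.append(image)
        return len(self.cubins) - 1

    def add_kernel(self, host_addr: int, cubin_id: int, name: str) -> None:
        """Map a client-side host stub address to a kernel inside a known cubin."""
        if not 0 <= cubin_id < len(self.cubins):
            raise KeyError(f"unknown cubin id {cubin_id}")
        self.kernels[host_addr] = KernelRef(cubin_id, name)

    def resolve(self, host_addr: int) -> KernelRef:
        """Translate the address used in a launch into what the server understands."""
        return self.kernels[host_addr]


# Usage with made-up values: the address and the image bytes are placeholders.
registry = KernelRegistry()
cid = registry.add_cubin(b"\x7fELF placeholder")
registry.add_kernel(0x401000, cid, "_Z10vector_addPfS_S_i")
print(registry.resolve(0x401000))
```

Keying the client side by address keeps the interception cheap (a launch only carries the stub address), while the server only ever sees names that actually exist in the cubins it received.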

This also makes LD_PRELOAD on the server side unnecessary, as cubins are now also extracted and sent for normal applications.

This is a work in progress. Addresses #6

@n-eiling n-eiling added the enhancement (New feature or request) and doing labels Feb 17, 2023
@n-eiling n-eiling self-assigned this Feb 17, 2023
@n-eiling n-eiling changed the title from "WIP: Share object support" to "WIP: Shared object support" Feb 17, 2023
… able to identify them when launching kernels

Signed-off-by: Niklas Eiling <[email protected]>
@nravic commented Feb 20, 2023

Just saw this! Thanks for taking it on haha, I was about to start on it this weekend. I'd love to help with this effort, so let me know if there's anything I can do.

@jin-zhengnan commented Mar 23, 2023

@n-eiling When will the entire test be completed? I am looking forward to this!

@n-eiling (Member, Author)

I updated my todo list. There are still some open issues that need addressing. CUDA relies on the .nv.info section for information about kernel parameter sizes and offsets. I used to parse it using cuobjdump, but cuobjdump doesn't support in-memory ELFs, only files.
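The underlying problem is that the cubins Cricket extracts only exist in memory. libelf can open such a buffer directly through elf_memory(), which is the route the PR takes. As an illustration only (not Cricket's implementation), the sketch below does the same lookup by hand with Python's struct module: it walks the section header table of a little-endian ELF64 image held in a bytes object and returns each section's contents by name, so .nv.info (and, in typical cubins, per-kernel sections named .nv.info.<kernel>) can be located without writing a file. It deliberately skips extended section numbering, compressed sections, big-endian images and the decoding of the attributes inside .nv.info.

```python
import struct

# ELF64 header fields that follow the 16-byte e_ident array:
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR = struct.Struct("<HHIQQQIHHHHHH")
# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
#             sh_link, sh_info, sh_addralign, sh_entsize
SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8  # section type that occupies no bytes in the image (e.g. .bss)


def elf64_sections(image: bytes) -> dict[str, bytes]:
    """Map section names to section contents for a little-endian ELF64 image in memory."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:  # EI_CLASS must be ELFCLASS64, EI_DATA ELFDATA2LSB
        raise ValueError("this sketch only handles little-endian ELF64")
    fields = EHDR.unpack_from(image, 16)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = fields[5], fields[10], fields[11], fields[12]

    headers = [SHDR.unpack_from(image, e_shoff + i * e_shentsize) for i in range(e_shnum)]
    # The section-name string table is itself a section, selected by e_shstrndx.
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    strtab = image[str_off:str_off + str_size]

    sections = {}
    for sh_name, sh_type, _flags, _addr, sh_offset, sh_size, *_rest in headers:
        # sh_name is an offset into strtab; names are NUL-terminated.
        name = strtab[sh_name:strtab.index(b"\0", sh_name)].decode()
        sections[name] = b"" if sh_type == SHT_NOBITS else image[sh_offset:sh_offset + sh_size]
    return sections


# Usage, assuming `cubin` holds bytes extracted from a fatbinary:
#     nv_info = elf64_sections(cubin).get(".nv.info")
```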

@wangtianxia-sjtu

> oh really? hm I haven't looked into nvrtc yet. Currently I am trying to get YOLOv5 working. Do you know some other useful pytorch examples I could use?

Thanks for the reply. I just learned from a Microsoft paper that they intercept nvrtcCompileProgram calls to extract kernel info.

I don't have any examples that actually use the NVRTC APIs. However, I have found that some versions of PyTorch generate kernels with NVRTC when you run "import torch", and the PyTorch source confirms that it uses NVRTC.

@n-eiling n-eiling force-pushed the share-object-support branch 5 times, most recently from 78444ec to 36498a8 July 18, 2023 15:01
@n-eiling n-eiling marked this pull request as ready for review July 19, 2023 07:33
@n-eiling (Member, Author)

I will merge this because the branch has diverged quite a bit and the feature this PR set out to add is working well. For PyTorch, I still have some issues with the cudnnBackend API, which I will address on a separate branch.

@n-eiling n-eiling merged commit bcc5c93 into master Jul 19, 2023
@n-eiling n-eiling changed the title from "WIP: Shared object support" to "Shared object support" Jul 19, 2023
@mkroening mkroening deleted the share-object-support branch November 13, 2023 22:08
Labels: doing, enhancement (New feature or request)
4 participants