topology: Add GPU bpf helpers #753

Open
hodgesds opened this issue Oct 7, 2024 · 2 comments
Labels: bpf, enhancement, help wanted, rust

Comments

@hodgesds
Contributor

hodgesds commented Oct 7, 2024

Now that there is a separate GPU module, we can expand it to include BPF interfaces. Use a periodic timer in userspace to call the relevant NVML helpers and get the processes that are using GPUs. Use the results to update a BPF map keyed by process ID, whose value is a struct containing the NUMA node ID of the GPU and any other relevant metadata. Finally, create BPF helper functions that schedulers can use to query the map.
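A minimal sketch of the userspace half, assuming the NVML query (for example `nvmlDeviceGetComputeRunningProcesses` per device) has already been reduced to a list of `(pid, gpu_index)` pairs and that the GPU-to-NUMA mapping is known from the topology module. `GpuTaskInfo`, `GpuSample`, and `build_entries` are hypothetical names, and a `HashMap` stands in for the BPF map; a real implementation would write each entry through the map's userspace handle (e.g. via libbpf-rs) instead.

```rust
use std::collections::HashMap;

/// Hypothetical map value. `#[repr(C)]` keeps the field layout identical
/// to the C struct the BPF side would declare for the same map.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GpuTaskInfo {
    pub gpu_index: u32,
    pub numa_node: u32,
}

/// One (pid, GPU) pair as reported by the GPU process query
/// (assumed to be gathered elsewhere on each timer tick).
pub struct GpuSample {
    pub pid: u32,
    pub gpu_index: u32,
}

/// Build the pid -> GpuTaskInfo entries for one timer tick.
/// `gpu_numa` maps a GPU index to its NUMA node; GPUs with no known
/// node are skipped rather than guessed.
pub fn build_entries(
    samples: &[GpuSample],
    gpu_numa: &HashMap<u32, u32>,
) -> HashMap<u32, GpuTaskInfo> {
    let mut entries = HashMap::new();
    for s in samples {
        if let Some(&numa_node) = gpu_numa.get(&s.gpu_index) {
            // If a pid uses several GPUs, the first one reported wins.
            entries.entry(s.pid).or_insert(GpuTaskInfo {
                gpu_index: s.gpu_index,
                numa_node,
            });
        }
    }
    entries
}

fn main() {
    // GPU 0 sits on NUMA node 0, GPU 1 on NUMA node 1 (example values).
    let gpu_numa = HashMap::from([(0, 0), (1, 1)]);
    let samples = vec![
        GpuSample { pid: 100, gpu_index: 1 },
        GpuSample { pid: 200, gpu_index: 0 },
        GpuSample { pid: 100, gpu_index: 0 },
    ];
    let entries = build_entries(&samples, &gpu_numa);
    println!("{:?}", entries.get(&100));
}
```

Picking one GPU per pid keeps the map value fixed-size; a multi-GPU process would need either a policy like the one above or a richer value struct.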

hodgesds added the bpf, enhancement, help wanted, and rust labels Oct 7, 2024
hodgesds assigned hodgesds and unassigned hodgesds Oct 9, 2024
@hodgesds
Contributor Author

hodgesds commented Oct 9, 2024

I've started a WIP branch if anyone is interested. @arighi, is this similar to what you were thinking of? It may also be useful to have some BPF helpers for process monitoring.

@arighi
Contributor

arighi commented Oct 9, 2024

@hodgesds yes! That's pretty much what I was planning to do: monitor (somehow) which PIDs are using which GPU, store them in a BPF map, then consume this information from the BPF scheduler and try to keep tasks close to the GPU they're using.
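On the scheduler side, "keep tasks close to the GPU" could reduce to a decision like the one below. This is a Rust model of the logic only, assuming the pid-keyed map from the proposal and a per-NUMA-node CPU list; the real helper would be BPF C calling `bpf_map_lookup_elem()` on the map, and `pick_cpu`, `GpuTaskInfo`, and the idle-CPU slice are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical map value, mirroring what userspace would store per pid.
#[derive(Clone, Copy)]
pub struct GpuTaskInfo {
    pub gpu_index: u32,
    pub numa_node: u32,
}

/// Pick a CPU for `pid`: prefer an idle CPU on the NUMA node of the GPU
/// the task uses, otherwise fall back to any idle CPU.
/// Returns None when no CPU is idle.
pub fn pick_cpu(
    pid: u32,
    gpu_map: &HashMap<u32, GpuTaskInfo>,
    node_cpus: &[Vec<usize>], // node_cpus[n] = CPUs on NUMA node n
    idle: &[bool],            // idle[cpu] = true if the CPU is idle
) -> Option<usize> {
    if let Some(info) = gpu_map.get(&pid) {
        if let Some(cpus) = node_cpus.get(info.numa_node as usize) {
            let is_idle = |c: usize| idle.get(c).copied().unwrap_or(false);
            if let Some(&cpu) = cpus.iter().find(|&&c| is_idle(c)) {
                return Some(cpu);
            }
        }
    }
    // Task is not using a GPU, or its GPU's node has no idle CPU.
    idle.iter().position(|&i| i)
}

fn main() {
    let gpu_map = HashMap::from([(100, GpuTaskInfo { gpu_index: 1, numa_node: 1 })]);
    let node_cpus = vec![vec![0, 1], vec![2, 3]];
    let idle = vec![true, true, false, true];
    println!("{:?}", pick_cpu(100, &gpu_map, &node_cpus, &idle)); // Some(3)
    println!("{:?}", pick_cpu(200, &gpu_map, &node_cpus, &idle)); // Some(0)
}
```

Falling back to any idle CPU, rather than waiting for one on the GPU's node, trades locality for latency; a scheduler could choose the opposite.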

I still don't know exactly how to populate the map, in particular how to auto-detect when a PID is using a GPU. It would be nice to have a synchronous way, like a kprobe or tracepoint, instead of actively polling periodically, but I don't know much about GPUs, so I need to investigate this further...
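Until a synchronous hook is identified, the periodic approach can at least limit map churn by writing only what changed between two snapshots. A minimal sketch, assuming each snapshot is a `HashMap` from pid to NUMA node; `MapDelta` and `diff_snapshots` are hypothetical names, and applying the delta (update/delete calls on the BPF map) is left out.

```rust
use std::collections::HashMap;

/// Changes to push into the BPF map after one polling interval.
#[derive(Debug, Default, PartialEq)]
pub struct MapDelta {
    /// (pid, NUMA node) entries that are new or whose node changed.
    pub upserts: Vec<(u32, u32)>,
    /// pids that no longer use any GPU and should be deleted.
    pub deletes: Vec<u32>,
}

/// Compare the previous and current pid -> NUMA node snapshots.
pub fn diff_snapshots(prev: &HashMap<u32, u32>, curr: &HashMap<u32, u32>) -> MapDelta {
    let mut delta = MapDelta::default();
    for (&pid, &node) in curr {
        if prev.get(&pid) != Some(&node) {
            delta.upserts.push((pid, node));
        }
    }
    for &pid in prev.keys() {
        if !curr.contains_key(&pid) {
            delta.deletes.push(pid);
        }
    }
    // HashMap iteration order is unspecified; sort for deterministic output.
    delta.upserts.sort_unstable();
    delta.deletes.sort_unstable();
    delta
}

fn main() {
    let prev = HashMap::from([(100, 0), (200, 1)]);
    let curr = HashMap::from([(100, 1), (300, 0)]);
    println!("{:?}", diff_snapshots(&prev, &curr));
}
```

Deleting stale pids matters because the kernel reuses pids, so a leftover entry could steer an unrelated task toward a GPU's node.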
