topology: Add GPU bpf helpers #753

hodgesds · 2024-10-07T23:39:21Z

Now that there is a separate GPU module, we can expand it to include BPF interfaces. Use a periodic timer in userspace to call the relevant NVML helpers and get the processes that are using GPUs. Use those values to update a BPF map whose key is the process ID and whose value is a struct containing the NUMA node ID of the GPU, plus any other relevant metadata. Finally, create BPF helper functions that schedulers can use to query the map.
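A minimal sketch of the userspace half, assuming the list of GPU-using processes has already been fetched (for example via NVML's `nvmlDeviceGetComputeRunningProcesses`) and that the GPU-to-NUMA-node mapping is already known from topology. A plain `HashMap` stands in for the BPF map. `GpuInfo`, `GpuProcess`, and `refresh_map` are hypothetical names for illustration, not existing scx APIs. Each pass drops PIDs that no longer appear, so the map only reflects the latest poll.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Duration;

/// Hypothetical map value: the per-PID struct a BPF map would store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GpuInfo {
    pub gpu_index: u32, // which GPU the process is using
    pub numa_node: u32, // NUMA node that GPU is attached to
}

/// One observation from the GPU driver: (pid, GPU index).
/// In a real tool this would come from an NVML query; here it is passed in.
pub type GpuProcess = (u32, u32);

/// Sync the PID -> GpuInfo map with the latest snapshot.
/// `gpu_numa[i]` is the NUMA node of GPU `i` (assumed known).
/// Returns (entries inserted or changed, entries removed).
pub fn refresh_map(
    map: &mut HashMap<u32, GpuInfo>,
    snapshot: &[GpuProcess],
    gpu_numa: &[u32],
) -> (usize, usize) {
    // Build the new view; if a PID appears twice, the last GPU wins.
    let mut seen = HashMap::new();
    for &(pid, gpu) in snapshot {
        // Skip GPUs with no known NUMA node.
        if let Some(&node) = gpu_numa.get(gpu as usize) {
            seen.insert(pid, GpuInfo { gpu_index: gpu, numa_node: node });
        }
    }
    // Delete PIDs that stopped using a GPU (a BPF map delete in practice).
    let before = map.len();
    map.retain(|pid, _| seen.contains_key(pid));
    let removed = before - map.len();
    // Insert new PIDs and update changed ones (a BPF map update in practice).
    let mut updated = 0;
    for (pid, info) in seen {
        if map.insert(pid, info) != Some(info) {
            updated += 1;
        }
    }
    (updated, removed)
}

fn main() {
    // Assumed topology: GPU 0 on NUMA node 0, GPU 1 on NUMA node 1.
    let gpu_numa = [0u32, 1];
    // Fake snapshots standing in for successive NVML polls.
    let polls: [&[GpuProcess]; 2] = [&[(100, 0), (200, 1)], &[(200, 1), (300, 0)]];
    let mut map = HashMap::new();
    for snapshot in polls {
        let (updated, removed) = refresh_map(&mut map, snapshot, &gpu_numa);
        println!("updated={updated} removed={removed} entries={}", map.len());
        thread::sleep(Duration::from_millis(10)); // the "periodic timer"
    }
}
```

Rebuilding the full view on every tick keeps the logic simple at the cost of a short staleness window between polls, which is the trade-off an event-driven approach would avoid.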

hodgesds · 2024-10-09T19:39:41Z

I've started a WIP branch if anyone is interested. @arighi is this a similar thing you were thinking of? It may be useful to have some BPF helpers for process monitoring as well.

arighi · 2024-10-09T20:44:34Z

@hodgesds yes! That's pretty much what I was planning to do: monitor (somehow) which PIDs are using which GPU, store them in a BPF map, and then consume this information from the BPF scheduler to try to keep tasks close to the GPU they're using.
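The scheduler-side lookup could look roughly like the following. In practice this logic would live in the BPF scheduler itself (in C, e.g. in its `ops.select_cpu` callback) and read the map with a BPF map lookup; this Rust model only illustrates the placement policy. `pick_cpu`, `cpu_node`, and `idle` are hypothetical names: `cpu_node[c]` is assumed to hold the NUMA node of CPU `c`, and `idle[c]` whether that CPU is currently idle.

```rust
use std::collections::HashMap;

/// Hypothetical per-PID value, as written by the userspace poller.
#[derive(Debug, Clone, Copy)]
pub struct GpuInfo {
    pub gpu_index: u32,
    pub numa_node: u32,
}

/// Pick a CPU for `pid`, preferring an idle CPU on the NUMA node of the
/// GPU the task is using. Falls back to any idle CPU; None if all are busy.
pub fn pick_cpu(
    pid: u32,
    gpu_map: &HashMap<u32, GpuInfo>,
    cpu_node: &[u32],
    idle: &[bool],
) -> Option<usize> {
    // Only tasks present in the GPU map have a preferred node.
    if let Some(info) = gpu_map.get(&pid) {
        let local = cpu_node
            .iter()
            .zip(idle)
            .position(|(&node, &is_idle)| is_idle && node == info.numa_node);
        if local.is_some() {
            return local;
        }
    }
    // Non-GPU task, or every CPU on the GPU's node is busy.
    idle.iter().position(|&is_idle| is_idle)
}

fn main() {
    let mut gpu_map = HashMap::new();
    gpu_map.insert(200, GpuInfo { gpu_index: 1, numa_node: 1 });
    // Assumed topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
    let cpu_node = [0, 0, 1, 1];
    let idle = [true, false, false, true];
    // PID 200 uses a GPU on node 1, so it should land on CPU 3.
    println!("{:?}", pick_cpu(200, &gpu_map, &cpu_node, &idle));
    // PID 100 is not in the map, so it gets the first idle CPU (0).
    println!("{:?}", pick_cpu(100, &gpu_map, &cpu_node, &idle));
}
```

Falling back to any idle CPU rather than waiting for a GPU-local one favors latency over locality; a real scheduler might instead queue the task on its preferred node for a bounded time.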

I still don't know exactly how to populate the map, in particular how to auto-detect when a PID starts using a GPU. It'd be nice if there were a synchronous way, like a kprobe or tracepoint, instead of actively polling periodically, but I don't know much about GPUs, so I need to investigate this more...

hodgesds added the enhancement, help wanted, rust, and bpf labels Oct 7, 2024

hodgesds assigned hodgesds and unassigned hodgesds Oct 9, 2024
