Now that there is a separate GPU module, we can expand it to include BPF interfaces. Use a periodic timer in userspace to call the related NVML helpers to get the processes utilizing GPUs. Use the values to update a BPF map, with the key being the process ID and the value being a struct that contains the NUMA node ID of the GPU and any other relevant metadata. Finally, create BPF helper functions for schedulers to query the BPF map.
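A rough sketch of what the userspace half could look like, just to make the idea concrete. The struct layout, the pinned map path (`/sys/fs/bpf/gpu_tasks`), and the sysfs-based NUMA lookup are all placeholders here, not a settled design:

```c
/* Userspace poller sketch. All names (gpu_task_info, the pinned map
 * path) are placeholders, not part of any existing scx interface.
 * Link against libnvidia-ml and libbpf. */
#include <ctype.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <nvml.h>
#include <bpf/bpf.h>

/* Per-PID value stored in the BPF map; layout is a strawman. */
struct gpu_task_info {
	uint32_t numa_node;    /* NUMA node the GPU is attached to */
	uint32_t gpu_index;    /* NVML device index, for debugging */
	uint64_t used_gpu_mem; /* bytes, as reported by NVML */
};

/* Resolve a GPU's NUMA node via /sys/bus/pci/devices/<bdf>/numa_node. */
static uint32_t gpu_numa_node(nvmlDevice_t dev)
{
	nvmlPciInfo_t pci;
	char bdf[32], path[128];
	int node = 0, i;
	FILE *f;

	if (nvmlDeviceGetPciInfo(dev, &pci) != NVML_SUCCESS)
		return 0;
	/* NVML reports an extended, uppercase bus id ("00000000:65:00.0");
	 * sysfs wants the short lowercase form ("0000:65:00.0"). */
	const char *src = strlen(pci.busId) > 4 ? pci.busId + 4 : pci.busId;
	for (i = 0; src[i] && i < (int)sizeof(bdf) - 1; i++)
		bdf[i] = tolower((unsigned char)src[i]);
	bdf[i] = '\0';

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", bdf);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%d", &node) != 1 || node < 0)
			node = 0; /* -1 means no NUMA affinity */
		fclose(f);
	}
	return (uint32_t)node;
}

/* One pass: push every GPU compute process into the pid-keyed map. */
static void poll_gpus(int map_fd)
{
	unsigned int ndev = 0, i, j;

	if (nvmlDeviceGetCount(&ndev) != NVML_SUCCESS)
		return;
	for (i = 0; i < ndev; i++) {
		nvmlDevice_t dev;
		nvmlProcessInfo_t procs[64]; /* real code should resize on
					      * NVML_ERROR_INSUFFICIENT_SIZE */
		unsigned int nprocs = 64;

		if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS ||
		    nvmlDeviceGetComputeRunningProcesses(dev, &nprocs, procs) != NVML_SUCCESS)
			continue;
		for (j = 0; j < nprocs; j++) {
			uint32_t pid = procs[j].pid;
			struct gpu_task_info info = {
				.numa_node = gpu_numa_node(dev),
				.gpu_index = i,
				.used_gpu_mem = procs[j].usedGpuMemory,
			};
			bpf_map_update_elem(map_fd, &pid, &info, BPF_ANY);
		}
	}
}

int main(void)
{
	/* Assumes the scheduler pinned the map here; could equally come
	 * from a libbpf skeleton. */
	int map_fd = bpf_obj_get("/sys/fs/bpf/gpu_tasks");

	if (map_fd < 0 || nvmlInit() != NVML_SUCCESS)
		return 1;
	for (;;) {	/* the "periodic timer" */
		poll_gpus(map_fd);
		sleep(1);
	}
}
```

One open question this doesn't handle: entries for exited PIDs need to be evicted, either by diffing successive polls or via a `sched_process_exit` hook on the BPF side.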
I've started a WIP branch if anyone is interested. @arighi is this similar to what you were thinking of? It may be useful to have some BPF helpers for process monitoring as well.
@hodgesds yes! That's pretty much what I was planning to do: monitor (somehow) which PIDs are using which GPU, store them in a BPF map, and then consume this information from the BPF scheduler to try to keep tasks close to the GPU they're using.
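For the consuming side, something like this minimal sketch is the shape I'd expect (assuming scx's common BPF headers for `BPF_STRUCT_OPS`; the map name and value layout are placeholders mirroring whatever the userspace poller writes):

```c
/* BPF side: the shared map plus an example sched_ext callback that
 * consults it. Names and layout are assumptions, not existing APIs. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct gpu_task_info {
	u32 numa_node;
	u32 gpu_index;
	u64 used_gpu_mem;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 8192);
	__type(key, u32);
	__type(value, struct gpu_task_info);
	__uint(pinning, LIBBPF_PIN_BY_NAME); /* so userspace can update it */
} gpu_tasks SEC(".maps");

/* Example: bias a GPU-using task toward its GPU's NUMA node. */
s32 BPF_STRUCT_OPS(gpu_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	u32 pid = p->pid;
	struct gpu_task_info *info = bpf_map_lookup_elem(&gpu_tasks, &pid);

	if (info) {
		/* Hypothetical: pick an idle CPU on info->numa_node,
		 * e.g. via a per-node cpumask maintained elsewhere. */
	}
	return prev_cpu;
}

char _license[] SEC("license") = "GPL";
```

A kfunc-style helper wrapping the lookup could hide the layout from individual schedulers, per the original issue.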
I still don't know exactly how to populate the map, in particular how to auto-detect when a PID is using a GPU. It'd be nice if there were a synchronous way, like a kprobe / tracepoint, instead of actively polling periodically, but I don't know much about GPUs, so I need to investigate this more...
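One synchronous option that doesn't require knowing driver internals: a BPF LSM hook on `file_open` that records any PID opening an NVIDIA device node. A sketch, assuming `CONFIG_BPF_LSM` is enabled; major 195 is the classic NVIDIA character device major, though nvidia-uvm registers a dynamic major, so the check is approximate:

```c
/* Sketch of synchronous GPU-use detection via a BPF LSM hook.
 * Approximate: matches only the classic NVIDIA char major (195). */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#define NVIDIA_CHR_MAJOR 195

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 8192);
	__type(key, u32);
	__type(value, u8);
} gpu_openers SEC(".maps");

SEC("lsm/file_open")
int BPF_PROG(track_gpu_open, struct file *file)
{
	dev_t rdev = BPF_CORE_READ(file, f_inode, i_rdev);

	/* MAJOR() equivalent: high 12 bits of the kernel dev_t. */
	if ((rdev >> 20) == NVIDIA_CHR_MAJOR) {
		u32 pid = bpf_get_current_pid_tgid() >> 32;
		u8 one = 1;

		bpf_map_update_elem(&gpu_openers, &pid, &one, BPF_ANY);
	}
	return 0; /* never deny the open */
}

char _license[] SEC("license") = "GPL";
```

The catch is that this only says a PID touched some GPU, not which one or its NUMA node, so it would likely complement the NVML poll rather than replace it; a kprobe on the driver's open handler would have the same limitation.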