You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As scheduler behavior metrics can vary widely across different layers, system level metrics aren't that useful in understanding how the scheduler is behaving. Add more per-layer metrics including per-layer scheduling delay metric.
The text was updated successfully, but these errors were encountered:
hi @htejun, I'm interested in working on this. I'm trying to get familiar with the repo by helping with good "first" issues.
I'm assuming there is no existing API that provides the scheduling delay metric for a task, so it needs to be implemented at per-scheduler basis? It'll be the accumulated delay between enqueued and running, if I understand correctly.
Is there an example implementation in other schedulers that I can refer to? If not, are we interested in adding such metric for other schedulers as well (or we will just rely on kernel's overall scheduling delay metric)?
The information is useful for all schedulers but overall system metrics are easily observable with e.g. btftrace or bcc tools. How the metrics should be aggregated would depend on the specifc scheduler - e.g. scx_layered needs to collect the metrics per layer. scx_bpfland would probably want to aggregate depending on whether the task is classified interactive or not and so on. One altnerative approach could be coming up with a shared way of "tagging" tasks so that generic BPF tool can aggregate the numbers according to the tags.
We can start with baked-in impelmentation is each scheduler. I'd measure the durations whenever the task is runnable but not running - ie. ops.runnable() to ops.running() transition durations and ops.stopping() to the subsequent ops.running() transitions.
As scheduler behavior metrics can vary widely across different layers, system level metrics aren't that useful in understanding how the scheduler is behaving. Add more per-layer metrics including per-layer scheduling delay metric.
The text was updated successfully, but these errors were encountered: