Log disk I/O #135

lars-t-hansen · 2024-02-08T08:40:22Z

The use case here is jobs that are "unexpectedly slow": we want to know whether this is because they are I/O bound or because they are held up by slow I/O. For example, on interactive nodes (login nodes, Fox int* nodes, UiO ML nodes) memory can be oversubscribed and the system can be paging, or there can be a shared disk that is being hammered and is holding up progress (the latter seems to be an issue on Saga login nodes, which are deadly slow but where very little computation actually happens).

As for #67, let's try to collect data if we can, and see if we can't surface it in some sensible way in Jobanalyzer.

Also see NAICNO/Jobanalyzer#399.

lars-t-hansen · 2024-02-13T11:44:20Z

If a job is not computing it's either descheduled or in I/O wait, but ideally we want to distinguish disk from tty from network, and really-ideally also distinguish the different interfaces or devices.

On an HPC node with 128 cores there can be many jobs running at the same time, and this is especially true of login and interactive nodes. So it's not quite enough to account for whole-system I/O wait (even if that might be better than nothing).
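As a rough illustration of per-process (rather than whole-system) accounting: Linux exposes per-process I/O counters in `/proc/<pid>/io`. The sketch below only parses that file; the function names are made up for this example, and how the counters would be summed per job is left open. Note that `rchar`/`wchar` count all bytes passed through read/write-style system calls (tty and pipes included), while `read_bytes`/`write_bytes` count only traffic that reached the storage layer, so the pair gives a crude disk vs. non-disk split.

```python
from pathlib import Path


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid: int) -> dict[str, int]:
    """Read the counters for one process (hypothetical helper).

    Reading another user's /proc/<pid>/io normally requires privileges.
    """
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())
```

Sampling `read_proc_io(pid)` twice and subtracting gives bytes and syscall counts per interval for that process.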

But all that said, there's no way to say objectively that "there's too much I/O wait" if a job has threads that can make progress while other threads are waiting. "Too much" is relative to an expectation. Even on a superfast disk there will be I/O wait.

One measure that might make sense is average wait (or better, time) per I/O operation. That takes sonar/Jobanalyzer out of the business of judging whether something is slow or fast, waiting or busy. An I/O count would also be helpful. Of course, going down that path one could imagine a distribution of timings by count, but I don't expect the kernel keeps that around.
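A minimal sketch of the "time per I/O operation" idea, assuming the per-device counters in `/proc/diskstats` are an acceptable source (this is per device, not per job, and the function names are made up here). Each line has the device name in the third column, reads completed in the fourth, and milliseconds spent reading in the seventh; dividing the deltas between two samples gives an average read time.

```python
def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (reads completed, milliseconds spent reading)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 7:
            stats[fields[2]] = (int(fields[3]), int(fields[6]))
    return stats


def avg_read_ms(before: dict[str, tuple[int, int]],
                after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average milliseconds per completed read between two samples.

    Devices with no completed reads in the interval are omitted.
    Counter wraparound is not handled in this sketch.
    """
    result = {}
    for dev, (reads1, ms1) in after.items():
        reads0, ms0 = before.get(dev, (reads1, ms1))
        d_reads = reads1 - reads0
        if d_reads > 0:
            result[dev] = (ms1 - ms0) / d_reads
    return result
```

In use, one would call `parse_diskstats` on the contents of `/proc/diskstats` at two points in time and pass both results to `avg_read_ms`. The read count delta doubles as the I/O count mentioned above.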

bast · 2024-02-21T17:47:30Z

But would sonar then make regular well-defined reads and writes and measure how long it takes?

lars-t-hansen · 2024-02-21T18:16:03Z

> But would sonar then make regular well-defined reads and writes and measure how long it takes?

I've been looking at this but not commenting, apparently. It looks like waiting for disk writes is not a thing; they happen in the background. So (for disk) it's mostly about waiting for reads, and not just reads made explicitly but also page-ins from mapped executables, mapped files. I believe htop presents some data about this and the first order of business is to dig into that (documentation, code) to see if it leads anywhere.
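Independent of what htop does, two fields in `/proc/<pid>/stat` relate to waiting for reads and page-ins: field 12 (`majflt`, major page faults, i.e. faults that needed a page loaded from disk) and field 42 (`delayacct_blkio_ticks`, aggregated block I/O delay in clock ticks). The sketch below extracts both; the function name is made up for this example, and as far as I know the second field is only populated when the kernel has delay accounting enabled.

```python
def parse_stat_io_fields(stat_line: str) -> tuple[int, int]:
    """Return (majflt, delayacct_blkio_ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may itself contain
    # spaces or ')', so split only after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n is rest[n - 3].
    majflt = int(rest[12 - 3])
    blkio_ticks = int(rest[42 - 3])
    return majflt, blkio_ticks
```

As with the other counters, the useful quantity is the difference between two samples of the same process.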

lars-t-hansen · 2024-04-10T08:39:03Z

This recipe produces the desired results on my Ubuntu 22 (Linux 6.5) laptop, but it does not work on a Saga login node (Linux 5.14): there I get the "Avg" display but not the detailed breakdown. Given how old that post is, the issue is probably how the kernel is configured, not its version.

lars-t-hansen added the enhancement New feature or request label Feb 8, 2024

lars-t-hansen mentioned this issue Feb 8, 2024

Input/output logging and analysis NAICNO/Jobanalyzer#399


lars-t-hansen added the important label Oct 24, 2024
