Log disk I/O #135

Open
lars-t-hansen opened this issue Feb 8, 2024 · 4 comments
Labels
enhancement New feature or request important

Comments

@lars-t-hansen
Collaborator

The use case here is jobs that are "unexpectedly slow": we want to know whether this is because they are I/O bound or because they are held up by slow I/O. For example, on interactive nodes (login nodes, Fox int* nodes, UiO ML nodes) memory can be oversubscribed so that the system is paging, or a shared disk can be hammered and hold up progress (the latter seems to be an issue on Saga login nodes, which are deadly slow even though very little computation actually happens there).

As for #67, let's try to collect data if we can, and see if we can't surface it in some sensible way in Jobanalyzer.

Also see NAICNO/Jobanalyzer#399.

@lars-t-hansen
Collaborator Author

If a job is not computing it's either descheduled or in I/O wait, but ideally we want to distinguish disk from tty from network, and really-ideally also distinguish the different interfaces or devices.

On an HPC node with 128 cores there can be many jobs running at the same time, and this is especially true of login and interactive nodes. So it's not quite enough to account for whole-system I/O wait (even if that might be better than nothing).
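As a rough illustration of per-job rather than whole-system accounting, here is a minimal sketch that parses the per-process counters Linux exposes in `/proc/<pid>/io` and sums them per job. The `sum_by_job` helper and the way processes are mapped to job IDs are hypothetical, not something sonar is known to do. Note that these counters give bytes and syscall counts, not wait time, and that reading the file for another user's process normally requires elevated privileges.

```python
from collections import defaultdict


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def sum_by_job(samples):
    """Sum counters over (job_id, counters) pairs.

    How a pid is mapped to a job_id is an assumption left to the caller.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for job_id, counters in samples:
        for key, value in counters.items():
            totals[job_id][key] += value
    return {job_id: dict(c) for job_id, c in totals.items()}


if __name__ == "__main__":
    # Read the counters of the current process as a smoke test.
    with open("/proc/self/io") as f:
        print(parse_proc_io(f.read()))
```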

But all that said, there's no way to say objectively that "there's too much I/O wait" if a job has threads that can make progress while other threads are waiting. "Too much" is relative to an expectation. Even on a superfast disk there will be I/O wait.

One measure that might make sense is average wait (or better, time) per I/O operation. Then sonar/Jobanalyzer would not have to judge whether something is slow or fast, waiting or busy. An I/O count would also be helpful. Going further down that path, one could imagine a distribution of timings by count, but I don't expect the kernel keeps that around.
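A minimal sketch of the "average time per I/O operation" idea, assuming two snapshots of `/proc/diskstats` taken some interval apart. The kernel reports, per block device, the number of completed reads and writes and the milliseconds spent on each, so dividing the deltas gives an average. This is per device, not per job, and the function names are made up for illustration; counter wraparound is ignored.

```python
def parse_diskstats(text):
    """Return {device: (reads, ms_reading, writes, ms_writing)}.

    Columns in /proc/diskstats: major, minor, name, reads completed,
    reads merged, sectors read, ms reading, writes completed,
    writes merged, sectors written, ms writing, ...
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue  # skip blank or malformed lines
        stats[f[2]] = (int(f[3]), int(f[6]), int(f[7]), int(f[10]))
    return stats


def avg_ms_per_op(before, after):
    """Average (read_ms, write_ms) per completed operation between snapshots.

    A component is None when no operation of that kind completed.
    """
    result = {}
    for dev, (r1, rms1, w1, wms1) in after.items():
        if dev not in before:
            continue
        r0, rms0, w0, wms0 = before[dev]
        dr, dw = r1 - r0, w1 - w0
        result[dev] = (
            (rms1 - rms0) / dr if dr > 0 else None,
            (wms1 - wms0) / dw if dw > 0 else None,
        )
    return result
```

In use, one would read `/proc/diskstats` at two sample times, pass both texts through `parse_diskstats`, and hand the results to `avg_ms_per_op`.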

@bast
Member

bast commented Feb 21, 2024

But would sonar then make regular well-defined reads and writes and measure how long it takes?

@lars-t-hansen
Collaborator Author

> But would sonar then make regular well-defined reads and writes and measure how long it takes?

I've been looking at this but not commenting, apparently. It looks like waiting for disk writes is not a thing; they happen in the background. So (for disk) it's mostly about waiting for reads, and not just reads made explicitly but also page-ins from mapped executables and mapped files. I believe htop presents some data about this, and the first order of business is to dig into that (documentation, code) to see if it leads anywhere.
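One place where read waits and page-ins may show up per process is `/proc/<pid>/stat`: field 12 (`majflt`) counts major page faults, i.e. faults that needed a page loaded from disk, and field 42 (`delayacct_blkio_ticks`) is the aggregated block I/O delay in clock ticks. The sketch below only shows how to extract those two fields; whether htop uses the same source is not established here. On some kernels delay accounting has to be enabled before the second counter is maintained, so a constant zero is not proof that nothing waited.

```python
def parse_proc_stat(text):
    """Extract majflt and delayacct_blkio_ticks from /proc/<pid>/stat text."""
    # The command name (field 2) is in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3].
    return {
        "majflt": int(rest[12 - 3]),
        "delayacct_blkio_ticks": int(rest[42 - 3]),
    }


if __name__ == "__main__":
    with open("/proc/self/stat") as f:
        print(parse_proc_stat(f.read()))
```

Sampling these values at two points in time and taking the difference gives the major faults and block I/O delay accumulated over the interval.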

@lars-t-hansen
Collaborator Author

This recipe produces the desired results on my Ubuntu 22 (Linux 6.5) laptop, but it does not work on a Saga login node (Linux 5.14): there I get the "Avg" display but not the detailed breakdown. Given how old that post is, the issue is probably how the kernel is configured, not its version.


2 participants