Sonar is slow without --batchless because of how we get slurm job IDs #97
That sounds very slow. This is built with …
Yes. The problem, I think, is that we moved process filtering much later in the pipeline, so this task is run much more often than it used to be. The two obvious approaches to fixing this are to not compute the job ID until we need it (though I don't know how helpful that will be) and to avoid the shell pipeline for what is, after all, the very simple job of extracting some text from a file, which does not actually need a regular expression.
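A minimal sketch of what a pipeline-free lookup might look like (this is not sonar's actual code; the `/job_` path component is an assumption about how Slurm lays out its cgroup paths, e.g. `.../slurm/uid_1000/job_123456/...`):

```rust
use std::fs;

// Read /proc/<pid>/cgroup directly and scan it with plain string
// operations -- no shell, no external processes, no regex.
fn get_slurm_job_id(pid: u32) -> Option<usize> {
    let contents = fs::read_to_string(format!("/proc/{}/cgroup", pid)).ok()?;
    for line in contents.lines() {
        if let Some(ix) = line.find("/job_") {
            let rest = &line[ix + "/job_".len()..];
            let end = rest
                .find(|c: char| !c.is_ascii_digit())
                .unwrap_or(rest.len());
            if end > 0 {
                return rest[..end].parse().ok();
            }
        }
    }
    None
}

fn main() {
    // Outside a Slurm job this prints "None".
    println!("{:?}", get_slurm_job_id(std::process::id()));
}
```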
OK. Indeed we need to fix this. It needs to run well below 1 second per poll, ideally in milliseconds. I think we might need to do both: avoid the shell pipeline and delay the computation as much as we can.
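The "delay it" half could look something like this hedged sketch: a hypothetical per-process record that computes the job ID at most once, and only when it is actually asked for (`get_slurm_job_id` here is the direct file scan from the sketch above):

```rust
use std::cell::OnceCell;

// Hypothetical process record: the cgroup lookup runs lazily, at most
// once, and only for processes that survive filtering.
struct ProcInfo {
    pid: u32,
    slurm_job_id: OnceCell<Option<usize>>,
}

impl ProcInfo {
    fn new(pid: u32) -> Self {
        ProcInfo { pid, slurm_job_id: OnceCell::new() }
    }

    fn slurm_job_id(&self) -> Option<usize> {
        *self.slurm_job_id.get_or_init(|| get_slurm_job_id(self.pid))
    }
}
```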
I'll take a look after lunch, since technically I introduced this bug :-)
Avoiding the pipeline brings the time of the slow version down to the time of the fast version, and this is on a system where the matching line is never found because there is no batch job system, so the entire cgroup file has to be read and parsed. I'm not sure how well I'll be able to test this locally (yet) but I'll look into that.
Another factor that I don't know how to think about yet is that, at least on the compute nodes on Fox, once I've found a slurm ID for one process, it could look like all the processes on the node that have a slurm ID have the same slurm ID. This would be a tricky invariant to rely on, and it's probably not low-hanging fruit. Let's see what the profile looks like after we've fixed #86, #87, and #88.
Running a sonar release build just now on a lightly-loaded ML node (ml7, a beefy AMD system), it runs in 0.27s real time with `--batchless` and in 2.5s real time without `--batchless` (about 10x). The difference is even more stark on my development system (a slightly older Xeon tower): 0.03s vs 1.63s (about 50x).

I run with `--exclude-users=root --exclude-system-jobs --rollup` to keep the amount of output to a minimum, so that we know it's not output generation that's the main problem.

Running `perf` on this, it is clear that the problem is in `get_slurm_job_id`: every profiling hit in the first several pages of profiler output is in the pipeline that that function runs to get the job ID. We can probably do much better here (and we'll need to).
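For reference, a profile like this can be captured along these lines (a sketch, assuming the flags above are passed to sonar's `ps` subcommand and the release binary lives in `target/release`):

```sh
perf record -g ./target/release/sonar ps --exclude-users=root --exclude-system-jobs --rollup
perf report
```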