Wall-time/all tasks profiler #55889
An additional note: this kind of wall-time/all-tasks profiler has also been implemented for Go (where it is called a goroutine profiler), so there is some precedent for this in other languages as well: https://github.com/felixge/fgprof.
@nickrobinson251 I can't assign you as reviewer... Feel free to assign yourself or post review comments otherwise.
I think this is related to #55103. Could the metrics here be useful in that too?
For diagnosing excessive scheduling time? I can't immediately see how this PR would be useful for that.
#55103 seems like a much more direct approach for doing so, at least.
I just added the "needs tests" label. Can you please add Julia tests (integration tests) for this feature?
We should be testing at least these properties, maybe more:
- That it works at all
- That you do see sleeping Tasks
Sounds good. Yes, adding tests is on my TODO list for this PR.
> That you do see sleeping Tasks
Now that I'm thinking about it: can we test this reliably?
The example I was thinking about was something like:
Profile.@profile_tasks sleep(1)
But the point here is: I think it may be possible for the profiler thread to get delayed arbitrarily and by the time it samples the given task, it's no longer sleeping.
Ah, sorry, I don't literally mean sleep()ing tests. You should be able to have a bunch of Tasks waiting on a condition variable, and then do the profile, and confirm some of them show up?
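A minimal sketch of that setup, using only Base APIs. The helper names `park_on_condition` and `release_all` are hypothetical, and the profiler call itself is left as a comment because its macro name is not settled in this thread.

```julia
# Hypothetical helper: spawn `n` tasks that block on a condition variable
# until `released[]` is set to true.
function park_on_condition(n::Integer)
    cond = Threads.Condition()
    released = Ref(false)
    tasks = map(1:n) do _
        Threads.@spawn begin
            lock(cond)
            try
                while !released[]      # guard against lost or spurious wakeups
                    wait(cond)         # releases the lock while waiting
                end
            finally
                unlock(cond)
            end
            :woken
        end
    end
    return cond, released, tasks
end

# Hypothetical helper: set the flag and wake every waiter.
function release_all(cond, released)
    lock(cond)
    try
        released[] = true
        notify(cond)                   # wakes all waiters by default
    finally
        unlock(cond)
    end
end

cond, released, tasks = park_on_condition(10)
# The profiler invocation under test would run here, while every task is parked.
still_parked = !any(istaskdone, tasks)
release_all(cond, released)
results = fetch.(tasks)
```

The flag plus `while` loop means a task that reaches `wait` late still sees the release, so the test cannot hang on a missed `notify`.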
Still, given that the profiled task selection is randomized, can we test this reliably?
Yeah, not sure. If we had a fixed percentage, and you set it to 100%, we could. 🤔
But actually, I think you should be able to have 100% of Tasks waiting, in which case yes, you should be guaranteed to see waiting tasks?
ch = Channel()
for _ in 1:100
    Threads.@spawn take!(ch)
end
Profile.@profile_wall_time sleep(10) # even the root task is waiting
^ In this example all the tasks are waiting, so we should be guaranteed to see samples with waiting tasks.
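For a test, the same idea could be made checkable roughly as follows. This is only a sketch: `spawn_waiting_tasks` and `release_tasks` are hypothetical names, and the profiler call is left as a comment rather than guessing at the final macro.

```julia
# Hypothetical helper: park `n` tasks on an unbuffered channel so that
# every spawned task is waiting rather than running.
function spawn_waiting_tasks(n::Integer)
    ch = Channel{Int}()                # unbuffered: take! blocks until a put!
    tasks = [Threads.@spawn take!(ch) for _ in 1:n]
    return ch, tasks
end

# Hypothetical helper: release the parked tasks and collect what each received.
function release_tasks(ch::Channel{Int}, tasks)
    for i in 1:length(tasks)
        put!(ch, i)                    # each put! is consumed by one blocked take!
    end
    return sort!([fetch(t) for t in tasks])
end

ch, tasks = spawn_waiting_tasks(100)
# The wall-time profiler invocation under test would go here.
none_finished = !any(istaskdone, tasks)   # nothing was put yet, so no take! returned
received = release_tasks(ch, tasks)
```

Releasing the tasks afterwards keeps the test from leaking 100 blocked tasks into later tests.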
One limitation of sampling CPU/thread profiles, as is currently done in Julia, is that they primarily capture samples from CPU-intensive tasks.
If many tasks are performing IO or contending for concurrency primitives like semaphores, these tasks won’t appear in the profile, as they aren't scheduled on OS threads sampled by the profiler.
A wall-time profiler, like the one implemented in this PR, samples tasks regardless of OS thread scheduling. This enables profiling of IO-heavy tasks and detecting areas of heavy contention in the system.
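As an illustration of the kind of workload meant here (not code from this PR), the sketch below spawns tasks that spend nearly all of their wall time blocked on a semaphore. The function name `contended_workload` and its `hold_seconds` parameter are made up for the example.

```julia
# Illustrative workload: tasks that spend most of their wall time blocked
# on a semaphore instead of running on a CPU.
function contended_workload(ntasks::Integer; hold_seconds::Real = 0.01)
    sem = Base.Semaphore(1)                  # one permit, so tasks queue up
    completed = Threads.Atomic{Int}(0)
    @sync for _ in 1:ntasks
        Threads.@spawn begin
            Base.acquire(sem)                # most tasks block here
            try
                sleep(hold_seconds)          # stand-in for IO while holding the permit
                Threads.atomic_add!(completed, 1)
            finally
                Base.release(sem)
            end
        end
    end
    return completed[]
end

t0 = time()
ncompleted = contended_workload(20)
elapsed = time() - t0                        # mostly time spent waiting, not computing
```

Almost none of `elapsed` is CPU time, so this is the case where a sampling CPU/thread profile would have very little to show for the tasks queued in `acquire`.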
Unix-only for now...
Co-developed with @nickrobinson251.