-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all tasks profiler #186
all tasks profiler #186
Conversation
87182ca
to
7cb8d0e
Compare
Thanks for posting the MWE! Changing
but for
which is more ~45x times larger (much more than the number of threads we have, even including gcthreads) So it'd be good to know how many samples we're dropping (because we failed to select a task / selected a DONE task) ...and maybe consider recording a "placeholder" for those? |
src/signals-unix.c
Outdated
bt_data_prof[bt_size_cur++].jlvalue = (jl_value_t*)t; | ||
|
||
// store cpu cycle clock. XXX(Diogo, Nick): why are we recording the cycleclock here? | ||
bt_data_prof[bt_size_cur++].uintptr = cycleclock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nathan guessed it's so some profile viewing tools can display the profile over time (rather than as a single aggregated flamegraph), although idk if any tools are using this. Still, seems trivial to keep the same as with the existing profiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(Diogo): investigate why we need to measure the cycle-clock and whether we can just keep this line as it is for the task profiler.
} | ||
bt_context_t *context = NULL; | ||
bt_context_t c; | ||
int16_t old = -1; | ||
while (!jl_atomic_cmpswap(&t->tid, &old, ptls->tid) && old != ptls->tid) { | ||
while (!jl_atomic_cmpswap(&t->tid, &old, tid) && old != tid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to remember to look into what's up with sticky tasks https://relationalai.slack.com/archives/C07KVKH87DH/p1727113728609429
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some println debugging to this
diff --git a/src/stackwalk.c b/src/stackwalk.c
index 5c8893d286..3512cdff8a 100644
--- a/src/stackwalk.c
+++ b/src/stackwalk.c
@@ -891,7 +891,15 @@ int jl_rec_backtrace(jl_task_t *t) JL_NOTSAFEPOINT
bt_context_t *context = NULL;
bt_context_t c;
int16_t old = -1;
+ if (t->tid == -1) {
+ jl_safe_printf("\nwwwwwwwwwwww");
+ }
+ if (t->sticky) {
+ jl_safe_printf("\n***sticky***");
+ }
+ jl_safe_printf("\n[Before] t->tid %d old %d tid %d", t->tid, old, tid);
while (!jl_atomic_cmpswap(&t->tid, &old, tid) && old != tid) {
+ jl_safe_printf("\n[Loop] t->tid %d old %d tid %d", t->tid, old, tid);
int lockret = jl_lock_stackwalk();
// if this task is already running somewhere, we need to stop the thread it is running on and query its state
if (!jl_thread_suspend_and_get_state(old, 0, &c)) {
@@ -911,6 +919,7 @@ int jl_rec_backtrace(jl_task_t *t) JL_NOTSAFEPOINT
// got the wrong thread stopped, try again
jl_thread_resume(old);
}
+ jl_safe_printf("\n[After] t->tid %d old %d tid %d\n", t->tid, old, tid);
if (context == NULL && (!t->copy_stack && t->started && t->stkbuf != NULL)) {
// need to read the context from the task stored state
#if defined(_OS_WINDOWS_)
e.g.
[Before] t->tid 1 old -1 tid 32767
[Loop] t->tid 1 old 1 tid 32767
[After] t->tid 1 old 1 tid 32767
[Before] t->tid 1 old -1 tid 32767
[Loop] t->tid 1 old 1 tid 32767
[After] t->tid 1 old 1 tid 32767
[Before] t->tid 1 old -1 tid 32767
[Loop] t->tid 1 old 1 tid 32767
[After] t->tid 1 old 1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
[Before] t->tid 1 old -1 tid 32767
[Loop] t->tid 1 old 1 tid 32767
[After] t->tid 1 old 1 tid 32767
***sticky***
[Before] t->tid 16 old -1 tid 32767
[Loop] t->tid 16 old 16 tid 32767
[After] t->tid 16 old 16 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 16 old -1 tid 32767
[Loop] t->tid 16 old 16 tid 32767
[After] t->tid 16 old 16 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 16 old -1 tid 32767
[Loop] t->tid 16 old 16 tid 32767
[After] t->tid 16 old 16 tid 32767
***sticky***
[Before] t->tid 16 old -1 tid 32767
[Loop] t->tid 16 old 16 tid 32767
[After] t->tid 16 old 16 tid 32767
***sticky***
[Before] t->tid 16 old -1 tid 32767
[Loop] t->tid 16 old 16 tid 32767
[After] t->tid 16 old 16 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
***sticky***
[Before] t->tid 0 old -1 tid 32767
[Loop] t->tid 0 old 0 tid 32767
[After] t->tid 0 old 0 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
wwwwwwwwwwww
[Before] t->tid -1 old -1 tid 32767
[After] t->tid 32767 old -1 tid 32767
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From discussion: there might be an issue with this code when we are attempting to record a backtrace from a sticky task.
More specifically, sticky tasks (even if unscheduled) always have their thread-id set to a thread-id of the Julia thread that may run it, so there is a chance that this code will look at the thread-id of a sticky task, think that it's actually a task that's running, and then record the context of the Julia thread at the moment it was stopped -- which may be different from the tasks's context if that Julia thread happens to be running another task.
1bf5ba7
to
5e46012
Compare
66004cc
to
d48fd77
Compare
d48fd77
to
2defb28
Compare
Will open another PR to backport JuliaLang#55889. |
PR Description
Extends @nickrobinson251's #177.
Needs further testing to check whether the output produced on Linux/MacOS makes sense.
There are a few code changes which are vestigial from the time I was thinking about adding a
ptls
for the signal listener thread -- but figured this was not needed after discussing with @kpamnany.MWE
Example Output on Linux (--threads=8)
Example Output on MacOS (--threads=8)
Checklist
Requirements for merging:
port-to-*
labels that don't apply.