👋

In the context of the Uyuni open source project I am looking to optimize a Python commandline tool (draft PR).

My commandline is:
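(not reproduced verbatim here; something along these lines, with `profile.speedscope.json` and `my_tool.py` as placeholder names:)

```
py-spy record --rate 100 --format speedscope -o profile.speedscope.json -- python my_tool.py
```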
The program initially spawns 5 download threads that quit midway through the execution. After the download is done, there is some processing, which is all I care about.
Strange factoid 1: according to speedscope's "Time Order" view, the flame chart, and even the raw output, all threads were somewhat active until the very last moment: I see stack traces from the download threads even close to the end. It's as if they were all interleaved, when in fact they were not (I did check with `ps -T`).
I patched a particular function called `processCapabilities`, which only runs on the single main thread after all downloading is done. Before the latest patch in the above PR, it looked like one of the biggest offenders in the flame graph.
After the patch, it takes only 3%, down from 18%. Good job!
...except that's not actually the case: the run time is actually longer. According to the py-spy outputs, it's as if all other function calls magically became longer, more than compensating for the saving in `processCapabilities`, which made little sense.
Finally I measured it with an extra block like the following (a minimal sketch rather than the literal code from the PR; `timed` is an illustrative helper name):
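```python
import time

total = 0.0  # accumulated wall-clock seconds spent in processCapabilities

def timed(func):
    """Accumulate the wall-clock time of every call to func into `total`."""
    def wrapper(*args, **kwargs):
        global total
        start = time.time()  # wall clock: an NTP step during the call skews this
        try:
            return func(*args, **kwargs)
        finally:
            total += time.time() - start
    return wrapper

@timed
def processCapabilities(capabilities):
    ...  # original body elided

# At the end of the run, `total` is compared against the overall run time.
```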
Strange factoid 2: the total duration actually got much worse with the patch instead of much better: `total` used to be 25% of the program run time (wall-clock time) and after my patch it's 41%.
Strange factoid 3: py-spy does indeed sample ~100 times per second according to the number reported on standard output when the program terminates: in a run of 657s I got 65733 samples, and 657 s × 100 samples/s ≈ 65,700, so the reported total checks out. But the number of samples in the bottom-most element of the flame graph/flame chart/speedscope is much lower: that same run shows only 37229 there, roughly 57% of the total. That does not change if I use `--threads` and look at the `all` element. Where have all the other samples gone? Could this be the culprit?
Million-dollar question: how can I even get started understanding what is going on? Are those all real bugs?
py-spy samples 100x a second, but by default it doesn't write out frames where the CPU isn't active. This is causing strange factoid 3 that you report, and I think what's happening here is that the speedscope visualization is getting messed up by the missing idle frames.
This might be a bit involved to fix unfortunately =(
Strange factoid 1 was really a PEBKAC. Looking at the time scales it is clear that threads in SpeedScope flame charts are always re-scaled to fit the whole window width, so you can't really superimpose different threads, as each has its own scale. Sorry for the noise.
Strange factoid 2 could be related to the fact that I had an issue with NTP synchronization on the machine I use for tests. Specifically, the synchronization could happen while profiling was running: could this be the reason I saw unreasonable numbers? After making sure NTP is forcibly synchronized immediately before profiling, this issue seems to have disappeared.
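A related note on the measurement itself: if the timing block uses the wall clock via `time.time()`, an NTP step lands directly in the measured interval. `time.monotonic()` never jumps, so it sidesteps the problem entirely. A minimal illustration (with `do_work` as a stand-in):

```python
import time

def do_work():
    sum(range(10**6))  # stand-in for the section being timed

# time.time() follows the system clock and can jump when NTP steps it;
# time.monotonic() only ever moves forward, so the difference of two
# readings stays meaningful even across a clock adjustment.
start = time.monotonic()
do_work()
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.3f}s")
```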
Strange factoid 3 is indeed explained by `--idle`.
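For reference, the kind of invocation that keeps idle samples (same placeholder names as above):

```
py-spy record --idle --rate 100 --format speedscope -o profile.speedscope.json -- python my_tool.py
```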
If you confirm NTP sync as a possible culprit for number 2 above, this issue can be closed.