Improving loading and first execution times #1504
Comments
I agree, this should definitely be improved. However, it's a bit difficult to comment without more detailed benchmarks and without knowing which parts cause these excessive compilation times. I assume a large part is caused by DistributionsAD and its hacks and type piracy; it would be interesting to see if it causes invalidations (maybe some other parts cause them as well?). Can you check the compilation and loading times of DistributionsAD?

IIRC in the Slack discussion it was mentioned that dropping DynamicHMC and switching to a more lightweight MLE/MAP implementation would allow us to drop Requires. That is actually not true, since Requires is also used to optionally load AD support for Zygote and ReverseDiff. The optimization package is supposed to be changed from Optim to GalacticOptim to make it more generally applicable, but I am not sure how much this will help: we would have to depend at least on DiffEqBase, which itself depends on quite a few packages (there is an issue in DiffEqBase to fix this and/or extract some parts into a more lightweight package: SciML/DiffEqBase.jl#618). And even if Requires could be removed from Turing, other packages such as DistributionsAD still make heavy use of it...

Regardless of the loading times, maybe it would be good to move some additional parts (such as the DynamicHMC integration) to separate packages to make the setup more modular.
Hi devmotion, here are the requested load times for DistributionsAD:
Julia 1.5.3
Nightly:
The improvement for nightly is small. I am not sure if that is expected. Let me know if there is any more info I can provide. Thanks for looking into this.
Thanks! I think it would be even more realistic to check how much time it takes to load DistributionsAD + Tracker (+ ForwardDiff) since Turing loads both AD backends but DistributionsAD only depends on ForwardDiff and puts Tracker support in a |
Is this what you are looking for?
Julia 1.5.3
Nightly
Yes, so it seems most of the time is spent loading and compiling DistributionsAD and the AD backends, which can't be fixed in Turing itself. I had a quick look at how to analyze method invalidations, but I am not sure how to interpret and fix the results yet. On Julia nightly I ran
julia> ] add SnoopCompileCore SnoopCompile Tracker ForwardDiff DistributionsAD#master
julia> using SnoopCompileCore
julia> invalidations = @snoopr begin
using DistributionsAD, ForwardDiff, Tracker
end
1701-element Vector{Any}:
...
julia> using SnoopCompile
julia> length(uinvalidated(invalidations))
510
julia> trees = invalidation_trees(invalidations)
17-element Vector{SnoopCompile.MethodInvalidations}:
...
I.e., as far as I can see, loading DistributionsAD, ForwardDiff, and Tracker causes 510 method invalidations. I'll try to debug this more closely. BTW, unfortunately it was not possible to perform the same analysis with Turing, since Libtask_jll does not support Julia >= 1.6.
Tim Holy's tutorial on precompilation might be helpful. In case it is, you can find it here.
I am opening this issue in response to a discussion we had on Slack regarding loading and execution times in Turing. Although there have been improvements in absolute loading time in newer versions of Julia, Turing remains about 2-3 times slower than Plots.
Loading Times
Julia 1.5.3
The absolute load times are better on nightly, but Turing is still relatively slow.
Nightly
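For anyone who wants to reproduce load-time measurements like these, here is a minimal sketch of one way to do it. It is not the exact procedure used for the numbers in this issue. The helper name `load_time` is hypothetical. It starts a fresh Julia process for each measurement, on the assumption that packages already loaded in the current session would otherwise make `using` look much faster than it is.

```julia
# Hypothetical helper: time `using <pkg>` in a fresh Julia process, so that
# packages already loaded in the current session do not distort the result.
function load_time(pkg::AbstractString;
                   project::AbstractString = dirname(Base.active_project()))
    # Code run by the child process: record the time, load the package,
    # and print the elapsed time in seconds.
    code = "t0 = time_ns(); using $pkg; print((time_ns() - t0) / 1e9)"
    cmd = `$(Base.julia_cmd()) --startup-file=no --project=$project -e $code`
    return parse(Float64, read(cmd, String))
end

# Example with a standard library; replace with "Turing" or "Plots" in an
# environment where those packages are installed and precompiled.
println("Dates: ", load_time("Dates"), " s")
```

Running it a few times and taking the minimum should reduce noise from disk caching. The first call after installing or updating a package would also include precompilation, so it is probably best discarded.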
First Execution Time
Here are typical first execution times for the coinflip model described in the documentation.
Julia 1.5.3
Nightly
Nearly all of the first execution time is attributable to compilation. It should be noted that the improvement in loading time on nightly is offset by the increased first execution time. There is also a lag of about 6-7 seconds for printing the first chain, but this seems to have improved quite a bit in recent versions.
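A rough way to see how much of the first execution time is compilation is to time the same `sample` call twice in one session. The sketch below assumes a coinflip model along the lines of the one in the Turing documentation. The data vector and the sampler settings (`HMC(0.05, 10)`, 1000 iterations) are illustrative assumptions, not the ones used for the timings reported here.

```julia
using Turing

# Coinflip model in the style of the Turing documentation:
# a Beta(1, 1) prior on the probability of heads and a Bernoulli likelihood.
@model function coinflip(y)
    p ~ Beta(1, 1)
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)
    end
end

# Illustrative data (assumption): 1 = heads, 0 = tails.
data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# First call: the reported time includes compilation of the model and sampler.
@time chain = sample(coinflip(data), HMC(0.05, 10), 1000)

# Second call in the same session: compilation is already done,
# so this is mostly pure sampling time.
@time chain = sample(coinflip(data), HMC(0.05, 10), 1000)
```

The difference between the two timings gives an estimate of the compilation overhead on a given Julia version.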
Reducing these times would be greatly appreciated, even if it is only by 2-3 seconds.
Versions