Improving loading and first execution times #1504

Closed
itsdfish opened this issue Dec 26, 2020 · 6 comments

@itsdfish
Contributor

I am opening this issue in response to a discussion we had on Slack regarding loading and first execution times in Turing. Although absolute loading times have improved in newer versions of Julia, Turing still takes about 2-3 times as long to load as Plots.

Loading Times

Julia 1.5.3

@time using Plots
  6.639685 seconds (12.61 M allocations: 797.037 MiB, 4.42% gc time)
@time using Turing
 16.071072 seconds (40.04 M allocations: 2.060 GiB, 4.11% gc time)

The absolute load times are better on nightly, but Turing is still relatively slow.

Nightly

@time using Plots
  2.841848 seconds (5.80 M allocations: 429.191 MiB, 5.07% gc time, 25.36% compilation time)
@time using Turing
  9.306305 seconds (16.28 M allocations: 988.606 MiB, 3.80% gc time, 54.65% compilation time)
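Load timings taken in the same session can be skewed, because any dependency already loaded by an earlier `using` is not loaded again. A minimal sketch for timing each package in its own fresh process, assuming both packages are installed in the default environment (the helper name `fresh_load_cmd` is hypothetical):

```julia
# Build a command that times `using <pkg>` in a fresh Julia process.
# `fresh_load_cmd` is a hypothetical helper name, not part of any package.
fresh_load_cmd(pkg::AbstractString) =
    `$(Base.julia_cmd()) --startup-file=no -e "@time using $pkg"`

# Each package is loaded in its own process, so nothing is shared between runs.
for pkg in ("Plots", "Turing")
    println("Loading ", pkg, " in a fresh process:")
    run(fresh_load_cmd(pkg))
end
```

Running the loop a second time gives a better picture of steady-state load time, since the first run may also include precompilation of the package cache.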

First Execution Time

Here are typical first execution times for the coinflip model described in the documentation.

Julia 1.5.3

@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
 10.453679 seconds (42.26 M allocations: 1.876 GiB, 5.53% gc time)
Chains MCMC chain (1000×10×1 Array{Float64,3}):

Nightly

@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
 18.583285 seconds (74.80 M allocations: 4.053 GiB, 6.13% gc time, 93.57% compilation time)

On nightly, nearly all of the first execution time (93.57%) is attributable to compilation. Note that the improvement in loading time on nightly (about 6.8 seconds) is more than offset by the increase in first execution time (about 8.1 seconds). There is also a lag of about 6-7 seconds for printing the first chain, but this seems to have improved quite a bit in recent versions.
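For reference, here is a minimal sketch of a setup that reproduces the timed call. The values of `ϵ`, `τ`, `iterations`, and the simulated `data` are assumptions modeled on the documentation example, not necessarily the exact values used for the timings above. Timing the same call twice separates compilation from the actual sampling cost.

```julia
using Turing
using Random

Random.seed!(12)  # arbitrary seed, only for reproducibility

# Assumed sampler settings (not confirmed by the timings above)
iterations = 1000
ϵ = 0.05
τ = 10

# Assumed data: 100 simulated flips of a fair coin
data = rand(Bernoulli(0.5), 100)

@model function coinflip(y)
    p ~ Beta(1, 1)             # prior on the probability of heads
    for n in eachindex(y)
        y[n] ~ Bernoulli(p)    # likelihood of each flip
    end
end

# The first call includes compilation; the second call measures sampling only.
@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
```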

Reducing these times would be greatly appreciated, even if it is only by 2-3 seconds.

Versions

(@v1.5) pkg> st Turing Plots
Status `~/.julia/environments/v1.5/Project.toml`
  [91a5bcdd] Plots v1.9.1
  [fce5fe82] Turing v0.15.7

@devmotion
Member

I agree, this should definitely be improved. However, it's a bit difficult to comment without more detailed benchmarks and without knowing which parts cause these excessive compilation times. I assume a large part is caused by DistributionsAD and its hacks and type piracy. It would be interesting to see whether it causes invalidations (maybe some other parts cause them as well?). Can you check the compilation and loading times of DistributionsAD?

IIRC in the Slack discussion it was mentioned that dropping DynamicHMC and switching to a more lightweight MLE/MAP implementation would allow us to drop Requires. That is actually not true, since Requires is also used to optionally load AD support for Zygote and ReverseDiff. The optimization algorithm is supposed to be changed from Optim to GalacticOptim to make it more generally applicable, but I am not sure how much this will help: we will have to depend at least on DiffEqBase, which itself depends on quite a few packages (there is an issue in DiffEqBase to fix this and/or extract some parts to a more lightweight package: SciML/DiffEqBase.jl#618). And even if Requires could be removed from Turing, other packages such as DistributionsAD still make heavy use of it...

Regardless of the loading times, maybe it would be good to move some additional parts (such as DynamicHMC integration) to separate packages to make the setup more modular.

@itsdfish
Contributor Author

itsdfish commented Dec 26, 2020

Hi devmotion-

Here are the requested load times for DistributionsAD:

Julia 1.5.3

@time using DistributionsAD
 2.845870 seconds (5.87 M allocations: 326.353 MiB, 1.62% gc time)
 

Nightly:

@time using DistributionsAD
  2.632951 seconds (5.31 M allocations: 325.930 MiB, 2.58% gc time, 66.49% compilation time)

The improvement for nightly is small. I am not sure if that is expected. Let me know if there is any more info I can provide. Thanks for looking into this.

@devmotion
Member

Thanks!

I think it would be even more realistic to check how long it takes to load DistributionsAD + Tracker (+ ForwardDiff), since Turing loads both AD backends, whereas DistributionsAD only depends on ForwardDiff and puts Tracker support in a @require block. IIRC Requires mostly affects loading times if the optional dependency is actually loaded.
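To illustrate the mechanism, here is a minimal sketch of how a package can defer optional code with Requires. The module name and the function inside the block are hypothetical, and this is not DistributionsAD's actual code. The UUID is the one I believe Tracker uses and should be checked against Tracker's Project.toml.

```julia
module OptionalTrackerDemo  # hypothetical package name

using Requires

function __init__()
    # The block below is only evaluated once Tracker is loaded in the session.
    # Until then, it adds nothing to the loading time of this module.
    # The UUID is assumed to be Tracker's; verify it before relying on it.
    @require Tracker="9f7883ad-71c0-57eb-9f7f-b5c9e6d3789c" begin
        # Hypothetical placeholder for Tracker-specific method definitions
        tracker_support_loaded() = true
    end
end

end # module
```

With this pattern, timing `using OptionalTrackerDemo` alone does not include the cost of the Tracker-specific code. That cost only appears when Tracker is loaded as well, which is why timing DistributionsAD together with Tracker is more representative of what Turing does.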

@itsdfish
Contributor Author

Is this what you are looking for?

Julia 1.5.3

@time using DistributionsAD, Tracker, ForwardDiff
  9.621758 seconds (22.21 M allocations: 1.217 GiB, 3.77% gc time)

Nightly

@time using DistributionsAD, ForwardDiff, Tracker
  6.568642 seconds (14.62 M allocations: 916.992 MiB, 5.09% gc time, 59.66% compilation time)

@devmotion
Member

Yes, so it seems most of the time is spent loading and compiling DistributionsAD and the AD backends, which can't be fixed in Turing itself.

I had a quick look at how to analyze method invalidations but I am not sure how to interpret and fix the results yet. On Julia nightly I ran

julia> ] add SnoopCompileCore SnoopCompile Tracker ForwardDiff DistributionsAD#master

julia> using SnoopCompileCore

julia> invalidations = @snoopr begin
           using DistributionsAD, ForwardDiff, Tracker
       end
1701-element Vector{Any}:
...

julia> using SnoopCompile

julia> length(uinvalidated(invalidations))
510

julia> trees = invalidation_trees(invalidations)
17-element Vector{SnoopCompile.MethodInvalidations}:
...

I.e., as far as I can see, loading DistributionsAD, ForwardDiff, and Tracker invalidates 510 unique method instances. I'll try to debug this more closely. BTW, unfortunately it was not possible to perform the same analysis with Turing since Libtask_jll does not support Julia >= 1.6.
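A possible next step, sketched under the assumption that the `@snoopr` API used above is available and that `invalidation_trees` sorts its result so that the methods responsible for the most invalidations come last:

```julia
using SnoopCompileCore

# Record the invalidations triggered by loading the packages.
invalidations = @snoopr begin
    using DistributionsAD, ForwardDiff, Tracker
end

using SnoopCompile

trees = invalidation_trees(invalidations)

# Assumption: the last entries are the worst offenders.
# Print the triggering method for up to three of them.
for tree in trees[max(1, end - 2):end]
    println(tree.method)
end
```

Each printed method is one whose definition invalidated previously compiled code, so the definitions at the end of the list are the first candidates to inspect for type piracy or overly generic signatures.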

@itsdfish
Contributor Author

Tim Holy's tutorial on precompilation might be helpful. In case it is, you can find it here.

@yebai yebai mentioned this issue Jan 28, 2021
@yebai yebai closed this as completed Dec 16, 2021