Enzyme segfaults on Turing model #650

Closed
sethaxen opened this issue Mar 5, 2023 · 81 comments

@sethaxen
Collaborator

sethaxen commented Mar 5, 2023

I just rechecked the model from TuringLang/Turing.jl#1887 (comment) on that branch. After having been fixed in #457, it once again segfaults after emitting warnings. The complete code sample is below:

using Turing
using Enzyme

@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10)

The full stacktrace can be found at https://gist.github.com/sethaxen/5666e1c6c9d8194e0370c60eb70de49e#file-log-txt

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = auto

julia> using Pkg; Pkg.status()
Status `~/Downloads/enzyme_turing_test/Project.toml`
  [7da242da] Enzyme v0.11.0-dev `https://github.com/EnzymeAD/Enzyme.jl.git#main`
  [f151be2c] EnzymeCore v0.2.1 `https://github.com/EnzymeAD/Enzyme.jl.git:lib/EnzymeCore#main`
  [fce5fe82] Turing v0.24.1 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`

It also fails on the latest release of Enzyme.

@wsmoses
Member

wsmoses commented Mar 5, 2023

@sethaxen can you extract this out so we can see the function being passed to autodiff?

As is it's hard to see what function is being differentiated, in order to debug.

@wsmoses
Member

wsmoses commented Mar 5, 2023

@sethaxen this runs correctly for me on Enzyme#main and Julia 1.9.

Output of the first run (which includes compilation) and of the second run is below:

┌ Info: Found initial step size
└   ϵ = 0.4
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:05
Chains MCMC chain (10×14×1 Array{Float64, 3}):

Iterations        = 6:1:15
Number of chains  = 1
Samples per chain = 10
Wall duration     = 66.48 seconds
Compute duration  = 66.48 seconds
parameters        = m, s
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64 

           m    0.3281    0.9148    0.2893    10.0000        NaN    1.0216        0.1504
           s    3.3405    4.6931    1.4841     9.3304     6.6667    1.3839        0.1403

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           m   -1.1945   -0.1198    0.5916    1.1116    1.1116
           s    0.3560    0.3560    2.4453    3.5200   13.3314


julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10)
┌ Info: Found initial step size
└   ϵ = 1.6
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (10×14×1 Array{Float64, 3}):

Iterations        = 6:1:15
Number of chains  = 1
Samples per chain = 10
Wall duration     = 0.01 seconds
Compute duration  = 0.01 seconds
parameters        = m, s
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64 

           m    0.1278    0.4991    0.1578    10.0000    10.0000    1.0588     1666.6667
           s    1.8968    1.7990    0.5689    10.0000    10.0000    1.1069     1666.6667

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           m   -0.7156   -0.1037    0.0939    0.3510    0.8085
           s    0.8492    0.9252    0.9904    1.7534    5.6334

@sethaxen
Collaborator Author

sethaxen commented Mar 5, 2023

> @sethaxen can you extract this out so we can see the function being passed to autodiff?
>
> As is it's hard to see what function is being differentiated, in order to debug.

Sure, here's the version that contains the call to autodiff:

using Turing, Enzyme
using Turing.LogDensityProblems

@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

mod = model()
sampler = DynamicPPL.Sampler(NUTS())
vi = DynamicPPL.VarInfo(mod)
vi = DynamicPPL.link!!(vi, sampler, mod)
ℓ = Turing.LogDensityFunction(vi, mod, sampler, DynamicPPL.DefaultContext())
x = vi[sampler]  # Vector{Float64}
∂ℓ_∂x = zero(x)
Enzyme.autodiff(
    Reverse,
    LogDensityProblems.logdensity,
    Enzyme.Active,
    Enzyme.Const(ℓ),
    Enzyme.Duplicated(x, ∂ℓ_∂x),
)

> @sethaxen this runs correctly for me on Enzyme#main and Julia 1.9.

Strange, because this also segfaults for me on Julia 1.9.

@wsmoses
Member

wsmoses commented Mar 5, 2023

Odd, okay, would you be able to simplify the above? E.g. simplify and/or inline as much as possible?

@sethaxen
Collaborator Author

sethaxen commented Mar 7, 2023

No, sorry, I am not familiar with the inner workings of this code and have no time right now.

@ViralBShah
Contributor

@yebai Is this the same as the issue you mentioned in #658?

@yebai

yebai commented Mar 9, 2023

No -- this issue is already fixed by using an immutable internal data structure (SimpleVarInfo if you are interested). This issue can be closed now.

PS. Here is a working example of Turing using Enzyme:

julia> using Distributions, DynamicPPL, LogDensityProblems, LogDensityProblemsAD, Enzyme

julia> @model demo() = x ~ Normal()
demo (generic function with 2 methods)

julia> model = demo()
Model{typeof(demo), (), (), (), Tuple{}, Tuple{}, DefaultContext}(demo, NamedTuple(), NamedTuple(), DefaultContext())

julia> f = DynamicPPL.LogDensityFunction(model, SimpleVarInfo((x = 1.0, )));

julia> fwithgrad = ADgradient(:Enzyme, f);

julia> LogDensityProblems.logdensity_and_gradient(fwithgrad, [1.0])
(-1.4189385332046727, [-1.0])

@sethaxen
Collaborator Author

sethaxen commented Mar 9, 2023

> No -- this issue is already fixed by using an immutable internal data structure (SimpleVarInfo if you are interested). This issue can be closed now.

No, this is not fixed by using SimpleVarInfo. We found that with any of s ~ InverseGamma(), s ~ Gamma(), or s ~ truncated(Normal(); lower=0), we get the same segfaults. If we remove s and x entirely, then everything is fine, even with the usual VarInfo.

@devmotion
Contributor

As another data point, the example in #650 (comment) segfaults for me with Enzyme#main, EnzymeCore#main and Enzyme_jll#main on Julia 1.9 rc1.

Using DynamicPPL.SimpleVarInfo, i.e., replacing the line vi = DynamicPPL.VarInfo(mod) with vi = DynamicPPL.SimpleVarInfo(mod), fixes the segfault but yields an error:

julia> Enzyme.autodiff(
           Reverse,
           LogDensityProblems.logdensity,
           Enzyme.Active,
           Enzyme.Const(ℓ),
           Enzyme.Duplicated(x, ∂ℓ_∂x),
       )
ERROR: Return type inferred to be Union{}. Giving up.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] #s479#163
   @ ~/.julia/packages/Enzyme/SUstD/src/compiler.jl:8160 [inlined]
 [3] var"#s479#163"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ShadowInit::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
   @ Enzyme.Compiler ./none:0
 [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
   @ Core ./boot.jl:602
 [5] thunk(f::typeof(LogDensityProblems.logdensity), df::Nothing, ::Type{Duplicated{Union{}}}, tt::Type{Tuple{Const{LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}}, Duplicated{Vector{Float64}}}}, ::Val{Enzyme.API.DEM_ReverseModeGradient}, ::Val{1}, ::Val{(false, false, false)}, ::Val{false}, ::Val{true})
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/SUstD/src/compiler.jl:8218
 [6] autodiff(::EnzymeCore.ReverseMode{false, false}, ::typeof(LogDensityProblems.logdensity), ::Type{Active}, ::Const{LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}}, ::Vararg{Any})
   @ Enzyme ~/.julia/packages/Enzyme/SUstD/src/Enzyme.jl:185
 [7] top-level scope
   @ REPL[10]:1

Surprisingly, it seems the return type of the logdensity function can't be inferred even though we work with a simple NamedTuple here:

julia> @code_warntype LogDensityProblems.logdensity(ℓ, x)
MethodInstance for LogDensityProblems.logdensity(::LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}, ::Vector{Float64})
  from logdensity(f::LogDensityFunction, θ::AbstractVector) @ DynamicPPL ~/.julia/packages/DynamicPPL/UFajj/src/logdensityfunction.jl:92
Arguments
  #self#::Core.Const(LogDensityProblems.logdensity)
  f::LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}
  θ::Vector{Float64}
Locals
  vi_new::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
Body::Union{}
1 ─ %1 = Base.getproperty(f, :varinfo)::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
│   %2 = Base.getproperty(f, :context)::DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}
│        (vi_new = DynamicPPL.unflatten(%1, %2, θ))
│   %4 = Base.getproperty(f, :model)::Core.Const(DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}(model, NamedTuple(), NamedTuple(), DynamicPPL.DefaultContext()))
│   %5 = vi_new::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
│   %6 = Base.getproperty(f, :context)::DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}
│        DynamicPPL.evaluate!!(%4, %5, %6)
│        Core.Const(:(DynamicPPL.last(%7)))
│        Core.Const(:(DynamicPPL.getlogp(%8)))
└──      Core.Const(:(return %9))

@vchuravy
Member

Union{} means "guaranteed to error."

What does executing the function normally yield?
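
To illustrate the point about `Union{}`, here is a minimal sketch using only Base. The functions `always_throws` and `returns_float` are hypothetical names made up for this example and are unrelated to DynamicPPL. Inference assigns the return type `Union{}` to a method that can only throw, which is what a `Body::Union{}` line in `@code_warntype` output indicates.

```julia
# Hypothetical function for illustration: every call throws.
always_throws(x) = error("this call can never return")

# Hypothetical counterpart that returns normally.
returns_float(x) = 2.0 * x

# Ask inference for the return type of each method given a Float64 argument.
throwing_type = only(Base.return_types(always_throws, (Float64,)))
normal_type = only(Base.return_types(returns_float, (Float64,)))

println("always_throws infers to ", throwing_type)
println("returns_float infers to ", normal_type)
```

So an inferred `Union{}` is usually a hint to call the function directly and read the exception it throws, rather than a sign that inference gave up.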

@devmotion
Contributor

It should return a Float64, but after getting some sleep it's clear to me why the SimpleVarInfo version fails. The reduced example in #650 (comment) is not correct: it introduces a mix of ForwardDiff.Dual and sampling in the log density evaluation. The log density function should be constructed by

ℓ = Turing.LogDensityFunction(mod, vi, DynamicPPL.DefaultContext())

instead. Then the logdensity can be evaluated correctly and Enzyme can compute the gradient (when using SimpleVarInfo; tested with Enzyme#main, EnzymeCore#main, Enzyme_jll#main on Julia 1.9 rc1):

julia> autodiff(
           ReverseWithPrimal,
           LogDensityProblems.logdensity,
           ℓ,
           Duplicated(x, ∂ℓ_∂x),
       )
((nothing, nothing), -4.2993310577423145)

julia> ∂ℓ_∂x
3-element Vector{Float64}:
  0.4264673357364165
 -0.6593350799670791
  0.5109041418560486

julia> LogDensityProblems.logdensity(ℓ, x)
-4.2993310577423145

julia> ForwardDiff.gradient(x -> logjoint(mod, DynamicPPL.SimpleVarInfo((m = x[1], s = x[2], x = x[3]), zero(eltype(x)), DynamicPPL.DynamicTransformation())), x)
3-element Vector{Float64}:
  0.4264673357364165
 -0.6593350799670789
  0.5109041418560486
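
The cross-check above compares two AD backends against each other. A third, dependency-free sanity check is a central finite difference. The sketch below is not the DynamicPPL code path: `toy_logjoint`, `toy_gradient`, and `fd_gradient` are hypothetical names, and the log density is written by hand under the assumptions that `InverseGamma()` has shape 1 and scale 1 and that linking maps `s` to `u = log(s)` (which adds `u` as the log-Jacobian term). It is not meant to reproduce the numbers printed above.

```julia
# Hand-written log joint of the model in unconstrained space θ = (m, u, x),
# with s = exp(u). Assumes InverseGamma(1, 1) and a log transform for s.
function toy_logjoint(θ)
    m, u, x = θ
    s = exp(u)
    lp_m = -0.5 * log(2π) - 0.5 * m^2                        # m ~ Normal(0, 1)
    lp_s = -2 * log(s) - 1 / s                               # s ~ InverseGamma(1, 1)
    lp_x = -log(s) - 0.5 * log(2π) - 0.5 * ((x - m) / s)^2   # x ~ Normal(m, s)
    return lp_m + lp_s + lp_x + u                            # + u: log-Jacobian of s = exp(u)
end

# Analytic gradient of toy_logjoint, derived by hand.
function toy_gradient(θ)
    m, u, x = θ
    w = exp(-2u)  # equals 1 / s^2
    return [-m + (x - m) * w, -2 + exp(-u) + (x - m)^2 * w, -(x - m) * w]
end

# Central finite-difference approximation of the gradient of f at θ.
function fd_gradient(f, θ; h=1e-6)
    g = similar(θ)
    for i in eachindex(θ)
        θp = copy(θ); θp[i] += h
        θm = copy(θ); θm[i] -= h
        g[i] = (f(θp) - f(θm)) / (2h)
    end
    return g
end

θ = [0.3, -0.2, 0.5]
max_abs_diff = maximum(abs.(fd_gradient(toy_logjoint, θ) .- toy_gradient(θ)))
println("largest absolute difference: ", max_abs_diff)
```

The same `fd_gradient` helper could be pointed at any scalar function of a `Vector{Float64}`, for example a closure around `LogDensityProblems.logdensity`, to compare against an AD result.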

@wsmoses
Member

wsmoses commented Mar 12, 2023

@devmotion Unfortunately, since I cannot reproduce the segfault on main, you'll need to minimize it (and hopefully thereby allow me to reproduce it) before I can start any investigation or fix.

@devmotion
Contributor

I don't understand how it's possible that you can successfully run the example in the OP. For me, the same happens as @sethaxen described above: When I run

using Turing, Enzyme

@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10)

I get a lot of warnings and then Julia segfaults. I used the latest version of Enzyme, Julia, and the Turing branch with Enzyme support:

julia> versioninfo()
Julia Version 1.9.0-rc1
Commit 3b2e0d8fbc1 (2023-03-07 07:51 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 3 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 3
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_EDITOR = code
  JULIA_PKG_SERVER = https://pumasai.juliahub.com

(enzyme) pkg> st
Status `~/sources/enzyme/Project.toml`
  [7da242da] Enzyme v0.11.0-dev `https://github.com/EnzymeAD/Enzyme.jl.git#main`
  [f151be2c] EnzymeCore v0.2.1 `https://github.com/EnzymeAD/Enzyme.jl.git:lib/EnzymeCore#main`
  [fce5fe82] Turing v0.24.1 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`
  [7cc45869] Enzyme_jll v0.0.51+0 `https://github.com/JuliaBinaryWrappers/Enzyme_jll.jl.git#main`

Does your setup in some way differ from ours?

@devmotion
Contributor

I found it 🎉 I had a final idea, based on the differences I had observed in #659 between sampling/computing derivatives with a single thread and with multiple threads (or the respective methods for these cases). And indeed, when I unset the JULIA_NUM_THREADS environment variable and start Julia single-threaded, sampling succeeds. However, it still emits all these warnings, which possibly require some fixes or at least should not show up in non-debug mode, I think. Interestingly, the warnings/stacktraces point to GPUCompiler even though I don't run anything on the GPU?! For instance,

┌ Warning: TypeAnalysisDepthLimit
│ {[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [8]:Pointer, [8,0]:Pointer, [8,0,0]:Pointer, [8,8]:Integer, [8,9]:Integer, [8,10]:Integer, [8,11]:Integer, [8,12]:Integer, [8,13]:Integer, [8,14]:Integer, [8,15]:Integer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [8,24]:Integer, [8,25]:Integer, [8,26]:Integer, [8,27]:Integer, [8,28]:Integer, [8,29]:Integer, [8,30]:Integer, [8,31]:Integer, [8,32]:Integer, [8,33]:Integer, [8,34]:Integer, [8,35]:Integer, [8,36]:Integer, [8,37]:Integer, [8,38]:Integer, [8,39]:Integer, [8,40]:Integer, [16]:Pointer, [16,0]:Pointer, [16,0,0]:Pointer, [16,0,0,0]:Pointer, [16,0,0,0,0]:Pointer, [16,0,0,0,0,0]:Integer, [16,0,0,0,0,1]:Integer, [16,0,0,0,0,2]:Integer, [16,0,0,0,0,3]:Integer, [16,0,0,0,0,4]:Integer, [16,0,0,0,0,5]:Integer, [16,0,0,0,0,6]:Integer, [16,0,0,0,0,7]:Integer, [16,0,0,0,8]:Integer, [16,0,0,0,9]:Integer, [16,0,0,0,10]:Integer, [16,0,0,0,11]:Integer, [16,0,0,0,12]:Integer, [16,0,0,0,13]:Integer, [16,0,0,0,14]:Integer, [16,0,0,0,15]:Integer, [16,0,0,0,16]:Integer, [16,0,0,0,17]:Integer, [16,0,0,0,18]:Integer, [16,0,0,0,19]:Integer, [16,0,0,0,20]:Integer, [16,0,0,0,21]:Integer, [16,0,0,0,22]:Integer, [16,0,0,0,23]:Integer, [16,0,0,0,24]:Integer, [16,0,0,0,25]:Integer, [16,0,0,0,26]:Integer, [16,0,0,0,27]:Integer, [16,0,0,0,28]:Integer, [16,0,0,0,29]:Integer, [16,0,0,0,30]:Integer, [16,0,0,0,31]:Integer, 
[16,0,0,0,32]:Integer, [16,0,0,0,33]:Integer, [16,0,0,0,34]:Integer, [16,0,0,0,35]:Integer, [16,0,0,0,36]:Integer, [16,0,0,0,37]:Integer, [16,0,0,0,38]:Integer, [16,0,0,0,39]:Integer, [16,0,0,0,40]:Integer, [16,0,0,8]:Integer, [16,0,0,9]:Integer, [16,0,0,10]:Integer, [16,0,0,11]:Integer, [16,0,0,12]:Integer, [16,0,0,13]:Integer, [16,0,0,14]:Integer, [16,0,0,15]:Integer, [16,0,0,16]:Integer, [16,0,0,17]:Integer, [16,0,0,18]:Integer, [16,0,0,19]:Integer, [16,0,0,20]:Integer, [16,0,0,21]:Integer, [16,0,0,22]:Integer, [16,0,0,23]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Integer, [16,33]:Integer, [16,34]:Integer, [16,35]:Integer, [16,36]:Integer, [16,37]:Integer, [16,38]:Integer, [16,39]:Integer, [16,40]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Integer, [41]:Integer, [42]:Integer, [43]:Integer, [44]:Integer, [45]:Integer, [46]:Integer, [47]:Integer, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer}
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:50
not handling more than 6 pointer lookups deep dt:{[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [8]:Pointer, [8,0]:Pointer, [8,0,0]:Pointer, [8,8]:Integer, [8,9]:Integer, [8,10]:Integer, [8,11]:Integer, [8,12]:Integer, [8,13]:Integer, [8,14]:Integer, [8,15]:Integer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [8,24]:Integer, [8,25]:Integer, [8,26]:Integer, [8,27]:Integer, [8,28]:Integer, [8,29]:Integer, [8,30]:Integer, [8,31]:Integer, [8,32]:Integer, [8,33]:Integer, [8,34]:Integer, [8,35]:Integer, [8,36]:Integer, [8,37]:Integer, [8,38]:Integer, [8,39]:Integer, [8,40]:Integer, [16]:Pointer, [16,0]:Pointer, [16,0,0]:Pointer, [16,0,0,0]:Pointer, [16,0,0,0,0]:Pointer, [16,0,0,0,0,0]:Integer, [16,0,0,0,0,1]:Integer, [16,0,0,0,0,2]:Integer, [16,0,0,0,0,3]:Integer, [16,0,0,0,0,4]:Integer, [16,0,0,0,0,5]:Integer, [16,0,0,0,0,6]:Integer, [16,0,0,0,0,7]:Integer, [16,0,0,0,8]:Integer, [16,0,0,0,9]:Integer, [16,0,0,0,10]:Integer, [16,0,0,0,11]:Integer, [16,0,0,0,12]:Integer, [16,0,0,0,13]:Integer, [16,0,0,0,14]:Integer, [16,0,0,0,15]:Integer, [16,0,0,0,16]:Integer, [16,0,0,0,17]:Integer, [16,0,0,0,18]:Integer, [16,0,0,0,19]:Integer, [16,0,0,0,20]:Integer, [16,0,0,0,21]:Integer, [16,0,0,0,22]:Integer, [16,0,0,0,23]:Integer, [16,0,0,0,24]:Integer, [16,0,0,0,25]:Integer, [16,0,0,0,26]:Integer, [16,0,0,0,27]:Integer, [16,0,0,0,28]:Integer, [16,0,0,0,29]:Integer, 
[16,0,0,0,30]:Integer, [16,0,0,0,31]:Integer, [16,0,0,0,32]:Integer, [16,0,0,0,33]:Integer, [16,0,0,0,34]:Integer, [16,0,0,0,35]:Integer, [16,0,0,0,36]:Integer, [16,0,0,0,37]:Integer, [16,0,0,0,38]:Integer, [16,0,0,0,39]:Integer, [16,0,0,0,40]:Integer, [16,0,0,8]:Integer, [16,0,0,9]:Integer, [16,0,0,10]:Integer, [16,0,0,11]:Integer, [16,0,0,12]:Integer, [16,0,0,13]:Integer, [16,0,0,14]:Integer, [16,0,0,15]:Integer, [16,0,0,16]:Integer, [16,0,0,17]:Integer, [16,0,0,18]:Integer, [16,0,0,19]:Integer, [16,0,0,20]:Integer, [16,0,0,21]:Integer, [16,0,0,22]:Integer, [16,0,0,23]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Integer, [16,33]:Integer, [16,34]:Integer, [16,35]:Integer, [16,36]:Integer, [16,37]:Integer, [16,38]:Integer, [16,39]:Integer, [16,40]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Integer, [41]:Integer, [42]:Integer, [43]:Integer, [44]:Integer, [45]:Integer, [46]:Integer, [47]:Integer, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer} only(56): 
┌ Warning: TypeAnalysisDepthLimit
│   store {} addrspace(10)* %.fca.0.0.1.7.extract, {} addrspace(10)* addrspace(10)* %.fca.0.0.1.7.gep, align 8, !dbg !19
│ {[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Pointer, [0,0,0,0]:Integer, [0,0,8]:Integer, [0,0,9]:Integer, [0,0,10]:Integer, [0,0,11]:Integer, [0,0,12]:Integer, [0,0,13]:Integer, [0,0,14]:Integer, [0,0,15]:Integer, [0,0,16]:Integer, [0,0,17]:Integer, [0,0,18]:Integer, [0,0,19]:Integer, [0,0,20]:Integer, [0,0,21]:Integer, [0,0,22]:Integer, [0,0,23]:Integer, [0,0,24]:Integer, [0,0,25]:Integer, [0,0,26]:Integer, [0,0,27]:Integer, [0,0,28]:Integer, [0,0,29]:Integer, [0,0,30]:Integer, [0,0,31]:Integer, [0,0,32]:Integer, [0,0,33]:Integer, [0,0,34]:Integer, [0,0,35]:Integer, [0,0,36]:Integer, [0,0,37]:Integer, [0,0,38]:Integer, [0,0,39]:Integer, [0,0,40]:Integer, [0,8]:Pointer, [0,8,0]:Pointer, [0,8,0,0]:Pointer, [0,8,8]:Integer, [0,8,9]:Integer, [0,8,10]:Integer, [0,8,11]:Integer, [0,8,12]:Integer, [0,8,13]:Integer, [0,8,14]:Integer, [0,8,15]:Integer, [0,8,16]:Integer, [0,8,17]:Integer, [0,8,18]:Integer, [0,8,19]:Integer, [0,8,20]:Integer, [0,8,21]:Integer, [0,8,22]:Integer, [0,8,23]:Integer, [0,8,24]:Integer, [0,8,25]:Integer, [0,8,26]:Integer, [0,8,27]:Integer, [0,8,28]:Integer, [0,8,29]:Integer, [0,8,30]:Integer, [0,8,31]:Integer, [0,8,32]:Integer, [0,8,33]:Integer, [0,8,34]:Integer, [0,8,35]:Integer, [0,8,36]:Integer, [0,8,37]:Integer, [0,8,38]:Integer, [0,8,39]:Integer, [0,8,40]:Integer, [0,16]:Pointer, [0,16,0]:Pointer, [0,16,0,0]:Pointer, [0,16,0,0,0]:Pointer, [0,16,0,0,0,0]:Pointer, [0,16,0,0,0,8]:Integer, [0,16,0,0,0,9]:Integer, [0,16,0,0,0,10]:Integer, [0,16,0,0,0,11]:Integer, [0,16,0,0,0,12]:Integer, [0,16,0,0,0,13]:Integer, [0,16,0,0,0,14]:Integer, [0,16,0,0,0,15]:Integer, [0,16,0,0,0,16]:Integer, [0,16,0,0,0,17]:Integer, [0,16,0,0,0,18]:Integer, [0,16,0,0,0,19]:Integer, [0,16,0,0,0,20]:Integer, [0,16,0,0,0,21]:Integer, [0,16,0,0,0,22]:Integer, [0,16,0,0,0,23]:Integer, [0,16,0,0,0,24]:Integer, [0,16,0,0,0,25]:Integer, [0,16,0,0,0,26]:Integer, [0,16,0,0,0,27]:Integer, [0,16,0,0,0,28]:Integer, [0,16,0,0,0,29]:Integer, [0,16,0,0,0,30]:Integer, 
[0,16,0,0,0,31]:Integer, [0,16,0,0,0,32]:Integer, [0,16,0,0,0,33]:Integer, [0,16,0,0,0,34]:Integer, [0,16,0,0,0,35]:Integer, [0,16,0,0,0,36]:Integer, [0,16,0,0,0,37]:Integer, [0,16,0,0,0,38]:Integer, [0,16,0,0,0,39]:Integer, [0,16,0,0,0,40]:Integer, [0,16,0,0,8]:Integer, [0,16,0,0,9]:Integer, [0,16,0,0,10]:Integer, [0,16,0,0,11]:Integer, [0,16,0,0,12]:Integer, [0,16,0,0,13]:Integer, [0,16,0,0,14]:Integer, [0,16,0,0,15]:Integer, [0,16,0,0,16]:Integer, [0,16,0,0,17]:Integer, [0,16,0,0,18]:Integer, [0,16,0,0,19]:Integer, [0,16,0,0,20]:Integer, [0,16,0,0,21]:Integer, [0,16,0,0,22]:Integer, [0,16,0,0,23]:Integer, [0,16,8]:Integer, [0,16,9]:Integer, [0,16,10]:Integer, [0,16,11]:Integer, [0,16,12]:Integer, [0,16,13]:Integer, [0,16,14]:Integer, [0,16,15]:Integer, [0,16,16]:Integer, [0,16,17]:Integer, [0,16,18]:Integer, [0,16,19]:Integer, [0,16,20]:Integer, [0,16,21]:Integer, [0,16,22]:Integer, [0,16,23]:Integer, [0,16,24]:Integer, [0,16,25]:Integer, [0,16,26]:Integer, [0,16,27]:Integer, [0,16,28]:Integer, [0,16,29]:Integer, [0,16,30]:Integer, [0,16,31]:Integer, [0,16,32]:Integer, [0,16,33]:Integer, [0,16,34]:Integer, [0,16,35]:Integer, [0,16,36]:Integer, [0,16,37]:Integer, [0,16,38]:Integer, [0,16,39]:Integer, [0,16,40]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [0,41]:Integer, [0,42]:Integer, [0,43]:Integer, [0,44]:Integer, [0,45]:Integer, [0,46]:Integer, [0,47]:Integer, [0,48]:Integer, [0,49]:Integer, [0,50]:Integer, [0,51]:Integer, [0,52]:Integer, [0,53]:Integer, [0,54]:Integer, [0,55]:Integer, [0,56]:Integer, [0,57]:Integer, [0,58]:Integer, [0,59]:Integer, [0,60]:Integer, [0,61]:Integer, [0,62]:Integer, [0,63]:Integer}
│ 
│ Stacktrace:
│  [1] Fix1
│    @ ./operators.jl:0
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:50

So, apart from these warnings, the remaining question, as in #659, is why we see the segfaults/different behaviour in the multithreaded case. The only difference is that in the multithreaded case the variable structure is put into a wrapper to ensure that @threads for ... loops in the models accumulate the density correctly. Otherwise, the underlying logic and calls are the same in both cases. #659 (comment) contains some additional explanations and links.
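
Since the behaviour depends on whether Julia runs multithreaded, it can help to record the thread configuration alongside any report. A minimal sketch using only Base (the printed values depend on how the session was started):

```julia
# Number of threads available to this session (1 means single-threaded).
println("Threads.nthreads() = ", Threads.nthreads())

# Value of the environment variable, if it was set before Julia started.
println("JULIA_NUM_THREADS  = ", get(ENV, "JULIA_NUM_THREADS", "(unset)"))
```

Starting Julia with `julia --threads=1`, or with `JULIA_NUM_THREADS` unset, gives the single-threaded configuration.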

@wsmoses
Member

wsmoses commented Mar 12, 2023

Great find!

Unfortunately, I'll still need a minimal example to be able to resolve this, if you can similarly try to simplify!

@wsmoses
Member

wsmoses commented Mar 13, 2023

I reduced it down to the following, which still needs to be reduced a lot more before it can be debugged.

@devmotion if you can assist, you'd be a lot faster than me since I have no idea what any of these libraries etc. are xD

using Distributions, DynamicPPL, LogDensityProblems, LogDensityProblemsAD, Enzyme, LinearAlgebra
using Turing
using Enzyme
using Turing.AbstractMCMC

using AdvancedHMC

@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

using Random

mod = model() | (; x=0.5)
alg = Turing.NUTS{Turing.EnzymeAD}()
spl = Sampler(alg, mod)

vi = DynamicPPL.default_varinfo(Random.GLOBAL_RNG, mod, spl)
vi = link!!(vi, spl, mod)

# Extract parameters.
theta = vi[spl]

# Create a Hamiltonian.
metricT = Turing.Inference.getmetricT(spl.alg)
metric = metricT(length(theta))
ℓ = LogDensityProblemsAD.ADgradient(
    Turing.LogDensityFunction(vi, mod, spl, DynamicPPL.DefaultContext())
)
logπ = Base.Fix1(LogDensityProblems.logdensity, ℓ)
∂logπ∂θ(x) = LogDensityProblems.logdensity_and_gradient(ℓ, x)
hamiltonian = AdvancedHMC.Hamiltonian(metric, logπ, ∂logπ∂θ)

# Compute phase point z.
# r = rand(Random.GLOBAL_RNG, metricT, size(metric)...)
# r ./=
# r ./= metric.sqrtM⁻¹
# AdvancedHMC.rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic)
# AdvancedHMC.
# rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic)

AdvancedHMC.phasepoint(hamiltonian, theta, rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic))

# AbstractMCMC.step(Random.GLOBAL_RNG, mod, alg)
# mymcmcsample(Random.GLOBAL_RNG, mod, alg, 10)
# sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10)

@devmotion
Contributor

I'm happy to help but unfortunately it might take a few days before I find time for some more debugging.

@wsmoses
Member

wsmoses commented Apr 7, 2023

@devmotion any luck?

@sethaxen
Collaborator Author

@wsmoses here's a smaller example:

using Enzyme
using Turing.LogDensityProblems
using Turing.Distributions
using Turing: DynamicPPL, NUTS

DynamicPPL.@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

mod = model()
sampler = DynamicPPL.Sampler(NUTS())
vi = DynamicPPL.VarInfo(mod)
vi = DynamicPPL.link!!(vi, sampler, mod)
ℓ = DynamicPPL.LogDensityFunction(mod, vi, DynamicPPL.DefaultContext())

x = vi[sampler]  # Vector{Float64}
∂ℓ_∂x = zero(x)
LogDensityProblems.logdensity(ℓ, x)  # works
Enzyme.autodiff(
    Reverse,
    LogDensityProblems.logdensity,
    Const(ℓ),
    Duplicated(x, ∂ℓ_∂x),
)

On Enzyme v0.10, this segfaults for me regardless of whether JULIA_NUM_THREADS is set or not. On Enzyme v0.11, it prints out a bunch of warnings (same as before), and I get the following error:

ERROR: MethodError: no method matching callconv!(::Ptr{LLVM.API.LLVMOpaqueValue}, ::UInt32)

Closest candidates are:
  callconv!(::Union{LLVM.CallBrInst, LLVM.CallInst, LLVM.InvokeInst}, ::Any)
   @ LLVM ~/.julia/packages/LLVM/TLGyi/src/core/instructions.jl:155
  callconv!(::LLVM.Function, ::Any)
   @ LLVM ~/.julia/packages/LLVM/TLGyi/src/core/function.jl:27

Stacktrace:
  [1] jl_array_ptr_copy_fwd(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:4873
  [2] jl_array_ptr_copy_augfwd(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, tapeR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:4892
  [3] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/packages/Enzyme/EncRR/src/api.jl:124
  [4] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{Bool, Bool, Bool}, returnPrimal::Bool, jlrules::Vector{String}, expectedTapeType::Type)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:6680
  [5] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.ThreadSafeContext, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:7921
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, ctx::Nothing, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8434
  [7] _thunk
    @ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8431 [inlined]
  [8] cached_compilation
    @ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8469 [inlined]
  [9] #s286#175
    @ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8527 [inlined]
 [10] var"#s286#175"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ::Any, ::Any, ::Any, ::Any, tt::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
    @ Enzyme.Compiler ./none:0
 [11] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [12] thunk
    @ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8486 [inlined]
 [13] autodiff
    @ ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:199 [inlined]
 [14] autodiff
    @ ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:228 [inlined]
 [15] autodiff(::EnzymeCore.ReverseMode{false}, ::typeof(LogDensityProblems.logdensity), ::Const{DynamicPPL.LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s, :x), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}}, ::Duplicated{Vector{Float64}})
    @ Enzyme ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:214
 [16] top-level scope
    @ REPL[33]:1

Note that this now uses the release version of Turing, not the branch that glues it to Enzyme (that branch is not up to date with Enzyme v0.11 compat).

julia> using Pkg; Pkg.status()
Status `/tmp/jl_eLfrOK/Project.toml`
  [7da242da] Enzyme v0.11.0
  [fce5fe82] Turing v0.24.3

julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_CMDSTAN_HOME = /home/sethaxen/software/cmdstan/2.30.1/
  JULIA_EDITOR = code

@wsmoses
Member

wsmoses commented Apr 14, 2023

I've just fixed the 0.11 error you saw on main. Your test now shows the previous behavior: it works fine single-threaded but segfaults multi-threaded.

Unfortunately, this means additional minimization is required.
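
One way to compare the single-threaded and multi-threaded behavior described above is to run the reproducer in fresh Julia processes and record how each one ends. The following is only a sketch: `run_reproducer`, its keyword arguments, and the script path are hypothetical names introduced for illustration, and it assumes the reproducer has been saved to a standalone `.jl` file.

```julia
# Hypothetical helper (not part of Enzyme or Turing): run a reproducer script
# in fresh Julia processes with different thread counts and record how each
# process ended.
function run_reproducer(script::AbstractString;
                        threads=(1, 8), repeats::Integer=3, project=nothing)
    results = NamedTuple[]
    for n in threads, attempt in 1:repeats
        flags = ["--startup-file=no", "--threads=$n"]
        # Only pass --project when the caller asked for a specific environment.
        project === nothing || push!(flags, "--project=$project")
        cmd = `$(Base.julia_cmd()) $flags $script`
        # ignorestatus keeps `run` from throwing when the child process crashes.
        proc = run(ignorestatus(cmd))
        push!(results, (threads=n, attempt=attempt, ok=success(proc),
                        exitcode=proc.exitcode, signal=proc.termsignal))
    end
    return results
end
```

Calling, for example, `run_reproducer("repro.jl"; threads=(1, 8), repeats=5)` returns one entry per run; a run with `ok == false` ended with a nonzero exit code or a terminating signal. Because every run starts from a clean process, an intermittent crash in one run cannot affect the next.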

@devmotion
Contributor

> @devmotion any luck?

No, I had to postpone it. I will probably return to it in ~2 weeks.

@wsmoses
Member

wsmoses commented Apr 24, 2023

Should be solved by #772; please reopen if it persists.

@wsmoses wsmoses closed this as completed Apr 24, 2023
@sethaxen
Collaborator Author

sethaxen commented Apr 24, 2023

For me the example in #650 (comment) still segfaults (even without JULIA_NUM_THREADS set)

Edit: I don't have permissions to reopen.

@wsmoses wsmoses reopened this Apr 24, 2023
@wsmoses
Member

wsmoses commented Apr 24, 2023

@sethaxen Can you make a minimal reproducer out of that comment?

@wsmoses
Member

wsmoses commented Apr 24, 2023

It appeared to work on my system post fix, unfortunately.

@sethaxen
Collaborator Author

> @sethaxen Can you make a minimal reproducer out of that comment?

I'll try but I'm also not very familiar with DynamicPPL's internals.

@wsmoses
Member

wsmoses commented Jun 26, 2023

In any case, if you or anyone else finds segfault/GC issues and can minimize them (see my minimization inline above, for example), that will allow us to attempt to fix them.

@wsmoses
Member

wsmoses commented Jun 26, 2023

And FWIW, if you're able to minimize to help us try to fix it, it does appear that the performance improvement is significant -- at least for this code:

julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10000)
┌ Info: Found initial step size
└   ϵ = 3.2
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (10000×14×1 Array{Float64, 3}):

Iterations        = 1001:1:11000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 0.69 seconds
Compute duration  = 0.69 seconds
parameters        = m, s
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64 

           m    0.2315    0.6896    0.0113   3973.8777   4139.8132    0.9999     5784.3926
           s    1.5227    2.9910    0.0510   2829.1958   2915.2429    1.0009     4118.1890

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           m   -1.2766   -0.1601    0.2890    0.6651    1.5145
           s    0.2375    0.5263    0.8791    1.5640    6.2744

julia> sample(model() | (; x=0.5), NUTS{Turing.ZygoteAD}(), 10000)
┌ Info: Found initial step size
└   ϵ = 0.8
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:29
Chains MCMC chain (10000×14×1 Array{Float64, 3}):

Iterations        = 1001:1:11000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 29.35 seconds
Compute duration  = 29.35 seconds
parameters        = m, s
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64 

           m    0.2465    0.6962    0.0111   4146.4375   4181.3396    1.0001      141.2804
           s    1.4766    2.9093    0.0483   2859.4063   2677.9875    1.0004       97.4277

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           m   -1.2932   -0.1415    0.3066    0.6702    1.5565
           s    0.2348    0.5326    0.8819    1.5318    6.1036

@sethaxen
Collaborator Author

I don't know if it's related, but I kept repeating the following line of @yebai's example in #650 (comment):

sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10000)

and it eventually failed for me with an error:

GC error (probable corruption) :
Allocations: 134280068 (Pool: 134191737; Big: 88331); GC: 134
<?#0x7ff99834bc60::(nil)>
0x7ff9d0fff010: Queued root: 0x7ffa0525ff10 :: 0x7ff9f7a58fa0 (bits: 3)
        of type REPL.LineEdit.PromptState
0x7ff9d0fff028: Queued root: 0x7ffa0292b8b0 :: 0x7ff9f7a537f0 (bits: 3)
        of type REPL.REPLHistoryProvider
0x7ff9d0fff040: Queued root: 0x7ffa01cfded0 :: 0x7ff9f7a55840 (bits: 3)
        of type REPL.LineEdit.MIState
0x7ff9d0fff058: Queued root: 0x7ff9fc6d19b0 :: 0x7ff9f7993470 (bits: 7)
        of type Base.IdDict{Any, Any}
0x7ff9d0fff070:  r-- Stack frame 0x7ffcccf7ac30 -- 196 of 462 (direct)
0x7ff9d0fff098:   `- Object (16bit) 0x7ff9750eb660 :: 0x7ff984cce5d1 -- [18, 19)
        of type Tuple{Float64, Float64, NamedTuple{(Symbol("1"), Symbol("2")), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3")), Tuple{Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Sym
bol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7"), Symbol("8")), NTuple{8, Any}}, NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol(
"7"), Symbol("8")), NTuple{8, Any}}}, Any, Any}}, Any}}}

[499273] signal (6.-6): Aborted
in expression starting at REPL[7]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gc_assert_datatype_fail at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1912
gc_mark_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3020
_jl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3400
ijl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3707
maybe_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1078 [inlined]
jl_gc_pool_alloc_inner at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1443 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1504 [inlined]
jl_gc_alloc_ at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia_internal.h:460 [inlined]
jl_gc_alloc at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3754
unknown function (ip: 0x7ff9a78233c8)
unknown function (ip: 0x7ff9a7827ff2)
unknown function (ip: 0x7ff9a781820d)
Allocations: 134280068 (Pool: 134191737; Big: 88331); GC: 134
Aborted (core dumped)

I'm on Linux:

(jl_ztgnWD) pkg> st
Status `/tmp/jl_ztgnWD/Project.toml`
  [7da242da] Enzyme v0.11.2 `https://github.com/EnzymeAD/Enzyme.jl.git#main`
  [fce5fe82] Turing v0.26.2 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_CMDSTAN_HOME = /home/sethaxen/software/cmdstan/2.30.1/
  JULIA_NUM_THREADS = auto
  JULIA_EDITOR = code

@vchuravy
Member

How much RAM does your system have? I suspect that in your case GC runs more frequently, thus exposing the issue.
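
If more frequent GC runs are what exposes the issue, one way to probe that on a machine with more free memory is to force collections by hand. The sketch below is an illustration only: `stress_with_gc` and `memory_summary` are hypothetical helper names, and the workload passed to `stress_with_gc` is a stand-in for the failing `sample(...)` call.

```julia
# Hypothetical stress helper: call `f` repeatedly and force a full garbage
# collection every `gc_every` iterations, so that a corrupted heap is more
# likely to be noticed close to the call that corrupted it.
function stress_with_gc(f; iterations::Integer=100, gc_every::Integer=1)
    collections = 0
    for i in 1:iterations
        f()
        if i % gc_every == 0
            GC.gc(true)          # request a full collection
            collections += 1
        end
    end
    return collections           # number of forced collections
end

# Report total and free system memory in GiB, as seen by Julia.
memory_summary() = (total_gib = Sys.total_memory() / 2^30,
                    free_gib  = Sys.free_memory() / 2^30)

# Stand-in workload; replace the closure with the call under investigation.
stress_with_gc(() -> sum(rand(1000)); iterations=10, gc_every=2)
```

Forcing collections does not fix anything; it only makes a GC-related failure less dependent on how much memory happens to be free.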

@sethaxen
Collaborator Author

> How much RAM does your system have? I suspect that in your case GC runs more frequently, thus exposing the issue.

31G

shell> free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        17Gi       3.8Gi       6.4Gi       9.6Gi       5.0Gi
Swap:           19Gi        18Gi       1.1Gi

@wsmoses
Member

wsmoses commented Jul 3, 2023

I think it unlikely to have fixed this, but I just landed some minor GC fixes, in case it changes this.

@wsmoses
Member

wsmoses commented Jul 3, 2023

turerr.txt

@sethaxen
Collaborator Author

sethaxen commented Jul 3, 2023

> I think it unlikely to have fixed this, but I just landed some minor GC fixes, in case it changes this.

I ran the command in #650 (comment) a few dozen times without error, so I cautiously think it may be fixed for me now!

@wsmoses
Member

wsmoses commented Jul 3, 2023

So I unfortunately reproduced it (log above), but that means I can hopefully debug it!

@wsmoses
Member

wsmoses commented Jul 4, 2023

 Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007f2404b60b6e in gc_try_setmark (obj=0x89804, nptr=0x7f23e5d800b0, ptag=0x7ffc8bcf3780, pbits=0x7ffc8bcf373d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
1965	    uintptr_t tag = o->header;
(rr) bt
#0  0x00007f2404b60b6e in gc_try_setmark (obj=0x89804, nptr=0x7f23e5d800b0, ptag=0x7ffc8bcf3780, pbits=0x7ffc8bcf373d "\001")
    at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
#1  0x00007f2404b616d7 in gc_mark_scan_obj16 (ptls=0x55c69bb52b90, sp=0x7ffc8bcf3af0, obj16=0x7f23e5d80098, parent=0x7f23d2839770 "\230\227\203\322#\177", 
    begin=0x7f23d38e0050, end=0x7f23d38e0074, pnew_obj=0x7ffc8bcf3778, ptag=0x7ffc8bcf3780, pbits=0x7ffc8bcf373d "\001")
    at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2250
#2  0x00007f2404b61e0b in gc_mark_loop (ptls=0x55c69bb52b90, sp=...) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2526
#3  0x00007f2404b6570b in _jl_gc_collect (ptls=0x55c69bb52b90, collection=JL_GC_AUTO) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:3400
#4  0x00007f2404b663c3 in ijl_gc_collect (collection=JL_GC_AUTO) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:3706
#5  0x00007f2404b5e2e6 in maybe_collect (ptls=0x55c69bb52b90) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1078
#6  0x00007f2404b5f446 in jl_gc_pool_alloc_inner (ptls=0x55c69bb52b90, pool_offset=1824, osize=176) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1443
#7  0x00007f2404b5f7ac in jl_gc_pool_alloc_noinline (ptls=0x55c69bb52b90, pool_offset=1824, osize=176) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1504
#8  0x00007f2404b5b13e in jl_gc_alloc_ (ptls=0x55c69bb52b90, sz=160, ty=0x7f23fb1959d0) at /home/wmoses/git/Enzyme.jl/julia9/src/julia_internal.h:460
#9  0x00007f2404b665a4 in jl_gc_alloc (ptls=0x55c69bb52b90, sz=160, ty=0x7f23fb1959d0) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:3753
#10 0x00007f2404b67f51 in ijl_gc_alloc_typed (ptls=0x55c69bb52b90, sz=160, ty=0x7f23fb1959d0) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:4357
#11 0x00007f23d3a6360b in julia_model_1783 (__model__=..., __varinfo__=..., __context__=...) at /home/wmoses/git/Enzyme.jl/turerr.jl:5
#12 0x00007f23d3a4d385 in _evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:582
#13 evaluate_threadunsafe!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:555
#14 evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:508
#15 julia_logdensity_1776 (f=..., θ=<optimized out>) at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/logdensityfunction.jl:94
#16 0x00007f23d3a4d385 in julia_logdensity_1776 (f=..., θ=<optimized out>)
#17 0x00007f23d3a4d385 in diffejulia_logdensity_1776_inner_1wrap ()
#18 0x00007f23d3a97e96 in macro expansion () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9542
#19 enzyme_call () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9219
#20 CombinedAdjointThunk () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9182
#21 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:212
#22 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:221
#23 logdensity_and_gradient () at /home/wmoses/.julia/packages/LogDensityProblemsAD/JoNjv/ext/LogDensityProblemsADEnzymeExt.jl:73
#24 ∂logπ∂θ () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:172
#25 julia_∂H∂θ_5711 (h=..., θ=<error reading variable: Cannot access memory at address 0x0>)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:38
#26 0x00007f23d3aa4fd0 in julia_#step#9_5764 (fwd=<optimized out>, full_trajectory=..., lf=..., h=..., z=..., n_steps=18446744073709551615)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:228
#27 0x00007f23d3adb497 in step () at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:198
#28 julia_build_tree_6281 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fe26d6ce21dc8be>, 
    h=<error reading variable: Cannot access memory at address 0x5000400030002>, z=<error reading variable: Cannot access memory at address 0xc00000568>, 
    sampler=..., v=18446744073709551615, j=0, H0=4617660585609877804) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:623
#29 0x00007f23d3adbebd in jfptr_build_tree_6282 ()
#30 0x00007f2404af1eab in _jl_invoke (F=0x7f23d86f0830 <jl_system_image_data+237552>, args=0x7ffc8bcfa0e0, nargs=8, mfunc=0x7f23d6e31f00, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#31 0x00007f2404af294a in ijl_apply_generic (F=0x7f23d86f0830 <jl_system_image_data+237552>, args=0x7ffc8bcfa0e0, nargs=8)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#32 0x00007f23d3adbb66 in julia_build_tree_6281 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fe26d6ce21dc8be>, h=..., z=..., 
    sampler=..., v=<optimized out>, j=<optimized out>, H0=139791832422256) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:638
#33 0x00007f23d3adb3e7 in julia_build_tree_6281 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fe26d6ce21dc8be>, h=..., z=..., 
    sampler=..., v=18446744073709551615, j=<optimized out>, H0=4617660585609877804) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:633
#34 0x00007f23d3adbebd in jfptr_build_tree_6282 ()
#35 0x00007f2404af1eab in _jl_invoke (F=0x7f23d86f0830 <jl_system_image_data+237552>, args=0x7ffc8bcfa738, nargs=8, mfunc=0x7f23d6e31f00, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#36 0x00007f2404af294a in ijl_apply_generic (F=0x7f23d86f0830 <jl_system_image_data+237552>, args=0x7ffc8bcfa738, nargs=8)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#37 0x00007f23d3a958b5 in julia_transition_5688 (rng=..., τ=<error reading variable: Cannot access memory at address 0x3fe26d6ce21dc8be>, h=..., z0=...)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:687
#38 0x00007f23d3a995f2 in julia_transition_5680 (rng=..., h=..., κ=..., z=...) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/sampler.jl:59
#39 0x00007f23d3aed2b2 in julia_#step#45_6481 (nadapts=1000, kwargs=..., rng=..., model=..., spl=..., state=...)
    at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:253
#40 0x00007f23d3aee2d7 in julia_step_6478 (rng=..., model=<error reading variable: Cannot access memory at address 0x48>, spl=..., 
    state=<error reading variable: Cannot access memory at address 0x9bb54580000001>) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:239
#41 0x00007f23d3aee307 in jfptr_step_6479 ()
#42 0x00007f2404af1eab in _jl_invoke (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bcfed88, nargs=6, mfunc=0x7f23d5a85e10, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#43 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bcfed88, nargs=6)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#44 0x00007f23ed545b78 in macro expansion () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:168
#45 macro expansion () at /home/wmoses/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328
#46 julia_#21_710 () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:12
#47 0x00007f23ed549a34 in jfptr_#21_711 ()
#48 0x00007f2404af1eab in _jl_invoke (F=0x7f23d3463790, args=0x0, nargs=0, mfunc=0x7f23

@wsmoses
Member

wsmoses commented Jul 4, 2023

(rr) p *pnew_obj
$4 = (jl_value_t *) 0x89804
(rr) layout src
(rr) p slot
$7 = (jl_value_t **) 0x7f23d2839780
(rr) p *slot
$8 = (jl_value_t *) 0x89804
(rr) p *(jl_value_t **) 0x7f23d2839780
$9 = (jl_value_t *) 0x89804
(rr) watch *(jl_value_t **) 0x7f23d2839780
Hardware watchpoint 1: *(jl_value_t **) 0x7f23d2839780
(rr) reverse-continue
Continuing.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007f2404b60b6e in gc_try_setmark (obj=0x89804, nptr=0x7f23e5d800b0, ptag=0x7ffc8bcf3780, pbits=0x7ffc8bcf373d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
1965	    uintptr_t tag = o->header;
(rr) reverse-continue
Continuing.

Thread 1 hit Hardware watchpoint 1: *(jl_value_t **) 0x7f23d2839780

Old value = (jl_value_t *) 0x89804
New value = (jl_value_t *) 0x7f2300089804
0x00007f2404b1e7f9 in _new_array_ (atype=0x7f23f479af60 <jl_system_image_data+85335136>, ndims=1, dims=0x7ffc8bcfe100, isunboxed=0 '\000', hasptr=0 '\000', isunion=0 '\000', zeroinit=1 '\001', elsz=8) at /home/wmoses/git/Enzyme.jl/julia9/src/array.c:163
163	    a->offset = 0;
(rr) p a
$10 = (jl_array_t *) 0x7f23d2839770
(rr) p &a->offset
$11 = (uint32_t *) 0x7f23d2839784
(rr) layout src
(rr) p a
$12 = (jl_array_t *) 0x7f23d2839770
(rr) bt
#0  0x00007f2404b1e7f9 in _new_array_ (atype=0x7f23f479af60 <jl_system_image_data+85335136>, ndims=1, dims=0x7ffc8bcfe100, isunboxed=0 '\000', hasptr=0 '\000', 
    isunion=0 '\000', zeroinit=1 '\001', elsz=8) at /home/wmoses/git/Enzyme.jl/julia9/src/array.c:163
#1  0x00007f2404b1ea7e in _new_array (atype=0x7f23f479af60 <jl_system_image_data+85335136>, ndims=1, dims=0x7ffc8bcfe100)
    at /home/wmoses/git/Enzyme.jl/julia9/src/array.c:198
#2  0x00007f2404b1f8c6 in ijl_alloc_array_1d (atype=0x7f23f479af60 <jl_system_image_data+85335136>, nr=16) at /home/wmoses/git/Enzyme.jl/julia9/src/array.c:436
#3  0x00007f23d3af7430 in Array () at boot.jl:477
#4  julia_Dict_6651 () at dict.jl:70
#5  0x00007f23d3af7eff in julia_Dict_6637 (kv=...) at dict.jl:81
#6  0x00007f23d3af82a3 in dict_with_eltype () at abstractdict.jl:581
#7  dict_with_eltype () at abstractdict.jl:588
#8  julia_Dict_6634 (kv=...) at dict.jl:109
#9  0x00007f23d3af85a7 in jfptr_Dict_6635 ()
#10 0x00007f2404af1eab in _jl_invoke (F=0x7f23ef659430 <jl_system_image_data+131376>, args=0x7ffc8bcfe588, nargs=1, mfunc=0x7f23d7581aa0, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#11 0x00007f2404af294a in ijl_apply_generic (F=0x7f23ef659430 <jl_system_image_data+131376>, args=0x7ffc8bcfe588, nargs=1)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#12 0x00007f23d3af2e38 in julia_#9_6580 (t=...) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:261
#13 0x00007f23d3af879b in iterate () at generator.jl:47
#14 julia_collect_to!_6658 (dest=<error reading variable: Cannot access memory at address 0x0>, itr=..., offs=<optimized out>, st=<optimized out>) at array.jl:840
#15 0x00007f23d3af8a48 in julia_collect_to_with_first!_6655 (dest=<error reading variable: Cannot access memory at address 0x0>, v1=..., itr=..., st=2)
    at array.jl:818
#16 0x00007f23d3af8aea in jfptr_collect_to_with_first!_6656 ()
#17 0x00007f2404af1eab in _jl_invoke (F=0x7f23ef7cf630 <jl_system_image_data+1663792>, args=0x7ffc8bcfe9e8, nargs=4, mfunc=0x7f23dc784bf0, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#18 0x00007f2404af294a in ijl_apply_generic (F=0x7f23ef7cf630 <jl_system_image_data+1663792>, args=0x7ffc8bcfe9e8, nargs=4)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#19 0x00007f23d3af34d0 in julia__collect_6576 (c=<error reading variable: Cannot access memory at address 0x0>, itr=..., isz=...) at array.jl:812
#20 0x00007f23d3af4552 in collect_similar () at array.jl:711
#21 map () at abstractarray.jl:3261
#22 julia__params_to_array_6566 (ts=<error reading variable: Cannot access memory at address 0x0>)
    at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:255
#23 0x00007f23d3af4d07 in julia_#bundle_samples#29_6554 (save_state=0 '\000', stats=<error reading variable: Cannot access memory at address 0x41d928cf6789db66>, 
    sort_chain=0 '\000', discard_initial=1000, thinning=1, kwargs=..., ts=<error reading variable: Cannot access memory at address 0x0>, model=..., spl=..., 
    state=..., chain_type=0x7f23d97730b0 <jl_system_image_data+9456>) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:338
#24 0x00007f23d3af5c63 in julia_bundle_samples_6551 (ts=<error reading variable: Cannot access memory at address 0x0>, 
    model=<error reading variable: Cannot access memory at address 0x0>, spl=<error reading variable: Cannot access memory at address 0xf>, state=..., 
    chain_type=0x0) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:323
#25 0x00007f23d3af5ccf in jfptr_bundle_samples_6552 ()
#26 0x00007f2404af1eab in _jl_invoke (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bd021c8, nargs=7, mfunc=0x7f23d6e08600, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#27 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bd021c8, nargs=7)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#28 0x00007f23ed539175 in julia_#mcmcsample#20_461 (progress=1 '\001', progressname=<error reading variable: Cannot access memory at address 0x0>, callback=..., 
    discard_initial=1000, thinning=1, chain_type=0x0, kwargs=<error reading variable: Cannot access memory at address 0x3e8>, rng=..., model=..., sampler=..., 
    N=10000) at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:190
#29 0x00007f23ed53bf72 in jfptr_#mcmcsample#20_462 ()
#30 0x00007f2404af1eab in _jl_invoke (F=0x7f23dfc584f0 <jl_system_image_data+211248>, args=0x7ffc8bd02420, nargs=12, mfunc=0x7f23d4436db0, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#31 0x00007f2404af1fe6 in ijl_invoke (F=0x7f23dfc584f0 <jl_system_image_data+211248>, args=0x7ffc8bd02420, nargs=12, mfunc=0x7f23d4436db0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2765
#32 0x00007f23ed5208ee in julia_mcmcsample_459 (rng=..., model=<error reading variable: Cannot access memory at address 0x3fe0000000000000>, 
    sampler=<error reading variable: Cannot access memory at address 0xffffffffffffffff>, N=10000)
    at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:95
#33 0x00007f23ed52095a in jfptr_mcmcsample_460 ()
#34 0x00007f2404af1eab in _jl_invoke (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bd02630, nargs=6, mfunc=0x7f23dd679410, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#35 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f3b3a070 <jl_system_image_data+72355184>, args=0x7ffc8bd02630, nargs=6)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#36 0x00007f23ed52052e in julia_#sample#42_455 (chain_type=0x0, resume_from=..., progress=<optimized out>, nadapts=<optimized out>, 
    discard_adapt=<optimized out>, discard_initial=<optimized out>, kwargs=..., rng=..., 
    model=<error reading variable: Cannot access memory at address 0x3fe0000000000000>, sampler=..., N=10000)
    at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:133
#37 0x00007f23ed5205b5 in jfptr_#sample#42_456 ()
#38 0x00007f2404af1eab in _jl_invoke (F=0x7f23d7cd0be0 <jl_system_image_data+285216>, args=0x7ffc8bd02860, nargs=12, mfunc=0x7f23dd53e9f0, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#39 0x00007f2404af1fe6 in ijl_invoke (F=0x7f23d7cd0be0 <jl_system_image_data+285216>, args=0x7ffc8bd02860, nargs=12, mfunc=0x7f23dd53e9f0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2765
#40 0x00007f23ed52028e in sample () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:103
#41 #sample#2 () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:146
#42 sample () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:139
#43 #sample#1 () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:136
#44 julia_sample_453 (model=<error reading variable: Cannot access memory at address 0x3fe0000000000000>, alg=..., N=10000)
    at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:130
#45 0x00007f23ed5202f5 in jfptr_sample_454 ()
#46 0x00007f2404af1eab in _jl_invoke (F=0x7f23e1756c20 <jl_system_image_data+159776>, args=0x7ffc8bd02a48, nargs=3, mfunc=0x7f23dd5a41f0, world=33656)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#47 0x00007f2404af294a in ijl_apply_generic (F=0x7f23e1756c20 <jl_system_image_data+159776>, args=0x7ffc8bd02a48, nargs=3)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#48 0x00007f2404b11c18 in jl_apply (args=0x7ffc8bd02a40, nargs=4) at /home/wmoses/git/Enzyme.jl/julia9/src/julia.h:1879
#49 0x00007f2404b1211b in do_call (args=0x7f23d4f6d7a8, nargs=4, s=0x7ffc8bd02e80) at /home/wmoses/git/Enzyme.jl/julia9/src/interpreter.c:126
#50 0x00007f2404b12a3a in eval_value (e=0x7f23deb75890, s=0x7ffc8bd02e80) at /home/wmoses/git/Enzyme.jl/julia9/src/interpreter.c:226
#51 0x00007f2404b12518 in eval_stmt_value (stmt=0x7f23deb75890, s=0x7ffc8bd02e80) at /home/wmoses/git/Enzyme.jl/julia9/src/interpreter.c:177
#52 0x00007f2404b14d06 in eval_body (stmts=0x7f23d7b4f2b0, s=0x7ffc8bd02e80, ip=10, toplevel=1) at /home/wmoses/git/Enzyme.jl/julia9/src/interpreter.c:606
#53 0x00007f2404b15b82 in jl_interpret_toplevel_thunk (m=0x7f23f416eef0 <jl_system_image_data+78863344>, src=0x7f23d7b4f3d0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/interpreter.c:762
#54 0x00007f2404b3cf68 in jl_toplevel_eval_flex (m=0x7f23f416eef0 <jl_system_image_data+78863344>, e=0x7f23f91d6530, fast=1, expanded=0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/toplevel.c:912
#55 0x00007f2404b3ca17 in jl_toplevel_eval_flex (m=0x7f23f416eef0 <jl_system_image_data+78863344>, e=0x7f23f91d6ab0, fast=1, expanded=0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/toplevel.c:856
#56 0x00007f2404b3cfc8 in ijl_toplevel_eval (m=0x7f23f416eef0 <jl_system_image_data+78863344>, v=0x7f23f91d6ab0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/toplevel.c:921
#57 0x00007f2404b3d26a in ijl_toplevel_eval_in (m=0x7f23f416eef0 <jl_system_image_data+78863344>, ex=0x7f23f91d6ab0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/toplevel.c:971
#58 0x00007f23ef3732d4 in eval () at boot.jl:370
#59 japi1_include_string_55222 (mapexpr=..., mod=0x7f23fd0be730, code=0x226, filename=0x24) at loading.jl:1903
#60 0x00007f2404af0b6a in jl_fptr_args (f=0x7f23f280d6d0 <jl_system_image_data+52249552>, args=0x7ffc8bd037f0, nargs=4, 
    m=0x7f23f280e780 <jl_system_image_data+52253824>) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2405
#61 0x00007f2404af1eab in _jl_invoke (F=0x7f23f280d6d0 <jl_system_image_data+52249552>, args=0x7ffc8bd037f0, nargs=4, 
    mfunc=0x7f23f280e730 <jl_system_image_data+52253744>, world=33460) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#62 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f280d6d0 <jl_system_image_data+52249552>, args=0x7ffc8bd037f0, nargs=4)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#63 0x00007f23ef37bf8b in japi1__include_49160 (mapexpr=0x7f23efff6520 <jl_system_image_data+10211872>, mod=0x7f23fd0be730, _path=0x9) at loading.jl:1963
#64 0x00007f23eee7b1c0 in julia_include_32064 (mod=0x7f23fd0be730, _path=0x9) at Base.jl:457
#65 0x00007f23eee7b210 in jfptr_include_32065 () from /home/wmoses/git/Enzyme.jl/julia9/usr/lib/julia/sys-debug.so
#66 0x00007f2404af1eab in _jl_invoke (F=0x7f23f0035e20 <jl_system_image_data+10472224>, args=0x7ffc8bd04f00, nargs=2, 
    mfunc=0x7f23f0036090 <jl_system_image_data+10472848>, world=33460) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#67 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f0035e20 <jl_system_image_data+10472224>, args=0x7ffc8bd04f00, nargs=2)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#68 0x00007f23ef452647 in julia_exec_options_40033 (opts=<error reading variable: Cannot access memory at address 0xff00>) at client.jl:307
#69 0x00007f23ef452ef5 in julia__start_48050 () at client.jl:522
#70 0x00007f23ef453069 in jfptr.start_48051 () from /home/wmoses/git/Enzyme.jl/julia9/usr/lib/julia/sys-debug.so
#71 0x00007f2404af1eab in _jl_invoke (F=0x7f23f2534590 <jl_system_image_data+49263248>, args=0x7ffc8bd05370, nargs=0, 
    mfunc=0x7f23f25343f0 <jl_system_image_data+49262832>, world=33460) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#72 0x00007f2404af294a in ijl_apply_generic (F=0x7f23f2534590 <jl_system_image_data+49263248>, args=0x7ffc8bd05370, nargs=0)
    at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#73 0x00007f2404b74419 in jl_apply (args=0x7ffc8bd05368, nargs=1) at /home/wmoses/git/Enzyme.jl/julia9/src/julia.h:1879
#74 0x00007f2404b7629a in true_main (argc=1, argv=0x7ffc8bd05798) at /home/wmoses/git/Enzyme.jl/julia9/src/jlapi.c:573
#75 0x00007f2404b7684f in jl_repl_entrypoint (argc=1, argv=0x7ffc8bd05788) at /home/wmoses/git/Enzyme.jl/julia9/src/jlapi.c:717
#76 0x00007f240572c184 in jl_load_repl (argc=3, argv=0x7ffc8bd05788) at /home/wmoses/git/Enzyme.jl/julia9/cli/loader_lib.c:529
#77 0x000055c69a7fe1b9 in main (argc=3, argv=0x7ffc8bd05788) at /home/wmoses/git/Enzyme.jl/julia9/cli/loader_exe.c:59
(rr) 

@wsmoses
Member

wsmoses commented Jul 5, 2023

Okay, after a lot of in-depth debugging, @gbaraldi and I found the actual source of the GC error (and subsequently fixed it here: EnzymeAD/Enzyme#1314 (review)). Once that lands on Enzyme proper, we'll cut a jll, then land that here.

At that point try it again (will bump people), and let's see if it is all happy!

@wsmoses
Member

wsmoses commented Jul 6, 2023

Landed on main. Can you try it, @sethaxen @devmotion @yebai?

@yebai

yebai commented Jul 6, 2023

It seems more stable. I can repeat the sample line several times now. However, I still ran into a segfault simply by repeating it:

julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10000)
┌ Info: Found initial step size
└   ϵ = 0.80%|                                                                                    |  ETA: N/A

[2058] signal (10.2): Bus error: 10█▎                                                             |  ETA: 0:00:01
in expression starting at REPL[4]:1
gc_mark_loop at /Users/hg344/.julia/juliaup/julia-1.9.1+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
_jl_gc_collect at /Users/hg344/.julia/juliaup/julia-1.9.1+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_gc_collect at /Users/hg344/.julia/juliaup/julia-1.9.1+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_gc_pool_alloc_inner at /Users/hg344/.julia/juliaup/julia-1.9.1+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_gc_alloc_typed at /Users/hg344/.julia/juliaup/julia-1.9.1+0.x64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
Allocations: 239260121 (Pool: 239180571; Big: 79550); GC: 192

Environment:

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, westmere)
  Threads: 1 on 8 virtual cores

pkg> st
Status `~/projects/enzyme-turing/Project.toml`
  [7da242da] Enzyme v0.11.3 `https://github.com/EnzymeAD/Enzyme.jl.git#main`
  [f6369f11] ForwardDiff v0.10.35
  [37e2e3b7] ReverseDiff v1.14.6
  [fce5fe82] Turing v0.26.2 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`
  [e88e6eb3] Zygote v0.6.62

@yebai

yebai commented Jul 6, 2023

P.S. The segfault is easier to reproduce on a machine with less RAM.

@wsmoses
Member

wsmoses commented Jul 6, 2023

0x00007fed78247b6e in gc_try_setmark (obj=0x12, nptr=0x7fed59383f20, ptag=0x7ffcaeebf0a0, pbits=0x7ffcaeebf05d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
1965	    uintptr_t tag = o->header;
(rr) up 1
#1  0x00007fed782486d7 in gc_mark_scan_obj16 (ptls=0x561d0638cb90, sp=0x7ffcaeebf410, obj16=0x7fed59383f08, parent=0x7fed51706770 "\022", begin=0x7fed56578f9c, end=0x7fed56578fc0, pnew_obj=0x7ffcaeebf098, ptag=0x7ffcaeebf0a0, 
    pbits=0x7ffcaeebf05d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2250
2250	        if (!gc_try_setmark(*pnew_obj, &obj16->nptr, ptag, pbits))
(rr) l
2245	        if (*pnew_obj) {
2246	            verify_parent2("object", parent, slot, "field(%d)",
2247	                           gc_slot_to_fieldidx(parent, slot, (jl_datatype_t*)jl_typeof(parent)));
2248	            gc_heap_snapshot_record_object_edge((jl_value_t*)parent, slot);
2249	        }
2250	        if (!gc_try_setmark(*pnew_obj, &obj16->nptr, ptag, pbits))
2251	            continue;
2252	        begin++;
2253	        // Found an object to mark
2254	        if (begin < end) {
(rr) layout src
(rr) p parent
$8 = 0x7fed51706770 "\022"
(rr) up 1
#2  0x00007fed78248e0b in gc_mark_loop (ptls=0x561d0638cb90, sp=...) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2526
2526	    if (gc_mark_scan_obj16(ptls, &sp, obj16, obj16_parent, obj16_begin, obj16_end,
(rr) p obj16->parent
$9 = (jl_value_t *) 0x7fed51706770
(rr) p jl_(obj16->parent)
Base.Fix1{typeof(DynamicPPL.istrans), DynamicPPL.VarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}, DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}}}, Float64}}(f=typeof(DynamicPPL.istrans)(), x=DynamicPPL.VarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}, DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}}}, Float64}(metadata=(m=DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}(idcs=#<18>, vns=Union{}, ranges=StaticArraysCore.StaticArray{Tuple{s1, s2}, T, 2}, vals=#<null>, dists=Union{}, gids=<?#0x7fed6e632ad0::<?#0x7fed6e632b00::(nil)>>, orders=#<null>, flags=T), s=DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}(idcs=T, vns=#<null>, ranges=Union{}, vals=AbstractFloat, dists=#<null>, gids=Union{}, orders=Any, flags=#<null>)), 
logp=Union{}, num_produce=Any))
$10 = void
(rr) bt
#0  0x00007fed78247b6e in gc_try_setmark (obj=0x12, nptr=0x7fed59383f20, ptag=0x7ffcaeebf0a0, pbits=0x7ffcaeebf05d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
#1  0x00007fed782486d7 in gc_mark_scan_obj16 (ptls=0x561d0638cb90, sp=0x7ffcaeebf410, obj16=0x7fed59383f08, parent=0x7fed51706770 "\022", begin=0x7fed56578f9c, end=0x7fed56578fc0, pnew_obj=0x7ffcaeebf098, ptag=0x7ffcaeebf0a0, 
    pbits=0x7ffcaeebf05d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2250
#2  0x00007fed78248e0b in gc_mark_loop (ptls=0x561d0638cb90, sp=...) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:2526
#3  0x00007fed7824c70b in _jl_gc_collect (ptls=0x561d0638cb90, collection=JL_GC_AUTO) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:3400
#4  0x00007fed7824d3c3 in ijl_gc_collect (collection=JL_GC_AUTO) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:3706
#5  0x00007fed782452e6 in maybe_collect (ptls=0x561d0638cb90) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1078
#6  0x00007fed78246446 in jl_gc_pool_alloc_inner (ptls=0x561d0638cb90, pool_offset=1800, osize=160) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1443
#7  0x00007fed7824675e in ijl_gc_pool_alloc (ptls=0x561d0638cb90, pool_offset=1800, osize=160) at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1494
#8  0x00007fed470478e4 in julia__all_1912 (f=..., itr=<optimized out>) at reduce.jl:1283
#9  0x00007fed47048024 in #all#830 () at reducedim.jl:1007
#10 all () at reducedim.jl:1007
#11 istrans () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/abstract_varinfo.jl:355
#12 julia_istrans_1906 (vi=...) at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/abstract_varinfo.jl:353
#13 0x00007fed47048dc2 in maybe_invlink_before_eval!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/varinfo.jl:811
#14 macro expansion () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:612
#15 make_evaluate_args_and_kwargs () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:590
#16 _evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:581
#17 evaluate_threadunsafe!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:555
#18 evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:508
#19 julia_logdensity_1776 (f=..., θ=<optimized out>) at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/logdensityfunction.jl:94
#20 0x00007fed47048dc2 in julia_logdensity_1776 (f=..., θ=<optimized out>)
#21 0x00007fed47048dc2 in diffejulia_logdensity_1776_inner_1wrap ()
#22 0x00007fed4708fed6 in macro expansion () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9553
#23 enzyme_call () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9230
#24 CombinedAdjointThunk () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9193
#25 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:212
#26 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:221
#27 logdensity_and_gradient () at /home/wmoses/.julia/packages/LogDensityProblemsAD/JoNjv/ext/LogDensityProblemsADEnzymeExt.jl:73
#28 ∂logπ∂θ () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:172
#29 julia_∂H∂θ_5710 (h=..., θ=<error reading variable: Cannot access memory at address 0x0>) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:38
#30 0x00007fed470a1230 in julia_#step#9_5763 (fwd=<optimized out>, full_trajectory=..., lf=..., h=..., z=..., n_steps=18446744073709551615) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:228
#31 0x00007fed470d7307 in step () at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:198
#32 julia_build_tree_6280 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=<error reading variable: Cannot access memory at address 0x3000200010000>, 
    z=<error reading variable: Cannot access memory at address 0x500000098>, sampler=..., v=18446744073709551615, j=0, H0=4615223628109399973) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:623
#33 0x00007fed470d7257 in julia_build_tree_6280 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=..., z=..., sampler=..., v=18446744073709551615, j=<optimized out>, H0=4615223628109399973)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:633
#34 0x00007fed470d7d2d in jfptr_build_tree_6281 ()
#35 0x00007fed781d8eab in _jl_invoke (F=0x7fed4bd18590 <jl_system_image_data+417104>, args=0x7ffcaeec4bf8, nargs=8, mfunc=0x7fed42c9cbf0, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#36 0x00007fed781d994a in ijl_apply_generic (F=0x7fed4bd18590 <jl_system_image_data+417104>, args=0x7ffcaeec4bf8, nargs=8) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#37 0x00007fed4708d8f5 in julia_transition_5687 (rng=..., τ=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=..., z0=...) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:687
#38 0x00007fed47091632 in julia_transition_5679 (rng=..., h=..., κ=..., z=...) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/sampler.jl:59
#39 0x00007fed470e90f1 in julia_#step#45_6480 (nadapts=1000, kwargs=..., rng=..., model=..., spl=..., state=...) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:253
#40 0x00007fed470ea117 in julia_step_6477 (rng=..., model=<error reading variable: Cannot access memory at address 0x48>, spl=..., state=<error reading variable: Cannot access memory at address 0x6f3cf598000001>)
    at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:239
#41 0x00007fed470ea147 in jfptr_step_6478 ()
#42 0x00007fed781d8eab in _jl_invoke (F=0x7fed671bf070 <jl_system_image_data+72355184>, args=0x7ffcaeec9228, nargs=6, mfunc=0x7fed473b4600, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#43 0x00007fed781d994a in ijl_apply_generic (F=0x7fed671bf070 <jl_system_image_data+72355184>, args=0x7ffcaeec9228, nargs=6) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#44 0x00007fed60b401b6 in macro expansion () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:136
#45 macro expansion () at /home/wmoses/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328
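
For readers following the trace above: frame #0 faults on `uintptr_t tag = o->header;` with `obj=0x12`, which is non-null and therefore not skipped as an empty slot, but it is clearly not a heap address either. The sketch below is a hypothetical debugging heuristic (the name `plausible_object_pointer` and the 4096-byte low-address cutoff are assumptions for illustration, not part of the Julia runtime) showing why such a value can be flagged as garbage before it is dereferenced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debugging heuristic, NOT part of the Julia runtime.
 * It rejects slot values that cannot be object pointers on a 64-bit
 * target: null, values that are not word-aligned, and values inside
 * the first page (assumed here to be unmapped). */
bool plausible_object_pointer(uint64_t obj)
{
    if (obj == 0)
        return false;               /* empty slot, nothing to mark */
    if (obj % sizeof(uint64_t) != 0)
        return false;               /* object pointers are at least word-aligned */
    if (obj < 4096)
        return false;               /* assumed unmapped low page */
    return true;
}
```

Under these assumptions the value from frame #0 (`0x12`) is rejected, while the parent object's address from the same session (`0x7fed51706770`) passes. A passing check does not prove a pointer is valid; it only filters out obviously corrupted slots.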

@wsmoses
Member

wsmoses commented Jul 6, 2023

(rr) p *begin
$15 = 0

@wsmoses
Member

wsmoses commented Jul 6, 2023

(rr) p obj16->parent
$2 = (jl_value_t *) 0x7fed51706770
(rr) p jl_(obj16->parent)
Base.Fix1{typeof(DynamicPPL.istrans), DynamicPPL.VarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}, DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}}}, Float64}}(f=typeof(DynamicPPL.istrans)(), x=DynamicPPL.VarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}, DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}}}, Float64}(metadata=(m=DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Array{Distributions.Normal{Float64}, 1}, Array{AbstractPPL.VarName{:m, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}(idcs=#<18>, vns=Union{}, ranges=StaticArraysCore.StaticArray{Tuple{s1, s2}, T, 2}, vals=#<null>, dists=Union{}, gids=<?#0x7fed6e632ad0::<?#0x7fed6e632b00::(nil)>>, orders=#<null>, flags=T), s=DynamicPPL.Metadata{Base.Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Array{Distributions.InverseGamma{Float64}, 1}, Array{AbstractPPL.VarName{:s, Setfield.IdentityLens}, 1}, Array{Float64, 1}, Array{Base.Set{DynamicPPL.Selector}, 1}}(idcs=T, vns=#<null>, ranges=Union{}, vals=AbstractFloat, dists=#<null>, gids=Union{}, orders=Any, flags=#<null>)), 
logp=Union{}, num_produce=Any))
$3 = void
(rr) p jl_astaggedvalue(0x7fed51706770)
$4 = (jl_taggedvalue_t *) 0x7fed51706768
(rr) watch *(jl_taggedvalue_t *) 0x7fed51706768
Hardware watchpoint 1: *(jl_taggedvalue_t *) 0x7fed51706768
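
A note on the arithmetic in this session: `jl_astaggedvalue(0x7fed51706770)` returns `0x7fed51706768`, i.e. the tag word sits one 8-byte word before the object, and the watchpoint is placed on that word. The sketch below assumes the 64-bit layout that the gdb output suggests (the low two bits of the header are the GC bits, the next bit is `in_image`, and the type pointer is the header with the low four bits cleared). The helper names are made up for illustration and the masks are assumptions inferred from the printed `bits = {gc = ..., in_image = ...}` fields, not copied from the runtime headers.

```c
#include <stdint.h>

/* Assumed 64-bit layout, inferred from the gdb output in this thread:
 *   tag word address = object address - 8
 *   header bits 0..1 = GC mark bits
 *   header bit  2    = in_image flag
 *   header & ~0xF    = type pointer
 * All helper names below are hypothetical. */

uint64_t tag_address(uint64_t obj)
{
    return obj - 8u;                      /* tag word precedes the object */
}

uint64_t header_type(uint64_t header)
{
    return header & ~(uint64_t)0xF;       /* strip the low tag bits */
}

unsigned header_gc_bits(uint64_t header)
{
    return (unsigned)(header & 0x3u);     /* low two bits */
}

unsigned header_in_image(uint64_t header)
{
    return (unsigned)((header >> 2) & 0x1u);
}
```

With these helpers, `tag_address(0x7fed51706770)` gives `0x7fed51706768`, matching the watchpoint address. A header of `0x7fed51fa3611` (the value the watchpoint reports in the next comment) decodes to type `0x7fed51fa3610` with GC bits `1`, which agrees with the `gc = 1` gdb prints for it.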

@wsmoses
Member

wsmoses commented Jul 6, 2023

(rr) watch *(jl_taggedvalue_t *) 0x7fed51706768
Hardware watchpoint 1: *(jl_taggedvalue_t *) 0x7fed51706768
(rr) reverse-continue
Continuing.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007fed78247b6e in gc_try_setmark (obj=0x12, nptr=0x7fed59383f20, ptag=0x7ffcaeebf0a0, pbits=0x7ffcaeebf05d "\001") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:1965
1965	    uintptr_t tag = o->header;
(rr) reverse-continue
Continuing.

Thread 1 hit Hardware watchpoint 1: *(jl_taggedvalue_t *) 0x7fed51706768

Old value = {{header = 140657259329041, next = 0x7fed51fa3611, type = 0x7fed51fa3611, bits = {gc = 1, in_image = 0}}}
New value = {{header = 140657259329040, next = 0x7fed51fa3610, type = 0x7fed51fa3610, bits = {gc = 0, in_image = 0}}}
0x00007fed78244d32 in gc_setmark_tag (o=0x7fed51706768, mark_mode=1 '\001', tag=140657259329041, bits=0x7ffcaeebf01b "\001\374\177") at /home/wmoses/git/Enzyme.jl/julia9/src/gc.c:937
937	    tag = jl_atomic_exchange_relaxed((_Atomic(uintptr_t)*)&o->header, tag);
(rr) 
Continuing.

Thread 1 hit Hardware watchpoint 1: *(jl_taggedvalue_t *) 0x7fed51706768

Old value = {{header = 140657259329040, next = 0x7fed51fa3610, type = 0x7fed51fa3610, bits = {gc = 0, in_image = 0}}}
New value = {{header = 140657626854080, next = 0x7fed67e232c0 <jl_system_image_data+85348288>, type = 0x7fed67e232c0 <jl_system_image_data+85348288>, bits = {gc = 0, in_image = 0}}}
0x00007fed470478d1 in julia__all_1912 (f=..., itr=<optimized out>) at reduce.jl:1283
1283	reduce.jl: No such file or directory.
(rr) bt
#0  0x00007fed470478d1 in julia__all_1912 (f=..., itr=<optimized out>) at reduce.jl:1283
#1  0x00007fed47048024 in #all#830 () at reducedim.jl:1007
#2  all () at reducedim.jl:1007
#3  istrans () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/abstract_varinfo.jl:355
#4  julia_istrans_1906 (vi=...) at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/abstract_varinfo.jl:353
#5  0x00007fed47048dc2 in maybe_invlink_before_eval!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/varinfo.jl:811
#6  macro expansion () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:612
#7  make_evaluate_args_and_kwargs () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:590
#8  _evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:581
#9  evaluate_threadunsafe!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:555
#10 evaluate!! () at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/model.jl:508
#11 julia_logdensity_1776 (f=..., θ=<optimized out>) at /home/wmoses/.julia/packages/DynamicPPL/oJMmE/src/logdensityfunction.jl:94
#12 0x00007fed47048dc2 in julia_logdensity_1776 (f=..., θ=<optimized out>)
#13 0x00007fed47048dc2 in diffejulia_logdensity_1776_inner_1wrap ()
#14 0x00007fed4708fed6 in macro expansion () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9553
#15 enzyme_call () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9230
#16 CombinedAdjointThunk () at /home/wmoses/git/Enzyme.jl/src/compiler.jl:9193
#17 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:212
#18 autodiff () at /home/wmoses/git/Enzyme.jl/src/Enzyme.jl:221
#19 logdensity_and_gradient () at /home/wmoses/.julia/packages/LogDensityProblemsAD/JoNjv/ext/LogDensityProblemsADEnzymeExt.jl:73
#20 ∂logπ∂θ () at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:172
#21 julia_∂H∂θ_5710 (h=..., θ=<error reading variable: Cannot access memory at address 0x0>) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:38
#22 0x00007fed470a1230 in julia_#step#9_5763 (fwd=<optimized out>, full_trajectory=..., lf=..., h=<error reading variable: Cannot access memory at address 0xa0>, z=..., n_steps=18446744073709551615)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:228
#23 0x00007fed470d7307 in step () at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/integrator.jl:198
#24 julia_build_tree_6280 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=..., z=..., sampler=..., v=18446744073709551615, j=0, H0=4615223628109399973)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:623
#25 0x00007fed470d7257 in julia_build_tree_6280 (rng=..., nt=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=..., z=<error reading variable: Cannot access memory at address 0xa0>, 
    sampler=..., v=18446744073709551615, j=<optimized out>, H0=4615223628109399973) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:633
#26 0x00007fed470d7d2d in jfptr_build_tree_6281 ()
#27 0x00007fed781d8eab in _jl_invoke (F=0x7fed4bd18590 <jl_system_image_data+417104>, args=0x7ffcaeec4bf8, nargs=8, mfunc=0x7fed42c9cbf0, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#28 0x00007fed781d994a in ijl_apply_generic (F=0x7fed4bd18590 <jl_system_image_data+417104>, args=0x7ffcaeec4bf8, nargs=8) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#29 0x00007fed4708d8f5 in julia_transition_5687 (rng=..., τ=<error reading variable: Cannot access memory at address 0x3fdb32ae222900b0>, h=..., z0=...)
    at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/trajectory.jl:687
#30 0x00007fed47091632 in julia_transition_5679 (rng=..., h=..., κ=..., z=...) at /home/wmoses/.julia/packages/AdvancedHMC/2MdYL/src/sampler.jl:59
#31 0x00007fed470e90f1 in julia_#step#45_6480 (nadapts=1000, kwargs=..., rng=..., model=..., spl=..., state=...) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:253
#32 0x00007fed470ea117 in julia_step_6477 (rng=..., model=<error reading variable: Cannot access memory at address 0x708>, spl=<error reading variable: Cannot access memory at address 0xa0>, 
    state=<error reading variable: Cannot access memory at address 0xa0>) at /home/wmoses/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:239
#33 0x00007fed470ea147 in jfptr_step_6478 ()
#34 0x00007fed781d8eab in _jl_invoke (F=0x7fed671bf070 <jl_system_image_data+72355184>, args=0x7ffcaeec9228, nargs=6, mfunc=0x7fed473b4600, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#35 0x00007fed781d994a in ijl_apply_generic (F=0x7fed671bf070 <jl_system_image_data+72355184>, args=0x7ffcaeec9228, nargs=6) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#36 0x00007fed60b401b6 in macro expansion () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:136
#37 macro expansion () at /home/wmoses/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328
#38 julia_#21_710 () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:12
#39 0x00007fed60b49a44 in jfptr_#21_711 ()
#40 0x00007fed781d8eab in _jl_invoke (F=0x7fed4f0ca890, args=0x0, nargs=0, mfunc=0x7fed4fb37440, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#41 0x00007fed781d994a in ijl_apply_generic (F=0x7fed4f0ca890, args=0x0, nargs=0) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#42 0x00007fed6234df77 in julia_with_logstate_35393 (f=0x7fed4f0ca890, logstate=0x7fed528d7610) at logging.jl:514
#43 0x00007fed60b3dc14 in julia_with_logger_708 (f=0x0, logger=...) at logging.jl:626
#44 0x00007fed60b3dc6b in jfptr_with_logger_709 ()
#45 0x00007fed781d8eab in _jl_invoke (F=0x7fed6350ef00 <jl_system_image_data+8719360>, args=0x7ffcaeec9790, nargs=2, mfunc=0x7fed50b91d20, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#46 0x00007fed781d8fe6 in ijl_invoke (F=0x7fed6350ef00 <jl_system_image_data+8719360>, args=0x7ffcaeec9790, nargs=2, mfunc=0x7fed50b91d20) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2765
#47 0x00007fed60b3d998 in julia_with_progresslogger_679 (f=0x0, _module=<error reading variable: Cannot access memory at address 0x0>, logger=...) at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:36
#48 0x00007fed60b3daf2 in jfptr_with_progresslogger_680 ()
#49 0x00007fed781d8eab in _jl_invoke (F=0x7fed53247260 <jl_system_image_data+116384>, args=0x7ffcaeecc668, nargs=3, mfunc=0x7fed4fafba30, world=33658) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2739
#50 0x00007fed781d994a in ijl_apply_generic (F=0x7fed53247260 <jl_system_image_data+116384>, args=0x7ffcaeecc668, nargs=3) at /home/wmoses/git/Enzyme.jl/julia9/src/gf.c:2940
#51 0x00007fed60b2f2da in macro expansion () at /home/wmoses/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:11
#52 julia_#mcmcsample#20_461 (progress=1 '\001', progressname=<error reading variable: Cannot access memory at address 0x0>, callback=..., discard_initial=<optimized out>, thinning=1, chain_type=0x0, 
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(rr) disassemble
Dump of assembler code for function julia__all_1912:
   0x00007fed47046d70 <+0>:	endbr64 
   0x00007fed47046d74 <+4>:	push   %rbp
   0x00007fed47046d75 <+5>:	mov    %rsp,%rbp
   0x00007fed47046d78 <+8>:	push   %r15
   0x00007fed47046d7a <+10>:	push   %r14
   0x00007fed47046d7c <+12>:	push   %r13
   0x00007fed47046d7e <+14>:	push   %r12
   0x00007fed47046d80 <+16>:	push   %rbx
   0x00007fed47046d81 <+17>:	and    $0xffffffffffffffe0,%rsp
   0x00007fed47046d85 <+21>:	sub    $0x220,%rsp
   0x00007fed47046d8c <+28>:	vxorps %xmm0,%xmm0,%xmm0
   0x00007fed47046d90 <+32>:	mov    %rdx,0xd8(%rsp)
   0x00007fed47046d98 <+40>:	mov    %fs:0x0,%rax
   0x00007fed47046da1 <+49>:	mov    %rsi,0xd0(%rsp)
   0x00007fed47046da9 <+57>:	lea    0x100(%rsp),%rsi
   0x00007fed47046db1 <+65>:	mov    %rdi,%r13
   0x00007fed47046db4 <+68>:	mov    %r8,0xe0(%rsp)
   0x00007fed47046dbc <+76>:	vmovups %ymm0,0x178(%rsp)
   0x00007fed47046dc5 <+85>:	vmovaps %ymm0,0x160(%rsp)
   0x00007fed47046dce <+94>:	vmovaps %ymm0,0x140(%rsp)
   0x00007fed47046dd7 <+103>:	vmovaps %ymm0,0x120(%rsp)
   0x00007fed47046de0 <+112>:	vmovaps %ymm0,0x100(%rsp)
   0x00007fed47046de9 <+121>:	mov    -0x8(%rax),%rdx
   0x00007fed47046ded <+125>:	movq   $0x44,0x100(%rsp)
   0x00007fed47046df9 <+137>:	mov    (%rdx),%rax
   0x00007fed47046dfc <+140>:	mov    %rdx,0x20(%rsp)
   0x00007fed47046e01 <+145>:	mov    %rax,0x108(%rsp)
   0x00007fed47046e09 <+153>:	mov    %rsi,(%rdx)
   0x00007fed47046e0c <+156>:	movabs $0x7fed70687008,%rsi
   0x00007fed47046e16 <+166>:	mov    0x8(%rcx),%rbx
   0x00007fed47046e1a <+170>:	test   %rbx,%rbx
   0x00007fed47046e1d <+173>:	je     0x7fed47047b63 <julia__all_1912+3571>
   0x00007fed47046e23 <+179>:	mov    (%rcx),%rax
   0x00007fed47046e26 <+182>:	mov    %rbx,0xc8(%rsp)
   0x00007fed47046e2e <+190>:	mov    %rcx,0x80(%rsp)
   0x00007fed47046e36 <+198>:	mov    (%rax),%r12
   0x00007fed47046e39 <+201>:	test   %r12,%r12
   0x00007fed47046e3c <+204>:	je     0x7fed47047c3b <julia__all_1912+3787>
   0x00007fed47046e42 <+210>:	mov    0xe0(%rsp),%rax
   0x00007fed47046e4a <+218>:	mov    0x20(%rsp),%rbx
   0x00007fed47046e4f <+223>:	movabs $0x7fed78246734,%r15
   0x00007fed47046e59 <+233>:	mov    $0x708,%esi
   0x00007fed47046e5e <+238>:	mov    $0xa0,%edx
   0x00007fed47046e63 <+243>:	mov    %r12,0x78(%rsp)
   0x00007fed47046e68 <+248>:	mov    (%rax),%rax
   0x00007fed47046e6b <+251>:	mov    (%rax),%r14
   0x00007fed47046e6e <+254>:	mov    %r12,0x190(%rsp)
   0x00007fed47046e76 <+262>:	mov    %r14,0x188(%rsp)
   0x00007fed47046e7e <+270>:	mov    0x10(%rbx),%rdi
   0x00007fed47046e82 <+274>:	vzeroupper 
   0x00007fed47046e85 <+277>:	call   *%r15
   0x00007fed47046e88 <+280>:	movabs $0x7fed51fa3610,%rcx
   0x00007fed47046e92 <+290>:	mov    $0x708,%esi
   0x00007fed47046e97 <+295>:	mov    $0xa0,%edx
   0x00007fed47046e9c <+300>:	mov    %rax,%r12
   0x00007fed47046e9f <+303>:	mov    %rcx,-0x8(%rax)
   0x00007fed47046ea3 <+307>:	mov    %rax,0x178(%rsp)
   0x00007fed47046eab <+315>:	mov    0x10(%rbx),%rdi
   0x00007fed47046eaf <+319>:	call   *%r15
   0x00007fed47046eb2 <+322>:	mov    0xd8(%rsp),%rsi
   0x00007fed47046eba <+330>:	movabs $0x7fed51fa3610,%rcx
   0x00007fed47046ec4 <+340>:	mov    %rax,0x58(%rsp)
   0x00007fed47046ec9 <+345>:	movabs $0x7fed572abc10,%rdi
   0x00007fed47046ed3 <+355>:	mov    $0x8,%edx
   0x00007fed47046ed8 <+360>:	mov    %rcx,-0x8(%rax)
   0x00007fed47046edc <+364>:	mov    0xd0(%rsp),%rcx
   0x00007fed47046ee4 <+372>:	vmovups 0x70(%rsi),%ymm0
   0x00007fed47046ee9 <+377>:	vmovups %ymm0,0x70(%rax)
   0x00007fed47046eee <+382>:	vmovups (%rsi),%ymm0
   0x00007fed47046ef2 <+386>:	vmovups 0x20(%rsi),%ymm1
   0x00007fed47046ef7 <+391>:	vmovups 0x40(%rsi),%ymm2
   0x00007fed47046efc <+396>:	vmovups 0x60(%rsi),%ymm3
   0x00007fed47046f01 <+401>:	lea    0x88(%rsp),%rsi
   0x00007fed47046f09 <+409>:	vmovups %ymm3,0x60(%rax)
   0x00007fed47046f0e <+414>:	vmovups %ymm2,0x40(%rax)
   0x00007fed47046f13 <+419>:	vmovups %ymm1,0x20(%rax)
   0x00007fed47046f18 <+424>:	vmovups %ymm0,(%rax)
   0x00007fed47046f1c <+428>:	vmovups 0x70(%rcx),%ymm0
   0x00007fed47046f21 <+433>:	vmovups %ymm0,0x70(%r12)
   0x00007fed47046f28 <+440>:	vmovups 0x20(%rcx),%ymm1
   0x00007fed47046f2d <+445>:	vmovups (%rcx),%ymm0
   0x00007fed47046f31 <+449>:	vmovups 0x40(%rcx),%ymm2
   0x00007fed47046f36 <+454>:	vmovups 0x60(%rcx),%ymm3
   0x00007fed47046f3b <+459>:	movabs $0x7fed5d00f2e0,%rcx
   0x00007fed47046f45 <+469>:	mov    %rax,0x170(%rsp)
   0x00007fed47046f4d <+477>:	vmovups %ymm1,0x20(%r12)
   0x00007fed47046f54 <+484>:	vmovaps (%rcx),%ymm1
   0x00007fed47046f58 <+488>:	mov    0x78(%rsp),%rcx
   0x00007fed47046f5d <+493>:	vmovups %ymm3,0x60(%r12)
   0x00007fed47046f64 <+500>:	vmovups %ymm2,0x40(%r12)
   0x00007fed47046f6b <+507>:	vmovups %ymm0,(%r12)
   0x00007fed47046f71 <+513>:	vmovups %ymm1,0x88(%rsp)
   0x00007fed47046f7a <+522>:	mov    %r12,0xa8(%rsp)
   0x00007fed47046f82 <+530>:	mov    %rax,0xb0(%rsp)
   0x00007fed47046f8a <+538>:	movabs $0x7fed781d98d9,%rax
   0x00007fed47046f94 <+548>:	mov    %rcx,0xb8(%rsp)
   0x00007fed47046f9c <+556>:	vmovaps %ymm1,0x1e0(%rsp)
   0x00007fed47046fa5 <+565>:	mov    %r14,0xc0(%rsp)
   0x00007fed47046fad <+573>:	vzeroupper 
   0x00007fed47046fb0 <+576>:	call   *%rax
   0x00007fed47046fb2 <+578>:	mov    (%rax),%rcx
   0x00007fed47046fb5 <+581>:	mov    0x10(%rax),%rax
   0x00007fed47046fb9 <+585>:	cmpb   $0x0,(%rcx)
   0x00007fed47046fbc <+588>:	mov    %rax,0x50(%rsp)
   0x00007fed47046fc1 <+593>:	je     0x7fed47047b84 <julia__all_1912+3604>
   0x00007fed47046fc7 <+599>:	xor    %eax,%eax
   0x00007fed47046fc9 <+601>:	mov    0x58(%rsp),%rdi
   0x00007fed47046fce <+606>:	mov    0x80(%rsp),%rcx
   0x00007fed47046fd6 <+614>:	movabs $0x7fed70687008,%rsi
   0x00007fed47046fe0 <+624>:	mov    %r13,0x1b0(%rsp)
   0x00007fed47046fe8 <+632>:	xor    %r13d,%r13d
   0x00007fed47046feb <+635>:	mov    %r14,0xf0(%rsp)
   0x00007fed47046ff3 <+643>:	mov    %r12,0xe8(%rsp)
   0x00007fed47046ffb <+651>:	mov    %rax,0x48(%rsp)
   0x00007fed47047000 <+656>:	movabs $0x7fed5d00f168,%rax
   0x00007fed4704700a <+666>:	mov    %rsi,%r10
   0x00007fed4704700d <+669>:	mov    %rsi,%r9
   0x00007fed47047010 <+672>:	mov    %rsi,%r11
   0x00007fed47047013 <+675>:	mov    %rsi,%r8
   0x00007fed47047016 <+678>:	vbroadcastsd (%rax),%ymm0
   0x00007fed4704701b <+683>:	xor    %eax,%eax
   0x00007fed4704701d <+685>:	vmovaps %ymm0,0x1c0(%rsp)
   0x00007fed47047026 <+694>:	jmp    0x7fed47047071 <julia__all_1912+769>
   0x00007fed47047028 <+696>:	nopl   0x0(%rax,%rax,1)
   0x00007fed47047030 <+704>:	cmpb   $0x0,(%r12)
   0x00007fed47047035 <+709>:	mov    %rbx,%rax
   0x00007fed47047038 <+712>:	mov    0xf0(%rsp),%r14
   0x00007fed47047040 <+720>:	mov    0xe8(%rsp),%r12
   0x00007fed47047048 <+728>:	mov    0x58(%rsp),%rdi
   0x00007fed4704704d <+733>:	mov    0x70(%rsp),%r13
   0x00007fed47047052 <+738>:	mov    0xf8(%rsp),%rax
   0x00007fed4704705a <+746>:	mov    %rbx,0x48(%rsp)
   0x00007fed4704705f <+751>:	sete   (%rbx,%rcx,1)
   0x00007fed47047063 <+755>:	mov    0x80(%rsp),%rcx
   0x00007fed4704706b <+763>:	je     0x7fed47047b45 <julia__all_1912+3541>
   0x00007fed47047071 <+769>:	lea    0x1(%rax),%r15
   0x00007fed47047075 <+773>:	mov    %rax,%rdx
   0x00007fed47047078 <+776>:	popcnt %r15,%rax
   0x00007fed4704707d <+781>:	mov    %rdx,0x28(%rsp)
   0x00007fed47047082 <+786>:	mov    %r15,0xf8(%rsp)
   0x00007fed4704708a <+794>:	cmp    $0x2,%eax
   0x00007fed4704708d <+797>:	ja     0x7fed470477b3 <julia__all_1912+2627>
   0x00007fed47047093 <+803>:	mov    %r15d,%eax
   0x00007fed47047096 <+806>:	and    $0x1,%eax
   0x00007fed47047099 <+809>:	test   %rax,%rax
   0x00007fed4704709c <+812>:	je     0x7fed470477b3 <julia__all_1912+2627>
   0x00007fed470470a2 <+818>:	lzcnt  %r15,%rax
   0x00007fed470470a7 <+823>:	mov    $0x40,%r12d
   0x00007fed470470ad <+829>:	mov    $0x8,%ecx
   0x00007fed470470b2 <+834>:	mov    %rsi,0x8(%rsp)
   0x00007fed470470b7 <+839>:	mov    %r8,0x10(%rsp)
   0x00007fed470470bc <+844>:	mov    %r11,0x18(%rsp)
   0x00007fed470470c1 <+849>:	mov    %r9,0x40(%rsp)
   0x00007fed470470c6 <+854>:	mov    %r10,0x38(%rsp)
   0x00007fed470470cb <+859>:	sub    %rax,%r12
   0x00007fed470470ce <+862>:	neg    %rax
   0x00007fed470470d1 <+865>:	shlx   %rax,%rcx,%r14
   0x00007fed470470d6 <+870>:	mov    0x50(%rsp),%rcx
   0x00007fed470470db <+875>:	movabs $0x7fed7822e28c,%rax
   0x00007fed470470e5 <+885>:	mov    %r14,%rbx
   0x00007fed470470e8 <+888>:	mov    %r14,%rdi
   0x00007fed470470eb <+891>:	shr    %rbx
   0x00007fed470470ee <+894>:	test   %rdx,%rdx
   0x00007fed470470f1 <+897>:	cmove  %rdx,%rbx
   0x00007fed470470f5 <+901>:	shr    $0x3,%rdi
   0x00007fed470470f9 <+905>:	mov    %rdi,0x68(%rsp)
   0x00007fed470470fe <+910>:	mov    %rcx,0x180(%rsp)
   0x00007fed47047106 <+918>:	mov    %rsi,0x130(%rsp)
   0x00007fed4704710e <+926>:	mov    %r8,0x128(%rsp)
   0x00007fed47047116 <+934>:	mov    %r11,0x120(%rsp)
   0x00007fed4704711e <+942>:	mov    %r9,0x118(%rsp)
   0x00007fed47047126 <+950>:	mov    %r10,0x138(%rsp)
   0x00007fed4704712e <+958>:	vzeroupper 
   0x00007fed47047131 <+961>:	call   *%rax
   0x00007fed47047133 <+963>:	movabs $0x7fed635ee200,%rcx
   0x00007fed4704713d <+973>:	mov    %rax,0x110(%rsp)
   0x00007fed47047145 <+981>:	xor    %edi,%edi
   0x00007fed47047147 <+983>:	lea    0x88(%rsp),%rsi
   0x00007fed4704714f <+991>:	mov    $0x3,%edx
   0x00007fed47047154 <+996>:	mov    %rcx,0x88(%rsp)
   0x00007fed4704715c <+1004>:	mov    %rax,0x90(%rsp)
   0x00007fed47047164 <+1012>:	movabs $0x7fed48e25150,%rcx
   0x00007fed4704716e <+1022>:	movabs $0x7fed781ef996,%rax
   0x00007fed47047178 <+1032>:	mov    %rcx,0x98(%rsp)
   0x00007fed47047180 <+1040>:	call   *%rax
   0x00007fed47047182 <+1042>:	mov    0x20(%rsp),%rcx
   0x00007fed47047187 <+1047>:	mov    %r14,%rsi
   0x00007fed4704718a <+1050>:	mov    %rax,%rdx
   0x00007fed4704718d <+1053>:	mov    0x10(%rcx),%rdi
   0x00007fed47047191 <+1057>:	mov    %rax,0x110(%rsp)
   0x00007fed47047199 <+1065>:	movabs $0x7fed78d6cb56,%rax
   0x00007fed470471a3 <+1075>:	call   *%rax
   0x00007fed470471a5 <+1077>:	mov    %rbx,%rdx
   0x00007fed470471a8 <+1080>:	mov    %r14,%rcx
   0x00007fed470471ab <+1083>:	mov    %r14,0x30(%rsp)
   0x00007fed470471b0 <+1088>:	mov    %rax,%rbx
   0x00007fed470471b3 <+1091>:	mov    $0x0,%eax
   0x00007fed470471b8 <+1096>:	movabs $0x7fed70687008,%rsi
   0x00007fed470471c2 <+1106>:	sub    %rdx,%rcx
   0x00007fed470471c5 <+1109>:	mov    %rcx,%r14
   0x00007fed470471c8 <+1112>:	mov    %rcx,0x60(%rsp)
   0x00007fed470471cd <+1117>:	shr    $0x3,%r14
   0x00007fed470471d1 <+1121>:	cmp    $0x80,%rcx
   0x00007fed470471d8 <+1128>:	jb     0x7fed47047225 <julia__all_1912+1205>
   0x00007fed470471da <+1130>:	vmovaps 0x1c0(%rsp),%ymm0
   0x00007fed470471e3 <+1139>:	mov    %r14,%rax
   0x00007fed470471e6 <+1142>:	movabs $0x1ffffffffffffff0,%rcx
   0x00007fed470471f0 <+1152>:	xor    %edi,%edi
   0x00007fed470471f2 <+1154>:	and    %rcx,%rax
   0x00007fed470471f5 <+1157>:	lea    0x60(%rbx,%rdx,1),%rcx
   0x00007fed470471fa <+1162>:	nopw   0x0(%rax,%rax,1)
   0x00007fed47047200 <+1168>:	vmovups %ymm0,-0x60(%rcx,%rdi,8)
   0x00007fed47047206 <+1174>:	vmovups %ymm0,-0x40(%rcx,%rdi,8)
   0x00007fed4704720c <+1180>:	vmovups %ymm0,-0x20(%rcx,%rdi,8)
   0x00007fed47047212 <+1186>:	vmovups %ymm0,(%rcx,%rdi,8)
   0x00007fed47047217 <+1191>:	add    $0x10,%rdi
   0x00007fed4704721b <+1195>:	cmp    %rdi,%rax
   0x00007fed4704721e <+1198>:	jne    0x7fed47047200 <julia__all_1912+1168>
   0x00007fed47047220 <+1200>:	cmp    %rax,%r14
   0x00007fed47047223 <+1203>:	je     0x7fed4704723c <julia__all_1912+1228>
   0x00007fed47047225 <+1205>:	mov    %rbx,%rcx
   0x00007fed47047228 <+1208>:	add    %rdx,%rcx
   0x00007fed4704722b <+1211>:	nopl   0x0(%rax,%rax,1)
   0x00007fed47047230 <+1216>:	mov    %rsi,(%rcx,%rax,8)
   0x00007fed47047234 <+1220>:	inc    %rax
   0x00007fed47047237 <+1223>:	cmp    %rax,%r14
   0x00007fed4704723a <+1226>:	jne    0x7fed47047230 <julia__all_1912+1216>
   0x00007fed4704723c <+1228>:	mov    0x8(%rsp),%rsi
   0x00007fed47047241 <+1233>:	mov    %rbx,%rdi
   0x00007fed47047244 <+1236>:	movabs $0x7fed78f1d8b0,%rax
   0x00007fed4704724e <+1246>:	mov    %rdx,0x8(%rsp)
   0x00007fed47047253 <+1251>:	mov    %rbx,%r15
   0x00007fed47047256 <+1254>:	vzeroupper 
   0x00007fed47047259 <+1257>:	call   *%rax
   0x00007fed4704725b <+1259>:	mov    0x28(%rsp),%rcx
   0x00007fed47047260 <+1264>:	mov    $0x1,%eax
   0x00007fed47047265 <+1269>:	mov    %r13,%rdi
   0x00007fed47047268 <+1272>:	mov    %r15,0x1b8(%rsp)
   0x00007fed47047270 <+1280>:	mov    %r15,0x130(%rsp)
   0x00007fed47047278 <+1288>:	shlx   %r12,%rax,%r12
   0x00007fed4704727d <+1293>:	movabs $0x7fed78e227c0,%rax
   0x00007fed47047287 <+1303>:	mov    %r12,%rbx
   0x00007fed4704728a <+1306>:	mov    %r12,%rsi
   0x00007fed4704728d <+1309>:	shr    %rbx
   0x00007fed47047290 <+1312>:	test   %rcx,%rcx
   0x00007fed47047293 <+1315>:	cmove  %rcx,%rbx
   0x00007fed47047297 <+1319>:	call   *%rax
   0x00007fed47047299 <+1321>:	mov    %r12,%r13
   0x00007fed4704729c <+1324>:	lea    (%rax,%rbx,1),%rdi
   0x00007fed470472a0 <+1328>:	xor    %esi,%esi
   0x00007fed470472a2 <+1330>:	movabs $0x7fed78f1e080,%r15
   0x00007fed470472ac <+1340>:	mov    %rax,0x70(%rsp)
   0x00007fed470472b1 <+1345>:	sub    %rbx,%r13
   0x00007fed470472b4 <+1348>:	mov    %r13,%rdx
   0x00007fed470472b7 <+1351>:	call   *%r15
   0x00007fed470472ba <+1354>:	mov    0x48(%rsp),%rdi
   0x00007fed470472bf <+1359>:	mov    %r12,%rsi
   0x00007fed470472c2 <+1362>:	movabs $0x7fed78e227c0,%rax
   0x00007fed470472cc <+1372>:	call   *%rax
   0x00007fed470472ce <+1374>:	add    %rax,%rbx
   0x00007fed470472d1 <+1377>:	xor    %esi,%esi
   0x00007fed470472d3 <+1379>:	mov    %r13,%rdx
   0x00007fed470472d6 <+1382>:	mov    %rax,%r12
   0x00007fed470472d9 <+1385>:	mov    %rbx,%rdi
   0x00007fed470472dc <+1388>:	call   *%r15
   0x00007fed470472df <+1391>:	mov    0x68(%rsp),%rdi
   0x00007fed470472e4 <+1396>:	movabs $0x7fed7822e28c,%r15
   0x00007fed470472ee <+1406>:	call   *%r15
   0x00007fed470472f1 <+1409>:	movabs $0x7fed635ee200,%rcx
   0x00007fed470472fb <+1419>:	mov    %rax,0x110(%rsp)
   0x00007fed47047303 <+1427>:	xor    %edi,%edi
   0x00007fed47047305 <+1429>:	lea    0x88(%rsp),%rsi
   0x00007fed4704730d <+1437>:	mov    $0x3,%edx
   0x00007fed47047312 <+1442>:	mov    %rcx,0x88(%rsp)
   0x00007fed4704731a <+1450>:	mov    %rax,0x90(%rsp)
   0x00007fed47047322 <+1458>:	movabs $0x7fed48e25150,%rcx
   0x00007fed4704732c <+1468>:	movabs $0x7fed781ef996,%rax
   0x00007fed47047336 <+1478>:	mov    %rcx,0x98(%rsp)
   0x00007fed4704733e <+1486>:	call   *%rax
   0x00007fed47047340 <+1488>:	mov    0x20(%rsp),%rcx
   0x00007fed47047345 <+1493>:	mov    0x30(%rsp),%rsi
   0x00007fed4704734a <+1498>:	mov    %rax,%rdx
   0x00007fed4704734d <+1501>:	mov    0x10(%rcx),%rdi
   0x00007fed47047351 <+1505>:	mov    %rax,0x110(%rsp)
   0x00007fed47047359 <+1513>:	movabs $0x7fed78d6cb56,%rax
   0x00007fed47047363 <+1523>:	call   *%rax
   0x00007fed47047365 <+1525>:	mov    0x8(%rsp),%rdx
   0x00007fed4704736a <+1530>:	mov    0x10(%rsp),%rsi
   0x00007fed4704736f <+1535>:	cmpq   $0x80,0x60(%rsp)
   0x00007fed47047378 <+1544>:	mov    %rax,%r13
   0x00007fed4704737b <+1547>:	movabs $0x7fed70687008,%rdi
   0x00007fed47047385 <+1557>:	mov    $0x0,%eax
   0x00007fed4704738a <+1562>:	jb     0x7fed470473d5 <julia__all_1912+1637>
   0x00007fed4704738c <+1564>:	vmovaps 0x1c0(%rsp),%ymm0
   0x00007fed47047395 <+1573>:	mov    %r14,%rax
   0x00007fed47047398 <+1576>:	movabs $0x1ffffffffffffff0,%rcx
   0x00007fed470473a2 <+1586>:	xor    %ebx,%ebx
   0x00007fed470473a4 <+1588>:	and    %rcx,%rax
   0x00007fed470473a7 <+1591>:	lea    0x60(%r13,%rdx,1),%rcx
   0x00007fed470473ac <+1596>:	nopl   0x0(%rax)
   0x00007fed470473b0 <+1600>:	vmovups %ymm0,-0x60(%rcx,%rbx,8)
   0x00007fed470473b6 <+1606>:	vmovups %ymm0,-0x40(%rcx,%rbx,8)
   0x00007fed470473bc <+1612>:	vmovups %ymm0,-0x20(%rcx,%rbx,8)
   0x00007fed470473c2 <+1618>:	vmovups %ymm0,(%rcx,%rbx,8)
   0x00007fed470473c7 <+1623>:	add    $0x10,%rbx
   0x00007fed470473cb <+1627>:	cmp    %rbx,%rax
   0x00007fed470473ce <+1630>:	jne    0x7fed470473b0 <julia__all_1912+1600>
   0x00007fed470473d0 <+1632>:	cmp    %rax,%r14
   0x00007fed470473d3 <+1635>:	je     0x7fed470473ec <julia__all_1912+1660>
   0x00007fed470473d5 <+1637>:	mov    %r13,%rcx
   0x00007fed470473d8 <+1640>:	add    %rdx,%rcx
   0x00007fed470473db <+1643>:	nopl   0x0(%rax,%rax,1)
   0x00007fed470473e0 <+1648>:	mov    %rdi,(%rcx,%rax,8)
   0x00007fed470473e4 <+1652>:	inc    %rax
   0x00007fed470473e7 <+1655>:	cmp    %rax,%r14
   0x00007fed470473ea <+1658>:	jne    0x7fed470473e0 <julia__all_1912+1648>
   0x00007fed470473ec <+1660>:	mov    %r13,%rdi
   0x00007fed470473ef <+1663>:	movabs $0x7fed78f1d8b0,%rax
   0x00007fed470473f9 <+1673>:	vzeroupper 
   0x00007fed470473fc <+1676>:	call   *%rax
   0x00007fed470473fe <+1678>:	mov    0x68(%rsp),%rdi
   0x00007fed47047403 <+1683>:	mov    %r13,0x10(%rsp)
   0x00007fed47047408 <+1688>:	mov    %r13,0x128(%rsp)
   0x00007fed47047410 <+1696>:	call   *%r15
   0x00007fed47047413 <+1699>:	movabs $0x7fed635ee200,%rcx
   0x00007fed4704741d <+1709>:	mov    %rax,0x110(%rsp)
   0x00007fed47047425 <+1717>:	xor    %edi,%edi
   0x00007fed47047427 <+1719>:	lea    0x88(%rsp),%rsi
   0x00007fed4704742f <+1727>:	mov    $0x3,%edx
   0x00007fed47047434 <+1732>:	mov    %rcx,0x88(%rsp)
   0x00007fed4704743c <+1740>:	mov    %rax,0x90(%rsp)
   0x00007fed47047444 <+1748>:	movabs $0x7fed48e25150,%rcx
   0x00007fed4704744e <+1758>:	movabs $0x7fed781ef996,%rax
   0x00007fed47047458 <+1768>:	mov    %rcx,0x98(%rsp)
   0x00007fed47047460 <+1776>:	call   *%rax
   0x00007fed47047462 <+1778>:	mov    0x20(%rsp),%rcx
   0x00007fed47047467 <+1783>:	mov    0x30(%rsp),%rsi
   0x00007fed4704746c <+1788>:	mov    %rax,%rdx
   0x00007fed4704746f <+1791>:	mov    0x10(%rcx),%rdi
   0x00007fed47047473 <+1795>:	mov    %rax,0x110(%rsp)
   0x00007fed4704747b <+1803>:	movabs $0x7fed78d6cb56,%rax
   0x00007fed47047485 <+1813>:	call   *%rax
   0x00007fed47047487 <+1815>:	mov    0x8(%rsp),%rdx
   0x00007fed4704748c <+1820>:	mov    0x18(%rsp),%rsi
   0x00007fed47047491 <+1825>:	cmpq   $0x80,0x60(%rsp)
   0x00007fed4704749a <+1834>:	mov    %rax,%r13
   0x00007fed4704749d <+1837>:	movabs $0x7fed70687008,%rdi
   0x00007fed470474a7 <+1847>:	mov    $0x0,%eax
   0x00007fed470474ac <+1852>:	mov    %r12,0x48(%rsp)
   0x00007fed470474b1 <+1857>:	jb     0x7fed47047505 <julia__all_1912+1941>
   0x00007fed470474b3 <+1859>:	vmovaps 0x1c0(%rsp),%ymm0
   0x00007fed470474bc <+1868>:	mov    %r14,%rax
   0x00007fed470474bf <+1871>:	movabs $0x1ffffffffffffff0,%rcx
   0x00007fed470474c9 <+1881>:	xor    %ebx,%ebx
   0x00007fed470474cb <+1883>:	and    %rcx,%rax
   0x00007fed470474ce <+1886>:	lea    0x60(%r13,%rdx,1),%rcx
   0x00007fed470474d3 <+1891>:	data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
   0x00007fed470474e0 <+1904>:	vmovups %ymm0,-0x60(%rcx,%rbx,8)
   0x00007fed470474e6 <+1910>:	vmovups %ymm0,-0x40(%rcx,%rbx,8)
   0x00007fed470474ec <+1916>:	vmovups %ymm0,-0x20(%rcx,%rbx,8)
   0x00007fed470474f2 <+1922>:	vmovups %ymm0,(%rcx,%rbx,8)
   0x00007fed470474f7 <+1927>:	add    $0x10,%rbx
   0x00007fed470474fb <+1931>:	cmp    %rbx,%rax
   0x00007fed470474fe <+1934>:	jne    0x7fed470474e0 <julia__all_1912+1904>
   0x00007fed47047500 <+1936>:	cmp    %rax,%r14
   0x00007fed47047503 <+1939>:	je     0x7fed4704751c <julia__all_1912+1964>
   0x00007fed47047505 <+1941>:	mov    %r13,%rcx
   0x00007fed47047508 <+1944>:	add    %rdx,%rcx
   0x00007fed4704750b <+1947>:	nopl   0x0(%rax,%rax,1)
   0x00007fed47047510 <+1952>:	mov    %rdi,(%rcx,%rax,8)
   0x00007fed47047514 <+1956>:	inc    %rax
   0x00007fed47047517 <+1959>:	cmp    %rax,%r14
   0x00007fed4704751a <+1962>:	jne    0x7fed47047510 <julia__all_1912+1952>
   0x00007fed4704751c <+1964>:	mov    %r13,%rdi
   0x00007fed4704751f <+1967>:	movabs $0x7fed78f1d8b0,%rax
   0x00007fed47047529 <+1977>:	vzeroupper 
   0x00007fed4704752c <+1980>:	call   *%rax
   0x00007fed4704752e <+1982>:	mov    0x68(%rsp),%rdi
   0x00007fed47047533 <+1987>:	mov    %r13,0x120(%rsp)
   0x00007fed4704753b <+1995>:	call   *%r15
   0x00007fed4704753e <+1998>:	movabs $0x7fed635ee200,%rcx
   0x00007fed47047548 <+2008>:	mov    %rax,0x110(%rsp)
   0x00007fed47047550 <+2016>:	xor    %edi,%edi
   0x00007fed47047552 <+2018>:	lea    0x88(%rsp),%rsi
   0x00007fed4704755a <+2026>:	mov    $0x3,%edx
   0x00007fed4704755f <+2031>:	mov    %rcx,0x88(%rsp)
   0x00007fed47047567 <+2039>:	mov    %rax,0x90(%rsp)
   0x00007fed4704756f <+2047>:	movabs $0x7fed48e25150,%rcx
   0x00007fed47047579 <+2057>:	movabs $0x7fed781ef996,%rax
   0x00007fed47047583 <+2067>:	mov    %rcx,0x98(%rsp)
   0x00007fed4704758b <+2075>:	call   *%rax
   0x00007fed4704758d <+2077>:	mov    0x20(%rsp),%rcx
   0x00007fed47047592 <+2082>:	mov    0x30(%rsp),%rsi
   0x00007fed47047597 <+2087>:	mov    %rax,%rdx
   0x00007fed4704759a <+2090>:	mov    0x10(%rcx),%rdi
   0x00007fed4704759e <+2094>:	mov    %rax,0x110(%rsp)
   0x00007fed470475a6 <+2102>:	movabs $0x7fed78d6cb56,%rax
   0x00007fed470475b0 <+2112>:	call   *%rax
   0x00007fed470475b2 <+2114>:	mov    0x8(%rsp),%rdx
   0x00007fed470475b7 <+2119>:	mov    0x40(%rsp),%rsi
   0x00007fed470475bc <+2124>:	cmpq   $0x80,0x60(%rsp)
   0x00007fed470475c5 <+2133>:	mov    %rax,%r12
   0x00007fed470475c8 <+2136>:	movabs $0x7fed70687008,%rdi
   0x00007fed470475d2 <+2146>:	mov    $0x0,%eax
   0x00007fed470475d7 <+2151>:	jb     0x7fed47047625 <julia__all_1912+2229>
   0x00007fed470475d9 <+2153>:	vmovaps 0x1c0(%rsp),%ymm0
   0x00007fed470475e2 <+2162>:	mov    %r14,%rax
   0x00007fed470475e5 <+2165>:	movabs $0x1ffffffffffffff0,%rcx
   0x00007fed470475ef <+2175>:	xor    %ebx,%ebx
   0x00007fed470475f1 <+2177>:	and    %rcx,%rax
   0x00007fed470475f4 <+2180>:	lea    0x60(%r12,%rdx,1),%rcx
   0x00007fed470475f9 <+2185>:	nopl   0x0(%rax)
   0x00007fed47047600 <+2192>:	vmovups %ymm0,-0x60(%rcx,%rbx,8)
   0x00007fed47047606 <+2198>:	vmovups %ymm0,-0x40(%rcx,%rbx,8)
   0x00007fed4704760c <+2204>:	vmovups %ymm0,-0x20(%rcx,%rbx,8)
   0x00007fed47047612 <+2210>:	vmovups %ymm0,(%rcx,%rbx,8)
   0x00007fed47047617 <+2215>:	add    $0x10,%rbx
   0x00007fed4704761b <+2219>:	cmp    %rbx,%rax
   0x00007fed4704761e <+2222>:	jne    0x7fed47047600 <julia__all_1912+2192>
   0x00007fed47047620 <+2224>:	cmp    %rax,%r14
   0x00007fed47047623 <+2227>:	je     0x7fed4704763c <julia__all_1912+2252>
   0x00007fed47047625 <+2229>:	mov    %r12,%rcx
   0x00007fed47047628 <+2232>:	add    %rdx,%rcx
   0x00007fed4704762b <+2235>:	nopl   0x0(%rax,%rax,1)
   0x00007fed47047630 <+2240>:	mov    %rdi,(%rcx,%rax,8)
   0x00007fed47047634 <+2244>:	inc    %rax
   0x00007fed47047637 <+2247>:	cmp    %rax,%r14
   0x00007fed4704763a <+2250>:	jne    0x7fed47047630 <julia__all_1912+2240>
   0x00007fed4704763c <+2252>:	mov    %r12,%rdi
   0x00007fed4704763f <+2255>:	movabs $0x7fed78f1d8b0,%rax
   0x00007fed47047649 <+2265>:	vzeroupper 
   0x00007fed4704764c <+2268>:	call   *%rax
   0x00007fed4704764e <+2270>:	mov    0x68(%rsp),%rdi
   0x00007fed47047653 <+2275>:	mov    %r12,0x118(%rsp)
   0x00007fed4704765b <+2283>:	call   *%r15
   0x00007fed4704765e <+2286>:	movabs $0x7fed635ee200,%rcx
   0x00007fed47047668 <+2296>:	mov    %rax,0x110(%rsp)
   0x00007fed47047670 <+2304>:	xor    %edi,%edi
   0x00007fed47047672 <+2306>:	lea    0x88(%rsp),%rsi
   0x00007fed4704767a <+2314>:	mov    $0x3,%edx
   0x00007fed4704767f <+2319>:	mov    %rcx,0x88(%rsp)
   0x00007fed47047687 <+2327>:	mov    %rax,0x90(%rsp)
   0x00007fed4704768f <+2335>:	movabs $0x7fed48e25150,%rcx
   0x00007fed47047699 <+2345>:	movabs $0x7fed781ef996,%rax
   0x00007fed470476a3 <+2355>:	mov    %rcx,0x98(%rsp)
   0x00007fed470476ab <+2363>:	call   *%rax
   0x00007fed470476ad <+2365>:	mov    0x20(%rsp),%rcx
   0x00007fed470476b2 <+2370>:	mov    0x30(%rsp),%rsi
   0x00007fed470476b7 <+2375>:	mov    %rax,%rdx
   0x00007fed470476ba <+2378>:	mov    0x10(%rcx),%rdi
   0x00007fed470476be <+2382>:	mov    %rax,0x110(%rsp)
   0x00007fed470476c6 <+2390>:	movabs $0x7fed78d6cb56,%rax
   0x00007fed470476d0 <+2400>:	call   *%rax
   0x00007fed470476d2 <+2402>:	mov    0x8(%rsp),%rdx
   0x00007fed470476d7 <+2407>:	mov    0x38(%rsp),%rsi
   0x00007fed470476dc <+2412>:	cmpq   $0x80,0x60(%rsp)
   0x00007fed470476e5 <+2421>:	mov    %rax,%r15
   0x00007fed470476e8 <+2424>:	movabs $0x7fed70687008,%rdi
   0x00007fed470476f2 <+2434>:	mov    $0x0,%eax
   0x00007fed470476f7 <+2439>:	jb     0x7fed47047745 <julia__all_1912+2517>
   0x00007fed470476f9 <+2441>:	vmovaps 0x1c0(%rsp),%ymm0
   0x00007fed47047702 <+2450>:	mov    %r14,%rax
   0x00007fed47047705 <+2453>:	movabs $0x1ffffffffffffff0,%rcx
   0x00007fed4704770f <+2463>:	xor    %ebx,%ebx
   0x00007fed47047711 <+2465>:	and    %rcx,%rax
   0x00007fed47047714 <+2468>:	lea    0x60(%r15,%rdx,1),%rcx
   0x00007fed47047719 <+2473>:	nopl   0x0(%rax)
   0x00007fed47047720 <+2480>:	vmovups %ymm0,-0x60(%rcx,%rbx,8)
   0x00007fed47047726 <+2486>:	vmovups %ymm0,-0x40(%rcx,%rbx,8)
   0x00007fed4704772c <+2492>:	vmovups %ymm0,-0x20(%rcx,%rbx,8)
   0x00007fed47047732 <+2498>:	vmovups %ymm0,(%rcx,%rbx,8)
   0x00007fed47047737 <+2503>:	add    $0x10,%rbx
   0x00007fed4704773b <+2507>:	cmp    %rbx,%rax
   0x00007fed4704773e <+2510>:	jne    0x7fed47047720 <julia__all_1912+2480>
   0x00007fed47047740 <+2512>:	cmp    %rax,%r14
   0x00007fed47047743 <+2515>:	je     0x7fed4704775c <julia__all_1912+2540>
   0x00007fed47047745 <+2517>:	lea    (%r15,%rdx,1),%rcx
   0x00007fed47047749 <+2521>:	nopl   0x0(%rax)
   0x00007fed47047750 <+2528>:	mov    %rdi,(%rcx,%rax,8)
   0x00007fed47047754 <+2532>:	inc    %rax
   0x00007fed47047757 <+2535>:	cmp    %rax,%r14
   0x00007fed4704775a <+2538>:	jne    0x7fed47047750 <julia__all_1912+2528>
   0x00007fed4704775c <+2540>:	mov    %r15,%rdi
   0x00007fed4704775f <+2543>:	movabs $0x7fed78f1d8b0,%rax
   0x00007fed47047769 <+2553>:	vzeroupper 
   0x00007fed4704776c <+2556>:	call   *%rax
   0x00007fed4704776e <+2558>:	mov    %r12,%r9
   0x00007fed47047771 <+2561>:	mov    %r13,%r11
   0x00007fed47047774 <+2564>:	mov    %r15,%r10
   0x00007fed47047777 <+2567>:	mov    0x10(%rsp),%r8
   0x00007fed4704777c <+2572>:	mov    0x1b8(%rsp),%rsi
   0x00007fed47047784 <+2580>:	mov    0xf0(%rsp),%r14
   0x00007fed4704778c <+2588>:	mov    0xe8(%rsp),%r12
   0x00007fed47047794 <+2596>:	mov    0x58(%rsp),%rdi
   0x00007fed47047799 <+2601>:	mov    0x80(%rsp),%rcx
   0x00007fed470477a1 <+2609>:	mov    0x70(%rsp),%r13
   0x00007fed470477a6 <+2614>:	mov    0xf8(%rsp),%r15
   0x00007fed470477ae <+2622>:	mov    0x28(%rsp),%rdx
   0x00007fed470477b3 <+2627>:	mov    %r13,0x70(%rsp)
   0x00007fed470477b8 <+2632>:	mov    %r8,0x10(%rsp)
   0x00007fed470477bd <+2637>:	cmp    0x8(%rcx),%r15
   0x00007fed470477c1 <+2641>:	setb   0x0(%r13,%rdx,1)
   0x00007fed470477c7 <+2647>:	jae    0x7fed47047b45 <julia__all_1912+3541>
   0x00007fed470477cd <+2653>:	mov    0xe0(%rsp),%rax
   0x00007fed470477d5 <+2661>:	mov    (%rcx),%r12
   0x00007fed470477d8 <+2664>:	mov    %r9,0x40(%rsp)
   0x00007fed470477dd <+2669>:	mov    %r11,0x18(%rsp)
   0x00007fed470477e2 <+2674>:	mov    %rsi,0x8(%rsp)
   0x00007fed470477e7 <+2679>:	mov    (%rax),%rax
   0x00007fed470477ea <+2682>:	mov    (%rax,%r15,8),%rcx
   0x00007fed470477ee <+2686>:	mov    %rcx,(%r8,%rdx,8)
   0x00007fed470477f2 <+2690>:	mov    -0x8(%r8),%rax
   0x00007fed470477f6 <+2694>:	not    %eax
   0x00007fed470477f8 <+2696>:	test   $0x3,%al
   0x00007fed470477fa <+2698>:	jne    0x7fed47047808 <julia__all_1912+2712>
   0x00007fed470477fc <+2700>:	mov    -0x8(%rcx),%rax
   0x00007fed47047800 <+2704>:	test   $0x1,%al
   0x00007fed47047802 <+2706>:	je     0x7fed47047a73 <julia__all_1912+3331>
   0x00007fed47047808 <+2712>:	mov    (%r12,%r15,8),%r12
   0x00007fed4704780c <+2716>:	mov    %r12,(%rsi,%rdx,8)
   0x00007fed47047810 <+2720>:	mov    -0x8(%rsi),%rax
   0x00007fed47047814 <+2724>:	not    %eax
   0x00007fed47047816 <+2726>:	test   $0x3,%al
   0x00007fed47047818 <+2728>:	jne    0x7fed47047827 <julia__all_1912+2743>
   0x00007fed4704781a <+2730>:	mov    -0x8(%r12),%rax
   0x00007fed4704781f <+2735>:	test   $0x1,%al
   0x00007fed47047821 <+2737>:	je     0x7fed47047aaf <julia__all_1912+3391>
   0x00007fed47047827 <+2743>:	test   %r12,%r12
   0x00007fed4704782a <+2746>:	je     0x7fed47047c22 <julia__all_1912+3762>
   0x00007fed47047830 <+2752>:	mov    0x50(%rsp),%rax
   0x00007fed47047835 <+2757>:	mov    0x20(%rsp),%r14
   0x00007fed4704783a <+2762>:	mov    $0xa0,%edx
   0x00007fed4704783f <+2767>:	movabs $0x7fed78246734,%r15
   0x00007fed47047849 <+2777>:	mov    %r10,0x38(%rsp)
   0x00007fed4704784e <+2782>:	mov    %rcx,0x30(%rsp)
   0x00007fed47047853 <+2787>:	mov    %rax,0x180(%rsp)
   0x00007fed4704785b <+2795>:	mov    %r10,0x168(%rsp)
   0x00007fed47047863 <+2803>:	mov    %r9,0x160(%rsp)
   0x00007fed4704786b <+2811>:	mov    %r11,0x150(%rsp)
   0x00007fed47047873 <+2819>:	mov    %r11,0x140(%rsp)
   0x00007fed4704787b <+2827>:	mov    %r12,0x138(%rsp)
   0x00007fed47047883 <+2835>:	mov    %rcx,0x130(%rsp)
   0x00007fed4704788b <+2843>:	mov    %rsi,0x128(%rsp)
   0x00007fed47047893 <+2851>:	mov    %r8,0x120(%rsp)
   0x00007fed4704789b <+2859>:	mov    %rsi,0x118(%rsp)
   0x00007fed470478a3 <+2867>:	mov    %r8,0x110(%rsp)
   0x00007fed470478ab <+2875>:	mov    $0x708,%esi
   0x00007fed470478b0 <+2880>:	mov    0x10(%r14),%rdi
   0x00007fed470478b4 <+2884>:	vzeroupper 
   0x00007fed470478b7 <+2887>:	call   *%r15
   0x00007fed470478ba <+2890>:	movabs $0x7fed51fa3610,%r13
   0x00007fed470478c4 <+2900>:	mov    $0x708,%esi
   0x00007fed470478c9 <+2905>:	mov    $0xa0,%edx
   0x00007fed470478ce <+2910>:	mov    %rax,%rbx
=> 0x00007fed470478d1 <+2913>:	mov    %r13,-0x8(%rax)
   0x00007fed470478d5 <+2917>:	mov    %rax,0x148(%rsp)
   0x00007fed470478dd <+2925>:	mov    0x10(%r14),%rdi
   0x00007fed470478e1 <+2929>:	call   *%r15
   0x00007fed470478e4 <+2932>:	mov    0xd8(%rsp),%rcx
   0x00007fed470478ec <+2940>:	mov    %r13,-0x8(%rax)
   0x00007fed470478f0 <+2944>:	mov    %rax,%r14
   0x00007fed470478f3 <+2947>:	mov    0xd0(%rsp),%rax
   0x00007fed470478fb <+2955>:	movabs $0x7fed572abc10,%rdi
   0x00007fed47047905 <+2965>:	lea    0x88(%rsp),%rsi
   0x00007fed4704790d <+2973>:	mov    $0x8,%edx
   0x00007fed47047912 <+2978>:	vmovups 0x70(%rcx),%ymm0
   0x00007fed47047917 <+2983>:	vmovups %ymm0,0x70(%r14)
   0x00007fed4704791d <+2989>:	vmovups (%rcx),%ymm0
   0x00007fed47047921 <+2993>:	vmovups 0x20(%rcx),%ymm1
   0x00007fed47047926 <+2998>:	vmovups 0x40(%rcx),%ymm2
   0x00007fed4704792b <+3003>:	vmovups 0x60(%rcx),%ymm3
   0x00007fed47047930 <+3008>:	vmovups %ymm3,0x60(%r14)
   0x00007fed47047936 <+3014>:	vmovups %ymm2,0x40(%r14)
   0x00007fed4704793c <+3020>:	vmovups %ymm1,0x20(%r14)
   0x00007fed47047942 <+3026>:	vmovups %ymm0,(%r14)
   0x00007fed47047947 <+3031>:	vmovups 0x70(%rax),%ymm0
   0x00007fed4704794c <+3036>:	vmovups %ymm0,0x70(%rbx)
   0x00007fed47047951 <+3041>:	vmovups 0x20(%rax),%ymm1
   0x00007fed47047956 <+3046>:	vmovups (%rax),%ymm0
   0x00007fed4704795a <+3050>:	vmovups 0x40(%rax),%ymm2
   0x00007fed4704795f <+3055>:	vmovups 0x60(%rax),%ymm3
   0x00007fed47047964 <+3060>:	mov    0x30(%rsp),%rax
   0x00007fed47047969 <+3065>:	mov    %r14,0x158(%rsp)
   0x00007fed47047971 <+3073>:	vmovups %ymm1,0x20(%rbx)
   0x00007fed47047976 <+3078>:	vmovaps 0x1e0(%rsp),%ymm1
   0x00007fed4704797f <+3087>:	vmovups %ymm3,0x60(%rbx)
   0x00007fed47047984 <+3092>:	vmovups %ymm2,0x40(%rbx)
   0x00007fed47047989 <+3097>:	vmovups %ymm0,(%rbx)
   0x00007fed4704798d <+3101>:	vmovups %ymm1,0x88(%rsp)
   0x00007fed47047996 <+3110>:	mov    %rbx,0xa8(%rsp)
   0x00007fed4704799e <+3118>:	mov    %r14,0xb0(%rsp)
   0x00007fed470479a6 <+3126>:	mov    %r12,0xb8(%rsp)
   0x00007fed470479ae <+3134>:	mov    %rax,0xc0(%rsp)
   0x00007fed470479b6 <+3142>:	movabs $0x7fed781d98d9,%rax
   0x00007fed470479c0 <+3152>:	vzeroupper 
   0x00007fed470479c3 <+3155>:	call   *%rax
   0x00007fed470479c5 <+3157>:	mov    0x18(%rsp),%r11
   0x00007fed470479ca <+3162>:	mov    0x38(%rsp),%r10
   0x00007fed470479cf <+3167>:	mov    0x10(%rax),%r13
   0x00007fed470479d3 <+3171>:	mov    0x28(%rsp),%rcx
   0x00007fed470479d8 <+3176>:	mov    (%rax),%r12
   0x00007fed470479db <+3179>:	mov    %r13,(%r10,%rcx,8)
   0x00007fed470479df <+3183>:	mov    %rbx,(%r11,%rcx,8)
   0x00007fed470479e3 <+3187>:	mov    -0x8(%r11),%rax
   0x00007fed470479e7 <+3191>:	not    %eax
   0x00007fed470479e9 <+3193>:	test   $0x3,%al
   0x00007fed470479eb <+3195>:	jne    0x7fed470479f9 <julia__all_1912+3209>
   0x00007fed470479ed <+3197>:	mov    -0x8(%rbx),%rax
   0x00007fed470479f1 <+3201>:	test   $0x1,%al
   0x00007fed470479f3 <+3203>:	je     0x7fed47047af0 <julia__all_1912+3456>
   0x00007fed470479f9 <+3209>:	mov    0x40(%rsp),%r9
   0x00007fed470479fe <+3214>:	mov    0x48(%rsp),%rbx
   0x00007fed47047a03 <+3219>:	mov    0x10(%rsp),%r8
   0x00007fed47047a08 <+3224>:	mov    0x8(%rsp),%rsi
   0x00007fed47047a0d <+3229>:	mov    %r14,(%r9,%rcx,8)
   0x00007fed47047a11 <+3233>:	mov    -0x8(%r9),%rax
   0x00007fed47047a15 <+3237>:	not    %eax
   0x00007fed47047a17 <+3239>:	test   $0x3,%al
   0x00007fed47047a19 <+3241>:	jne    0x7fed47047a27 <julia__all_1912+3255>
   0x00007fed47047a1b <+3243>:	mov    -0x8(%r14),%rax
   0x00007fed47047a1f <+3247>:	test   $0x1,%al
   0x00007fed47047a21 <+3249>:	je     0x7fed47047b13 <julia__all_1912+3491>
   0x00007fed47047a27 <+3255>:	mov    -0x8(%r10),%rax
   0x00007fed47047a2b <+3259>:	not    %eax
   0x00007fed47047a2d <+3261>:	test   $0x3,%al
   0x00007fed47047a2f <+3263>:	jne    0x7fed47047030 <julia__all_1912+704>
   0x00007fed47047a35 <+3269>:	mov    -0x8(%r13),%rax
   0x00007fed47047a39 <+3273>:	test   $0x1,%al
   0x00007fed47047a3b <+3275>:	jne    0x7fed47047030 <julia__all_1912+704>
   0x00007fed47047a41 <+3281>:	mov    %r10,%rdi
   0x00007fed47047a44 <+3284>:	movabs $0x7fed782475d0,%rax
   0x00007fed47047a4e <+3294>:	call   *%rax
   0x00007fed47047a50 <+3296>:	mov    0x8(%rsp),%rsi
   0x00007fed47047a55 <+3301>:	mov    0x10(%rsp),%r8
   0x00007fed47047a5a <+3306>:	mov    0x18(%rsp),%r11
   0x00007fed47047a5f <+3311>:	mov    0x40(%rsp),%r9
   0x00007fed47047a64 <+3316>:	mov    0x28(%rsp),%rcx
   0x00007fed47047a69 <+3321>:	mov    0x38(%rsp),%r10
   0x00007fed47047a6e <+3326>:	jmp    0x7fed47047030 <julia__all_1912+704>
   0x00007fed47047a73 <+3331>:	mov    %r8,%rdi
   0x00007fed47047a76 <+3334>:	movabs $0x7fed782475d0,%rax
   0x00007fed47047a80 <+3344>:	mov    %r10,%r14
   0x00007fed47047a83 <+3347>:	mov    %rcx,%rbx
   0x00007fed47047a86 <+3350>:	vzeroupper 
   0x00007fed47047a89 <+3353>:	call   *%rax
   0x00007fed47047a8b <+3355>:	mov    0x8(%rsp),%rsi
   0x00007fed47047a90 <+3360>:	mov    0x10(%rsp),%r8
   0x00007fed47047a95 <+3365>:	mov    0x18(%rsp),%r11
   0x00007fed47047a9a <+3370>:	mov    0x40(%rsp),%r9
   0x00007fed47047a9f <+3375>:	mov    0x28(%rsp),%rdx
   0x00007fed47047aa4 <+3380>:	mov    %rbx,%rcx
   0x00007fed47047aa7 <+3383>:	mov    %r14,%r10
   0x00007fed47047aaa <+3386>:	jmp    0x7fed47047808 <julia__all_1912+2712>
   0x00007fed47047aaf <+3391>:	mov    %rsi,%rdi
   0x00007fed47047ab2 <+3394>:	movabs $0x7fed782475d0,%rax
   0x00007fed47047abc <+3404>:	mov    %r10,%rbx
   0x00007fed47047abf <+3407>:	mov    %r9,%r14
   0x00007fed47047ac2 <+3410>:	mov    %rcx,%r15
   0x00007fed47047ac5 <+3413>:	vzeroupper 
   0x00007fed47047ac8 <+3416>:	call   *%rax
   0x00007fed47047aca <+3418>:	mov    0x8(%rsp),%rsi
   0x00007fed47047acf <+3423>:	mov    0x10(%rsp),%r8
   0x00007fed47047ad4 <+3428>:	mov    0x18(%rsp),%r11
   0x00007fed47047ad9 <+3433>:	mov    %r15,%rcx
   0x00007fed47047adc <+3436>:	mov    %r14,%r9
   0x00007fed47047adf <+3439>:	mov    %rbx,%r10
   0x00007fed47047ae2 <+3442>:	test   %r12,%r12
   0x00007fed47047ae5 <+3445>:	jne    0x7fed47047830 <julia__all_1912+2752>
   0x00007fed47047aeb <+3451>:	jmp    0x7fed47047c22 <julia__all_1912+3762>
   0x00007fed47047af0 <+3456>:	mov    %r11,%rdi
   0x00007fed47047af3 <+3459>:	movabs $0x7fed782475d0,%rax
   0x00007fed47047afd <+3469>:	call   *%rax
   0x00007fed47047aff <+3471>:	mov    0x18(%rsp),%r11
   0x00007fed47047b04 <+3476>:	mov    0x28(%rsp),%rcx
   0x00007fed47047b09 <+3481>:	mov    0x38(%rsp),%r10
   0x00007fed47047b0e <+3486>:	jmp    0x7fed470479f9 <julia__all_1912+3209>
   0x00007fed47047b13 <+3491>:	mov    %r9,%rdi
   0x00007fed47047b16 <+3494>:	movabs $0x7fed782475d0,%rax
   0x00007fed47047b20 <+3504>:	call   *%rax
   0x00007fed47047b22 <+3506>:	mov    0x8(%rsp),%rsi
   0x00007fed47047b27 <+3511>:	mov    0x10(%rsp),%r8
   0x00007fed47047b2c <+3516>:	mov    0x18(%rsp),%r11
   0x00007fed47047b31 <+3521>:	mov    0x40(%rsp),%r9
   0x00007fed47047b36 <+3526>:	mov    0x28(%rsp),%rcx
   0x00007fed47047b3b <+3531>:	mov    0x38(%rsp),%r10
   0x00007fed47047b40 <+3536>:	jmp    0x7fed47047a27 <julia__all_1912+3255>
   0x00007fed47047b45 <+3541>:	mov    0x1b0(%rsp),%r13
   0x00007fed47047b4d <+3549>:	mov    0xc8(%rsp),%rbx
   0x00007fed47047b55 <+3557>:	mov    0x50(%rsp),%r8
   0x00007fed47047b5a <+3562>:	mov    0x48(%rsp),%r15
   0x00007fed47047b5f <+3567>:	xor    %eax,%eax
   0x00007fed47047b61 <+3569>:	jmp    0x7fed47047bb0 <julia__all_1912+3648>
   0x00007fed47047b63 <+3571>:	mov    %rsi,0x78(%rsp)
   0x00007fed47047b68 <+3576>:	mov    %rsi,0x10(%rsp)
   0x00007fed47047b6d <+3581>:	mov    %rsi,%r11
   0x00007fed47047b70 <+3584>:	mov    %rsi,%r9
   0x00007fed47047b73 <+3587>:	mov    %rsi,%r10
   0x00007fed47047b76 <+3590>:	mov    %rsi,%r12
   0x00007fed47047b79 <+3593>:	mov    %rsi,%rdi
   0x00007fed47047b7c <+3596>:	mov    %rsi,%r8
   0x00007fed47047b7f <+3599>:	mov    %rsi,%r14
   0x00007fed47047b82 <+3602>:	jmp    0x7fed47047bb0 <julia__all_1912+3648>
   0x00007fed47047b84 <+3604>:	mov    0x58(%rsp),%rdi
   0x00007fed47047b89 <+3609>:	mov    0x50(%rsp),%r8
   0x00007fed47047b8e <+3614>:	mov    0xc8(%rsp),%rbx
   0x00007fed47047b96 <+3622>:	movabs $0x7fed70687008,%rsi
   0x00007fed47047ba0 <+3632>:	mov    $0x1,%al
   0x00007fed47047ba2 <+3634>:	mov    %rsi,0x10(%rsp)
   0x00007fed47047ba7 <+3639>:	mov    %rsi,%r11
   0x00007fed47047baa <+3642>:	mov    %rsi,%r9
   0x00007fed47047bad <+3645>:	mov    %rsi,%r10
   0x00007fed47047bb0 <+3648>:	mov    0x108(%rsp),%rcx
   0x00007fed47047bb8 <+3656>:	mov    0x20(%rsp),%rdx
   0x00007fed47047bbd <+3661>:	and    $0x1,%al
   0x00007fed47047bbf <+3663>:	test   %rbx,%rbx
   0x00007fed47047bc2 <+3666>:	mov    %rcx,(%rdx)
   0x00007fed47047bc5 <+3669>:	mov    %rsi,0x68(%r13)
   0x00007fed47047bc9 <+3673>:	mov    0x70(%rsp),%rcx
   0x00007fed47047bce <+3678>:	mov    %rcx,0x60(%r13)
   0x00007fed47047bd2 <+3682>:	mov    %r15,0x58(%r13)
   0x00007fed47047bd6 <+3686>:	mov    %al,0x50(%r13)
   0x00007fed47047bda <+3690>:	mov    0x78(%rsp),%rax
   0x00007fed47047bdf <+3695>:	mov    %rax,0x48(%r13)
   0x00007fed47047be3 <+3699>:	sete   0x40(%r13)
   0x00007fed47047be8 <+3704>:	mov    0x10(%rsp),%rax
   0x00007fed47047bed <+3709>:	mov    %rax,0x38(%r13)
   0x00007fed47047bf1 <+3713>:	mov    %r11,0x30(%r13)
   0x00007fed47047bf5 <+3717>:	mov    %r9,0x28(%r13)
   0x00007fed47047bf9 <+3721>:	mov    %r10,0x20(%r13)
   0x00007fed47047bfd <+3725>:	mov    %r12,0x18(%r13)
   0x00007fed47047c01 <+3729>:	mov    %rdi,0x10(%r13)
   0x00007fed47047c05 <+3733>:	mov    %r8,0x8(%r13)
   0x00007fed47047c09 <+3737>:	mov    %r14,0x0(%r13)
   0x00007fed47047c0d <+3741>:	mov    %r13,%rax
   0x00007fed47047c10 <+3744>:	lea    -0x28(%rbp),%rsp
   0x00007fed47047c14 <+3748>:	pop    %rbx
   0x00007fed47047c15 <+3749>:	pop    %r12
   0x00007fed47047c17 <+3751>:	pop    %r13
   0x00007fed47047c19 <+3753>:	pop    %r14
   0x00007fed47047c1b <+3755>:	pop    %r15
   0x00007fed47047c1d <+3757>:	pop    %rbp
   0x00007fed47047c1e <+3758>:	vzeroupper 
   0x00007fed47047c21 <+3761>:	ret    
   0x00007fed47047c22 <+3762>:	movabs $0x7fed78203575,%rax
   0x00007fed47047c2c <+3772>:	movabs $0x7fed67d06880,%rdi
   0x00007fed47047c36 <+3782>:	vzeroupper 
   0x00007fed47047c39 <+3785>:	call   *%rax
   0x00007fed47047c3b <+3787>:	movabs $0x7fed78203575,%rax
   0x00007fed47047c45 <+3797>:	movabs $0x7fed67d06880,%rdi
   0x00007fed47047c4f <+3807>:	vzeroupper 
   0x00007fed47047c52 <+3810>:	call   *%rax
End of assembler dump.

@wsmoses
Member

wsmoses commented Jul 6, 2023

; Function Attrs: mustprogress willreturn
define internal fastcc { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 } @augmented_julia__all_1912({ { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(144) %0, { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* nocapture align 8 %"'", {} addrspace(10)* noundef nonnull readonly align 16 dereferenceable(40) %1, {} addrspace(10)* align 16 %"'1") unnamed_addr #84 !dbg !5821 {
top:
  %2 = alloca { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, align 8
  %3 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, i32 0, i32 0
  %4 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 0
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %4, align 8
  %5 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 1
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %5, align 8
  %6 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 2
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %6, align 8
  %7 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 3
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %7, align 8
  %8 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 4
  %9 = bitcast {} addrspace(10)* addrspace(10)** %8 to {} addrspace(10)**
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %9, align 8
  %10 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 5
  %11 = bitcast {} addrspace(10)* addrspace(10)** %10 to {} addrspace(10)**
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %11, align 8
  %12 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 6
  %13 = bitcast {} addrspace(10)* addrspace(10)** %12 to {} addrspace(10)**
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %13, align 8
  %14 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 7
  %15 = bitcast {} addrspace(10)* addrspace(10)** %14 to {} addrspace(10)**
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %15, align 8
  %16 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 9
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %16, align 8
  %17 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 13
  %18 = bitcast {} addrspace(10)* addrspace(10)** %17 to {} addrspace(10)**
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %18, align 8
  %"iv'ac" = alloca i64, align 8
  %loopLimit_cache = alloca i64, align 8
  %_cache = alloca {} addrspace(10)* addrspace(10)*, align 8
  %"'mi13_cache" = alloca {} addrspace(10)* addrspace(10)*, align 8
  %_cache14 = alloca {} addrspace(10)* addrspace(10)*, align 8
  %"'ipl16_cache" = alloca {} addrspace(10)* addrspace(10)*, align 8
  %.not16_cache = alloca i1*, align 8
  %.not17_cache = alloca i1*, align 8
  %_cache18 = alloca {} addrspace(10)* addrspace(10)*, align 8
  %19 = call {}*** @julia.get_pgcstack()
  %20 = call {}*** @julia.get_pgcstack()
  %21 = call {}*** @julia.get_pgcstack()
  %22 = call {}*** @julia.get_pgcstack()
  %23 = call {}*** @julia.get_pgcstack()
  %24 = call {}*** @julia.get_pgcstack()
  %25 = call {}*** @julia.get_pgcstack()
  %26 = call {}*** @julia.get_pgcstack()
  %27 = call {}*** @julia.get_pgcstack()
  %28 = call {}*** @julia.get_pgcstack()
  %29 = call {}*** @julia.get_pgcstack()
  %30 = call {}*** @julia.get_pgcstack()
  %31 = call {}*** @julia.get_pgcstack()
  %32 = call {}*** @julia.get_pgcstack()
  %33 = call {}*** @julia.get_pgcstack()
  %34 = call {}*** @julia.get_pgcstack() #86
  %35 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !5826
  %36 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %35 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !5826
  %37 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %36, i64 0, i32 1, !dbg !5826
  %38 = load i64, i64 addrspace(11)* %37, align 8, !dbg !5826, !tbaa !162, !range !165, !alias.scope !5830, !noalias !5833
  %.not = icmp eq i64 %38, 0, !dbg !5835
  %39 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 8, !dbg !5827
  store i1 %.not, i1* %39, align 1, !dbg !5827
  br i1 %.not, label %common.ret, label %L9, !dbg !5827

L9:                                               ; preds = %top
  %"'ipc" = bitcast {} addrspace(10)* %"'1" to {} addrspace(10)* addrspace(13)* addrspace(10)*, !dbg !5837
  %40 = bitcast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(13)* addrspace(10)*, !dbg !5837
  %"'ipc8" = addrspacecast {} addrspace(10)* addrspace(13)* addrspace(10)* %"'ipc" to {} addrspace(10)* addrspace(13)* addrspace(11)*, !dbg !5837
  %41 = addrspacecast {} addrspace(10)* addrspace(13)* addrspace(10)* %40 to {} addrspace(10)* addrspace(13)* addrspace(11)*, !dbg !5837
  %"'ipl9" = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %"'ipc8", align 8, !dbg !5837, !tbaa !198, !alias.scope !5838, !noalias !5841, !nonnull !93
  %42 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %41, align 8, !dbg !5837, !tbaa !198, !alias.scope !5842, !noalias !5833, !nonnull !93
  %"'ipl" = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipl9", align 8, !dbg !5837, !tbaa !648, !alias.scope !5843, !noalias !5846
  %43 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 0, !dbg !5837
  store {} addrspace(10)* %"'ipl", {} addrspace(10)** %43, align 8, !dbg !5837
  %44 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %42, align 8, !dbg !5837, !tbaa !648, !alias.scope !5848, !noalias !5849
  %45 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 9, !dbg !5837
  store {} addrspace(10)* %44, {} addrspace(10)** %45, align 8, !dbg !5837
  %.not14 = icmp eq {} addrspace(10)* %44, null, !dbg !5837
  br i1 %.not14, label %fail, label %L17, !dbg !5837

L17:                                              ; preds = %L9
  %current_task515 = getelementptr inbounds {}**, {}*** %34, i64 -13, !dbg !5850
  %current_task5 = bitcast {}*** %current_task515 to {}**, !dbg !5850
  %46 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
  %47 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 3, !dbg !5850
  store {} addrspace(10)* %46, {} addrspace(10)** %47, align 8, !dbg !5850
  %"'mi" = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
  %48 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 2, !dbg !5850
  store {} addrspace(10)* %"'mi", {} addrspace(10)** %48, align 8, !dbg !5850
  %49 = bitcast {} addrspace(10)* %"'mi" to i8 addrspace(10)*, !dbg !5850
  call void @llvm.memset.p10i8.i64(i8 addrspace(10)* nonnull dereferenceable(144) dereferenceable_or_null(144) %49, i8 0, i64 144, i1 false), !dbg !5850
  %"'ipc10" = bitcast {} addrspace(10)* %"'mi" to i8 addrspace(10)*, !dbg !5850
  %50 = bitcast {} addrspace(10)* %46 to i8 addrspace(10)*, !dbg !5850
  %"'ipc11" = bitcast { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %"'" to i8 addrspace(11)*, !dbg !5850
  %51 = bitcast { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %0 to i8 addrspace(11)*, !dbg !5850
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %"'ipc10", i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %"'ipc11", i64 144, i1 false) #86, !dbg !5850
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %50, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %51, i64 144, i1 false) #86, !dbg !5850, !tbaa !327, !alias.scope !651, !noalias !5851
  %52 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @ijl_apply_generic, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657346395152 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657545686608 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954512 to {}*) to {} addrspace(10)*), {} addrspace(10)* %46, {} addrspace(10)* %"'mi", {} addrspace(10)* %44, {} addrspace(10)* %"'ipl"), !dbg !5850
  %53 = addrspacecast {} addrspace(10)* %52 to {} addrspace(11)*, !dbg !5850
  %54 = bitcast {} addrspace(11)* %53 to [3 x {} addrspace(10)*] addrspace(11)*, !dbg !5850
  %55 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 1, !dbg !5850
  %56 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %55, align 8, !dbg !5850
  %57 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 0, !dbg !5850
  %58 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %57, align 8, !dbg !5850
  %59 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 2, !dbg !5850
  %60 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %59, align 8, !dbg !5850
  %61 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 1, !dbg !5854
  store {} addrspace(10)* %60, {} addrspace(10)** %61, align 8, !dbg !5854
  %62 = bitcast {} addrspace(10)* %58 to i8 addrspace(10)*, !dbg !5854
  %63 = load i8, i8 addrspace(10)* %62, align 1, !dbg !5854, !tbaa !399, !range !654, !alias.scope !5855, !noalias !5858
  %.not1622 = icmp eq i8 %63, 0, !dbg !5854
  %64 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 10, !dbg !5854
  store i1 %.not1622, i1* %64, align 1, !dbg !5854
  br i1 %.not1622, label %common.ret, label %L23.preheader, !dbg !5854

L23.preheader:                                    ; preds = %L17
  store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5860
  store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5860
  store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5860
  store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5860
  store i1* null, i1** %.not16_cache, align 8, !dbg !5860
  store i1* null, i1** %.not17_cache, align 8, !dbg !5860
  store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5860
  br label %L23, !dbg !5860

L19:                                              ; preds = %L32
  %65 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
  %"'mi13" = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
  %66 = bitcast {} addrspace(10)* %"'mi13" to i8 addrspace(10)*, !dbg !5850
  call void @llvm.memset.p10i8.i64(i8 addrspace(10)* nonnull dereferenceable(144) dereferenceable_or_null(144) %66, i8 0, i64 144, i1 false), !dbg !5850
  %"'ipc12" = bitcast {} addrspace(10)* %"'mi13" to i8 addrspace(10)*, !dbg !5850
  %67 = bitcast {} addrspace(10)* %65 to i8 addrspace(10)*, !dbg !5850
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %"'ipc12", i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %"'ipc11", i64 144, i1 false) #86, !dbg !5850
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %67, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %51, i64 144, i1 false) #86, !dbg !5850, !tbaa !327, !alias.scope !651, !noalias !5851
  %68 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @ijl_apply_generic, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657346395152 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657545686608 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954512 to {}*) to {} addrspace(10)*), {} addrspace(10)* %65, {} addrspace(10)* %"'mi13", {} addrspace(10)* %308, {} addrspace(10)* %"'ipl16"), !dbg !5850
  %69 = addrspacecast {} addrspace(10)* %68 to {} addrspace(11)*, !dbg !5850
  %70 = bitcast {} addrspace(11)* %69 to [3 x {} addrspace(10)*] addrspace(11)*, !dbg !5850
  %71 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 1, !dbg !5850
  %72 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %71, align 8, !dbg !5850
  %73 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 0, !dbg !5850
  %74 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %73, align 8, !dbg !5850
  %75 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 2, !dbg !5850
  %76 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %75, align 8, !dbg !5850
  %77 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5862
  %78 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %77, i64 %iv, !dbg !5850
  store {} addrspace(10)* %76, {} addrspace(10)* addrspace(10)* %78, align 8, !dbg !5850, !invariant.group !5863
  %79 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5864
  %80 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %79, i64 %iv, !dbg !5850
  store {} addrspace(10)* %65, {} addrspace(10)* addrspace(10)* %80, align 8, !dbg !5850, !invariant.group !5865
  %81 = bitcast {} addrspace(10)* addrspace(10)* %79 to {} addrspace(10)*, !dbg !5850
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %81, {} addrspace(10)* %65), !dbg !5850
  %82 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5866
  %83 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %82, i64 %iv, !dbg !5850
  store {} addrspace(10)* %"'mi13", {} addrspace(10)* addrspace(10)* %83, align 8, !dbg !5850, !invariant.group !5867
  %84 = bitcast {} addrspace(10)* addrspace(10)* %82 to {} addrspace(10)*, !dbg !5850
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %84, {} addrspace(10)* %"'mi13"), !dbg !5850
  %85 = bitcast {} addrspace(10)* addrspace(10)* %77 to {} addrspace(10)*, !dbg !5850
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %85, {} addrspace(10)* %76), !dbg !5850
  %86 = bitcast {} addrspace(10)* %74 to i8 addrspace(10)*, !dbg !5854
  %87 = load i8, i8 addrspace(10)* %86, align 1, !dbg !5854, !tbaa !399, !range !654, !alias.scope !5868, !noalias !5871
  %.not16 = icmp eq i8 %87, 0, !dbg !5854
  %88 = load i1*, i1** %.not16_cache, align 8, !dbg !5854, !dereferenceable !140, !invariant.group !5873
  %89 = getelementptr inbounds i1, i1* %88, i64 %iv, !dbg !5854
  store i1 %.not16, i1* %89, align 1, !dbg !5854, !invariant.group !5874
  br i1 %.not16, label %common.ret.loopexit, label %L23, !dbg !5854

L23:                                              ; preds = %L19, %L23.preheader
  %iv = phi i64 [ 0, %L23.preheader ], [ %iv.next, %L19 ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !5875
  %90 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5875
  %91 = bitcast {} addrspace(10)* addrspace(10)* %90 to {} addrspace(10)*, !dbg !5875
  %92 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
  %93 = and i64 %iv.next, 1, !dbg !5875
  %94 = icmp ne i64 %93, 0, !dbg !5875
  %95 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %96 = icmp ult i64 %95, 3, !dbg !5875
  %97 = and i1 %96, %94, !dbg !5875
  br i1 %97, label %grow.i, label %"[email protected]", !dbg !5875

grow.i:                                           ; preds = %L23
  %98 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %99 = sub nuw nsw i64 64, %98, !dbg !5875
  %100 = shl i64 8, %99, !dbg !5875
  %101 = lshr i64 %100, 1, !dbg !5875
  %102 = icmp eq i64 %iv.next, 1, !dbg !5875
  %103 = select i1 %102, i64 0, i64 %101, !dbg !5875
  %104 = udiv exact i64 %100, 8, !dbg !5875
  %105 = call {} addrspace(10)* @ijl_box_int64(i64 %104) #88, !dbg !5875
  %106 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %105, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
  %107 = bitcast {}*** %92 to {}**, !dbg !5875
  %108 = getelementptr inbounds {}*, {}** %107, i64 -13, !dbg !5875
  %109 = getelementptr inbounds {}*, {}** %108, i64 15, !dbg !5875
  %110 = bitcast {}** %109 to i8**, !dbg !5875
  %111 = load i8*, i8** %110, align 8, !dbg !5875
  %112 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %111, i64 %100, {} addrspace(10)* %106) #88, !dbg !5875
  %113 = sub i64 %100, %103, !dbg !5875
  %114 = bitcast {} addrspace(10)* %112 to i8 addrspace(10)*, !dbg !5875
  %115 = getelementptr inbounds i8, i8 addrspace(10)* %114, i64 %103, !dbg !5875
  %116 = bitcast i8 addrspace(10)* %115 to {} addrspace(10)*, !dbg !5875
  call void @zeroType.38({} addrspace(10)* %116, i8 0, i64 %113) #88, !dbg !5875
  %117 = bitcast {} addrspace(10)* %112 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %118 = bitcast {} addrspace(10)* addrspace(10)* %117 to i8 addrspace(10)*, !dbg !5875
  %119 = bitcast {} addrspace(10)* %91 to i8 addrspace(10)*, !dbg !5875
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %118, i8 addrspace(10)* %119, i64 %103, i1 false) #88, !dbg !5875
  %120 = bitcast i8 addrspace(10)* %118 to {} addrspace(10)*, !dbg !5875
  br label %"[email protected]", !dbg !5875

"[email protected]": ; preds = %L23, %grow.i
  %121 = phi {} addrspace(10)* [ %120, %grow.i ], [ %91, %L23 ], !dbg !5875
  %122 = bitcast {} addrspace(10)* %121 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %123 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 13, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %122, {} addrspace(10)* addrspace(10)** %123, align 8, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %122, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5875
  %124 = load i1*, i1** %.not17_cache, align 8, !dbg !5875
  %125 = bitcast i1* %124 to i8*, !dbg !5875
  %126 = and i64 %iv.next, 1, !dbg !5875
  %127 = icmp ne i64 %126, 0, !dbg !5875
  %128 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %129 = icmp ult i64 %128, 3, !dbg !5875
  %130 = and i1 %129, %127, !dbg !5875
  br i1 %130, label %grow.i1, label %__enzyme_exponentialallocationzero.exit, !dbg !5875

grow.i1:                                          ; preds = %"[email protected]"
  %131 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %132 = sub nuw nsw i64 64, %131, !dbg !5875
  %133 = shl i64 1, %132, !dbg !5875
  %134 = lshr i64 %133, 1, !dbg !5875
  %135 = icmp eq i64 %iv.next, 1, !dbg !5875
  %136 = select i1 %135, i64 0, i64 %134, !dbg !5875
  %137 = call i8* @realloc(i8* %125, i64 %133) #88, !dbg !5875
  %138 = sub i64 %133, %136, !dbg !5875
  %139 = getelementptr inbounds i8, i8* %137, i64 %136, !dbg !5875
  call void @llvm.memset.p0i8.i64(i8* %139, i8 0, i64 %138, i1 false) #88, !dbg !5875
  br label %__enzyme_exponentialallocationzero.exit, !dbg !5875

__enzyme_exponentialallocationzero.exit:          ; preds = %"[email protected]", %grow.i1
  %140 = phi i8* [ %137, %grow.i1 ], [ %125, %"[email protected]" ], !dbg !5875
  %141 = bitcast i8* %140 to i1*, !dbg !5875
  %142 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 12, !dbg !5875
  store i1* %141, i1** %142, align 8, !dbg !5875
  store i1* %141, i1** %.not17_cache, align 1, !dbg !5875
  %143 = load i1*, i1** %.not16_cache, align 8, !dbg !5875
  %144 = bitcast i1* %143 to i8*, !dbg !5875
  %145 = and i64 %iv.next, 1, !dbg !5875
  %146 = icmp ne i64 %145, 0, !dbg !5875
  %147 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %148 = icmp ult i64 %147, 3, !dbg !5875
  %149 = and i1 %148, %146, !dbg !5875
  br i1 %149, label %grow.i2, label %__enzyme_exponentialallocationzero.exit3, !dbg !5875

grow.i2:                                          ; preds = %__enzyme_exponentialallocationzero.exit
  %150 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %151 = sub nuw nsw i64 64, %150, !dbg !5875
  %152 = shl i64 1, %151, !dbg !5875
  %153 = lshr i64 %152, 1, !dbg !5875
  %154 = icmp eq i64 %iv.next, 1, !dbg !5875
  %155 = select i1 %154, i64 0, i64 %153, !dbg !5875
  %156 = call i8* @realloc(i8* %144, i64 %152) #88, !dbg !5875
  %157 = sub i64 %152, %155, !dbg !5875
  %158 = getelementptr inbounds i8, i8* %156, i64 %155, !dbg !5875
  call void @llvm.memset.p0i8.i64(i8* %158, i8 0, i64 %157, i1 false) #88, !dbg !5875
  br label %__enzyme_exponentialallocationzero.exit3, !dbg !5875

__enzyme_exponentialallocationzero.exit3:         ; preds = %__enzyme_exponentialallocationzero.exit, %grow.i2
  %159 = phi i8* [ %156, %grow.i2 ], [ %144, %__enzyme_exponentialallocationzero.exit ], !dbg !5875
  %160 = bitcast i8* %159 to i1*, !dbg !5875
  %161 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 11, !dbg !5875
  store i1* %160, i1** %161, align 8, !dbg !5875
  store i1* %160, i1** %.not16_cache, align 1, !dbg !5875
  %162 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5875
  %163 = bitcast {} addrspace(10)* addrspace(10)* %162 to {} addrspace(10)*, !dbg !5875
  %164 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
  %165 = and i64 %iv.next, 1, !dbg !5875
  %166 = icmp ne i64 %165, 0, !dbg !5875
  %167 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %168 = icmp ult i64 %167, 3, !dbg !5875
  %169 = and i1 %168, %166, !dbg !5875
  br i1 %169, label %grow.i4, label %"[email protected]", !dbg !5875

grow.i4:                                          ; preds = %__enzyme_exponentialallocationzero.exit3
  %170 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %171 = sub nuw nsw i64 64, %170, !dbg !5875
  %172 = shl i64 8, %171, !dbg !5875
  %173 = lshr i64 %172, 1, !dbg !5875
  %174 = icmp eq i64 %iv.next, 1, !dbg !5875
  %175 = select i1 %174, i64 0, i64 %173, !dbg !5875
  %176 = udiv exact i64 %172, 8, !dbg !5875
  %177 = call {} addrspace(10)* @ijl_box_int64(i64 %176) #88, !dbg !5875
  %178 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %177, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
  %179 = bitcast {}*** %164 to {}**, !dbg !5875
  %180 = getelementptr inbounds {}*, {}** %179, i64 -13, !dbg !5875
  %181 = getelementptr inbounds {}*, {}** %180, i64 15, !dbg !5875
  %182 = bitcast {}** %181 to i8**, !dbg !5875
  %183 = load i8*, i8** %182, align 8, !dbg !5875
  %184 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %183, i64 %172, {} addrspace(10)* %178) #88, !dbg !5875
  %185 = sub i64 %172, %175, !dbg !5875
  %186 = bitcast {} addrspace(10)* %184 to i8 addrspace(10)*, !dbg !5875
  %187 = getelementptr inbounds i8, i8 addrspace(10)* %186, i64 %175, !dbg !5875
  %188 = bitcast i8 addrspace(10)* %187 to {} addrspace(10)*, !dbg !5875
  call void @zeroType.38({} addrspace(10)* %188, i8 0, i64 %185) #88, !dbg !5875
  %189 = bitcast {} addrspace(10)* %184 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %190 = bitcast {} addrspace(10)* addrspace(10)* %189 to i8 addrspace(10)*, !dbg !5875
  %191 = bitcast {} addrspace(10)* %163 to i8 addrspace(10)*, !dbg !5875
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %190, i8 addrspace(10)* %191, i64 %175, i1 false) #88, !dbg !5875
  %192 = bitcast i8 addrspace(10)* %190 to {} addrspace(10)*, !dbg !5875
  br label %"[email protected]", !dbg !5875

"[email protected]": ; preds = %__enzyme_exponentialallocationzero.exit3, %grow.i4
  %193 = phi {} addrspace(10)* [ %192, %grow.i4 ], [ %163, %__enzyme_exponentialallocationzero.exit3 ], !dbg !5875
  %194 = bitcast {} addrspace(10)* %193 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %195 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 7, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %194, {} addrspace(10)* addrspace(10)** %195, align 8, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %194, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5875
  %196 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5875
  %197 = bitcast {} addrspace(10)* addrspace(10)* %196 to {} addrspace(10)*, !dbg !5875
  %198 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
  %199 = and i64 %iv.next, 1, !dbg !5875
  %200 = icmp ne i64 %199, 0, !dbg !5875
  %201 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %202 = icmp ult i64 %201, 3, !dbg !5875
  %203 = and i1 %202, %200, !dbg !5875
  br i1 %203, label %grow.i6, label %"[email protected]", !dbg !5875

grow.i6:                                          ; preds = %"[email protected]"
  %204 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %205 = sub nuw nsw i64 64, %204, !dbg !5875
  %206 = shl i64 8, %205, !dbg !5875
  %207 = lshr i64 %206, 1, !dbg !5875
  %208 = icmp eq i64 %iv.next, 1, !dbg !5875
  %209 = select i1 %208, i64 0, i64 %207, !dbg !5875
  %210 = udiv exact i64 %206, 8, !dbg !5875
  %211 = call {} addrspace(10)* @ijl_box_int64(i64 %210) #88, !dbg !5875
  %212 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %211, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
  %213 = bitcast {}*** %198 to {}**, !dbg !5875
  %214 = getelementptr inbounds {}*, {}** %213, i64 -13, !dbg !5875
  %215 = getelementptr inbounds {}*, {}** %214, i64 15, !dbg !5875
  %216 = bitcast {}** %215 to i8**, !dbg !5875
  %217 = load i8*, i8** %216, align 8, !dbg !5875
  %218 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %217, i64 %206, {} addrspace(10)* %212) #88, !dbg !5875
  %219 = sub i64 %206, %209, !dbg !5875
  %220 = bitcast {} addrspace(10)* %218 to i8 addrspace(10)*, !dbg !5875
  %221 = getelementptr inbounds i8, i8 addrspace(10)* %220, i64 %209, !dbg !5875
  %222 = bitcast i8 addrspace(10)* %221 to {} addrspace(10)*, !dbg !5875
  call void @zeroType.38({} addrspace(10)* %222, i8 0, i64 %219) #88, !dbg !5875
  %223 = bitcast {} addrspace(10)* %218 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %224 = bitcast {} addrspace(10)* addrspace(10)* %223 to i8 addrspace(10)*, !dbg !5875
  %225 = bitcast {} addrspace(10)* %197 to i8 addrspace(10)*, !dbg !5875
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %224, i8 addrspace(10)* %225, i64 %209, i1 false) #88, !dbg !5875
  %226 = bitcast i8 addrspace(10)* %224 to {} addrspace(10)*, !dbg !5875
  br label %"[email protected]", !dbg !5875

"[email protected]": ; preds = %"[email protected]", %grow.i6
  %227 = phi {} addrspace(10)* [ %226, %grow.i6 ], [ %197, %"[email protected]" ], !dbg !5875
  %228 = bitcast {} addrspace(10)* %227 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %229 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 6, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %228, {} addrspace(10)* addrspace(10)** %229, align 8, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %228, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5875
  %230 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5875
  %231 = bitcast {} addrspace(10)* addrspace(10)* %230 to {} addrspace(10)*, !dbg !5875
  %232 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
  %233 = and i64 %iv.next, 1, !dbg !5875
  %234 = icmp ne i64 %233, 0, !dbg !5875
  %235 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %236 = icmp ult i64 %235, 3, !dbg !5875
  %237 = and i1 %236, %234, !dbg !5875
  br i1 %237, label %grow.i8, label %"[email protected]", !dbg !5875

grow.i8:                                          ; preds = %"[email protected]"
  %238 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %239 = sub nuw nsw i64 64, %238, !dbg !5875
  %240 = shl i64 8, %239, !dbg !5875
  %241 = lshr i64 %240, 1, !dbg !5875
  %242 = icmp eq i64 %iv.next, 1, !dbg !5875
  %243 = select i1 %242, i64 0, i64 %241, !dbg !5875
  %244 = udiv exact i64 %240, 8, !dbg !5875
  %245 = call {} addrspace(10)* @ijl_box_int64(i64 %244) #88, !dbg !5875
  %246 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %245, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
  %247 = bitcast {}*** %232 to {}**, !dbg !5875
  %248 = getelementptr inbounds {}*, {}** %247, i64 -13, !dbg !5875
  %249 = getelementptr inbounds {}*, {}** %248, i64 15, !dbg !5875
  %250 = bitcast {}** %249 to i8**, !dbg !5875
  %251 = load i8*, i8** %250, align 8, !dbg !5875
  %252 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %251, i64 %240, {} addrspace(10)* %246) #88, !dbg !5875
  %253 = sub i64 %240, %243, !dbg !5875
  %254 = bitcast {} addrspace(10)* %252 to i8 addrspace(10)*, !dbg !5875
  %255 = getelementptr inbounds i8, i8 addrspace(10)* %254, i64 %243, !dbg !5875
  %256 = bitcast i8 addrspace(10)* %255 to {} addrspace(10)*, !dbg !5875
  call void @zeroType.38({} addrspace(10)* %256, i8 0, i64 %253) #88, !dbg !5875
  %257 = bitcast {} addrspace(10)* %252 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %258 = bitcast {} addrspace(10)* addrspace(10)* %257 to i8 addrspace(10)*, !dbg !5875
  %259 = bitcast {} addrspace(10)* %231 to i8 addrspace(10)*, !dbg !5875
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %258, i8 addrspace(10)* %259, i64 %243, i1 false) #88, !dbg !5875
  %260 = bitcast i8 addrspace(10)* %258 to {} addrspace(10)*, !dbg !5875
  br label %"[email protected]", !dbg !5875

"[email protected]": ; preds = %"[email protected]", %grow.i8
  %261 = phi {} addrspace(10)* [ %260, %grow.i8 ], [ %231, %"[email protected]" ], !dbg !5875
  %262 = bitcast {} addrspace(10)* %261 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %263 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 5, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %262, {} addrspace(10)* addrspace(10)** %263, align 8, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %262, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5875
  %264 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5875
  %265 = bitcast {} addrspace(10)* addrspace(10)* %264 to {} addrspace(10)*, !dbg !5875
  %266 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
  %267 = and i64 %iv.next, 1, !dbg !5875
  %268 = icmp ne i64 %267, 0, !dbg !5875
  %269 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
  %270 = icmp ult i64 %269, 3, !dbg !5875
  %271 = and i1 %270, %268, !dbg !5875
  br i1 %271, label %grow.i10, label %"[email protected]", !dbg !5875

grow.i10:                                         ; preds = %"[email protected]"
  %272 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
  %273 = sub nuw nsw i64 64, %272, !dbg !5875
  %274 = shl i64 8, %273, !dbg !5875
  %275 = lshr i64 %274, 1, !dbg !5875
  %276 = icmp eq i64 %iv.next, 1, !dbg !5875
  %277 = select i1 %276, i64 0, i64 %275, !dbg !5875
  %278 = udiv exact i64 %274, 8, !dbg !5875
  %279 = call {} addrspace(10)* @ijl_box_int64(i64 %278) #88, !dbg !5875
  %280 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %279, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
  %281 = bitcast {}*** %266 to {}**, !dbg !5875
  %282 = getelementptr inbounds {}*, {}** %281, i64 -13, !dbg !5875
  %283 = getelementptr inbounds {}*, {}** %282, i64 15, !dbg !5875
  %284 = bitcast {}** %283 to i8**, !dbg !5875
  %285 = load i8*, i8** %284, align 8, !dbg !5875
  %286 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %285, i64 %274, {} addrspace(10)* %280) #88, !dbg !5875
  %287 = sub i64 %274, %277, !dbg !5875
  %288 = bitcast {} addrspace(10)* %286 to i8 addrspace(10)*, !dbg !5875
  %289 = getelementptr inbounds i8, i8 addrspace(10)* %288, i64 %277, !dbg !5875
  %290 = bitcast i8 addrspace(10)* %289 to {} addrspace(10)*, !dbg !5875
  call void @zeroType.38({} addrspace(10)* %290, i8 0, i64 %287) #88, !dbg !5875
  %291 = bitcast {} addrspace(10)* %286 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %292 = bitcast {} addrspace(10)* addrspace(10)* %291 to i8 addrspace(10)*, !dbg !5875
  %293 = bitcast {} addrspace(10)* %265 to i8 addrspace(10)*, !dbg !5875
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %292, i8 addrspace(10)* %293, i64 %277, i1 false) #88, !dbg !5875
  %294 = bitcast i8 addrspace(10)* %292 to {} addrspace(10)*, !dbg !5875
  br label %"[email protected]", !dbg !5875

"[email protected]": ; preds = %"[email protected]", %grow.i10
  %295 = phi {} addrspace(10)* [ %294, %grow.i10 ], [ %265, %"[email protected]" ], !dbg !5875
  %296 = bitcast {} addrspace(10)* %295 to {} addrspace(10)* addrspace(10)*, !dbg !5875
  %297 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 4, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %296, {} addrspace(10)* addrspace(10)** %297, align 8, !dbg !5875
  store {} addrspace(10)* addrspace(10)* %296, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5875
  %298 = add i64 %iv, 2, !dbg !5875
  %299 = add nsw i64 %298, -1, !dbg !5875
  %300 = load i64, i64 addrspace(11)* %37, align 8, !dbg !5877, !tbaa !162, !range !165, !alias.scope !5830, !noalias !5833
  %.not17 = icmp ult i64 %299, %300, !dbg !5878
  %301 = load i1*, i1** %.not17_cache, align 8, !dbg !5860, !dereferenceable !140, !invariant.group !5880
  %302 = getelementptr inbounds i1, i1* %301, i64 %iv, !dbg !5860
  store i1 %.not17, i1* %302, align 1, !dbg !5860, !invariant.group !5881
  br i1 %.not17, label %L32, label %common.ret.loopexit, !dbg !5860

L32:                                              ; preds = %"[email protected]"
  %"'ipl17" = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %"'ipc8", align 8, !dbg !5882, !tbaa !198, !alias.scope !5838, !noalias !5841, !nonnull !93
  %303 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %41, align 8, !dbg !5882, !tbaa !198, !alias.scope !5842, !noalias !5833, !nonnull !93
  %"'ipg" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipl17", i64 %299, !dbg !5882
  %304 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %303, i64 %299, !dbg !5882
  %"'ipl16" = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipg", align 8, !dbg !5882, !tbaa !648, !alias.scope !5883, !noalias !5886
  %305 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5882, !dereferenceable !140, !invariant.group !5888
  %306 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %305, i64 %iv, !dbg !5882
  store {} addrspace(10)* %"'ipl16", {} addrspace(10)* addrspace(10)* %306, align 8, !dbg !5882, !tbaa !648, !invariant.group !5889
  %307 = bitcast {} addrspace(10)* addrspace(10)* %305 to {} addrspace(10)*, !dbg !5882
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %307, {} addrspace(10)* %"'ipl16"), !dbg !5882
  %308 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %304, align 8, !dbg !5882, !tbaa !648, !alias.scope !5890, !noalias !5891
  %309 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5882, !dereferenceable !140, !invariant.group !5892
  %310 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %309, i64 %iv, !dbg !5882
  store {} addrspace(10)* %308, {} addrspace(10)* addrspace(10)* %310, align 8, !dbg !5882, !tbaa !648, !invariant.group !5893
  %311 = bitcast {} addrspace(10)* addrspace(10)* %309 to {} addrspace(10)*, !dbg !5882
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %311, {} addrspace(10)* %308), !dbg !5882
  %.not18 = icmp eq {} addrspace(10)* %308, null, !dbg !5882
  br i1 %.not18, label %fail6, label %L19, !dbg !5882

common.ret.loopexit:                              ; preds = %"[email protected]", %L19
  %312 = phi i64 [ %iv, %"[email protected]" ], [ %iv, %L19 ]
  %common.ret.op.ph = phi i8 [ 0, %L19 ], [ 1, %"[email protected]" ]
  store i64 %312, i64* %loopLimit_cache, align 8, !dbg !5894, !invariant.group !5895
  br label %common.ret, !dbg !5894

common.ret:                                       ; preds = %common.ret.loopexit, %L17, %top
  %common.ret.op = phi i8 [ 1, %top ], [ 0, %L17 ], [ %common.ret.op.ph, %common.ret.loopexit ]
  %313 = insertvalue { {} addrspace(10)*, i8 } undef, i8 %common.ret.op, 1, !dbg !5894
  %314 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, i32 0, i32 1, !dbg !5894
  store i8 %common.ret.op, i8* %314, align 1, !dbg !5894
  %315 = load { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, align 8, !dbg !5894
  ret { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 } %315, !dbg !5894

fail:                                             ; preds = %L9
  call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #86, !dbg !5837
  unreachable, !dbg !5837

fail6:                                            ; preds = %L32
  call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #86, !dbg !5882
  unreachable, !dbg !5882
}
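
For readers following the dump: each of the repeated `grow.i*` blocks above is guarded by the same check on `%iv.next` (`and ... 1`, `llvm.ctpop`, `icmp ult ... 3`) and sizes the new buffer with `llvm.ctlz` and a `shl`. The sketch below is only a model of that arithmetic as read off the IR, not Enzyme's actual source. The names `should_grow`, `new_bytes`, and `old_bytes` are hypothetical, and `elsize` is assumed to be 8 for the pointer caches (`shl i64 8`) and 1 for the `i1` caches (`shl i64 1`).

```julia
# Hypothetical model of the growth check in the `grow.i*` blocks of the dump.
# `n` plays the role of `%iv.next` (always >= 1 in the loop).

# True when the IR would branch to a `grow.i*` block:
# (n & 1) != 0 and ctpop(n) < 3, i.e. n == 1 or n == 2^k + 1.
should_grow(n::UInt64) = (n & 1) != 0 && count_ones(n) < 3

# New allocation size in bytes: elsize << (64 - ctlz(n)).
new_bytes(n::UInt64, elsize::Integer) = UInt64(elsize) << (64 - leading_zeros(n))

# Bytes carried over from the old buffer: half the new size,
# or 0 on the very first growth (the `select` on `n == 1`).
old_bytes(n::UInt64, elsize::Integer) = n == 1 ? UInt64(0) : new_bytes(n, elsize) >> 1

# Print the growth schedule for a pointer-sized cache (assumed elsize = 8).
for n in UInt64(1):UInt64(17)
    if should_grow(n)
        println("n = ", Int(n), ": allocate ", Int(new_bytes(n, 8)),
                " bytes, copy ", Int(old_bytes(n, 8)), ", zero the rest")
    end
end
```

Under this reading the cache doubles at `n = 1, 3, 5, 9, 17, ...`, the first `old_bytes` of the old buffer are copied over, and the remaining bytes are zeroed (`zeroType.38` for the GC-allocated caches, `llvm.memset` after `realloc` for the `i1` caches).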

@wsmoses
Member

wsmoses commented Jul 6, 2023

After simplification:
; Function Attrs: mustprogress willreturn
define internal fastcc i8 @preprocess_julia__all_1912({ { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(144) %0, {} addrspace(10)* noundef nonnull readonly align 16 dereferenceable(40) %1) unnamed_addr #84 !dbg !5792 {
top:
  %2 = call {}*** @julia.get_pgcstack() #85
  call void @llvm.dbg.value(metadata {} addrspace(10)* null, metadata !5796, metadata !DIExpression(DW_OP_deref)) #85, !dbg !5797
  call void @llvm.dbg.declare(metadata { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %0, metadata !5795, metadata !DIExpression(DW_OP_deref)) #85, !dbg !5798
  call void @llvm.dbg.value(metadata {} addrspace(10)* %1, metadata !5796, metadata !DIExpression(DW_OP_deref)) #85, !dbg !5797
  %3 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !5799
  %4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !5799
  %5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !5799
  %6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !5799, !tbaa !162, !range !165, !alias.scope !166, !noalias !167
  %.not = icmp eq i64 %6, 0, !dbg !5803
  br i1 %.not, label %common.ret, label %L9, !dbg !5800

L9:                                               ; preds = %top
  %7 = bitcast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(13)* addrspace(10)*, !dbg !5805
  %8 = addrspacecast {} addrspace(10)* addrspace(13)* addrspace(10)* %7 to {} addrspace(10)* addrspace(13)* addrspace(11)*, !dbg !5805
  %9 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %8, align 8, !dbg !5805, !tbaa !198, !alias.scope !5806, !noalias !167, !nonnull !93
  %10 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %9, align 8, !dbg !5805, !tbaa !648, !alias.scope !152, !noalias !153
  %.not14 = icmp eq {} addrspace(10)* %10, null, !dbg !5805
  br i1 %.not14, label %fail, label %L17, !dbg !5805

L17:                                              ; preds = %L9
  %current_task515 = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !5809
  %current_task5 = bitcast {}*** %current_task515 to {}**, !dbg !5809
  %11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #86, !dbg !5809
  %12 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !5809
  %13 = bitcast { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %0 to i8 addrspace(11)*, !dbg !5809
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %12, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %13, i64 144, i1 false) #85, !dbg !5809, !tbaa !327, !alias.scope !651, !noalias !5810
  %14 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* nonnull @ijl_apply_generic, {} addrspace(10)* nonnull %11, {} addrspace(10)* nonnull %10) #87, !dbg !5809
  %15 = bitcast {} addrspace(10)* %14 to i8 addrspace(10)*, !dbg !5811
  %16 = load i8, i8 addrspace(10)* %15, align 1, !dbg !5811, !tbaa !399, !range !654, !alias.scope !152, !noalias !153
  %.not1622 = icmp eq i8 %16, 0, !dbg !5811
  br i1 %.not1622, label %common.ret, label %L23.preheader, !dbg !5811

L23.preheader:                                    ; preds = %L17
  br label %L23, !dbg !5812

L19:                                              ; preds = %L32
  %17 = add nuw nsw i64 %23, 1, !dbg !5814
  %18 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #86, !dbg !5809
  %19 = bitcast {} addrspace(10)* %18 to i8 addrspace(10)*, !dbg !5809
  call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %19, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %13, i64 144, i1 false) #85, !dbg !5809, !tbaa !327, !alias.scope !651, !noalias !5810
  %20 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* nonnull @ijl_apply_generic, {} addrspace(10)* nonnull %18, {} addrspace(10)* nonnull %28) #87, !dbg !5809
  %21 = bitcast {} addrspace(10)* %20 to i8 addrspace(10)*, !dbg !5811
  %22 = load i8, i8 addrspace(10)* %21, align 1, !dbg !5811, !tbaa !399, !range !654, !alias.scope !152, !noalias !153
  %.not16 = icmp eq i8 %22, 0, !dbg !5811
  br i1 %.not16, label %common.ret.loopexit, label %L23, !dbg !5811

L23:                                              ; preds = %L23.preheader, %L19
  %iv = phi i64 [ 0, %L23.preheader ], [ %iv.next, %L19 ]
  %23 = add i64 %iv, 2, !dbg !5815
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !5815
  %24 = add nsw i64 %23, -1, !dbg !5815
  %25 = load i64, i64 addrspace(11)* %5, align 8, !dbg !5817, !tbaa !162, !range !165, !alias.scope !166, !noalias !167
  %.not17 = icmp ult i64 %24, %25, !dbg !5818
  br i1 %.not17, label %L32, label %common.ret.loopexit, !dbg !5812

L32:                                              ; preds = %L23
  %26 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %8, align 8, !dbg !5820, !tbaa !198, !alias.scope !5806, !noalias !167, !nonnull !93
  %27 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %26, i64 %24, !dbg !5820
  %28 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %27, align 8, !dbg !5820, !tbaa !648, !alias.scope !152, !noalias !153
  %.not18 = icmp eq {} addrspace(10)* %28, null, !dbg !5820
  br i1 %.not18, label %fail6, label %L19, !dbg !5820

common.ret.loopexit:                              ; preds = %L19, %L23
  %common.ret.op.ph = phi i8 [ 0, %L19 ], [ 1, %L23 ]
  br label %common.ret, !dbg !5797

common.ret:                                       ; preds = %common.ret.loopexit, %L17, %top
  %common.ret.op = phi i8 [ 1, %top ], [ 0, %L17 ], [ %common.ret.op.ph, %common.ret.loopexit ]
  ret i8 %common.ret.op, !dbg !5797

fail:                                             ; preds = %L9
  call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #85, !dbg !5805
  unreachable, !dbg !5805

fail6:                                            ; preds = %L32
  call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #85, !dbg !5820
  unreachable, !dbg !5820
}

@wsmoses
Member

wsmoses commented Jul 6, 2023

The problem appears to be that this is unset:

L19:                                              ; preds = %L32
  %65 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850

@wsmoses
Member

wsmoses commented Jul 7, 2023

@sethaxen @devmotion @yebai I found and pushed a fix for the subsequent segfault you found. Could you retry?

@devmotion
Contributor

I just ran @yebai's example above with warnings disabled but unexpectedly ended up with an UndefVarError. The stacktrace points to Enzyme.Compiler (compiler.jl, line 5210):

julia> Enzyme.API.runtimeActivity!(true)

julia> Enzyme.API.typeWarning!(false)

julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10) # this works!
warning: didn't implement memmove, using memcpy as fallback which can result in errors
warning: didn't implement memmove, using memcpy as fallback which can result in errors
Sampling 100%|████████████████████████████████████████| Time: 0:00:05
ERROR: UndefVarError: `b` not defined
Stacktrace:
  [1] jl_array_ptr_copy_fwd(B::LLVM.IRBuilder, orig::LLVM.CallInst, gutils::Enzyme.Compiler.GradientUtils, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:5210
  [2] jl_array_ptr_copy_augfwd
    @ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:5230 [inlined]
  [3] (::Enzyme.Compiler.var"#304#305")(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, tapeR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:6426
  [4] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/packages/Enzyme/ph9NM/src/api.jl:128
  [5] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{Bool, Bool, Bool}, returnPrimal::Bool, jlrules::Vector{String}, expectedTapeType::Type)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:7418
  [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, ctx::LLVM.ThreadSafeContext, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:8922
  [7] codegen
    @ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:8530 [inlined]
  [8] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, ctx::Nothing, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9456
  [9] _thunk
    @ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9453 [inlined]
 [10] cached_compilation
    @ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9491 [inlined]
 [11] #s291#430
    @ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9553 [inlined]
 [12] var"#s291#430"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ::Any, #unused#::Type, #unused#::Type, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
    @ Enzyme.Compiler ./none:0
 [13] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [14] autodiff
    @ ~/.julia/packages/Enzyme/ph9NM/src/Enzyme.jl:207 [inlined]
 [15] autodiff
    @ ~/.julia/packages/Enzyme/ph9NM/src/Enzyme.jl:222 [inlined]
 [16] logdensity_and_gradient(∇ℓ::LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}, x::Vector{Float64})
    @ LogDensityProblemsADEnzymeExt ~/.julia/packages/LogDensityProblemsAD/JoNjv/ext/LogDensityProblemsADEnzymeExt.jl:73
 [17] ∂logπ∂θ
    @ ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:172 [inlined]
 [18] ∂H∂θ(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}, Turing.Inference.var"#∂logπ∂θ#44"{LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, 
DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}}, θ::Vector{Float64})
    @ AdvancedHMC ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:38
 [19] phasepoint(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}, Turing.Inference.var"#∂logπ∂θ#44"{LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, 
DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}}, θ::Vector{Float64}, r::Vector{Float64})
    @ AdvancedHMC ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:80
 [20] phasepoint
    @ ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:159 [inlined]
 [21] initialstep(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, spl::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, vi::DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}; init_params::Nothing, nadapts::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:176
 [22] step(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, spl::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}; resume_from::Nothing, init_params::Nothing, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:nadapts,), Tuple{Int64}}})
    @ DynamicPPL ~/.julia/packages/DynamicPPL/oJMmE/src/sampler.jl:111
 [23] step
    @ ~/.julia/packages/DynamicPPL/oJMmE/src/sampler.jl:84 [inlined]
 [24] macro expansion
    @ ~/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:125 [inlined]
 [25] macro expansion
    @ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
 [26] macro expansion
    @ ~/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:9 [inlined]
 [27] mcmcsample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, sampler::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, N::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:nadapts,), Tuple{Int64}}})
    @ AbstractMCMC ~/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:116
 [28] sample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, sampler::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, N::Int64; chain_type::Type, resume_from::Nothing, progress::Bool, nadapts::Int64, discard_adapt::Bool, discard_initial::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:133
 [29] sample
    @ ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:103 [inlined]
 [30] #sample#2
    @ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:146 [inlined]
 [31] sample
    @ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:139 [inlined]
 [32] #sample#1
    @ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:136 [inlined]
 [33] sample(model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, alg::NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}, N::Int64)
    @ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:130
 [34] top-level scope
    @ REPL[12]:1

@devmotion
Contributor

I just checked, and now that #934 is merged I can run the example successfully again 🙂 I repeated the sampling with 10000 samples about 20 times (then I lost interest), and sampling 1_000_000 samples in one call also worked without issues. I should say, though, that I use Linux and don't have access to a Mac, so I don't know whether the issues observed above are fixed there.
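
For anyone who wants to repeat this kind of check, here is a rough sketch of the runs described above. It assumes the model from the original report, the two `Enzyme.API` settings from the earlier session in this thread, and a Turing build that still exposes `Turing.EnzymeAD` (the `dw/enzyme` branch at the time). The names `conditioned` and `alg` are only illustrative, and the repeat count of 20 is approximate.

```julia
using Turing
using Enzyme

# Settings copied from the earlier session in this thread.
Enzyme.API.runtimeActivity!(true)
Enzyme.API.typeWarning!(false)

# Model from the original report.
@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

# Illustrative names, not from the thread.
conditioned = model() | (; x=0.5)
alg = NUTS{Turing.EnzymeAD}()

# Repeat a medium-sized run about 20 times.
for _ in 1:20
    sample(conditioned, alg, 10_000)
end

# One large run in a single call.
sample(conditioned, alg, 1_000_000)
```

The large run can take a while, so it may be worth starting with the loop alone.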

@yebai

yebai commented Jul 7, 2023

Great work -- I can confirm it runs successfully on my Mac too!

@wsmoses
Member

wsmoses commented Jul 7, 2023

I've now bumped the Enzyme patch version. Could you retry on full Turing once that hits the General registry and let me know how it goes?

Regardless, closing this issue as complete.
