~7x performance regression of BioSequence on Julia 0.5 (RC2) #18135
Comments
Here is more precise environment information. Julia 0.4.6:
Julia 0.5 RC2:
This seems to be caused by the inlining of the indexing function. Note that the assembly code looks incomplete, and it's usually more useful to post LLVM IR instead.
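For readers unfamiliar with how to get LLVM IR out of Julia, here is a minimal sketch. The function `getelem` is a hypothetical stand-in, not the Bio.jl function under discussion, and the `using InteractiveUtils` line is an assumption about the reader's setup: on Julia 0.7 and later the reflection macros live there, while on 0.4/0.5 they were available without it.

```julia
using InteractiveUtils  # assumption: Julia 0.7 or later; not needed on 0.4/0.5

# Hypothetical stand-in for the indexing function being investigated.
getelem(v::Vector{UInt64}, i::Int) = @inbounds v[i]

# Print the LLVM IR generated for this specific method signature.
@code_llvm getelem(UInt64[1, 2, 3], 2)

# Capture the same IR as a string, e.g. to paste into an issue.
ir = sprint(code_llvm, getelem, (Vector{UInt64}, Int))
```

`@code_native` works the same way if the machine code is also wanted.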
Thank you. This is LLVM code: Julia 0.4.6:
Julia 0.5 RC2:
I believe this is due to the introduction of branches into the bit shift operators in 5a9717b. Well, now you know why languages like to have undefined behaviors :) I'll try putting inline declarations on them, and/or switching the branch to
Using
If you can use
replace branch in bit shift operators, helps #18135
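To illustrate what "a branch in a shift operator" can mean, here is a rough sketch. The helper names `shift_branch` and `shift_select` are hypothetical, and neither is the definition used in Base or in the commit above. The use of `ifelse` as the branch-free alternative is also an assumption made for illustration.

```julia
# Hypothetical helpers for illustration only; not the definitions in Base.

# Signed shift count handled with an explicit branch.
function shift_branch(x::UInt64, n::Int)
    if n >= 0
        return x << unsigned(n)
    else
        return x >> unsigned(-n)
    end
end

# Same result, but both shifts are evaluated and `ifelse` selects one,
# so the source contains no conditional jump.
shift_select(x::UInt64, n::Int) =
    ifelse(n >= 0, x << unsigned(n), x >> unsigned(-n))
```

Both versions rely on Julia's defined behavior for oversized shift counts (the result is zero), which is the kind of guarantee the remark about undefined behavior alludes to.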
The RC3 branch (tk/backports-0.5.0-rc3), which includes e02692f, improved performance significantly, but I still see a regression of about 4x:
@JeffBezanson Are we going to be able to do something here for the 0.5 release?
I don't think it's (any longer) the bitshift operators in base:

Julia 0.4:

julia> using BenchmarkTools
julia> function foo(r, n)
           s = zero(eltype(r))
           for i in r
               s += i<<n
           end
           s
       end
foo (generic function with 1 method)
julia> @benchmark foo(1:10^6, UInt(3))
BenchmarkTools.Trial:
samples: 4606
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 1.00 ms (0.00% GC)
median time: 1.04 ms (0.00% GC)
mean time: 1.08 ms (0.00% GC)
maximum time: 1.95 ms (0.00% GC)
julia> @benchmark foo(1:10^6, 3)
BenchmarkTools.Trial:
samples: 4789
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 1.00 ms (0.00% GC)
median time: 1.04 ms (0.00% GC)
mean time: 1.04 ms (0.00% GC)
maximum time: 1.63 ms (0.00% GC)
julia> @benchmark foo(UInt(1):UInt(10^6), UInt(3))
BenchmarkTools.Trial:
samples: 4885
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 1.00 ms (0.00% GC)
median time: 1.02 ms (0.00% GC)
mean time: 1.02 ms (0.00% GC)
maximum time: 2.09 ms (0.00% GC)

Master:

julia> @benchmark foo(1:10^6, UInt(3))
BenchmarkTools.Trial:
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 163.74 μs (0.00% GC)
median time: 169.65 μs (0.00% GC)
mean time: 172.10 μs (0.00% GC)
maximum time: 245.19 μs (0.00% GC)
julia> @benchmark foo(1:10^6, 3)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 125.74 μs (0.00% GC)
median time: 130.16 μs (0.00% GC)
mean time: 132.26 μs (0.00% GC)
maximum time: 164.10 μs (0.00% GC)
julia> @benchmark foo(UInt(1):UInt(10^6), UInt(3))
BenchmarkTools.Trial:
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 163.74 μs (0.00% GC)
median time: 175.87 μs (0.00% GC)
mean time: 174.41 μs (0.00% GC)
maximum time: 460.21 μs (0.00% GC)

I suspect this is a package-level detail; perhaps one needs to dispatch to a different version of the operator?
I'll check it again on RC3 or 4.
Just tried this quickly on 0.6 and the numbers seem pretty close. Is this still an issue?
No. Some benchmarks confirm that there is no performance degradation.
I'm now working on supporting Julia 0.5 in the Bio.jl package. However, I found a significant performance regression in a very simple benchmark shown below:
benchmark.jl:
Julia 0.4.6 (BioJulia/Bio.jl@f5481fe):
Julia 0.5 RC2 (BioJulia/Bio.jl@b50b425):
This benchmark calls Bio.Seq.inbound_getindex for each element in the loop, so I guess the problem is in the optimization of this function. The generated machine code on Julia 0.5 looks much more complicated than that of Julia 0.4.
Julia 0.4.6:
Julia 0.5 RC2:
I may still need to narrow down the problem and create a smaller reproducible case. If you need it, I will try it later.
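As a rough idea of what a smaller reproducible case could look like, here is a sketch of a 2-bit packed sequence with unchecked element access in a loop. Everything in it (`PackedSeq`, `inbounds_get`, `count_value`) is hypothetical and is not the Bio.jl implementation. It is written in current Julia syntax (`struct` rather than the `immutable` keyword of 0.4/0.5).

```julia
# Hypothetical 2-bit packed sequence; NOT the Bio.jl implementation.
struct PackedSeq
    data::Vector{UInt64}   # 32 two-bit elements per 64-bit word
    len::Int
end

# Element access without bounds checking: find the word, shift, mask.
@inline function inbounds_get(seq::PackedSeq, i::Int)
    j = i - 1
    word = @inbounds seq.data[(j >> 5) + 1]   # index of the word holding element i
    offset = (j & 31) << 1                    # bit offset, 2 bits per element
    return (word >> offset) & 0x03
end

# Loop that touches every element, in the spirit of the benchmark described above.
function count_value(seq::PackedSeq, v)
    n = 0
    for i in 1:seq.len
        n += inbounds_get(seq, i) == v
    end
    return n
end

# Four elements 0, 1, 2, 3 packed from the low bits upward (0xe4 == 0b11100100).
seq = PackedSeq([0x00000000000000e4], 4)
```

Timing `count_value` on a long sequence under both Julia versions, and comparing the LLVM IR of `inbounds_get`, would show whether the shift and mask in the accessor are where the time goes.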