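I'm working on supporting Julia 0.5 in the Bio.jl package and found a significant performance regression in the very simple benchmark shown below. Iterating over a sequence is roughly 7x slower on Julia 0.5 RC2 than on Julia 0.4.6 (minimum time 12.13 ms vs. 1.79 ms), while the baseline over a plain array is nearly unchanged.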
benchmark.jl:
using Bio.Seq
using BenchmarkTools
function count_a(seq)
    n = 0
    for nt in seq
        n += nt == DNA_A
    end
    return n
end
srand(1234)
seq = randdnaseq(1_000_000)
println(@benchmark count_a(seq))
println("--- baseline ---")
seq = collect(seq)
println(@benchmark count_a(seq))
Julia 0.4.6 (BioJulia/Bio.jl@f5481fe):
~/.j/v/Bio (julia-v0.5|…) $ julia benchmark.jl
BenchmarkTools.Trial:
samples: 2526
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 16.00 bytes
allocs estimate: 1
minimum time: 1.79 ms (0.00% GC)
median time: 1.81 ms (0.00% GC)
mean time: 1.98 ms (0.00% GC)
maximum time: 4.12 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
samples: 5666
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 16.00 bytes
allocs estimate: 1
minimum time: 806.94 μs (0.00% GC)
median time: 809.80 μs (0.00% GC)
mean time: 880.51 μs (0.00% GC)
maximum time: 3.19 ms (0.00% GC)
Julia 0.5 RC2 (BioJulia/Bio.jl@b50b425):
~/.j/v/Bio (julia-v0.5|…) $ julia5 benchmark.jl
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.
BenchmarkTools.Trial:
samples: 367
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 16.00 bytes
allocs estimate: 1
minimum time: 12.13 ms (0.00% GC)
median time: 13.39 ms (0.00% GC)
mean time: 13.62 ms (0.00% GC)
maximum time: 19.61 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
samples: 5004
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 16.00 bytes
allocs estimate: 1
minimum time: 860.65 μs (0.00% GC)
median time: 953.10 μs (0.00% GC)
mean time: 997.09 μs (0.00% GC)
maximum time: 2.83 ms (0.00% GC)
This benchmark calls Bio.Seq.inbounds_getindex for each element in the loop, so I guess the problem is in the optimization of this function. The generated machine code on Julia 0.5 looks much more complicated than that of Julia 0.4.
Julia 0.4.6:
julia> using Bio.Seq
julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
.section __TEXT,__text,regular,pure_instructions
Filename: /Users/kenta/.julia/v0.4/Bio/src/seq/bioseq.jl
Source line: 307
pushq %rbp
movq %rsp, %rbp
Source line: 307
addq 8(%rdi), %rsi
leaq -8(,%rsi,4), %rcx
Source line: 308
movq %rcx, %rax
sarq $6, %rax
movq (%rdi), %rdx
movq (%rdx), %rdx
movq (%rdx,%rax,8), %rax
andb $60, %cl
shrq %cl, %rax
andb $15, %al
popq %rbp
ret
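For readers less used to assembly, the Julia 0.4.6 output above boils down to a single load from a 4-bit-packed array. The following is a rough sketch of that logic reconstructed from the instructions, not the actual Bio.jl source. The names `packed_getindex` and `pack`, the `start` argument (my guess at what the field at `8(%rdi)` holds), and the nucleotide codes are all hypothetical.

```julia
# Hypothetical stand-in for the packed lookup; NOT Bio.jl's real implementation.
# `data` stores 16 elements per UInt64 (4 bits each); `start` is the assumed
# 1-based position of the sequence's first element inside `data`.
function packed_getindex(data::Vector{UInt64}, start::Int, i::Int)
    # Bit offset of element i; mirrors `addq 8(%rdi), %rsi` + `leaq -8(,%rsi,4)`.
    offset = 4 * (start + i) - 8
    word   = (offset >> 6) + 1   # which 64-bit word (Julia arrays are 1-based)
    shift  = offset & 63         # bit position inside that word
    # Mirrors `shrq %cl, %rax` followed by `andb $15, %al`.
    return ((data[word] >> shift) & 0x0f) % UInt8
end

# Helper that packs 4-bit codes into UInt64 words so the lookup can be tried out.
function pack(codes::Vector{UInt8})
    data = zeros(UInt64, cld(length(codes), 16))
    for (k, code) in enumerate(codes)
        offset = 4 * (k - 1)
        data[(offset >> 6) + 1] |= UInt64(code & 0x0f) << (offset & 63)
    end
    return data
end

# Made-up 4-bit codes, long enough to cross a 64-bit word boundary.
codes   = UInt8[i % 16 for i in 1:20]
data    = pack(codes)
decoded = [packed_getindex(data, 1, i) for i in 1:length(codes)]
```

The point of the sketch is only that the whole lookup is a few shifts and masks, which is why the extra function call and stack setup in the Julia 0.5 output stand out.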
Julia 0.5 RC2:
julia> using Bio.Seq
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.
julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
.section __TEXT,__text,regular,pure_instructions
Filename: bioseq.jl
pushq %rbp
movq %rsp, %rbp
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
subq $32, %rsp
movq %rsi, %rbx
movq %rdi, %r12
movabsq $4368469648, %r14 ## imm = 0x104618E90
movabsq $jl_get_ptls_states_fast, %rax
callq *%rax
movq %rax, %r15
movq $0, -40(%rbp)
movq $2, -56(%rbp)
movq (%r15), %rax
movq %rax, -48(%rbp)
leaq -56(%rbp), %rax
movq %rax, (%r15)
Source line: 301
addq 8(%r12), %rbx
Source line: 186
leaq -8(,%rbx,4), %rax
Source line: 305
movq %rax, -64(%rbp)
Source line: 306
movabsq $index, %rax
leaq -64(%rbp), %rdi
callq *%rax
movq (%r12), %rdx
I may still need to narrow down the problem and create a smaller reproducible case. If that would help, I will try to put one together later.