~7x performance regression of BioSequence on Julia 0.5 (RC2) #18135

Closed
@bicycle1885

Description

I'm adding Julia 0.5 support to the Bio.jl package and found a significant performance regression (about 7x: 1.79 ms vs. 12.13 ms minimum time) in the very simple benchmark below:

benchmark.jl:

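using Bio.Seq
using BenchmarkTools

# Count the occurrences of adenine (DNA_A) by iterating over the sequence.
function count_a(seq)
    n = 0
    for nt in seq
        n += nt == DNA_A
    end
    return n
end

# Fixed seed so both Julia versions benchmark the same random sequence.
srand(1234)
seq = randdnaseq(1_000_000)
println(@benchmark count_a(seq))

# Baseline: the same loop over a plain Vector of the collected elements.
println("--- baseline ---")
seq = collect(seq)
println(@benchmark count_a(seq))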

Julia 0.4.6 (BioJulia/Bio.jl@f5481fe):

~/.j/v/Bio (julia-v0.5|…) $ julia benchmark.jl
BenchmarkTools.Trial:
  samples:          2526
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     1.79 ms (0.00% GC)
  median time:      1.81 ms (0.00% GC)
  mean time:        1.98 ms (0.00% GC)
  maximum time:     4.12 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
  samples:          5666
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     806.94 μs (0.00% GC)
  median time:      809.80 μs (0.00% GC)
  mean time:        880.51 μs (0.00% GC)
  maximum time:     3.19 ms (0.00% GC)

Julia 0.5 RC2 (BioJulia/Bio.jl@b50b425):

~/.j/v/Bio (julia-v0.5|…) $ julia5 benchmark.jl
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in
 module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.
BenchmarkTools.Trial:
  samples:          367
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     12.13 ms (0.00% GC)
  median time:      13.39 ms (0.00% GC)
  mean time:        13.62 ms (0.00% GC)
  maximum time:     19.61 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
  samples:          5004
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     860.65 μs (0.00% GC)
  median time:      953.10 μs (0.00% GC)
  mean time:        997.09 μs (0.00% GC)
  maximum time:     2.83 ms (0.00% GC)

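This benchmark calls Bio.Seq.inbounds_getindex for each element in the loop, so I suspect the regression comes from how this function is optimized. The machine code generated on Julia 0.5 is much longer than on Julia 0.4: the 0.4 version is a short straight-line sequence, while the 0.5 version calls jl_get_ptls_states_fast, spills to the stack, and makes an out-of-line call to index.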

Julia 0.4.6:

julia> using Bio.Seq

julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
        .section        __TEXT,__text,regular,pure_instructions
Filename: /Users/kenta/.julia/v0.4/Bio/src/seq/bioseq.jl
Source line: 307
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 307
        addq    8(%rdi), %rsi
        leaq    -8(,%rsi,4), %rcx
Source line: 308
        movq    %rcx, %rax
        sarq    $6, %rax
        movq    (%rdi), %rdx
        movq    (%rdx), %rdx
        movq    (%rdx,%rax,8), %rax
        andb    $60, %cl
        shrq    %cl, %rax
        andb    $15, %al
        popq    %rbp
        ret
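
For readers who do not read x86 assembly: the Julia 0.4 listing above appears to be a plain shift-and-mask lookup into 64-bit words that hold 4 bits per nucleotide. Below is a minimal sketch of that arithmetic, reconstructed from the assembly only. It is not Bio.jl's source; the names `packed_getindex`, `data`, and `start` are hypothetical, and it assumes the field loaded from `8(%rdi)` is the 1-based start position of the sequence within its packed data.

```julia
# Sketch only: mirrors the Julia 0.4 assembly above, not Bio.jl's actual code.
# `data` holds 16 four-bit symbols per UInt64 word (assumption);
# `start` is the 1-based position of the first symbol (assumption).
function packed_getindex(data::Vector{UInt64}, start::Int, i::Int)
    # Bit position of the i-th symbol: 4 * (start + i) - 8,
    # as in `leaq -8(,%rsi,4), %rcx`.
    bitpos = (start + i - 2) << 2
    # Select the 64-bit word (`sarq $6`), adjusting for 1-based indexing.
    word = data[(bitpos >> 6) + 1]
    # Shift within the word (`andb $60, %cl; shrq %cl`) and keep the
    # low 4 bits (`andb $15`).
    return (word >> (bitpos & 63)) & 0x0f
end
```

For example, with `data = UInt64[0x421]` and `start = 1`, `packed_getindex(data, 1, 2)` evaluates to 2. The point of the sketch is that the whole lookup needs only a handful of integer operations and no function calls, which is what the 0.4 listing shows.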

Julia 0.5 RC2:

julia> using Bio.Seq
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.

julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
        .section        __TEXT,__text,regular,pure_instructions
Filename: bioseq.jl
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r12
        pushq   %rbx
        subq    $32, %rsp
        movq    %rsi, %rbx
        movq    %rdi, %r12
        movabsq $4368469648, %r14       ## imm = 0x104618E90
        movabsq $jl_get_ptls_states_fast, %rax
        callq   *%rax
        movq    %rax, %r15
        movq    $0, -40(%rbp)
        movq    $2, -56(%rbp)
        movq    (%r15), %rax
        movq    %rax, -48(%rbp)
        leaq    -56(%rbp), %rax
        movq    %rax, (%r15)
Source line: 301
        addq    8(%r12), %rbx
Source line: 186
        leaq    -8(,%rbx,4), %rax
Source line: 305
        movq    %rax, -64(%rbp)
Source line: 306
        movabsq $index, %rax
        leaq    -64(%rbp), %rdi
        callq   *%rax
        movq    (%r12), %rdx

I may still need to narrow down the problem and create a smaller reproducible case. I can try that later if it would help.


Labels: help wanted, performance, regression