
~7x performance regression of BioSequence on Julia 0.5 (RC2) #18135

Closed
bicycle1885 opened this issue Aug 19, 2016 · 12 comments
Labels
help wanted (Indicates that a maintainer wants help on an issue or pull request), performance (Must go faster), regression (Regression in behavior compared to a previous version)

Comments

@bicycle1885
Member

bicycle1885 commented Aug 19, 2016

I'm now working on supporting Julia 0.5 in the Bio.jl package. However, I found a significant performance regression in a very simple benchmark shown below:

benchmark.jl:

using Bio.Seq
using BenchmarkTools

function count_a(seq)
    n = 0
    for nt in seq
        n += nt == DNA_A
    end
    return n
end

srand(1234)
seq = randdnaseq(1_000_000)
println(@benchmark count_a(seq))

println("--- baseline ---")
seq = collect(seq)
println(@benchmark count_a(seq))

Julia 0.4.6 (BioJulia/Bio.jl@f5481fe):

~/.j/v/Bio (julia-v0.5|…) $ julia benchmark.jl
BenchmarkTools.Trial:
  samples:          2526
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     1.79 ms (0.00% GC)
  median time:      1.81 ms (0.00% GC)
  mean time:        1.98 ms (0.00% GC)
  maximum time:     4.12 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
  samples:          5666
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     806.94 μs (0.00% GC)
  median time:      809.80 μs (0.00% GC)
  mean time:        880.51 μs (0.00% GC)
  maximum time:     3.19 ms (0.00% GC)

Julia 0.5 RC2 (BioJulia/Bio.jl@b50b425):

~/.j/v/Bio (julia-v0.5|…) $ julia5 benchmark.jl
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in
 module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.
BenchmarkTools.Trial:
  samples:          367
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     12.13 ms (0.00% GC)
  median time:      13.39 ms (0.00% GC)
  mean time:        13.62 ms (0.00% GC)
  maximum time:     19.61 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
  samples:          5004
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     860.65 μs (0.00% GC)
  median time:      953.10 μs (0.00% GC)
  mean time:        997.09 μs (0.00% GC)
  maximum time:     2.83 ms (0.00% GC)

This benchmark calls Bio.Seq.inbounds_getindex for each element in the loop, so I suspect the problem lies in the optimization of that function. The machine code generated on Julia 0.5 looks much more complicated than that of Julia 0.4.

Julia 0.4.6:

julia> using Bio.Seq

julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
        .section        __TEXT,__text,regular,pure_instructions
Filename: /Users/kenta/.julia/v0.4/Bio/src/seq/bioseq.jl
Source line: 307
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 307
        addq    8(%rdi), %rsi
        leaq    -8(,%rsi,4), %rcx
Source line: 308
        movq    %rcx, %rax
        sarq    $6, %rax
        movq    (%rdi), %rdx
        movq    (%rdx), %rdx
        movq    (%rdx,%rax,8), %rax
        andb    $60, %cl
        shrq    %cl, %rax
        andb    $15, %al
        popq    %rbp
        ret

Julia 0.5 RC2:

julia> using Bio.Seq
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.

julia> @code_native Bio.Seq.inbounds_getindex(randdnaseq(4), 1)
        .section        __TEXT,__text,regular,pure_instructions
Filename: bioseq.jl
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r12
        pushq   %rbx
        subq    $32, %rsp
        movq    %rsi, %rbx
        movq    %rdi, %r12
        movabsq $4368469648, %r14       ## imm = 0x104618E90
        movabsq $jl_get_ptls_states_fast, %rax
        callq   *%rax
        movq    %rax, %r15
        movq    $0, -40(%rbp)
        movq    $2, -56(%rbp)
        movq    (%r15), %rax
        movq    %rax, -48(%rbp)
        leaq    -56(%rbp), %rax
        movq    %rax, (%r15)
Source line: 301
        addq    8(%r12), %rbx
Source line: 186
        leaq    -8(,%rbx,4), %rax
Source line: 305
        movq    %rax, -64(%rbp)
Source line: 306
        movabsq $index, %rax
        leaq    -64(%rbp), %rdi
        callq   *%rax
        movq    (%r12), %rdx

I may still need to narrow the problem down to a smaller reproducible case; if that would help, I can try later.

@bicycle1885
Member Author

Here is more precise environment information.

Julia 0.4.6:

julia> versioninfo()
Julia Version 0.4.6
Commit 2e358ce* (2016-06-19 17:16 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Julia 0.5 RC2:

julia> versioninfo()
Julia Version 0.5.0-rc2+0
Commit 0350e57* (2016-08-12 11:25 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

@yuyichao
Contributor

Seems to be caused by the indexing function failing to inline.

Note that the assembly code looks incomplete; it's usually more useful to post the LLVM IR instead.

@bicycle1885
Member Author

Thank you. Here is the LLVM code:

Julia 0.4.6:

julia> @code_llvm Bio.Seq.inbounds_getindex(randdnaseq(4), 1)

define i8 @julia_inbounds_getindex_21682(%jl_value_t*, i64) {
L9:
  %2 = getelementptr %jl_value_t* %0, i64 1
  %3 = bitcast %jl_value_t* %2 to i64*
  %4 = load i64* %3, align 8
  %5 = add i64 %4, %1
  %6 = shl i64 %5, 2
  %7 = add i64 %6, -8
  %8 = getelementptr inbounds %jl_value_t* %0, i64 0, i32 0
  %9 = load %jl_value_t** %8, align 8
  %10 = ashr i64 %7, 6
  %11 = bitcast %jl_value_t* %9 to i8**
  %12 = load i8** %11, align 8
  %13 = bitcast i8* %12 to i64*
  %14 = getelementptr i64* %13, i64 %10
  %15 = load i64* %14, align 8
  %16 = and i64 %7, 60
  %17 = lshr i64 %15, %16
  %.tr = trunc i64 %17 to i8
  %18 = and i8 %.tr, 15
  ret i8 %18
}

Julia 0.5 RC2:

julia> @code_llvm Bio.Seq.inbounds_getindex(randdnaseq(4), 1)

define i8 @julia_inbounds_getindex_65982(%jl_value_t*, i64) #0 {
L5:
  %2 = call %jl_value_t*** @jl_get_ptls_states() #1
  %3 = alloca [3 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2
  store %jl_value_t* null, %jl_value_t** %4, align 8
  %5 = bitcast [3 x %jl_value_t*]* %3 to i64*
  store i64 2, i64* %5, align 8
  %6 = bitcast %jl_value_t*** %2 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1
  %9 = bitcast %jl_value_t** %8 to i64*
  store i64 %7, i64* %9, align 8
  store %jl_value_t** %.sub, %jl_value_t*** %2, align 8
  %j = alloca %BitIndex, align 8
  %10 = getelementptr %jl_value_t, %jl_value_t* %0, i64 1
  %11 = bitcast %jl_value_t* %10 to i64*
  %12 = load i64, i64* %11, align 8
  %13 = add i64 %12, %1
  %14 = shl i64 %13, 2
  %15 = add i64 %14, -8
  %16 = getelementptr inbounds %BitIndex, %BitIndex* %j, i64 0, i32 0
  store i64 %15, i64* %16, align 8
  %17 = call i64 @julia_index_65974(%BitIndex* nonnull %j) #0
  %18 = getelementptr inbounds %jl_value_t, %jl_value_t* %0, i64 0, i32 0
  %19 = load %jl_value_t*, %jl_value_t** %18, align 8
  store %jl_value_t* %19, %jl_value_t** %4, align 8
  %20 = load i64, i64* %16, align 8
  %21 = and i64 %20, 63
  %22 = bitcast %jl_value_t* %19 to i64**
  %23 = load i64*, i64** %22, align 8
  %24 = add i64 %17, -1
  %25 = getelementptr i64, i64* %23, i64 %24
  %26 = load i64, i64* %25, align 8
  %27 = lshr i64 %26, %21
  %28 = call i64 @julia_mask_65975(i64 4) #0
  %29 = and i64 %27, %28
  %30 = and i64 %29, 255
  %31 = icmp eq i64 %30, %29
  br i1 %31, label %L7, label %fail

L7:                                               ; preds = %L5
  %32 = trunc i64 %29 to i8
  %33 = icmp ult i8 %32, 16
  br i1 %33, label %L8, label %if13

L8:                                               ; preds = %L7
  %34 = load i64, i64* %9, align 8
  store i64 %34, i64* %6, align 8
  ret i8 %32

fail:                                             ; preds = %L5
  call void @jl_throw(%jl_value_t* inttoptr (i64 4586147472 to %jl_value_t*))
  unreachable

if13:                                             ; preds = %L7
  %35 = bitcast %jl_value_t*** %2 to i8*
  %36 = call %jl_value_t* @jl_gc_pool_alloc(i8* %35, i32 1384, i32 16)
  %37 = getelementptr inbounds %jl_value_t, %jl_value_t* %36, i64 -1, i32 0
  store %jl_value_t* inttoptr (i64 4787371024 to %jl_value_t*), %jl_value_t** %37, align 8
  %38 = bitcast %jl_value_t* %36 to i8*
  store i8 %32, i8* %38, align 8
  call void @jl_throw(%jl_value_t* %36)
  unreachable
}

@ViralBShah added the performance (Must go faster) and potential benchmark (Could make a good benchmark in BaseBenchmarks) labels Aug 19, 2016
@ViralBShah added this to the 0.5.x milestone Aug 19, 2016
@JeffBezanson
Member

I believe this is due to the introduction of branches into the bit shift operators in 5a9717b. Well, now you know why languages like to have undefined behaviors :)

I'll try putting inline declarations on them, and/or switching the branch to ifelse.
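
The difference can be sketched with simplified, illustrative definitions (not Base's actual ones): the branch form puts control flow into a tiny hot function, while `ifelse` evaluates both arms and lets LLVM select the result without a branch.

```julia
# Guarded logical right shift, two ways. Illustrative sketches only,
# not the actual Base definitions.

# Branch form: correct, but the explicit branch in a one-line hot
# function can hurt inlining and vectorization in callers.
shr_branch(x::UInt64, n::UInt64) = n >= 64 ? zero(UInt64) : x >> (n & 63)

# ifelse form: both arms are evaluated unconditionally and one result
# is selected, so LLVM can emit a select/cmov with no control flow.
shr_ifelse(x::UInt64, n::UInt64) = ifelse(n >= 64, zero(UInt64), x >> (n & 63))
```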

@JeffBezanson
Member

Using ifelse instead of a branch helps a lot, but it looks like we still have an unnecessary gc frame (#15369):

julia> @code_llvm Bio.Seq.inbounds_getindex(randdnaseq(4), 1)

define i8 @julia_inbounds_getindex_67106(%jl_value_t*, i64) #0 {
L2:
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-2672, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #3
  %ptls = bitcast i8* %ptls_i8 to %jl_value_t***
  %2 = alloca [3 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %2, i64 0, i64 0
  %3 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %2, i64 0, i64 2
  store %jl_value_t* null, %jl_value_t** %3, align 8
  %4 = bitcast [3 x %jl_value_t*]* %2 to i64*
  store i64 2, i64* %4, align 8
  %5 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %2, i64 0, i64 1
  %6 = bitcast i8* %ptls_i8 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = bitcast %jl_value_t** %5 to i64*
  store i64 %7, i64* %8, align 8
  store %jl_value_t** %.sub, %jl_value_t*** %ptls, align 8
  %9 = getelementptr %jl_value_t, %jl_value_t* %0, i64 1
  %10 = bitcast %jl_value_t* %9 to i64*
  %11 = load i64, i64* %10, align 8
  %12 = add i64 %11, %1
  %13 = shl i64 %12, 2
  %14 = add i64 %13, -8
  %15 = getelementptr inbounds %jl_value_t, %jl_value_t* %0, i64 0, i32 0
  %16 = load %jl_value_t*, %jl_value_t** %15, align 8
  store %jl_value_t* %16, %jl_value_t** %3, align 8
  %17 = bitcast %jl_value_t* %16 to i64**
  %18 = load i64*, i64** %17, align 8
  %19 = ashr i64 %14, 6
  %20 = getelementptr i64, i64* %18, i64 %19
  %21 = load i64, i64* %20, align 8
  %22 = and i64 %14, 60
  %23 = lshr i64 %21, %22
  %.tr = trunc i64 %23 to i8
  %24 = and i8 %.tr, 15
  %25 = load i64, i64* %8, align 8
  store i64 %25, i64* %6, align 8
  ret i8 %24
}

@timholy
Member

timholy commented Aug 19, 2016

If you can use Unsigned numbers in the shift operation, that would sidestep the issue.
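
A small demo of why the unsigned case is cheaper to guard (this illustrates `>>`'s documented semantics, not Base internals): a signed count needs an extra case because negative counts shift the other way, while an unsigned count only needs the count-too-large guard.

```julia
# With a signed count, >> must handle a third case: negative counts
# shift in the OTHER direction.
@assert (8 >> -1) == 16            # negative signed count shifts left
@assert (8 >> 1)  == 4

# With an unsigned count that case cannot occur; only the
# count >= nbits guard remains, and oversized counts saturate to zero.
@assert (8 >> UInt(1)) == 4
@assert (UInt8(0xff) >> UInt(10)) == 0x00
```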

@JeffBezanson added the regression (Regression in behavior compared to a previous version) label Aug 19, 2016
JeffBezanson added a commit that referenced this issue Aug 19, 2016
replace branch in bit shift operators, helps #18135
tkelman pushed a commit that referenced this issue Aug 20, 2016
@bicycle1885
Member Author

bicycle1885 commented Aug 20, 2016

The RC3 branch (tk/backports-0.5.0-rc3), which includes e02692f, improved the performance significantly, but I still see a performance regression of about 4x:

~/.j/v/Bio (julia-v0.5|…) $ ~/vendor/julia/julia benchmark.jl
WARNING: Method definition require(Symbol) in module Base at loading.jl:317 overwritten in
 module Main at /Users/kenta/.julia/v0.5/Requires/src/require.jl:12.
BenchmarkTools.Trial:
  samples:          672
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     6.82 ms (0.00% GC)
  median time:      6.97 ms (0.00% GC)
  mean time:        7.44 ms (0.00% GC)
  maximum time:     11.49 ms (0.00% GC)
--- baseline ---
BenchmarkTools.Trial:
  samples:          5313
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     860.54 μs (0.00% GC)
  median time:      864.82 μs (0.00% GC)
  mean time:        940.22 μs (0.00% GC)
  maximum time:     3.43 ms (0.00% GC)

@ViralBShah
Member

@JeffBezanson Are we going to be able to do something here for the 0.5 release?

@timholy
Member

timholy commented Aug 29, 2016

I don't think it's (any longer) the bit-shift operators in Base:

Julia 0.4:

julia> using BenchmarkTools

julia> function foo(r, n)
           s = zero(eltype(r))
           for i in r
               s += i<<n
           end
           s
       end
foo (generic function with 1 method)

julia> @benchmark foo(1:10^6, UInt(3))
BenchmarkTools.Trial: 
  samples:          4606
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.00 ms (0.00% GC)
  median time:      1.04 ms (0.00% GC)
  mean time:        1.08 ms (0.00% GC)
  maximum time:     1.95 ms (0.00% GC)

julia> @benchmark foo(1:10^6, 3)
BenchmarkTools.Trial: 
  samples:          4789
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.00 ms (0.00% GC)
  median time:      1.04 ms (0.00% GC)
  mean time:        1.04 ms (0.00% GC)
  maximum time:     1.63 ms (0.00% GC)

julia> @benchmark foo(UInt(1):UInt(10^6), UInt(3))
BenchmarkTools.Trial: 
  samples:          4885
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.00 ms (0.00% GC)
  median time:      1.02 ms (0.00% GC)
  mean time:        1.02 ms (0.00% GC)
  maximum time:     2.09 ms (0.00% GC)

Master:

julia> @benchmark foo(1:10^6, UInt(3))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     163.74 μs (0.00% GC)
  median time:      169.65 μs (0.00% GC)
  mean time:        172.10 μs (0.00% GC)
  maximum time:     245.19 μs (0.00% GC)

julia> @benchmark foo(1:10^6, 3)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     125.74 μs (0.00% GC)
  median time:      130.16 μs (0.00% GC)
  mean time:        132.26 μs (0.00% GC)
  maximum time:     164.10 μs (0.00% GC)

julia> @benchmark foo(UInt(1):UInt(10^6), UInt(3))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     163.74 μs (0.00% GC)
  median time:      175.87 μs (0.00% GC)
  mean time:        174.41 μs (0.00% GC)
  maximum time:     460.21 μs (0.00% GC)

I suspect this is a package-level detail; perhaps one needs to dispatch to a different version of the operator?
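
One way a package could dispatch onto the cheaper path, sketched with hypothetical names (this is not Bio.jl's actual code): keep the bit offset unsigned so the hot shift only ever sees an unsigned count.

```julia
# Hypothetical package-level workaround (illustrative names, not
# Bio.jl's actual code): an unsigned bit offset means the shift takes
# the cheap unsigned-count path with no negative-count case to guard.
@inline function packed_get(data::Vector{UInt64}, i::Int)
    bitidx = UInt64(i - 1) << 2       # 4 bits per symbol
    word   = (bitidx >> 6) % Int + 1  # 1-based word index
    off    = bitidx & 0x3f            # unsigned shift count in [0, 63]
    @inbounds (data[word] >> off) % UInt8 & 0x0f
end
```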

@bicycle1885
Member Author

I'll check it again on RC3 or 4.

mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
@StefanKarpinski added and then removed the help wanted (Indicates that a maintainer wants help on an issue or pull request) label Oct 27, 2016
@JeffBezanson
Member

Just tried this quickly on 0.6 and the numbers seem pretty close. Is this still an issue?

@bicycle1885
Member Author

No. Some benchmarks suggest there is no longer any performance degradation.

@KristofferC removed the potential benchmark (Could make a good benchmark in BaseBenchmarks) label Oct 31, 2018
7 participants