performance regression on high-dimensional array iteration using CartesianIndices (no simd) #38073
These are the benchmark results for arrays of 4096 elements at various dimensionalities. OffsetArrays 1.3.1 is used as a comparison, which uses dataframe
test script
using Test
using BenchmarkTools
using Plots
using Random
using OffsetArrays # 1.3.1
function arr_sum(X)
    val = zero(eltype(X))
    R = CartesianIndices(X)
    for i in R
        @inbounds val += X[i]
    end
    val
end
sz_list = [
(4096, ),
(2048, 2),
(1024, 2, 2),
( 512, 2, 2, 2),
( 256, 2, 2, 2, 2),
( 128, 2, 2, 2, 2, 2),
( 64, 2, 2, 2, 2, 2, 2),
( 32, 2, 2, 2, 2, 2, 2, 2),
( 16, 2, 2, 2, 2, 2, 2, 2, 2),
( 8, 2, 2, 2, 2, 2, 2, 2, 2, 2),
( 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
( 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
]
X = axes(sz_list, 1)
### run in Julia 1.5.2
sz = sz_list[2]
A = rand(sz...);
AO = OffsetArray(A, OffsetArrays.Origin(1));
@assert arr_sum(A) == arr_sum(AO)
T15O = []
for sz in sz_list
    t = @belapsed arr_sum(A) setup=(A=OffsetArray(rand($sz...), OffsetArrays.Origin(1)))
    @show sz t
    push!(T15O, t)
end
T15 = []
for sz in sz_list
    t = @belapsed arr_sum(A) setup=(A=rand($sz...))
    @show sz t
    push!(T15, t)
end
### run in Julia 1.6.0-DEV
sz = sz_list[2]
Random.seed!(0)
A = rand(sz...);
AO = OffsetArray(A, OffsetArrays.Origin(1));
@assert arr_sum(A) == arr_sum(AO)
T16O = []
for sz in sz_list
    t = @belapsed arr_sum(A) setup=(A=OffsetArray(rand($sz...), OffsetArrays.Origin(1)))
    @show sz t
    push!(T16O, t)
end
T16 = []
for sz in sz_list
    t = @belapsed arr_sum(A) setup=(A=rand($sz...))
    @show sz t
    push!(T16, t)
end
scatter(X, [T15, T16], labels="Array", markershape=:circle)
scatter!(X, [T15O, T16O], labels="OffsetArray", markershape=:rect)
plot!(X, [T15, T15O], labels="1.5.2", linestyle=:dash)
plot!(X, [T16, T16O], labels="1.6.0-DEV.1262", legend=:topleft)
Interesting. So basically everything in 1.6 is no worse than 1.5, with the minor exception of Array in 6 and 7 dimensions?
I read #38086 (comment) and wondered if there was something going on with the bounds check (or inbounds propagation).
function arr_sum_both(X)
    val = zero(eltype(X))
    R = CartesianIndices(X)
    @inbounds for i in R
        @inbounds val += X[i]
    end
    val
end
function arr_sum_outeronly(X)
    val = zero(eltype(X))
    R = CartesianIndices(X)
    @inbounds for i in R
        val += X[i]
    end
    val
end
julia> VERSION
v"1.6.0-DEV.1322"
julia> @btime arr_sum($X);
  5.033 μs (0 allocations: 0 bytes)
julia> @btime arr_sum_both($X);
  5.033 μs (0 allocations: 0 bytes)
julia> @btime arr_sum_outeronly($X);
  5.267 μs (0 allocations: 0 bytes)
This may be a separate issue, but it is weird that
This is what I get now with two repeated benchmarks:
julia> VERSION
v"1.7.0-DEV.36"
julia> X = rand(4, 4, 4, 4, 4, 4);
julia> @btime arr_sum($X);
5.540 μs (0 allocations: 0 bytes)
5.538 μs (0 allocations: 0 bytes)
julia> @btime arr_sum_both($X);
5.221 μs (0 allocations: 0 bytes)
5.582 μs (0 allocations: 0 bytes)
julia> @btime arr_sum_outeronly($X);
5.110 μs (0 allocations: 0 bytes)
  5.223 μs (0 allocations: 0 bytes)
This difference might just be noise.
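One way to check whether a gap of this size is more than noise is the `judge` function from BenchmarkTools, which classifies a timing ratio against a tolerance. The following is a minimal sketch, assuming the `arr_sum` and `arr_sum_outeronly` definitions from the comments above; the 5% tolerance is an illustrative choice, not a value taken from this thread.

```julia
using BenchmarkTools

# Inner-only @inbounds, as in the original test script.
function arr_sum(X)
    val = zero(eltype(X))
    for i in CartesianIndices(X)
        @inbounds val += X[i]
    end
    val
end

# Outer-only @inbounds, as in the comment above.
function arr_sum_outeronly(X)
    val = zero(eltype(X))
    @inbounds for i in CartesianIndices(X)
        val += X[i]
    end
    val
end

X = rand(4, 4, 4, 4, 4, 4)

b_inner = @benchmark arr_sum($X)
b_outer = @benchmark arr_sum_outeronly($X)

# Compare minimum times; the 5% tolerance is an assumption for illustration.
# Ratios inside the tolerance are classified as :invariant.
j = judge(minimum(b_outer), minimum(b_inner); time_tolerance = 0.05)
println(j)
```

If `time(j)` comes back as `:invariant` across repeated runs, the difference is within the chosen tolerance.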
I also checked again and found no difference in the native code depending on the position of On the other hand, in the case of #38086 (comment), there is a clear difference.
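A native-code comparison of this kind can be reproduced with `code_native` from InteractiveUtils. This is only a sketch: it assumes the comparison is between the `arr_sum_both` and `arr_sum_outeronly` variants from the earlier comment and a 6-dimensional `Array{Float64}` argument. The helper name `native` is hypothetical.

```julia
using InteractiveUtils  # provides code_native

function arr_sum_both(X)
    val = zero(eltype(X))
    @inbounds for i in CartesianIndices(X)
        @inbounds val += X[i]
    end
    val
end

function arr_sum_outeronly(X)
    val = zero(eltype(X))
    @inbounds for i in CartesianIndices(X)
        val += X[i]
    end
    val
end

# Hypothetical helper: capture the native code for one concrete argument type.
# debuginfo=:none drops the source-location comments so the listings are easier to diff.
native(f) = sprint(io -> code_native(io, f, (Array{Float64,6},); debuginfo = :none))

asm_both = native(arr_sum_both)
asm_outer = native(arr_sum_outeronly)

println(asm_both)
println(asm_outer)
```

The generated symbol names embed the function name, so the two listings are never byte-identical; what matters is whether the loop bodies differ.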
Seems like this caused a similar problem in https://discourse.julialang.org/t/drop-of-performances-with-julia-1-6-0-for-interpolationkernels/58085 as was fixed in #39333. https://discourse.julialang.org/t/drop-of-performances-with-julia-1-6-0-for-interpolationkernels/58085/12 has an MWE. Adding
IIRC, this might have something to do with #39700 (comment). However, I have no knowledge of the countermeasures on the compiler side.
It turns out that #37829 improved iteration performance for 2-d arrays but slowed down iteration for higher-dimensional (>= 4) arrays.
SIMD and LinearIndices are not affected.
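For reference, the unaffected variants might look like the sketch below. The names `arr_sum_simd` and `arr_sum_linear` are hypothetical, and the exact functions benchmarked for this issue are not shown here, so treat these as plausible reconstructions rather than the code that was measured.

```julia
# Baseline from the test script: plain CartesianIndices iteration.
function arr_sum(X)
    val = zero(eltype(X))
    for i in CartesianIndices(X)
        @inbounds val += X[i]
    end
    val
end

# Hypothetical @simd variant: same loop, annotated with @simd.
function arr_sum_simd(X)
    val = zero(eltype(X))
    @inbounds @simd for i in CartesianIndices(X)
        val += X[i]
    end
    val
end

# Hypothetical LinearIndices variant: iterate over linear indices instead.
function arr_sum_linear(X)
    val = zero(eltype(X))
    for i in LinearIndices(X)
        @inbounds val += X[i]
    end
    val
end

X = rand(4, 4, 4, 4, 4, 4)
# @simd allows the additions to be reordered, so compare with ≈ rather than ==.
println(arr_sum(X) ≈ arr_sum_simd(X))
println(arr_sum(X) ≈ arr_sum_linear(X))
```

Timing all three with `@btime` on the same array separates the plain `CartesianIndices` path from the other two.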