performance #17

AshtonSBradley · 2020-05-06T23:24:53Z

I wanted to implement a simple circular vector, and ended up finding major slowdown for accessing. Is this expected behavior:

using FFTViews
x = LinRange(-5,5,1000)

@btime x[200:800]
29.597 ns (1 allocation: 48 bytes)

xc = FFTView(x)
@btime xc[-800:-200]
6.299 μs (1 allocation: 4.88 KiB)

is it just due to allocations? It doesn't seem to be the case for a regular @view which is non-allocating

@btime @view x[200:800]
28.780 ns (1 allocation: 80 bytes)

So should @FFTView be regarded more as convenience than performant, or is there a trick to using it safely?

The text was updated successfully, but these errors were encountered:

timholy · 2020-05-07T09:59:13Z

The difference in allocations is mostly a red herring; if you first call collect on x you'll see the same size allocation for both. The issue is that x[200:800], when x::AbstractRange, can return a range. x[200:800] does almost no work when x is a range (type @edit x[200:800] with x = LinRange(-5,5,1000) to see what I mean).

Currently there is no specialized implementation for FFTView(::AbstractRange) that returns a range object; moreover, because of the periodic boundary conditions you might need to return a concatenation of two ranges, so that would require a novel type.

Try calling x = collect(x) and repeat your test and you'll see (1) that the allocations are identical, and (2) the performance gap has narrowed tremendously.

The remaining difference in performance is because getindex calls reindex, and modrange is insansely expensive because integer division is one of the slowest operations on a CPU. But you need it for the periodic boundary conditions.

If you want to fix this, here are three possible strategies (and they can be applied in combination):

contribute a specialized getindex method for FFTViews of ranges. You'll need to create a DoubleRange type or something. This will get hairy beyond one dimension, though.
contribute a specialized getindex method for range indices. Currently, every index in 200:800 gets modded. However, for a range you only need to apply the boundary conditions once, breaking it up into at most two ranges. This goes beyond the typical AbstractArray interface but there is nothing wrong about supplying higher-order methods as long as you are prepared to deal with the consequences. This will be far easier than the first option to generalize to multiple dimensions.
modify the FFTView struct so that it stores a fast multiplicative inverse so as to avoid integer division. We already do this for our ReshapedArrays, no reason not to do it here.

I'm hesitant to even guess which of these would be easier to implement or more effective for your purposes. The first will be more effective (by far) for the problem you pose, but probably limited to one-dimensional objects. The second will also be really effective, readily extensible to higher dimensions, but applicable to range-indexing only. Applying mod to ranges is easy; the main challenge there will be not breaking Julia's dispatch rules; you'd want to add ambiguity tests and a few more indexing tests to make sure you haven't broken anything for indexing with many different supported types. The third will help your problem a lot, but it still won't get it to the performance of typical arrays; however, it will generically speed up indexing operations with FFTViews for any supplied index types; this approach will run into fewer dispatch challenges because it's purely an internal change.

The best option, of course, is to do all of them, but that would depend on your level of commitment to this problem.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance #17

performance #17

AshtonSBradley commented May 6, 2020

timholy commented May 7, 2020

performance #17

performance #17

Comments

AshtonSBradley commented May 6, 2020

timholy commented May 7, 2020