Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance #17

Open
AshtonSBradley opened this issue May 6, 2020 · 1 comment
Open

performance #17

AshtonSBradley opened this issue May 6, 2020 · 1 comment

Comments

@AshtonSBradley
Copy link

I wanted to implement a simple circular vector, and ended up finding major slowdown for accessing. Is this expected behavior:

using FFTViews
x = LinRange(-5,5,1000)

@btime x[200:800]
29.597 ns (1 allocation: 48 bytes)

xc = FFTView(x)
@btime xc[-800:-200]
6.299 μs (1 allocation: 4.88 KiB)

is it just due to allocations? It doesn't seem to be the case for a regular @view which is non-allocating

@btime @view x[200:800]
28.780 ns (1 allocation: 80 bytes)

So should @FFTView be regarded more as convenience than performant, or is there a trick to using it safely?

@timholy
Copy link
Member

timholy commented May 7, 2020

The difference in allocations is mostly a red herring; if you first call collect on x you'll see the same size allocation for both. The issue is that x[200:800], when x::AbstractRange, can return a range. x[200:800] does almost no work when x is a range (type @edit x[200:800] with x = LinRange(-5,5,1000) to see what I mean).

Currently there is no specialized implementation for FFTView(::AbstractRange) that returns a range object; moreover, because of the periodic boundary conditions you might need to return a concatenation of two ranges, so that would require a novel type.

Try calling x = collect(x) and repeat your test and you'll see (1) that the allocations are identical, and (2) the performance gap has narrowed tremendously.

The remaining difference in performance is because getindex calls reindex, and modrange is insansely expensive because integer division is one of the slowest operations on a CPU. But you need it for the periodic boundary conditions.

If you want to fix this, here are three possible strategies (and they can be applied in combination):

  • contribute a specialized getindex method for FFTViews of ranges. You'll need to create a DoubleRange type or something. This will get hairy beyond one dimension, though.
  • contribute a specialized getindex method for range indices. Currently, every index in 200:800 gets modded. However, for a range you only need to apply the boundary conditions once, breaking it up into at most two ranges. This goes beyond the typical AbstractArray interface but there is nothing wrong about supplying higher-order methods as long as you are prepared to deal with the consequences. This will be far easier than the first option to generalize to multiple dimensions.
  • modify the FFTView struct so that it stores a fast multiplicative inverse so as to avoid integer division. We already do this for our ReshapedArrays, no reason not to do it here.

I'm hesitant to even guess which of these would be easier to implement or more effective for your purposes. The first will be more effective (by far) for the problem you pose, but probably limited to one-dimensional objects. The second will also be really effective, readily extensible to higher dimensions, but applicable to range-indexing only. Applying mod to ranges is easy; the main challenge there will be not breaking Julia's dispatch rules; you'd want to add ambiguity tests and a few more indexing tests to make sure you haven't broken anything for indexing with many different supported types. The third will help your problem a lot, but it still won't get it to the performance of typical arrays; however, it will generically speed up indexing operations with FFTViews for any supplied index types; this approach will run into fewer dispatch challenges because it's purely an internal change.

The best option, of course, is to do all of them, but that would depend on your level of commitment to this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants