
Track possible multiple correlation support in nifty/ducc gridders #169

Open
sjperkins opened this issue Jan 10, 2020 · 14 comments
Labels
enhancement New feature or request

Comments

@sjperkins
Member

  • Codex Africanus version: 0.2.0
  • Python version: 3.6
  • Operating System: Ubuntu

Description

  1. The Nifty Gridder only supports gridding a single correlation.
  2. MS data has shape (row, chan, corr).
  3. We index on corr to pass a single correlation to nifty, which produces a non-contiguous array.

Solution

  1. Make an array copy (see the sketch below).
  2. Investigate whether nifty might now be able to support gridding multiple correlations.
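A minimal NumPy sketch of the problem and the copy workaround; the shapes are illustrative and none of this is codex-africanus API:

import numpy as np

# MS-shaped visibilities: (row, chan, corr)
nrow, nchan, ncorr = 1000, 64, 4
data = np.zeros((nrow, nchan, ncorr), np.complex64)

vis = data[:, :, 0]                    # view of a single correlation
print(vis.flags["C_CONTIGUOUS"])       # False: strided by ncorr elements

vis_copy = np.ascontiguousarray(vis)   # solution 1: make a contiguous copy
print(vis_copy.flags["C_CONTIGUOUS"])  # True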
sjperkins added the enhancement label on Jan 10, 2020
@mreineck

The ducc gridder still does not support multiple correlations, but it's on my todo list.

I assume that the most typical situation will be four correlations, i.e. four complex numbers for every entry in the MS instead of the current single one, correct? Will the weight array stay the same, or will it also grow by a factor of four?

@sjperkins
Member Author

Hi @mreineck. Thanks for keeping this on the horizon.

I would say one, two and four correlations are the typical cases. The WEIGHT and WEIGHT_SPECTRUM arrays always have a correlation dimension which is the same as that in the DATA, FLAG, CORRECTED_DATA etc. arrays.

See the CASA MSv2.0 spec for confirmation: https://casa.nrao.edu/Memos/229.html#SECTION00061000000000000000
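For reference, a minimal sketch of those column shapes as casacore-style arrays; the dimensions here are illustrative:

import numpy as np

nrow, nchan, ncorr = 1000, 64, 4

# DATA, CORRECTED_DATA, FLAG and WEIGHT_SPECTRUM share the corr axis
data = np.zeros((nrow, nchan, ncorr), np.complex64)
weight_spectrum = np.zeros((nrow, nchan, ncorr), np.float32)

# WEIGHT has no channel axis, but the same correlation axis
weight = np.zeros((nrow, ncorr), np.float32)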

From admittedly hazy memory, I think I just decided to pass non-contiguous arrays into the nifty gridder. I remember it sent many warning messages about non-contiguous arrays to the console, which I think you removed?

@mreineck

OK, I'll see how that fits into the design. My hope is that I can use vector instructions when gridding/degridding multiple values at identical (u, v, w) locations. If that works, a single pass doing four correlations might be noticeably faster than four separate passes doing one correlation at a time.

Passing non-contiguous arrays is probably not such a big deal. There will be slightly more cache misses, but the effect on overall performance is probably small.
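To illustrate the vectorisation idea: with the correlation axis fastest-varying, the four values of one visibility are contiguous, and the inner accumulation maps naturally onto 4-wide SIMD lanes. A toy sketch, not the ducc implementation (grid_visibility and its arguments are hypothetical):

import numpy as np

def grid_visibility(grid, kernel, u0, v0, vis4, support):
    # grid: (nu, nv, 4) complex, kernel: (support, support), vis4: (4,) complex
    for du in range(support):
        for dv in range(support):
            # one kernel evaluation feeds four accumulations, which a
            # compiler can map onto a single vector instruction
            grid[u0 + du, v0 + dv, :] += kernel[du, dv] * vis4

grid = np.zeros((64, 64, 4), np.complex128)
grid_visibility(grid, np.ones((8, 8)), 10, 20, np.ones(4, np.complex128), 8)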

sjperkins changed the title from "Handle contiguity issues in dask nifty gridder wrappers" to "Track possible multi-correlation support in nifty/ducc gridders" on Jul 24, 2020
sjperkins changed the title from "Track possible multi-correlation support in nifty/ducc gridders" to "Track possible multiple correlation support in nifty/ducc gridders" on Jul 24, 2020
@sjperkins
Member Author

I would say one, two and four correlations are the typical cases.

@landmanbester, I can't imagine another RA use case, but pinging you in case there's something exotic that I may have missed

@sjperkins
Member Author

Passing non-contiguous arrays is probably not such a big deal. There will be slightly more cache misses, but the effect on overall performance is probably small.

This is useful to know. I'm currently re-implementing a Discrete Wavelet Transform in numba (ratt-ru/pfb-imaging#10), and the thought of applying the 1D DWT to non-contiguous data made me pause: the original implementation copies non-contiguous rows before applying the filter. Do you have any thoughts you'd be willing to share on ratt-ru/pfb-imaging#10? I'm not asking for a code review -- I just find this level of discussion useful.

@mreineck

Whether unaligned accesses are a problem depends primarily on how many operations you do on an array element before going to the next. In the gridder, I read a value from the ms array and then grid it onto support**2 different locations (which are almost always in level-1 cache), so the penalty for the potential cache miss when accessing ms is rather small compared to the time required to actually grid it.

If this were a 1D gridder, things would already look worse, as the cost now scales only linearly with support. And if you use the value for only a single arithmetic operation (as is the case in FFTs, for example), cache misses become really painful. I don't know the details of Discrete Wavelet Transforms, but I'd naively expect them to be more on the FFT side of things...
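A back-of-envelope view of the same point; the operation counts are illustrative, not measured:

support = 8
ops_per_read_2d = support ** 2  # 2D gridder: 64 multiply-adds per ms read
ops_per_read_1d = support       # hypothetical 1D gridder: 8 per read
ops_per_read_fft = 1            # FFT-style access: ~1 operation per read

# A cache miss on the read is diluted by a factor of ops_per_read, so the
# 2D gridder tolerates strided input roughly 64x better than an FFT-like
# access pattern would.
print(ops_per_read_2d, ops_per_read_1d, ops_per_read_fft)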

@landmanbester
Collaborator

I would say one, two and four correlations are the typical cases.

@landmanbester, I can't imagine another RA use case, but pinging you in case there's something exotic that I may have missed

I think that about covers it. The weights are per correlation, so, strictly speaking, you have to grid the correlations before converting to Stokes components. I don't know if it makes much of a difference, but I think the diagonal components of the brightness matrix are always real-valued.
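For concreteness, a sketch of the linear-feed brightness matrix, which shows why its diagonal is real-valued; orderings and factors of two are convention-dependent and vary between packages:

import numpy as np

I, Q, U, V = 1.0, 0.1, 0.05, 0.01  # illustrative Stokes parameters

# Linear feeds: [[XX, XY], [YX, YY]] = [[I + Q, U + iV], [U - iV, I - Q]]
B = np.array([[I + Q,      U + 1j * V],
              [U - 1j * V, I - Q]])

assert np.isreal(B[0, 0]) and np.isreal(B[1, 1])  # diagonal is real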

@bennahugo
Collaborator

bennahugo commented Jul 24, 2020 via email

@mreineck

Unaligned accesses can create a bit of havoc when vectorizing code using special instruction sets.

Yes. In my particular situation it wouldn't be a problem, though: during gridding I'd basically have one (potentially) unaligned load per visibility, and the remaining support**2 memory accesses would be aligned. I still don't expect anything near an 8-fold speedup, however...

@bennahugo
Collaborator

bennahugo commented Jul 24, 2020 via email

@sjperkins
Member Author

Compilers have likely become smarter in the meantime. I don't know whether numba exploits special instruction sets, though.

numba defers to LLVM for SIMD: http://numba.pydata.org/numba-doc/latest/user/faq.html?highlight=simd#does-numba-vectorize-array-computations-simd

I've seen numba generate vectorised assembly code. Something like the following should show you the ASM:

import numba
import numpy as np

@numba.njit
def axpy(a, x, y):
    return a * x + y

axpy(2.0, np.ones(1024), np.ones(1024))  # call once to compile a specialisation

# inspect_asm() returns a dict mapping compiled signatures to assembly
for sig, asm in axpy.inspect_asm().items():
    print(sig, asm, sep="\n")

@mreineck

mreineck commented Aug 4, 2020

After finding another way of vectorizing the inner loops of the gridding/degridding functions, I think I won't support simultaneous processing of multiple correlations after all.
The issue is that if we process, say, four correlations together, the amount of data touched in the innermost loops also becomes four times larger, and this can make the difference between very few and very many L1 cache misses.
It would also mean reserving memory for four complex arrays of size nu*nv instead of just one, which is probably not the best thing to do for really large dirty images.
So unless there are important reasons for switching, I propose to keep the current implementation.

(It's of course true that with separate processing we have to do the same kernel evaluations repeatedly, but I don't think this is an important factor any more.)
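To put a rough number on the memory point; the grid dimensions are illustrative, and the actual padding depends on the gridder's oversampling settings:

nu = nv = 16384              # padded (u, v) grid size, assumed for illustration
one_grid = nu * nv * 16      # complex128 = 16 bytes per cell
print(one_grid / 2**30)      # 4.0 -> ~4 GiB for a single correlation
print(4 * one_grid / 2**30)  # 16.0 -> ~16 GiB holding four grids at once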

@bennahugo
Collaborator

bennahugo commented Aug 4, 2020 via email

@mreineck

mreineck commented Aug 4, 2020

I see, thanks!
But to be honest, I think that all this trickery could also be done outside the gridder proper, and if possible I'd like to keep the code as unaware as possible of the details of the data it is processing. Currently the library does not know (and does not care) whether it is dealing with Stokes parameters, correlations or something completely different, which I think is a good thing.
