
Track possible multiple correlation support in nifty/ducc gridders #169

Open
sjperkins opened this issue Jan 10, 2020 · 14 comments
Labels
enhancement New feature or request

Comments

@sjperkins
Member

  • Codex Africanus version: 0.2.0
  • Python version: 3.6
  • Operating System: Ubuntu

Description

  1. The Nifty Gridder only supports gridding a single correlation.
  2. MS data has shape (row, chan, corr).
  3. We index on corr to pass a single correlation to nifty, which produces a non-contiguous array.

Solution

  1. Make an array copy (see the sketch below).
  2. Investigate whether nifty might now be able to support gridding multiple correlations.
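A minimal NumPy sketch of the problem and the copy workaround; the shapes are illustrative and none of this is codex-africanus API:

import numpy as np

# MS-shaped visibilities: (row, chan, corr)
nrow, nchan, ncorr = 1000, 64, 4
data = np.zeros((nrow, nchan, ncorr), np.complex64)

vis = data[:, :, 0]                    # view of a single correlation
print(vis.flags["C_CONTIGUOUS"])       # False: strided by ncorr elements

vis_copy = np.ascontiguousarray(vis)   # solution 1: make a contiguous copy
print(vis_copy.flags["C_CONTIGUOUS"])  # True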
sjperkins added the enhancement label on Jan 10, 2020
@mreineck

The ducc gridder still does not support multiple correlations, but it's on my todo list.

I assume that the most typical situation will be four correlations, i.e. four complex numbers for every entry in the MS instead of the current single one, correct? Will the weight array stay the same, or will it also grow by a factor of four?

@sjperkins
Member Author

Hi @mreineck. Thanks for keeping this on the horizon.

I would say one, two and four correlations are the typical cases. The WEIGHT and WEIGHT_SPECTRUM arrays always have a correlation dimension which is the same as that in the DATA, FLAG, CORRECTED_DATA etc. arrays.

See the CASA MSv2.0 spec for confirmation: https://casa.nrao.edu/Memos/229.html#SECTION00061000000000000000
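For reference, a minimal sketch of those column shapes as casacore-style arrays; the dimensions here are illustrative:

import numpy as np

nrow, nchan, ncorr = 1000, 64, 4

# DATA, CORRECTED_DATA, FLAG and WEIGHT_SPECTRUM share the corr axis
data = np.zeros((nrow, nchan, ncorr), np.complex64)
weight_spectrum = np.zeros((nrow, nchan, ncorr), np.float32)

# WEIGHT has no channel axis, but the same correlation axis
weight = np.zeros((nrow, ncorr), np.float32)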

From admittedly hazy memory, I think I just decided to pass non-contiguous arrays into the nifty gridder. I remember it sent many warning messages about non-contiguous arrays to the console, which I think you removed?

@mreineck

OK, I'll see how that fits into the design. My hope is that I can use vector instructions when gridding/degridding multiple values at identical (u, v, w) locations. If that works, a single pass doing four correlations might be noticeably faster than four separate passes doing one correlation at a time.

Passing non-contiguous arrays is probably not such a big deal. There will be slightly more cache misses, but the effect on overall performance is probably small.
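To illustrate the vectorisation idea: with the correlation axis fastest-varying, the four values of one visibility are contiguous, and the inner accumulation maps naturally onto 4-wide SIMD lanes. A toy sketch, not the ducc implementation (grid_visibility and its arguments are hypothetical):

import numpy as np

def grid_visibility(grid, kernel, u0, v0, vis4, support):
    # grid: (nu, nv, 4) complex, kernel: (support, support), vis4: (4,) complex
    for du in range(support):
        for dv in range(support):
            # one kernel evaluation feeds four accumulations, which a
            # compiler can map onto a single vector instruction
            grid[u0 + du, v0 + dv, :] += kernel[du, dv] * vis4

grid = np.zeros((64, 64, 4), np.complex128)
grid_visibility(grid, np.ones((8, 8)), 10, 20, np.ones(4, np.complex128), 8)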

sjperkins changed the title from "Handle contiguity issues in dask nifty gridder wrappers" to "Track possible multi-correlation support in nifty/ducc gridders" on Jul 24, 2020
sjperkins changed the title from "Track possible multi-correlation support in nifty/ducc gridders" to "Track possible multiple correlation support in nifty/ducc gridders" on Jul 24, 2020
@sjperkins
Member Author

I would say one, two and four correlations are the typical cases.

@landmanbester, I can't imagine another RA use case, but pinging you in case there's something exotic that I may have missed

@sjperkins
Member Author

Passing non-contiguous arrays is probably not such a big deal. There will be slightly more cache misses, but the effect on overall performance is probably small.

This is useful to know. I'm currently re-implementing a Discrete Wavelet Transform in numba (ratt-ru/pfb-imaging#10), and the thought of applying the 1D DWT to non-contiguous data made me pause: the original implementation copies non-contiguous rows before applying the filter. Do you have any thoughts you'd be willing to share on ratt-ru/pfb-imaging#10? I'm not asking for a code review -- I just find this level of discussion useful.

@mreineck

Whether unaligned accesses are a problem depends primarily on how many operations you do on an array element before going to the next. In the gridder, I read a value from the ms array and then grid it onto support**2 different locations (which are almost always in level-1 cache), so the penalty for the potential cache miss when accessing ms is rather small compared to the time required to actually grid it.

If this were a 1D gridder, things would already look worse, as the cost now scales only linearly with support. And if you use the value for only a single arithmetic operation (as is the case in FFTs, for example), cache misses become really painful. I don't know the details of Discrete Wavelet Transforms, but I'd naively expect them to be more on the FFT side of things...
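A back-of-envelope view of the same point; the operation counts are illustrative, not measured:

support = 8
ops_per_read_2d = support ** 2  # 2D gridder: 64 multiply-adds per ms read
ops_per_read_1d = support       # hypothetical 1D gridder: 8 per read
ops_per_read_fft = 1            # FFT-style access: ~1 operation per read

# A cache miss on the read is diluted by a factor of ops_per_read, so the
# 2D gridder tolerates strided input roughly 64x better than an FFT-like
# access pattern would.
print(ops_per_read_2d, ops_per_read_1d, ops_per_read_fft)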

@landmanbester
Collaborator

I would say one, two and four correlations are the typical cases.

@landmanbester, I can't imagine another RA use case, but pinging you in case there's something exotic that I may have missed

I think that about covers it. The weights are per correlation, so, strictly speaking, you have to grid the correlations before converting to Stokes components. I don't know if it makes much of a difference, but I think the diagonal components of the brightness matrix are always real-valued.
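For concreteness, a sketch of the linear-feed brightness matrix, which shows why its diagonal is real-valued; orderings and factors of two are convention-dependent and vary between packages:

import numpy as np

I, Q, U, V = 1.0, 0.1, 0.05, 0.01  # illustrative Stokes parameters

# Linear feeds: [[XX, XY], [YX, YY]] = [[I + Q, U + iV], [U - iV, I - Q]]
B = np.array([[I + Q,      U + 1j * V],
              [U - 1j * V, I - Q]])

assert np.isreal(B[0, 0]) and np.isreal(B[1, 1])  # diagonal is real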

@bennahugo
Collaborator

bennahugo commented Jul 24, 2020 via email

@mreineck

Unaligned accesses can create a bit of havoc when vectorizing code using special instruction sets.

Yes. In my particular situation it wouldn't be a problem, though: during gridding I'd basically have one (potentially) unaligned load per visibility, and the remaining support**2 memory accesses would be aligned. I still don't expect anything near an 8-fold speedup, however...

@bennahugo
Collaborator

bennahugo commented Jul 24, 2020 via email

@sjperkins
Member Author

Compilers have likely become smarter in the meantime. I don't know whether numba exploits special instruction sets, though.

numba defers to LLVM for SIMD: http://numba.pydata.org/numba-doc/latest/user/faq.html?highlight=simd#does-numba-vectorize-array-computations-simd

I've seen numba generate vectorised assembly code. Something like the following should show you the ASM:

import numba
import numpy as np

@numba.njit
def axpy(a, x, y):
    return a * x + y

axpy(2.0, np.ones(1024), np.ones(1024))  # call once to compile a specialisation

# inspect_asm() returns a dict mapping compiled signatures to assembly
for sig, asm in axpy.inspect_asm().items():
    print(sig, asm, sep="\n")

@mreineck

mreineck commented Aug 4, 2020

After finding another way of vectorizing the inner loops of the gridding/degridding functions, I think I won't support simultaneous processing of multiple correlations after all.
The issue is that if we process, say, four correlations together, the amount of data touched in the innermost loops also becomes four times larger, and this can make the difference between very few and very many L1 cache misses.
It would also mean reserving memory for four complex arrays of size nu*nv instead of just one, which is probably not the best thing to do for really large dirty images.
So unless there are important reasons for switching, I propose to keep the current implementation.

(It's of course true that with separate processing we have to do the same kernel evaluations repeatedly, but I don't think this is an important factor any more.)
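To put a rough number on the memory point; the grid dimensions are illustrative, and the actual padding depends on the gridder's oversampling settings:

nu = nv = 16384              # padded (u, v) grid size, assumed for illustration
one_grid = nu * nv * 16      # complex128 = 16 bytes per cell
print(one_grid / 2**30)      # 4.0 -> ~4 GiB for a single correlation
print(4 * one_grid / 2**30)  # 16.0 -> ~16 GiB holding four grids at once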

@bennahugo
Collaborator

bennahugo commented Aug 4, 2020 via email

@mreineck

mreineck commented Aug 4, 2020

I see, thanks!
But to be honest, I think that all this trickery could also be done outside the gridder proper, and if possible I'd like to keep the code as unaware as possible of the details of the data it is processing. Currently the library does not know (and does not care) whether it is dealing with Stokes parameters, correlations or something completely different, which I think is a good thing.
