Skip to content
This repository has been archived by the owner on May 23, 2024. It is now read-only.

Feature/permute rebase13 #18

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

bjoo
Copy link
Contributor

@bjoo bjoo commented May 4, 2020

Hi All, Here are the additions for vector lane permute. Intrinsics for AVX (Single Prec), AVX512 (single prec and double prec), Generic for other CPUs, __shfl_sync for CUDA, __shfl for HIP.

I also came accross an overflow possibility with the CUDA masks when N=32 (got compiler warning) so I made that go away.

A couple of FIXME's still left in there as well as a funny for AVX512 where the permutexvar intrinsic looks at every second element of the permutation index table register, and intevening elements are zeroed out, hence the funny 8 element constructor for the simd<int,> type. Maybe worth renaming these to something like simd_permute_control_t in the future.

This was rebased onto master after #13 was merged.

The testing harness is at: https://github.com/bjoo/simd-math-testing.git and shows how the permutes are being called.

Balint Joo and others added 3 commits May 4, 2020 12:46
Guarded permute in simd_common.h as it won't work for GPU architectures
 FIXME: need better guards for Clang + GPU targets

Fixed: any_of all_of in HIP backend
Fixed: potential mask overflow in cuda_warp
@crtrott
Copy link
Member

crtrott commented May 4, 2020

I am wondering about the constructors with a subset of elements. Since this is specifically for permute vectors, maybe a non-member function to create a permute vector would be better? I would actually prefer an interface where the permute would take a permute integer simd type of the same length as the simd value type, but I take it from you that that is too expansive since you have to convert it internally?

We need to think about what kind of interface would be acceptable to the C++ standard.

@bjoo
Copy link
Contributor Author

bjoo commented May 4, 2020 via email

Balint Joo and others added 2 commits May 4, 2020 17:31
…X512

Removed weird simd<int,> initialization for AVX512 -- essentially
the zeros get put into a simd_t::storage_type by make_permute now.
@bjoo
Copy link
Contributor Author

bjoo commented May 4, 2020

Hi, as suggested by @crtrott I have modified the interface by adding make_permute.
I have also updated the test codes to see how it could be used.

Actually another thing, now that this is kinda traits-y so we have

simd::simd_utils< simd<T,Abi>::make_permute(int mask[ simd<T,Abi>::size() ] )

there is nothing in principle to make the 'int mask' a compile time template parameter i.e:

simd::simd_utils<simd<T,Abi>::template make_permute<from0,from1,....fromN>(void)

where from0 is what would otherwise be mask[0],
from1 is what would otherwise be mask[1]
etc.

however for warp-sizes of 64, that could get unwieldy.

NB: SYCL has its own way of doing shuffles with its vec<T,N> type which is templatized like this, but i) is not a warp parallel permute, and ii) currently has a maximum size of 16. There the shuffle operator returns a shuffled_vec<T,N> type and the shuffle is not actually carried out until this is assigned to another vec<T,N>. That approach could also be a way to go towards standardization too, tho what I have now is simple, and relatively portable. For SSE it would be annoying to implement the shuffle, as one would need to use _mm_shuffle_ps/pd where the masks are immediates :( So probably we'd need to do something gross like 'make_permute'
returns the immediate, and then the permute would need some horrible switch statement with something like

switch( control_value ) {
case 0 : _mm_shuffle_ps(reg1, reg1, 0x0; break;
case 1 : _mm_shuffle_ps(reg1, reg1, 0x01; break;
...
};

This gets pretty tedious to write. Ditto for AVX2 Double Prec shuffles
(_mm256_permutevar_pd is in AVX512 :( )

Let me know if you need anything else added to this.

@ibaned
Copy link
Contributor

ibaned commented Aug 13, 2020

@crtrott @bjoo sorry for not seeing this earlier. @crtrott did @bjoo sufficiently address your feedback? If so I can look at merging.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants