Fix lane offsets for AVX2 pack instructions #1442

Noratrieb · 2024-01-02T20:15:00Z

`fast_image_resize` yielded broken images; a bit of `println` bisecting revealed the SIMD instruction that was at fault. Some staring at the cg_clif impl and the Intel manual then revealed where the bug was. There is a lot of copy-pasting here, so I'm not surprised it's buggy ^^'.
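
For background on why lane offsets matter here: the 256-bit AVX2 pack instructions work on each 128-bit lane separately, so the output interleaves the halves of `a` and `b` rather than concatenating them. The following is a minimal scalar sketch of the documented behaviour of `_mm256_packus_epi16`. It is not the cg_clif code from this PR, and the function names are made up for illustration.

```rust
/// Saturate a signed 16-bit value to the unsigned 8-bit range [0, 255].
fn saturate_i16_to_u8(x: i16) -> u8 {
    x.clamp(0, 255) as u8
}

/// Scalar reference model of `_mm256_packus_epi16(a, b)`.
/// (Hypothetical helper name; only the lane layout is the point here.)
fn packus_epi16_256(a: [i16; 16], b: [i16; 16]) -> [u8; 32] {
    let mut out = [0u8; 32];
    // The instruction handles each 128-bit lane independently.
    for lane in 0..2 {
        for i in 0..8 {
            // The first 8 bytes of an output lane come from the same lane of `a`,
            out[lane * 16 + i] = saturate_i16_to_u8(a[lane * 8 + i]);
            // the next 8 bytes from the same lane of `b`.
            out[lane * 16 + 8 + i] = saturate_i16_to_u8(b[lane * 8 + i]);
        }
    }
    out
}
```

An implementation that writes all of `a` to the low 16 bytes and all of `b` to the high 16 bytes gets the 128-bit variant's layout right but scrambles the 256-bit one, which would be consistent with the broken images described above.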

Noratrieb · 2024-01-02T20:20:35Z

I'm not sure where tests for this are supposed to go. stdarch tests?

Noratrieb · 2024-01-02T20:33:06Z

The duplicated code for these packs does make me worry a bit. After going through the intrinsics guide, I also found some packs that weren't implemented yet. I think I'm going to restructure the code here so that the packs are neatly packed together, with all of `_mm{,256}_pack{s,us}_epi{16,32}` implemented.

bjorn3 · 2024-01-02T20:43:34Z

> I'm not sure where tests for this are supposed to go.

I've been copying stdarch tests into example/std_example.rs several times.

> There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.

Yeah, this code is horrible. I hope to some day generate it directly from the instruction manual or something like that. Or create a DSL that allows writing this kind of stuff with less code duplication (and maybe also allows it to be reused by miri and other tools).

bjorn3 · 2024-01-02T20:44:33Z

Thanks for the fix! Please ignore the test failure. That is rust-random/rand#1355.

Noratrieb · 2024-01-02T20:45:20Z

What's currently implemented vs what exists:

| | sse 16 | avx 16 | sse 32 | avx 32 |
|---|---|---|---|---|
| unsigned | _mm_packus_epi16\|llvm.x86.sse2.packuswb.128 ✅ | _mm256_packus_epi16\|llvm.x86.avx2.packuswb ✅ | _mm_packus_epi32\|llvm.x86.sse41.packusdw ✅ | _mm256_packus_epi32\|llvm.x86.avx2.packusdw |
| signed | _mm_packs_epi16\|llvm.x86.sse2.packsswb.128 | _mm256_packs_epi16\|llvm.x86.avx2.packsswb | _mm_packs_epi32\|llvm.x86.sse2.packssdw.128 ✅ | _mm256_packs_epi32\|llvm.x86.avx2.packssdw ✅ |

I'll clean it up a bit and implement all of those based on that, should be fairly little code. llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.

bjorn3 · 2024-01-02T20:57:18Z

> llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.

Smin is correct here afaict. The input is a signed 32-bit integer and we need to check that it fits in an unsigned 16-bit integer. Using umin would cause the input to be interpreted as an unsigned 32-bit integer. Although, because of the smax before it, I think it doesn't actually matter whether umin or smin is used.
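
The smin/umin argument above can be checked with a small scalar sketch. It assumes the sequence is "smax with 0, then min with 65535, then truncate", as the comment describes; the function names are hypothetical and this is plain Rust, not Cranelift IR.

```rust
/// smax(x, 0) followed by a signed min against 0xFFFF.
fn pack_with_smin(x: i32) -> u16 {
    let non_negative = x.max(0); // models smax(x, 0)
    non_negative.min(0xFFFF) as u16 // models smin(.., 65535), then truncation
}

/// smax(x, 0) followed by an unsigned min against 0xFFFF.
fn pack_with_umin(x: i32) -> u16 {
    let non_negative = x.max(0); // models smax(x, 0)
    // After the smax the sign bit is clear, so reinterpreting the bits
    // as u32 does not change the value.
    (non_negative as u32).min(0xFFFF) as u16
}

/// umin alone, without the preceding smax: negative inputs are read as
/// huge unsigned values and saturate to 65535 instead of 0.
fn pack_with_umin_only(x: i32) -> u16 {
    (x as u32).min(0xFFFF) as u16
}
```

With the smax in place both variants agree on every input, since a non-negative `i32` has the same value when read as `u32`. Only the variant without the smax goes wrong, mapping `-1` to `65535` rather than `0`.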

bjorn3 · 2024-01-02T20:59:57Z

In any case having a helper function for doing the saturating equivalent of ireduce as is done here would be nice to have. It can probably go in num.rs or cast.rs.
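
As a rough illustration of what such a helper computes, here is a scalar model in plain Rust. It is only a sketch: the name `saturating_reduce` and its signature are invented for this example, and a real helper in cg_clif would emit Cranelift instructions rather than operate on Rust integers.

```rust
/// Clamp `x` to `[min, max]` and then narrow it to `T`.
/// Hypothetical scalar model of a "saturating ireduce".
fn saturating_reduce<T>(x: i32, min: T, max: T) -> T
where
    T: Copy + Into<i32> + TryFrom<i32>,
{
    // Clamp in the wide type first (the smax/smin pair).
    let clamped = x.clamp(min.into(), max.into());
    // The narrowing cannot fail: `clamped` now lies within T's range.
    T::try_from(clamped)
        .ok()
        .expect("clamped value fits in the target type")
}
```

All four pack flavours then differ only in the bounds passed in: `saturating_reduce(x, i8::MIN, i8::MAX)` and `saturating_reduce(x, u8::MIN, u8::MAX)` for the 16-bit inputs (widened with `i32::from`), and the `i16` and `u16` bounds for the 32-bit inputs.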

Noratrieb · 2024-01-02T21:19:06Z

I created #1443 to restructure all the packed code.

Noratrieb · 2024-01-03T19:35:46Z

closing in favor of #1443

Noratrieb mentioned this pull request Jan 2, 2024

Restructure x86 signed pack instructions #1443

Merged

Noratrieb closed this Jan 3, 2024

Noratrieb deleted the fix-pack-lane-offsets branch January 3, 2024 19:35
