-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lane offsets for AVX2 pack instructions #1442
Conversation
`fast_image_resize` yielded broken images, a little bit of println bisecting revealed the SIMD instruction that was at fault. A bit of staring at the cg_clif impl and the Intel manual then revealed the place of the bug. There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.
I'm not sure where tests for this are supposed to go. |
The duplicated code for these packs does make me worry a bit. After going through the intrinsics guide, I also found some packs that weren't implemented yet. I think I'm going to restructure the code here so that the packs are neatly packed together, with all of |
I've been copying stdarch tests into example/std_example.rs several times.
Yeah, this code is horrible. I hope to some day generate it directly from the instruction manual or something like that. Or create a DSL that allows writing this kind of stuff with less code duplication (and maybe also allows it to be reused by miri and other tools). |
Thanks for the fix! Please ignore the test failure. That is rust-random/rand#1355. |
What's currently implemented vs what exists:
I'll clean it up a bit and implement all of those based on that, should be fairly little code. |
Smin is correct here afaict. The input is a signed 32bit integer and we need to check that it fits in an unsigned 16bit integer. Using umin would cause the input to be interpreted as unsigned 32bit integer. Although because of the smax before it, I think it does actually not matter at all if umin or smin is used. |
In any case having a helper function for doing the saturating equivalent of ireduce as is done here would be nice to have. It can probably go in num.rs or cast.rs. |
I created #1443 to restructure all the packed code. |
closing in favor of #1443 |
fast_image_resize
yielded broken images, a little bit of println bisecting revealed the SIMD instruction that was at fault. A bit of staring at the cg_clif impl and the Intel manual then revealed the place of the bug. There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.