Skip to content

Fix in erroneous implementation of _mm256_bsrli_epi128 #1823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

satiscugcat
Copy link

@satiscugcat satiscugcat commented Jun 3, 2025

This fixes the error mentioned in issue #1822

@rustbot
Copy link
Collaborator

rustbot commented Jun 3, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@satiscugcat satiscugcat changed the title Fix in erroneous implementation of _mm256_bsrli_epi128, removal of redundant operation in _mm256_alignr_epi8 issue Fix in erroneous implementation of _mm256_bsrli_epi128, removal of redundant operation in _mm256_alignr_epi8 Jun 3, 2025
@@ -191,7 +191,7 @@ pub fn _mm256_alignr_epi8<const IMM8: i32>(a: __m256i, b: __m256i) -> __m256i {
return transmute(a);
}

let r: i8x32 = match IMM8 % 16 {
let r: i8x32 = match IMM8 {
Copy link
Member

@bjorn3 bjorn3 Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to also add a value to the default arm. It is currently unreachable_unchecked(), which makes IMM8 values > 15 UB.
Edit: Maybe you should change the IMM8 == 16 if above to be IMM8 >= 16? Also there is handling for IMM8 > 16 above already: (_mm256_setzero_si256(), a) which would become unreachable in that case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I was mistaken. I think specifically values between 17-31 (inclusive) are UB without the %, which makes it not redundant. My mistake!

@bjorn3
Copy link
Member

bjorn3 commented Jun 3, 2025

Did you try reverifying _mm256_alignr_epi8 with whatever tool you used to find the _mm256_bsrli_epi128 bug? And if so how did the unreachable_unchecked() bug slip past it?

@satiscugcat
Copy link
Author

Did you try reverifying _mm256_alignr_epi8 with whatever tool you used to find the _mm256_bsrli_epi128 bug? And if so how did the unreachable_unchecked() bug slip past it?

The _mm256_bsrli_epi128 bug was found manually upon noticing a discrepancy in the intel specifications and the rust implementations in that case, and then testing it as it is tested in issue #1822 . Thus the other bug slipped past me too. My bad!

@satiscugcat satiscugcat changed the title Fix in erroneous implementation of _mm256_bsrli_epi128, removal of redundant operation in _mm256_alignr_epi8 Fix in erroneous implementation of _mm256_bsrli_epi128 Jun 3, 2025
@sayantn
Copy link
Contributor

sayantn commented Jun 3, 2025

Could you check pls if this bug is also there for _mm256_bslli_epi128 and the _mm512 variants too. They might be affected as the impl is very similar

@satiscugcat
Copy link
Author

Could you check pls if this bug is also there for _mm256_bslli_epi128 and the _mm512 variants too. They might be affected as the impl is very similar

Sure! We'll work on testing the variants from tomorrow.

@sayantn
Copy link
Contributor

sayantn commented Jun 3, 2025

just checked, seems like the same bug is there for _mm512_bsrli_epi128, but not for any of the bslli variants (it is actually amusing how different the implementations are for bslli and bsrli, but anyway it seems like the bslli implementation is much more robust).

Actually, would you mind changing the impl of the bsrli variants to be like as the bslli variants? It would also fix this bug, and make it more consistent

@satiscugcat
Copy link
Author

just checked, seems like the same bug is there for _mm512_bsrli_epi128, but not for any of the bslli variants (it is actually amusing how different the implementations are for bslli and bsrli, but anyway it seems like the bslli implementation is much more robust).

Actually, would you mind changing the impl of the bsrli variants to be like as the bslli variants? It would also fix this bug, and make it more consistent

I've made the appropriate changes.

Copy link
Contributor

@sayantn sayantn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Could you also see if this refactoring is also possible for _mm{256,512}_alignr_epi8. You could see the impl for _mm_alignr_epi8 for some reference (sorry for this many requests 😅)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants