Optimize comparison (less-than) with AVX2 #302

Open
chfast opened this issue Feb 5, 2024 · 1 comment

Comments

@chfast
Owner

chfast commented Feb 5, 2024

https://godbolt.org/z/xEcGzqKo9

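#include <bit>          // std::bit_cast (C++20)
#include <cstdint>
#include <immintrin.h>  // AVX2 intrinsics; build with -mavx2

// Assumed definition: four 64-bit words, w[0] least significant.
struct u256 { std::uint64_t w[4]; };

// Index of the highest set bit (x86 BSR); m must be nonzero.
unsigned bsr(unsigned m)
{
    return 31 - __builtin_clz(m);  // GCC/Clang builtin, undefined for m == 0
}

auto lt_avx(const u256& x, const u256& y)
{
    auto xv = std::bit_cast<__m256i>(x);
    auto yv = std::bit_cast<__m256i>(y);
    auto e = _mm256_cmpeq_epi64(xv, yv);   // all-ones lanes where words are equal
    auto ed = std::bit_cast<__m256d>(e);
    unsigned m = _mm256_movemask_pd(ed);  // bit i set iff x.w[i] == y.w[i]
    auto f = m ^ 0xf;  // flip mask (4 bits): bit i set iff the words differ
    auto g = f | 1;    // fixup eq: if all words are equal, use word 0
    auto i = bsr(g);   // most significant differing word
    return x.w[i] < y.w[i];
}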
@chfast
Owner Author

chfast commented Feb 5, 2024

New idea: https://godbolt.org/z/Ge7TanY3M

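auto lt_avx_v2_8(const u256& x, const u256& y)
{
    // _mm256_cmpgt_epi8 compares signed bytes; flipping each byte's top bit
    // makes the signed comparison order bytes as unsigned values.
    const auto bias = _mm256_set1_epi8(static_cast<char>(0x80));
    auto xv = _mm256_xor_si256(std::bit_cast<__m256i>(x), bias);
    auto yv = _mm256_xor_si256(std::bit_cast<__m256i>(y), bias);
    auto gtv = _mm256_cmpgt_epi8(xv, yv);  // byte i: x > y
    auto ltv = _mm256_cmpgt_epi8(yv, xv);  // byte i: x < y
    unsigned gt = _mm256_movemask_epi8(gtv);  // bit i = byte i; bit 31 = top byte
    unsigned lt = _mm256_movemask_epi8(ltv);
    // Each bit is set in at most one mask, so the larger mask holds the
    // most significant differing byte.
    return lt > gt;
}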
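Why `lt > gt` decides the comparison can be checked with a portable scalar model of the two movemask results. Bit i of each mask corresponds to byte i, with byte 31 the most significant. At every position at most one of the masks has its bit set, so whichever mask contains the highest set bit is the larger integer, and that bit belongs to the most significant differing byte. The model compares bytes as unsigned values. Note that `_mm256_cmpgt_epi8` is a signed byte comparison, so an AVX2 version needs some adjustment (for example flipping each byte's top bit first) to match. The `u256` layout and the names `byte_at` and `lt_bytes_model` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout (assumption): w[0] is the least significant 64-bit word.
struct u256 { std::uint64_t w[4]; };

// Byte i in little-endian order (byte 0 least significant, byte 31 most).
static unsigned byte_at(const u256& v, unsigned i)
{
    return static_cast<unsigned>(v.w[i / 8] >> (8 * (i % 8))) & 0xffu;
}

// Scalar model of the byte-wise gt/lt masks that movemask_epi8 would produce.
bool lt_bytes_model(const u256& x, const u256& y)
{
    std::uint32_t gt = 0, lt = 0;
    for (unsigned i = 0; i < 32; ++i)
    {
        unsigned a = byte_at(x, i), b = byte_at(y, i);  // unsigned byte values
        gt |= static_cast<std::uint32_t>(a > b) << i;
        lt |= static_cast<std::uint32_t>(a < b) << i;
    }
    return lt > gt;  // the highest set bit marks the most significant differing byte
}
```

For example, with x = 0x80 and y = 0x01 (all other bytes zero) the model returns `false`. A signed byte comparison would instead see -128 < 1 and wrongly report x < y.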
