Feature Request: `RoaringBitmap::from_lsb0_bytes` #288

lemolatoon · 2024-08-21T00:43:27Z

The Feature Explanation

Add a new constructor for RoaringBitmap:

 pub fn from_lsb0_bytes(offset: u32, bytes: &[u8]) -> RoaringBitmap

Function Behavior

Interprets bytes as a little-endian (LSB0) bitmap and constructs a RoaringBitmap from it.
offset shifts the index of every bit in the passed bitmap.
If offset is not aligned to the number of bits in a Container's Store::Bitmap (the number of bits in a Box<[u64; 1024]>, i.e. 65,536), this function panics.

use roaring::RoaringBitmap;

let bytes = [0b00000101, 0b00000010, 0b00000000, 0b10000000];
//             ^^^^^^^^    ^^^^^^^^    ^^^^^^^^    ^^^^^^^^
//             76543210          98
let rb = RoaringBitmap::from_lsb0_bytes(0, &bytes);
assert!(rb.contains(0));
assert!(!rb.contains(1));
assert!(rb.contains(2));
assert!(rb.contains(9));
assert!(rb.contains(31));

let rb = RoaringBitmap::from_lsb0_bytes(8, &bytes);
assert!(rb.contains(8));
assert!(!rb.contains(9));
assert!(rb.contains(10));
assert!(rb.contains(17));
assert!(rb.contains(39));
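To make the intended semantics concrete, here is a minimal reference model that uses only `std`. The helper name `lsb0_indices` is hypothetical and not part of the `roaring` crate; it simply lists the indices the proposed function would be expected to insert, under the assumption that bit 0 is the least significant bit of each byte.

```rust
/// Reference model of the proposed semantics (hypothetical helper, `std` only).
/// Returns, in ascending order, every index that corresponds to a set bit.
/// Note: `offset + ...` can overflow for very large offsets; this sketch ignores that.
fn lsb0_indices(offset: u32, bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    for (byte_idx, &byte) in bytes.iter().enumerate() {
        for bit in 0..8u32 {
            // LSB0 order: bit 0 is the least significant bit of the byte.
            if byte & (1 << bit) != 0 {
                out.push(offset + byte_idx as u32 * 8 + bit);
            }
        }
    }
    out
}
```

For the bytes in the example above, this model yields 0, 2, 9, 31 with offset 0 and 8, 10, 17, 39 with offset 8, matching the assertions.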

Motivation

Bitmaps are sometimes computed with SIMD instructions. The result of such an instruction is usually already a bitmask rather than a series of bit indices.

With the current implementation, to build a RoaringBitmap from a bitmask produced by SIMD instructions you have to use RoaringBitmap::from_sorted_iter or insert the values one by one.

To solve this problem, I implemented RoaringBitmap::from_bitmap_bytes, which constructs a RoaringBitmap directly from a bitmask.
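For comparison, the per-bit path that this proposal tries to avoid could look roughly like the sketch below. `mask_to_sorted_indices` is a hypothetical helper written with `std` only; its output is the kind of sorted sequence one would then hand to `RoaringBitmap::from_sorted_iter`.

```rust
/// Expands a 32-bit mask into the sorted indices of its set bits, shifted by `base`.
/// Hypothetical helper illustrating the "one index at a time" approach.
fn mask_to_sorted_indices(mask: u32, base: u32) -> Vec<u32> {
    let mut remaining = mask;
    let mut out = Vec::with_capacity(mask.count_ones() as usize);
    while remaining != 0 {
        // Position of the lowest set bit.
        let bit = remaining.trailing_zeros();
        out.push(base + bit);
        // Clear the lowest set bit and continue.
        remaining &= remaining - 1;
    }
    out
}
```

Every set bit costs one loop iteration and one push here, whereas a byte-oriented constructor can copy whole words at once.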

Example of Producing a Bitmask with SIMD Instructions

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40ddd13554c171be31fe53893401d40f

use std::arch::x86_64::*;

#[target_feature(enable = "avx2")]
unsafe fn compare_u8_avx2(a: &[u8], b: &[u8]) -> u32 {
    assert!(
        a.len() == 32 && b.len() == 32,
        "Inputs must have a length of 32."
    );

    // Load the data into 256-bit AVX2 registers
    let a_vec = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let b_vec = _mm256_loadu_si256(b.as_ptr() as *const __m256i);

    // Perform comparison (a == b)
    let cmp_result = _mm256_cmpeq_epi8(a_vec, b_vec);

    // Extract the comparison result as a bitmask
    let mask = _mm256_movemask_epi8(cmp_result);

    mask as u32
}

fn main() {
    let a: [u8; 32] = [
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
        26, 27, 28, 29, 30, 31, 32,
    ];
    let b: [u8; 32] = [
        1, 0, 3, 4, 5, 6, 7, 8, 0, 10, 11, 12, 13, 0, 15, 16, 0, 18, 19, 20, 21, 0, 23, 24, 0, 26,
        27, 0, 29, 0, 31, 32,
    ];

    let mask = unsafe {
        compare_u8_avx2(&a, &b)
    };
    println!("Bitmask: {:#034b}", mask);
    // Bitmask: 0b11010110110111101101111011111101
    print!("Bitmask (little endian u8): ");
    for b in mask.to_le_bytes() {
        print!("{:08b} ", b);
    }
    println!();
    // Bitmask (little endian u8): 11111101 11011110 11011110 11010110 
    
    let n = 2;
    println!("Bitmask at {n}: {}", mask & (1 << n) != 0);
    // Bitmask at 2: true
}
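On targets without AVX2 (the benchmark machine mentioned in this thread is an Apple M3, for instance), the same mask can be produced by a portable scalar loop. The function below is a hypothetical stand-in for `compare_u8_avx2`, not code from the pull request.

```rust
/// Portable scalar equivalent of the AVX2 comparison:
/// bit `i` of the result is set when `a[i] == b[i]`.
fn compare_u8_scalar(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    let mut mask = 0u32;
    for i in 0..32 {
        if a[i] == b[i] {
            mask |= 1 << i;
        }
    }
    mask
}
```

Calling `mask.to_le_bytes()` on the result gives a 4-byte LSB0 slice in the layout that `from_lsb0_bytes` is described as accepting.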

Benchmark Result

On my laptop (Apple M3 MacBook Air, macOS Sonoma 14.3, 16 GB memory), from_lsb0_bytes is much faster than from_sorted_iter in most cases. The benchmark IDs below still use the earlier name, from_bitmap_bytes.

Part of Results

creation/from_bitmap_bytes/census-income_srt                                                                             
                        time:   [984.25 µs 987.00 µs 990.37 µs]
                        thrpt:  [6.1521 Gelem/s 6.1731 Gelem/s 6.1904 Gelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
creation/from_sorted_iter/census-income_srt                                                                            
                        time:   [23.383 ms 23.397 ms 23.413 ms]
                        thrpt:  [260.24 Melem/s 260.41 Melem/s 260.57 Melem/s]
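As a quick reading aid for the two results above, the ratio of the median timings can be computed as follows. `speedup` is a hypothetical helper; the numbers come directly from the benchmark output (23.397 ms versus 987.00 µs).

```rust
/// Ratio of two timings expressed in the same unit (here: milliseconds).
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

With `speedup(23.397, 0.987)` the result is roughly 23.7, which agrees with the ratio of the reported throughputs (6.1731 Gelem/s versus 260.41 Melem/s) on this one dataset.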

Kerollmops

Hey @lemolatoon 👋

Thank you very much for these changes. The results look very good, indeed. However, could you:

- Write a better explanation of what offset means. I understand it, but it needs to be clearer. Maybe talk about the internal containers, which are aligned on groups of 64k integer values?
- Explain in plain text, in the function description, what kind of input is expected (endianness, size, alignment).
- Move this function and its test into the serialization module.
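The container alignment mentioned in the first point can be illustrated with a small sketch. Roaring bitmaps partition the 32-bit value space into chunks of 2^16 values; the helper names below are hypothetical and only model that split.

```rust
/// Splits a value into (container key, index inside the container).
/// The high 16 bits select the container, the low 16 bits the position in it.
fn split(value: u32) -> (u16, u16) {
    ((value >> 16) as u16, value as u16)
}

/// True when `offset` falls exactly on a container boundary (a multiple of 65,536).
fn is_container_aligned(offset: u32) -> bool {
    offset % (1 << 16) == 0
}
```

For example, `split(65_541)` gives `(1, 5)`: the value lives in the second container at position 5. An offset of 131,072 is container-aligned, while an offset of 8 is not.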

Thank you very much for the work!

lemolatoon · 2024-08-21T17:33:04Z

Hi @Kerollmops 👋

Thank you for your quick reply. I have just made changes based on your review.

Basically, I:

- Moved RoaringBitmap::from_bitmap_bytes and its tests to the serialization module.
- Added detailed documentation to RoaringBitmap::from_bitmap_bytes, including explanations of offset and bytes.
- Relaxed the alignment requirement for offset.
  - Thanks to #[inline], I believe the compiler can easily optimize the case where offset is actually aligned to 8, or even to 64Ki.

roaring/src/bitmap/serialization.rs

lemolatoon · 2024-09-05T23:56:14Z

@Dr-Emann I've just fixed the documentation and made the implementation endian-aware.
You can test on a big-endian system by running `cargo +nightly miri test --target s390x-unknown-linux-gnu --package roaring --lib -- bitmap::serialization::test::test_from_bitmap_bytes`.
I also added this big-endian test to CI.
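One common way to make this kind of code independent of host endianness is to build each 64-bit word with `u64::from_le_bytes` instead of reinterpreting memory. The sketch below shows that idea in isolation; `lsb0_bytes_to_words` is a hypothetical helper and not necessarily how the pull request implements it.

```rust
/// Packs LSB0 bytes into u64 words so that input bit `i` becomes
/// bit `i % 64` of word `i / 64`, regardless of host endianness.
fn lsb0_bytes_to_words(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            // Zero-pad a short final chunk.
            buf[..chunk.len()].copy_from_slice(chunk);
            u64::from_le_bytes(buf)
        })
        .collect()
}
```

Because `from_le_bytes` fixes the byte order explicitly, the same input produces the same words on both little-endian and big-endian hosts such as s390x.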

lemolatoon · 2024-09-12T04:22:27Z

I have just merged the patch from @Dr-Emann (thank you so much). If the merge commit aace6b8 is unnecessary, I'll remove it with a force push.

roaring/src/bitmap/serialization.rs

lemolatoon · 2024-10-01T04:52:33Z

I applied cargo fmt and renamed from_bitmap_bytes to from_lsb0_bytes.

@Kerollmops Could you review this pull request so that I can see the CI result?

Kerollmops · 2024-10-09T08:44:26Z

Hey @lemolatoon 👋

Sorry for the very late reply. I just approved your changes so that you can see the CI.

lemolatoon · 2024-10-09T10:28:48Z

Thank you @Kerollmops !
CI seems to be failing because of a change in cargo clippy on the nightly channel. I think this should be fixed in a separate pull request.

Dr-Emann · 2024-10-16T05:11:57Z

@lemolatoon The CI errors should be fixed by pulling in main, thanks to #293.

lemolatoon · 2024-10-19T04:31:14Z

I rebased onto the main branch with git rebase. It seems that @Kerollmops' approval is required to re-run the CI 🙏

lemolatoon · 2024-10-22T06:37:16Z

It seems the clippy warnings appear only on 1.66.0, not on stable. I fixed the warnings. Could you re-run the CI, please? @Kerollmops

Kerollmops · 2024-10-30T08:58:54Z

Hey @lemolatoon 👋 Would you mind rebasing on main, please? I just merged #295. Have a nice one 🐿️

* Directly create an array/bitmap store based on the count of bits
* Use u64 words to count bits and enumerate bits
* Store in little endian, then just fixup any touched words
* Only use unsafe to reinterpret a whole bitmap container as bytes, everything else is safe
* Allow adding bytes up to the last possible offset
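The first two points of this commit description can be sketched as follows. Everything here is illustrative: `StoreKind` is a hypothetical stand-in for the crate's internal store types, and `ARRAY_LIMIT = 4096` is an assumption based on the usual Roaring convention for array containers, not something stated in this thread.

```rust
/// Hypothetical stand-in for the crate's internal store kinds.
#[derive(Debug, PartialEq)]
enum StoreKind {
    Array,
    Bitmap,
}

/// Assumed cutoff between array and bitmap containers.
const ARRAY_LIMIT: u64 = 4096;

/// Counts set bits one u64 word at a time.
fn count_bits(words: &[u64]) -> u64 {
    words.iter().map(|w| u64::from(w.count_ones())).sum()
}

/// Picks the store kind from the bit count before any values are materialized.
fn choose_store(words: &[u64]) -> StoreKind {
    if count_bits(words) <= ARRAY_LIMIT {
        StoreKind::Array
    } else {
        StoreKind::Bitmap
    }
}

/// Enumerates set bit positions word by word.
/// Assumes at most 1024 words (one container), so positions fit in a u16.
fn enumerate_bits(words: &[u64]) -> Vec<u16> {
    let mut out = Vec::new();
    for (i, &word) in words.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            out.push((i as u32 * 64 + w.trailing_zeros()) as u16);
            w &= w - 1; // clear the lowest set bit
        }
    }
    out
}
```

Counting first means a sparse input only pays for the per-bit enumeration, while a dense input can keep its words as a bitmap without being expanded.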


lemolatoon · 2024-10-31T07:50:04Z

@Kerollmops
Sorry for the late fix. I have just pushed two changes:

- Rebased onto upstream/main
- Fixed the CI error related to miri

Sorry to bother you again, but I'd like to see the CI result once more 🙏

lemolatoon · 2024-11-01T05:02:05Z

I accidentally dropped a commit fixing a cargo clippy warning during the last rebase. I fixed the warning again and pushed.

Kerollmops · 2024-11-01T12:37:36Z

roaring/src/bitmap/serialization.rs

+    pub fn from_lsb0_bytes(offset: u32, mut bytes: &[u8]) -> RoaringBitmap {
+        assert_eq!(offset % 8, 0, "offset must be a multiple of 8");


I am wondering whether we could switch this offset from bits to bytes, since its precision can only be expressed 8 bits at a time. That way we avoid this assert and reduce the possible errors.

If I understand correctly, the offset is a helper to shift the bitmap's numbers by 8 at a time? Why must it be a multiple of 8? Do we want to keep this? Why can't users just shift the numbers themselves?

And if we want to keep it, can we make it clearer (like the description I gave above)? Your example helped me understand the high-level intent of this parameter.

lemolatoon · 2024-11-07

Thank you for your review, and sorry for the late response.

The offset must be a multiple of 8 because, if the function accepted other offsets, copying would not be so easy. For an Array container the implementation would be almost the same, but for a Bitmap store we would need bit shifting for every bitmap byte copied.

We can switch the implementation based on the offset. In terms of performance, users who pass an offset that is a multiple of 8 are unlikely to be affected by this API change, while the performance for other offsets is unknown.

I could allow any offset for this method with a little additional implementation and a documentation note on performance.

Kerollmops · 2024-11-11

Thank you, @lemolatoon, for the explanation. It would be great if we could accept more than just multiples of 8, indeed.
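The extra bit shifting discussed in this thread can be sketched as below. This is one possible approach under stated assumptions, not the implementation from the pull request: an arbitrary offset is split into a multiple-of-8 part plus a residual shift of 0 to 7 bits, and the bytes are shifted by that residual before being handed to the byte-aligned path. Both helper names are hypothetical.

```rust
/// Splits an arbitrary bit offset into (multiple-of-8 part, residual shift in 0..8).
fn split_offset(offset: u32) -> (u32, u32) {
    (offset & !7, offset % 8)
}

/// Shifts an LSB0 byte string toward higher bit positions by `shift` bits,
/// returning one extra byte to hold the bits that spill over the end.
fn shift_lsb0_bytes(bytes: &[u8], shift: u32) -> Vec<u8> {
    assert!(shift < 8, "shift must be in 0..8");
    let mut out = vec![0u8; bytes.len() + 1];
    for (i, &byte) in bytes.iter().enumerate() {
        // Widen to u16 so bits pushed past bit 7 are kept rather than lost.
        let wide = u16::from(byte) << shift;
        out[i] |= wide as u8; // low part stays in this byte
        out[i + 1] |= (wide >> 8) as u8; // carry spills into the next byte
    }
    out
}
```

With an offset of 11, for instance, `split_offset` returns `(8, 3)`, so the bytes would be shifted by 3 bits and then inserted at the byte-aligned offset 8. The cost is one extra pass over the input, which is the performance trade-off raised above.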

Kerollmops requested changes Aug 21, 2024


lemolatoon force-pushed the from-bitmap-bytes branch from 83af352 to adbc00b Compare August 21, 2024 17:28

lemolatoon force-pushed the from-bitmap-bytes branch from adbc00b to af5ea01 Compare August 21, 2024 17:54

lemolatoon requested a review from Kerollmops August 21, 2024 17:55

lemolatoon force-pushed the from-bitmap-bytes branch from af5ea01 to 03a61ad Compare August 22, 2024 03:05

Dr-Emann requested changes Aug 22, 2024


Dr-Emann mentioned this pull request Sep 7, 2024

Speed up from_bitmap_bytes, and use less unsafe lemolatoon/roaring-rs#1

Merged

Dr-Emann approved these changes Sep 14, 2024


Dr-Emann approved these changes Oct 1, 2024


lemolatoon changed the title ~~Feature Request: RoaringBitmap::from_bitmap_bytes~~ Feature Request: RoaringBitmap::from_lsb0_bytes Oct 2, 2024

Dr-Emann mentioned this pull request Oct 10, 2024

Fix warnings when testing with nightly #293

Merged

lemolatoon force-pushed the from-bitmap-bytes branch from 2706bd9 to ac672a5 Compare October 19, 2024 04:29

lemolatoon and others added 8 commits October 31, 2024 16:36

Impl RoaringBitmap::from_bitmap_bytes

32b4d60

Add benchmark for from_bitmap_bytes

9834c01

Handle big endian system in from_bitmap_bytes

4c4a2dd

Add big endian test to CI

b15c11b

Update description of bit order in from_bitmap_bytes

0758f8a

add special case for creating a full bitmap container

09e9d6d

we can setting an initial value in that case

Apply cargo fmt

f0dbb72

Rename from_bitmap_bytes to from_lsb0_bytes

915e582

lemolatoon force-pushed the from-bitmap-bytes branch from e72b0fc to 915e582 Compare October 31, 2024 07:36

Fix CI error

53ec34b

Fix cargo clippy warning

2a9a0c2

Kerollmops reviewed Nov 1, 2024


Kerollmops Nov 1, 2024

lemolatoon Nov 7, 2024

Kerollmops Nov 11, 2024
