Select vs. ranked for succinct counting blocked bloom filter #28

fredpflintstone · 2021-09-23T15:45:28Z

What is the difference between the select version and the ranked version? Thanks.

thomasmueller · 2021-09-25T08:06:37Z

Yes, documentation is needed... I'm currently away, and won't have time this week sorry. But next week should be ok.

thomasmueller · 2021-10-05T15:20:42Z

So, the "succinct counting blocked Bloom filter", as the name implies, is a blocked Bloom filter, and at the same time a counting Bloom filter. The data structure contains 3 areas:

First area is identical to the blocked Bloom filter (which makes lookup identical to blocked Bloom filter).
Second area succinctly counts which bits in the first area are set how many times (counting part).
Third area is overflow area, in case counts overflow somewhere.

There are two ways to succinctly count how often the bits in the first area were set: using "rank", and using "select".

Using "select" may be a bit easier to explain. Say, in the Bloom filter area, we have a block of 64 bits as follows:

00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001

So only bit 0 and bit 8 are set. The question is, how often (the count). This is stored in a block of 64 bits in the second area. Only bits that are set in the first area need to be counted, so we only need 2 counters. Those are encoded in the number of zeros between two bit that are set. E.g. for counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001001

If we increment the count for bit 0 from 1 to 2, we shift all the bits to the left, and get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010010

If we decrement the count for bit 0 from 1 to 0, we reset the low bit and shift, so we get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000100

This was for "select". (I called it "select" because the count is read using the "select" operation).

"Rank" is similar to a https://en.wikipedia.org/wiki/Hash_array_mapped_trie. For "rank", we count the number of bits that are set in the Bloom filter area (in this case it's 2). So the lowest 2 bits are used to say whether the count is higher than 1. For those that are set, we need one more bit in this area, to the left, until there are none left. For counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000110

If we increment the count for bit 0 from 1 to 2, we set the last bit and shift, to get:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001011

Both "rank" and "select" use roughly the same space (I think rank uses a bit less on average).

The overflow area is needed if a block in the counter area grows too large, in which case a pointer to the overflow area is stored, and the highest bit is set.

Well I see it is very, very hard to explain, and understand. I'm afraid I need to give up here... In order to explain this, more time, and more examples are needed...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Select vs. ranked for succinct counting blocked bloom filter #28

Select vs. ranked for succinct counting blocked bloom filter #28

fredpflintstone commented Sep 23, 2021

thomasmueller commented Sep 25, 2021

thomasmueller commented Oct 5, 2021 •

edited

Loading

Select vs. ranked for succinct counting blocked bloom filter #28

Select vs. ranked for succinct counting blocked bloom filter #28

Comments

fredpflintstone commented Sep 23, 2021

thomasmueller commented Sep 25, 2021

thomasmueller commented Oct 5, 2021 • edited Loading

thomasmueller commented Oct 5, 2021 •

edited

Loading