
Inconsistent BF.ADD: modify the implementation of bloom filters to achieve similar results as Redis #1287

Open
shashi-sah2003 opened this issue Nov 15, 2024 · 11 comments

@shashi-sah2003 (Contributor)

Steps to reproduce

  1. bf.add bf1 item1
  2. bf.info bf1

Expected output

The expected output when the above commands are run on Redis:

[image: BF.INFO output on Redis]

Observed output

The observed output when the above commands are run on DiceDB:

[image: BF.INFO output on DiceDB]

The steps to run the test cases are mentioned in the README of the dice-tests repository.

Expectations for resolution

This issue will be considered resolved when the following are done:

  1. Changes in the dice code to meet the expected behavior
  2. A successful run of the TCL tests

You can find the tests under the tests directory of the dice repository, and the steps to run them are in the README file. Refer to the following links to set up DiceDB and Redis 7.2.5 locally.

@shashi-sah2003 (Contributor Author)

@JyotinderSingh @apoorvyadav1111 I am going to work on this, please assign. Thanks!

@shashi-sah2003 (Contributor Author) commented Nov 15, 2024

[images: bloom.go default configuration screenshots]

@JyotinderSingh @apoorvyadav1111 from the above you can see that in bloom.go the capacity has been set to 1024 and the number of filters has been set explicitly to the number of hash functions. I believe the Redis implementation uses only one filter, but in the DiceDB implementation the number of filters is decided according to the number of bits per element. Should I go ahead and change this configuration to use only one filter with a capacity of 100?
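For context, here is a minimal sketch of the standard bloom-filter sizing math that relates capacity and target error rate to the bit-array size and number of hash functions. This is illustrative only, not the actual bloom.go; the constant and function names are made up:

```go
// Illustrative sketch of standard bloom-filter sizing, not the actual bloom.go.
package bloomsizing

import "math"

const (
	defaultCapacity  = 1024 // DiceDB default discussed above
	defaultErrorRate = 0.01
)

// optimalBits returns the bit-array size m for n items at error rate p:
// m = -n * ln(p) / (ln 2)^2
func optimalBits(n int, p float64) int {
	return int(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

// optimalHashes returns the number of hash functions k = (m/n) * ln 2.
func optimalHashes(m, n int) int {
	return int(math.Round(float64(m) / float64(n) * math.Ln2))
}
```

With capacity 1024 and errorRate 0.01 this works out to roughly 9,800 bits (~9.6 bits per element) and 7 hash functions.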

@apoorvyadav1111 (Contributor)

@shashi-sah2003, yes, please go ahead. Keep in mind that while we aim to be Redis-compliant, finer values can differ based on the implementation of the data structure. That said, bloom filters do need an upgrade, especially around expansion (currently it is fixed). Thank you for working on this, and let's stay connected on Discord for this issue.
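On the expansion point, a purely hypothetical sketch of what a scaling (expanding) bloom filter could look like: a stack of sub-filters where each new one grows by an expansion factor. All names here are illustrative, not DiceDB's planned design:

```go
// Hypothetical sketch of a scalable bloom filter: a stack of sub-filters,
// each new one larger by an expansion factor. Not DiceDB's actual design.
package bloomsketch

import "hash/fnv"

type subFilter struct {
	bits     []bool
	numHash  int
	capacity int
	items    int
}

func newSubFilter(capacity, numHash int) *subFilter {
	// ~10 bits per element corresponds to roughly a 1% error rate.
	return &subFilter{bits: make([]bool, capacity*10), numHash: numHash, capacity: capacity}
}

// positions derives numHash bit positions via double hashing: (h1 + i*h2) mod m.
func (f *subFilter) positions(item string) []int {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1 // force h2 odd so it is never zero
	out := make([]int, f.numHash)
	for i := 0; i < f.numHash; i++ {
		out[i] = int((h1 + uint64(i)*h2) % uint64(len(f.bits)))
	}
	return out
}

func (f *subFilter) add(item string) {
	for _, p := range f.positions(item) {
		f.bits[p] = true
	}
	f.items++
}

func (f *subFilter) mightContain(item string) bool {
	for _, p := range f.positions(item) {
		if !f.bits[p] {
			return false
		}
	}
	return true
}

// ScalableBloom keeps a list of sub-filters; a "Number of filters" field in
// BF.INFO would correspond to len(filters) in a layout like this.
type ScalableBloom struct {
	filters   []*subFilter
	expansion int
}

func NewScalableBloom(capacity, numHash, expansion int) *ScalableBloom {
	return &ScalableBloom{
		filters:   []*subFilter{newSubFilter(capacity, numHash)},
		expansion: expansion,
	}
}

// Add writes to the newest sub-filter, creating a larger one when it fills up.
func (s *ScalableBloom) Add(item string) {
	cur := s.filters[len(s.filters)-1]
	if cur.items >= cur.capacity {
		cur = newSubFilter(cur.capacity*s.expansion, cur.numHash)
		s.filters = append(s.filters, cur)
	}
	cur.add(item)
}

// Exists reports possible membership if any sub-filter matches.
func (s *ScalableBloom) Exists(item string) bool {
	for _, f := range s.filters {
		if f.mightContain(item) {
			return true
		}
	}
	return false
}
```

One consequence of this layout is that Exists has to consult every sub-filter, so each expansion slightly increases the overall false-positive rate unless the per-filter rate is tightened.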

@shashi-sah2003 (Contributor Author)

Hey @JyotinderSingh @apoorvyadav1111, reducing the number of filters will lead to an increase in the false positive rate, which we don't want, right? But I wonder why Redis uses only one filter?

@JyotinderSingh (Collaborator)


We would need to investigate why they went with it. I'm not aware of the reasoning for that.

@shashi-sah2003 (Contributor Author)


@JyotinderSingh @apoorvyadav1111 I have done some more research on my side and found that Redis has a capacity of 100, so only one filter is sufficient for its purposes, while in the DiceDB implementation the capacity is 1024, which requires more filters (to reduce the false positive rate); that number is calculated using the errorRate.

@JyotinderSingh (Collaborator)


In that case we can adjust our filters accordingly

@shashi-sah2003 (Contributor Author)


Then how should I go about this? Adjust the capacity and filters to match Redis?

@JyotinderSingh (Collaborator)


We don't need to match Redis output in this case.

@shashi-sah2003 (Contributor Author)


Yeah, also in the codebase the optimized formula is used to calculate the number of filters, which is derived from the errorRate; for now it has been set to 0.01.
We could also reduce the errorRate to 0.001, which gives a lower false positive rate but requires more space and more filters. What's our aim for now: a lower false positive rate, or less space?
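For rough numbers, back-of-the-envelope with the standard formulas (bits per element = -ln p / (ln 2)^2, hash functions k = -log2 p): p = 0.01 needs about 9.6 bits per element and 7 hash functions, while p = 0.001 needs about 14.4 bits per element and 10 hash functions, i.e. roughly 50% more space for the tighter error rate.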

@JyotinderSingh (Collaborator)


That could be a separate discussion; it would depend on a lot of factors which are unclear for now.
