-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tunable error rate #19
Comments
I assume you mean false positive rate, right? I know some libraries provide functions to e.g. calculate the bits per entry for a false positive rate, or the false positive rate for a given bits per entry, or the size of the filter (memory used) from rate and entry count, and so on. Yes I agree it would be a good addition! |
Sorry, yes, I meant false positive rate. |
In particular, I'm interested in the case of setting a fixed size and letting the positive rate fluctuate as the keys are inserted. |
A xor filter, unlike a Bloom filter, is built once all keys are present. A Bloom filter allows to add entries afterwards, but a xor filter does not. So what you are looking for might not be possible I'm afraid... |
Yeah to clarify, I indeed meant all at once, such that you set the size and let the error rate suffer. |
Ah ok! Yes that works then.
The implementation here (the current implementation) only supports 8 bit fingerprints (uint8). It doesn't support smaller or larger fingerprints right now. So I'm afraid this doesn't currently work... It's possible to add more implementations, but we believe it's not currently worth it... Of course feel free to provide a patch!
No I'm afraid that's not a solution... What you could change a tiny bit is this line: capacity := 32 + uint32(math.Ceil(1.23*float64(size))) and, for example, try to use 1.229 instead of 1.23. It would work sometimes. But probability to find a seed reduces drastically, and very quickly... It's not something I would try. With a regular Bloom filter, yes you can just make it smaller and it will still "work" (just with a bad false positive rate), but not a xor filter. |
I implemented xor16 and I think I got it generalized also to xorN (for [9..32] bits). |
@brianolson Your generic implementation seems to set the memory usage to 32-bit per entry. I understand that you can write only the least significant N bits of each word to disk or to the network, and then upsample when deserializing... but it comes at a cost. |
It would be great to have a way to change/tune the false negative rate.
The text was updated successfully, but these errors were encountered: