4 billion records max? #38
Unfortunately, just changing the constant to a 64-bit value isn't enough. Even storing a billion keys with a 32-bit hash function is not great: the closer you get to 4 billion, the more hash collisions you'll see. For now, I would recommend sharding the database - running multiple databases. Can you tell me more about how you use Pogreb? What is your typical access pattern? Is it write-heavy? What is your average key and value size?
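A minimal sketch of the sharding idea suggested above: route each key to one of N independent databases by hashing it. The map-backed `shards` slice here is a stand-in for N separately opened Pogreb databases (it is not Pogreb's API), and `shardFor` is a hypothetical helper using FNV-1a from the Go standard library:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor routes a key to one of n shards using an FNV-1a hash.
// The same function must be used for both writes and reads so a
// key always lands in the same shard.
func shardFor(key []byte, n uint32) uint32 {
	h := fnv.New32a()
	h.Write(key)
	return h.Sum32() % n
}

func main() {
	const numShards = 4
	// Stand-in for numShards independent databases, e.g. opened
	// from directories db-0 ... db-3.
	shards := make([]map[string][]byte, numShards)
	for i := range shards {
		shards[i] = map[string][]byte{}
	}

	key, value := []byte("record-hash-1"), []byte("v1")
	shards[shardFor(key, numShards)][string(key)] = value

	// Reads go through the same routing, so each shard stays
	// well under the 4 billion key limit.
	got := shards[shardFor(key, numShards)][string(key)]
	fmt.Println(string(got)) // prints v1
}
```

With 4 shards the per-database ceiling effectively becomes ~16 billion records, at the cost of managing several database directories.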
Apologies for the delay. The use case is https://intelx.io storing hashes of all of our records in a key-value database, which helps with some internal caching operations. The plan is to update the key-value store every 24 hours, so it would be "write-heavy once", then read-heavy. We are still running into the other troubles (the weird disk errors coming from NTFS), but those I can handle/fix myself. For now I have shut down the key-value store, as we are dangerously close to 4 billion records and I'm afraid of hash collisions and false-positive lookups.
Thanks for the details! While the database will get slower as it approaches 4 billion keys, that won't impact correctness, so you don't need to worry about false positives. After a hash lookup, Pogreb compares the key to the data in the WAL, so false positives are impossible.
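A short sketch of why the key comparison described above rules out false positives. The `entry` struct and `get` function here are hypothetical illustrations, not Pogreb's internals: the lookup first matches on the 32-bit hash, then confirms the full key, so a colliding key only costs an extra comparison and never returns the wrong value:

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// entry is a hypothetical slot: the 32-bit hash plus the full key
// and value as they would be recorded in the WAL.
type entry struct {
	hash  uint32
	key   []byte
	value []byte
}

func hash32(key []byte) uint32 {
	h := fnv.New32a()
	h.Write(key)
	return h.Sum32()
}

// get scans entries whose hash matches, then confirms the full key.
// A colliding key fails the bytes.Equal check, so collisions make
// lookups slower but never incorrect.
func get(entries []entry, key []byte) ([]byte, bool) {
	h := hash32(key)
	for _, e := range entries {
		if e.hash == h && bytes.Equal(e.key, key) {
			return e.value, true
		}
	}
	return nil, false
}

func main() {
	entries := []entry{{hash32([]byte("a")), []byte("a"), []byte("1")}}
	v, ok := get(entries, []byte("a"))
	fmt.Println(string(v), ok) // prints: 1 true
	_, ok = get(entries, []byte("missing"))
	fmt.Println(ok) // prints: false
}
```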
You can close all the issues that I opened. We stopped using Pogreb earlier this year when all those issues appeared. The plan was to keep Pogreb running in parallel and switch over once the issues had been solved, but since this hasn't happened, I have decided to switch to a different key-value database.
@Kleissner just curious, what are you using now?
Yes, the 4B record limit is a deal breaker for me as well. I was hoping to use this instead of Bolt, but now I cannot. Any chance of changing this? It is a real limit for people with a large number of items to manage.
@derkan we have tried a few alternatives. We fell back to continuing with Bitcask, but have half abandoned the internal project altogether, since no suitable key-value database was found. Each new run takes a few weeks to rebuild the key-value database (since we have billions of records) and is therefore resource- and time-intensive.
@Kleissner have you checked etcd-io/bbolt? It's a fork of Bolt.
Take a look at PebbleDB. Ethereum's Geth uses it as its blockchain storage.
I just realized that `index.numKeys` is a 32-bit uint, and there's `MaxKeys = math.MaxUint32` 😲 I think it would make sense to change it to 64-bit (is there any reason not to support up to a 64-bit number of records?). I assume it would break existing dbs (but it is still necessary)?
At the very least, I would suggest clearly stating this as a limitation in the readme.
Our use case is to store billions of records. We've already reached 2 billion records with Pogreb, which means that in a matter of weeks we'll hit the current upper limit 😢