-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move DB to bincode #302
move DB to bincode #302
Conversation
I think you are right that it is time to consider other serialization options than those based on serde. Before I commit to one, I want to outline the rough requirements that I have for the serialization:
I looked a bit through some of the options, guided by the benchmark list.
What are your thoughts, @bluskript? |
Regarding the choice for SQLite, that is definitely something that could change. I'm not very happy with its performance, but lack of time and experience with other databases have prevented me from trying something else. There would be similar requirements for a database alternative as for the serialization crate. |
Finally, regarding the Nix flake. I'm a little reluctant to add things that we are not necessarily able to maintain, and I think these flakes could be hosted outside of this repo as well. But more importantly, committing the
|
thanks for recommending bincode, i agree its probably a better fit than rkyv for this. I will adjust PR accordingly. I wanted to migrate the codebase to RocksDB earlier, but the fact that there was an SQliteWriter and SQLiteReader made me concerned about making too many breaking changes to the library :( To address the same concern as serialization, RocksDB is very stable and maintained by Facebook so it isn't going anywhere, its a simple KV binary-to-binary database which would mean every operation is as fine-grained and lightweight as it gets. About the Nix flake and Cargo.lock, the primary reason Cargo.lock is recommended to be put in gitignore is for CIs to test with the latest versions instead of being pinned to the versions the project specifies, which is something that can be fixed with 2 lines in CI. imo at least its not a very good rust standard practice, and it unfortunately conflicts with Nix's goal of perfect reproducibility. The main reason I wanted to add it to the project itself and not an outside repo is because it enables people with nix to be able to get a development environment with all of the necessary dependencies by simply running |
Great! In terms of technicalities, I'm thinking to use conditional attributes to derive the various traits depending on whether How does that sound?
Let's not do everything in one PR :) Thanks for adding some context on RocksDB. I'll read up on it early next week to learn a bit more about it and see if it could be useful.
Re |
Note that I just merged #303, which adds a CI test that triggers the deserialization bug. Once this branch is up-to-date with |
Just pushed a new change that moved from rkyv to bincode, as you described its completely conditional and only enables when the storage feature is enabled. I tested it manually and it seems to be working. |
a4cc797
to
8ee3a6a
Compare
8ee3a6a
to
2917000
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. I only hope we can depend on a stable version and move to v2 once that's properly released.
I'm waiting on review for #304 before the performance test CI job works properly. If you want, you can already merge in the changes from that branch, and maybe just use it as the base branch for this PR?
I started reviewing the code and found out that my suggestion to depend on |
@bluskript I did some minor cleanup, which I hoped to push directly to this branch, but I couldn't. So I opened bluskript#1 so you can merge them. I think with those, this will be good to merge. |
Make storage feature independent of serde
I have verified this works with
load_graphs_for_file_or_directory
now, solving the issue I was facing.In terms of compatibility it's best to increment the database version whenever a change to the data structures being serialized is made, otherwise it will likely fail to deserialize.
Side note: there's likely a lot of room for optimization if this project were to use RocksDB instead of Sqlite. Currently, there doesn't seem to be much usage of SQL's relational features, so it might be worth considering to reduce the database overhead as much as possible.
Also added a Nix flake, this allows any developer to easily spin up a reproducible development environment by running
nix develop
in the folder, as well as exposingtree-sitter-stack-graphs
as a Nix package (note: for this to work Cargo.lock needs to be committed to ensure dependencies are downloaded as the exact versions they are defined as).