-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialization benchmarks. #808
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Andres Santana <[email protected]>
Signed-off-by: Andres Santana <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I'll add a few specific comments, but first:
I want to get feedback on changes made to DiskDataCache to abstract out the serialization library
Do we need to abstract the serialization library? Any reason not to just replace it?
I wanted to test them at the same time and needed ability to configure it and thus this abstraction. We don't necessarily need it. However, we could be in the same situation in the future where we want to test with a new library and it would be nice to have this in place to just plug the new one in. |
This is cool, but |
bincode = "1.3.3" | ||
bincode2 = "2.0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we have to take a non-dev-dependency on "all the versions we want to benchmark" here. I'd rather put it behind a feature flag, or just not commit the versions we're not using.
One way forward is to switch to More importantly, I was hoping this micro-optimization would help with the overall cache performance as described in #719 but when I tested it following what is described there, I did not see an improvement, which looks like something else (to be determined) is the bottleneck. |
Description of change
bincode
tobincode2
when reading the cached blocks from disk.DiskDataCache
to abstract out the serialization library. For example, am I doing the right abstraction? Am I using the trait boundaries in the right way? So, mostly on the Rust-side of things.Benchmarks results
In summary, using
bincode2
gets a better performance thanbincode
and closer to the base line of reading the contents of a file to memory (read_file
in the graphs). From flamegraphs, we noticed thatbincode
was zeroing the vectors in itsfill_buffer
implementation.bincode2
has a different implementation which is more efficient.Reading a file (base line)
Reading cache through bincode
Reading cache through bincode2
Relevant issues: N/A
Does this change impact existing behavior?
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).