-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC: small optimizations to reduce parquet metadata decoding time by 40% #5856
Conversation
@@ -37,7 +37,7 @@ pub enum Page { | |||
encoding: Encoding, | |||
def_level_encoding: Encoding, | |||
rep_level_encoding: Encoding, | |||
statistics: Option<Statistics>, | |||
statistics: Option<Box<Statistics>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea is that doing this reduces allocation size when no statistics are present? I would actually think in the case where there are page statistics this might actually slow things down due to the indirection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for none stats, this is pure improvement; but for page stats, this is a trade-off: extra allocation vs smaller stack movement, in the blog post's Figure 7 (+layout opt), we can see that smaller stack movement wins and the improvement is indeed smaller for page/chunk stats.
return Ok(in_progress); | ||
} | ||
} | ||
let (val, shift) = unsafe { varint_simd::decode_unsafe(self.buf.as_ptr()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safety docs for this function say
There must be at least 16 bytes of allocated memory after the beginning of the pointer.
I think we need to check the remaining len and use the scalar code as a fallback for smaller lengths. Or somehow guarantee that the slice is always padded with 16 additional bytes at the end.
Closing this PR as I think it has served its purpose, to demonstrate such wins are possible, and is not intended to be merged |
Indeed -- if anyone else finds this PR you can read about the larger context here https://www.influxdata.com/blog/how-good-parquet-wide-tables/ |
Rationale for this change
This draft PR is not meant to be merged. Instead, it serves as a record for people who want to improve parquet metadata decoding performance.
It demonstrates three hacks that can easily improve decoding time by 40%:
(1) Use a better allocator (e.g., mimalloc)
(2) Move statistics to heap, i.e., reduce stack memory movement.
(3) Use SIMD to accelerate decoding varint, as provided by varint-simd crate.