-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to decouple from relying directly on proto #44
Conversation
compression: Option<Compression>, | ||
type_description: Arc<TypeDescription>, | ||
number_of_rows: u64, | ||
/// Statistics of columns across entire file | ||
column_statistics: Vec<ColumnStatistics>, | ||
stripes: Vec<StripeMetadata>, | ||
user_custom_metadata: HashMap<String, Vec<u8>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't contain all fields yet, but can fill in more as needed (such as rowIndexStride)
// TODO: for now keeping this, but ideally won't want all stripe footers here | ||
// since don't want to require parsing all stripe footers in file unless actually required | ||
stripe_footers: Vec<StripeFooter>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hoping to move this out of this struct, as don't want to require parsing all stripe footers in file upfront (at least not in this metadata section)
// TODO: false count? | ||
Some(TypeStatistics::Bucket { | ||
true_count: stats.count[0], // TODO: safety check this | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORC spec is kinda vague on details here, been referring to https://github.com/apache/orc/blob/e1c7e4d9150820fa64b151ad942bffdf81ef34d0/c%2B%2B/src/Statistics.cc#L197
Might need further investigation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Refactor to decouple from relying directly on proto * Fix
Part of datafusion-contrib/orc-rust#18
Start introducing some of our own structs/types to represent their protobuf counter part
Allows us to have more control, such as ensuring fields are actually present (since proto has them as Option usually)