Add roundtrip test case for null buffer test #1
Thoughts on this, @progval?
// Encoding to bytes
let mut f = vec![];
let mut writer = ArrowWriterBuilder::new(&mut f, batch.schema())
    .try_build()
    .unwrap();
writer.write(&batch).unwrap();
writer.close().unwrap();
let mut f = Bytes::from(f);
let builder = ArrowReaderBuilder::try_new(f.clone()).unwrap();

// Ensure the ORC file we wrote indeed has a present stream
let stripe = Stripe::new(
    &mut f,
    &builder.file_metadata,
    builder.file_metadata().root_data_type(),
    &builder.file_metadata().stripe_metadatas()[0],
)
.unwrap();
assert_eq!(stripe.columns().len(), 1);
// Make sure we're getting the right column
assert_eq!(stripe.columns()[0].name(), "int64");
// Then check present stream
let present_stream = stripe
    .stream_map()
    .get_opt(&stripe.columns()[0], proto::stream::Kind::Present);
assert!(present_stream.is_some());

// Decoding from bytes
let reader = builder.build();
let rows = reader.collect::<Result<Vec<_>, _>>().unwrap();

assert_eq!(rows.len(), 1);
assert_eq!(rows[0].num_columns(), 1);
// Ensure read array has no null buffer
assert!(rows[0].column(0).nulls().is_none());
A bit verbose, but it has important checks: if our ORC writer's behaviour ever changes (such that it no longer writes an empty present stream), this test will break and let us know.
Thanks!
As suggested in this comment: datafusion-contrib#13 (review)
Instead of relying on PyORC to generate a file which tests the fix, we can rely on our own writing behaviour to generate a test file in the expected format.
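For readers skimming the thread, the property the roundtrip test guards can be sketched without any ORC dependency. The helper below is hypothetical (the name `null_mask_from_present` and the `&[bool]` representation are assumptions made for illustration, not the crate's actual decoder): it treats a present stream whose bits are all set the same way as a missing present stream, so no null buffer is produced in either case.

```rust
/// Hypothetical helper, not part of any real crate.
/// `present` holds the decoded present bits, where `true` means the value
/// is non-null; `None` means the column had no present stream at all.
/// Returns a validity mask only when at least one value is actually null.
fn null_mask_from_present(present: Option<&[bool]>) -> Option<Vec<bool>> {
    match present {
        // No present stream: every value is non-null, so no mask is needed.
        None => None,
        // Present stream exists but marks no nulls: still no mask.
        Some(bits) if bits.iter().all(|&b| b) => None,
        // At least one null: keep the bits as the validity mask.
        Some(bits) => Some(bits.to_vec()),
    }
}
```

The second match arm corresponds to the case the test file exercises: a present stream is written, yet the array read back is expected to have no null buffer.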