Add roundtrip test case for null buffer test #1

Merged: 1 commit into progval:primitive-null-buffer on Dec 31, 2024

Jefffrey

As suggested in this comment: datafusion-contrib#13 (review)

Instead of relying on PyORC to generate a file which tests the fix, we can rely on our own writing behaviour to generate a test file in the expected format.

Jefffrey left a comment

Thoughts on this, @progval?

Comment on lines +500 to +534
// Encoding to bytes
let mut f = vec![];
let mut writer = ArrowWriterBuilder::new(&mut f, batch.schema())
    .try_build()
    .unwrap();
writer.write(&batch).unwrap();
writer.close().unwrap();
let mut f = Bytes::from(f);
let builder = ArrowReaderBuilder::try_new(f.clone()).unwrap();

// Ensure the ORC file we wrote indeed has a present stream
let stripe = Stripe::new(
    &mut f,
    &builder.file_metadata,
    builder.file_metadata().root_data_type(),
    &builder.file_metadata().stripe_metadatas()[0],
)
.unwrap();
assert_eq!(stripe.columns().len(), 1);
// Make sure we're getting the right column
assert_eq!(stripe.columns()[0].name(), "int64");
// Then check present stream
let present_stream = stripe
    .stream_map()
    .get_opt(&stripe.columns()[0], proto::stream::Kind::Present);
assert!(present_stream.is_some());

// Decoding from bytes
let reader = builder.build();
let rows = reader.collect::<Result<Vec<_>, _>>().unwrap();

assert_eq!(rows.len(), 1);
assert_eq!(rows[0].num_columns(), 1);
// Ensure read array has no null buffer
assert!(rows[0].column(0).nulls().is_none());

A bit verbose, but it has important checks: if our ORC writer behaviour ever changes such that it no longer writes an empty present stream, this test will break and let us know.
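
For readers less familiar with the terms, the property under test can be sketched with a toy model that uses only the standard library. This is an illustration under stated assumptions, not orc-rust's actual format or API: `EncodedColumn`, `DecodedColumn`, `encode`, and `decode` are hypothetical names, and the "present stream" is modelled as a plain `Vec<bool>`. The sketch assumes a writer that always emits a present stream, and a reader that omits the null buffer whenever that stream marks every row as valid.

```rust
/// Toy encoded column (hypothetical, not the real ORC layout):
/// a "present" stream plus the non-null values only.
#[derive(Debug, PartialEq)]
struct EncodedColumn {
    /// One flag per row (true = value present). This toy writer always
    /// emits it, even when no row is null.
    present: Option<Vec<bool>>,
    values: Vec<i64>,
}

/// Toy decoded column: `nulls` holds validity flags (true = non-null)
/// and is `None` when no row is null.
#[derive(Debug, PartialEq)]
struct DecodedColumn {
    values: Vec<i64>,
    nulls: Option<Vec<bool>>,
}

/// Write every row's presence flag and keep only the non-null values.
fn encode(rows: &[Option<i64>]) -> EncodedColumn {
    EncodedColumn {
        present: Some(rows.iter().map(|r| r.is_some()).collect()),
        values: rows.iter().flatten().copied().collect(),
    }
}

/// Read the column back, dropping the null buffer when nothing is null.
fn decode(col: &EncodedColumn) -> DecodedColumn {
    match &col.present {
        // A present stream exists but marks every row as valid:
        // the decoded column should carry no null buffer.
        Some(p) if p.iter().all(|&b| b) => DecodedColumn {
            values: col.values.clone(),
            nulls: None,
        },
        // Some rows are null: fill their slots with a placeholder 0
        // and keep the validity flags.
        Some(p) => {
            let mut it = col.values.iter();
            let values = p
                .iter()
                .map(|&b| if b { *it.next().expect("value for present row") } else { 0 })
                .collect();
            DecodedColumn { values, nulls: Some(p.clone()) }
        }
        // No present stream at all: every row is valid.
        None => DecodedColumn {
            values: col.values.clone(),
            nulls: None,
        },
    }
}
```

In this model, a roundtrip test would first assert that `encode` produced a present stream and then assert that `decode` returned `nulls: None`, which mirrors the two groups of assertions in the test above.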

progval commented Dec 31, 2024

thanks!

progval merged commit 7367b59 into progval:primitive-null-buffer on Dec 31, 2024
Jefffrey deleted the progval-null-buffer branch on December 31, 2024, 23:23