-
Notifications
You must be signed in to change notification settings - Fork 924
Add example for how to read encrypted parquet files #7283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
14643dd
to
78bca86
Compare
@alamb this now covers two reading examples. We can either add it now or wait for the writing path and add it as well. |
parquet/src/arrow/mod.rs
Outdated
//! | ||
//! # Example of reading uniformly encrypted parquet file into arrow record batch | ||
//! | ||
#![cfg_attr(feature = "encryption", doc = "```rust")] | ||
#![cfg_attr(not(feature = "encryption"), doc = "```ignore")] | ||
//! # use arrow_array::{Int32Array, ArrayRef}; | ||
//! # use arrow_array::{types, RecordBatch}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb does this look like a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly this does the equivalent of:
#![cfg(feature = "encryption")]
//! ```rust
#![cfg(not(feature = "encryption"))]
//! ```ignore
I noticed that docs generation fails if encryption is not enabled, however I tested now and on CI appears to have encryption enabled when testing docs so I'll remove these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, it seems that sometimes docs are tested without encryption so this would fail. I'm reintroducing the cfg_attr
check for now.
However it's not clear to me if the docs that will ultimately be produced will include this example or not.
parquet/src/arrow/mod.rs
Outdated
//! | ||
//! # Example of reading uniformly encrypted parquet file into arrow record batch | ||
//! | ||
#![cfg_attr(feature = "encryption", doc = "```rust")] | ||
#![cfg_attr(not(feature = "encryption"), doc = "```ignore")] | ||
//! # use arrow_array::{Int32Array, ArrayRef}; | ||
//! # use arrow_array::{types, RecordBatch}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?
Co-authored-by: Adam Reeve <[email protected]>
Thanks for the review @adamreeve ! I'veused your suggestions. I'm not sure what to do about the comments, perhaps @alamb has a better suggestion (#7283 (comment)). |
@@ -93,6 +93,57 @@ | |||
//! | |||
//! println!("Read {} records.", record_batch.num_rows()); | |||
//! ``` | |||
//! | |||
//! # Example of reading non-uniformly encrypted parquet file into arrow record batch | |||
//! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a note here that this requires the experimental encryption
feature to be enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @rok and @adamreeve
I took the liberty of updating pushing a change to this PR as I wanted to get the doc changes in for the next release and so avoid the back and forth of another round of reviews.
parquet/src/arrow/mod.rs
Outdated
//! ArrowReaderOptions::default().with_file_decryption_properties(decryption_properties); | ||
//! let reader_metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap(); | ||
//! let file_metadata = reader_metadata.metadata().file_metadata(); | ||
//! println!("File has {} rows.", file_metadata.num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more common pattern in rust crates is to use assert_eq!
rather than println! -- it then contains the actual expected output in the example and is verified by the docs test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of doing this in 97ecf40
MSRV check is unrelated, see: |
* Add decryption example to parquet docs * Always build docs as if we have encryption enabled * Check if encryption is enabled when building encryption example docs * Update parquet/src/arrow/mod.rs Co-authored-by: Adam Reeve <[email protected]> * Remove uniform encryption example, better comments * Update parquet/src/arrow/mod.rs Co-authored-by: Adam Reeve <[email protected]> * Review feedback * Change to assert_eq! and whitespace obsession --------- Co-authored-by: Adam Reeve <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Rationale for this change
This is to provide users a starting point and a reference for reading encrypted files.
What changes are included in this PR?
We add couple of examples to the parquet docs.
Are there any user-facing changes?
Only in docs.