Skip to content

Add example for how to read encrypted parquet files #7283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 14, 2025

Conversation

rok
Copy link
Member

@rok rok commented Mar 13, 2025

Which issue does this PR close?

Rationale for this change

This is to provide users a starting point and a reference for reading encrypted files.

What changes are included in this PR?

We add couple of examples to the parquet docs.

Are there any user-facing changes?

Only in docs.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 13, 2025
@rok rok force-pushed the decryption_example branch 2 times, most recently from 14643dd to 78bca86 Compare March 13, 2025 16:54
@rok rok force-pushed the decryption_example branch from 78bca86 to ddee1e2 Compare March 13, 2025 17:03
@rok
Copy link
Member Author

rok commented Mar 13, 2025

@alamb this now covers two reading examples. We can either add it now or wait for the writing path and add it as well.
cc @adamreeve

Comment on lines 96 to 102
//!
//! # Example of reading uniformly encrypted parquet file into arrow record batch
//!
#![cfg_attr(feature = "encryption", doc = "```rust")]
#![cfg_attr(not(feature = "encryption"), doc = "```ignore")]
//! # use arrow_array::{Int32Array, ArrayRef};
//! # use arrow_array::{types, RecordBatch};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb does this look like a good idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly this does the equivalent of:

#![cfg(feature = "encryption")]
//! ```rust
#![cfg(not(feature = "encryption"))]
//! ```ignore

I noticed that docs generation fails if encryption is not enabled, however I tested now and on CI appears to have encryption enabled when testing docs so I'll remove these.

Copy link
Member Author

@rok rok Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, it seems that sometimes docs are tested without encryption so this would fail. I'm reintroducing the cfg_attr check for now.
However it's not clear to me if the docs that will ultimately be produced will include this example or not.

@rok rok marked this pull request as ready for review March 13, 2025 19:17
@rok rok requested review from adamreeve and alamb March 13, 2025 19:17
Comment on lines 96 to 102
//!
//! # Example of reading uniformly encrypted parquet file into arrow record batch
//!
#![cfg_attr(feature = "encryption", doc = "```rust")]
#![cfg_attr(not(feature = "encryption"), doc = "```ignore")]
//! # use arrow_array::{Int32Array, ArrayRef};
//! # use arrow_array::{types, RecordBatch};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with Rust documentation and doc tests, does cfg_attr mean that the example will only be included in the docs if the encryption feature is enabled, or is that only controlling whether the example is tested? Do we want to always show the example but document that it requires the encryption feature (and that the feature is experimental)?

@rok rok force-pushed the decryption_example branch from 3791ff3 to 0ef51f5 Compare March 14, 2025 00:10
@rok rok force-pushed the decryption_example branch from f54828b to 50c3ae0 Compare March 14, 2025 00:46
@rok
Copy link
Member Author

rok commented Mar 14, 2025

Thanks for the review @adamreeve ! I'veused your suggestions. I'm not sure what to do about the comments, perhaps @alamb has a better suggestion (#7283 (comment)).

@rok rok force-pushed the decryption_example branch from 50c3ae0 to 1a16c31 Compare March 14, 2025 01:16
@rok rok force-pushed the decryption_example branch from 1a16c31 to b64b0c1 Compare March 14, 2025 01:34
@rok rok requested a review from adamreeve March 14, 2025 01:41
@@ -93,6 +93,57 @@
//!
//! println!("Read {} records.", record_batch.num_rows());
//! ```
//!
//! # Example of reading non-uniformly encrypted parquet file into arrow record batch
//!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a note here that this requires the experimental encryption feature to be enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @rok and @adamreeve

I took the liberty of updating pushing a change to this PR as I wanted to get the doc changes in for the next release and so avoid the back and forth of another round of reviews.

//! ArrowReaderOptions::default().with_file_decryption_properties(decryption_properties);
//! let reader_metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap();
//! let file_metadata = reader_metadata.metadata().file_metadata();
//! println!("File has {} rows.", file_metadata.num_rows());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more common pattern in rust crates is to use assert_eq! rather than println! -- it then contains the actual expected output in the example and is verified by the docs test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of doing this in 97ecf40

@alamb alamb added the documentation Improvements or additions to documentation label Mar 14, 2025
@alamb
Copy link
Contributor

alamb commented Mar 14, 2025

MSRV check is unrelated, see:

@alamb alamb merged commit 514735b into apache:main Mar 14, 2025
16 of 17 checks passed
PinkCrow007 pushed a commit to PinkCrow007/arrow-rs that referenced this pull request Mar 20, 2025
* Add decryption example to parquet docs

* Always build docs as if we have encryption enabled

* Check if encryption is enabled when building encryption example docs

* Update parquet/src/arrow/mod.rs

Co-authored-by: Adam Reeve <[email protected]>

* Remove uniform encryption example, better comments

* Update parquet/src/arrow/mod.rs

Co-authored-by: Adam Reeve <[email protected]>

* Review feedback

* Change to assert_eq! and whitespace obsession

---------

Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add example for how to read/write encrypted parquet files
3 participants