-
Notifications
You must be signed in to change notification settings - Fork 924
Add Parquet Modular encryption support (write) #7111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8e6a4be
to
d1ccf75
Compare
f6d1155
to
05d4d60
Compare
dad9642
to
5d52ef7
Compare
36d7c70
to
bc75007
Compare
24e70d9
to
cc58a3a
Compare
65fca4b
to
44559b0
Compare
c5a692c
to
87224ea
Compare
cab7dd4
to
de5feef
Compare
Update:
|
Thanks, this example helped. I took another pass at refactoring
The one thing that comes to mind is whether we want to change |
@alamb wrote:
One minor API change that I would like to have considered is this related PR that I feel makes decryption usage better. At the suggestion of @adamreeve I have posted this on its own PR. |
We would have to try it I suspect to test |
How is this PR doing, btw? Is it ready for another review? |
Yes I think this is ready for review. There are a couple of comments I'd like some input on (https://github.com/apache/arrow-rs/pull/7111/files#r2015196618 and https://github.com/apache/arrow-rs/pull/7111/files#r2009421835) but otherwise it's looking good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rok @adamreeve and @corwinjoy 👏
I think this is looking really nice. I had some style comments but I don't think anything is required. I am not an expert in this area of the parquet spec, but I found the code clear, clean, and well commented and tested 🥇 🏆
I do also think it would be great to add some examples (as a follow on PR) that shows how to write an arrow record batch with encryption to a parquet file
The only thing I want to do before approving this is to run some benchmark tests (to make sure it doesn't accidentally slow down writing without encryption). I have started this and will report back
I ran the performance benchmarks and I do not see any changes for parquet writing with this PR (though the numbers do vary somewhat). Nice work. Details below Details
Second run
|
Your latest comments should all be addressed now thanks @alamb. I've also just added one extra test to cover being able to use unencrypted row group metadata when writing a file. I'll follow up with a PR soon to add some examples. |
Thanks @adamreeve @corwinjoy and @rok ! I'll plan to merge this tomorrow This PR appeared to have a conflict so I took the liberty of merging up from main |
Merged to resolve another conflict 🙄 |
Thanks again @rok @etseidl @adamreeve and @corwinjoy 🦾 🚀 |
I also made a small PR to update the parquet status page (when this is released): |
Follow on PR (for full support) is:
I will mark that ready for review when we have released the next version of arrow-rs |
Which issue does this PR close?
Closes #7327
This PR is based on branch and an internal patch and /pull/6637 and aims to provide basic modular encryption support.
Rationale for this change
See #3511 and #7278. This allows users to write Parquet files with modular encryption.
What changes are included in this PR?
encryption::encrypt
module, containing types for configuring encryption, and types that implement encryption of metadata and page data.with_file_encryption_properties
method toWriterPropertiesBuilder
to enable encryption.encryption
test library, moves existing tests for reading encrypted Parquet files to this and adds new tests for writing encrypted files.Are there any user-facing changes?
Yes, this adds the ability to specify encryption properties when writing Parquet files.
Several new types and methods are added when the
encryption
feature is enabled but there should be no breaking changes.