-
Notifications
You must be signed in to change notification settings - Fork 924
Allow retrieving Parquet decryption keys using the key metadata #7286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this @adamreeve! I did a rough pass and the proposal looks good. I left some comments and will do another pass tomorrow.
Thank you @adamreeve and @rok for the reviews |
…ryptionProperties
df845a4
to
5f2fddb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adamreeve and @rok -- this looks good to me
As part of completing the encryption / decryption feature it will be great to add some more end-to-end examples (e.g. using a custom key source). Buy for now this looks good
footer_key: Option<Vec<u8>>, | ||
key_retriever: Option<Arc<dyn KeyRetriever>>, | ||
column_keys: HashMap<String, Vec<u8>>, | ||
aad_prefix: Option<Vec<u8>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As potentially follow on item, I think you could avoid the unreachable!
code below by changing this to an enum. Something like
enum DecryptionPropertiesBuilder {
ExplicitKeys {
footer_key: Vec<u8>,
column_keys: HashMap<String, Vec<u8>>,
aad_prefix: Option<Vec<u8>>,
},
Retriever {
key_retriever: Arc<dyn KeyRetriever>
}
}
The compiler would force you to then explicitly spell out which APIs could be used in what mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I did this for FileDecryptionProperties
but I'm not sure why I didn't do it for DecryptionPropertiesBuilder
too, I'll follow up and tidy this up later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit late but I made a PR to follow up on this here: #7477
Which issue does this PR close?
Rationale for this change
This allows Parquet readers to not need to directly specify the data encryption keys required to read an encrypted Parquet file, and instead be able to retrieve them dynamically based on the key metadata.
This is a prerequisite for implementing the higher level Key Management Tools API (#7256)
What changes are included in this PR?
KeyRetriever
trait.FileDecryptionProperties
so that it can store aKeyRetriever
rather than explicit decryption keys.ColumnCryptoMetaData
type corresponding to the type of the same name in the Thrift format, so that this can be parsed from the column metadata and used to retrieve the key metadata to pass to theKeyRetriever
.FileDecryptionProperties
(soFileDecryptor::is_column_encrypted
has been removed).Are there any user-facing changes?
Yes, this adds a new way for users to create
FileDecryptionProperties
.