Skip to content

Allow retrieving Parquet decryption keys using the key metadata #7286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 24, 2025

Conversation

adamreeve
Copy link
Contributor

@adamreeve adamreeve commented Mar 14, 2025

Which issue does this PR close?

Rationale for this change

This allows Parquet readers to not need to directly specify the data encryption keys required to read an encrypted Parquet file, and instead be able to retrieve them dynamically based on the key metadata.

This is a prerequisite for implementing the higher level Key Management Tools API (#7256)

What changes are included in this PR?

  • Adds a new KeyRetriever trait.
  • Updates FileDecryptionProperties so that it can store a KeyRetriever rather than explicit decryption keys.
  • Adds a new ColumnCryptoMetaData type corresponding to the type of the same name in the Thrift format, so that this can be parsed from the column metadata and used to retrieve the key metadata to pass to the KeyRetriever.
  • Changes how we decide if a column is encrypted. We now read this from the file metadata rather than using the FileDecryptionProperties (so FileDecryptor::is_column_encrypted has been removed).

Are there any user-facing changes?

Yes, this adds a new way for users to create FileDecryptionProperties.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 14, 2025
@adamreeve adamreeve requested a review from rok March 14, 2025 02:40
Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @adamreeve! I did a rough pass and the proposal looks good. I left some comments and will do another pass tomorrow.

@alamb
Copy link
Contributor

alamb commented Mar 17, 2025

Thank you @adamreeve and @rok for the reviews

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adamreeve and @rok -- this looks good to me

As part of completing the encryption / decryption feature it will be great to add some more end-to-end examples (e.g. using a custom key source). Buy for now this looks good

Comment on lines +257 to 260
footer_key: Option<Vec<u8>>,
key_retriever: Option<Arc<dyn KeyRetriever>>,
column_keys: HashMap<String, Vec<u8>>,
aad_prefix: Option<Vec<u8>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As potentially follow on item, I think you could avoid the unreachable! code below by changing this to an enum. Something like

enum DecryptionPropertiesBuilder {
  ExplicitKeys {
    footer_key: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
    aad_prefix: Option<Vec<u8>>,
  },
  Retriever {
    key_retriever: Arc<dyn KeyRetriever>
  }
}

The compiler would force you to then explicitly spell out which APIs could be used in what mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I did this for FileDecryptionProperties but I'm not sure why I didn't do it for DecryptionPropertiesBuilder too, I'll follow up and tidy this up later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit late but I made a PR to follow up on this here: #7477

@alamb alamb merged commit 329f6c9 into apache:main Mar 24, 2025
17 checks passed
@adamreeve adamreeve deleted the key-retriever branch March 24, 2025 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow retrieving Parquet decryption keys based on the key metadata
3 participants