-
Notifications
You must be signed in to change notification settings - Fork 924
Use a "SecureString" like type to store Parquet encryption keys #7373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This seems reasonable to me, though the memory safety concern is likely less a problem with rust You can probably make something like this: struct SecureKey {
inner: Vec<u8>
}
impl Drop for SecureKey {
...
// reset memory here
}
/// allow Secure key to be coerced to &[u8]
impl AsRef<[u8]> for SecureKey {
fn as_ref(&self) -> &[u8] {
&inner
}
} |
This might be unwanted complication, but having done a quick scan through the parquet encryption code I was a little surprised that the interfaces are in terms of explicit keys at all, I would have perhaps expected an interface where the manner in which ciphertext gets generated/decrypted was opaque to the internal parquet machinery beyond propagating key metadata. This would allow using mechanisms like HSMs or external key stores to handle the key material, without that material ever being visible. TBC if we do handle key material we should probably do something like SecureString to reduce the risk of compromise, but I also think we ideally should be designing to allow for more secure mechanisms that don't expose the private key material to the process at all. TBC my knowledge of parquet's modular encryption is limited and it may be some aspect of its design prevent doing this |
I don't think this is exactly what you have in mind, but I expect most users would want to use the key management tools API that is in progress and integrates with a KMS to encrypt and decrypt data encryption keys, rather than directly using the existing low level APIs. I have a PR open for that at #7387 When that is used, the master encryption keys aren't exposed to the process but are managed by a KMS which will usually perform the encryption and decryption remotely. It encrypts and decrypts data encryption keys that are randomly generated per file, rather than decrypting the Parquet data directly. So the only keys that are kept in memory are keys that are limited in scope to a single Parquet file, limiting the damage that can be done if a key is exposed. cc @ggershinsky you might have some thoughts on this topic. |
Right, my suggestion is that we keep the interfaces within the parquet crate as unopinionated as possible, with a view to enabling people to handle keys as appropriate for their use-case. |
This is an accurate description of the security model, which follows the standard "envelope encryption" practice. The master keys never leave the KMS (typically HSM-based). |
#6637 and #7111 added support for reading and writing Parquet files with encryption. These add
FileDecryptionProperties
andFileEncryptionProperties
types that hold encryption keys as aVec<u8>
. Precaution should be taken to prevent accidentally exposing these keys and allowing unauthorised access to encrypted data.In the C++ Parquet implementation for example, these keys are "wiped" after a file is read or written, which is intended to prevent any memory access bugs from being able to expose these keys. But it's known that this wiping isn't very secure as only the first byte of the key is usually overwritten. See apache/arrow#31603 and some of the discussion in apache/arrow#44990.
Ideally these keys should be stored in a type that automatically clears the whole key from memory when it is dropped, eg. something like https://crates.io/crates/secure-string, or a custom abstraction built on top of https://crates.io/crates/zeroize.
We might also want to have a
Debug
implementation that doesn't show the key contents to avoid accidental logging of keys.The text was updated successfully, but these errors were encountered: