Add Parquet Modular encryption support (write) #7111

Merged on Apr 1, 2025 (101 commits)

Commits
11e083e
Start encryption
ggershinsky Mar 21, 2024
0b31469
Work
rok Feb 14, 2025
d2a73ef
Work
rok Feb 21, 2025
3b87a04
Pass Encryptor to SerializedRowGroupWriter
rok Feb 25, 2025
5c40e40
Expand test, pass FileEncryptionProperties instead of FileEncryptor t…
rok Feb 28, 2025
6655d21
Add encrypt_object helper
rok Mar 3, 2025
804e903
Implement serialization of column crypto metadata
adamreeve Mar 3, 2025
d328396
Fix writing Parquet magic bytes
adamreeve Mar 3, 2025
a5ffb49
Add key metadata to file encryption properties
adamreeve Mar 3, 2025
ae42dd4
Generate unique file aad and add prefix if set
adamreeve Mar 3, 2025
9d5db21
Add aad param to encrypt_object and don't require TrackedWrite
adamreeve Mar 3, 2025
c92559d
Write file crypto metadata
adamreeve Mar 3, 2025
bead72a
Set column crypto metadata
adamreeve Mar 3, 2025
27e095b
Store file_aad and aad_file_unique in FileEncryptor
adamreeve Mar 4, 2025
c3a3ac7
Work towards using correct AADs
adamreeve Mar 4, 2025
5c04f1c
Fix writing ciphertext length
adamreeve Mar 4, 2025
84b4d50
Add check of ciphertext length
adamreeve Mar 4, 2025
a609111
Ugly workaround for setting compressed page size
adamreeve Mar 4, 2025
8c64215
Fix test logic
adamreeve Mar 4, 2025
3a79993
Add page_ordinal
rok Mar 4, 2025
07dc037
Add some feature flags
rok Mar 4, 2025
33ffb97
Add page_ordinal, row_group_ordinal and column_ordinal to SerializedP…
rok Mar 4, 2025
73ee0fa
minor changes
rok Mar 4, 2025
3a867fb
clippy fixes
rok Mar 4, 2025
24e73ed
Encapsulate page encryption context in a PageEncryptor struct
adamreeve Mar 4, 2025
dd59b46
SerializedRowGroupWriter.column_index starts at 1 not 0
rok Mar 5, 2025
bbd3587
Fix handling dictionary pages and update test
adamreeve Mar 5, 2025
3f43fee
Fix clippy issues
rok Mar 5, 2025
877805a
clippy
rok Mar 5, 2025
e390d7b
Use PageEncryptor in ArrowPageWriter
adamreeve Mar 5, 2025
8facaca
Tidy up feature handling and reduce duplication
adamreeve Mar 5, 2025
dc0ce9e
Test fixes
adamreeve Mar 5, 2025
a14fdf4
Fix setting Arrow page writer for byte typed columns
adamreeve Mar 5, 2025
08958af
WIP Add per-column encryption keys
adamreeve Mar 5, 2025
69f557a
Add test_non_uniform_encryption
rok Mar 5, 2025
8eb841a
lint
rok Mar 5, 2025
96d09a7
lint
rok Mar 5, 2025
69969dc
Add SchemaRef ArrowColumnWriterFactory to get column_path via column_…
rok Mar 5, 2025
65fc410
Get column path from descriptor rather than Arrow schema
adamreeve Mar 5, 2025
3f8907a
Fix writing multiple encrypted pages with ArrowPageWriter
adamreeve Mar 5, 2025
7a4d6c0
Return encryptors as a Result<Box<dyn BlockEncryptor>>
adamreeve Mar 5, 2025
36f3163
Get per-column encryption working and various tidy ups
adamreeve Mar 5, 2025
b3f24e3
Handle non-encrypted columns
adamreeve Mar 6, 2025
cbf7d74
Tidy up some duplication
adamreeve Mar 6, 2025
d862c63
Add encryption_util module for tests
adamreeve Mar 6, 2025
a384663
Add uniform encryption test
rok Mar 6, 2025
0f8d6f8
lint
rok Mar 6, 2025
3e7efe4
post rebase
rok Mar 10, 2025
854b257
Check if columns to encrypt are in schema
rok Mar 12, 2025
ea2db66
Apply suggestions from code review
rok Mar 13, 2025
2b2a3ef
Move tests to tests/. Post rebase fixes.
rok Mar 13, 2025
22135ba
Review feedback
rok Mar 13, 2025
32673c1
Review feedback
rok Mar 15, 2025
f479df1
Minor changes
rok Mar 16, 2025
8ac232b
Raise if writing plaintext footer
rok Mar 16, 2025
75b98cd
Docs for crypto methods
rok Mar 16, 2025
0460424
More practical key API
rok Mar 17, 2025
4032b4f
Refactor PageEncryptor use
adamreeve Mar 18, 2025
e2790dd
Simplify with_new_compressed_buffer method
adamreeve Mar 19, 2025
6991943
Apply suggestions from code review
rok Mar 19, 2025
b01b285
Review feedback
rok Mar 20, 2025
ffd2fc9
Lint and remove redundant test.
rok Mar 20, 2025
a522ab1
Docs
rok Mar 20, 2025
70e7190
Add async writer test for encrypted data
rok Mar 20, 2025
a4fbaf2
Test struct array encryption, column name with '.'
rok Mar 20, 2025
c46a259
Review feedback
rok Mar 20, 2025
3d2054c
First round of changes, add accessors and return result for encryptio…
corwinjoy Mar 20, 2025
4ddbc4c
Add
corwinjoy Mar 20, 2025
159b3df
Update parquet/src/encryption/encrypt.rs
rok Mar 21, 2025
ce8d2a9
Move encryption tests
rok Mar 21, 2025
4f15a96
Backout change to arrow/async_reader/mod.rs. TODO, put this in a sepa…
corwinjoy Mar 21, 2025
cf57871
Update notes on changes to writer.rs
corwinjoy Mar 21, 2025
8baa777
Merge branch 'encryption-basics-fork' into encryption-basics-fork-pr-cj
corwinjoy Mar 21, 2025
3e02f74
Merge pull request #3 from rok/encryption-basics-fork-pr-cj
corwinjoy Mar 21, 2025
614d5e8
Fix struct array encryption
rok Mar 21, 2025
b1db051
Minor fixes
rok Mar 21, 2025
ae4c089
Lint
rok Mar 21, 2025
6240644
Fix test
rok Mar 21, 2025
df640cd
Add '.' to struct array name
rok Mar 21, 2025
cdc3246
Fix required features for encryption tests
adamreeve Mar 23, 2025
05f5982
Fix reading encrypted struct columns and writer test
adamreeve Mar 23, 2025
9811c76
Tidy ups
adamreeve Mar 24, 2025
a15816d
Remove unnecessary clone of all row group metadata in unencrypted case
adamreeve Mar 24, 2025
0ccbc03
Remove overly broad error remapping
adamreeve Mar 24, 2025
2a3e905
Tidy up duplicated test function
adamreeve Mar 24, 2025
5a6fac8
Suppress unused mut error
adamreeve Mar 24, 2025
9e79d75
Re-use block encryptors in PageEncryptor
adamreeve Mar 24, 2025
b55a1b3
Merge remote-tracking branch 'upstream/main' into encryption-basics-fork
adamreeve Mar 24, 2025
5600e61
Slightly update error message for missing column key.
corwinjoy Mar 25, 2025
c7181b8
Refactor PageEncryptor to reduce use of cfg(feature)
adamreeve Mar 25, 2025
6c4d5b3
Refactor PageEncryptor construction in SerializedPageWriter
adamreeve Mar 25, 2025
78ac7b2
Reduce use of inline #[cfg(feature = "encryption")]
adamreeve Mar 26, 2025
fd6c30b
Refactor ThriftMetadataWriter to reduce use of feature checks within …
adamreeve Mar 26, 2025
26279eb
Tidy ups
adamreeve Mar 27, 2025
4e9e157
Refactor ArrowRowGroupWriter creation
adamreeve Mar 27, 2025
c11dc01
Make pub(crate) more explicit on some structs
adamreeve Mar 30, 2025
b9b860b
Check for length mismatch in with_column_keys
adamreeve Mar 30, 2025
754e8bd
Comment and error message tidy ups
adamreeve Mar 30, 2025
bfe49c8
Add test to verify column statistics are usable after write
adamreeve Mar 30, 2025
70d3a9b
Merge remote-tracking branch 'apache/main' into encryption-basics-fork
alamb Mar 31, 2025
04a8dad
Merge remote-tracking branch 'apache/main' into encryption-basics-fork
alamb Apr 1, 2025
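
Several commits above deal with AADs ("Generate unique file aad and add prefix if set", "Add page_ordinal, row_group_ordinal and column_ordinal to SerializedP…", "Work towards using correct AADs"). As a rough illustration of what those ordinals feed into, here is a minimal standalone sketch of the AAD layout described in the Parquet modular encryption specification. It is not the code from this PR: the names `ModuleType`, `file_aad` and `module_aad` are hypothetical, only a subset of module types is listed, and ordinals are simplified to `u16` on the assumption that they fit the spec's 2-byte little-endian encoding.

```rust
/// Subset of the module type codes from the Parquet encryption spec.
/// (Hypothetical enum for illustration, not the crate's own type.)
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u8)]
enum ModuleType {
    Footer = 0,
    ColumnMetaData = 1,
    DataPage = 2,
    DictionaryPage = 3,
    DataPageHeader = 4,
    DictionaryPageHeader = 5,
}

/// File AAD: optional user-supplied prefix followed by the per-file unique bytes.
fn file_aad(aad_prefix: Option<&[u8]>, aad_file_unique: &[u8]) -> Vec<u8> {
    let mut aad = aad_prefix.map(|p| p.to_vec()).unwrap_or_default();
    aad.extend_from_slice(aad_file_unique);
    aad
}

/// Module AAD: file AAD, then the module type byte, then the row group and
/// column ordinals, then the page ordinal where the caller supplies one
/// (data pages and data page headers).
fn module_aad(
    file_aad: &[u8],
    module: ModuleType,
    row_group_ordinal: u16,
    column_ordinal: u16,
    page_ordinal: Option<u16>,
) -> Vec<u8> {
    let mut aad = file_aad.to_vec();
    aad.push(module as u8);
    // The footer AAD carries no ordinals.
    if matches!(module, ModuleType::Footer) {
        return aad;
    }
    aad.extend_from_slice(&row_group_ordinal.to_le_bytes());
    aad.extend_from_slice(&column_ordinal.to_le_bytes());
    if let Some(page) = page_ordinal {
        aad.extend_from_slice(&page.to_le_bytes());
    }
    aad
}
```

Because the ordinals are part of the AAD, a page moved to a different row group, column or position fails authentication on read. This is presumably why the writer has to track `page_ordinal`, `row_group_ordinal` and `column_ordinal` alongside each page.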
5 changes: 5 additions & 0 deletions parquet/Cargo.toml
@@ -155,6 +155,11 @@ name = "arrow_reader"
 required-features = ["arrow"]
 path = "./tests/arrow_reader/mod.rs"
 
+[[test]]
+name = "encryption"
+required-features = ["arrow"]
+path = "./tests/encryption/mod.rs"
+
 [[bin]]
 name = "parquet-read"
 required-features = ["cli"]

2 changes: 1 addition & 1 deletion parquet/README.md
@@ -84,7 +84,7 @@ The `parquet` crate provides the following features which may be enabled in your
 - [ ] Row record writer
 - [x] Arrow record writer
 - [x] Async support
-- [ ] Encrypted files
+- [x] Encrypted files

Review comment (Contributor) on the line above: 🎉

 - [x] Predicate pushdown
 - [x] Parquet format 4.0.0 support