//! * [`ColumnChunkMetaData`]: Metadata for each column chunk (primitive leaf)
//! within a Row Group, including encoding and compression information,
//! number of values, statistics, etc.
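The following std-only sketch shows how per-chunk metadata nests inside row groups. A row group holds one chunk per leaf column, and per-chunk numbers such as value counts or compressed sizes are summed across the whole file. `RowGroup`, `ColumnChunk`, their fields, and both helper functions are hypothetical stand-ins made up for this illustration. They are not this crate's `RowGroupMetaData` or `ColumnChunkMetaData` API.

```rust
/// Hypothetical, simplified stand-in for per-column-chunk metadata
/// (not the crate's `ColumnChunkMetaData`).
#[derive(Debug, Clone)]
pub struct ColumnChunk {
    /// Path of the primitive (leaf) column, e.g. "a.b".
    pub column_path: String,
    /// Number of values stored in this chunk.
    pub num_values: i64,
    /// Size of the chunk after compression, in bytes.
    pub compressed_size: i64,
}

/// Hypothetical stand-in for a row group: one chunk per leaf column.
#[derive(Debug, Clone)]
pub struct RowGroup {
    pub num_rows: i64,
    pub columns: Vec<ColumnChunk>,
}

/// Total compressed bytes across every column chunk in every row group.
pub fn total_compressed_size(row_groups: &[RowGroup]) -> i64 {
    row_groups
        .iter()
        .flat_map(|rg| rg.columns.iter())
        .map(|c| c.compressed_size)
        .sum()
}

/// Number of values for one leaf column, summed over all row groups.
pub fn column_num_values(row_groups: &[RowGroup], path: &str) -> i64 {
    row_groups
        .iter()
        .flat_map(|rg| rg.columns.iter())
        .filter(|c| c.column_path == path)
        .map(|c| c.num_values)
        .sum()
}

fn main() {
    // Small helper closure that builds one chunk.
    let chunk = |path: &str, num_values: i64, compressed_size: i64| ColumnChunk {
        column_path: path.to_string(),
        num_values,
        compressed_size,
    };
    let row_groups = vec![
        RowGroup { num_rows: 100, columns: vec![chunk("id", 100, 400), chunk("name", 100, 900)] },
        RowGroup { num_rows: 50, columns: vec![chunk("id", 50, 200), chunk("name", 50, 450)] },
    ];
    let total_rows: i64 = row_groups.iter().map(|rg| rg.num_rows).sum();
    println!("rows: {}", total_rows);
    println!("compressed bytes: {}", total_compressed_size(&row_groups));
    println!("values in `id`: {}", column_num_values(&row_groups, "id"));
}
```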
//!
//! # APIs for working with Parquet Metadata
//!
//! The Parquet readers and writers in this crate read metadata from and
//! write metadata to Parquet files. To work with metadata directly, use
//! the following APIs.
//!
//! Reading:
//! * Read `ParquetMetaData` from bytes: [`decode_footer`] and [`decode_metadata`]
//! * Read `ParquetMetaData` from an `async` source: [`MetadataLoader`]
//!
//! [`MetadataLoader`]: https://docs.rs/parquet/latest/parquet/arrow/async_reader/struct.MetadataLoader.html
//! [`decode_footer`]: crate::file::footer::decode_footer
//! [`decode_metadata`]: crate::file::footer::decode_metadata
//!
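The byte-level framing that `decode_footer` deals with is small. Per the Parquet file format, a file ends with the Thrift-encoded metadata, then a 4-byte little-endian length of that metadata, then the magic bytes `PAR1`. The std-only sketch below shows only that framing. `parse_footer` and `metadata_range` are hypothetical helpers and are not the crate's implementation. The sketch stops at the point where `decode_metadata` would take over and decode the Thrift bytes. Encrypted files are not handled.

```rust
use std::ops::Range;

/// Magic bytes at the start and end of an (unencrypted) Parquet file.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";
/// Fixed-size tail: 4-byte metadata length followed by the 4-byte magic.
const FOOTER_SIZE: usize = 8;

/// Hypothetical helper: check the magic and return the length in bytes of
/// the Thrift-encoded metadata that sits immediately before the footer.
fn parse_footer(footer: &[u8; FOOTER_SIZE]) -> Result<usize, String> {
    let (len_bytes, magic) = footer.split_at(4);
    if magic != PARQUET_MAGIC.as_slice() {
        return Err("invalid Parquet footer: missing PAR1 magic".to_string());
    }
    // The metadata length is stored as 4 little-endian bytes.
    let len = u32::from_le_bytes(len_bytes.try_into().unwrap());
    usize::try_from(len).map_err(|_| format!("metadata length {} does not fit in usize", len))
}

/// Hypothetical helper: find the byte range of the metadata in a whole file.
fn metadata_range(file: &[u8]) -> Result<Range<usize>, String> {
    if file.len() < FOOTER_SIZE {
        return Err("file too small to contain a footer".to_string());
    }
    let footer_start = file.len() - FOOTER_SIZE;
    let footer: &[u8; FOOTER_SIZE] = file[footer_start..].try_into().unwrap();
    let metadata_len = parse_footer(footer)?;
    let start = footer_start
        .checked_sub(metadata_len)
        .ok_or_else(|| "metadata length exceeds file size".to_string())?;
    Ok(start..footer_start)
}

fn main() {
    // Fake file: leading magic, three placeholder bytes standing in for the
    // Thrift-encoded metadata (no row group data), then the footer.
    let metadata = [0xAAu8, 0xBB, 0xCC];
    let mut file = Vec::new();
    file.extend_from_slice(&PARQUET_MAGIC);
    file.extend_from_slice(&metadata);
    file.extend_from_slice(&(metadata.len() as u32).to_le_bytes());
    file.extend_from_slice(&PARQUET_MAGIC);

    let range = metadata_range(&file).expect("valid footer");
    println!("metadata occupies bytes {:?}", range);
    assert_eq!(&file[range], &metadata[..]);
}
```

With the crate itself, roughly the same ground is covered by passing the last 8 bytes to `decode_footer` and then passing the metadata bytes to `decode_metadata`. Check their API docs for the exact signatures and error types.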
//! Writing:
//! * Write `ParquetMetaData` to bytes in memory: Not yet supported (see [#6002])
//! * Write `ParquetMetaData` to an `async` target: Not yet supported
//!
//! [#6002]: https://github.com/apache/arrow-rs/issues/6002
//!
//! # Metadata Encodings and Structures
//!
//! There are three different encodings of Parquet metadata in this crate:
//!
//! 1. `bytes`: encoded with the Thrift TCompactProtocol, as defined in
//!    [parquet.thrift]
//!
//! 2. [`format`]: Rust structures automatically generated by the Thrift compiler
//!    from [parquet.thrift]. These structures are low level and mirror
//!    the Thrift definitions.
//!
//! 3. [`file::metadata`] (this module): easier-to-use Rust structures
//!    with a more idiomatic API. Note that, confusingly, some but not all
//!    of these structures have the same names as the [`format`] structures.
//!
//! [`format`]: crate::format
//! [`file::metadata`]: crate::file::metadata
//! [parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
//!
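The `bytes` encoding uses Thrift TCompactProtocol. In this protocol, integer fields such as `i32` and `i64` are written as ZigZag-mapped variable-length (ULEB128) integers, so small values take a single byte. The std-only sketch below implements just that integer encoding. The function names are illustrative only. A real decoder, such as the Thrift-generated code behind `format`, also handles field headers, lists, nested structs and strings, and none of that is shown here.

```rust
/// ZigZag-map a signed value so small magnitudes become small unsigned values:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Append `value` as a ULEB128 varint: 7 bits per byte, with the high bit
/// set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Read a ULEB128 varint from the front of `buf`. Returns the value and the
/// number of bytes consumed, or `None` if the input is truncated or longer
/// than the 10 bytes a u64 can need.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode a signed integer the way the compact protocol does for i32/i64.
fn encode_i64(n: i64) -> Vec<u8> {
    let mut out = Vec::new();
    write_varint(zigzag_encode(n), &mut out);
    out
}

/// Decode a signed integer, returning it with the number of bytes consumed.
fn decode_i64(buf: &[u8]) -> Option<(i64, usize)> {
    read_varint(buf).map(|(z, len)| (zigzag_decode(z), len))
}

fn main() {
    for n in [0i64, -1, 1, 63, -64, 64, 150, -150] {
        let bytes = encode_i64(n);
        let (decoded, _) = decode_i64(&bytes).unwrap();
        println!("{:>5} -> {:02x?}", n, bytes);
        assert_eq!(decoded, n);
    }
}
```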
//! Graphically, this is how the different structures relate to each other:
//!
//! ```text
//!                           ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─         ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
//!                             ┌──────────────────┐ │          ┌─────────────────────┐ │
//!                           │ │   ColumnIndex    │          │ │   ParquetMetaData   │
//!                             └──────────────────┘ │          └─────────────────────┘ │
//!   ┌──────────────┐        │ ┌──────────────────┐          │ ┌─────────────────────┐
//!   │   ..0x24..   │ ◀────▶   │   OffsetIndex    │ │ ◀────▶   │   ParquetMetaData   │ │
//!   └──────────────┘        │ └──────────────────┘          │ └─────────────────────┘
//!                                     ...          │                    ...           │
//!                           │ ┌──────────────────┐          │ ┌─────────────────────┐
//!        bytes                │  FileMetaData*   │ │          │    FileMetaData*    │ │
//!   (thrift encoded)        │ └──────────────────┘          │ └─────────────────────┘
//!                            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘          ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
//!
//!                           format::meta structures          file::metadata structures
//!
//!                           * Same name, different struct
//! ```
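The "same name, different struct" note can be illustrated with a small std-only sketch. Two modules each define a `FileMetaData`. One mirrors a Thrift definition with public raw fields and `Option`s. The other wraps the same data behind accessor methods and is built from the first through `From`. Both structs are simplified, hypothetical stand-ins with only three fields. The crate's real `format` and `file::metadata` types have more fields and their own conversion code.

```rust
/// Hypothetical stand-in for the Thrift-generated layer (simplified).
mod format {
    /// Mirrors the Thrift definition: public raw integers and optional fields.
    #[derive(Debug, Clone)]
    pub struct FileMetaData {
        pub version: i32,
        pub num_rows: i64,
        pub created_by: Option<String>,
    }
}

/// Hypothetical stand-in for the idiomatic layer (simplified).
mod metadata {
    use super::format;

    /// Same name as `format::FileMetaData`, but a different struct with
    /// private fields and accessor methods.
    #[derive(Debug, Clone)]
    pub struct FileMetaData {
        version: i32,
        num_rows: i64,
        created_by: Option<String>,
    }

    impl FileMetaData {
        pub fn version(&self) -> i32 {
            self.version
        }

        pub fn num_rows(&self) -> i64 {
            self.num_rows
        }

        /// Returns a borrowed `&str` instead of exposing `Option<String>`.
        pub fn created_by(&self) -> Option<&str> {
            self.created_by.as_deref()
        }
    }

    /// Conversion from the Thrift-level struct into the idiomatic one.
    impl From<format::FileMetaData> for FileMetaData {
        fn from(t: format::FileMetaData) -> Self {
            Self {
                version: t.version,
                num_rows: t.num_rows,
                created_by: t.created_by,
            }
        }
    }
}

fn main() {
    let thrift = format::FileMetaData {
        version: 2,
        num_rows: 1_000,
        created_by: Some("example writer".to_string()),
    };
    // Both types are named `FileMetaData`; the module paths tell them apart.
    let file_meta: metadata::FileMetaData = thrift.into();
    println!(
        "version={} rows={} created_by={:?}",
        file_meta.version(),
        file_meta.num_rows(),
        file_meta.created_by()
    );
}
```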
mod memory;

use std::ops::Range;