Commit db239e5

Add (more) Parquet Metadata Documentation (#6184)
* Minor: Add (more) Parquet Metadata Documentation
* fix clippy
1 parent d5ed6b9 commit db239e5

1 file changed, +61 -0 lines changed

parquet/src/file/metadata/mod.rs

@@ -33,6 +33,67 @@
 //! * [`ColumnChunkMetaData`]: Metadata for each column chunk (primitive leaf)
 //! within a Row Group including encoding and compression information,
 //! number of values, statistics, etc.
+//!
+//! # APIs for working with Parquet Metadata
+//!
+//! The Parquet readers and writers in this crate read and write
+//! metadata into parquet files. To work with metadata directly,
+//! the following APIs are available.
+//!
+//! Reading:
+//! * Read from bytes to `ParquetMetaData`: [`decode_footer`]
+//! and [`decode_metadata`]
+//! * Read from an `async` source to `ParquetMetaData`: [`MetadataLoader`]
+//!
+//! [`MetadataLoader`]: https://docs.rs/parquet/latest/parquet/arrow/async_reader/struct.MetadataLoader.html
+//! [`decode_footer`]: crate::file::footer::decode_footer
+//! [`decode_metadata`]: crate::file::footer::decode_metadata
+//!
+//! Writing:
+//! * Write `ParquetMetaData` to bytes in memory: Not yet supported (see [#6002])
+//! * Write `ParquetMetaData` to an async target: Not yet supported
+//!
+//! [#6002]: https://github.com/apache/arrow-rs/issues/6002
+//!
+//! # Metadata Encodings and Structures
+//!
+//! There are three different encodings of Parquet Metadata in this crate:
+//!
+//! 1. `bytes`: encoded with the Thrift TCompactProtocol as defined in
+//! [parquet.thrift]
+//!
+//! 2. [`format`]: Rust structures automatically generated by the thrift compiler
+//! from [parquet.thrift]. These structures are low level and mirror
+//! the thrift definitions.
+//!
+//! 3. [`file::metadata`] (this module): Easier to use Rust structures
+//! with a more idiomatic API. Note that, confusingly, some but not all
+//! of these structures have the same name as the [`format`] structures.
+//!
+//! [`format`]: crate::format
+//! [`file::metadata`]: crate::file::metadata
+//! [parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
+//!
+//! Graphically, this is how the different structures relate to each other:
+//!
+//! ```text
+//!                          ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─         ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
+//!                            ┌──────────────┐     │         ┌───────────────────────┐ │
+//!                          │ │ ColumnIndex  │              ││    ParquetMetaData    │
+//!                            └──────────────┘     │         └───────────────────────┘ │
+//! ┌──────────────┐         │ ┌────────────────┐            │┌───────────────────────┐
+//! │   ..0x24..   │ ◀────▶    │  OffsetIndex   │   │ ◀────▶  │    ParquetMetaData    │ │
+//! └──────────────┘         │ └────────────────┘            │└───────────────────────┘
+//!                                    ...          │                    ...            │
+//!                          │ ┌──────────────────┐          │ ┌──────────────────┐
+//! bytes                      │  FileMetaData*   │ │          │  FileMetaData*   │     │
+//! (thrift encoded)         │ └──────────────────┘          │ └──────────────────┘
+//!                           ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘         ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
+//!
+//!                          format::meta structures          file::metadata structures
+//!
+//!                         * Same name, different struct
+//! ```
 mod memory;

 use std::ops::Range;
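
The new docs name `decode_footer` as the entry point for going from `bytes` to `ParquetMetaData`. As an illustration of what that first step has to interpret, here is a minimal sketch that uses only the standard library. It assumes the trailer layout from the Parquet format specification: a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The helper names `footer_metadata_len` and `metadata_bytes` and the `String` error type are hypothetical and are not this crate's API. Encrypted footers are not handled.

```rust
/// Size of the fixed trailer at the end of a Parquet file:
/// 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
/// Magic bytes that end an unencrypted Parquet file.
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper (not the parquet crate's API): returns the length in
/// bytes of the Thrift-encoded metadata that sits just before the trailer.
fn footer_metadata_len(footer: &[u8; FOOTER_SIZE]) -> Result<usize, String> {
    // The last four bytes of the trailer must be the magic.
    if footer[4..] != MAGIC[..] {
        return Err("corrupt footer: missing PAR1 magic".to_string());
    }
    // The first four bytes are the metadata length, little endian.
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    Ok(len as usize)
}

/// Hypothetical helper: given a whole file held in memory, returns the slice
/// that holds the Thrift-encoded metadata (encoding 1, `bytes`, in the docs).
fn metadata_bytes(file: &[u8]) -> Result<&[u8], String> {
    if file.len() < FOOTER_SIZE {
        return Err("file is smaller than the 8-byte trailer".to_string());
    }
    // Split the file into everything before the trailer and the trailer itself.
    let (body, tail) = file.split_at(file.len() - FOOTER_SIZE);
    let mut trailer = [0u8; FOOTER_SIZE];
    trailer.copy_from_slice(tail);

    let len = footer_metadata_len(&trailer)?;
    if len > body.len() {
        return Err("metadata length exceeds file size".to_string());
    }
    // The metadata occupies the last `len` bytes before the trailer.
    Ok(&body[body.len() - len..])
}
```

In the crate itself, the bytes located this way would then be handed to something like `decode_metadata` to produce the `file::metadata` structures. The sketch stops at locating the bytes because Thrift decoding is out of scope for a standard-library example.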
