Specifying Row and Page Size
Selfeer edited this page Nov 12, 2024 · 1 revision
- rowGroupSize: Defines the maximum size (in bytes) of each row group when writing data to a Parquet file.
- pageSize: Defines the size (in bytes) of each page within a column chunk.
Row group: A logical horizontal partitioning of the data into rows. There is no physical structure that is guaranteed for a row group. A row group consists of a column chunk for each column in the dataset.
Page: Column chunks are divided up into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). There can be multiple page types which are interleaved in a column chunk.
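To make the row group, column chunk, and page hierarchy concrete, here is a rough back-of-the-envelope sketch in Python. The function `estimate_layout` and its parameters are hypothetical and not part of any Parquet library. It assumes the data is spread evenly across columns and that both size limits are reached exactly, which real writers do not guarantee (they typically treat these sizes as thresholds, and compression and encoding change the on-disk sizes).

```python
import math


def estimate_layout(total_bytes, num_columns, row_group_size, page_size):
    """Roughly estimate how data splits into row groups and pages.

    Assumption: bytes are divided evenly across columns and each limit
    is hit exactly. Treat the result as an illustration only.
    """
    # Each row group holds at most row_group_size bytes of data.
    row_groups = math.ceil(total_bytes / row_group_size)

    # A row group contains one column chunk per column.
    bytes_per_row_group = min(total_bytes, row_group_size)
    chunk_bytes = bytes_per_row_group / num_columns

    # Each column chunk is divided into pages of about page_size bytes.
    pages_per_chunk = max(1, math.ceil(chunk_bytes / page_size))

    return {
        "row_groups": row_groups,
        "column_chunks_per_row_group": num_columns,
        "pages_per_column_chunk": pages_per_chunk,
    }


if __name__ == "__main__":
    # Hypothetical numbers: ~1 MB of data, 4 columns,
    # 256 KiB row groups, 8 KiB pages.
    print(estimate_layout(1_000_000, 4, 262_144, 8_192))
```

Under these assumptions, a smaller row group size yields more row groups per file, and a smaller page size yields more pages per column chunk.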
"options": {
"rowGroupSize": 256,
"pageSize": 1024
}
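The fragment above can also be produced programmatically. The following is a minimal Python sketch, assuming only the two keys shown above; the helper `build_options` is hypothetical, and the positive-integer check is an assumption about what makes sense for a byte size rather than a documented rule of the tool.

```python
import json


def build_options(row_group_size, page_size):
    """Build the "options" fragment with rowGroupSize and pageSize.

    Assumption: both values are positive integers expressed in bytes.
    """
    for name, value in (("rowGroupSize", row_group_size), ("pageSize", page_size)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(
                f"{name} must be a positive integer number of bytes, got {value!r}"
            )
    return {"options": {"rowGroupSize": row_group_size, "pageSize": page_size}}


if __name__ == "__main__":
    # Same values as in the fragment above.
    print(json.dumps(build_options(256, 1024), indent=2))
```

Note that `json.dumps` wraps the result in an enclosing object, whereas the fragment above is shown as it would appear inside a larger JSON document.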
Developed and maintained by the Altinity team.