Specifying Row and Page Size

Selfeer edited this page Nov 12, 2024 · 1 revision

Row Group Size and Page Size

  • rowGroupSize: Defines the maximum size (in bytes) of each row group when writing data to a Parquet file.
  • pageSize: Defines the size (in bytes) of each page within a column chunk.

Row group: A logical horizontal partitioning of the data into rows. There is no physical structure that is guaranteed for a row group. A row group consists of a column chunk for each column in the dataset.

Page: Column chunks are divided up into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). There can be multiple page types which are interleaved in a column chunk.

source: https://parquet.apache.org/docs/concepts/
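
To make the hierarchy above concrete (file, then row groups, then one column chunk per column, then pages), here is a rough back-of-the-envelope sketch in Python. It is not part of any Parquet library: the function name `estimate_layout` and all input numbers are hypothetical, and it assumes evenly sized columns and treats both sizes as hard caps, whereas real writers typically treat them as approximate thresholds.

```python
import math


def estimate_layout(total_bytes, num_columns, row_group_size, page_size):
    """Estimate how a dataset splits into row groups, column chunks, and pages.

    Assumptions (for illustration only): columns are evenly sized, and
    row_group_size / page_size act as exact upper limits in bytes.
    """
    # The data is partitioned horizontally into row groups.
    row_groups = math.ceil(total_bytes / row_group_size)

    # Each row group holds exactly one column chunk per column.
    column_chunks = row_groups * num_columns

    # A full row group's bytes are shared evenly across its column chunks.
    chunk_bytes = min(total_bytes, row_group_size) / num_columns

    # Each column chunk is divided into pages; there is always at least one.
    pages_per_chunk = max(1, math.ceil(chunk_bytes / page_size))

    return {
        "row_groups": row_groups,
        "column_chunks": column_chunks,
        "pages_per_chunk": pages_per_chunk,
    }


# Hypothetical input: ~1 MB of data, 4 columns, 256 KB row groups, 1 KB pages.
layout = estimate_layout(
    total_bytes=1_000_000,
    num_columns=4,
    row_group_size=256_000,
    page_size=1024,
)
print(layout)
```

Under these assumptions, a smaller rowGroupSize produces more row groups, and a smaller pageSize produces more pages inside each column chunk.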

Full example here

  "options": {
    "rowGroupSize": 256,
    "pageSize": 1024
  }