Commit

Merge pull request #2594 from domoritz/patch-4
Fix typo in 42.parquet post
szarnyasg authored Mar 26, 2024
2 parents 3c1f959 + 50b5fda commit 1020047
Showing 1 changed file with 1 addition and 1 deletion.
@@ -21,4 +21,4 @@ But we can go up from there. Columns can contain multiple pages referring to *th

With some fiddling, we found that if we repeat the data page 1000 times and repeat the row group 290 times, we end up with [a Parquet file](https://github.com/hannes/fortytwodotparquet/raw/main/42.parquet) that is 42 kilobytes large, yet contains *622 trillion* values (622,770,257,630,000 to be exact). If one would materialize this table in memory, it would require over *4 petabytes* of memory, finally a real example of [Big Data](https://motherduck.com/blog/big-data-is-dead/), coincidentally roughly the same size as the original `42.zip` mentioned above.

- We've made the [script that we use to generate this file available as well](https://github.com/hannes/fortytwodotparquet/blob/main/create-parquet-file.py), we hope it can be used to test Parquet readers better. We hope to have shown that Parquet files can possible be considered harmful and should certainly not be shoved into some pipeline without being extra careful. And while DuckDB *can* read data from our file (e.g., with a `LIMIT`), if you would make it read through it all, you better get some coffee.
+ We've made the [script that we use to generate this file available as well](https://github.com/hannes/fortytwodotparquet/blob/main/create-parquet-file.py), we hope it can be used to test Parquet readers better. We hope to have shown that Parquet files can be considered harmful and should certainly not be shoved into some pipeline without being extra careful. And while DuckDB *can* read data from our file (e.g., with a `LIMIT`), if you would make it read through it all, you better get some coffee.
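
Reviewer note: the figures quoted in the context paragraph of this hunk can be checked with a few lines of arithmetic. The sketch below is not taken from the linked generator script. It uses only the repeat counts and the total from the post, and it assumes 8 bytes per materialized value (a 64-bit integer column), which is consistent with the "over 4 petabytes" claim but is not stated in this hunk. The variable names are illustrative.

```python
# Figures quoted in the post.
PAGE_REPEATS = 1000          # data page repeated 1000 times per row group
ROW_GROUP_REPEATS = 290      # row group repeated 290 times
TOTAL_VALUES = 622_770_257_630_000

# Derived: how many values a single data page must describe.
values_per_page, remainder = divmod(TOTAL_VALUES, PAGE_REPEATS * ROW_GROUP_REPEATS)

# Assumption (not stated in this hunk): each materialized value takes 8 bytes.
BYTES_PER_VALUE = 8
materialized_bytes = TOTAL_VALUES * BYTES_PER_VALUE

print(f"values per data page: {values_per_page:,} (remainder {remainder})")
print(f"materialized size:    {materialized_bytes / 10**15:.2f} PB")
```

The division comes out exact, and the per-page count equals 2^31 - 1, the largest signed 32-bit integer. Under the 8-byte assumption the materialized size is just under 5 PB, which matches the post's "over 4 petabytes".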
