Reconsider default for statistics in parquet writer #15586
Labels
A-io-parquet
Area: reading/writing Parquet files
A-io-partitioning
Area: reading/writing (Hive) partitioned files
enhancement
New feature or an improvement of an existing feature
Description
With polars on the verge of doing native dataset writing, it seems like a good time to rethink the default statistics behavior. Datasets largely depend on having statistics, so the small price of calculating them at write time seems worth paying for the benefit at read time. I suppose the standalone writer could still default to False, but the dataset writer would have to have statistics turned on (wouldn't it?). I think having the two defaults differ would create more confusion, which is just another reason to turn statistics on by default in the standalone writer.
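To illustrate the write-time versus read-time trade-off, here is a minimal pure-Python sketch of how per-row-group min/max statistics let a reader skip data. It is a toy model, not polars or Parquet code: `RowGroup`, `write_row_groups`, and `scan_greater_than` are hypothetical names made up for this example, and the real writer and reader are considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration)."""
    values: List[int]
    stats: Optional[Tuple[int, int]]  # (min, max), or None if not written


def write_row_groups(values: List[int], group_size: int, statistics: bool) -> List[RowGroup]:
    """Split values into row groups, optionally computing min/max per group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        # This is the "small price" paid at write time: one pass per chunk.
        stats = (min(chunk), max(chunk)) if statistics else None
        groups.append(RowGroup(chunk, stats))
    return groups


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return (matching values, number of row groups actually read)."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # With statistics, a group whose max is too small can be skipped unread.
        if group.stats is not None and group.stats[1] <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


values = list(range(1000))
with_stats = write_row_groups(values, group_size=100, statistics=True)
without_stats = write_row_groups(values, group_size=100, statistics=False)

hits_with, read_with = scan_greater_than(with_stats, threshold=949)
hits_without, read_without = scan_greater_than(without_stats, threshold=949)

print(read_with, read_without)  # 1 10
```

Both scans return the same rows, but the one with statistics touches a single row group instead of all ten. With a dataset spread over many files, the same pruning would apply per file, which is why the dataset writer seems to need statistics.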