-
Notifications
You must be signed in to change notification settings - Fork 933
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reduce memory use when writing tables with very short columns to ORC (#…
…18136) Closes #18059 To avoid estimating the maximum compressed size for each actual block in the file, ORC writer uses the estimate for the (uncompressed) block size limit, which defaults to 256KB. However, when we write many small blocks, this compressed block size estimate is much larger than what is needed, leading to high memory use for wide/short tables. This PR adds logic to take the actual block size into account, and to use the size of the actual largest block in the file, not the largest possible block. This changes the memory usage by orders of magnitude in some tests. --------- Co-authored-by: Bradley Dice <[email protected]>
- Loading branch information
Showing
3 changed files
with
22 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters