From 47eb6fb8a5fa396fe234dfd5756c04100ca67588 Mon Sep 17 00:00:00 2001
From: Ian Cook
Date: Sat, 14 Sep 2024 11:42:34 -0700
Subject: [PATCH] [Website] Correct statement about compression in FAQ (#541)

---
 faq.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/faq.md b/faq.md
index f5894b2df858..e61340a65ac2 100644
--- a/faq.md
+++ b/faq.md
@@ -180,10 +180,12 @@ This efficiency comes at the cost of relatively expensive reading into memory,
 as Parquet data cannot be directly operated on but must be decoded in
 large chunks.
 
-Conversely, Arrow is an in-memory format meant for direct and efficient use
-for computational purposes. Arrow data is not compressed (or only lightly so,
-when using dictionary encoding) but laid out in natural format for the CPU,
-so that data can be accessed at arbitrary places at full speed.
+Conversely, Arrow is an in-memory format meant primarily for direct and
+efficient use for computational purposes. Arrow data is typically not
+compressed but laid out in natural format for the CPU, so that data can be
+accessed at arbitrary places at full speed. (However, Arrow does provide a
+limited set of options for increasing space efficiency, including
+dictionary encoding, run-end encoding, and buffer compression.)
 
 Therefore, Arrow and Parquet complement each other and are commonly used
 together in applications. Storing your data on disk