-
Hello Sašo,

a "true" append is not possible, nor would it be sensible. For one, Parquet writes a footer at the end of each file. Also, the row groups are column-based and should probably be larger than 50k for most use cases. Appending is one thing, but merging is another entirely. There are many ways to merge multiple Parquet files with similar schemas into one big one.

Regarding the scope of

One of the ways I found with a search to merge parquet files is utilizing

Cheers, Markus
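To make the footer point above concrete, here is a minimal sketch that locates the footer of a Parquet file using only the Python standard library. It relies on the published file layout: a file ends with the file metadata, followed by a 4-byte little-endian length of that metadata, followed by the 4-byte magic `PAR1`. The function name `read_footer_location` is made up for this illustration and is not part of odbc2parquet or any Parquet library.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def read_footer_location(path):
    """Return (footer_offset, footer_length) in bytes for a Parquet file.

    Hypothetical helper for illustration only: it inspects the last
    8 bytes of the file and does not parse the metadata itself.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end of the file
        size = f.tell()
        # Smallest possible layout: leading magic + length + trailing magic.
        if size < 12:
            raise ValueError("file is too small to be a Parquet file")
        f.seek(size - 8)
        tail = f.read(8)

    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("trailing PAR1 magic not found")

    # The 4 bytes before the magic hold the metadata length (little-endian).
    footer_length = struct.unpack("<I", tail[:4])[0]
    footer_offset = size - 8 - footer_length
    if footer_offset < 4:
        raise ValueError("footer length is inconsistent with the file size")
    return footer_offset, footer_length
```

Because the footer sits after the last row group and describes every row group in the file, new rows cannot simply be written to the end of an existing file. The footer would have to be replaced, which in practice means writing a new file.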
-
Hi,
Would it be possible to add a feature to append data to an existing Parquet file?
The use case is the following: we are using odbc2parquet to export CDC (Change Data Capture) data from an SQL Server database. The data is exported every few minutes, so we get a large number of small files (~150 files per table per day, each file ~50 kB) that are uploaded to a data lake and ingested into a data warehousing platform (Snowflake). The platform is optimized to ingest data files between 100 and 200 MB in size. Lots of small files create additional cost and overhead, and the performance is not optimal.
Best regards,
Sašo