partitioning the files #556
-
Just to make sure I understand the feature request: you want odbc2parquet to start a new file whenever one of the columns has a different value than in the preceding row? And in addition, you would make sure that the query already returns the data in the correct order?
-
This would require splitting files based on row-wise logic, whereas odbc2parquet currently writes in units of batches. It would also require looking ahead to see where the column values change. All of this could be done, but at least I personally cannot pull this off in my spare time. I would suggest one of two alternatives: use a compression that allows you to split and concatenate the written files, and partition them in a second pass. Alternatively, use either the Rust or Python version of arrow-odbc and write the Parquet files in your own custom code. E.g. using the arrow-odbc Python bindings, you can combine them with the existing Python libraries mentioned in the Stack Overflow answer to achieve what you want.
-
Would it be possible to pass a series of columns that the Parquet files would be partitioned on?