QUESTION: Keep columns when using PARTITIONED BY with SQL #10962
-
Hi folks, I am using DataFusion to partition some data stored in Parquet files into a different set of Parquet files. I would like the newly created files to contain the columns I am partitioning by, but currently those columns are removed because their values become part of the directory structure. Something like:
Is there a way in SQL to keep the partition columns in the written files? I have been looking through the documentation and open/closed issues but could not find a way to do this. If there is some information about it, a link would be greatly appreciated. Thanks!
-
Your description of the behavior is correct: DataFusion removes the partition columns from the data before writing out any files. There is currently no config option to change this behavior, but it would be relatively straightforward to add one. We would just need to make this line conditional on a config:
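To make the mechanics concrete, here is a minimal, std-only Rust sketch of a Hive-style partitioned write, where each row lands under a `column=value/` directory and the partition column is dropped from the row itself. It does not use DataFusion's real types or writer code: `Row`, `partition_rows`, and the `keep_partition_columns` flag are hypothetical names chosen for illustration, and rows are simplified to string pairs instead of Arrow record batches. The commented branch stands in for the line that would become conditional on a config option.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified row: (column name, value) pairs as strings.
/// This is NOT DataFusion's representation (which uses Arrow RecordBatches).
type Row = Vec<(String, String)>;

/// Groups rows into Hive-style directories such as `year=2024`.
/// `keep_partition_columns` is a hypothetical flag: `false` mimics the current
/// behavior (partition column dropped), `true` keeps it in the written rows.
fn partition_rows(
    rows: &[Row],
    partition_col: &str,
    keep_partition_columns: bool,
) -> BTreeMap<String, Vec<Row>> {
    let mut out: BTreeMap<String, Vec<Row>> = BTreeMap::new();
    for row in rows {
        // Look up the partition value; "null" is an arbitrary placeholder
        // used here for rows that lack the column.
        let value = row
            .iter()
            .find(|(name, _)| name == partition_col)
            .map(|(_, v)| v.clone())
            .unwrap_or_else(|| "null".to_string());
        let dir = format!("{}={}", partition_col, value);

        // This is the step that would become conditional on a config option.
        let written: Row = if keep_partition_columns {
            row.clone()
        } else {
            row.iter()
                .filter(|(name, _)| name != partition_col)
                .cloned()
                .collect()
        };
        out.entry(dir).or_default().push(written);
    }
    out
}

fn main() {
    let rows: Vec<Row> = vec![
        vec![("year".into(), "2023".into()), ("amount".into(), "10".into())],
        vec![("year".into(), "2024".into()), ("amount".into(), "20".into())],
        vec![("year".into(), "2024".into()), ("amount".into(), "30".into())],
    ];
    for &keep in &[false, true] {
        println!("keep_partition_columns = {}", keep);
        for (dir, part_rows) in partition_rows(&rows, "year", keep) {
            println!("  {}/ -> {:?}", dir, part_rows);
        }
    }
}
```

With the flag off, each directory's rows contain only `amount`, since `year` is recoverable from the path; with it on, `year` is duplicated in the data, which is what the question asks for.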
Feel free to open a feature request issue! You are also welcome to take a stab at implementing it if you are interested. I may be able to take a look at it myself as well.