Add coerce_types flag to parquet ArrowWriter #1938
take
I think this issue was done in #6840. If that is not correct, please reopen / let me know what else needs to be done.
@alamb my original PR only handled Date64. I'm not sure if Interval and Timestamp are still outstanding.
Reopening per @dsgibbons's comments.
@dsgibbons would you be willing to make a PR to complete Interval and Timestamp so we can close this issue?
@alamb yes, but it will probably take a while (1-2 months). If someone else wants to finish this off in the meantime, go ahead. I'll comment once I start working on this.
Thank you @dsgibbons!
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As discussed in #1666, not all Arrow types can be represented within a Parquet schema.
Describe the solution you'd like
The consensus appears to be to:
In particular:
- Date64
  - If not `coerce_types`, write as Int64 and embed the logical type in the Arrow schema only.
  - Otherwise, cast to Date32.
- Timestamp
  - If not `coerce_types`, write as is, setting LogicalType / ConvertedType only where appropriate.
  - If `coerce_types`, cast to a UTC timestamp with the closest supported time unit, likely needing #1936.
- Interval
  - If not `coerce_types`, write as FixedSizeBinaryArray matching the Arrow representation and store the logical type in the Arrow schema.
  - If `coerce_types`, convert to the relevant Parquet representation.
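For illustration, here is a minimal, std-only Rust sketch of the three `coerce_types = true` conversions described above. The function names (`date64_to_date32`, `coerce_timestamp`, `month_day_nano_to_parquet_interval`) and the local `TimeUnit` enum are hypothetical and are not the arrow-rs or parquet crate API. The sketch works on raw integers rather than Arrow arrays, ignores timezone/UTC normalisation, and assumes the writer can emit the Parquet TIMESTAMP logical type with MILLIS, MICROS and NANOS units. The Interval encoding follows the Parquet INTERVAL type: a 12-byte FIXED_LEN_BYTE_ARRAY of three little-endian unsigned 32-bit integers (months, days, milliseconds).

```rust
// Hypothetical sketch of the `coerce_types = true` conversions from this
// issue. Names are illustrative; this is NOT the arrow-rs / parquet API.
// It works on raw integers instead of Arrow arrays so it runs with std only.
use std::convert::TryFrom;

/// Local stand-in for Arrow's timestamp units (hypothetical, not arrow's enum).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

const MILLIS_PER_DAY: i64 = 86_400_000;

/// Date64 (milliseconds since the Unix epoch) -> Date32 (days since the epoch).
/// Euclidean division floors toward negative infinity, so pre-1970 instants
/// map to the correct day. Returns None if the day count overflows i32.
fn date64_to_date32(millis: i64) -> Option<i32> {
    i32::try_from(millis.div_euclid(MILLIS_PER_DAY)).ok()
}

/// Parquet's TIMESTAMP logical type has MILLIS, MICROS and NANOS units but no
/// seconds, so seconds are widened to milliseconds (the closest supported
/// unit); other units pass through unchanged. Returns None on overflow.
/// Timezone / UTC normalisation is deliberately out of scope here.
fn coerce_timestamp(value: i64, unit: TimeUnit) -> Option<(i64, TimeUnit)> {
    match unit {
        TimeUnit::Second => value
            .checked_mul(1_000)
            .map(|ms| (ms, TimeUnit::Millisecond)),
        other => Some((value, other)),
    }
}

/// Arrow MonthDayNano interval -> Parquet INTERVAL, a FIXED_LEN_BYTE_ARRAY(12)
/// holding three little-endian unsigned 32-bit integers: months, days, millis.
/// Returns None for negative or oversized components, which the unsigned
/// Parquet encoding cannot represent. Sub-millisecond precision is truncated.
fn month_day_nano_to_parquet_interval(months: i32, days: i32, nanos: i64) -> Option<[u8; 12]> {
    let months = u32::try_from(months).ok()?;
    let days = u32::try_from(days).ok()?;
    let nanos = u64::try_from(nanos).ok()?;
    let millis = u32::try_from(nanos / 1_000_000).ok()?;

    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}

fn main() {
    // 2022-01-01T00:00:00Z is 18_993 days after the epoch.
    assert_eq!(date64_to_date32(18_993 * MILLIS_PER_DAY), Some(18_993));
    // One millisecond before the epoch falls on day -1 (1969-12-31).
    assert_eq!(date64_to_date32(-1), Some(-1));

    // Seconds are widened to milliseconds; microseconds are kept as is.
    assert_eq!(coerce_timestamp(5, TimeUnit::Second), Some((5_000, TimeUnit::Millisecond)));
    assert_eq!(coerce_timestamp(7, TimeUnit::Microsecond), Some((7, TimeUnit::Microsecond)));

    // 1 month, 2 days, 3 ms (3_000_000 ns) -> 12 little-endian bytes.
    let bytes = month_day_nano_to_parquet_interval(1, 2, 3_000_000).unwrap();
    assert_eq!(bytes, [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]);
    // Negative intervals have no Parquet representation.
    assert!(month_day_nano_to_parquet_interval(-1, 0, 0).is_none());
    println!("all conversions behaved as expected");
}
```

Returning `Option` keeps the lossy or unrepresentable cases visible (negative or oversized intervals, a seconds-to-milliseconds cast that overflows, a Date64 outside the Date32 range); an actual writer would still need to decide whether such values should produce an error or be written as nulls.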
Describe alternatives you've considered
See #1666