
Add coerce_types flag to parquet ArrowWriter #1938

Open
@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As discussed in #1666, not all arrow types can be represented within a parquet schema.

Describe the solution you'd like

The consensus appears to be to:

  • By default faithfully round-trip the source data, performing no potentially lossy type conversion
  • Add a coerce_types flag that will use the arrow cast kernels to coerce incompatible types prior to writing them
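
To make the default-off behaviour concrete, here is a minimal sketch of how such a flag could be carried on a writer options struct. `WriterOptions` and `with_coerce_types` are hypothetical names used only for illustration; they are not taken from the existing ArrowWriter API.

```rust
/// Hypothetical options struct, for illustration only.
/// `Default` yields `coerce_types == false`, so data round-trips faithfully
/// unless the caller explicitly opts in to coercion.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct WriterOptions {
    coerce_types: bool,
}

impl WriterOptions {
    /// Builder-style setter (hypothetical name).
    fn with_coerce_types(mut self, coerce_types: bool) -> Self {
        self.coerce_types = coerce_types;
        self
    }
}
```

Keeping the flag off by default means existing callers see no change in what gets written.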

In particular

Date64

If not coerce_types, write as Int64 and embed the logical type in the arrow schema only. Otherwise, cast to Date32.
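
For reference, Date64 stores milliseconds since the UNIX epoch while Date32 stores days, which is why the cast is potentially lossy. The sketch below shows the arithmetic on a single value. `date64_to_date32` is a hypothetical helper, and the choice of floor division for pre-epoch values is an assumption of this sketch, not necessarily what the arrow cast kernel does.

```rust
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: convert a Date64 value (ms since epoch) into a
/// Date32 value (days since epoch). Any time-of-day component is discarded.
/// Returns `None` if the day count does not fit in an i32.
fn date64_to_date32(millis: i64) -> Option<i32> {
    // div_euclid floors, so -1 ms maps to day -1 rather than day 0
    // (assumption of this sketch).
    i32::try_from(millis.div_euclid(MILLIS_PER_DAY)).ok()
}
```

A value that is not an exact multiple of a day, such as `86_400_001`, maps to the same day as `86_400_000`, so the original cannot be recovered on read.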

Timestamp

If not coerce_types, write as is, setting LogicalType / ConvertedType only where appropriate.

If coerce_types, cast to a UTC timestamp with the closest supported time unit, likely needing #1936.
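
As a rough illustration of "closest supported time unit": the parquet TIMESTAMP logical type has millisecond, microsecond and nanosecond units but no second unit, so arrow second-resolution timestamps would need rescaling. The sketch below covers only the unit change on a single value and ignores the timezone handling mentioned above. `TimeUnit` here is a local stand-in enum, and `coerce_timestamp` is a hypothetical helper.

```rust
/// Local stand-in for the arrow time units (not the arrow crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Pick the closest unit parquet can express, plus the scale factor needed
/// to convert a value into it. Only seconds need to change.
fn coerce_unit(unit: TimeUnit) -> (TimeUnit, i64) {
    match unit {
        TimeUnit::Second => (TimeUnit::Millisecond, 1_000),
        other => (other, 1),
    }
}

/// Hypothetical helper: rescale one timestamp value.
/// Returns `None` on overflow instead of wrapping.
fn coerce_timestamp(value: i64, unit: TimeUnit) -> Option<(i64, TimeUnit)> {
    let (target, factor) = coerce_unit(unit);
    value.checked_mul(factor).map(|v| (v, target))
}
```

Widening from seconds to milliseconds loses no precision, but very large second values can overflow, which is why the sketch uses `checked_mul`.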

Interval

If not coerce_types, write as FixedSizeBinaryArray matching the arrow representation and store logical type in arrow schema.

If coerce_types, convert to the relevant parquet representation.
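
To show why this conversion can be lossy: the parquet INTERVAL type is a 12-byte fixed-length value holding three little-endian unsigned 32-bit integers (months, days, milliseconds), whereas arrow's MonthDayNano interval uses signed fields and nanosecond resolution. The following is a minimal sketch of one possible mapping. `month_day_nano_to_parquet` is a hypothetical helper, and rejecting negative components rather than clamping them is an assumption of this sketch.

```rust
/// Hypothetical helper: encode an arrow-style (months, days, nanos) interval
/// as the 12-byte parquet INTERVAL layout. Sub-millisecond precision is
/// dropped. Returns `None` if any component is negative or too large for u32.
fn month_day_nano_to_parquet(months: i32, days: i32, nanos: i64) -> Option<[u8; 12]> {
    let months = u32::try_from(months).ok()?;
    let days = u32::try_from(days).ok()?;
    // Flooring to whole milliseconds discards the remaining nanoseconds.
    let millis = u32::try_from(nanos.div_euclid(1_000_000)).ok()?;

    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}
```

Writing the arrow representation unchanged as fixed-size binary avoids both limitations, at the cost of other parquet readers not recognising the column as an interval.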

Describe alternatives you've considered

See #1666
