[WIP] Attempt piping through field metadata in as many places as possible #15036
+116
−19
As discussed in #12644 and on a recent DataFusion sync, a lightweight mechanism to patch in user-defined type support is to use the Field rather than the DataType in as many places as possible.
This is a very in-progress experiment... I know there is also work on LogicalType that may also be able to solve this (as would adding an Extension member to the arrow-rs DataType enum). I figured experimenting in public might be productive but am happy to go back to experimenting in private if that is less confusing!
Which issue does this PR close?
Rationale for this change
Most database systems have the concept of a "user-defined type" that can act as a first-class type (or close to one). What exactly can be customized varies by system, but it usually includes the ability to define new functions whose signature matches that type, add overloads to existing functions (notably cast), or sort and display values in a particular way.
Arrow has the concept of an "extension type", which is implemented in the C data interface and IPC formats as two Field metadata fields: ARROW:extension:name and ARROW:extension:metadata. In arrow-rs there is the ExtensionType, which provides a means by which to (at least) centralize the serialization and deserialization of ARROW:extension:metadata.
What changes are included in this PR?
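As a rough illustration of why the Field matters here: the extension name and metadata live in the field's metadata map, so code that passes along only the storage type cannot tell an extension type from plain storage. The sketch below uses hypothetical stand-in types (its `Field`, `DataType`, `with_extension`, `extension_name`, and `describe` are invented for illustration and are not the arrow-rs or DataFusion definitions); only the two metadata key strings come from the Arrow format.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only; these are NOT the arrow-rs definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Binary,
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

// Metadata keys used for extension types in the Arrow C data interface and IPC format.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            metadata: HashMap::new(),
        }
    }

    // Hypothetical helper: tag this field as an extension type by setting both keys.
    fn with_extension(mut self, ext_name: &str, ext_metadata: &str) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ext_name.to_string());
        self.metadata
            .insert(EXTENSION_METADATA_KEY.to_string(), ext_metadata.to_string());
        self
    }

    // Returns the extension name if the field carries one.
    fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
    }
}

// Hypothetical consumer: it can only distinguish the extension type from its
// storage type because it receives the whole Field, not just the DataType.
fn describe(field: &Field) -> String {
    match field.extension_name() {
        Some(ext) => format!(
            "{}: extension {} stored as {:?}",
            field.name, ext, field.data_type
        ),
        None => format!("{}: plain {:?}", field.name, field.data_type),
    }
}
```

Two fields with the same storage type but different metadata would be indistinguishable to any code path that keeps only the `DataType`, which is the gap this experiment is poking at.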
Experiments on the way towards replacing DataType with Field where possible.
Are these changes tested?
If this is ever more than just an experiment, they will be!
Are there any user-facing changes?
If this is ever more than just an experiment, there will be!