-
Notifications
You must be signed in to change notification settings - Fork 916
Support Utf8View for Avro #7434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kumarlokesh -- this looks really nice
To make reviewing this PR easier, is there any way you can break it into multiple smaller PRs? Specifically,
- Improve the comments
- Support for Map
- Support for Utf8View?
Also it is not clear to me how much of a conflict this PR would have with:
arrow-avro/src/codec.rs
Outdated
pub fn codec(&self) -> &Codec { | ||
&self.codec | ||
} | ||
|
||
/// Returns the nullability status of this data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice -- thank you
ComplexType::Map(m) => Err(ArrowError::NotYetImplemented(format!( | ||
"Map of {m:?} not currently supported" | ||
))), | ||
ComplexType::Map(m) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be a new feature (not just string view, but also map support)
Have you had a chance to review this PR?
arrow-avro/src/compression.rs
Outdated
@@ -23,9 +23,16 @@ use std::io::Read; | |||
pub const CODEC_METADATA_KEY: &str = "avro.codec"; | |||
|
|||
#[derive(Debug, Copy, Clone, Eq, PartialEq)] | |||
/// Supported compression codecs for Avro data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these comments
91a31ce
to
78f8c5b
Compare
Which issue does this PR close?
Closes #7262.
Rationale for this change
What changes are included in this PR?
This PR adds support for
Utf8View
to the arrow-avro reader, enabling the creation ofStringViewArray
instead of regularStringArray
for string data.with_utf8view
method for codec conversionDecoder
enum with a StringView variant to handle StringViewArray creationuse_utf8view
toReadOptions
structBenchmark output:
From the above benchmark results, we can draw following conclusions:
Are there any user-facing changes?