-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tracing option for reading strings as (Large)Utf8 #246
Add tracing option for reading strings as (Large)Utf8 #246
Conversation
I would prefer to keep both changes, large lists and Utf8, in a single PR as they are so closely related. Probably this one. Re how to execute tests the x.py script bundles all common commands and is documented in Development.md. It could be advertised a bit more prominently, though. Running Re asymetry: as only LargeUtf8 can be encountered during from_samples, only this case is handled in the coercion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot. Looks great, but you're right that coerce_primitives
should be updated.
Please also expand the coercion tests. I will look them up for you.
Okay. Actually the tests for the coercion could be a bit more tricky. Separate PRs is the way to go :) |
As for the tests, I would include the coercion tests here: E.g., by changing the macro to: Potential test impl macro_rules! test {
($name:ident, $($data:tt)*) => {
mod $name {
use super::super::*;
#[test]
fn large_utf8() -> PanicOnError<()> {
let expected = SerdeArrowSchema::from_value(&json!([
{
"name": "date",
"data_type": "LargeUtf8",
"nullable": true,
},
]))?;
let data = json!($($data)*);
let actual = SerdeArrowSchema::from_samples(&data, TracingOptions::default().guess_dates(true))?;
assert_eq!(actual, expected);
Ok(())
}
#[test]
fn utf8() -> PanicOnError<()> {
let expected = SerdeArrowSchema::from_value(&json!([
{
"name": "date",
"data_type": "Utf8",
"nullable": true,
},
]))?;
let data = json!($($data)*);
let actual = SerdeArrowSchema::from_samples(&data, TracingOptions::default().guess_dates(true).strings_as_large_utf8(false))?;
assert_eq!(actual, expected);
Ok(())
}
}
};
} |
68b6160
to
0ed54e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkylling Thanks a lot for your changes!
I took the opportunity to slightly touch up the TracingOptions
docs and make it more consistent between the different items.
Should be on top of #245
Is there something missing in
coerce_primitive_type
? There seems to be an asymmetry between Utf8 and LargeUtf8 there.Could some instructions on how to run tests be added to the README. To run the full test suite the appropriate
--features
and--cfg
flags must be set, and I was a bit confused about which flags to set to make the tests run.