Return null for overflow when casting string to integer under safe option enabled #5398
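As a rough illustration of the behavior the title describes, here is a minimal model of a string-to-Int32 cast over nullable values. It is not arrow-cast's API: `cast_strings_to_i32` and its `safe` flag are hypothetical stand-ins for a cast with the safe option enabled or disabled, and the error wording is made up.

```rust
/// Hypothetical model of the cast semantics in the title (not arrow-cast's API).
/// `safe = true`: unparsable or overflowing input becomes null (`None`).
/// `safe = false`: the first such input is reported as an error.
fn cast_strings_to_i32(
    values: &[Option<&str>],
    safe: bool,
) -> Result<Vec<Option<i32>>, String> {
    values
        .iter()
        .map(|v| match v {
            // Null input stays null in both modes.
            None => Ok(None),
            Some(s) => match s.parse::<i32>() {
                Ok(n) => Ok(Some(n)),
                // Safe mode: a failed parse (including overflow) becomes null.
                Err(_) if safe => Ok(None),
                Err(e) => Err(format!("Cannot cast string '{s}' to Int32: {e}")),
            },
        })
        .collect()
}

fn main() {
    // "2147483648" is i32::MAX + 1, so it overflows.
    let input = [Some("42"), Some("2147483648"), None];
    assert_eq!(
        cast_strings_to_i32(&input, true),
        Ok(vec![Some(42), None, None])
    );
    assert!(cast_strings_to_i32(&input, false).is_err());
}
```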
arrow-cast/src/parse.rs
@@ -438,7 +438,7 @@ macro_rules! parser_primitive {
     ($t:ty) => {
         impl Parser for $t {
             fn parse(string: &str) -> Option<Self::Native> {
-                lexical_core::parse::<Self::Native>(string.as_bytes()).ok()
+                string.parse::<Self::Native>().ok()
I only touched the integer parser.
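A minimal sketch of why the `str::parse` version yields `None` on overflow: the standard library's integer `FromStr` reports out-of-range input as an error, and `.ok()` maps that error to `None`. The `Parser` trait below is a simplified, hypothetical stand-in for arrow-cast's (it returns `Option<Self>` rather than `Option<Self::Native>`).

```rust
// Hypothetical, simplified stand-in for arrow-cast's `Parser` trait.
trait Parser: Sized {
    fn parse(string: &str) -> Option<Self>;
}

// Mirrors the shape of the macro in the diff, using std's `str::parse`.
macro_rules! parser_primitive {
    ($t:ty) => {
        impl Parser for $t {
            fn parse(string: &str) -> Option<Self> {
                // `str::parse` returns Err on overflow, so `.ok()` gives None.
                string.parse::<$t>().ok()
            }
        }
    };
}

parser_primitive!(i8);
parser_primitive!(i16);
parser_primitive!(i32);
parser_primitive!(i64);
parser_primitive!(u8);

fn main() {
    assert_eq!(<i8 as Parser>::parse("127"), Some(127));
    // 128 does not fit in i8, so the result is None rather than a wrapped value.
    assert_eq!(<i8 as Parser>::parse("128"), None);
    assert_eq!(<u8 as Parser>::parse("-1"), None);
    assert_eq!(<i64 as Parser>::parse("-9223372036854775808"), Some(i64::MIN));
    assert_eq!(<i64 as Parser>::parse("9223372036854775808"), None);
}
```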
I'm not sure whether the floating-point parser has the same issue, or whether floating-point types even have comparable overflow behavior.
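For the Rust standard library at least, floats behave differently from integers: an out-of-range decimal literal rounds to infinity (or underflows to zero) and parses successfully instead of returning an error. This sketch covers only std's `f64` parser; how lexical_core handles the same inputs is not shown here.

```rust
fn main() {
    // Integer overflow is reported as an error...
    assert!("128".parse::<i8>().is_err());

    // ...but a float literal beyond f64::MAX rounds to +infinity and parses Ok.
    let big: f64 = "1e400".parse().unwrap();
    assert!(big.is_infinite() && big.is_sign_positive());

    // A literal below the smallest subnormal f64 underflows to zero.
    let tiny: f64 = "1e-400".parse().unwrap();
    assert_eq!(tiny, 0.0);
}
```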
We could wait for a fix in the upstream crate, but the ticket has been open for more than 6 months with no progress so far. I think we may need to fix it here directly.
This makes sense in terms of correctness.
It looks like this was added by #4050, which boasted a non-trivial speedup, so I guess this correctness fix will cause a performance regression 🤔
I wouldn't mind trying to take a look at the core issue in lexical_core, but I can't promise anything 😅
Edit: it seems an issue for this was also raised ~1.5 years ago: Alexhuszagh/rust-lexical#91
Given that the maintainer doesn't seem to be active anymore either, I guess even if an upstream fix is proposed it might not get merged... unless we rely on a fork 🤔
Edit2: for reference, polars removed its dependency on lexical: pola-rs/polars#12512
Hmm, polars uses |
I switched to use |
The benchmark results are mixed, with some improvements and a small regression. Improvement:
Regression:
arrow-cast/Cargo.toml
@@ -49,6 +49,7 @@ chrono = { workspace = true }
 half = { version = "2.1", default-features = false }
 num = { version = "0.4", default-features = false, features = ["std"] }
 lexical-core = { version = "^0.8", default-features = false, features = ["write-integers", "write-floats", "parse-integers", "parse-floats"] }
+atoi_simd = "0.15.6"
I'm a little concerned that this crate does not seem to have a very large community around it.
How does performance compare to atoi?
I will run the benchmark with atoi to compare.
As this is only a parser for integers, I guess we can easily switch to another similar crate (e.g., atoi) if we want. Community size seems not a big concern to me.
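The overflow check that any replacement parser has to get right is small. A minimal sketch, using a hypothetical `parse_i64_checked` helper (not from the PR or from any crate), detects overflow with checked arithmetic and cross-checks itself against `str::parse`:

```rust
/// Hypothetical dependency-free `i64` parser. Returns `None` for empty input,
/// a bare sign, any non-digit byte, or a value outside the `i64` range.
fn parse_i64_checked(s: &str) -> Option<i64> {
    let bytes = s.as_bytes();
    // Split off an optional leading sign (the `?` returns None for "").
    let (negative, digits) = match bytes.first().copied()? {
        b'-' => (true, &bytes[1..]),
        b'+' => (false, &bytes[1..]),
        _ => (false, bytes),
    };
    if digits.is_empty() {
        return None;
    }
    // Accumulate as a negative number so that i64::MIN, whose magnitude has
    // no positive i64 counterpart, parses without overflowing.
    let mut acc: i64 = 0;
    for &b in digits {
        if !b.is_ascii_digit() {
            return None;
        }
        let d = i64::from(b - b'0');
        acc = acc.checked_mul(10)?.checked_sub(d)?;
    }
    if negative {
        Some(acc)
    } else {
        // Fails (None) only when acc is i64::MIN, e.g. "9223372036854775808".
        acc.checked_neg()
    }
}

fn main() {
    for s in [
        "0", "-42", "+7", "9223372036854775807", "9223372036854775808",
        "-9223372036854775808", "-9223372036854775809", "", "-", "12a",
    ] {
        // Agrees with the standard library on these inputs.
        assert_eq!(parse_i64_checked(s), s.parse::<i64>().ok(), "input {s:?}");
    }
}
```

It is a plain byte loop with no SIMD, so it makes no performance claims; the point is only that the correctness requirement (reject overflow instead of wrapping) is easy to state and test.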
Community size seems not a big concern to me
Given the motivating factor for switching is a bug not getting attention, I am concerned. I'd be willing to sacrifice performance, for a better long-term maintenance story.
Hmm, that’s a good point, although I also think maintenance is not strongly related to community size?
Anyway, I will try atoi once I return to my laptop later.
On Wed, Feb 14, 2024 at 11:34 AM Raphael Taylor-Davies wrote:
Community size seems not a big concern to me
Given the motivating factor for switching is a bug not getting attention, I am concerned. I'd be willing to sacrifice performance, for a better long-term maintenance story.
Due to |
By default
This reverts commit 53dd047.
Which issue does this PR close?
Closes #5397.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?