-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: incorrect negative offset in multi-byte string slicing #15140
Conversation
Ahh I have a fix for this too, but I'm assuming yours is better. Also the primary issue is now #15136. |
@mcrumiller It was rather tricky, we also were doing things wrong with 'very negative' indices. For example |
I wasn't sure what the correct usage here was--is it a sliding window that gets "stuck" when it hits the end (in which case it would be |
@mcrumiller This is the current behavior of -7 ab
-6 ab
-5 ab
0 ab -4 ab
1 bc -3 bc
2 cd -2 cd
3 d -1 d
4
5 This is the bugfix behavior: -7
-6
-5 a
0 ab -4 ab
1 bc -3 bc
2 cd -2 cd
3 d -1 d
4
5 To me the latter is much more logical. It's also consistent with Python: >>> "abcd"[-5:-3]
"a" This is |
Ugh but this is inconsistent with |
Most other languages implement slice with |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15140 +/- ##
==========================================
+ Coverage 81.08% 81.22% +0.13%
==========================================
Files 1342 1346 +4
Lines 174150 175228 +1078
Branches 2459 2506 +47
==========================================
+ Hits 141210 142324 +1114
+ Misses 32473 32424 -49
- Partials 467 480 +13 ☔ View full report in Codecov by Sentry. |
Any news on this on? |
@ritchie46 I think the question up for discussion is how s = "abcdefghij" # length 10
s2 = s.str.slice(-12, 5) Which of the two options should this return?
My other comment is that in both Rust and python, |
@ritchie46 This PR will be merged later, as part of this issue: #15232. Unless you temporarily want to make @mcrumiller We've discussed this internally and we want to go forward with the solution described in that issue. We want to keep Polars' offset-length style, as it would literally break everyone's code out there to switch to Python's style if we even wanted to. |
Fixes #15136.
Issue identified by @mcrumiller here: #14425 (comment).