-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add str.head
and str.tail
#14425
Changes from 12 commits
1988f37
76ac085
dfd6b66
939d6c5
2df1b08
54b0cf1
788484f
e2e51db
66a8ecd
3acf020
0e0d146
8d90538
0b9f353
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -525,6 +525,26 @@ impl StringNameSpace { | |
) | ||
} | ||
|
||
/// Take the first `n` characters of the string values. | ||
pub fn head(self, n: Expr) -> Expr { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering if we can't just define these in terms of a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @stinodego -- this was my initial intent, when I said I would piggyback on @reswqa's implementation of Here are the operations and their s.str.head(3) # s.str.slice(3, None)
s.str.head(-3) # no equivalent: must know string length for start offset
s.str.tail(3) # s.str.slice(-3, None)
s.str.tail(-3) # s.str.slice(3, None) So for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess you can use |
||
self.0.map_many_private( | ||
FunctionExpr::StringExpr(StringFunction::Head), | ||
&[n], | ||
false, | ||
false, | ||
) | ||
} | ||
|
||
/// Take the last `n` characters of the string values. | ||
pub fn tail(self, n: Expr) -> Expr { | ||
self.0.map_many_private( | ||
FunctionExpr::StringExpr(StringFunction::Tail), | ||
&[n], | ||
false, | ||
false, | ||
) | ||
} | ||
|
||
pub fn explode(self) -> Expr { | ||
self.0 | ||
.apply_private(FunctionExpr::StringExpr(StringFunction::Explode)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be changed, string head/tail must be defined in terms of codepoints, not bytes! Otherwise you get illegal UTF-8 and general nonsense. Please change this and add a test-case that tests this, for example:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @orlp, may have to make a few changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I am noticing that
str.slice
does not properly index codepoints with negative indexes. Using your example:I'll see if I can address this as a separate issue once I have finished with this one. Edit: opened #15136.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orlp the new impl respects code points instead of bytes. I added some specific code point tests using your example. Let me know if anything looks off to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.