Optimization improvement for substr in cudf::string_view (#18062)
A slight optimization: `cudf::string_view::substr` now sets the character count in the `cudf::string_view` it returns when the number of output characters is known. This can save redundant character counting when the new string is used downstream.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Shruti Shivakumar (https://github.com/shrshi)

URL: #18062
davidwendt authored Mar 4, 2025
1 parent b6a6d39 commit 93d98af
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions cpp/include/cudf/strings/string_view.cuh
@@ -443,10 +443,12 @@ __device__ inline size_type string_view::rfind(char_utf8 chr, size_type pos, siz
 __device__ inline string_view string_view::substr(size_type pos, size_type count) const
 {
   if (pos < 0 || pos >= length()) { return string_view{}; }
-  auto const itr  = begin() + pos;
-  auto const spos = itr.byte_offset();
-  auto const epos = count >= 0 ? (itr + count).byte_offset() : size_bytes();
-  return {data() + spos, epos - spos};
+  auto const spos = begin() + pos;
+  auto const epos = count >= 0 ? (spos + count) : const_iterator{*this, _length, size_bytes()};
+  auto ss = string_view{data() + spos.byte_offset(), epos.byte_offset() - spos.byte_offset()};
+  // this potentially saves redundant character counting downstream
+  if (_length != UNKNOWN_STRING_LENGTH) { ss._length = epos.position() - spos.position(); }
+  return ss;
 }

 __device__ inline size_type string_view::character_offset(size_type bytepos) const
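
To illustrate why pre-setting the character count can help, here is a minimal host-side sketch in plain C++. It is not the cudf implementation: `mini_view`, `UNKNOWN_LENGTH`, `g_full_counts`, and the explicit clamping of the end position are all assumptions made for this example. The model caches a UTF-8 character count lazily, and its `substr` stores the count it already knows in the result so that a later `length()` call does not rescan the bytes.

```cpp
#include <cassert>

// Hypothetical host-side model of a UTF-8 view with a lazily cached
// character count. None of these names are cudf's actual API.
constexpr int UNKNOWN_LENGTH = -1;

static int g_full_counts = 0;  // number of times a whole view was scanned

struct mini_view {
  const char* data_   = "";
  int bytes_          = 0;
  mutable int length_ = UNKNOWN_LENGTH;  // character count, filled on demand

  mini_view() = default;
  mini_view(const char* d, int b) : data_(d), bytes_(b) { assert(b >= 0); }

  // A byte starts a character unless it is a UTF-8 continuation byte (10xxxxxx).
  static bool starts_char(char c) { return (static_cast<unsigned char>(c) & 0xC0) != 0x80; }

  // Character count; scans the bytes only when the count is not cached.
  int length() const
  {
    if (length_ == UNKNOWN_LENGTH) {
      ++g_full_counts;
      int n = 0;
      for (int i = 0; i < bytes_; ++i) { n += starts_char(data_[i]) ? 1 : 0; }
      length_ = n;
    }
    return length_;
  }

  // Byte offset of the character at index pos (bytes_ if pos is past the end).
  int byte_offset(int pos) const
  {
    int i = 0;
    for (int seen = 0; i < bytes_; ++i) {
      if (starts_char(data_[i]) && seen++ == pos) { break; }
    }
    return i;
  }

  // Substring by character position; a negative count means "to the end".
  mini_view substr(int pos, int count) const
  {
    int const len = length();
    if (pos < 0 || pos >= len) { return mini_view{}; }
    // Clamping here is a choice of this model, not taken from the diff.
    int const end  = (count >= 0 && count < len - pos) ? pos + count : len;
    int const spos = byte_offset(pos);
    int const epos = byte_offset(end);
    mini_view ss{data_ + spos, epos - spos};
    ss.length_ = end - pos;  // the count is already known, so cache it
    return ss;
  }
};
```

In this model, calling `length()` on the result of `substr` returns the cached value without incrementing `g_full_counts`. Without the `ss.length_` assignment, the same call would trigger another full scan of the substring's bytes.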