Optimize mb_strcut for text encodings with mblen_table
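For legacy text encodings where mb_strcut is implemented using an
mblen_table (such as the various SJIS variants), mb_strcut is now
~30% faster on small strings (about 10 bytes). This is because we
now avoid an extra, unnecessary copy of the output string: instead
of calling mbfl_strcut, which builds the cut string in its own
buffer that must then be copied again into the return value, the
new fast path finds the cut points directly in the input and copies
the selected bytes only once.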

When used on large strings, the difference in performance is
negligible, as almost the entire runtime is spent stepping through
the string to find the starting and ending cut points.
alexdowad committed Dec 4, 2023
1 parent 775fb31 commit c1a37c4
Showing 1 changed file with 23 additions and 0 deletions: ext/mbstring/mbstring.c
```diff
@@ -2455,6 +2455,29 @@ PHP_FUNCTION(mb_strcut)
 		RETURN_STR(zend_string_init_fast((const char*)(string.val + from), len & -char_len));
 	}
 
+	if (enc->mblen_table) {
+		const unsigned char *mbtab = enc->mblen_table;
+		const unsigned char *p, *q, *end;
+		int m = 0;
+		/* Search for start position */
+		for (p = (const unsigned char*)string.val, q = p + from; p < q; p += (m = mbtab[*p]));
+		if (p > q) {
+			p -= m;
+		}
+		const unsigned char *start = p;
+		/* Search for end position */
+		if (len >= string.len - (start - (const unsigned char*)string.val)) {
+			end = (const unsigned char*)(string.val + string.len);
+		} else {
+			for (q = p + len; p < q; p += (m = mbtab[*p]));
+			if (p > q) {
+				p -= m;
+			}
+			end = p;
+		}
+		RETURN_STR(zend_string_init_fast((const char*)start, end - start));
+	}
+
 	ret = mbfl_strcut(&string, &result, from, len);
 	ZEND_ASSERT(ret != NULL);
 	RETVAL_STRINGL((char *)ret->val, ret->len); /* the string is already strdup()'ed */
```
