Description
Maybe this isn't an appropriate issue; if so, please feel free to close it. I have a string implementation and I need to do some basic UTF-8 operations on it: compute the length (in characters, not bytes), compare strings case-insensitively (folding), and convert strings to upper or lower case. I need these to be as efficient as possible, as they have a real impact on my system. There are also a few more esoteric things I need, like reversing a UTF-8 string, but these don't need to be super-efficient.
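To illustrate the kind of "esoteric" operation I mean, here is a minimal sketch of an in-place reversal by code point. The names `utf8_reverse` and `reverse_bytes` are hypothetical (not part of utf8proc), and it assumes the buffer already holds valid UTF-8. It reverses the bytes of each code point first, then the whole buffer, so each sequence comes out in the right byte order.

```c
#include <stddef.h>

/* Reverse n bytes in place. */
static void reverse_bytes(char *p, size_t n)
{
    if (n < 2) return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/* Hypothetical helper: reverse a UTF-8 buffer of len bytes by code point.
   Assumes valid UTF-8; no allocation, no NUL terminator required. */
void utf8_reverse(char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t start = i++;
        /* Skip continuation bytes (10xxxxxx) to find the end of this sequence. */
        while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        /* Pre-reverse each multi-byte sequence... */
        reverse_bytes(s + start, i - start);
    }
    /* ...so that reversing the whole buffer restores its byte order. */
    reverse_bytes(s, len);
}
```

Note that this works on code points, not grapheme clusters, so combining marks end up attached to the wrong base character; doing it properly would need grapheme boundaries.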
I would really like something small, and I only need UTF-8, so ICU is too much.
utf8proc seems like a great per-character interface, but it seems difficult to use efficiently on entire strings. For example, there's no simple, fast string length function. Also, the way the map functions always allocate new memory and can't write into existing buffers is a major drawback: it forces a lot of extra copying in many situations. Likewise, it seems a folded comparison function could be written a good bit more efficiently inside utf8proc than on top of it.
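As an example of the sort of whole-string function I have in mind, here is a minimal sketch of a length function. The name `utf8_codepoint_count` is hypothetical (not a utf8proc function), and it assumes the input is already valid UTF-8, in which case the number of code points equals the number of bytes that are not continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a UTF-8 buffer of len bytes.
   Assumes valid UTF-8; does no validation and no decoding. */
size_t utf8_codepoint_count(const char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point. */
        count += (((unsigned char)s[i] & 0xC0) != 0x80);
    }
    return count;
}
```

This never decodes a code point, so it should be cheaper than iterating the string one decoded character at a time. It counts code points rather than user-perceived characters, and it gives meaningless results on malformed input.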
Maybe that's a goal of utf8proc: to provide a character-based interface and have users compose their own higher-level (string-based) algorithms from it, with simplicity taking priority over efficiency? And/or perhaps the way Julia uses utf8proc just matches the current interface well, so it has no need to write into existing buffers and so on?