utf8proc seems difficult to use efficiently on strings

Maybe this isn't an appropriate issue; if so, please feel free to close it.  I have a string implementation and I need to do some basic UTF-8 operations on it: compute the length (in characters, not bytes), compare strings case-insensitively (with case folding), and convert strings to upper or lower case.  I need these to be as efficient as possible, as they have a real impact on my system.  There are a few more esoteric things I need as well, such as reversing a UTF-8 string, but those don't need to be super-efficient.
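
To make the length case concrete, here is a minimal sketch of the kind of function I mean. It does not use utf8proc at all, and the name `utf8_codepoint_count` is made up for illustration. It assumes the input is already valid UTF-8, and it counts code points, not grapheme clusters, so a base letter followed by a combining accent counts as two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of utf8proc.
 * Counts code points in a UTF-8 buffer of `len` bytes by counting every
 * byte that is NOT a continuation byte (bit pattern 10xxxxxx).
 * Assumes the input is valid UTF-8; no validation is performed. */
size_t utf8_codepoint_count(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* Lead bytes and ASCII bytes each start a new code point. */
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Called as `utf8_codepoint_count(buf, nbytes)`, this is a single pass over the bytes with no decoding and no allocation, which is roughly the cost I would hope for from a library-provided length function.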

I really would like something small and I only need UTF8, so ICU is too much.

utf8proc seems like a great _per-character_ interface, but it seems difficult to use efficiently on entire strings.  For example, there's no simple, fast string length function.  Also, the map functions always allocate new memory and can't write into existing buffers, which is a major drawback: it forces a lot of extra copying in many situations.  It also seems like a folded comparison function could be written inside utf8proc a good bit more efficiently than one composed from the outside.  There are other cases like these.
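
As an illustration of the folded comparison I have in mind, here is a rough sketch that walks both strings one code point at a time and never allocates. Everything in it (`fold_fn`, `decode_one`, `utf8_fold_compare`, `ascii_fold`) is a hypothetical name of mine, not utf8proc API, and the decoder assumes well-formed input. It only handles one-to-one mappings, so it is not full Unicode case folding (for example, it cannot fold `ß` to `ss`). `ascii_fold` is just a stand-in; as far as I can tell, `utf8proc_tolower` maps one code point to one code point and could be passed instead, with `utf8proc_iterate` replacing the hand-written decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-code-point mapping, e.g. a simple lowercase function. */
typedef int32_t (*fold_fn)(int32_t);

/* Decode one code point from s[0..len). Returns the number of bytes
 * consumed, or 0 at end of input or on an invalid/truncated lead byte.
 * Assumes well-formed UTF-8: continuation bytes are not validated. */
static size_t decode_one(const unsigned char *s, size_t len, int32_t *cp)
{
    if (len == 0) return 0;
    unsigned char b = s[0];
    size_t n;
    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { *cp = b & 0x1F; n = 2; }
    else if ((b & 0xF0) == 0xE0) { *cp = b & 0x0F; n = 3; }
    else if ((b & 0xF8) == 0xF0) { *cp = b & 0x07; n = 4; }
    else return 0;
    if (len < n) return 0;
    for (size_t i = 1; i < n; i++)
        *cp = (*cp << 6) | (s[i] & 0x3F);
    return n;
}

/* Compare two UTF-8 buffers after mapping each code point through `fold`.
 * Returns <0, 0 or >0. No memory is allocated; undecodable input is
 * treated as end of string. */
int utf8_fold_compare(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen, fold_fn fold)
{
    assert(fold != NULL);
    for (;;) {
        int32_t ca = 0, cb = 0;
        size_t na = decode_one(a, alen, &ca);
        size_t nb = decode_one(b, blen, &cb);
        if (na == 0 || nb == 0)
            return (na != 0) - (nb != 0);   /* shorter string sorts first */
        ca = fold(ca);
        cb = fold(cb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        a += na; alen -= na;
        b += nb; blen -= nb;
    }
}

/* Stand-in mapping that only lowercases ASCII letters. */
int32_t ascii_fold(int32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}
```

The point of the sketch is the shape, not the details: the comparison can stop at the first differing code point and needs no temporary buffers, whereas mapping both strings to folded copies first processes and copies both inputs in full.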

Maybe that's a goal of utf8proc: to provide a character-based interface and have users compose their own higher-level (string-based) algorithms from it, with simplicity taking priority over efficiency?  And/or perhaps the way Julia uses utf8proc just matches the current interface well, so it has no need to write into existing buffers and so on?


utf8proc seems difficult to use efficiently on strings #101
