TextCleaner could send its output in UTF-32 so that the RoughTokenizer doesn't have to decode it from UTF-8 a second time. However, the TextCleaner must also be able to output UTF-8 for the Classifier stage (reading annotated data and aligning it), so the TextCleaner class would have to be heavily templated. The performance gain probably wouldn't be very high.
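A minimal sketch of what that templating might look like. It assumes C++ and assumes the cleaner works on decoded code points internally. The names `appendCodePoint`, `Utf8Cleaner` and `Utf32Cleaner`, and the whitespace-collapsing rule, are hypothetical placeholders rather than the project's actual API. For brevity, the input is taken as already-decoded UTF-32.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: UTF-32 output needs no encoding, one unit per code point.
inline void appendCodePoint(std::u32string& out, char32_t cp) {
    out.push_back(cp);
}

// Hypothetical helper: UTF-8 output encodes each code point into 1 to 4 bytes.
inline void appendCodePoint(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);  // precondition: a valid Unicode code point
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical TextCleaner templated on its output string type.
// The single cleaning rule (collapse whitespace runs into one space)
// is only a stand-in for whatever the real cleaner does.
template <typename OutString>
class TextCleaner {
public:
    OutString clean(const std::u32string& input) const {
        OutString out;
        bool lastWasSpace = false;
        for (char32_t cp : input) {
            bool isSpace = cp == U' ' || cp == U'\t' || cp == U'\n' || cp == U'\r';
            if (isSpace) {
                if (!lastWasSpace) appendCodePoint(out, U' ');
                lastWasSpace = true;
            } else {
                appendCodePoint(out, cp);
                lastWasSpace = false;
            }
        }
        return out;
    }
};

using Utf8Cleaner = TextCleaner<std::string>;      // e.g. for the Classifier stage
using Utf32Cleaner = TextCleaner<std::u32string>;  // e.g. for the RoughTokenizer
```

With this shape, the Classifier stage would use `Utf8Cleaner` and the RoughTokenizer path `Utf32Cleaner`. The only thing saved is the RoughTokenizer's second decode, while every cleaning rule is compiled once per output type, which is why the gain is unlikely to justify the extra templating.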
During a potential rewrite (not likely in the coming weeks), it could be useful to switch from UTF-8 to UTF-32 for most of the application.