TextCleaner could benefit from templating #5

Open
jirkamarsik opened this issue Mar 23, 2011 · 1 comment
Comments

@jirkamarsik
Owner

TextCleaner could send its output in UTF-32 so that the RoughTokenizer doesn't have to re-decode it from UTF-8. Since the TextCleaner must also be able to output UTF-8 for the Classifier stage (reading annotated data and aligning), the TextCleaner class would have to be heavily templated. The performance gain probably wouldn't be very high.
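
To make the idea concrete, here is a minimal sketch of what templating on the output encoding might look like, assuming the code base is C++. Every name in it (`TextCleanerSketch`, `append_code_point`, `decode_utf8`) is hypothetical and not taken from the actual TextCleaner; the "cleaning" step is a placeholder that only drops carriage returns, and the decoder assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: none of these names come from the project itself.

// Append one code point to a UTF-8 string (1 to 4 code units).
inline void append_code_point(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Append one code point to a UTF-32 string (always exactly one code unit).
inline void append_code_point(std::u32string& out, char32_t cp) {
    out.push_back(cp);
}

// Decode one code point starting at `pos` and advance `pos` past it.
// Assumes well-formed UTF-8; no validation is performed.
inline char32_t decode_utf8(const std::string& in, std::size_t& pos) {
    const unsigned char lead = static_cast<unsigned char>(in[pos++]);
    char32_t cp;
    int extra;
    if (lead < 0x80)      { cp = lead;        extra = 0; }
    else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
    else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
    else                  { cp = lead & 0x07; extra = 3; }
    for (int i = 0; i < extra && pos < in.size(); ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(in[pos++]) & 0x3F);
    }
    return cp;
}

// The cleaner is parameterised on its output string type, so the same
// cleaning logic can feed UTF-8 or UTF-32 to the next stage.
template <typename OutString>
class TextCleanerSketch {
public:
    OutString clean(const std::string& utf8_input) const {
        OutString out;
        std::size_t pos = 0;
        while (pos < utf8_input.size()) {
            const char32_t cp = decode_utf8(utf8_input, pos);
            if (cp == U'\r') continue;   // placeholder cleaning rule
            append_code_point(out, cp);  // overload picks the encoding
        }
        return out;
    }
};
```

With something like this, the Classifier path could instantiate `TextCleanerSketch<std::string>` and the RoughTokenizer path `TextCleanerSketch<std::u32string>`. In the real class the encoding choice would likely touch many more members, which is why the templating would be heavy.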

@jirkamarsik
Owner Author

During a potential rewrite session (not too likely in the coming weeks), it could be useful to actually switch from UTF-8 to UTF-32 for most of the application.
