use bytestrings #84
I think this is too user-hostile for the use cases where you actually want to modify the subtitle texts. I would like to get the same effect with Unicode surrogate escapes, which should improve the situation, i.e. prevent some of the hard errors that happen today when reading subtitle files, without burdening everyone with …
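For reference, a minimal sketch of what surrogate escapes do in plain Python, independent of pysubs2 (the sample bytes are made up for illustration): undecodable bytes are mapped to lone surrogates on decode and restored byte-for-byte on encode, but only if the same error handler is used both ways.

```python
# Made-up sample: Latin-1 "café" followed by a UTF-8 euro sign.
raw = b"caf\xe9 \xe2\x82\xac"

# Bytes that are not valid UTF-8 become lone surrogates (U+DC80..U+DCFF)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# Strict encoding (the default) rejects the lone surrogates.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The last part is the main caveat: a string carrying escaped bytes cannot be written out with a strict encoder.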
mixed-encoding files are a rare problem
handling mixed-encoding files requires detecting the boundaries between different encodings
returning bytes instead of str could be a fallback, but pysubs could still return strings
strings would be more user-friendly but less performant
that just ignores the problem of mixed-encoding files: the output will also be a mixed-encoding file
I hope that mixed-encoding files are rare, since they sound like hell to deal with :) I'd like it to be possible to use the library to parse them, at least in principle, but I'm not sure how much support the library itself should provide. I think being able to get the raw bytes of the individual subtitle texts is enough - this is now possible with the latest version of my code, the user can …
The performance aspect is something I haven't really thought about, and it isn't my goal for the library. Your …
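A rough sketch of how a user could recover the raw bytes of one subtitle text from a surrogate-escaped string and decode it again per line. This is plain Python, not a pysubs2 API; the helper name `redecode`, the candidate encoding list, and the sample lines are all assumptions for illustration.

```python
def redecode(text, encodings=("utf-8", "cp1250")):
    """Recover the original bytes of a surrogate-escaped string and
    decode them with the first candidate encoding that accepts them."""
    raw = text.encode("utf-8", errors="surrogateescape")
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # No candidate fit: keep the escaped form so no bytes are lost.
    return text

# Made-up mixed-encoding input: first line UTF-8, second line cp1250.
lines = [b"Ahoj sv\xc4\x9bte", b"P\xf8\xedli\x9a"]

# Read everything as UTF-8 with surrogate escapes, then fix each line.
escaped = [line.decode("utf-8", errors="surrogateescape") for line in lines]
fixed = [redecode(text) for text in escaped]
```

Trying encodings in a fixed order is only a heuristic: a single-byte encoding such as cp1250 accepts almost any byte sequence, so it has to come last and may still guess wrong.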
Thanks for pointing out this error, that's a bit nasty. I should note this in the documentation if the next release is going to have surrogate escapes as the default.
fix #43
simply ignore text encoding, and use raw bytestrings
dealing with text encoding is deferred to the user
this allows handling "broken" files with multiple encodings
this is probably too big a change, so I merged it into pysubs2bytes
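To illustrate the bytestring idea in the description above, here is a small self-contained sketch that parses SRT-like input entirely as bytes and leaves decoding to the caller. It is not the code from this PR or from pysubs2bytes; the `TIMESTAMP` pattern, `parse_srt_bytes`, and the sample data are hypothetical.

```python
import re

# Hypothetical minimal SRT timing line, compiled as a bytes pattern.
TIMESTAMP = re.compile(rb"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")

def to_ms(h, m, s, ms):
    # int() accepts ASCII digits given as bytes.
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt_bytes(data):
    """Return (start_ms, end_ms, text_bytes) tuples; text is never decoded."""
    events = []
    for block in re.split(rb"\r?\n\r?\n", data.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMESTAMP.match(line)
            if match:
                g = match.groups()
                text = b"\n".join(lines[i + 1:])
                events.append((to_ms(*g[:4]), to_ms(*g[4:]), text))
                break
    return events

# Made-up "broken" file: one Latin-1 text and one UTF-8 text.
data = (b"1\n00:00:01,000 --> 00:00:02,500\ncaf\xe9\n\n"
        b"2\n00:00:03,000 --> 00:00:04,000\n\xe2\x82\xac\n")
events = parse_srt_bytes(data)
```

The caller then picks an encoding per event, for example `events[0][2].decode("latin-1")` and `events[1][2].decode("utf-8")`.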