The code currently does no translation of filename encodings, and there are a variety of ways that filenames can be encoded. In particular, Shift-JIS and EUC support are important since the lha format is/was very popular in Japan. These unfortunately will need to be specified manually since, as far as I know, there is no way to detect the encodings. We should internally translate everything to UTF-8.
There are some extended ASCII formats that can reasonably be autodetected based on the OS field: for example, CP437 is probably a sensible default for DOS archives (or the system codepage when running on Windows), and Mac Extended ASCII for macOS archives. If the encoding cannot be determined, then non-ASCII characters should become the Unicode replacement character.
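To make the proposed policy concrete, here is a rough sketch in Python (not lhasa's actual C code). The function name `decode_filename`, the `os_type` strings and the `override` parameter are all made up for illustration; real LHA headers store the OS as a one-byte identifier, and the Windows system-codepage case is left out.

```python
from typing import Optional

# Assumed defaults keyed by a hypothetical OS label taken from the archive header.
OS_DEFAULT_CODECS = {
    "msdos": "cp437",      # DOS archives: CP437 as a sensible default
    "macos": "mac_roman",  # Mac archives: Mac Extended ASCII (Mac Roman)
}


def decode_filename(raw: bytes,
                    os_type: Optional[str] = None,
                    override: Optional[str] = None) -> str:
    """Decode a raw archive filename to a Unicode string.

    Priority: a user-specified encoding (e.g. "shift_jis" or "euc_jp"),
    then an OS-based default, then plain ASCII. Undecodable bytes become
    U+FFFD, the Unicode replacement character.
    """
    codec = override or OS_DEFAULT_CODECS.get(os_type or "", "ascii")
    return raw.decode(codec, errors="replace")


# The result is a Unicode string; encode it as UTF-8 when storing or printing.
name = decode_filename(b"caf\x82.txt", os_type="msdos")
utf8_bytes = name.encode("utf-8")
```

With no override and an unknown OS, every non-ASCII byte falls through to the replacement character, which matches the fallback described above.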
With this in place we can relax the "safe print" code currently in place, although it's still important to never print a terminal escape character or anything in the C0/C1 control character ranges (and probably the specials range too).
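A relaxed filter along those lines might look like the sketch below (Python for brevity; the function name `safe_for_terminal` is hypothetical). Two choices here are my assumptions rather than anything decided in this issue: DEL (U+007F) is treated like a C0 control, and U+FFFD is allowed through even though it sits in the Specials block (U+FFF0 to U+FFFF), so that undecodable bytes stay visible.

```python
def safe_for_terminal(name: str, replacement: str = "?") -> str:
    """Replace characters that should never reach a terminal."""
    out = []
    for ch in name:
        cp = ord(ch)
        is_c0 = cp < 0x20 or cp == 0x7F          # C0 controls (includes ESC) plus DEL
        is_c1 = 0x80 <= cp <= 0x9F               # C1 controls (includes CSI, U+009B)
        is_special = 0xFFF0 <= cp <= 0xFFFF and cp != 0xFFFD  # Specials, minus U+FFFD
        out.append(replacement if (is_c0 or is_c1 or is_special) else ch)
    return "".join(out)
```

Ordinary non-ASCII characters such as `é` or kanji pass through unchanged, which is the relaxation compared to an ASCII-only filter.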
Also, lha has been popular on AmigaOS. The default encoding seems to be Latin-1, although there are different mappings for some countries that don't fit cleanly into Latin-1. I guess autodetection for corner cases could be difficult, if not impossible. Perhaps an external map file supplied via a command-line option could help in such situations, so that lhasa doesn't need to make assumptions.
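To illustrate the map file idea, here is a minimal sketch in Python. The file format is purely hypothetical (one `byte codepoint` pair per line in hex, with `#` comments); lhasa has no such option today. Bytes not listed in the map fall back to Latin-1, where the byte value equals the code point.

```python
from typing import Dict


def parse_map(text: str) -> Dict[int, str]:
    """Parse lines like '0xA4 0x20AC' into a byte -> character table."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        byte_s, cp_s = line.split()
        table[int(byte_s, 16)] = chr(int(cp_s, 16))
    return table


def decode_with_map(raw: bytes, table: Dict[int, str]) -> str:
    """Decode using the map, falling back to Latin-1 for unmapped bytes."""
    return "".join(table.get(b, chr(b)) for b in raw)


# Example: remap 0xA4 to the euro sign, leave everything else as Latin-1.
overrides = parse_map("# sample map\n0xA4 0x20AC  # euro sign\n")
decoded = decode_with_map(b"5\xa4 caf\xe9", overrides)
```

A country-specific variant would then only need to list the handful of bytes that differ from Latin-1.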