This is a feature request.
Use case
You have a garbled text string in front of you right now, e.g. "éä"
You know the source language with absolute certainty, e.g. "he" ("Hebrew"), because it's a song sung in Hebrew.
You have no idea how many incorrect interpretations or conversions that string has undergone between its creation and how you currently see it on screen. For example, I found this in a song title in Apple Music, which came from the ID3 tag of an MP3 file. ID3 has different versions allowing different charsets, and who knows what the tag has gone through (ID3 updates over decades of iTunes/Apple Music usage, syncing between different ID3 versions present in the same file, etc.), or which charset Apple Music now uses to read and present it.
User goal
I'd like a mode where you can pipe a text snippet into enca via stdin, pass --language <language-code> --certainty <percentage>, and have enca tell me something like:
85% plausibility: charsetX → charsetY → charsetZ
79% plausibility: charsetX falsely interpreted as charsetY → then converted to charsetZ
I haven't the faintest idea whether that's feasible at all, or which heuristics / analysis could make something like this work, but hey 😂 in the age of machine learning and huge data-correlating engines, I can at least formulate the use case and hope it may be possible.
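To make the idea a bit more concrete, here is a minimal Python sketch of one possible brute-force heuristic. It is not how enca works and not a proposed implementation. It assumes a single wrong step (bytes written in one charset, then read as another), tries to undo it for every pair of candidate codecs, and scores each result by the share of characters that fall in the Hebrew letter range. The codec lists, the `hebrew_ratio` and `candidate_repairs` names, and the use of the certainty threshold as a score cutoff are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical candidate lists; a real tool would derive these per language.
WRONG_CODECS = ["latin-1", "cp1252", "cp437", "mac_roman"]  # how the bytes were misread
RIGHT_CODECS = ["cp1255", "iso8859_8", "cp862", "utf-8"]    # how they were likely written


def hebrew_ratio(text):
    """Fraction of non-space characters in the Hebrew letter range U+05D0..U+05EA."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\u05d0" <= c <= "\u05ea" for c in chars) / len(chars)


def candidate_repairs(garbled, certainty=0.8):
    """Undo one misinterpretation step for every codec pair and keep plausible results."""
    results = []
    for wrong, right in product(WRONG_CODECS, RIGHT_CODECS):
        try:
            # Recover the raw bytes the reader saw, then decode them as intended.
            repaired = garbled.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot have produced the garbled string
        score = hebrew_ratio(repaired)
        if score >= certainty:
            results.append((score, right, wrong, repaired))
    results.sort(key=lambda r: -r[0])  # most plausible first
    return results


for score, right, wrong, repaired in candidate_repairs("éä"):
    print(f"{score:.0%} plausibility: {right} falsely interpreted as {wrong} → {repaired}")
```

For a two-character string like "éä", several codec pairs produce text that is entirely Hebrew letters, so the script-ratio score alone cannot rank them. Telling such candidates apart would need longer input or a real language model (letter frequencies, a dictionary), and multi-step chains would need a search over longer codec sequences.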