How to handle undefined conversions? #12

Open
krepflap opened this issue Jun 18, 2019 · 1 comment

krepflap commented Jun 18, 2019

I was wondering how to replace characters that have no mapping in the destination encoding with a substitute character, e.g. when converting the euro sign (€) to Shift JIS.

In Ruby, we can do this:

"xx€xx".encode('SHIFT_JIS', 'UTF-8', undef: :replace)
=> "xx?xx"

The €, which cannot be converted, is replaced by a "?" character. This matters for text comparison; see https://unicode.org/reports/tr36/#Text_Comparison, which says:

When converting charsets, never simply omit characters that cannot be converted; at least substitute U+FFFD (when converting to Unicode) or 0x1A (when converting to bytes) to reduce security problems.
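For reference, a minimal Ruby sketch of the substitution that the quoted passage recommends, using the `replace:` option of `String#encode` to emit 0x1A instead of the default "?". The helper name `to_shift_jis` and its `substitute` parameter are only illustrative.

```ruby
# Hypothetical helper: converts UTF-8 text to Shift JIS, substituting a
# replacement for every character that has no mapping in the target encoding.
# Assumption: "\x1A" (SUB) is the substitute byte suggested by UTR #36.
def to_shift_jis(text, substitute = "\x1A")
  text.encode('SHIFT_JIS', 'UTF-8', undef: :replace, replace: substitute)
end

converted = to_shift_jis("xx€xx")
# The euro sign is replaced by the single byte 0x1A rather than dropped.
p converted.bytes
```

Passing "?" as the second argument should give the same result as the plain `undef: :replace` call above.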

Can the same substitution be done with the iconv library in Elixir/Erlang? Currently the undefined character is simply omitted. I could convert character by character and check whether the result is an empty string, but I was hoping there is a more elegant way.

krepflap (author) commented

If anyone stumbles upon this: I'm using the following to handle the case above, though it does call :iconv.convert once for every character.

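  # Converts UTF-8 input to Shift JIS one code point at a time,
  # substituting "?" whenever iconv returns an empty string.
  defp to_shift_jis(input) do
    convert = fn x ->
      case :iconv.convert("utf-8", "shift-jis", <<x::utf8>>) do
        "" -> "?"
        c -> c
      end
    end

    for <<c::utf8 <- input>>, do: convert.(c), into: ""
  end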
