Importing and exporting to UTF-8 #41

Open
LienReyserhove opened this issue Oct 25, 2017 · 1 comment

@LienReyserhove
Contributor

There seems to be a problem with the data encoding when importing the taxon dataset from, and exporting it to, a .csv file.

With respect to the importing:

A warning message emerges:

Warning message: In Sys.setlocale("LC_ALL", "en_US.UTF-8") : OS reports request to set locale to "en_US.UTF-8" cannot be honored

This warning only occurs on Windows and goes away after applying

Sys.setlocale(, "English_United States.1252")

See this table of locales and the following GitHub issue.

I have already adapted this in the script.

With respect to the exporting:

The taxon data is not exported as UTF-8, despite the fileEncoding = "UTF-8" argument in the write.csv statement.

E.g. in line 188 of the taxon dataset, the special character č in the scientificName Pastinaca sativa L. subsp. urens (Req. ex Godr.) čelak is converted to c.

I suspect this problem is somehow due to a printing bug in R, even though the data is correctly read and stored. Apparently, the print() method on data.frame tries to round-trip characters through the active encoding, which is lossy when converting UTF-8 encoded characters (see this GitHub issue). However, this is just a hypothesis; I have tried several other options without result.
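One way to tell a printing problem apart from a real export problem is to inspect the bytes of the exported file instead of printing the data. The sketch below is only an illustration, not a fix. `contains_bytes()` is a hypothetical helper, and the one-row `taxon` data frame is a stand-in for the real dataset. In UTF-8, č (U+010D) is the byte pair C4 8D, so those two bytes should be present in the file if the export preserved the character.

```r
# Hypothetical helper: TRUE if the file at `path` contains the raw byte
# sequence `needle` anywhere in its contents.
contains_bytes <- function(path, needle) {
  haystack <- readBin(path, what = "raw", n = file.size(path))
  n <- length(needle)
  if (length(haystack) < n) return(FALSE)
  # Candidate start positions: every place the first byte matches
  starts <- which(haystack == needle[1])
  starts <- starts[starts + n - 1 <= length(haystack)]
  any(vapply(
    starts,
    function(i) all(haystack[i:(i + n - 1)] == needle),
    logical(1)
  ))
}

# Stand-in for the taxon dataset (assumption: a single character column)
taxon <- data.frame(
  scientificName = "Pastinaca sativa L. subsp. urens (Req. ex Godr.) \u010delak",
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")
write.csv(taxon, out_file, row.names = FALSE, fileEncoding = "UTF-8")

# UTF-8 encoding of U+010D (č) is the two bytes C4 8D
c_caron <- as.raw(c(0xc4, 0x8d))
exported_ok <- contains_bytes(out_file, c_caron)
message("File contains UTF-8 encoded \u010d: ", exported_ok)
```

If `exported_ok` is TRUE, the file itself is fine and the lost character is only a display issue. If it is FALSE, the character was lost during the write and the print() hypothesis does not explain it.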

@qgroom do you have any idea what the problem is here?

@peterdesmet
Member

On my Mac OS X:

  • This works: Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
  • This generates an error: Sys.setlocale("LC_CTYPE", "English_Australia.1252")
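Since the two locale names behave differently on Windows and Mac OS X, the script could choose the name based on the platform. This is a minimal sketch, assuming the two locale names quoted in this thread are the ones wanted. `set_platform_locale()` is a hypothetical helper name, not something from the existing script.

```r
# Hypothetical helper: request a platform-appropriate locale for LC_CTYPE.
set_platform_locale <- function() {
  locale <- if (.Platform$OS.type == "windows") {
    "English_United States.1252"  # Windows-style locale name
  } else {
    "en_US.UTF-8"                 # POSIX-style name (Mac OS X, Linux)
  }
  # Sys.setlocale() returns "" and warns when the request cannot be honored;
  # suppress that warning and raise a clearer one instead.
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(result, "")) {
    warning("Locale '", locale, "' is not available on this system")
  }
  invisible(result)
}

set_platform_locale()
```

Setting only LC_CTYPE rather than LC_ALL limits the change to character handling, which is the category used in the two calls above.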
