Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when loading Simple Tab Delimited File into Postgres #300

Open
ocourch opened this issue Dec 5, 2023 · 2 comments
Open

Error when loading Simple Tab Delimited File into Postgres #300

ocourch opened this issue Dec 5, 2023 · 2 comments

Comments

@ocourch
Copy link

ocourch commented Dec 5, 2023

Hi, I've been following the instructions here to copy the Simple Tab Delimited File to a Postgres instance. When running the command, I first got this error:

postgres=# \copy backbone FROM 'C:\Users\oscar\Downloads\simple.txt'
ERROR:  character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8"
CONTEXT:  COPY backbone, line 737

I checked that line and the file, and it contained some Cyrillic characters that, when removed, shifted the error down the file:

postgres=# \copy backbone FROM 'C:\Users\oscar\Downloads\simple.txt'
ERROR:  character with byte sequence 0x9d in encoding "WIN1252" has no equivalent in encoding "UTF8"
CONTEXT:  COPY backbone, line 4735

Is this a known issue when attempting to do this in Windows? I've already made sure the Postgres instance and the file are all UTF-8, but I still get the error. Is there a version of the file that copies correctly on Windows?

@ocourch ocourch changed the title Error when loading Simple Tab Delimited File into postgres Error when loading Simple Tab Delimited File into Postgres Dec 5, 2023
@MattBlissett
Copy link
Member

The file is in UTF-8 encoding:

$ head -n 737 simple.txt | tail -n 1
7518485 7895461 \N      f       ACCEPTED        GENUS   {}      7ddf754f-d193-4cc9-b351-99906754a03b    SOURCE  181472999       4  8376456  7434778 7378538 7895461 7518485 \N      35789   Abdullaevia Suleymanov, 1965    Abdullaevia     Abdullaevia     \N      \N \N       Suleymanov      1965    \N      \N      Suleymanov, I. S. (1965). О полиморфиноидных формах и о формах с ячеистым строением стенки среди агглютинированных фораминифер - About polymorphinoid forms and forms with a cellular structure of the wall among agglutinated foraminifera. Доклады Академии наук Узбекской ССР - Reports of the Academy of Sciences of the Uzbek SSR, 7: 46–48. https://www.marinespecies.org/foraminifera/aphia.php?p=sourcedetails&id=286417    {}

$ head -n 737 simple.txt | tail -n 1 | head -c 263 | tail -c 2
л

$ head -n 737 simple.txt | tail -n 1 | head -c 263 | tail -c 2 | hexdump -C
0000000 d0 bb

And the UTF-8 encoding of л is d0 bb

I don't think we have a Windows system with PostgreSQL installed, but it is probably something to do with the client configuration.

According to https://stackoverflow.com/questions/68627853/psql-copy-statement-error-says-character-is-encoded-win1252-despite-file-seeming via https://stackoverflow.com/questions/20952893/postgresql-encoding-problems-on-windows-when-using-psql-command-line-utility it is the terminal ("cmd") that needs to be configured. If this works we can add a note to our documentation.

@ocourch
Copy link
Author

ocourch commented Dec 6, 2023

This worked! Thank you for the help. I had actually tried these commands previously but ran psql through the shortcut to the .bat rather than from the same cmd window. Following the stack overflow and calling the .bat from the same window allowed me to copy successfully.

C:\Users\oscar>"\Program Files\PostgreSQL\16\scripts\runpsql.bat"

I appreciate the quick help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants