UnicodeDecodeError #11

dkltimon · 2015-02-03T14:22:14Z

Hi Allen,

https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words

def split_text(filename, n_words):
    """Split a text into chunks approximately n_words words in length."""
    input = open(filename, 'r')
    words = input.read().split(' ')
    input.close()

The issue is at the line "input = open(filename, 'r')".

I wonder whether "input = open(filename, 'r', encoding='UTF-8')" would be better.

Otherwise you may get the error message "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>".
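For illustration, here is a minimal sketch of the suggested change. The excerpt above stops after input.close(), so the chunking step and the return value below are assumptions about what the function is meant to do, not a copy of the tutorial's code.

```python
def split_text(filename, n_words):
    """Split a text into chunks approximately n_words words in length."""
    # Pass the encoding explicitly instead of relying on the platform default.
    with open(filename, 'r', encoding='utf-8') as f:
        words = f.read().split(' ')
    # Assumed chunking step: join consecutive runs of n_words words.
    return [' '.join(words[i:i + n_words])
            for i in range(0, len(words), n_words)]
```

Using "with" also closes the file if reading fails, which the explicit input.close() call would not do.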

ariddell · 2015-02-03T15:54:17Z

You're completely right. Thanks for the report.

I'm very used to Linux and OS X where the default encoding is frequently utf-8 and you don't need to specify utf-8 under Python 3. For the longest time I assumed that utf-8 was actually the fixed default for Python 3.
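A small sketch of how to check this on a given machine, and of how the reported error can arise. It assumes the Windows default in question is cp1252, which the thread does not state explicitly; the sample character is hypothetical and chosen only because its UTF-8 encoding contains the byte 0x8f.

```python
import locale

# Encoding that open() falls back to in text mode when none is passed.
print(locale.getpreferredencoding(False))

# Hypothetical sample: U+010F is the two bytes C4 8F in UTF-8.
data = '\u010f'.encode('utf-8')

try:
    # Assumed Windows default; byte 0x8f has no mapping in cp1252.
    data.decode('cp1252')
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print(err)

# Decoding with the encoding the file was actually written in works.
decoded = data.decode('utf-8')
```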
