def split_text(filename, n_words):
    """Split a text into chunks approximately n_words words in length."""
    input = open(filename, 'r')
    words = input.read().split(' ')
    input.close()
This concerns the line "input = open(filename, 'r')".
Would "input = open(filename, 'r', encoding='UTF-8')" be better?
Otherwise you may get the error message: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>".
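For illustration, here is a minimal sketch of how the function might look with the encoding passed explicitly. The `encoding` parameter and the chunking step are assumptions, since the quoted snippet stops after `input.close()`; only the `open(..., encoding=...)` call reflects the suggestion above.

```python
def split_text(filename, n_words, encoding="utf-8"):
    """Split a text into chunks approximately n_words words in length."""
    # Pass the encoding explicitly so the result does not depend on the
    # platform default (often a legacy code page on Windows).
    with open(filename, "r", encoding=encoding) as f:
        words = f.read().split(" ")
    # Assumed chunking step: the original snippet is cut off before this point.
    return [" ".join(words[i:i + n_words]) for i in range(0, len(words), n_words)]
```

Using a `with` block also closes the file if reading raises an error, and it avoids shadowing the built-in `input`.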
I'm very used to Linux and OS X where the default encoding is frequently utf-8 and you don't need to specify utf-8 under Python 3. For the longest time I assumed that utf-8 was actually the fixed default for Python 3.
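A quick way to see what a given machine will use when `open()` is called in text mode without an `encoding` argument is to ask the `locale` module. This is only a sketch, and the helper name `default_text_encoding` is made up for illustration.

```python
import locale


def default_text_encoding():
    """Return the locale-dependent encoding Python reports for this platform."""
    # Passing False avoids changing the process locale as a side effect.
    return locale.getpreferredencoding(False)


# Typically a UTF-8 variant on Linux and OS X, often a code page such as
# cp1252 on Windows.
print(default_text_encoding())
```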
Hi Allen,
https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words