Perhaps I'm doing something wrong, but it seems worth checking.
My input to ReadabilityTool is already Unicode text (a Python unicode object decoded from UTF-8), and I get a TypeError when I run the readability tests on it:
Traceback (most recent call last):
  File "/Users/uname/projects/news_genome/news_genome/features.py", line 137, in metrics
    flesch_readability(story),
  File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 23, in wrapper
    return fn(text,*args,**kwargs)
  File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 30, in wrapper
    ret = fn(*args,**kwargs)
  File "/Users/uname/projects/news_genome/news_genome/features.py", line 49, in flesch_readability
    contrib_score = rt.FleschReadingEase(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 87, in FleschReadingEase
    self.__analyzeText(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 49, in __analyzeText
    words = t.getWords(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 50, in getWords
    text = self._setEncoding(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 130, in _setEncoding
    text = unicode(text, "utf8").encode("utf8")
TypeError: decoding Unicode is not supported
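The failing call can be reproduced without nltk_contrib. This minimal sketch (the names `text_type` and `reproduce` are illustrative, not from the library) calls the Unicode type's constructor with an encoding argument on text that is already Unicode, as _setEncoding does, and reports which exception comes out. It runs on Python 2, where the type is `unicode`, and on Python 3, where it is `str`; the exact error message differs between the two.

```python
try:
    text_type = unicode  # Python 2, the version in the traceback
except NameError:
    text_type = str  # Python 3: str is the Unicode type


def reproduce():
    """Decode already-decoded text the way _setEncoding does and report the error."""
    text = u"caf\u00e9"  # already a Unicode object, not UTF-8 bytes
    try:
        text_type(text, "utf8")
    except UnicodeError:
        # This is the only exception _setEncoding's fallbacks would catch.
        return "UnicodeError"
    except TypeError as exc:
        # TypeError is not a UnicodeError subclass, so it escapes those fallbacks.
        return "TypeError: %s" % exc
    return "no error"


print(reproduce())
```

On either version, `reproduce()` returns a string starting with "TypeError", which is why none of the fallback branches ever run.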
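It appears that _setEncoding (line 130 of textanalyzer.py) always tries to decode its input as UTF-8, which cannot work when the input has already been decoded. The except UnicodeError fallbacks do not help either, because decoding a unicode object raises TypeError rather than UnicodeError: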
def _setEncoding(self,text):
    try:
        text = unicode(text, "utf8").encode("utf8")
    except UnicodeError:
        try:
            text = unicode(text, "iso8859_1").encode("utf8")
        except UnicodeError:
            text = unicode(text, "ascii", "replace").encode("utf8")
    return text
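One possible library-side fix is to check the type first and only decode byte strings. This is a minimal sketch, assuming it is acceptable to patch nltk_contrib locally; `set_encoding` is a hypothetical standalone stand-in for the `_setEncoding` method, not the library's code. The `bytes` check works on Python 2.6+ (where `bytes` is an alias for `str`) and on Python 3.

```python
def set_encoding(text):
    """Sketch of a _setEncoding variant that accepts Unicode or byte strings.

    Like the original method, it always returns UTF-8 encoded bytes.
    """
    if not isinstance(text, bytes):
        # Already-decoded Unicode text (unicode on Python 2, str on
        # Python 3): there is nothing to decode, so encode it directly.
        return text.encode("utf8")
    try:
        return text.decode("utf8").encode("utf8")
    except UnicodeError:
        # ISO-8859-1 maps every byte to a code point, so this decode cannot
        # fail; the original's final ASCII fallback is therefore unreachable.
        return text.decode("iso8859_1").encode("utf8")
```

Byte strings go through the same UTF-8 then ISO-8859-1 sequence as before, so existing callers that pass encoded text should see no change.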
Is there something I need to configure in order to make the module expect Unicode by default?
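Until the library handles this, a caller-side workaround is to encode Unicode text to UTF-8 bytes before it reaches ReadabilityTool, since _setEncoding handles byte strings. A minimal sketch, assuming the rest of the library expects the UTF-8 byte string that _setEncoding returns; `utf8_input` is a hypothetical decorator, not part of nltk_contrib.

```python
def utf8_input(fn):
    """Hypothetical decorator: pass UTF-8 bytes to fn even when given Unicode text."""
    def wrapper(text, *args, **kwargs):
        if not isinstance(text, bytes):
            # Encode so the library's decode step receives bytes, not Unicode.
            text = text.encode("utf8")
        return fn(text, *args, **kwargs)
    return wrapper
```

For example, `flesch = utf8_input(rt.FleschReadingEase)` followed by `flesch(story)`, where `rt` and `story` are the objects from the traceback.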