Passing Unicode directly raises TypeError in textanalyzer #11

Perhaps I'm doing something wrong, but it seemed worth checking.

My input to the `ReadabilityTool` is text that was originally UTF-8 and has already been decoded into a Python 2 `unicode` object. When I run the readability tests on it, I get a `TypeError`:

```
Traceback (most recent call last):
  File "/Users/uname/projects/news_genome/news_genome/features.py", line 137, in metrics
    flesch_readability(story),
  File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 23, in wrapper
    return fn(text,*args,**kwargs)
  File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 30, in wrapper
    ret = fn(*args,**kwargs)
  File "/Users/uname/projects/news_genome/news_genome/features.py", line 49, in flesch_readability
    contrib_score = rt.FleschReadingEase(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 87, in FleschReadingEase
    self.__analyzeText(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 49, in __analyzeText
    words = t.getWords(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 50, in getWords
    text = self._setEncoding(text)
  File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 130, in _setEncoding
    text = unicode(text, "utf8").encode("utf8")
TypeError: decoding Unicode is not supported
```

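It appears the logic at line 130 of `textanalyzer.py` assumes the input is still a byte string. `unicode(text, "utf8")` tries to decode text that has already been decoded, and Python 2 refuses to decode a `unicode` object. Because the resulting `TypeError` is not a `UnicodeError`, the fallback `except` clauses never catch it: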

```
def _setEncoding(self,text):
    try:
        text = unicode(text, "utf8").encode("utf8")
    except UnicodeError:
        try:
            text = unicode(text, "iso8859_1").encode("utf8")
        except UnicodeError:
            text = unicode(text, "ascii", "replace").encode("utf8")
    return text
```
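One possible library-side change is to decode only when the input is a byte string and to encode already-decoded text directly. The sketch below is a standalone function rather than the actual `textanalyzer` method, and the name `set_encoding` is hypothetical. It keeps the original's behaviour of returning UTF-8 bytes and is written to run under both Python 2.7 (where `bytes` is an alias for `str`) and Python 3.

```python
def set_encoding(text):
    """Return `text` as UTF-8 encoded bytes.

    Hypothetical standalone variant of textanalyzer's _setEncoding that
    accepts either decoded text (unicode / str) or raw bytes.
    """
    # Already-decoded text: there is nothing to decode, so encode directly.
    # This is the case that raises TypeError in the original method.
    if not isinstance(text, bytes):
        return text.encode("utf8")
    try:
        # Most byte input is expected to be UTF-8.
        return text.decode("utf8").encode("utf8")
    except UnicodeError:
        # ISO-8859-1 maps every byte to a code point, so this decode never
        # fails; the original's ASCII-with-replacement fallback is unreachable.
        return text.decode("iso8859_1").encode("utf8")
```

With this version, `set_encoding(u"caf\u00e9")`, `set_encoding(b"caf\xc3\xa9")` and the Latin-1 bytes `set_encoding(b"caf\xe9")` all yield the same UTF-8 bytes.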

Is there something I need to configure in order to make the module expect Unicode by default?
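Until the library accepts decoded input, a caller-side workaround is to hand `ReadabilityTool` UTF-8 bytes, which is what `_setEncoding` expects to decode. A minimal sketch, assuming a hypothetical helper named `to_utf8_bytes` (not part of `nltk_contrib`):

```python
def to_utf8_bytes(text):
    """Encode decoded text to UTF-8 bytes; pass byte strings through unchanged.

    Hypothetical helper for callers of nltk_contrib's ReadabilityTool.
    """
    if isinstance(text, bytes):
        # Already bytes: the library's own _setEncoding will decode it.
        return text
    return text.encode("utf-8")
```

In the `flesch_readability` function from the traceback, the call would then become `contrib_score = rt.FleschReadingEase(to_utf8_bytes(text))`.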
