chapter05: invalid parameter in TfidfVectorizer #34

Open
@bmerkle

Description

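```python
from sklearn.feature_extraction.text import TfidfVectorizer
from spacy.lang.en.stop_words import STOP_WORDS as stopwords

print(len(stopwords))
# STOP_WORDS is a Python set; scikit-learn 1.2+ rejects a set for stop_words
tfidf = TfidfVectorizer(stop_words=stopwords)
dt = tfidf.fit_transform(headlines["headline_text"])  # headlines: DataFrame loaded earlier in the notebook
dt
```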

Error message:

```
---------------------------------------------------------------------------
InvalidParameterError                     Traceback (most recent call last)
Cell In[33], line 4
      2 print(len(stopwords))
      3 tfidf = TfidfVectorizer(stop_words=stopwords)
----> 4 dt = tfidf.fit_transform(headlines["headline_text"])
      5 dt
```

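Fix: spaCy's `STOP_WORDS` is a Python set, but `TfidfVectorizer` only accepts `"english"`, a list, or `None` for `stop_words`, so convert the set to a list:

```python
tfidf = TfidfVectorizer(stop_words=list(stopwords))
```

Alternatively, use scikit-learn's built-in English stop word list. Note that it is not identical to spaCy's list, so the results can differ slightly:

```python
tfidf = TfidfVectorizer(stop_words="english")
```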

This has to be fixed in several places in the chapter05 notebook.
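To locate the affected cells, a small scan of the notebook's JSON can help. This is a hypothetical helper, not part of the repository: it assumes the set is always bound to the name `stopwords` (as in the snippet above) and reports every code line that passes it directly as `stop_words=stopwords`, which also catches `CountVectorizer` calls:

```python
import re

# Matches stop_words=stopwords but not stop_words=list(stopwords)
PATTERN = re.compile(r"stop_words\s*=\s*stopwords\b")


def find_set_stop_words(nb):
    """Return (cell_index, line_number, line) for each match in a notebook dict."""
    hits = []
    for cell_idx, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # .ipynb files store cell source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        for line_no, line in enumerate(source.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((cell_idx, line_no, line.strip()))
    return hits
```

Usage: load the chapter05 notebook with `json.load` and pass the resulting dict to `find_set_stop_words`; the notebook's exact file name is not given in this issue.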

@christianw @datanizing

Metadata

- Assignees: No one assigned
- Labels: No labels
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests