Allow sparse input for naive bayes classifier #18

cyan198 · 2021-11-23T14:03:38Z

I tried converting a pipeline to pure_sklearn. The pipeline consists of a TfidfVectorizer followed by MultinomialNB. TfidfVectorizer outputs a sparse array, which is passed as input to MultinomialNB. However, the naive Bayes predict method does not accept a sparse array as input (X), as shown in the code below, so it throws an error.

pure-predict/pure_sklearn/naive_bayes.py

Line 25 in c3431b7

X = check_array(X, handle_sparse="error")

Possible solution
I'm not sure why the code above needs to reject sparse input. However, I tried changing it to allow sparse input and tested it. I didn't encounter any issues, and the estimator works as expected.

X = check_array(X, handle_sparse="allow")

Is this the right way?
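
For illustration, here is a rough pure-Python sketch of why I would expect sparse input to be safe for multinomial naive Bayes. It is not the pure_sklearn implementation: the function names, the `{column index: value}` row representation, and the toy model numbers are all assumptions made up for this example. The point is only that the per-class score is a dot product, so features with a zero value contribute nothing and can be skipped.

```python
import math


def joint_log_likelihood_dense(row, class_log_prior, feature_log_prob):
    """Score one dense row (list of floats) against every class."""
    return [
        prior + sum(x * w for x, w in zip(row, weights))
        for prior, weights in zip(class_log_prior, feature_log_prob)
    ]


def joint_log_likelihood_sparse(row, class_log_prior, feature_log_prob):
    """Score one sparse row ({column index: value}) against every class."""
    return [
        prior + sum(x * weights[j] for j, x in row.items())
        for prior, weights in zip(class_log_prior, feature_log_prob)
    ]


def argmax(scores):
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy model: 2 classes, 4 features (made-up probabilities).
class_log_prior = [math.log(0.5), math.log(0.5)]
feature_log_prob = [
    [math.log(p) for p in (0.4, 0.3, 0.2, 0.1)],
    [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)],
]

# The same TF-IDF-like row, once dense and once with only non-zero entries.
dense_row = [0.7, 0.0, 0.0, 0.2]
sparse_row = {0: 0.7, 3: 0.2}

dense_scores = joint_log_likelihood_dense(dense_row, class_log_prior, feature_log_prob)
sparse_scores = joint_log_likelihood_sparse(sparse_row, class_log_prior, feature_log_prob)

dense_prediction = argmax(dense_scores)
sparse_prediction = argmax(sparse_scores)
```

Both representations give the same scores and the same predicted class in this toy case, which matches what I saw after switching to handle_sparse="allow".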

I've created a test method under test_pipeline to test this scenario. I can submit a PR if you want to review.

My dev environment:
Package       Version

fasttext      0.9.2
numpy         1.21.4
pandas        1.3.4
pure-predict  0.0.4
pytest        6.2.5
scikit-learn  1.0.1
scipy         1.7.2

cyan198 added the enhancement New feature or request label Nov 23, 2021

cyan198 assigned denver1117 Nov 23, 2021
