spacy-vectorizers

Scikit-learn compatible vectorizers built with the spaCy NLP framework.

This repo contains custom scikit-learn compatible classes and vectorizers inspired by CountVectorizer, with more accurate tokenization and lemmatization provided by the spaCy NLP framework. Simple Keras-like punctuation removal is also supported.
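To illustrate the general idea, here is a minimal, dependency-free sketch of a count vectorizer that follows the scikit-learn fit/transform convention, strips punctuation before tokenizing, and accepts a pluggable tokenizer. It is not this repo's implementation: the class name `SketchCountVectorizer`, the helper `strip_punctuation`, and the constant `KERAS_LIKE_FILTERS` are hypothetical, and the filter set is assumed to match the default of the Keras text `Tokenizer`.

```python
from collections import Counter

# Assumption: same characters as the default `filters` of the Keras text
# Tokenizer; the set actually used by this repo may differ.
KERAS_LIKE_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def strip_punctuation(text, filters=KERAS_LIKE_FILTERS):
    """Replace every filtered character with a space."""
    return text.translate(str.maketrans({ch: " " for ch in filters}))


class SketchCountVectorizer:
    """Hypothetical bag-of-words vectorizer with fit/transform methods."""

    def __init__(self, tokenizer=None, lowercase=True):
        # Any callable mapping a string to a list of tokens can be plugged in;
        # whitespace splitting is only a stand-in default.
        self.tokenizer = tokenizer or str.split
        self.lowercase = lowercase

    def _analyze(self, text):
        if self.lowercase:
            text = text.lower()
        return self.tokenizer(strip_punctuation(text))

    def fit(self, docs, y=None):
        # Build a sorted vocabulary so column order is deterministic.
        vocab = sorted({tok for doc in docs for tok in self._analyze(doc)})
        self.vocabulary_ = {tok: i for i, tok in enumerate(vocab)}
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            counts = Counter(self._analyze(doc))
            rows.append([counts.get(tok, 0) for tok in self.vocabulary_])
        return rows

    def fit_transform(self, docs, y=None):
        return self.fit(docs).transform(docs)


docs = ["Hello, world!", "Hello again."]
vectorizer = SketchCountVectorizer()
matrix = vectorizer.fit_transform(docs)
print(vectorizer.vocabulary_)
print(matrix)
```

The `tokenizer` argument is where a spaCy-backed callable would go, for example one that runs a loaded spaCy pipeline over the text and returns each token's `lemma_`. That pairing is an assumption about how such a vectorizer could be wired up, not a description of the classes in this repo.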

Built on (prerequisites):

  • Python 3.5.4
  • scikit-learn 0.19.1
  • spaCy 2.0.4

Usage:

Please refer to the Usage Examples & Tests Jupyter notebook or here.