sid030sid/nlp-text-classification-and-textual-similarity

NLP: Text classification and textual similarity

About

This project first builds a binary classifier based on naïve Bayes that determines whether a text is spam. It then tests how the classifier's performance depends on the vectorization method (see spam_classifier.py). To stay within scope, the project limits itself to four vectorization methods: bag of words, tf-idf, bag of n-grams (n=2), and bag of n-grams (n=1 and 2). Second, the project analyzes the similarity between spam messages (see textual_similarity.py). As a first step toward reproducing this project's work, it is recommended to analyze the data (see data_analysis.py).
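To illustrate how the choice of vectorization feeds into a naïve Bayes spam classifier, here is a minimal standard-library sketch. It is not the code in spam_classifier.py: the helper names (`ngrams`, `train_nb`, `predict_nb`), the whitespace tokenizer, the add-one smoothing, and the four toy messages are all assumptions made for illustration. It covers only the three count-based settings; tf-idf weighting is left out to keep the example short.

```python
import math
from collections import Counter


def ngrams(text, n_min=1, n_max=1):
    """Return word n-grams of sizes n_min..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Count-based vectorization settings named in the README (tf-idf omitted here).
SETTINGS = {
    "bag of words": (1, 1),
    "bag of n-grams (n=2)": (2, 2),
    "bag of n-grams (n=1 and 2)": (1, 2),
}


def train_nb(texts, labels, n_min, n_max):
    """Collect the counts needed for a multinomial naive Bayes model."""
    class_docs = Counter(labels)                     # documents per class
    term_counts = {c: Counter() for c in class_docs}  # n-gram counts per class
    for text, label in zip(texts, labels):
        term_counts[label].update(ngrams(text, n_min, n_max))
    vocab = set()
    for counts in term_counts.values():
        vocab.update(counts)
    return class_docs, term_counts, vocab


def predict_nb(model, text, n_min, n_max):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for gram in ngrams(text, n_min, n_max):
            if gram in vocab:  # n-grams unseen in training are ignored
                score += math.log((term_counts[label][gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the project's dataset.
texts = [
    "win free prize now",
    "free cash win now",
    "see you at lunch",
    "are you free for lunch today",
]
labels = ["spam", "spam", "ham", "ham"]

for name, (n_min, n_max) in SETTINGS.items():
    model = train_nb(texts, labels, n_min, n_max)
    print(name, "->", predict_nb(model, "win free cash", n_min, n_max))
```

On a real dataset, the same loop over settings would be paired with a held-out test split and a metric such as accuracy or F1, so that the vectorization methods can be compared on equal terms.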
