This project first builds a binary classifier based on naïve Bayes that determines whether a text is spam. It then tests how the classifier's performance depends on the vectorization method (see spam_classifier.py). To keep the scope manageable, the project limits itself to four vectorization methods: bag of words, tf-idf, bag of n-grams (n=2), and bag of n-grams (n=1 and 2). Second, the project analyzes the similarity between spam messages (see textual_similarity.py). As a first step toward reproducing this project's work, it is recommended to analyze the data (see data_analysis.py).
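To make the idea of comparing vectorization methods concrete, here is a minimal sketch using only the Python standard library. It is not the code in spam_classifier.py: the names `ngram_counts` and `TinyNaiveBayes`, the toy messages, and the whitespace tokenization are all assumptions made for illustration. It covers the count-based variants (bag of words, bag of n-grams with n=2, and n=1 and 2); tf-idf weighting is left out.

```python
import math
from collections import Counter


def ngram_counts(text, n_values=(1,)):
    """Count word n-grams in `text` for every n in `n_values`."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


class TinyNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self, n_values=(1,)):
        self.n_values = n_values

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)
        self.feature_counts = {label: Counter() for label in self.class_docs}
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngram_counts(text, self.n_values))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        features = ngram_counts(text, self.n_values)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feature, freq in features.items():
                if feature in self.vocab:  # skip n-grams never seen in training
                    score += freq * math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this sketch only.
train_texts = [
    "win a free prize now",
    "free cash click now",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

# One classifier per vectorization variant.
configs = {
    "bag of words": (1,),
    "bag of n-grams (n=2)": (2,),
    "bag of n-grams (n=1 and 2)": (1, 2),
}
models = {
    name: TinyNaiveBayes(n_values).fit(train_texts, train_labels)
    for name, n_values in configs.items()
}

for name, model in models.items():
    print(name, "->", model.predict("free prize now"))
```

On a real data set, the same loop would report a metric such as accuracy or F1 on a held-out test split for each variant, which is the kind of comparison the project describes.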
sid030sid/nlp-text-classification-and-textual-similarity