NLP: Text classification and textual similarity

About

This project first builds a binary classifier based on naïve Bayes. Its purpose is to determine whether a text is spam or not. It then tests how the classifier's performance depends on the vectorization method (see spam_classifier.py). To stay within scope, the project limits itself to four vectorization methods: bag of words, tf-idf, bag of n-grams (n=2), and bag of n-grams (n=1 and 2). Second, the project analyzes the similarity between spam messages (see textual_similarity.py). As a first step in reproducing this project's work, it is recommended to analyze the data (see data_analysis.py).
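To make the comparison of vectorization methods concrete, here is a minimal, standard-library-only sketch of a multinomial naïve Bayes classifier trained with each of the four methods. It is not taken from spam_classifier.py: the function names (`ngrams`, `make_vectorizer`, `train`, `predict`), the toy messages, and the simplified tf-idf weighting (smoothed idf, no length normalization) are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(text, n_min, n_max):
    """Return all word n-grams of length n_min..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def make_vectorizer(docs, n_min, n_max, use_idf=False):
    """Build a function mapping a text to {feature: weight}.

    With use_idf=True, counts are scaled by a smoothed inverse document
    frequency learned from `docs` (a simplified tf-idf, an assumption here).
    """
    idf = None
    if use_idf:
        df = Counter(f for d in docs for f in set(ngrams(d, n_min, n_max)))
        idf = {f: math.log((1 + len(docs)) / (1 + c)) + 1 for f, c in df.items()}

    def vectorize(text):
        tf = Counter(ngrams(text, n_min, n_max))
        if idf is None:
            return dict(tf)
        # Features never seen in the training corpus have no idf and are dropped.
        return {f: c * idf[f] for f, c in tf.items() if f in idf}

    return vectorize


def train(docs, labels, vectorize, alpha=1.0):
    """Accumulate per-class feature weights for multinomial naive Bayes."""
    weights = {c: Counter() for c in set(labels)}
    for doc, label in zip(docs, labels):
        for feature, w in vectorize(doc).items():
            weights[label][feature] += w
    vocab = {f for class_weights in weights.values() for f in class_weights}
    priors = {c: labels.count(c) / len(labels) for c in weights}
    return weights, vocab, priors, alpha


def predict(model, vectorize, text):
    """Return the class with the highest log-posterior (Laplace-smoothed)."""
    weights, vocab, priors, alpha = model
    features = {f: w for f, w in vectorize(text).items() if f in vocab}
    scores = {}
    for c in weights:
        total = sum(weights[c].values()) + alpha * len(vocab)
        scores[c] = math.log(priors[c]) + sum(
            w * math.log((weights[c][f] + alpha) / total)
            for f, w in features.items())
    return max(scores, key=scores.get)


# Hypothetical toy data, not the project's data set.
train_docs = ["win a free prize now", "free cash claim your prize",
              "are we meeting for lunch today", "see you at the meeting tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["claim your free prize", "lunch meeting tomorrow"]

# (n_min, n_max, use_idf) for the four methods named in the text.
CONFIGS = {
    "bag of words": (1, 1, False),
    "tf-idf": (1, 1, True),
    "bag of n-grams (n=2)": (2, 2, False),
    "bag of n-grams (n=1 and 2)": (1, 2, False),
}

predictions = {}
for name, (n_min, n_max, use_idf) in CONFIGS.items():
    vectorize = make_vectorizer(train_docs, n_min, n_max, use_idf)
    model = train(train_docs, train_labels, vectorize)
    predictions[name] = [predict(model, vectorize, t) for t in test_docs]
    print(name, predictions[name])
```

On a data set this small every method gives the same answer. Differences in performance only show up on a realistic corpus with a held-out test split, which is presumably what spam_classifier.py measures.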

Folders and files

data
documentation
.gitignore
README.md
data_analysis.py
spam_classifier.py
textual_similarity.py
