A model that detects spam SMS messages with 0.98 accuracy using NLP and Naive Bayes

gamzeaslan/Spam_Detection_App

Project Overview

  • This project tells you whether an SMS message is spam.
  • The dataset used to train the model contains SMS messages, each labeled as spam or not spam.
  • A client-facing app was built with Streamlit.

Code and Resources Used:

  • Python version: 3.10.9
  • Packages: pandas, matplotlib, sklearn, pickle, streamlit, warnings, nltk, plotly, and wordcloud

Data Cleaning

  • Using NLP techniques, each SMS message was split into words. Only alphanumeric tokens were kept, punctuation marks and English stopwords were removed, and the remaining words were reduced to their roots with PorterStemmer. The resulting text was added to the dataset as a new column.
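
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_message` is hypothetical, a regular expression stands in for whatever tokenizer the project uses, and the small `STOPWORDS` set is an assumption standing in for a full English stopword list such as the one shipped with NLTK.

```python
import re

from nltk.stem import PorterStemmer

# Assumption: a tiny hand-written stopword set for illustration only.
# A full English list (for example NLTK's stopwords corpus) would be used in practice.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "have", "and", "of", "in", "for"}

stemmer = PorterStemmer()


def clean_message(text: str) -> str:
    """Return the stemmed, stopword-free, alphanumeric tokens of a message."""
    # Keep only runs of letters and digits, which also drops punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove stopwords, then reduce each remaining word to its root.
    stems = [stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS]
    return " ".join(stems)


cleaned = clean_message("You have WON a free prize!!! Call now.")
print(cleaned)
```

With pandas, a function like this could be applied to the message column (for example with `Series.apply`) to build the new cleaned-text column.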

EDA

  • At this stage, an interactive pie chart was drawn to show the percentages of spam and non-spam messages in the data set.

  • The chart shows that the data set is heavily imbalanced, as there is a large difference between the percentages of spam and non-spam messages.

  • Word clouds were used to visualize the most frequent words in spam messages and in non-spam messages.
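
As a rough illustration of the class-balance check, the percentages behind such a pie chart can be computed with pandas. The column name `label` and the toy values below are assumptions for the example, not the project's real data.

```python
import pandas as pd

# Assumption: toy labels standing in for the real dataset's spam / non-spam column.
df = pd.DataFrame({"label": ["ham", "ham", "ham", "spam", "ham", "ham", "spam", "ham"]})

# Share of each class as a percentage of all messages.
class_percentages = df["label"].value_counts(normalize=True) * 100
print(class_percentages)
```

A large gap between the two percentages is what indicates an imbalanced data set.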

Model Building

  • Word frequencies were computed with scikit-learn's CountVectorizer, and word weights were computed with TfidfVectorizer.
  • A BernoulliNB classifier was then fitted and used for prediction, giving an accuracy of 0.96.
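
The modeling steps could look something like the sketch below. It is an assumption-laden illustration rather than the project's code: the toy messages and labels are invented, and `CountVectorizer` followed by `TfidfTransformer` is used because that pair is equivalent to `TfidfVectorizer` while keeping the count and weighting steps visible.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Assumption: invented toy data; 1 = spam, 0 = not spam.
train_texts = [
    "win free prize now",
    "free cash prize claim",
    "claim free win cash",
    "see you at lunch",
    "are you home tonight",
    "lunch at home tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["claim your free cash", "see you tomorrow"]
test_labels = [1, 0]

model = Pipeline([
    ("counts", CountVectorizer()),   # word frequencies
    ("tfidf", TfidfTransformer()),   # word weights
    # BernoulliNB binarizes its input by default (binarize=0.0),
    # so it effectively works with word presence or absence.
    ("nb", BernoulliNB()),
])
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
print(f"accuracy on the toy hold-out set: {accuracy:.2f}")
```

The accuracy printed here reflects only the toy data and says nothing about the figure reported for the real dataset.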

APP

  • The Streamlit app loads the previously saved model and predicts whether the SMS text entered by the user is spam.
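
Saving and reloading the model with pickle might be sketched as below. The helper name `predict_label`, the toy training data, and the in-memory byte string are all assumptions for illustration; the project presumably writes the pickled model to a file instead.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Assumption: invented toy data; 1 = spam, 0 = not spam.
texts = [
    "win free prize now",
    "free cash prize claim",
    "claim free win cash",
    "see you at lunch",
    "are you home tonight",
    "lunch at home tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

trained = make_pipeline(TfidfVectorizer(), BernoulliNB()).fit(texts, labels)

# Serialize the fitted pipeline, then restore it as the app would at startup.
saved_bytes = pickle.dumps(trained)
loaded_model = pickle.loads(saved_bytes)


def predict_label(message: str) -> str:
    """Classify one message with the reloaded model (hypothetical helper)."""
    return "spam" if loaded_model.predict([message])[0] == 1 else "not spam"


print(predict_label("free prize"))
```

In a Streamlit script, the text passed to `predict_label` would typically come from an input widget such as `st.text_area`, and the result could be shown with `st.write`.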
