README.md
Project Overview

  • This project predicts whether an SMS message is spam.
  • The dataset used to train the model contains SMS messages, each labeled as spam or not spam.
  • A client-facing app was built using Streamlit.

Code and Resources Used:

  • Python Version: 3.10.9
  • Packages: pandas, matplotlib, sklearn, pickle, streamlit, warnings, nltk, plotly, and wordcloud

Data Cleaning

  • Using NLP techniques, each SMS message was split into words. Only alphanumeric tokens were kept, and punctuation marks and English stopwords were removed. The remaining words were reduced to their roots with PorterStemmer, and the result was added to the dataset as a new column.
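
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_message`, the regex tokenizer, and the small inline `STOPWORDS` set are assumptions made so the example runs without downloading NLTK corpora. The project itself presumably used the full NLTK English stopword list, and a regex split may treat words like contractions differently from a per-token alphanumeric check.

```python
import re

from nltk.stem import PorterStemmer

# Assumption: a tiny stand-in for the NLTK English stopword list,
# used here so the sketch needs no corpus download.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
             "have", "for", "and", "of", "in", "on"}

stemmer = PorterStemmer()


def clean_message(text):
    """Lowercase, keep alphanumeric tokens, drop stopwords, stem the rest."""
    # Extract runs of letters and digits; punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove stopwords before stemming.
    kept = [token for token in tokens if token not in STOPWORDS]
    # Reduce each remaining word to its root and rejoin into one string.
    return " ".join(stemmer.stem(token) for token in kept)


cleaned = clean_message("You are WINNING!!!")
print(cleaned)
```

With pandas, a function like this could be applied to the message column (for example with `Series.apply`) to produce the new column of cleaned text.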

EDA

  • At this stage, an interactive pie chart was drawn to show the percentages of spam and non-spam messages in the data set.

  • The chart shows that the data set is highly imbalanced, as there is a large difference between the percentages of spam and non-spam messages.

  • I used word clouds to visualize the most frequent words in spam messages and in non-spam messages.
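
The class percentages behind the pie chart can be computed along these lines. This is a sketch on made-up labels: the column name `label`, the values `ham` and `spam`, and the counts are assumptions for illustration and do not reflect the real dataset.

```python
import pandas as pd

# Toy labels standing in for the real dataset (assumption: 6 ham, 2 spam).
df = pd.DataFrame({"label": ["ham"] * 6 + ["spam"] * 2})

# Share of each class as a percentage of all messages.
percentages = (df["label"].value_counts(normalize=True) * 100).round(1)

# A simple imbalance indicator: majority share divided by minority share.
imbalance_ratio = percentages.max() / percentages.min()

print(percentages)
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```

A ratio far above 1 indicates that one class dominates, which is worth keeping in mind when reading an accuracy score, since a model can score well by favoring the majority class.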

Model Building

  • Word frequencies were calculated with CountVectorizer, and word weights were calculated with TfidfVectorizer.
  • A BernoulliNB model was then fitted and used for prediction, reaching an accuracy of 0.96.

APP

  • Using Streamlit, the app predicts whether SMS text entered by the user is spam, based on the previously saved model.
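
One way such an app could be wired together is sketched below. Everything here is illustrative: the helper names `classify` and `run_app`, the widget labels, and the toy model are assumptions, and the in-memory pickle round trip merely stands in for loading the project's saved model file, whose name is not given here.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB


def classify(text, vectorizer, model):
    """Return 'spam' or 'not spam' for a single message."""
    features = vectorizer.transform([text])
    return "spam" if model.predict(features)[0] == 1 else "not spam"


def run_app(vectorizer, model):
    """Hypothetical Streamlit front end; streamlit is imported only when called."""
    import streamlit as st

    st.title("SMS Spam Classifier")
    message = st.text_area("Enter an SMS message")
    if st.button("Predict") and message.strip():
        st.write(classify(message, vectorizer, model))


# Toy stand-in for the pre-trained model (1 = spam, 0 = not spam).
toy_texts = ["win a free prize now", "free cash prize claim now",
             "see you at lunch", "call me when you are home"]
toy_labels = [1, 1, 0, 0]
vec = TfidfVectorizer().fit(toy_texts)
clf = BernoulliNB().fit(vec.transform(toy_texts), toy_labels)

# Round trip through pickle, mimicking saving the model and loading it in the app.
restored_vec, restored_clf = pickle.loads(pickle.dumps((vec, clf)))

print(classify("free prize now", restored_vec, restored_clf))
```

To serve the interface, a script would call `run_app(restored_vec, restored_clf)` at the end and be launched with `streamlit run` followed by the script's file name. Any cleaning applied to the training messages should also be applied to user input before it is vectorized.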