Project Overview

This project predicts whether an SMS message is spam.
The dataset used to train the model contains SMS messages, each labeled as spam or not spam.
A client-facing app was built with Streamlit.

Code and Resources Used:

Python Version: 3.10.9
Packages: pandas, matplotlib, sklearn, pickle, streamlit, warnings, nltk, plotly, wordcloud

Using NLP, each SMS message was split into words. Only alphanumeric tokens were kept, which drops punctuation marks, and English stopwords were removed. The remaining words were reduced to their roots with PorterStemmer and added to the dataset as a new column.
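The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `transform_text`, the regex tokenizer, and the small `STOPWORDS` set are assumptions made so the example runs without downloading NLTK corpora. The project itself filters against English stopwords, for which NLTK's stopword list would be the usual choice.

```python
import re

from nltk.stem import PorterStemmer

# Small illustrative stopword set (assumption); a full English stopword
# list would be used in practice.
STOPWORDS = {"a", "an", "the", "you", "have", "to", "is", "now", "your", "and", "or", "of"}

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, tokenize, drop punctuation and stopwords, then stem."""
    # Split into word tokens and single punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # Keep only alphanumeric tokens that are not stopwords.
    kept = [t for t in tokens if t.isalnum() and t not in STOPWORDS]
    # Reduce each remaining word to its root.
    return " ".join(stemmer.stem(t) for t in kept)


cleaned = transform_text("Congratulations! You are winning a FREE prize, call now")
print(cleaned)
```

With pandas, a function like this could be applied to the message column to produce the new cleaned-text column.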

At this stage, an interactive pie chart was drawn to show the percentages of spam and non-spam messages in the dataset.
The chart shows that the dataset is highly imbalanced, as there is a large difference between the percentages of spam and non-spam messages.
I used word clouds to visualize the most frequent words in spam messages and non-spam messages.
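The class percentages behind the pie chart can also be checked numerically. The sketch below uses a tiny made-up DataFrame, and the column name `label` and the values `ham`/`spam` are assumptions for illustration. It does not reproduce the real dataset's proportions.

```python
import pandas as pd

# Toy data (hypothetical): 8 non-spam and 2 spam messages.
df = pd.DataFrame({"label": ["ham"] * 8 + ["spam"] * 2})

# Share of each class as a percentage of all rows.
percentages = df["label"].value_counts(normalize=True).mul(100).round(1)
print(percentages)
```

A large gap between the two percentages is what indicates class imbalance.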

Word frequencies were calculated with CountVectorizer, and word weights were calculated with TfidfVectorizer.
A BernoulliNB model was then fitted and used for prediction, reaching an accuracy of 0.96.
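A minimal sketch of the vectorize-then-fit step, assuming a toy corpus in place of the real dataset. The texts, labels, and variable names here are hypothetical, and the example does not reproduce the reported accuracy. Note that BernoulliNB binarizes its input by default, so the TF-IDF weights are effectively treated as word presence or absence.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy corpus (hypothetical), standing in for the cleaned SMS column.
texts = [
    "win free prize now",
    "free cash call now",
    "claim your free prize",
    "see you at lunch",
    "are we meeting tomorrow",
    "call me when you get home",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each message into a vector of TF-IDF weights.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Fit the Bernoulli Naive Bayes classifier on the vectors.
model = BernoulliNB()
model.fit(X, labels)

# New text must go through the same fitted vectorizer before prediction.
pred = model.predict(vectorizer.transform(["free prize now"]))
print(pred[0])
```

On real data, the messages would first be split into training and test sets so that accuracy is measured on messages the model has not seen.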

In the Streamlit app, the previously saved model predicts whether SMS text entered by the user is spam.
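The save, load, and predict flow behind the app might look roughly like this. It is a sketch under assumptions: the toy training data and the helper `predict_message` are hypothetical, and the objects are pickled in memory here, whereas the app would load them from files on disk. In a Streamlit script, the message would come from a text input widget such as `st.text_area`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy training data (hypothetical), only so the example is self-contained.
texts = [
    "win free prize now",
    "free cash call now",
    "claim your free prize",
    "see you at lunch",
    "are we meeting tomorrow",
    "call me when you get home",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

vectorizer = TfidfVectorizer().fit(texts)
model = BernoulliNB().fit(vectorizer.transform(texts), labels)

# Serialize and restore both objects; the vectorizer must be saved
# alongside the model so new text is encoded the same way.
loaded_vectorizer = pickle.loads(pickle.dumps(vectorizer))
loaded_model = pickle.loads(pickle.dumps(model))


def predict_message(message):
    """Return the predicted label for a single SMS text."""
    features = loaded_vectorizer.transform([message])
    return loaded_model.predict(features)[0]


result = predict_message("free prize now")
print(result)
```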
