- This project predicts whether an SMS message is spam.
- The dataset used to train the model contains SMS messages, each labeled as spam or not spam.
- A client-facing interface was built using Streamlit.
- Python version: 3.10.9
- Packages: pandas, matplotlib, sklearn, pickle, streamlit, warnings, nltk, plotly, and wordcloud
- Using NLP, each SMS message was split into words. Only alphanumeric words that are neither punctuation marks nor English stopwords were kept, and these words were reduced to their roots with PorterStemmer and added to the dataset as a new column.
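
The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `message` and `stemmed`, the helper `preprocess`, the regex tokenizer, and the tiny stopword set are all assumptions made for the example. In practice the full list from `nltk.corpus.stopwords.words("english")` would probably be used, which requires a one-time `nltk.download("stopwords")`.

```python
import re

import pandas as pd
from nltk.stem import PorterStemmer

# Assumption: a tiny illustrative stopword set. The real project most likely
# uses nltk.corpus.stopwords.words("english") instead.
STOPWORDS = {"you", "are", "now", "a", "the", "to", "is"}

stemmer = PorterStemmer()


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, tokenize, drop stopwords, and stem the remaining words."""
    # The regex keeps alphanumeric runs only, so punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [tok for tok in tokens if tok not in stopwords]
    return [stemmer.stem(tok) for tok in kept]


# Hypothetical two-row dataset; the column names are assumptions.
df = pd.DataFrame({"message": ["Calling now!!! You are running", "The meeting is moved"]})

# Store the stemmed words as a new column, joined back into one string.
df["stemmed"] = df["message"].apply(lambda msg: " ".join(preprocess(msg)))
print(df)
```

Joining the stems back into a single string keeps the new column easy to pass to a text vectorizer later, although storing a list of tokens would work as well.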
- At this stage, an interactive pie chart was drawn to show the percentages of spam and non-spam messages in the dataset.
- In conclusion, the dataset is highly imbalanced, as there is a large difference between the percentages of spam and non-spam messages.
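
One way to see why such a difference matters is to compare the class shares with the accuracy of a trivial model that always predicts the majority class. The numbers below come from made-up labels, not from the project's dataset, and the `label` column name is an assumption.

```python
import pandas as pd

# Hypothetical labels used only for illustration.
labels = pd.Series(["ham"] * 9 + ["spam"] * 1, name="label")

# Share of each class, as a fraction of all messages.
shares = labels.value_counts(normalize=True)

# A model that always predicts the majority class is right this often,
# so plain accuracy can look good even if no spam is ever detected.
majority_class = shares.idxmax()
baseline_accuracy = shares.max()

print(shares)
print(f"Always predicting '{majority_class}' gives accuracy {baseline_accuracy:.0%}")
```

With a skewed dataset, metrics such as precision and recall on the spam class are usually more informative than accuracy alone.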
- I used Word Cloud to visualize the most frequent words in spam messages and non-spam messages.