Project-1: Spam Detection

Overview

This project trains a machine learning model to classify SMS messages as "spam" or "ham" (non-spam). It combines natural language processing (NLP) preprocessing with TF-IDF features and a Multinomial Naive Bayes classifier.

Dataset

The SMS spam dataset used in this project contains text messages labeled as "spam" or "ham". It was sourced from https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset.
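
As a rough illustration of getting the data into a workable shape, the sketch below reads the CSV with pandas and renames the columns. It assumes the Kaggle file uses the headers `v1` (label) and `v2` (message) and Latin-1 encoding; the function name `load_sms` and the in-memory sample are hypothetical stand-ins, not taken from the notebook.

```python
import io

import pandas as pd


def load_sms(source):
    """Read the SMS CSV and return a frame with 'label' and 'text' columns."""
    # Assumption: the file has 'v1'/'v2' headers and is Latin-1 encoded.
    df = pd.read_csv(source, encoding="latin-1", usecols=["v1", "v2"])
    df = df.rename(columns={"v1": "label", "v2": "text"})
    # Drop rows where the label or the message is missing.
    return df.dropna().reset_index(drop=True)


# Tiny in-memory stand-in for spam.csv so the sketch runs without the real file.
sample = io.StringIO(
    "v1,v2,Unnamed: 2\n"
    "ham,See you at lunch?,\n"
    "spam,WINNER! Claim your free prize now,\n"
)
df = load_sms(sample)
print(df)
```

With the real file, the call would presumably be `load_sms("spam.csv")`.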

Steps Involved

1. Data Preprocessing: text cleaning (lowercasing and punctuation removal), tokenization, stopword removal, and stemming with the Porter stemming algorithm.
2. Feature Engineering: TF-IDF vectorization to convert the cleaned text into numerical features.
3. Model Selection and Training: a Multinomial Naive Bayes classifier, chosen for its suitability for text classification, trained on the labeled dataset.
4. Model Evaluation: performance measured with precision, recall, and F1-score, and results visualized with a confusion matrix.
5. Hyperparameter Tuning: GridSearchCV used to tune model parameters such as alpha.
6. Deployment and Usage: the trained model and TF-IDF vectorizer saved for predictions on new SMS messages.
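
The steps above can be sketched roughly as follows with scikit-learn. This is a minimal illustration under several assumptions, not the notebook's actual code: the toy messages are made up, scikit-learn's built-in English stopword list stands in for whichever list the notebook uses, stemming is omitted to keep the sketch short (a Porter stemmer such as `nltk.stem.PorterStemmer` could be applied to each token), and wrapping both stages in a `Pipeline` is a convenience chosen here.

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def clean_text(message):
    """Lowercase, strip punctuation, and drop stopwords (no stemming here)."""
    table = str.maketrans("", "", string.punctuation)
    tokens = message.lower().translate(table).split()
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical toy corpus standing in for spam.csv.
texts = [
    "WINNER! Claim your free prize now",
    "Free entry: text WIN to 80082 today",
    "URGENT! Your account has won a cash reward",
    "Call now for a free ringtone offer",
    "Congratulations, you won a free holiday",
    "Claim your cash prize, reply YES now",
    "Are we still meeting for lunch today?",
    "I'll call you after the lecture",
    "Can you pick up some milk on the way home?",
    "See you at the station at six",
    "Thanks for the notes, really helpful",
    "Running late, start dinner without me",
]
labels = ["spam"] * 6 + ["ham"] * 6

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF features feed a Multinomial Naive Bayes classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_text)),
    ("nb", MultinomialNB()),
])

# Grid search over the smoothing parameter alpha (candidate values are illustrative).
search = GridSearchCV(
    pipeline, {"nb__alpha": [0.1, 0.5, 1.0]}, cv=3, scoring="f1_macro"
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
matrix = confusion_matrix(y_test, y_pred, labels=["ham", "spam"])
print("best alpha:", search.best_params_["nb__alpha"])
print(matrix)
print(classification_report(y_test, y_pred, zero_division=0))
```

Scores on a corpus this small are not meaningful; the point is only the order of the steps. Macro-averaged F1 is used for the search because the labels are strings, which the plain `"f1"` scorer does not accept without a designated positive label.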

Files Included

Z_Rock_ML_Internship_Project_1.ipynb: Jupyter notebook containing the entire project code and detailed explanations.
spam.csv: Dataset used for training and testing the model.
requirements.txt: List of Python dependencies required to run the project.
spam_detection_model.pkl: Serialized Multinomial Naive Bayes classifier trained to classify SMS messages as 'spam' or 'ham'.
tfidf_vectorizer.pkl: Serialized TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer used to transform text data into numerical features for SMS spam detection.
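
To show how the two serialized files fit together, the sketch below saves a classifier and vectorizer under the file names listed above and loads them again to label new messages. It assumes the files were written with Python's `pickle` module (a `.pkl` file could also have been produced by joblib); the helper names and the toy training data are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

MODEL_FILE = "spam_detection_model.pkl"
VECTORIZER_FILE = "tfidf_vectorizer.pkl"


def save_artifacts(model, vectorizer, folder):
    """Pickle the classifier and the fitted vectorizer into `folder`."""
    folder = Path(folder)
    with open(folder / MODEL_FILE, "wb") as fh:
        pickle.dump(model, fh)
    with open(folder / VECTORIZER_FILE, "wb") as fh:
        pickle.dump(vectorizer, fh)


def predict_messages(messages, folder):
    """Load both artifacts from `folder` and return a label per message."""
    folder = Path(folder)
    with open(folder / MODEL_FILE, "rb") as fh:
        model = pickle.load(fh)
    with open(folder / VECTORIZER_FILE, "rb") as fh:
        vectorizer = pickle.load(fh)
    # New text must go through the same fitted vectorizer used in training.
    return model.predict(vectorizer.transform(messages)).tolist()


# Toy model so the round trip runs without the repository's files.
train_texts = [
    "free prize claim now",
    "win cash reward today",
    "see you at lunch",
    "call me after class",
]
train_labels = ["spam", "spam", "ham", "ham"]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, vectorizer, tmp)
    predictions = predict_messages(
        ["claim your free cash prize", "lunch after class?"], tmp
    )
print(predictions)
```

New messages should receive the same cleaning that was applied before training, and pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.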

Usage

Clone the repository using the following commands:

git clone https://github.com/milap573/Zrock-internship-2024-Project1.git
cd Zrock-internship-2024-Project1

Install dependencies

pip install -r requirements.txt
