Project uploaded

abhisheks008 · Jun 7, 2024 · df81699
1 parent c9993dd
commit df81699
Showing 30 changed files with 4,526 additions and 4,921 deletions.
diff --git a/Medical Recommendation System/Dataset/README.md b/Medical Recommendation System/Dataset/README.md
@@ -0,0 +1,33 @@
+# Medical Recommendation System Dataset 
+
+The dataset used here is taken from Kaggle. You can download it from this link: [Medical Recommendation System](https://www.kaggle.com/datasets/noorsaeed/medicine-recommendation-system-dataset)
+
+This dataset includes various medical features that are used to recommend treatments, medications, or other medical interventions. It can be used for building machine learning models to improve decision-making processes in healthcare settings.
+
+The dataset typically includes columns such as patient symptoms, diagnosis, recommended treatments, and possibly demographic information. It is structured to facilitate tasks such as classification, prediction, and recommendation in the medical domain.
+
+## About the dataset
+
+- ***Shape of the training dataset:*** 4920 rows × 133 columns
+
+Apart from the Training dataset, there are 6 other important datasets to provide details about the predicted disease:
+
+#### Description:
+- ***Description***: contains descriptions of the predicted disease
+
+#### Diets:
+- ***Description***: contains diets regarding the predicted disease
+
+#### Medications:
+- ***Description***: contains proposed medicine for the predicted disease
+
+#### Precautions:
+- ***Description***: contains precautions to prevent the predicted disease in the future
+
+#### Symptoms:
+- ***Description***: contains symptoms of the predicted disease
+
+#### Workout:
+- ***Description***: contains general recommendations to follow for the predicted disease
+
+---
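The shape quoted in the Dataset README above (4920 rows by 133 columns) can be checked after download. Below is a minimal sketch using only the Python standard library. The helper name `csv_shape` and the tiny in-memory sample, including its column names, are illustrative assumptions and not part of the project.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV file object that has one header row."""
    reader = csv.reader(handle)
    header = next(reader)                   # first row holds the column names
    rows = sum(1 for row in reader if row)  # count data rows, ignoring blank lines
    return rows, len(header)


# Tiny in-memory stand-in for a real file; the column names are made up.
sample = io.StringIO(
    "itching,skin_rash,prognosis\n"
    "1,1,Fungal infection\n"
    "0,1,Allergy\n"
)
shape = csv_shape(sample)
print(shape)  # (2, 3)
```

To check the downloaded data, pass an open file handle instead, for example `open("Dataset/medical_dataset.csv", newline="")`. That this file is the training dataset, and that it has a single header row, are assumptions based on the file names in this commit.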
diff --git a/...ecommendation System/Symptom-severity.csv → ...ation System/Dataset/Symptom-severity.csv b/...ecommendation System/Symptom-severity.csv → ...ation System/Dataset/Symptom-severity.csv
diff --git a/...cal Recommendation System/description.csv → ...mmendation System/Dataset/description.csv b/...cal Recommendation System/description.csv → ...mmendation System/Dataset/description.csv
diff --git a/Medical Recommendation System/diets.csv → ...l Recommendation System/Dataset/diets.csv b/Medical Recommendation System/diets.csv → ...l Recommendation System/Dataset/diets.csv
diff --git a/...Recommendation System/medical_dataset.csv → ...dation System/Dataset/medical_dataset.csv b/...Recommendation System/medical_dataset.csv → ...dation System/Dataset/medical_dataset.csv
diff --git a/...cal Recommendation System/medications.csv → ...mmendation System/Dataset/medications.csv b/...cal Recommendation System/medications.csv → ...mmendation System/Dataset/medications.csv
diff --git a/... Recommendation System/precautions_df.csv → ...ndation System/Dataset/precautions_df.csv b/... Recommendation System/precautions_df.csv → ...ndation System/Dataset/precautions_df.csv
diff --git a/Medical Recommendation System/symtoms_df.csv → ...ommendation System/Dataset/symtoms_df.csv b/Medical Recommendation System/symtoms_df.csv → ...ommendation System/Dataset/symtoms_df.csv
diff --git a/Medical Recommendation System/workout_df.csv → ...ommendation System/Dataset/workout_df.csv b/Medical Recommendation System/workout_df.csv → ...ommendation System/Dataset/workout_df.csv
diff --git a/Medical Recommendation System/Images/ANN_model_summary.png b/Medical Recommendation System/Images/ANN_model_summary.png
diff --git a/Medical Recommendation System/Images/Count_of_each_disease.png b/Medical Recommendation System/Images/Count_of_each_disease.png
diff --git a/Medical Recommendation System/Images/README.md b/Medical Recommendation System/Images/README.md
@@ -0,0 +1,74 @@
+<u><h1>EDA on Medical Recommendation Dataset using Machine Learning</h1></u>
+
+<img alt="graph" src="./Images/accuracy_comparison_of_models.png">
+<u><h2>Count of Each Disease :</h2></u>
+
+<img alt="graph" src="./Images/Count_of_each_disease.png">
+
+<u><h2>Correlation Heatmap</h2></u>
+
+<img alt="graph" src="./Images/heatmap.png">
+
+<u><h1>ROC Scores of Different Models</h1></u>
+
+<u><h2> ROC curve: Logistic Regression</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_logistic_regression.png">
+
+<u><h2> ROC curve: Decision Tree</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_decision_tree.png">
+
+<u><h2> ROC curve: Random Forest Classifier</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_random_forest.png">
+
+<u><h2> ROC curve: Support Vector Classifier</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_support_vector_classifier.png">
+
+<u><h2> ROC curve: Gradient Boosting</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_gradient_boosting.png">
+
+<u><h2> ROC curve: Naive Bayes</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_naive_bayes.png">
+
+<u><h2> ROC curve: K-nearest neighbors</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_k_nearest_neighbors.png">
+
+<u><h2> ROC curve: Artificial Neural Networks</h2></u>
+
+<img alt="graph" src="./Images/ROC_curve_artificial_neural_networks.png">
+
+<u><h2> ANN Model Summary </h2></u>
+
+<img alt="graph" src="./Images/ANN_model_summary.png">
+
+<u><h2>Confusion Matrix of Best Model: SVC</h2></u>
+
+A confusion matrix is a table that summarizes the performance of a classification model by counting predictions for each pair of actual and predicted classes. For any single class, it yields the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions.
+
+<img alt="graph" src="./Images/confusion_matrix_of_best_model.png">
+
+<!-- ![Screenshot 2023-06-01 144438](https://github.com/restlesshornet/MRI-Brain-Tumor-Diagnosis/assets/88588444/260ebfcd-e914-4430-be3d-f3ee73eb8071) -->
+
+<u><h2>Accuracy vs. Precision for SVC Model:</h2></u>
+
+Accuracy is the fraction of all predictions that are correct. Precision, for a given class, is the fraction of samples predicted as that class that actually belong to it, TP / (TP + FP); for a multi-class problem the per-class values are typically averaged into a single score. High accuracy means the model is right most of the time, while high precision means it produces few false positives. Both are important metrics for evaluating the performance and reliability of a classifier.
+
+<img alt="graph" src="./Images/SVC_model_accuracy_and_precision.png">
+
+The bar plot compares the accuracy and precision of the SVC model, with one bar for each metric. Both bars are above 95%, indicating that the model makes mostly correct predictions (high accuracy) and that few of its predicted labels are false positives (high precision).
+
+<u><h2> Accuracy comparison:</h2></u>
+
+<img alt="graph" src="./Images/accuracy_comparison_of_models.png">
+
+This plot shows that all models except Decision Tree and Gradient Boosting reach an accuracy of almost 100%.
+
+<u><h2> ROC Score comparison:</h2></u>
+
+<img alt="graph" src="./Images/ROC_comparison_of_models.png">
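The confusion matrix, accuracy, and precision discussed in the Images README above can be computed by hand on a toy example. This is a minimal sketch in plain Python with made-up labels. The function names and the choice of macro averaging are assumptions for illustration; the notebook may compute these metrics differently.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs: the confusion matrix in sparse form."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_per_class(y_true, y_pred):
    """One-vs-rest precision per class: TP / (TP + FP)."""
    counts = confusion_counts(y_true, y_pred)
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = counts[(label, label)]
        # Everything predicted as this label is TP + FP.
        predicted = sum(n for (_, p), n in counts.items() if p == label)
        result[label] = tp / predicted if predicted else 0.0
    return result


# Made-up labels for illustration only.
y_true = ["Allergy", "Allergy", "Malaria", "Malaria", "Typhoid", "Typhoid"]
y_pred = ["Allergy", "Malaria", "Malaria", "Malaria", "Typhoid", "Typhoid"]

acc = accuracy(y_true, y_pred)
prec = precision_per_class(y_true, y_pred)
macro_precision = sum(prec.values()) / len(prec)  # unweighted mean over classes

print(round(acc, 3))              # 0.833
print(round(macro_precision, 3))  # 0.889
```

One "Allergy" sample is misclassified as "Malaria", so accuracy is 5/6, and the precision for "Malaria" drops to 2/3 because one of its three predictions is a false positive.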
diff --git a/Medical Recommendation System/Images/ROC_comparison_of_models.png b/Medical Recommendation System/Images/ROC_comparison_of_models.png
diff --git a/Medical Recommendation System/Images/ROC_curve_artificial_neural_networks.png b/Medical Recommendation System/Images/ROC_curve_artificial_neural_networks.png
diff --git a/Medical Recommendation System/Images/ROC_curve_decision_tree.png b/Medical Recommendation System/Images/ROC_curve_decision_tree.png
diff --git a/Medical Recommendation System/Images/ROC_curve_gradient_boosting.png b/Medical Recommendation System/Images/ROC_curve_gradient_boosting.png
diff --git a/Medical Recommendation System/Images/ROC_curve_k_nearest_neighbors.png b/Medical Recommendation System/Images/ROC_curve_k_nearest_neighbors.png
diff --git a/Medical Recommendation System/Images/ROC_curve_logistic_regression.png b/Medical Recommendation System/Images/ROC_curve_logistic_regression.png
diff --git a/Medical Recommendation System/Images/ROC_curve_naive_bayes.png b/Medical Recommendation System/Images/ROC_curve_naive_bayes.png
diff --git a/Medical Recommendation System/Images/ROC_curve_random_forest.png b/Medical Recommendation System/Images/ROC_curve_random_forest.png
diff --git a/Medical Recommendation System/Images/ROC_curve_support_vector_classifier.png b/Medical Recommendation System/Images/ROC_curve_support_vector_classifier.png
diff --git a/Medical Recommendation System/Images/SVC model -accuracy and precision.png b/Medical Recommendation System/Images/SVC model -accuracy and precision.png
diff --git a/Medical Recommendation System/Images/accuracy_comparison_of_models.png b/Medical Recommendation System/Images/accuracy_comparison_of_models.png
diff --git a/Medical Recommendation System/Images/confusion_matrix_of_best_model.png b/Medical Recommendation System/Images/confusion_matrix_of_best_model.png
diff --git a/Medical Recommendation System/Images/heatmap.png b/Medical Recommendation System/Images/heatmap.png
diff --git a/Medical Recommendation System/Model/README.md b/Medical Recommendation System/Model/README.md
@@ -0,0 +1,86 @@
+# Medical Recommendation System using Machine Learning Algorithms
+
+## 📝 Abstract
+
+A medical recommendation system is increasingly valuable in today's healthcare landscape due to its ability to enhance personalized care, efficiency, and data-driven decision-making. These systems tailor treatments and recommendations to individual patients based on their unique health data, leading to more effective and personalized care. By automating the analysis of patient data, medical recommendation systems provide healthcare professionals with rapid and actionable insights, reducing the time required for diagnosis and treatment planning. They also ensure consistency and accuracy in medical recommendations, adhering to the latest evidence and guidelines, which can improve diagnosis accuracy and treatment outcomes.
+
+### Project Overview
+The Medical Recommendation System project aims to develop a machine-learning-based model that diagnoses a patient from the given symptoms and proposes medications, workouts, diets, and precautions.
+
+### Project Directory Structure
+```
+Medical Recommendation System
+|- Dataset
+  |- 8 datasets
+  |- README.md
+|- Images
+  |- 16 images
+|- Model
+  |- medical_recommendation_system.ipynb
+  |- README.md
+|- README.md
+|- requirements.txt
+```
+
+### Methodology
+1. **Importing Libraries:**  
+   - Libraries such as NumPy, Pandas, TensorFlow, and others are imported for data manipulation, visualization, and model building.
+
+2. **Loading the Dataset:**
+   - The training and testing datasets are loaded into dataframes.
+
+3. **Data Preprocessing:**
+   - The dataset is checked for null values, and exploratory data analysis (EDA) is performed.
+   - The training dataset is split into train and test sets.
+
+4. **Model Structure:**
+   - Eight models have been trained: 
+     - Logistic Regression
+     - Decision Tree
+     - Random Forest Classifier
+     - Support Vector Classifier
+     - Gradient Boosting Classifier
+     - Gaussian Naive Bayes classifier (GaussianNB)
+     - K-Nearest Neighbors
+     - Artificial neural network (ANN) built from dense and dropout layers
+
+5. **Training the Models:**
+   - The ML models are trained on the training set, with parameters set as needed to improve performance.
+   - The ANN model is trained with ReLU and softmax activation functions and `sparse_categorical_crossentropy` as the loss function.
+
+6. **Model Performance:**
+   - The Decision Tree model has a testing accuracy of 87.5%, and the Gradient Boosting model has a testing accuracy of 87.6%.
+   - All other models have a testing accuracy of 100%.
+
+### Model Performance
+#### Comparison of Accuracy
+
+<img alt="graph" src="./Images/accuracy_comparison_of_models.png">
+
+
+#### Comparison of ROC score:
+
+<img alt="graph" src="./Images/ROC_comparison_of_models.png">
+
+
+
+### Conclusion
+- The project explores eight different models for medical recommendation. Each model's performance is evaluated based on accuracy and ROC score.
+- The Support Vector Classifier was selected as the best-performing model.
+- The Decision Tree and Gradient Boosting models did not achieve as high an accuracy as the other models.
+- No overfitting is evident from the accuracy on the training and test sets.
+
+
+## How to Use
+1. **Requirements:** Ensure the necessary libraries and dependencies are installed. The required packages are listed in `requirements.txt`.
+
+2. **Download Data:** Download the dataset from Kaggle as described in the Dataset section of the project.
+
+3. **Run the Jupyter Notebook:** Open the provided Jupyter Notebook and run each cell sequentially. Update any file paths or configurations as needed for your environment.
+
+4. **Training and Evaluation:** Train the models on the provided data and evaluate their performance using metrics such as accuracy.
+
+5. **Interpret Results:** Analyze the models' performance using the visualizations and metrics provided in the notebook.
+
+
+**Name :** **Sebonti Patra**
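
A note on step 5 of the Methodology in the Model README above: softmax turns the raw scores of the ANN's output layer into class probabilities, and sparse categorical cross-entropy scores those probabilities against an integer class label. Below is a minimal sketch of both for a single sample in plain Python. The function names and the example scores are assumptions for illustration; the notebook itself relies on TensorFlow's built-in implementations.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label_index, probs):
    """Loss for one sample: negative log of the probability given to the true class.

    "Sparse" means the label is an integer index, not a one-hot vector.
    """
    return -math.log(probs[label_index])


# Made-up output-layer scores for a 3-class problem; the true class is index 0.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = sparse_categorical_crossentropy(0, probs)

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))                # 0.417
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, and grows when the model puts its confidence on a wrong class.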