Water Quality Prediction Project

Objective

The primary objective of this project was to develop a predictive model that classifies water as potable (safe to drink) or non-potable. The project was part of our second-year AI course.

Data Preprocessing and Cleaning

Data Source: The water quality dataset was provided by our college and included various chemical and physical parameters.
Data Cleaning: We imputed missing values, handled outliers, and corrected inconsistencies in the dataset.
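
The README does not say which imputation or outlier strategy was used, so the following is only a minimal sketch under assumptions: median imputation followed by clipping to the 1.5 * IQR fences. The column names `ph` and `turbidity` and the helper `clean_features` are hypothetical and not taken from the project.

```python
import numpy as np
import pandas as pd


def clean_features(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    """Impute missing values with the column median, then clip outliers.

    Only the listed feature columns are touched, so a 0/1 label column
    is never clipped by accident.
    """
    cleaned = df.copy()
    for col in feature_cols:
        # Assumption: median imputation (the project's actual method is not stated).
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        # Assumption: outliers are values outside the 1.5 * IQR fences.
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        cleaned[col] = cleaned[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return cleaned


# Toy data with hypothetical column names.
sample = pd.DataFrame(
    {
        "ph": [7.0, 6.8, np.nan, 7.2, 7.1, 14.0],
        "turbidity": [3.9, 4.1, 4.0, np.nan, 3.8, 4.2],
    }
)
cleaned = clean_features(sample, ["ph", "turbidity"])
print(cleaned)
```

Clipping keeps every row, which matters for a small dataset; dropping outlier rows would be an equally plausible choice.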

Handling Class Imbalance

Random Oversampling: Potable samples were significantly fewer than non-potable ones. To address this class imbalance, we used random oversampling, which balances the dataset by duplicating randomly chosen samples from the minority class.
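
Random oversampling can be sketched in a few lines of NumPy. The README does not say which library was used, so `random_oversample` is a hypothetical helper and the toy arrays stand in for the real data. A common precaution, assumed here rather than stated in the project, is to oversample only the training split so that duplicated rows do not leak into the test set.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Sample with replacement to make up the shortfall (zero for the majority class).
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        selected.append(np.concatenate([cls_idx, extra]))
    # Shuffle so the duplicated rows are not grouped at the end.
    order = rng.permutation(np.concatenate(selected))
    return X[order], y[order]


# Toy training data: 8 non-potable (0) and 2 potable (1) samples.
X_train = np.arange(20, dtype=float).reshape(10, 2)
y_train = np.array([0] * 8 + [1] * 2)

X_bal, y_bal = random_oversample(X_train, y_train)
print(np.bincount(y_bal))
```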

Model Training

We trained and evaluated six different machine learning models to predict water quality:

Logistic Regression
Decision Tree
Random Forest
Support Vector Machine (SVM)
Naive Bayes
k-Nearest Neighbors (k-NN)
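
As an illustration, the six candidates could be set up with scikit-learn as below. This is a sketch, not the project's code: the synthetic data from `make_classification`, the choice of `GaussianNB` as the Naive Bayes variant, and the feature scaling for the distance- and margin-based models are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the (unavailable) college dataset.
X, y = make_classification(n_samples=300, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Logistic regression, SVM and k-NN are sensitive to feature scale,
# so they are wrapped in a pipeline with a scaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```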

Hyperparameter Tuning

Grid Search: We tuned each model with grid search and cross-validation, exhaustively evaluating every combination in a predefined parameter grid and keeping the combination with the best cross-validated score.
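
A minimal sketch of this step with scikit-learn's `GridSearchCV`, shown for a Random Forest only. The parameter grid, the 3-fold split, the F1 scoring metric, and the synthetic data are illustrative assumptions; the grids actually used in the project are not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=9, random_state=0)

# Hypothetical grid: 2 x 2 = 4 combinations, each scored with 3-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the model refit on all the data passed to `fit` with the best parameters, so it can be evaluated directly on a held-out test set.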

Model Evaluation

We compared the models on accuracy, precision, recall, F1-score, and ROC-AUC to select the best performer.
The Random Forest model emerged as the top performer, providing a good balance between bias and variance, along with robust predictive performance.
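
The five metrics can be collected with `sklearn.metrics` as sketched below. The helper `evaluate`, the synthetic data, and the convention that label 1 means potable are assumptions made for illustration. ROC-AUC is computed from predicted probabilities rather than hard labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return the five comparison metrics for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    # Probability of the positive class (assumed here: 1 = potable).
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }


# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=9, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

forest = RandomForestClassifier(random_state=1).fit(X_train, y_train)
metrics = evaluate(forest, X_test, y_test)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```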

Deployment with Streamlit

User Interface: Developed a user-friendly GUI using Streamlit, allowing users to input water quality parameters and get instant predictions on water potability.
Real-time Prediction: The Streamlit app integrated the trained model to provide real-time predictions, making it accessible for non-technical stakeholders to assess water quality quickly and easily.
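
A Streamlit front end of this kind might look roughly like the sketch below. Everything specific in it is an assumption: the feature names in `FEATURES`, the file name `model.joblib`, persisting the model with joblib, and the convention that class 1 means potable. The prediction logic is kept in a plain function so it can be checked without starting the app.

```python
import numpy as np

# Hypothetical feature names and model path; the real app's inputs may differ.
FEATURES = ["ph", "hardness", "turbidity"]
MODEL_PATH = "model.joblib"


def predict_potability(model, values):
    """Turn one set of input values into a human-readable prediction."""
    row = np.asarray(values, dtype=float).reshape(1, -1)
    # Assumption: the model was trained with 1 = potable, 0 = non-potable.
    return "Potable" if int(model.predict(row)[0]) == 1 else "Not potable"


def run_app(model_path=MODEL_PATH):
    """Build the Streamlit page: one numeric input per feature and a predict button."""
    # Imported lazily so the helper above can be used without Streamlit installed.
    import joblib
    import streamlit as st

    st.title("Water Potability Prediction")
    model = joblib.load(model_path)
    values = [st.number_input(name, value=0.0) for name in FEATURES]
    if st.button("Predict"):
        st.write(predict_potability(model, values))


# Entry point when saved as app.py and launched with `streamlit run app.py`:
# run_app()
```

The order of `FEATURES` must match the column order the model was trained on, since the values are passed to the model positionally.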

Conclusion

This project demonstrated a comprehensive approach to solving the water quality prediction problem. By combining robust data preprocessing, addressing class imbalance with random oversampling, evaluating multiple models, and fine-tuning them with grid search, we developed a reliable and accessible water quality prediction system. The deployment through Streamlit ensured that the solution was user-friendly and could be utilized effectively by stakeholders for real-world applications.
