Mushroom_Classifier-Python

My postgraduate studies project: A classifier to determine whether a mushroom is poisonous or edible

In this project, a classifier was built to determine if a mushroom is poisonous based on its external appearance characteristics. An attempt was also made to interpret the classifier using the DALEX package and to explain the prediction for a selected observation.

The data was donated to the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Secondary+Mushroom+Dataset) in April 2021. It contains 61069 hypothetical mushroom observations that were randomly simulated from the popular Mushroom Data Set (1987), which contains information on 173 fungal species. The dataset was created as part of the thesis work of D. Heider and G. Hattab at the University of Marburg.

The dataset consists of a target variable, class, which indicates whether the mushroom is poisonous or edible, and 20 predictor variables.
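As a rough illustration of that layout, the sketch below separates the target from the predictors with pandas. The `;` separator, the column names, the `p`/`e` target coding and the helper `split_target` are assumptions made for this example, and the three inline rows are made up to stand in for `secondary_data.csv`.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for secondary_data.csv.
# Assumptions: ';' separator, a target column named 'class',
# and 'p' = poisonous / 'e' = edible.
SAMPLE = """class;cap-diameter;cap-shape;stem-height
p;15.26;x;16.95
e;6.53;f;5.12
p;3.10;b;4.80
"""


def split_target(df, target="class"):
    """Return (X, y): the predictors and a 0/1 target (1 = poisonous)."""
    y = (df[target] == "p").astype(int)
    X = df.drop(columns=[target])
    return X, y


df = pd.read_csv(io.StringIO(SAMPLE), sep=";")
X, y = split_target(df)
print(X.shape, y.tolist())
```

With the real file, the `io.StringIO(SAMPLE)` argument would be replaced by the path to `secondary_data.csv`, provided the separator assumption holds.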

What's interesting?

Label encoding: The dataset consists mostly of categorical variables. Because most machine learning algorithms, including those in scikit-learn, expect numerical inputs, the categorical variables had to be encoded. Label Encoding was chosen over One-Hot Encoding because the qualitative variables have a rather large number of classes, so One-Hot Encoding would generate a very large number of predictors. Encoding was performed with scikit-learn.
DALEX: The project was an opportunity for me to try out the DALEX library, which provides methods for exploring, explaining and visualizing machine learning models. Interest in XAI (explainable artificial intelligence) and in new XAI methods has been growing, driven by the increasing demand for interpreting "black box" models. In this work, I visualized the differences between three models and showed how each model handles the classification of a selected observation (example histogram for the Logistic Regression model below).

Workflow of a project

The project is divided into the following parts:

1. Introduction
2. Data description
3. Preparing data for modeling
3.1. Importing Libraries and Dataset
3.2. EDA
   3.2.1. Continuous Variables
   3.2.2. Qualitative variables
3.3. Data Cleaning (NAs)
   3.3.1. Label Encoding
   3.3.2. Checking correlation between variables
3.4. Split Dataset to Train and Test sets
4. Modeling
4.1.
   4.1.1. Logistic regression
   4.1.2. Support vector machine (SVM)
   4.1.3. Random Forest
4.2. Cross-validation
   4.2.1. Logistic regression
   4.2.2. Support Vector Machine (SVM)
   4.2.3. Random Forest
4.3. Summary
4.4. Interpretation of models with DALEX
5. Conclusions
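The encoding, train/test split, modeling and cross-validation steps from the outline above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the notebook's code: the toy frame, its column names, the helper `label_encode`, the 75/25 split and all hyperparameters are made up for this example. It also counts how many columns One-Hot Encoding would produce for the same categorical variables.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Made-up toy frame standing in for the cleaned dataset; names and values are hypothetical.
toy = pd.DataFrame({
    "cap_shape": list("xfbxfbxfbxfb") * 5,
    "habitat": list("dgdmgdmdgdmg") * 5,
    "cap_diameter": [float(i % 7) for i in range(60)],
    "class": list("pepepe") * 10,
})


def label_encode(df, columns):
    """Return a copy with each listed column mapped to integer codes."""
    out = df.copy()
    for col in columns:
        out[col] = LabelEncoder().fit_transform(out[col])
    return out


cat_cols = ["cap_shape", "habitat"]
encoded = label_encode(toy, cat_cols + ["class"])
X = encoded.drop(columns=["class"])
y = encoded["class"]

# For comparison: number of columns One-Hot Encoding would create for the same variables.
n_onehot = pd.get_dummies(toy[cat_cols], columns=cat_cols).shape[1]
print(f"label-encoded columns: {len(cat_cols)}, one-hot columns: {n_onehot}")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training set, then held-out accuracy.
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    scores[name] = (cv_acc, test_acc)
    print(f"{name}: CV accuracy {cv_acc:.2f}, test accuracy {test_acc:.2f}")
```

Note that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its feature-oriented counterpart and yields the same kind of integer codes for predictor columns.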

Jupyter Notebook 6.3.0

Python libraries:

NumPy
Pandas
Matplotlib.pyplot
Seaborn
scikit-learn
DALEX
