- Using the median and KNN imputers to fill columns that had 50% of their values missing biased the data in favour of one value.
- Health Insurance was a case that needed more care: it is binary, with 0 denoting a lack of insurance and 1 denoting those who have it. After missing-value imputation, every missing value was substituted with 1, which introduces a critical bias in the medical domain. Hence, dropping the missing values was judged to introduce the least bias, as opposed to presenting a dataset with biased data (see the sketch below).
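  A minimal sketch of the drop decision, assuming a pandas DataFrame `df` whose column is literally named `Health Insurance` (the real name may differ):

  ```python
  # Drop rows with a missing Health Insurance value rather than imputing,
  # since imputation resolved every missing entry to 1 (insured).
  df = df.dropna(subset=["Health Insurance"])
  ```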
- The rest of the missing values are filled via the KNN imputer, as sketched below.
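  A minimal sketch, assuming the remaining columns are numeric; `n_neighbors=5` is scikit-learn's default, not a value stated here:

  ```python
  from sklearn.impute import KNNImputer

  # Fill each remaining missing value from the 5 most similar rows.
  imputer = KNNImputer(n_neighbors=5)
  df[df.columns] = imputer.fit_transform(df)
  ```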
- Categorical variables are converted into numerical variables via ordinal encoding, for example:
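  A minimal sketch with scikit-learn's `OrdinalEncoder`, assuming the categorical columns are the object-typed ones:

  ```python
  from sklearn.preprocessing import OrdinalEncoder

  # Map each category in each object-typed column to an integer code.
  cat_cols = df.select_dtypes(include="object").columns
  df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])
  ```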
- Given that every column contains a discrete variable, no outlier treatment is needed.
- Several feature selection approaches were tried, including recursive feature elimination, which produced fewer features and lower accuracy. Forward feature selection with LightGBM as the estimator provided the best accuracy (see the sketch below).
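  A minimal sketch with scikit-learn's `SequentialFeatureSelector`; `X` (a DataFrame of encoded features), `y` (a single target), and `n_features_to_select` are illustrative assumptions:

  ```python
  from lightgbm import LGBMClassifier
  from sklearn.feature_selection import SequentialFeatureSelector

  # Greedily add one feature at a time to maximise cross-validated accuracy.
  selector = SequentialFeatureSelector(
      LGBMClassifier(random_state=42),
      n_features_to_select=10,  # illustrative; the original count is not stated
      direction="forward",
      scoring="accuracy",
      cv=5,
  )
  selector.fit(X, y)
  selected_cols = X.columns[selector.get_support()]
  ```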
- The features chosen by forward feature selection were used to train XGBoost, Random Forest, CatBoost, and LightGBM, with CatBoost and XGBoost providing the best accuracy; a comparison loop is sketched below.
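  A minimal comparison loop, reusing the assumed `X`, `y`, and `selected_cols` from the sketch above:

  ```python
  from catboost import CatBoostClassifier
  from lightgbm import LGBMClassifier
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score
  from xgboost import XGBClassifier

  models = {
      "XGBoost": XGBClassifier(eval_metric="logloss"),
      "Random Forest": RandomForestClassifier(random_state=42),
      "CatBoost": CatBoostClassifier(verbose=0),
      "LightGBM": LGBMClassifier(random_state=42),
  }
  for name, model in models.items():
      acc = cross_val_score(model, X[selected_cols], y, cv=5, scoring="accuracy").mean()
      print(f"{name}: {acc:.3f}")
  ```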
- Since the accuracy values were not yet satisfactory, hyperparameter tuning techniques such as RandomizedSearchCV, genetic algorithms, Hyperopt, and Optuna were applied (an Optuna sketch follows).
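  A minimal Optuna sketch for XGBoost; the search space is illustrative rather than taken from the project:

  ```python
  import optuna
  from sklearn.model_selection import cross_val_score
  from xgboost import XGBClassifier

  def objective(trial):
      # Illustrative search space; widen or narrow as needed.
      params = {
          "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
          "max_depth": trial.suggest_int("max_depth", 3, 10),
          "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
      }
      model = XGBClassifier(**params, eval_metric="logloss")
      return cross_val_score(model, X[selected_cols], y, cv=5, scoring="accuracy").mean()

  study = optuna.create_study(direction="maximize")
  study.optimize(objective, n_trials=50)
  print(study.best_params)
  ```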
- When Y1 and Y2 were merged into a single 2D target array, these models produced less accurate results.
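  A minimal sketch of the merged-target setup, assuming `y1` and `y2` are the two target columns; `MultiOutputClassifier` is one generic way to fit a 2D target with single-output estimators:

  ```python
  import numpy as np
  from sklearn.multioutput import MultiOutputClassifier
  from xgboost import XGBClassifier

  # Stack the two targets into one array of shape (n_samples, 2).
  Y = np.column_stack([y1, y2])

  # Fit one clone of the estimator per target column.
  multi_model = MultiOutputClassifier(XGBClassifier(eval_metric="logloss"))
  multi_model.fit(X[selected_cols], Y)
  ```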
- Forward feature selection was used to choose the optimal features for both Y1 and Y2; however, choosing different features for Y1 and Y2 makes deployment challenging.
- Since some machine learning algorithms are not designed for multi-output training, rather than combining Y1 and Y2 into one target, Y1 and Y2 are trained separately as two single-output problems (sketched after the next point).
- As Y1 and Y2 were trained independently, accuracy was good.
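  A minimal sketch of the separate training, reusing the assumed names from above:

  ```python
  from catboost import CatBoostClassifier

  # One independent single-output model per target.
  model_y1 = CatBoostClassifier(verbose=0).fit(X[selected_cols], y1)
  model_y2 = CatBoostClassifier(verbose=0).fit(X[selected_cols], y2)

  pred_y1 = model_y1.predict(X[selected_cols])
  pred_y2 = model_y2.predict(X[selected_cols])
  ```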
- Hyperparameter tuning
- Getting higher accuracy by combining Y1 and Y2