Skip to content

Latest commit

 

History

History
31 lines (17 loc) · 1.85 KB

README.md

File metadata and controls

31 lines (17 loc) · 1.85 KB

Loan-Eligibility

This script utilizes a logistical regression to predict if we should give a loan to a person

Documentation This code imports the necessary libraries including numpy, pandas, missingno, and matplotlib.

The loan-train and loan-test datasets are read using pandas and stored in the variables loan_train and loan_test respectively.

The missing values in the datasets are then filled using either the mode (most frequent value) or the mean for the specific columns.

Categorical variables in the datasets are transformed into numerical values for processing by the machine learning model.

A Logistic Regression model is imported from the sklearn library and trained on the training data.

The accuracy of the model is then computed and the coefficients and intercepts are printed.

Things to Add

Data Visualization: Adding more data visualization techniques like bar plots, histograms, boxplots to better understand the distribution of the data and identify any outliers.

Feature Engineering: Adding new features based on the current data that can improve the model's performance.

Hyperparameter Tuning: Experimenting with different hyperparameter values to improve the model's accuracy and avoid overfitting.

Model Ensemble: Using an ensemble of multiple models to improve the overall performance and reduce the chances of overfitting.

Handling Imbalanced Data: The data is often imbalanced, where one class has a higher frequency than the other. This can be handled by resampling, oversampling, or undersampling to create a balanced dataset.

Handling Missing Values: Dealing with missing values in a better way, such as imputing values or removing the rows with missing values.

Adding Advanced Models: Experimenting with advanced machine learning models like Random Forest, XGBoost, or Artificial Neural Networks to see if they perform better than the current model