Loan Approval Prediction

Loan prediction is a very common real-life problem that every retail bank faces at least once in its lifetime. If done correctly, it can save a retail bank a lot of man-hours.

Objectives:

This is a classification problem: we have to predict whether a loan will be approved or not.

Hypothesis Generation:

Independent Variables:

Salary: Applicants with a high income should have a better chance of loan approval.
Previous history: Applicants who have repaid their previous debts should have a higher chance of loan approval.
Loan amount: Loan approval should also depend on the loan amount. If the loan amount is small, the chance of approval should be high.
Loan term: Loans with a shorter term and a smaller amount should have a higher chance of approval.
EMI: The lower the monthly amount to be paid to repay the loan, the higher the chance of loan approval.
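
The EMI hypothesis can be turned into a feature. The sketch below is only an illustration: it assumes a column named `Loan_Amount_Term` that holds the repayment period in months (that name does not appear in this README), and it approximates the EMI as loan amount divided by term, ignoring interest. The helper name `add_emi` and the toy rows are hypothetical.

```python
import pandas as pd


def add_emi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a rough, interest-free EMI column."""
    out = df.copy()
    # Assumption: Loan_Amount_Term is the repayment period in months.
    out["EMI"] = out["LoanAmount"] / out["Loan_Amount_Term"]
    return out


# Toy rows for illustration, not taken from the real dataset.
toy_loans = pd.DataFrame(
    {"LoanAmount": [120.0, 360.0], "Loan_Amount_Term": [360.0, 180.0]}
)
toy_with_emi = add_emi(toy_loans)
print(toy_with_emi)
```

A real EMI also depends on the interest rate, so this ratio is best read as a rough proxy for the monthly burden.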

Dataset:

Loan Prediction

Predict Loan Eligibility for Dream Housing Finance company

Dream Housing Finance company deals in all kinds of home loans. They have a presence across all urban, semi-urban and rural areas. A customer first applies for a home loan, and after that the company validates the customer's eligibility for the loan.

The company wants to automate the loan eligibility process (in real time) based on the customer details provided while filling in the online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, they have provided a dataset to identify the customer segments that are eligible for a loan, so that they can specifically target these customers.

Data Fields:

Implementation:

Libraries: sklearn, Matplotlib, pandas, seaborn, NumPy

Understanding the data:

Loan approval status:

Numerical Variables:

Categorical Variables:

Missing value imputation:

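# Fill missing values in each column with that column's most frequent value (mode).
# Assigning the result back avoids calling fillna(inplace=True) on a selected column, which newer pandas versions warn about.
train["Gender"] = train["Gender"].fillna(train["Gender"].mode()[0])
train["Married"] = train["Married"].fillna(train["Married"].mode()[0])
train["Dependents"] = train["Dependents"].fillna(train["Dependents"].mode()[0])
train["Self_Employed"] = train["Self_Employed"].fillna(train["Self_Employed"].mode()[0])
train["Credit_History"] = train["Credit_History"].fillna(train["Credit_History"].mode()[0])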
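
The same mode-based fill can also be written as a loop over the columns. The following is a minimal sketch on a small made-up frame; the helper name `fill_with_mode` is hypothetical and not part of this project.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with NaNs in the given columns replaced by each column's mode."""
    out = df.copy()
    for col in columns:
        # mode() ignores NaN; [0] takes the first value when several are tied.
        out[col] = out[col].fillna(out[col].mode()[0])
    return out


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["Male", np.nan, "Male", "Female"],
        "Credit_History": [1.0, 1.0, np.nan, 0.0],
    }
)
filled = fill_with_mode(toy, ["Gender", "Credit_History"])
print(filled.isnull().sum())
```

Returning a copy keeps the original frame untouched, which makes it easy to compare the data before and after imputation.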

Due to the presence of outliers, the bulk of the loan amount data lies on the left and the tail on the right is longer, i.e. the data is right-skewed. We can use a log transformation to reduce the skewness: it does not affect the small values much, but it shrinks the larger values.

# Log transformation
train["LoanAmount_log"] = np.log(train["LoanAmount"])
test_data["LoanAmount_log"] = np.log(test_data["LoanAmount"])
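
One way to check the effect of the transformation is to compare the skewness before and after. The sketch below uses made-up loan amounts with one large value on the right; the numbers are illustrative and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up loan amounts with a long right tail.
loan_amount = pd.Series([20.0, 50.0, 100.0, 150.0, 200.0, 300.0, 700.0])
loan_amount_log = np.log(loan_amount)

# Series.skew() returns the sample skewness; values near 0 mean a roughly symmetric shape.
skew_before = loan_amount.skew()
skew_after = loan_amount_log.skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Note that `np.log` is undefined for zero and returns NaN for missing values, so the column should contain only positive amounts before it is transformed.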

Model Training and Evaluation:

Feature Importances:

Results of various models:

Optimizations:

For optimization, we use cross-validation and hyperparameter tuning.
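
As a rough illustration of these two steps, the sketch below runs 5-fold cross-validation and a small grid search with scikit-learn. It uses synthetic data from `make_classification` as a stand-in for the prepared loan features, and the model choice, fold count and parameter grid are assumptions made for the example, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the prepared loan features and target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Cross-validation: one accuracy score per fold, summarised as mean and std.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Hyperparameter tuning: evaluate every combination in a small, illustrative grid.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Reusing the same `cv` object for both steps keeps the fold splits identical, so the tuned score can be compared with the baseline on equal terms.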

Results after cross validation:

Accuracy:

{'LogisticRegression': [0.7377049180327868, 0.04056751358116207],
 'KNeighborsClassifier': [0.6140077302412368, 0.016665887272068795],
 'GaussianNB': [0.783406637345062, 0.023724757270509555],
 'DecisionTreeClassifier': [0.693815807010529, 0.01954930344811238],
 'RandomForestClassifier': [0.7882980141276823, 0.019662051295215595],
 'AdaBoostClassifier': [0.7866186858589896, 0.02231507202012775],
 'GradientBoostingClassifier': [0.7720111955217913, 0.030242126729569573],
 'XGBClassifier': [0.7589497534319605, 0.01834053196492521]}

roc_auc:

{'LogisticRegression': [0.7613108752272838, 0.0572604451135635],
 'KNeighborsClassifier': [0.5091802186461629, 0.025701960476993368],
 'GaussianNB': [0.7545628021789013, 0.025945127116381292],
 'DecisionTreeClassifier': [0.6466124020458385, 0.031276977676230555],
 'RandomForestClassifier': [0.759339761545644, 0.03629165276048184],
 'AdaBoostClassifier': [0.7278719044972914, 0.04367138332205218],
 'GradientBoostingClassifier': [0.7357179335971906, 0.04536974947781717],
 'XGBClassifier': [0.7614087443344408, 0.025850559057545144]}
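
To compare the models at a glance, the dictionaries can be sorted by their first entry. The sketch below assumes each pair is `[mean score, standard deviation]` across the folds, which this README does not state explicitly; the accuracy values are copied from above and rounded to four decimals.

```python
# Accuracy results from above, rounded; assumed to be [mean, std] per model.
accuracy = {
    "LogisticRegression": [0.7377, 0.0406],
    "KNeighborsClassifier": [0.6140, 0.0167],
    "GaussianNB": [0.7834, 0.0237],
    "DecisionTreeClassifier": [0.6938, 0.0195],
    "RandomForestClassifier": [0.7883, 0.0197],
    "AdaBoostClassifier": [0.7866, 0.0223],
    "GradientBoostingClassifier": [0.7720, 0.0302],
    "XGBClassifier": [0.7589, 0.0183],
}

# Sort by the first entry of each pair, highest first.
ranked = sorted(accuracy.items(), key=lambda item: item[1][0], reverse=True)
for name, (mean, std) in ranked:
    print(f"{name:<28} {mean:.4f} +/- {std:.4f}")
```

The top three accuracy values lie within about 0.005 of each other, well inside the second entry of each pair, so the ordering among them should not be over-interpreted.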

Results after Hyperparameter Tuning:

Lessons Learned

Data imputation, cross-validation, hyperparameter tuning

Feedback

If you have any feedback, please reach out at pradnyapatil671@gmail.com

🚀 About Me

Hi, I'm Pradnya! 👋

I am an AI enthusiast and a data science & ML practitioner.
