Loan prediction is a very common real-life problem that every retail bank faces at some point. If done correctly, it can save the bank a lot of man-hours.
It is a classification problem in which we have to predict whether a loan will be approved or not.
- Salary: Applicants with a higher income should have a better chance of loan approval.
- Previous history: Applicants who have repaid their previous debts should have a higher chance of loan approval.
- Loan amount: Loan approval should also depend on the loan amount. The smaller the loan amount, the higher the chance of approval.
- Loan term: Loans for a shorter period and a smaller amount should have a higher chance of approval.
- EMI: The lower the monthly amount to be paid to repay the loan, the higher the chance of loan approval.
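The EMI hypothesis above can be turned into a simple feature. The snippet below is a minimal sketch on made-up numbers: it approximates the monthly instalment as the loan amount divided by the term, ignoring interest. The column name `Loan_Amount_Term` and the assumption that the term is in months are illustrative and should be checked against the actual dataset.

```python
import pandas as pd

# Toy applicants; values, units and the Loan_Amount_Term column are assumptions.
applicants = pd.DataFrame({
    "LoanAmount": [120.0, 300.0, 66.0],
    "Loan_Amount_Term": [360.0, 180.0, 120.0],  # assumed to be in months
})

# Rough monthly instalment, ignoring interest: amount divided by term.
applicants["EMI"] = applicants["LoanAmount"] / applicants["Loan_Amount_Term"]

print(applicants)
```

Under the hypothesis, applicants with a lower `EMI` value would be expected to have a higher chance of approval.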
Predict Loan Eligibility for Dream Housing Finance company

Dream Housing Finance company deals in all kinds of home loans. It has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, and the company then validates the customer's eligibility for the loan.
The company wants to automate the loan eligibility process (in real time) based on the customer details provided in the online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, the company has provided a dataset for identifying the customer segments that are eligible for a loan, so that it can specifically target these customers.
Libraries:
- sklearn
- Matplotlib
- pandas
- seaborn
- NumPy
# Fill missing values in categorical columns with the most frequent value (mode)
train["Gender"] = train["Gender"].fillna(train["Gender"].mode()[0])
train["Married"] = train["Married"].fillna(train["Married"].mode()[0])
train["Dependents"] = train["Dependents"].fillna(train["Dependents"].mode()[0])
train["Self_Employed"] = train["Self_Employed"].fillna(train["Self_Employed"].mode()[0])
train["Credit_History"] = train["Credit_History"].fillna(train["Credit_History"].mode()[0])
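To see what mode imputation does, here is a small self-contained sketch on a toy DataFrame. The data is invented for illustration; only the column names are borrowed from the snippet above. It assigns the result of `fillna` back to the column, which avoids relying on in-place modification of a column selection.

```python
import numpy as np
import pandas as pd

# Toy data with a few missing entries (values are made up).
toy = pd.DataFrame({
    "Gender": ["Male", "Male", np.nan, "Female"],
    "Credit_History": [1.0, np.nan, 1.0, 0.0],
})

for col in ["Gender", "Credit_History"]:
    # mode() ignores NaN by default; [0] picks the most frequent value.
    most_common = toy[col].mode()[0]
    toy[col] = toy[col].fillna(most_common)

print(toy)
```

Mode imputation is a simple choice for categorical columns; it keeps every row but can slightly overstate the share of the majority class.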
Because of outliers, the bulk of the LoanAmount data lies on the left and the right tail is longer, i.e. the distribution is right-skewed. We can use a log transformation to reduce the skewness: it does not affect small values much but shrinks larger values.
# Log transformation
train["LoanAmount_log"] = np.log(train["LoanAmount"])
test_data["LoanAmount_log"] = np.log(test_data["LoanAmount"])
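The effect of the log transformation on skewness can be checked on a small example. This is a minimal sketch with made-up loan amounts containing one large outlier; it is not taken from the project data.

```python
import numpy as np
import pandas as pd

# Made-up loan amounts with one large outlier on the right.
amounts = pd.Series([90.0, 100.0, 110.0, 120.0, 130.0, 150.0, 700.0])
amounts_log = np.log(amounts)

# Sample skewness before and after the transformation.
skew_before = amounts.skew()
skew_after = amounts_log.skew()

print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Note that `np.log` returns `-inf` for zero, so if a column can contain zeros, `np.log1p` may be a safer alternative.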
For optimization, we use cross-validation and hyperparameter tuning.
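Hyperparameter tuning with cross-validation can be sketched with scikit-learn's `GridSearchCV`. The example below is only an illustration: it uses synthetic data from `make_classification` instead of the loan dataset, and the parameter grid is an arbitrary assumption, not the grid used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the prepared loan features and target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid; the values are assumptions chosen to keep the run short.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=cv,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Each parameter combination is scored by 5-fold cross-validation, and `best_params_` holds the combination with the highest mean score.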
- Accuracy:
{'LogisticRegression': [0.7377049180327868, 0.04056751358116207],
'KNeighborsClassifier': [0.6140077302412368, 0.016665887272068795],
'GaussianNB': [0.783406637345062, 0.023724757270509555],
'DecisionTreeClassifier': [0.693815807010529, 0.01954930344811238],
'RandomForestClassifier': [0.7882980141276823, 0.019662051295215595],
'AdaBoostClassifier': [0.7866186858589896, 0.02231507202012775],
'GradientBoostingClassifier': [0.7720111955217913, 0.030242126729569573],
'XGBClassifier': [0.7589497534319605, 0.01834053196492521]}
- roc_auc:
{'LogisticRegression': [0.7613108752272838, 0.0572604451135635],
'KNeighborsClassifier': [0.5091802186461629, 0.025701960476993368],
'GaussianNB': [0.7545628021789013, 0.025945127116381292],
'DecisionTreeClassifier': [0.6466124020458385, 0.031276977676230555],
'RandomForestClassifier': [0.759339761545644, 0.03629165276048184],
'AdaBoostClassifier': [0.7278719044972914, 0.04367138332205218],
'GradientBoostingClassifier': [0.7357179335971906, 0.04536974947781717],
'XGBClassifier': [0.7614087443344408, 0.025850559057545144]}
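The dictionaries above look like `[mean, standard deviation]` pairs of cross-validated scores per model; that reading is an assumption, since the text does not say so. Under that assumption, a comparison of this shape could be produced roughly as follows. The helper `compare` is hypothetical, the data is synthetic, and only three of the models are included to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared loan features and target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
]


def compare(models, X, y, scoring, cv=5):
    """Hypothetical helper: map model name to [mean, std] of CV scores."""
    results = {}
    for model in models:
        scores = cross_val_score(model, X, y, scoring=scoring, cv=cv)
        results[type(model).__name__] = [scores.mean(), scores.std()]
    return results


accuracy = compare(models, X, y, scoring="accuracy")
roc_auc = compare(models, X, y, scoring="roc_auc")

print(accuracy)
print(roc_auc)
```

Reporting the standard deviation next to the mean helps judge whether the gap between two models is larger than the fold-to-fold variation.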
Data Imputation
Cross Validation
Hyperparameter Tuning
If you have any feedback, please reach out at [email protected]
I am an AI enthusiast and a data science and ML practitioner.