Losses related to credit card fraud will grow to $43 billion within five years and climb to $408.5 billion globally within the next decade, according to a recent Nilson Report — meaning that credit card fraud detection has become more important than ever.
The sting of these rising costs will be felt by all parties within the payment lifecycle: from banks and credit card companies who foot the bill of such fraud, to the consumers who pay higher fees or receive lower credit scores, to merchants and small businesses who are slapped with chargeback fees.
With digital crime and online fraud of all kinds on the rise, it’s more important than ever for organizations to take firm and clear steps to prevent payment card fraud through advanced technology and strong security measures.
Credit card fraud is the act of using another person’s credit card to make purchases or request cash advances without the cardholder’s knowledge or consent. These criminals may obtain the card itself through physical theft, though increasingly fraudsters are leveraging digital means to steal the credit card number and accompanying personal information to make illicit transactions.
There is some overlap between identity theft and credit card theft. In fact, credit card theft is one of the most common forms of identity theft. In such cases, a fraudster uses an individual’s personal information, which is often stolen as part of a cyberattack or data breach, to open a new account that the victim does not know about. This activity is considered both identity fraud and credit card fraud.
In this machine learning project, we tackle the problem of detecting fraudulent credit card transactions using NumPy, scikit-learn, and a few other Python libraries. We approach the problem by building a binary classifier and experimenting with various machine learning techniques to see which fits best.
The results obtained from this project can be used by various stakeholders within the bank such as
- Credit risk department
- Credit analysts
- Bank fraud team
- Cybersecurity team
For any bank or financial organization, credit card fraud detection is of utmost importance. We have to spot potential fraud so that customers are not charged for goods they haven't purchased. The aim, therefore, is to create a classifier that indicates whether a requested transaction is fraudulent.
Python Version: 3.9.12
Packages: pandas, NumPy, scikit-learn, Matplotlib, Seaborn, imblearn, collections, itertools
Data Source: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
The dataset used in this project was downloaded from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud. I then read the CSV data using the pd.read_csv() command.
After downloading the data, I needed to clean it up so that it was usable for our model. In our case, the dataset did not contain any missing values and the data was of the correct format.
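As a rough sketch of this step (assuming the Kaggle file name creditcard.csv; the exact path will depend on where the download is saved), the loading and sanity checks might look like:

```python
import pandas as pd

# Load the Kaggle credit card fraud data (file name assumed to be creditcard.csv)
df = pd.read_csv("creditcard.csv")

# Basic sanity checks: shape, missing values, and column types
print(df.shape)                  # (284807, 31) for the Kaggle dataset
print(df.isnull().sum().sum())   # 0 -> no missing values
print(df.dtypes.value_counts())  # all columns are numeric
```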
The data consists only of numerical variables; no categorical variables are present. I looked at the distributions of the numeric features. Below are highlights from the data visualization section.
Our dataset suffers from a serious class imbalance problem: genuine (non-fraud) transactions make up more than 99% of the data, while fraudulent transactions constitute only 0.17%.
With such a distribution, if we train our model without addressing the imbalance, it will favour the majority class of genuine transactions (as there is far more data about them) and achieve a high but misleading accuracy.
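To make the imbalance concrete, the class distribution can be checked directly. The sketch below assumes the Kaggle column name Class, where 0 marks genuine and 1 marks fraudulent transactions:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Class 0 = genuine, Class 1 = fraud (Kaggle column convention)
print(df["Class"].value_counts())
print(df["Class"].value_counts(normalize=True) * 100)  # roughly 99.83% vs 0.17%

# Visualize the imbalance
sns.countplot(x="Class", data=df)
plt.title("Genuine (0) vs fraudulent (1) transactions")
plt.show()
```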
The class imbalance problem can be solved by various techniques. Oversampling is one of them.
Oversampling the minority class is one approach to addressing an imbalanced dataset. The simplest version duplicates examples from the minority class, even though these duplicates contribute no new information to the model.
Instead, new examples can be synthesized from the existing ones. The Synthetic Minority Oversampling Technique, or SMOTE for short, is such a data augmentation method for the minority class, and it is available through the imblearn library. Once the training data was balanced, we proceeded with model building.
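A minimal sketch of how SMOTE from imblearn might be applied is shown below; the split ratio and random seed are illustrative, and oversampling is applied only to the training split so that no synthetic samples leak into the test set:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Split first so that synthetic samples never reach the test set
X = df.drop("Class", axis=1)
y = df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Oversample only the training portion
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

print(Counter(y_train))      # heavily imbalanced
print(Counter(y_train_res))  # roughly 50/50 after SMOTE
```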
We train different models on our dataset and observe which algorithm works best for our problem. This is a binary classification problem, since we have to predict only one of two class labels. We can apply a variety of algorithms to this problem, such as Random Forest, Decision Tree, and Support Vector Machine.
In this machine learning project, we build 6 classifiers and see which one works best.
The 6 different classifiers used are:
- Logistic regression Classifier
- Decision tree Classifier
- Random forest Classifier
- K Nearest Neighbor Classifier
- Linear Support Vector Classifier
- Gaussian Naive Bayes Classifier
I chose these algorithms because they are well suited to classification problems with a categorical target variable, and because they are straightforward to implement.
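A hedged sketch of how the six classifiers could be fitted and compared, continuing from the SMOTE snippet above, is given below; the hyperparameters shown are illustrative defaults rather than the exact settings used in this project, and feature scaling is omitted for brevity:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# One instance per classifier; the settings below are illustrative defaults
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "K nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Linear SVC": LinearSVC(max_iter=10000),
    "Gaussian naive Bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train_res, y_train_res)     # train on the SMOTE-balanced data
    accuracy = model.score(X_test, y_test)  # evaluate on the untouched test set
    print(f"{name}: test accuracy = {accuracy:.4f}")
```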
In order to evaluate model performance for the different classifiers, three classification metrics were used (a code sketch covering all three appears after this list):
- Classification report - A classification report is a convenient way to evaluate classification model performance. It displays the precision, recall, f1-score, and support for each class.
Precision is the percentage of correctly predicted positive outcomes out of all predicted positive outcomes. Mathematically, it is the ratio of true positives to the sum of true and false positives, TP / (TP + FP).
Precision therefore measures how many of the predicted positives were correct; it is more concerned with the positive class than with the negative class.
Recall
Recall is the percentage of correctly predicted positive outcomes out of all the actual positive outcomes. Mathematically, it is the ratio of true positives to the sum of true positives and false negatives, TP / (TP + FN). Recall is also called sensitivity.
Recall therefore measures how many of the actual positives were correctly identified.
The f1-score is the harmonic mean of precision and recall. The best possible f1-score is 1.0 and the worst is 0.0. Because it folds precision and recall into a single number, the f1-score is generally lower than accuracy, and the weighted average f1-score, rather than global accuracy, should be used to compare classifier models.
Support is the actual number of occurrences of the class in our dataset.
- Confusion matrix - A confusion matrix is a tool for summarizing the performance of a classification algorithm. A confusion matrix will give us a clear picture of classification model performance and the types of errors produced by the model. It gives us a summary of correct and incorrect predictions broken down by each category. The summary is represented in a tabular form.
Four types of outcomes are possible when evaluating classification model performance. These four outcomes are described below:
True Positives (TP) – True Positives occur when we predict an observation belongs to a certain class and the observation actually belongs to that class.
True Negatives (TN) – True Negatives occur when we predict an observation does not belong to a certain class and the observation actually does not belong to that class.
False Positives (FP) – False Positives occur when we predict an observation belongs to a certain class but the observation actually does not belong to that class. This type of error is called Type I error.
False Negatives (FN) – False Negatives occur when we predict an observation does not belong to a certain class but the observation actually belongs to that class. This is a very serious error and it is called Type II error.
- Roc curve - ROC Curve stands for Receiver Operating Characteristic Curve. An ROC Curve is a plot which shows the performance of a classification model at various classification threshold levels. The ROC Curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold levels. True Positive Rate (TPR) is also called Recall. It is defined as the ratio of TP to (TP + FN). False Positive Rate (FPR) is defined as the ratio of FP to (FP + TN).
At any single threshold we obtain one (FPR, TPR) point; the full ROC curve is traced out by computing these points across many threshold levels. So an ROC curve plots TPR vs FPR at different classification thresholds. Lowering the threshold causes more items to be classified as positive, which increases both the True Positives (TP) and the False Positives (FP).
ROC AUC stands for Receiver Operating Characteristic - Area Under Curve. It is a technique for comparing classifier performance: we measure the area under the ROC curve, i.e. the fraction of the ROC plot that lies underneath the curve. A perfect classifier has a ROC AUC of 1, whereas a purely random classifier has a ROC AUC of 0.5.
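All three metrics map directly onto scikit-learn utilities. The sketch below, continuing with the fitted models from the training snippet above, shows one way they could be computed (LinearSVC has no predict_proba, so its decision function is used for the ROC AUC):

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

for name, model in models.items():
    y_pred = model.predict(X_test)

    print(f"\n=== {name} ===")
    # Precision, recall, f1-score and support per class
    print(classification_report(y_test, y_pred, digits=4))
    # Rows are actual classes, columns are predictions: [[TN, FP], [FN, TP]]
    print(confusion_matrix(y_test, y_pred))

    # ROC AUC needs a continuous score: use probabilities where available,
    # otherwise fall back to the decision function (e.g. for LinearSVC)
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    print(f"ROC AUC: {roc_auc_score(y_test, scores):.4f}")
```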
Based on the results obtained from these metrics, the Random Forest model far outperformed the other approaches on the test and validation sets, as shown below.
This result makes sense intuitively: a random forest is an ensemble of multiple decision trees, so it tends to perform better than a single decision tree or the other individual algorithms.
- In this Python machine learning project, I built binary classifiers using the 6 algorithms above to detect fraudulent credit card transactions. Along the way, I applied techniques to address the class imbalance issue and achieved an accuracy of more than 90%. The Random Forest model yielded very strong performance, with an accuracy of 0.99990035.
- To address the class imbalance problem, we used oversampling via the SMOTE implementation imported from the imblearn library.
- The ROC AUC of our models approaches 1, so we can conclude that our classifiers do a very good job of predicting whether a transaction is genuine or fraudulent.