Although shill bidding is a common form of auction fraud, it is very hard to detect. The challenge is that fraudulent bidders can behave much like normal bidders. In this research project, I implement unsupervised learning algorithms to cluster and label shill bidding using real business data.
The challenge of this experiment is the records that do not present obvious features of either normal or shill bidding. I labeled those records as 'suspicious' and used supervised classification to split them further. The training data for the supervised classification were the records clearly labeled as normal or shill bidding.
https://github.com/zhaoxin1124ds/Shill-bidding-detection-/blob/main/Shill_bider_clustering.pdf
The data consists of real auction records from eBay, downloaded from the UCI Machine Learning Repository.
I separated the data into 5 groups based on auction duration.
I tested the following unsupervised learning algorithms:
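The grouping step can be sketched in a few lines. This is only an illustration: the column name `Auction_Duration`, the `Record_ID` field, and the toy records are assumptions, not taken from the project code.

```python
from collections import defaultdict


def split_by_duration(records, key="Auction_Duration"):
    """Group records (dicts) by their auction duration value."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


# Toy records with made-up values, only to show the shape of the result.
demo_records = [
    {"Record_ID": 1, "Auction_Duration": 1},
    {"Record_ID": 2, "Auction_Duration": 3},
    {"Record_ID": 3, "Auction_Duration": 5},
    {"Record_ID": 4, "Auction_Duration": 7},
    {"Record_ID": 5, "Auction_Duration": 10},
    {"Record_ID": 6, "Auction_Duration": 7},
]
demo_groups = split_by_duration(demo_records)
```

Each group can then be clustered and labeled independently of the others.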
- K-means
- Agglomerative
- Gaussian Mixture
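A minimal sketch of how the three algorithms could be compared with scikit-learn. The silhouette score, the number of clusters, and the synthetic data are assumptions made for illustration; the summary above does not say which metric or settings were actually used.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def compare_clusterers(X, n_clusters=3, seed=0):
    """Fit the three candidate models and return {name: silhouette score}."""
    models = {
        "K-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
        "Gaussian Mixture": GaussianMixture(n_components=n_clusters, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)  # cluster index for every row
        scores[name] = silhouette_score(X, labels)  # higher is better
    return scores


# Synthetic stand-in data: three well-separated blobs with 8 features each.
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(loc=center, scale=0.05, size=(40, 8)) for center in (0.1, 0.5, 0.9)]
)
demo_scores = compare_clusterers(X_demo)
```

In practice this comparison would be run once per auction duration group, on the real features instead of `X_demo`.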
I designed a "voting" algorithm to label the clusters:
- for each auction duration group, calculate the mean value of each feature (group mean)
- for each cluster, calculate the mean value of each feature (feature mean)
  - if feature mean > group mean: vote 1
  - else: vote 0
- for each cluster, sum the votes (8 features in total)
  - if votes > 4: Shill bidding
  - if votes < 4: Normal bidding
  - else: Suspicious bidding
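The voting rule can be sketched in plain Python as follows. The function name, the data layout (a list of feature rows plus a parallel list of cluster ids), and the toy values are assumptions for illustration; the threshold of 4 votes out of 8 features comes from the description above.

```python
from statistics import mean


def label_clusters(rows, cluster_ids, threshold=4):
    """Label each cluster of one auction duration group by feature voting.

    rows: list of feature lists (8 features per record in this project).
    cluster_ids: cluster assignment for each row, same order as rows.
    Returns {cluster_id: label}.
    """
    n_features = len(rows[0])
    # Group mean: mean of each feature over the whole duration group.
    group_mean = [mean(r[j] for r in rows) for j in range(n_features)]
    labels = {}
    for cid in sorted(set(cluster_ids)):
        members = [r for r, c in zip(rows, cluster_ids) if c == cid]
        # One vote per feature whose cluster mean exceeds the group mean.
        votes = sum(
            1
            for j in range(n_features)
            if mean(m[j] for m in members) > group_mean[j]
        )
        if votes > threshold:
            labels[cid] = "Shill bidding"
        elif votes < threshold:
            labels[cid] = "Normal bidding"
        else:
            labels[cid] = "Suspicious bidding"
    return labels


# Toy group: cluster 0 is low on every feature, cluster 1 is high on every
# feature, and cluster 2 is high on exactly half of them.
demo_rows = [
    [0.0] * 8, [0.0] * 8,
    [1.0] * 8, [1.0] * 8,
    [1.0] * 4 + [0.0] * 4, [1.0] * 4 + [0.0] * 4,
]
demo_ids = [0, 0, 1, 1, 2, 2]
demo_labels = label_clusters(demo_rows, demo_ids)
```

With these toy values, cluster 0 collects 0 votes, cluster 1 collects 8, and cluster 2 collects exactly 4, so the three clusters fall into the normal, shill, and suspicious categories respectively.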
I used the records recognized as normal or shill bidding as the training set for a model that further splits the suspicious records.
- K-means was determined to have the best clustering performance.
- 6% of records were recognized as shill bidding by unsupervised clustering.
- 28% of records were recognized as suspicious bidding by unsupervised clustering.
- After supervised classification of the suspicious records, a total of 9% of records were recognized as shill bidding.
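A hedged sketch of this step with scikit-learn. The classifier is not named in this summary, so `RandomForestClassifier` is an assumption chosen for illustration, as are the function name and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def split_suspicious(X_labeled, y_labeled, X_suspicious, seed=0):
    """Train on clearly labeled records, then classify the suspicious ones.

    y_labeled uses 0 for normal bidding and 1 for shill bidding.
    """
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_labeled, y_labeled)
    return clf.predict(X_suspicious)


# Synthetic stand-in data with 8 features: normal records sit near 0.2,
# shill records near 0.8.
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.2, scale=0.05, size=(50, 8))
X_shill = rng.normal(loc=0.8, scale=0.05, size=(50, 8))
X_train = np.vstack([X_normal, X_shill])
y_train = np.array([0] * 50 + [1] * 50)

# Two made-up "suspicious" records, one closer to each class.
X_susp = np.array([[0.25] * 8, [0.75] * 8])
demo_pred = split_suspicious(X_train, y_train, X_susp)
```

The predictions for the suspicious records are then merged with the cluster-based labels to give the final share of shill bidding.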
- After being fed more data, this model should be able to predict shill bidding on eBay more efficiently.