Kaggle-TalkingData

This repository contains my solution for the Kaggle competition TalkingData AdTracking Fraud Detection. It ranked 197th out of 3967 (top 5%).

The goal of this competition was to predict whether mobile users will install an app after clicking its ad (click-through prediction). The biggest challenge was handling the huge amount of data (about 250 million rows).

This solution is based on 2 different models:

Field-Aware Factorization Machine
Gradient Boosted Decision Tree
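For readers unfamiliar with the first model, the sketch below shows how a Field-Aware Factorization Machine scores one sample. Every feature keeps a separate latent vector for each field, and each pair of active features is scored by the dot product of the vectors each one learned for the *other* feature's field. This is a minimal illustration of the pairwise interaction term only. Some implementations also add a bias and linear terms, which are omitted here. The data layout (`latent[feature][field]`, `(field, feature, value)` triples) is an assumption chosen for readability and is not how xLearn stores its model.

```python
import math
from typing import Mapping, Sequence, Tuple

# One active feature in libffm style: (field index, feature index, value).
Entry = Tuple[int, int, float]
# latent[feature][field] -> latent vector of `feature` used against `field`.
Latent = Mapping[int, Mapping[int, Sequence[float]]]


def ffm_score(sample: Sequence[Entry], latent: Latent) -> float:
    """Pairwise FFM interaction term (no bias or linear term)."""
    score = 0.0
    for a, (field_a, feat_a, x_a) in enumerate(sample):
        for field_b, feat_b, x_b in sample[a + 1:]:
            # Each feature uses the vector it learned for the other feature's field.
            v_a = latent[feat_a][field_b]
            v_b = latent[feat_b][field_a]
            score += sum(p * q for p, q in zip(v_a, v_b)) * x_a * x_b
    return score


def ffm_predict(sample: Sequence[Entry], latent: Latent) -> float:
    """Click probability via a numerically stable logistic function."""
    s = ffm_score(sample, latent)
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


# Toy example: two fields, two features, 2-dimensional latent vectors.
latent = {0: {0: [1.0, 0.0], 1: [1.0, 2.0]},
          1: {0: [3.0, 1.0], 1: [0.0, 0.0]}}
sample = [(0, 0, 1.0), (1, 1, 1.0)]
print(ffm_score(sample, latent))  # 5.0
```

Plain factorization machines would use one vector per feature for all pairs; the per-field vectors are what make the model "field-aware".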

The Field-Aware Factorization Machine was combined with a gradient boosted decision tree (with 30 trees) used for feature engineering. The gradient boosted decision tree is trained in the usual supervised way, but instead of its target predictions, the index of the leaf that each sample falls into in every tree is used as a feature for the Field-Aware Factorization Machine. This approach was proposed by Xinran He et al. in "Practical Lessons from Predicting Clicks on Ads at Facebook" and was used in the winning solution of an earlier click-through prediction competition, the Display Advertising Challenge. Field-Aware Factorization Machines have proved to be very powerful models in past click-through prediction competitions, and they work well with categorical features such as these leaf indices. The library used is xLearn.
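The step that connects the two models can be sketched as follows. LightGBM returns, for each sample, the leaf it lands in for every tree when predicting with `pred_leaf=True` (an array of shape `(n_samples, n_trees)`). xLearn's FFM reads the libffm text format `label field:feature:value ...`. In this minimal sketch, each tree becomes one field and each (tree, leaf) pair becomes one binary feature with a globally unique index. The helper name `leaves_to_libffm` and the indexing scheme `tree * num_leaves + leaf` are illustrative assumptions, not the exact code from FFMPredictor.ipynb, and the sketch ignores any other categorical features that may be appended alongside the leaves.

```python
from typing import List, Sequence


def leaves_to_libffm(labels: Sequence[int],
                     leaf_idx: Sequence[Sequence[int]],
                     num_leaves: int) -> List[str]:
    """Turn a (n_samples, n_trees) leaf-index matrix into libffm lines.

    Hypothetical helper: field = tree number, feature = tree * num_leaves + leaf,
    value = 1 (the sample falls into that leaf).
    """
    lines = []
    for label, row in zip(labels, leaf_idx):
        tokens = [str(int(label))]
        for tree, leaf in enumerate(row):
            leaf = int(leaf)
            if not 0 <= leaf < num_leaves:
                raise ValueError(f"leaf {leaf} out of range for tree {tree}")
            tokens.append(f"{tree}:{tree * num_leaves + leaf}:1")
        lines.append(" ".join(tokens))
    return lines


# With a trained LightGBM booster this would be fed e.g. by
#   leaf_idx = booster.predict(X, pred_leaf=True)
# Here a tiny hand-made matrix with 2 trees of 4 leaves each is used instead.
print(leaves_to_libffm([1, 0], [[0, 3], [2, 1]], num_leaves=4))
```

Writing the lines to a file gives input that xLearn's FFM can train on directly. Giving every tree its own field lets the FFM learn a separate interaction for each pair of trees.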

The Gradient Boosted Decision Tree is trained on various group-by aggregation features built with the aggregate functions count, var, mean, nunique and cumcount, in addition to time-to-next-click features. Features of this kind were used in almost all top solutions and in many kernels. LightGBM was used because of its impressive speed, given the huge amount of data.
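A minimal pandas sketch of these feature types is shown below. It assumes the click log has the columns `ip`, `app`, `device`, `os`, `channel` and a datetime `click_time` (the layout of the competition's training file), and the particular groupings and feature names are illustrative examples rather than the exact features from FeatureEngineering.ipynb. At roughly 250 million rows the real pipeline would also need memory-conscious dtypes and chunking, which this sketch does not attempt.

```python
import pandas as pd


def add_groupby_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add example group-by aggregations and a time-to-next-click feature."""
    # Stable sort so cumcount and next-click follow chronological order.
    out = df.sort_values("click_time", kind="mergesort").reset_index(drop=True)
    out["hour"] = out["click_time"].dt.hour

    ip_app = out.groupby(["ip", "app"])
    out["ip_app_count"] = ip_app["channel"].transform("count")   # count
    out["ip_app_hour_mean"] = ip_app["hour"].transform("mean")   # mean
    out["ip_app_hour_var"] = ip_app["hour"].transform("var")     # var (NaN for 1-click groups)

    # nunique: number of distinct apps clicked from the same ip.
    out["ip_nunique_app"] = out.groupby("ip")["app"].transform("nunique")
    # cumcount: running click index per ip (0 for the first click).
    out["ip_cumcount"] = out.groupby("ip").cumcount()

    # Time to next click: seconds until the same (ip, app, device, os) clicks again.
    next_time = out.groupby(["ip", "app", "device", "os"])["click_time"].shift(-1)
    out["next_click_s"] = (next_time - out["click_time"]).dt.total_seconds()
    return out
```

Using `transform` keeps every aggregate aligned with the original rows, so the result can be fed to LightGBM without a separate merge step.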

These 2 models were ensembled using weighted blending.
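Weighted blending here means taking a weighted average of the two models' predicted probabilities for each sample. The sketch below normalizes the weights so that they sum to 1. The weights in the example are placeholders; the actual weights and how they were chosen (for instance by tuning on a hold-out set) are not stated here, so treat them as an assumption.

```python
from typing import List, Sequence


def weighted_blend(predictions: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Per-sample weighted average of several models' predictions."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same samples")
    # Divide by the weight sum so the result stays on the probability scale.
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(n)]


# Placeholder predictions from an FFM and a LightGBM model for two samples,
# blended with illustrative 3:1 weights.
ffm_pred = [0.2, 0.8]
lgbm_pred = [0.4, 0.6]
blended = weighted_blend([ffm_pred, lgbm_pred], [3, 1])
```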

Repository contents:

Ensembling.ipynb
ExploratoryDataAnalysis.ipynb
FFMPredictor.ipynb
FeatureEngineering.ipynb
LGBMPredictor.ipynb
README.md

pklauke/Kaggle-TalkingData

Folders and files

Latest commit

History

Repository files navigation

Kaggle-TalkingData

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages