Vandalism detection (task 2) - WSDM Cup 2017
We are a team of four from the Complex Network Research Group (Murata Laboratory), Tokyo Institute of Technology. This repository contains our submission to the 2017 WSDM Cup. In summary, the task is to detect vandalism in Wikidata dumps.
- Set up our personal computers so that everyone shares the same environment (Python 3.5.2, TensorFlow 0.10, scikit-learn 0.17.1, Anaconda virtual env, coding style, etc.).
- Literature review (paper listing, reading, and discussion).
- Competition score metric analysis.
- Finalize and present possible approaches.
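The log below reports results as ROC, so the metric analysis presumably concerns the area under the ROC curve (ROC AUC). A convenient way to reason about it is as a ranking statistic: the probability that a randomly chosen vandalism edit receives a higher score than a randomly chosen regular edit, with ties counting one half. A minimal pure-Python sketch of that view via the Mann-Whitney rank sum (the function name `roc_auc` is our own; `sklearn.metrics.roc_auc_score` computes the same quantity):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    labels: iterable of 0/1 (1 = vandalism); scores: higher = more suspicious.
    Tied scores get their average rank, so a tie counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block [i, j] of tied scores and give it the average 1-based rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0  # number of correctly ordered pairs
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Two consequences worth keeping in mind: the metric depends only on the ordering of the scores (not on their calibration), and a random classifier scores about 0.5, while negating every score turns an AUC of a into 1 - a.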
- 27th: Listed vandalism detection papers; set up working environments; studied the Wikidata dumps; analyzed the WSDM'17 score metrics.
- 28th: Paper reading; discussion of Random Forest and feature selection; decided to focus on the Random Forest model and its variations.
- 29th: Ran the provided reference paper's code on the lab's machine; studied techniques related to RF; studied NN techniques that complement RF.
- 30th: Review week 1.
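The Random Forest and feature-selection discussion above can be made concrete with impurity-based feature importances: train a forest, rank the features, retrain on the top k, and compare holdout ROC AUC. This is a minimal sketch on synthetic data (`make_classification` stands in for the real edit-feature matrix; the sizes, `k = 3`, and the helper `fit_and_score` are all illustrative assumptions, not our actual setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for an edit-feature matrix: 2000 edits, 10 features,
# only 3 of them informative, roughly 5% positives to mimic class imbalance.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=0, weights=[0.95, 0.05], random_state=0)

# Manual holdout split (avoids the cross_validation -> model_selection rename).
rng = np.random.RandomState(0)
idx = rng.permutation(len(y))
train, test = idx[:1500], idx[1500:]


def fit_and_score(cols):
    """Hypothetical helper: train a forest on the given columns, return (model, holdout ROC AUC)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train][:, cols], y[train])
    pos_col = list(clf.classes_).index(1)  # predict_proba column of the positive class
    scores = clf.predict_proba(X[test][:, cols])[:, pos_col]
    return clf, roc_auc_score(y[test], scores)


all_cols = np.arange(X.shape[1])
clf_all, auc_all = fit_and_score(all_cols)

# Rank features by impurity-based importance (mean decrease in impurity).
order = np.argsort(clf_all.feature_importances_)[::-1]
for f in order:
    print("feature %d: %.3f" % (f, clf_all.feature_importances_[f]))

# Keep the top-k features and re-evaluate on the same holdout.
k = 3
clf_top, auc_top = fit_and_score(order[:k])
print("ROC AUC with all features: %.3f, with top %d: %.3f" % (auc_all, k, auc_top))
```

Impurity-based importances are cheap but can favor continuous or high-cardinality features, so permutation importance on held-out data is a common cross-check. If k itself is tuned by looking at the holdout AUC, a separate validation split or cross-validation is needed to avoid an optimistic estimate.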
- Preprocess the Wikimedia data; study the previous feature extraction code.
- Implement a simple Random Forest model based on [1].
- Build a working baseline model and sketch a neural network model.
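The baseline in [1] relies on hand-picked features. As an illustration of what such features can look like, here is a sketch of a few character-level ones computed on a piece of text touched by an edit (for example a changed label or description). The function `char_features`, the input, and the feature choice are hypothetical; this is not the exact feature set of [1]:

```python
import re


def char_features(text):
    """Hypothetical hand-picked character features for one edit's text."""
    n = len(text)
    if n == 0:
        return {"upper_ratio": 0.0, "digit_ratio": 0.0,
                "non_alnum_ratio": 0.0, "longest_char_run": 0}
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    # Longest run of one repeated character, e.g. "!!!!!" -> 5.
    longest = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", text, re.DOTALL))
    return {
        "upper_ratio": upper / float(len(letters)) if letters else 0.0,
        "digit_ratio": sum(c.isdigit() for c in text) / float(n),
        "non_alnum_ratio": sum(not c.isalnum() and not c.isspace() for c in text) / float(n),
        "longest_char_run": longest,
    }


# Turn a batch of texts into a fixed-order feature matrix for the classifier.
texts = ["Paris", "LOL!!!!!", "born 1879"]
names = sorted(char_features("").keys())
X = [[char_features(t)[k] for k in names] for t in texts]
print(names)
print(X)
```

Fixing the column order with `sorted(...)` keeps the matrix layout stable between training and prediction, which matters once the features are fed to a model.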
- 3rd: Features from the baseline model [1] are all hand-picked.
- 4th: Meeting cancelled.
- 5th: Meeting cancelled.
- 6th: Some features from the original implementation [1] are missing in ours. Using only the 29 available features currently yields a ROC AUC of 0.02. This result is extremely low, far below the 0.5 expected from random guessing.
- 7th:
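The 6th entry above reports a ROC AUC of 0.02. Because negating the scores turns an AUC of a into 1 - a, a value that far below 0.5 means the ranking is almost perfectly inverted rather than merely weak. One common cause (an assumption here; the log does not identify the cause) is scoring with the wrong `predict_proba` column or a flipped label encoding. A minimal check on synthetic data, showing that the two columns of a binary classifier give mirrored AUCs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic, imbalanced stand-in data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1],
                           random_state=1)
X_train, y_train, X_test, y_test = X[:700], y[:700], X[700:], y[700:]

clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# predict_proba columns follow clf.classes_ (sorted labels), so look up the
# positive (vandalism) class column instead of hard-coding an index.
pos_col = list(clf.classes_).index(1)
auc_right = roc_auc_score(y_test, proba[:, pos_col])
auc_wrong = roc_auc_score(y_test, proba[:, 1 - pos_col])  # the other class's column

# The two columns sum to 1 for a binary problem, so the AUCs mirror each other.
print("correct column: %.3f, wrong column: %.3f" % (auc_right, auc_wrong))
```

If swapping the column (or the label mapping) in our pipeline moves the score from 0.02 to about 0.98, the model itself is fine and only the scoring step is inverted.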