Here is one of my submissions to the Kaggle challenge 'Bag of Words Meets Bags of Popcorn'.
It is based on the idea of combining pre-trained word2vec embeddings with convolutional networks proposed by Yoon Kim [http://arxiv.org/abs/1408.5882].
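To make the idea concrete, here is a minimal NumPy sketch of the forward pass in a Kim-style network: a sentence is represented as a matrix of word vectors, filters slide over windows of consecutive words, and each feature map is reduced by max-over-time pooling before a logistic output. This is not the code from the notebooks. The function name `kim_cnn_forward`, the filter widths, the number of feature maps, and the random stand-ins for word2vec vectors and trained weights are all assumptions made for illustration.

```python
import numpy as np


def kim_cnn_forward(embedded, filters, biases, w_out, b_out):
    """Forward pass of a one-layer text CNN (illustrative sketch).

    embedded: (seq_len, emb_dim) matrix of word vectors for one review.
    filters:  list of arrays shaped (n_maps, width, emb_dim), one per filter width.
    biases:   list of arrays shaped (n_maps,), matching `filters`.
    w_out, b_out: weights and bias of the final logistic unit.
    Returns the predicted probability of the positive class.
    """
    pooled = []
    seq_len = embedded.shape[0]
    for W, b in zip(filters, biases):
        width = W.shape[1]
        # All windows of `width` consecutive words: (n_windows, width, emb_dim).
        windows = np.stack(
            [embedded[i:i + width] for i in range(seq_len - width + 1)]
        )
        # Apply every feature map to every window: (n_windows, n_maps).
        conv = np.tensordot(windows, W, axes=([1, 2], [1, 2])) + b
        activated = np.maximum(conv, 0.0)  # ReLU
        # Max-over-time pooling keeps one value per feature map.
        pooled.append(activated.max(axis=0))
    features = np.concatenate(pooled)
    logit = features @ w_out + b_out
    return float(1.0 / (1.0 + np.exp(-logit)))


# Toy usage with random stand-ins (assumed sizes, not the notebook's settings).
rng = np.random.default_rng(0)
emb_dim, n_maps = 50, 8
review = rng.normal(size=(20, emb_dim))  # placeholder for word2vec vectors
filters = [rng.normal(scale=0.1, size=(n_maps, w, emb_dim)) for w in (3, 4, 5)]
biases = [np.zeros(n_maps) for _ in filters]
w_out = rng.normal(scale=0.1, size=n_maps * len(filters))
prob = kim_cnn_forward(review, filters, biases, w_out, 0.0)
```

In a real model the embedding matrix would be initialised from the pre-trained word2vec vectors and the filters learned by backpropagation; the sketch only shows how the pieces fit together.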
The code consists of two IPython Notebooks:
- Process Kaggle Dataset Train+Test.ipynb contains the data pre-processing.
- Train CNN IMDB.ipynb implements a convolutional network with one convolutional layer.
This model, trained for 3 epochs, yields AUC = 0.96823 on the test data.
An ensemble of three convolutional networks (with different numbers of convolutional layers and feature maps) gives AUC = 0.97310.
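For readers unfamiliar with the metric, the sketch below shows one way to compute AUC and to combine several models' predictions. How the three networks were actually combined is not stated here, so simple averaging of predicted probabilities is an assumption, and the helper names `roc_auc` and `average_ensemble` as well as the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    labels: list of 0/1 ground-truth values.
    scores: list of predicted scores, higher meaning more likely positive.
    Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of items tied with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_ensemble(prediction_lists):
    """Average per-example predictions across models (assumed combination rule)."""
    n_models = len(prediction_lists)
    return [sum(preds) / n_models for preds in zip(*prediction_lists)]


# Toy usage: three hypothetical models scored on six reviews.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.5, 0.7, 0.2]
model_b = [0.8, 0.3, 0.4, 0.6, 0.9, 0.1]
model_c = [0.7, 0.5, 0.8, 0.2, 0.6, 0.3]
ensemble_scores = average_ensemble([model_a, model_b, model_c])
ensemble_auc = roc_auc(labels, ensemble_scores)
```

Because AUC depends only on the ranking of the scores, averaging can help when the individual models make different ranking mistakes.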