- **Loading the data**: Load the raw data into Python lists.
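A minimal sketch of the loading step, assuming the raw reviews are stored one per line in a positive and a negative text file (the file layout and function name are assumptions for illustration):

```python
def load_reviews(pos_path, neg_path):
    """Read raw review files into parallel Python lists of texts and labels."""
    texts, labels = [], []
    for path, label in [(pos_path, 1), (neg_path, 0)]:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:                # skip blank lines
                    texts.append(line)
                    labels.append(label)
    return texts, labels
```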
- **Process to sentences**: Convert the raw reviews to sentences.
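One way to sketch the sentence-splitting step with only the standard library; the regex split on sentence-ending punctuation is an assumption (a real project might use `nltk.sent_tokenize` instead):

```python
import re

def to_sentences(review):
    """Naively split a review into sentences on ., ! or ? followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
```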
- **Text preprocessing**: Tokenize the texts using the `keras.preprocessing.text` module.
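A sketch of the tokenization step using the Keras API named above; the toy texts, the vocabulary cap of 10,000 words, and the sequence length of 20 are assumptions:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the movie was terrible"]  # toy examples

tokenizer = Tokenizer(num_words=10000)      # vocabulary cap is an assumption
tokenizer.fit_on_texts(texts)               # build the word -> index mapping
sequences = tokenizer.texts_to_sequences(texts)  # each review -> list of token ids
data = pad_sequences(sequences, maxlen=20)  # pad/truncate to a fixed length
```

After this step each review is a fixed-length integer sequence, which is what the embedding layer later expects.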
- **Create training and validation sets**: Randomly draw reviews (each now a list of tokens, one list per review) from the dataset pool and assign them to the train and test splits.
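The random split can be sketched with the standard library alone; the 80/20 split ratio and the fixed seed are assumptions for illustration:

```python
import random

def train_val_split(samples, labels, val_fraction=0.2, seed=42):
    """Shuffle indices (reproducibly) and carve off a validation slice."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_val = [samples[i] for i in val_idx]
    y_val = [labels[i] for i in val_idx]
    return x_train, y_train, x_val, y_val
```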
- **Create word embeddings using GloVe**: Use pre-trained GloVe vectors to map each word of a review to a 100-dimensional vector, ready to be fed into the model for training. Get the GloVe dataset from https://nlp.stanford.edu/projects/glove/
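A sketch of building an embedding matrix from a downloaded GloVe file (e.g. `glove.6B.100d.txt`, where each line is a word followed by its vector); the function name and the zero-vector fallback for out-of-vocabulary words are assumptions:

```python
import numpy as np

def load_glove_matrix(glove_path, word_index, dim=100):
    """Build a (vocab_size + 1, dim) matrix; row i is the vector for token i."""
    embeddings = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    # Words missing from GloVe keep an all-zero row.
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        vec = embeddings.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix
```

`word_index` here is the word-to-index mapping produced by the tokenizer (e.g. `tokenizer.word_index` in Keras).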
- **Training**: Train an LSTM network with dropout; the output activation is sigmoid and the loss function is binary_crossentropy.
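The architecture described above can be sketched in Keras as follows; the LSTM width (64), dropout rate (0.5), and Adam optimizer are assumptions not stated in the text, while the frozen GloVe embeddings, dropout, sigmoid output, and binary_crossentropy loss come from the steps above:

```python
from tensorflow.keras import Sequential, initializers
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

def build_model(vocab_size, embedding_matrix, embedding_dim=100):
    """LSTM binary classifier on top of frozen pre-trained GloVe embeddings."""
    model = Sequential([
        Embedding(
            vocab_size, embedding_dim,
            embeddings_initializer=initializers.Constant(embedding_matrix),
            trainable=False,               # keep GloVe vectors fixed
        ),
        LSTM(64),                          # hidden size is an assumption
        Dropout(0.5),                      # dropout rate is an assumption
        Dense(1, activation="sigmoid"),    # single probability output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then be a call like `model.fit(x_train, y_train, validation_data=(x_val, y_val))` on the splits created earlier.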