Receives the root directory
Loads the data and encodes all string columns as integers
fields (optional): Names of the columns to read
Splits the data into train and test sets
Looks up the label in the decoded dataset, then re-encodes it and returns the encoded value for the label
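The string-to-integer encoding and the label lookup described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names encode_string_columns and encoded_label, and the row-as-dict data shape, are assumptions made for the example.

```python
# Hypothetical sketch: names and data shape are illustrative assumptions.
def encode_string_columns(rows):
    """Replace every string value with an integer code, column by column."""
    encoders = {}  # column name -> {string value: integer code}
    encoded_rows = []
    for row in rows:
        encoded = {}
        for column, value in row.items():
            if isinstance(value, str):
                codes = encoders.setdefault(column, {})
                # Assign the next unused integer the first time a value is seen.
                value = codes.setdefault(value, len(codes))
            encoded[column] = value
        encoded_rows.append(encoded)
    return encoded_rows, encoders


def encoded_label(encoders, column, label):
    """Return the integer code of a decoded label, or None if it is unknown."""
    return encoders.get(column, {}).get(label)
```

Keeping the per-column mapping around is what makes the reverse lookup possible: a readable label can be translated back into the integer the model sees.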
Receives the dataManager object
Displays a graph whose nodes are the different methods. Edges connect consecutive methods (method -> method), ordered by timestamp.
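One way the edges of such a graph could be derived is sketched below. It assumes the events arrive as (timestamp, method) pairs forming a single sequence; the function name method_transitions is hypothetical, and the original code may group events differently (for example per session).

```python
from collections import Counter


def method_transitions(events):
    """Count method -> method edges between events that are consecutive in time.

    events: iterable of (timestamp, method) pairs (assumed shape).
    """
    # Order the methods by timestamp, then pair each one with its successor.
    ordered = [method for _, method in sorted(events, key=lambda e: e[0])]
    return Counter(zip(ordered, ordered[1:]))
```

The resulting counter maps each (source, target) pair to how often that transition occurred, which can serve as edge weights when the graph is drawn.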
Displays, for each QueryName, the count of IsFirst divided by the total number of IsFirst in the dataset
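The ratio above can be computed as in the following sketch. It assumes each row is a dict with a QueryName string and a boolean IsFirst flag; the function name is_first_share is hypothetical.

```python
from collections import Counter


def is_first_share(rows):
    """Share of all IsFirst rows that belongs to each QueryName."""
    # Count only the rows flagged as IsFirst (assumed to be a boolean).
    counts = Counter(row["QueryName"] for row in rows if row["IsFirst"])
    total = sum(counts.values())
    if total == 0:
        return {}  # no IsFirst rows, so there is nothing to normalize
    return {name: count / total for name, count in counts.items()}
```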
Displays the count of below and above session for each QueryName divided by the number of queries in session in the dataset.
For each unique query, displays the percentage of users that have 3 or more sessions
Displays session durations for users that visited the site more than 3 times and for users that visited 3 times or fewer.
Builds a decision tree to identify which features best predict low retention
Displays the percentage of users that have 3 sessions or fewer
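A minimal sketch of this percentage follows. It assumes that an Aid identifies a user and that the input is a list of (Aid, session id) pairs; both the data shape and the function name percent_users_with_few_sessions are illustrative assumptions.

```python
def percent_users_with_few_sessions(records, max_sessions=3):
    """Percentage of users whose number of distinct sessions is <= max_sessions.

    records: iterable of (aid, session_id) pairs (assumed shape).
    """
    sessions_by_user = {}
    for aid, session_id in records:
        # A set keeps repeated queries within one session from being counted twice.
        sessions_by_user.setdefault(aid, set()).add(session_id)
    if not sessions_by_user:
        return 0.0
    few = sum(1 for s in sessions_by_user.values() if len(s) <= max_sessions)
    return 100.0 * few / len(sessions_by_user)
```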
Displays sessions and queries
Finds Aids with many sessions
Runs baseline methods: Random and ItemKNN
Runs a recurrent neural network based on the paper: http://arxiv.org/pdf/1511.06939v4.pdf
Runs the GRU: trains on users with an above-average number of sessions and predicts for users with fewer than 3 sessions
Runs the GRU on a specific Aid, and then on that Aid's sessions
Runs PCA on the data
Runs K-means on the dataset
Runs hierarchical clustering on the dataset, creating a dendrogram that visualizes the clustering process for further analysis.
Utility function for clustering.
Runs t-SNE on the dataset