This repository contains a solution to the Quora Question Pairs challenge, a natural language processing task that consists of predicting whether two questions asked on Quora are duplicates.
The solution was developed by the following collaborators:
The repository is organized as follows:
- train_models.ipynb: a Jupyter notebook that contains the code to preprocess the data and train the models. When executed, it creates a folder called model_artifacts that contains all the necessary information to reproduce the results.
- reproduce_results.ipynb: a Jupyter notebook that contains the code to reproduce the results obtained by our models. It reads from the model_artifacts folder.
- utils.py: a Python module that contains all the helper functions for the two previous notebooks.
To reproduce the results obtained by our models, follow these steps:
- Clone this repository:
git clone https://github.com/sarabase/quora-question-pairs.git
- Create a conda environment with the packages listed in requirements.txt, then activate it:
conda create --name quora_test_env --file requirements.txt
conda activate quora_test_env
- Run the train_models.ipynb notebook. This creates the model_artifacts folder that the next step reads from.
- Open the reproduce_results.ipynb notebook in Jupyter and execute the cells.
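The last two steps can also be run without opening Jupyter. The following is a minimal sketch, not part of the repository: it assumes the `jupyter nbconvert` command is available in the active environment, and the names `nbconvert_command` and `run_pipeline` are hypothetical helpers introduced only for illustration. It executes the two notebooks in order and stops early if model_artifacts is missing before the second one.

```python
import subprocess
from pathlib import Path

# Notebook and folder names are taken from the repository layout described above.
NOTEBOOKS = ["train_models.ipynb", "reproduce_results.ipynb"]
ARTIFACTS_DIR = "model_artifacts"


def nbconvert_command(notebook):
    """Build the command that executes a notebook and saves the outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_pipeline(repo_dir="."):
    """Hypothetical helper: run training first, then the reproduction notebook."""
    repo = Path(repo_dir)
    for notebook in NOTEBOOKS:
        # reproduce_results.ipynb reads from model_artifacts, so check it exists.
        if notebook == "reproduce_results.ipynb" and not (repo / ARTIFACTS_DIR).is_dir():
            raise FileNotFoundError(
                f"{ARTIFACTS_DIR} not found; run train_models.ipynb first"
            )
        # check=True raises CalledProcessError if a notebook cell fails.
        subprocess.run(nbconvert_command(notebook), cwd=repo, check=True)
```

Calling `run_pipeline()` from the repository root, with the quora_test_env environment active, would run both notebooks in sequence. Training time depends on the models in the notebook, so long-running cells may need nbconvert's execution timeout adjusted.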