Dialogue Response System

Our goal is to design a system that generates a single response for a given context, as an exploration of how the Hierarchical Recurrent Encoder-Decoder (HRED) model proposed by Serban et al. (2016) [1] could provide the building block for a more robust chatbot system. The HRED model [2] uses a Gated Recurrent Unit (GRU) Recurrent Neural Network (RNN) and has additional functionality to remember the context.
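To make the "remember the context" idea concrete, here is a minimal sketch of the hierarchical encoding step: one GRU reads the tokens of each turn, and a second GRU reads the resulting turn vectors to build a context state. It is an illustration under assumptions, not the code in this repository: the weights are random and untrained, the dimensions and toy vocabulary are made up, the function names are hypothetical, and the decoder that would generate the response from the context state is omitted.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def init_gru(in_dim, hid_dim, rng):
    """Random, untrained GRU weights (an assumption for this sketch)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # W* matrices act on the input, U* matrices act on the hidden state.
    return {name: mat(hid_dim, in_dim if name.startswith("W") else hid_dim)
            for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

def gru_step(p, x, h):
    """One GRU update with update gate z and reset gate r (biases omitted)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Interpolate between the old state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(p, inputs, hid_dim):
    """Run a GRU over a sequence of vectors and return the final state."""
    h = [0.0] * hid_dim
    for x in inputs:
        h = gru_step(p, x, h)
    return h

def encode_dialogue(dialogue, embed, utt_gru, ctx_gru, utt_dim, ctx_dim):
    """Encode each turn token by token, then encode the sequence of turns."""
    turn_vecs = [run_gru(utt_gru, [embed[tok] for tok in turn], utt_dim)
                 for turn in dialogue]
    return run_gru(ctx_gru, turn_vecs, ctx_dim)

rng = random.Random(0)
EMB, UTT, CTX = 4, 5, 6  # toy sizes, chosen only for illustration
vocab = ["hello", "there", "how", "are", "you", "fine"]
embed = {w: [rng.uniform(-1, 1) for _ in range(EMB)] for w in vocab}
utt_gru = init_gru(EMB, UTT, rng)
ctx_gru = init_gru(UTT, CTX, rng)

dialogue = [["hello", "there"], ["how", "are", "you"], ["fine"]]
context = encode_dialogue(dialogue, embed, utt_gru, ctx_gru, UTT, CTX)
print(len(context))  # one context vector summarizing all three turns
```

The point of the two levels is that the context state is updated once per turn rather than once per token, so it can carry information across a whole dialogue.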

We plan to evaluate our HRED model by comparing its perplexity against that of an n-gram model with Kneser-Ney smoothing, which should have a much more limited context.
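As a reminder of what is being compared, perplexity is the exponential of the average negative log-probability a model assigns to each token of held-out text; lower is better. The sketch below assumes each model can report a probability for every token of the same test sequence. The function name and the example probabilities are hypothetical and are not results from this project.

```python
import math

def perplexity(token_probs):
    """Perplexity from the per-token probabilities assigned by a model."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model that always spreads its mass evenly over 4 choices
# has a perplexity of about 4.
uniform_probs = [0.25] * 8
print(perplexity(uniform_probs))

# Made-up probabilities for two models on the same five tokens.
model_a = [0.20, 0.10, 0.30, 0.25, 0.15]
model_b = [0.05, 0.10, 0.08, 0.12, 0.06]
print(perplexity(model_a) < perplexity(model_b))
```

For the comparison to be meaningful, both models need to be scored on the same test tokens with the same vocabulary and tokenization.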

We use the movie dialogue corpus (Movie-DiC) as our main dataset because it was used in [1]. The corpus was scraped from the Internet Movie Script Data Collection. It contains 132,229 dialogues with a total of 764,146 turns, extracted from 753 movies [3]. This dataset cannot be made public, and hence we are making our GitHub repository public.

Directory Structure

Report -> Contains the project report

Data Exploration -> Contains analysis of the dataset being used

ngram -> N-gram language model

rnn -> RNN language model

shared -> Common utilities used by the various models

References

[1] Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. 2016. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models.

[2] Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, and Jian-Yun Nie. 2015. A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion.

[3] Rafael E. Banchs. 2012. Movie-DiC: A Movie Dialogue Corpus for Research and Development. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 203–207.