Deep Belief Network for Extractive Text Summarization
Here we explore and build an unsupervised deep belief network, with the goal of turning it into a commercially viable extractive text summarization tool.
Document summarization is the task of reducing a document to its most important pieces. Two approaches are studied: extractive and abstractive. Abstractive summarization paraphrases sections of the source document; it is inherently much harder because it requires external domain knowledge, and is thus less widely researched. Extractive summarization, by contrast, copies sentences from the source document into a summary, choosing the sentences that maximize importance, and is thus more widely studied. The goal here is to build a practical, commercially viable method for extractive text summarization. We identified three attributes the method needs in order to meet this goal: it must be unsupervised, it must be fast, and it must be accurate. In addition, we are developing the method to be scalable, general, and valid in order to allow for future commercial integration.
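To make the extractive setting concrete, here is a minimal sketch in Python. It does not implement the deep belief network. As a stand-in for a learned importance score, it uses a simple average word frequency, and the sentence splitter is deliberately naive. The names `split_sentences` and `summarize` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order.

    Assumption: importance is approximated by average word frequency.
    This is a placeholder for a learned score, not the project's model.
    """
    sentences = split_sentences(text)
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # Document-wide word counts.
    freq = Counter(w for words in tokens for w in words)

    def score(words):
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    scores = [score(words) for words in tokens]

    # Pick the k best sentence indices, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Every sentence in the output is copied verbatim from the input, which is what separates extractive from abstractive summarization. Only the scoring function would change if a learned model replaced the frequency heuristic.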
This work is one component of SEDA, a larger project that aims to provide near-zero-second search, in effect a knowledge overlay.