Overview

Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize the documents, or for featurization and dimensionality reduction in later stages of data analysis.

LDA (Latent Dirichlet Allocation) is a topic model. I used LDA in this project to derive ‘topics’ from the dataset provided. The code is written in Python.

Dataset

The dataset was obtained from Yelp’s website.

Script Steps

  1. Prepare the data:
  • Tokenizing
  • Stopping (removing stop words)
  • Stemming
  2. Construct a Document-term Matrix
  3. Apply the LDA Model
  4. Examine the results
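
The steps above can be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library, not the project's actual script. The sample reviews, the stop-word list, the suffix-stripping stemmer, and all function names and hyperparameters are assumptions made up for the example. The LDA step uses a small collapsed Gibbs sampler as a stand-in for whichever LDA library the script relies on.

```python
import random
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "it", "to", "of"}


def crude_stem(word):
    """Strip a few common suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prepare(text):
    """Step 1: tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Step 2: one row per document, one column per vocabulary term."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def fit_lda(matrix, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Step 3: collapsed Gibbs sampling for LDA; returns topic-term counts."""
    rng = random.Random(seed)
    n_terms = len(matrix[0])
    # Expand counts back into one (document, term) pair per word occurrence.
    tokens = [(d, w) for d, row in enumerate(matrix)
              for w, c in enumerate(row) for _ in range(c)]
    assign = [rng.randrange(n_topics) for _ in tokens]
    doc_topic = [[0] * n_topics for _ in matrix]
    topic_term = [[0] * n_terms for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for (d, w), z in zip(tokens, assign):
        doc_topic[d][z] += 1
        topic_term[z][w] += 1
        topic_total[z] += 1
    for _ in range(n_iter):
        for i, (d, w) in enumerate(tokens):
            # Remove the current assignment, then resample this token's topic.
            z = assign[i]
            doc_topic[d][z] -= 1
            topic_term[z][w] -= 1
            topic_total[z] -= 1
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_term[k][w] + beta)
                / (topic_total[k] + beta * n_terms)
                for k in range(n_topics)
            ]
            z = rng.choices(range(n_topics), weights=weights)[0]
            assign[i] = z
            doc_topic[d][z] += 1
            topic_term[z][w] += 1
            topic_total[z] += 1
    return topic_term


def top_terms(topic_term, vocab, n=3):
    """Step 4: the n highest-count terms for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_term
    ]


# Made-up reviews for illustration; these are not taken from the Yelp dataset.
reviews = [
    "The pizza was great and the pasta was tasty",
    "Tasty pizza, friendly waiters",
    "The room was clean and the hotel staff were helpful",
    "Helpful staff, clean rooms, quiet hotel",
]

docs = [prepare(r) for r in reviews]
vocab, matrix = document_term_matrix(docs)
topic_term = fit_lda(matrix, n_topics=2)
for k, terms in enumerate(top_terms(topic_term, vocab)):
    print(f"topic {k}: {terms}")
```

On a corpus of realistic size, the number of topics and the priors `alpha` and `beta` would need tuning, and an established stemmer and LDA implementation would be a better choice than these hand-rolled stand-ins.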

License

This project is licensed under the GNU 2.0 License. See the LICENSE.md file for details.