Topic Modeling Playground

A review of the most popular topic modeling techniques.

This repository contains the code for hands-on sessions related to topic modeling. It is designed to help you understand the concepts and implementations of topic modeling techniques, including but not limited to LDA (Latent Dirichlet Allocation) and more advanced approaches based on word embeddings, such as BERTopic.

Prerequisites

Before you begin, ensure you have the following software installed:

Python 3.7 or higher
Required Python libraries (listed below)

Installation

Clone the repository to your local machine:

git clone https://github.com/mauroIstat/word-embedding-tutorial.git
cd word-embedding-tutorial

Install Dependencies

You can install the required dependencies using pip. It is recommended to create a virtual environment before installing the packages.

pip install -r requirements.txt

The requirements.txt file includes the following libraries:

pandas
numpy
scikit-learn
gensim
matplotlib
pyLDAvis
bertopic
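
After installation, a quick check that every library is importable can save time before opening the notebooks. The sketch below uses only the standard library and is not part of the repository. The mapping from pip names to import names is an assumption based on the list above; scikit-learn is the one package whose import name (`sklearn`) differs.

```python
from importlib.util import find_spec

# Maps each pip package name from requirements.txt to its import name.
# Assumption: only scikit-learn differs, since it is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "gensim": "gensim",
    "matplotlib": "matplotlib",
    "pyLDAvis": "pyLDAvis",
    "bertopic": "bertopic",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported missing, rerunning the pip command above inside the active virtual environment is usually enough.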

Additional Resources

Usage

To run the notebooks or scripts for topic modeling:

Download and preprocess the dataset (if not already available).
Explore the code and try running different techniques for topic modeling.
Use the provided Jupyter Notebooks or Python scripts for each part of the tutorial.

File Structure

data/: Sample datasets used for the tutorial.
papers/: Papers on word embedding techniques (Word2Vec and GloVe).
resources/: An extended list of Italian stopwords and the Italian .pickle file needed to tokenize text.
src/: Utility functions in Python.
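
To illustrate what a stopword list is used for during text cleaning, the sketch below filters stopwords out of a sentence. It is a rough illustration, not the repository's code: the tiny stopword set is hypothetical (the list in resources/ is larger), and a simple regular expression stands in for the .pickle-based tokenizer.

```python
import re

# Hypothetical mini stopword list; the repository's extended list is larger.
ITALIAN_STOPWORDS = {"il", "la", "di", "e", "che", "un", "per", "in"}


def clean_tokens(text, stopwords=ITALIAN_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    # Regex tokenization is a stand-in, not the tokenizer used in the tutorial.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


print(clean_tokens("Il censimento della popolazione e delle abitazioni"))
# → ['censimento', 'della', 'popolazione', 'delle', 'abitazioni']
```

Note that "della" and "delle" survive because they are not in the toy set, which is exactly the kind of gap an extended stopword list is meant to close.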

Contributing

If you'd like to contribute to the repository, feel free to fork it and submit a pull request. Please make sure your code adheres to the existing coding standards and includes tests where necessary.

License

This project is licensed under the MIT License - see the LICENSE file for details.

