Telegram Dataset Builder

This project uses Telethon to build datasets from public Telegram groups. It needs the name or ID of each channel to collect. The output is a JSON file with the channel information, plus several JSON files, organised in folders, that contain the messages in batches.

Requirements

The following packages and versions are required:

Telethon==1.35.0
python-dotenv==1.0.1
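One way to confirm that the pinned versions above are installed is to compare them against the installed distribution metadata. The following is a minimal sketch using only the standard library's importlib.metadata. check_requirements is a hypothetical helper and is not part of this project.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pinned versions taken from the Requirements list.
REQUIRED = {"Telethon": "1.35.0", "python-dotenv": "1.0.1"}


def check_requirements(required: dict[str, str]) -> dict[str, str | None]:
    """Return packages whose installed version differs from the pinned one.

    A value of None means the package is not installed at all.
    """
    mismatches: dict[str, str | None] = {}
    for package, pinned in required.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            mismatches[package] = None
            continue
        if installed != pinned:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_requirements(REQUIRED)
    if not problems:
        print("All pinned requirements are installed.")
    for package, installed in problems.items():
        print(f"{package}: expected {REQUIRED[package]}, found {installed or 'nothing'}")
```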

Telegram API credentials

The default configuration loads the credentials from a telegram.env file in the root folder. This file must follow the schema below (note that the phone number must include the country prefix):

PHONE_NUMBER = "+34..."
TELEGRAM_APP_ID = 9...
TELEGRAM_APP_HASH = "d..."
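The project reads this file with python-dotenv. The sketch below shows one possible way to load and validate the three values. TelegramCredentials, parse_credentials and load_credentials are hypothetical names, not the project's own code. dotenv_values is the python-dotenv function that returns the file's key/value pairs as a dict, with surrounding quotes removed.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TelegramCredentials:
    phone_number: str
    app_id: int
    app_hash: str


REQUIRED_KEYS = ("PHONE_NUMBER", "TELEGRAM_APP_ID", "TELEGRAM_APP_HASH")


def parse_credentials(values: dict[str, str | None]) -> TelegramCredentials:
    """Validate the raw key/value pairs read from telegram.env."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"Missing credentials: {', '.join(missing)}")
    phone = values["PHONE_NUMBER"].strip()
    # The phone number must include the country prefix, e.g. "+34...".
    if not phone.startswith("+"):
        raise ValueError("PHONE_NUMBER must start with the country prefix, e.g. '+34'")
    return TelegramCredentials(
        phone_number=phone,
        app_id=int(values["TELEGRAM_APP_ID"]),  # the app ID is numeric
        app_hash=values["TELEGRAM_APP_HASH"].strip(),
    )


def load_credentials(env_path: str = "telegram.env") -> TelegramCredentials:
    """Read telegram.env with python-dotenv and validate its contents."""
    # Imported here so parse_credentials works even without python-dotenv installed.
    from dotenv import dotenv_values

    if not os.path.exists(env_path):
        raise FileNotFoundError(env_path)
    return parse_credentials(dotenv_values(env_path))
```

With a filled-in telegram.env, load_credentials("telegram.env") returns the three values ready to pass to a Telethon client. Validating early gives a clearer error than a failed login.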

How to gather group messages?

To download all messages from a set of groups, run dataset_creator.py after adjusting the following elements:

Set channel_names = ["foo", "bar"] to the names of the channels you want to extract.
Optionally, set a different BATCH_SIZE.
If your Telegram credentials are stored in a different path, update telegram_env_path.
output_chats_path is the folder where everything is stored, both the channel chats and the channel info. It can be changed.
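The gathering step can be sketched as follows using Telethon's TelegramClient, get_entity and iter_messages. This is not the code of dataset_creator.py. gather, write_batches, the session name "dataset_builder" and the folder layout (info.json plus messages/batch_N.json per channel) are hypothetical choices made for illustration.

```python
from __future__ import annotations

import json
from collections.abc import Iterable
from pathlib import Path


def write_batches(messages: Iterable[dict], out_dir: Path, batch_size: int) -> list[Path]:
    """Write messages to out_dir as batch_0.json, batch_1.json, ... (hypothetical layout)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    batch: list[dict] = []

    def flush() -> None:
        path = out_dir / f"batch_{len(written)}.json"
        # default=str serialises datetimes and bytes found in Telethon's to_dict() output.
        path.write_text(json.dumps(batch, default=str, ensure_ascii=False), encoding="utf-8")
        written.append(path)
        batch.clear()

    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            flush()
    if batch:  # last, partially filled batch
        flush()
    return written


async def gather(channel_names: list[str], api_id: int, api_hash: str, phone: str,
                 output_chats_path: Path, batch_size: int) -> None:
    """Download every message of each channel into output_chats_path/<channel>/."""
    from telethon import TelegramClient  # imported lazily; requires Telethon

    client = TelegramClient("dataset_builder", api_id, api_hash)
    await client.start(phone=phone)
    try:
        for name in channel_names:
            entity = await client.get_entity(name)
            channel_dir = output_chats_path / name
            channel_dir.mkdir(parents=True, exist_ok=True)
            info = json.dumps(entity.to_dict(), default=str, ensure_ascii=False)
            (channel_dir / "info.json").write_text(info, encoding="utf-8")
            # Keeps the whole history in memory before writing; simple, but not ideal for huge channels.
            messages = [m.to_dict() async for m in client.iter_messages(entity)]
            write_batches(messages, channel_dir / "messages", batch_size)
    finally:
        await client.disconnect()
```

It could be run with asyncio.run(gather(["foo", "bar"], api_id, api_hash, phone, Path("chats"), batch_size)). Separating write_batches from the Telethon calls keeps the batching logic testable without a Telegram connection.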

How to monitor group messages?

To monitor new messages sent in a set of groups, run engagement_monitor.py after adjusting the following elements:

Set channel_names = ["foo", "bar"] to the names of the channels you want to monitor.
Optionally, set a different BATCH_SIZE.
If your Telegram credentials are stored in a different path, update telegram_env_path.
output_chats is the folder where everything is stored, both the channel chats and the channel info. It can be changed.
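Monitoring can be sketched with Telethon's event handlers: an events.NewMessage handler receives each new message, and a buffer writes a JSON file whenever BATCH_SIZE messages have accumulated. This is an illustrative sketch, not the code of engagement_monitor.py. BatchBuffer, run_monitor, the session name "engagement_monitor" and the per-channel folder naming are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


class BatchBuffer:
    """Collect incoming messages and write them to disk every batch_size messages."""

    def __init__(self, out_dir: Path, batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.out_dir = out_dir
        self.batch_size = batch_size
        self.pending: list[dict] = []
        self.batches_written = 0

    def add(self, message: dict) -> Path | None:
        """Buffer one message; return the file path if this call triggered a write."""
        self.pending.append(message)
        if len(self.pending) < self.batch_size:
            return None
        return self.flush()

    def flush(self) -> Path | None:
        """Write all pending messages to the next batch file, if there are any."""
        if not self.pending:
            return None
        self.out_dir.mkdir(parents=True, exist_ok=True)
        path = self.out_dir / f"batch_{self.batches_written}.json"
        path.write_text(json.dumps(self.pending, default=str, ensure_ascii=False), encoding="utf-8")
        self.batches_written += 1
        self.pending = []
        return path


def run_monitor(channel_names: list[str], api_id: int, api_hash: str, phone: str,
                output_chats: Path, batch_size: int) -> None:
    from telethon import TelegramClient, events  # imported lazily; requires Telethon

    client = TelegramClient("engagement_monitor", api_id, api_hash)
    buffers: dict[str, BatchBuffer] = {}

    @client.on(events.NewMessage(chats=channel_names))
    async def on_new_message(event) -> None:
        chat = await event.get_chat()
        name = getattr(chat, "username", None) or str(event.chat_id)
        if name not in buffers:
            buffers[name] = BatchBuffer(output_chats / name, batch_size)
        buffers[name].add(event.message.to_dict())

    client.start(phone=phone)
    try:
        client.run_until_disconnected()
    finally:
        # Write any partially filled batches when the client stops.
        for buffer in buffers.values():
            buffer.flush()
```

Flushing on shutdown avoids losing the last few messages when the monitor is stopped before a batch is full.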

oeg-upm/telegram-dataset-builder

Folders and files

Latest commit

History

Repository files navigation

Telegram Dataset Builder

Requirements

Telegram API credentials

How to gather groups messages?

How to monitor groups messages?

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages