Commit

add more notebooks
uetchy committed Jul 4, 2022
1 parent 281fd7e commit a28598e
Showing 17 changed files with 475,618 additions and 2,919 deletions.
8 changes: 8 additions & 0 deletions .env.example
@@ -0,0 +1,8 @@
MONGODB_URI=mongodb://localhost:27017/honeybee
ANONYMIZATION_SALT=supersecret
CURRENCY_API_KEY=

RAW_DATA_DIR=./data/raw
VTLC_DIR=./data/vtlc
VTLC_ELEMENTS_DIR=./data/vtlc-elements
VTLC_COMPLETE_DIR=./data/vtlc-complete
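
A minimal sketch of how a script might read these settings, assuming plain `KEY=VALUE` lines as shown above. The `parse_env` helper is hypothetical and not part of this repository; the project may well use a dedicated dotenv loader instead.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# A few entries copied from .env.example for illustration.
EXAMPLE = """\
MONGODB_URI=mongodb://localhost:27017/honeybee
ANONYMIZATION_SALT=supersecret
CURRENCY_API_KEY=

RAW_DATA_DIR=./data/raw
"""

config = parse_env(EXAMPLE)

# Assumption: a real environment variable overrides the file default.
raw_data_dir = os.environ.get("RAW_DATA_DIR", config["RAW_DATA_DIR"])
```

An empty value such as `CURRENCY_API_KEY=` parses to an empty string, which callers would need to treat as "not configured".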
7 changes: 0 additions & 7 deletions .env.placeholder

This file was deleted.

1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@
/tmp
/*.ipynb
TODO
node_modules

# Created by https://www.toptal.com/developers/gitignore/api/python
# Edit at https://www.toptal.com/developers/gitignore?templates=python
4 changes: 2 additions & 2 deletions README.md
@@ -4,7 +4,7 @@

**VTuber 1B** is an academic purpose NLP dataset, collecting over a billion live chats, superchats, and moderation events (bans/deletions) from virtual YouTubers' live streams.

Download the dataset from [Kaggle Datasets](https://www.kaggle.com/uetchy/vtuber-livechat) and join `#livechat-dataset` channel on [holodata Discord](https://holodata.org/discord) for discussions.
Download the dataset from [Kaggle Datasets](https://www.kaggle.com/uetchy/vtuber-livechat) and join `#vtuber-1b` channel on [holodata Discord](https://holodata.org/discord) for discussions.

> We also offer [❤️‍🩹 Sensai](https://github.com/holodata/sensai-dataset), a live chat dataset specifically made for building ML models for spam detection / toxic chat classification.
@@ -30,7 +30,7 @@ Download the dataset from [Kaggle Datasets](https://www.kaggle.com/uetchy/vtuber
- Superchat Analysis
- Training neural language models

See public notebooks built on [VTuber 1B](https://www.kaggle.com/uetchy/vtuber-livechat/code) and [VTuber 1B Elements](https://www.kaggle.com/uetchy/vtuber-livechat-elements/code) for ideas.
See Kaggle public notebooks ([VTuber 1B](https://www.kaggle.com/uetchy/vtuber-livechat/code) / [VTuber 1B Elements](https://www.kaggle.com/uetchy/vtuber-livechat-elements/code)) for ideas, as well as [`/notebooks`](./notebooks) folder in the repo.

> We employed [Honeybee](https://github.com/holodata/honeybee) cluster to collect real-time live chat events across major Vtubers' live streams. All sensitive data such as author name or author profile image are omitted from the dataset, and author channel id is anonymized by SHA-1 hashing algorithm with a grain of salt.
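
The README note above says channel ids are anonymized with salted SHA-1. A minimal sketch of that idea follows, assuming the salt is simply appended to the id before hashing; the actual combination scheme is not shown in this commit. The function name and sample ids are hypothetical.

```python
import hashlib


def anonymize_channel_id(channel_id: str, salt: str) -> str:
    """Return a salted SHA-1 hex digest of a channel id.

    Assumption: the salt is appended to the id before hashing.
    The real pipeline may combine them differently.
    """
    data = (channel_id + salt).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


# Hypothetical inputs; the salt would come from ANONYMIZATION_SALT.
hashed = anonymize_channel_id("UC_example_channel", "supersecret")
same = anonymize_channel_id("UC_example_channel", "supersecret")
other_salt = anonymize_channel_id("UC_example_channel", "different")
```

The same id and salt always map to the same 40-character digest, so per-author statistics remain possible without exposing the original id, while a different salt yields an unrelated digest.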
23,309 changes: 23,309 additions & 0 deletions notebooks/hololive-insights.ipynb

Large diffs are not rendered by default.

Binary file added notebooks/img/fanatics.png
Binary file added notebooks/img/overlaps_by_groups.png
Binary file added notebooks/img/overlaps_by_sector.png
232 changes: 232 additions & 0 deletions notebooks/livechat-population.ipynb

Large diffs are not rendered by default.

1,492 changes: 0 additions & 1,492 deletions notebooks/livechat_population.ipynb

This file was deleted.
