A guide to fine-tuning DistilBERT on the tweets of American Senators with snscrape, SQLite, and Transformers (PyTorch) on Google Colab.

m-newhauser/distilbert-senator-tweets

Fine-tuning DistilBERT on senator tweets

Built in Python 🐍 using 🤗 Transformers and deployed on Streamlit 🎈 (coming soon!).

Read the Medium article here.

Code

Part 1: Creating the dataset - get_tweets.ipynb

Part 2: Fine-tuning DistilBERT - finetune_distilbert_senator_tweets_pt.ipynb
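
The project description lists snscrape and SQLite as the tools behind the dataset. The notebook is not reproduced here, so the following is only a minimal sketch of the storage side using Python's built-in `sqlite3` module. The `tweets` table, its columns (including `party` as an example label), and the helper names `store_tweets` and `tweets_for_year` are hypothetical, and the made-up rows stand in for scraper output.

```python
import sqlite3

# Hypothetical schema: one row per tweet. Column names are illustrative,
# not taken from get_tweets.ipynb.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   INTEGER PRIMARY KEY,
    username   TEXT NOT NULL,
    party      TEXT NOT NULL,   -- assumed example label
    created_at TEXT NOT NULL,   -- ISO 8601 timestamp, so string order = time order
    text       TEXT NOT NULL
)
"""


def store_tweets(conn, rows):
    """Insert (tweet_id, username, party, created_at, text) tuples, skipping duplicate IDs."""
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def tweets_for_year(conn, year):
    """Return (text, party) pairs for tweets created in the given calendar year."""
    start, end = f"{year}-01-01", f"{year + 1}-01-01"
    cur = conn.execute(
        "SELECT text, party FROM tweets "
        "WHERE created_at >= ? AND created_at < ? ORDER BY created_at",
        (start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
    conn.execute(SCHEMA)
    # Made-up rows standing in for scraped tweets.
    store_tweets(conn, [
        (1, "senator_a", "D", "2021-03-01T12:00:00", "Example tweet one"),
        (2, "senator_b", "R", "2021-07-15T09:30:00", "Example tweet two"),
        (3, "senator_a", "D", "2020-12-31T23:59:59", "Outside the 2021 window"),
        (1, "senator_a", "D", "2021-03-01T12:00:00", "Example tweet one"),  # duplicate ID, ignored
    ])
    print(tweets_for_year(conn, 2021))
```

Using the tweet ID as the primary key with `INSERT OR IGNORE` makes re-running a scrape idempotent, which is convenient when collecting tweets account by account.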

Sample

All tweets posted in 2021 (~100,000) by 100 United States Senators, scraped by me.
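
The evaluation below uses a 20% test set, but the README does not say how the split was made. One minimal, reproducible way to carve a sample like this into 80% train and 20% test is a seeded shuffle using only the standard library. The function name `train_test_split_rows` and the seed value are hypothetical.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off test_fraction of them."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


if __name__ == "__main__":
    rows = [(f"tweet {i}", i % 2) for i in range(10)]  # made-up (text, label) pairs
    train, test = train_test_split_rows(rows)
    print(len(train), len(test))  # 8 2
```

A plain random split lets tweets from the same senator appear in both sets; splitting by account instead would give a stricter test, at the cost of a less balanced split.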

Model

DistilBERT base model (uncased) for sequence classification.

Evaluation

The model was evaluated on a held-out test set (20% of the data):

{'accuracy': 0.908, 'f1': 0.912}
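
The README reports accuracy and F1 without showing how they were computed. Below is a minimal sketch, assuming binary labels with class 1 as the positive class and a `compute_metrics` hook in the style of the Hugging Face `Trainer`, which passes a `(logits, labels)` pair. The notebook may well use a metrics library instead; this version sticks to plain Python so the arithmetic is visible.

```python
def compute_metrics(eval_pred):
    """Return accuracy and binary F1 (positive class = 1) from a (logits, labels) pair."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]

    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "f1": f1}
```

For example, `compute_metrics(([[2.0, 0.1], [0.2, 1.5]], [0, 1]))` returns `{'accuracy': 1.0, 'f1': 1.0}`. With Transformers it would be passed as `Trainer(..., compute_metrics=compute_metrics)`, where the logits arrive as NumPy arrays, which this code also accepts.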
