A guide to fine-tuning DistilBERT on the tweets of U.S. Senators with snscrape, SQLite, and Transformers (PyTorch) on Google Colab.
Built in Python 🐍 using 🤗 Transformers, and deployed on Streamlit 🎈 (coming soon!).
Read the Medium article here.
Part 1: Creating the dataset - get_tweets.ipynb
Part 2: Fine-tuning DistilBERT - finetune_distilbert_senator_tweets_pt.ipynb
All ~100,000 tweets posted in 2021 by the 100 United States Senators, scraped by me with snscrape.
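A minimal sketch of the storage step, assuming a hypothetical table schema (the actual schema lives in get_tweets.ipynb). In the real notebook the rows would come from snscrape's `TwitterSearchScraper`; here two placeholder tweets stand in so the example runs offline.

```python
import sqlite3

# In the notebook this would be a file, e.g. sqlite3.connect("tweets.db")
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tweets (
           id INTEGER PRIMARY KEY,
           senator TEXT NOT NULL,
           party   TEXT NOT NULL,   -- classification label
           date    TEXT NOT NULL,
           content TEXT NOT NULL
       )"""
)

# Placeholder rows. A real snscrape query such as
# "from:<handle> since:2021-01-01 until:2022-01-01" (hypothetical handle)
# would yield roughly 1,000 tweets per account on average.
rows = [
    (1, "Senator A", "D", "2021-03-01", "Example tweet text one."),
    (2, "Senator B", "R", "2021-06-15", "Example tweet text two."),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)  # → 2
```

Storing the scrape in SQLite (rather than a flat CSV) makes it easy to dedupe by tweet `id` and to filter by senator or date range before building the training set.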
DistilBERT base model (uncased) for sequence classification.
The model was evaluated on a held-out test set (20% of the data):
{'accuracy': 0.908,
'f1': 0.912}
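For reference, the two reported numbers can be computed from first principles; the notebook presumably uses a library such as scikit-learn or 🤗 Evaluate, but the definitions are simple. The labels and predictions below are hypothetical stand-ins, shown here for a binary case.

```python
# Hypothetical labels (e.g. party) and model predictions for 10 test tweets.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

# Confusion-matrix counts for the positive class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# F1: harmonic mean of precision and recall, equivalently 2*TP / (2*TP + FP + FN).
f1 = 2 * tp / (2 * tp + fp + fn)
```

That F1 is slightly above accuracy here, as in the reported scores, just reflects how the errors split between false positives and false negatives.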