A guide to fine-tuning DistilBERT on the tweets of U.S. Senators with snscrape, SQLite, and Transformers (PyTorch) on Google Colab.
Built in Python 🐍 using 🤗 Transformers, and deployed on Streamlit 🎈 (coming soon!).
Read the Medium article here.
Part 1: Creating the dataset - get_tweets.ipynb
Part 2: Fine-tuning DistilBERT - finetune_distilbert_senator_tweets_pt.ipynb
All ~100,000 tweets posted in 2021 by the 100 United States Senators, scraped by me with snscrape.
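A minimal sketch of the storage step, assuming a hypothetical table schema (the actual schema lives in get_tweets.ipynb). In the real notebook the rows would come from snscrape's `TwitterSearchScraper`; here two placeholder tweets stand in so the example runs offline.

```python
import sqlite3

# In the notebook this would be a file, e.g. sqlite3.connect("tweets.db")
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tweets (
           id INTEGER PRIMARY KEY,
           senator TEXT NOT NULL,
           party   TEXT NOT NULL,   -- classification label
           date    TEXT NOT NULL,
           content TEXT NOT NULL
       )"""
)

# Placeholder rows. A real snscrape query such as
# "from:<handle> since:2021-01-01 until:2022-01-01" (hypothetical handle)
# would yield roughly 1,000 tweets per account on average.
rows = [
    (1, "Senator A", "D", "2021-03-01", "Example tweet text one."),
    (2, "Senator B", "R", "2021-06-15", "Example tweet text two."),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)  # → 2
```

Storing the scrape in SQLite (rather than a flat CSV) makes it easy to dedupe by tweet `id` and to filter by senator or date range before building the training set.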
DistilBERT base model (uncased) for sequence classification.
The model was evaluated on a held-out test set (20% of the data):
{'accuracy': 0.908,
'f1': 0.912}
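For reference, the two reported numbers can be computed from first principles; the notebook presumably uses a library such as scikit-learn or 🤗 Evaluate, but the definitions are simple. The labels and predictions below are hypothetical stand-ins, shown here for a binary case.

```python
# Hypothetical labels (e.g. party) and model predictions for 10 test tweets.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

# Confusion-matrix counts for the positive class.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# F1: harmonic mean of precision and recall, equivalently 2*TP / (2*TP + FP + FN).
f1 = 2 * tp / (2 * tp + fp + fn)
```

That F1 is slightly above accuracy here, as in the reported scores, just reflects how the errors split between false positives and false negatives.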