During the Capitol storming on January 6, 2021, what kinds of sentiment manifested in the English tweets discussing U.S. electoral affairs? My classmate Sarah Sramota and I conducted a sentiment analysis of 270,000 Tweets. I implemented the code and Sarah wrote up the findings. This was an assignment for the tutorial Supervised Sentiment Analysis in R by @ccs-amsterdam in January 2021. A year and a half later, I updated the code to be compatible with the latest R packages.
This assignment used data from the 2020 US Presidential Election Tweet IDs dataset collected by @echen102 and @emilioferrara. The following conditions apply:
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:
Chen, E., Deb, A. & Ferrara, E. #Election2020: the first public Twitter dataset on the 2020 US Presidential election. J Comput Soc Sc (2021). https://doi.org/10.1007/s42001-021-00117-9
Specifically, Twitter's Terms of Service only allow the sharing of Tweet IDs. To retrieve the original text from the Tweet IDs (that is, to "hydrate" them), you need to access the Twitter API yourself. A suitable GUI tool is Hydrator.
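Tweet IDs are 64-bit integers and can exceed 2^53, the limit up to which R's numeric type represents every integer exactly, so it is safer to read them as text before hydrating. Below is a minimal base-R sketch, assuming the ID file is a single-column CSV with a header named `id`. The helper name, the column name, and the sample IDs are made up for illustration; the actual layout of ElecTweetID.csv may differ.

```r
# Hypothetical helper: read a CSV of Tweet IDs as character strings.
# The column name "id" is an assumption, not taken from the dataset.
read_tweet_ids <- function(path, id_col = "id") {
  ids <- read.csv(path, colClasses = "character", stringsAsFactors = FALSE)
  stopifnot(id_col %in% names(ids))
  unique(trimws(ids[[id_col]]))
}

# Demonstration with made-up IDs written to a temporary file.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("id", "1346900000000000001", "1346900000000000002"), demo_path)

demo_ids <- read_tweet_ids(demo_path)
print(demo_ids)

# Parsed as numbers, the two distinct IDs are no longer distinguishable.
print(as.numeric(demo_ids[1]) == as.numeric(demo_ids[2]))
```

Keeping the IDs as strings avoids silently merging or altering them before they are passed to a hydration tool.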
I certify that the authors have legitimate access to and permission to use the data.
Some data cannot be made publicly available.
Data files | Source | Notes | Provided |
---|---|---|---|
ElecTweetID.csv | us-pres-elections-2020 | Tweet IDs for retrieving original Tweets | Yes |
Capitol_tweet_0106.csv | Twitter API | | No |
I use R (version 4.2.0) for all the analyses, with the following packages: quanteda (3.2.0), quanteda.textplots (0.94.1), quanteda.textstats (0.95), readr (2.1.2), syuzhet (1.0.6).
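To compare a local setup against the versions listed above, something like the following base-R sketch could be used. The helper name `check_versions` is hypothetical and not part of the original script; it only reports what is installed and does not install anything.

```r
# Package versions as listed above.
required <- c(
  quanteda           = "3.2.0",
  quanteda.textplots = "0.94.1",
  quanteda.textstats = "0.95",
  readr              = "2.1.2",
  syuzhet            = "1.0.6"
)

# Hypothetical helper: report installed versions next to the listed ones.
check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  at_least_listed <- vapply(seq_along(required), function(i) {
    !is.na(installed[[i]]) &&
      package_version(installed[[i]]) >= package_version(required[[i]])
  }, logical(1))
  data.frame(
    package         = names(required),
    listed          = unname(required),
    installed       = unname(installed),
    at_least_listed = at_least_listed,
    stringsAsFactors = FALSE
  )
}

print(check_versions(required))
```

A row with `installed` equal to `NA` means the package is missing; newer versions than those listed may still work, but the analyses were only run with the versions above.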
Reproducing the analyses takes less than ten minutes on a standard 2022 desktop machine, excluding Chunk 37, which takes a long time to run. The code was last run on a Windows 11 laptop with a 4-core Intel processor.
Download ElecTweetID.csv. Load it in Hydrator to access the Twitter API and retrieve the original text. Save the collected tweets as Capitol_tweet_0106.csv. Place it and script.Rmd in the same folder. Run the script to execute all steps in sequence. Chunk 37 is time-consuming to run; skip it if necessary.
The script is provided in the same folder. Run script_tweet_sentiment.Rmd to execute all steps in sequence.
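Before running the script, it can help to confirm that the hydrated file is where the script expects it. A minimal base-R sketch follows; the helper name `missing_inputs` is hypothetical and the check is not part of the original script. It assumes the working directory is the folder that holds the script.

```r
# Hypothetical helper: return the names of expected files that are absent.
missing_inputs <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

# Check for the hydrated tweets in the current working directory.
missing <- missing_inputs("Capitol_tweet_0106.csv")
if (length(missing) > 0) {
  message("Missing input file(s): ", paste(missing, collapse = ", "))
} else {
  message("All input files found.")
}
```

To skip a slow chunk such as Chunk 37 when knitting, one option is to set the knitr chunk option `eval = FALSE` in that chunk's header.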
Chen, E., Deb, A., & Ferrara, E. (2021). #Election2020: The first public Twitter dataset on the 2020 US presidential election. Journal of Computational Social Science, 5(1), 1–18. https://doi.org/10.1007/s42001-021-00117-9