Replication materials for "Why Keep Arguing? Predicting Participation in Political Conversations Online." Sarah Shugars and Nick Beauchamp, SAGE Open: Social Media and Political Participation Global Issue, Forthcoming (2019). Code for this paper is organized as follows:
Scripts for finding tweets and building conversations.
File with your Twitter app credentials. To obtain credentials, apply for a Twitter developer account.
Retrieve metadata for list of tweet IDs using Twitter's REST API
Requires: file with Twitter authorization information
Input: folder with .txt files of tweet IDs, where each file represents a conversation
Output: For each .txt file, creates a .json.gzip file with full metadata for tweets
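The input/output contract above can be sketched as follows. This is a minimal illustration, not the repository's script: it assumes one tweet ID per line in each .txt file, and `lookup_batch` is a hypothetical callable standing in for whatever authenticated Twitter API call you use. The batch size of 100 is likewise an assumption chosen to match common per-request ID limits.

```python
import gzip
import json
from pathlib import Path


def read_tweet_ids(path):
    """Return the tweet IDs in a .txt file (assumes one ID per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def chunked(items, size=100):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def hydrate_folder(in_dir, out_dir, lookup_batch):
    """Write one .json.gzip file of tweet metadata per conversation .txt file.

    `lookup_batch` is a hypothetical callable: it takes a list of tweet IDs
    and returns a list of tweet metadata dicts.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for txt_path in sorted(Path(in_dir).glob("*.txt")):
        tweets = []
        for batch in chunked(read_tweet_ids(txt_path)):
            tweets.extend(lookup_batch(batch))
        out_path = out_dir / (txt_path.stem + ".json.gzip")
        with gzip.open(out_path, "wt", encoding="utf-8") as handle:
            json.dump(tweets, handle)
```

Passing the lookup function in as an argument keeps the file handling testable without network access or credentials.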
Retrieve tweet IDs connected to a single tweet ID (e.g., IDs for all tweets in a conversation tree)
Input: Folder with .json.gzip objects from Twitter API
Output: For each seed tweet, a .txt file with connected tweetIDs
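One plausible way to find connected tweet IDs is to follow reply pointers in the metadata. The sketch below is an assumption about the approach, not the repository's code: it relies on the `id_str` and `in_reply_to_status_id_str` fields of standard Twitter tweet objects, and `missing_parent_ids` is a hypothetical helper name.

```python
def missing_parent_ids(tweets):
    """Return IDs that tweets reply to but that are not yet in `tweets`.

    `tweets` is a list of tweet metadata dicts. Each returned ID is a
    candidate for the next round of metadata retrieval.
    """
    known = {tweet["id_str"] for tweet in tweets}
    parents = {tweet.get("in_reply_to_status_id_str") for tweet in tweets}
    # Drop None (tweets that are not replies) and parents already collected.
    return sorted(pid for pid in parents if pid and pid not in known)
```

Repeating this step until no new IDs appear walks a reply chain upward to its root. Finding replies further down the tree needs a different data source, which this sketch does not cover.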
Process tweet metadata into single document indexed by conversation ID.
Input: Folder of .json.gzip files, where each file contains metadata for tweets in the same conversation tree
Output: conversation.json.gzip: a single file with all conversation data. Formatted as:
convoID : {
    tweets : {tweetID : tweet metadata},
    threads : [[tweetIDs ordered by entry]]
}
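To make the conversation format concrete, here is a minimal sketch that builds one such entry from a list of tweet metadata dicts. It rests on several assumptions not confirmed by the source: that a thread is a root-to-leaf reply chain, that replies are linked through the `in_reply_to_status_id_str` field, and that sorting by numeric tweet ID approximates order of entry. `build_conversation` is a hypothetical name.

```python
def build_conversation(tweets):
    """Return {"tweets": {tweetID: metadata}, "threads": [[tweetID, ...]]}."""
    by_id = {tweet["id_str"]: tweet for tweet in tweets}
    children = {}
    roots = []
    for tweet_id, tweet in by_id.items():
        parent = tweet.get("in_reply_to_status_id_str")
        if parent in by_id:
            children.setdefault(parent, []).append(tweet_id)
        else:
            # No parent in this conversation: treat the tweet as a root.
            roots.append(tweet_id)

    threads = []

    def walk(tweet_id, path):
        """Depth-first walk that records each root-to-leaf path."""
        path = path + [tweet_id]
        replies = sorted(children.get(tweet_id, []), key=int)
        if not replies:
            threads.append(path)
        for reply_id in replies:
            walk(reply_id, path)

    for root_id in sorted(roots, key=int):
        walk(root_id, [])
    return {"tweets": by_id, "threads": threads}
```

For example, a root tweet with two replies, one of which has its own reply, yields two threads that share the root as their first element.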
Get topics from corpus of tweets
Input: .json.gzip file of tweet metadata
Output:
.json.gzip file with topic loadings for each tweetID of format:
{convoID : {tweetID : {topic loadings}}}
.txt file with top words per topic; comma separated, 1 topic per line
gensim save object with LDA model which can be reloaded using:
ldamodel = gensim.models.ldamodel.LdaModel.load('ldamodel', mmap='r')
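The two text outputs described above can be illustrated with a small standard-library sketch. It assumes each tweet's loadings arrive as a list of `(topic_id, weight)` pairs, which is the shape returned by gensim's `LdaModel.get_document_topics`. The helper names `nest_loadings` and `format_top_words` are hypothetical and do not come from the repository.

```python
def nest_loadings(rows):
    """Build {convoID: {tweetID: {topic: loading}}} from flat rows.

    `rows` is an iterable of (convo_id, tweet_id, topic_weights) tuples,
    where topic_weights is a list of (topic_id, weight) pairs.
    """
    nested = {}
    for convo_id, tweet_id, topic_weights in rows:
        # JSON object keys must be strings, so topic IDs are converted.
        nested.setdefault(convo_id, {})[tweet_id] = {
            str(topic): float(weight) for topic, weight in topic_weights
        }
    return nested


def format_top_words(topics):
    """Render top words as comma-separated text, one topic per line."""
    return "\n".join(",".join(words) for words in topics)
```

The nested dictionary can then be written with `json.dump` through `gzip.open(path, "wt")` to produce the .json.gzip file.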
Calculate features for tweets / conversations
Input:
.json.gzip file of tweet metadata. Expected to be of format:
.json.gzip file with topic loadings for each tweetID of format:
Output: .txt.gzip file with matrix of output features
Run logit and SVM on calculated features
Input: .txt.gzip file of feature matrix (one row = one observation)
Output: Stargazer / latex summary of results
To respect the privacy of individuals whose tweets are in our dataset, we've included a list of IDs for all tweets used (tweet_ids.txt) rather than the raw data itself. To replicate this study, researchers should begin by scraping these tweets. This process ensures that we do not violate users' privacy by sharing tweet data that a user has elected to delete following our data collection window.