Skip to content

Replication materials for Why Keep Arguing? Predicting Participation in Political Conversations Online

Notifications You must be signed in to change notification settings

sshugars/conversations

Repository files navigation

Replication materials for Why Keep Arguing? Predicting Participation in Political Conversations Online. Sarah Shugars and Nick Beauchamp, SAGE Open: Social Media and Political Participation Global Issue, Forthcoming (2019). Code for this paper are organized as follows:

get_conversations/

Scripts for finding tweets and building conversations.

auth.py

File with your twitter app credentials Go to https://developer.twitter.com to apply for a developer account

get_tweets.py

Retrieve metadata for list of tweet IDs using Twitter's REST API

 Requires: auth.py : file with Twitter authorization information

 Input: folder with .txt files of tweet IDs, where each file represents a conversation 
    (from get_conversations.py)
 Output: For each .txt file, creates a .json.gzip file with full metadata for tweets

get_conversations.py

Retrieve tweet IDs connected to a single tweet ID (e.g., IDs for all tweets in a conversation tree)

Input: Folder with .json.gzip objects from Twitter API 
Output: For each seed tweet, a .txt file with connected tweetIDs  

select_conversations.py

Process tweet metadata into single document indexed by conversation ID.

 Input: Folder of .json.gzip files where each file contains 
    metadata for tweets in the same conversation tree
    (from get_tweets.py)

 Output: conversation.json.gzip: a single file with all conversation data. Formated as:
     convoID : { 
        	tweets : { tweetID: tweet metadata}
        	threads: [[tweetIDs ordered by entry]]        

feature_extraction/

topics.py

Get topics from corpus of tweets

 Input: .json.gzip file of tweet metadata data 
     (from select_conversations.py)

 Output: 
	.json.gzip file with topic loadings for each tweetID of format:
    	{convoID : {tweetID : {topic loadings}}}

	.txt file with top words per topic; comma seperated, 1 topic per line

    gensim save object with LDA model which can be reloaded using:
	ldamodel = gensim.models.ldamodel.LdaModel.load('ldamodel', mmap='r')

features.py

Calculate features for tweets / conversations

 Input:
.json.gzip file of tweet metadata data. Expected to be of format:
(from select_conversations.py)

.json.gzip file with topic loadings for each tweetID of format:
(from topics.py)       

 Output:  
     .txt.gzip file with matrix of output features

model/

conversation_dynamics_model.R

Run logit and SVM on calculated features

Input: .txt.gzip file of feature matrix (one row = one observation)

Output: Stargazer / latex summary of results

Data

To respect the privacy of individuals whose tweets our in our dataset, we've included a list of ids for all tweets used (tweet_ids.txt) rather than include the raw data itself. In order to replicate this study, researchers should begin by scraping these tweets. This process will ensure that we do not violate users privacy by sharing any tweet data which a user has elected to delete following our data collection window.

About

Replication materials for Why Keep Arguing? Predicting Participation in Political Conversations Online

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published