This is for "Surveillance of COVID-19 Pandemic using Social Media: A Reddit Study in North Carolina".

yliu9418/Surveillance-of-COVID-19-Pandemic-using-Social-Media-A-Reddit-Study-in-North-Carolina

Surveillance of COVID-19 Pandemic using Social Media: A Reddit Study in North Carolina

Abstract

The coronavirus disease (COVID-19) pandemic has changed many aspects of people’s lives and behaviors. At this stage, there is no way to control the natural progression of the disease other than adopting mitigation strategies such as wearing masks, keeping distance, and washing hands. Moreover, at this time of social distancing, social media plays a key role in connecting people and providing a platform for expressing their feelings. In this study, we tap into social media to surveil the uptake of mitigation and detection strategies, and to capture issues and concerns about the pandemic. In particular, we explore the research question, “How much can be learned regarding the public uptake of mitigation strategies and concerns about the COVID-19 pandemic by using natural language processing on Reddit posts?” After extracting COVID-related posts from the four largest subreddit communities of North Carolina over six months, we performed NLP-based preprocessing to clean the noisy data. We employed a custom named-entity recognition (NER) system and a Latent Dirichlet Allocation (LDA) method for topic modeling on the resulting Reddit corpus. We observed that mask, flu, and testing are the most prevalent named entities for the “Personal Protective Equipment”, “symptoms”, and “testing” categories, respectively. We also observed that the most discussed topics are related to testing, masks, and employment. Mitigation measures are the most prevalent theme of discussion across all subreddits.

NC_dataset

We used two application programming interfaces (APIs), the Python Reddit API Wrapper (PRAW) and the Python Pushshift.io API Wrapper (PSAW), together with a set of predefined search terms (“corona virus”, “Coronavirus”, “COVID-19”, and “SARS-CoV-2”) to extract 122,249 comments from 2,319 Reddit posts in four location-specific North Carolina subreddit communities: r/Charlotte, r/raleigh, r/gso (Greensboro), and r/NorthCarolina. The collection window runs from March 1, 2020 through August 31, 2020. The first three subreddits represent the communities of the three most populous cities in North Carolina, and r/NorthCarolina is the subreddit for the entire state.

Using our Reddit instance, we created a query selecting posts and comments from the given subreddits with the keywords corona virus OR Coronavirus OR COVID-19 OR SARS-CoV-2, between the dates 3/1/2020 AND 8/31/2020 (inclusive). Next, we looped through the query results (omitting duplicates) and saved the comment objects to a list. Using the list of comment objects, we then retrieved each comment’s parent submission (Reddit post) id, submission date, and submission title. Additionally, we used the submission ids to retrieve the entire comment thread associated with each parent post, including the total number of comments in the thread. We then organized the acquired data in a list arranged by date, title, comment body, and number of comments. Once the list was complete, the data was converted to a table, sorted by date in descending order, and saved to a file in UTF-8 text format.
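The filtering and export steps described above can be sketched in plain Python. This is a minimal illustration, not the code used in the study: it assumes the comments have already been fetched and flattened into dictionaries, and the field names (`id`, `body`, `created_utc`, `submission_title`, `submission_created_utc`, `submission_num_comments`), the function names, and the tab-separated output layout are all hypothetical choices made for this example.

```python
import csv
import io
from datetime import datetime, timezone

# Search terms from the dataset description, lowercased for matching.
KEYWORDS = ("corona virus", "coronavirus", "covid-19", "sars-cov-2")

# Collection window, inclusive on both ends (UTC is an assumption).
START = datetime(2020, 3, 1, tzinfo=timezone.utc)
END = datetime(2020, 8, 31, 23, 59, 59, tzinfo=timezone.utc)

FIELDS = ["date", "title", "body", "num_comments"]


def mentions_covid(text):
    """Return True if the text contains any search term, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def in_study_window(created_utc):
    """Return True if a Unix timestamp falls inside the collection window."""
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return START <= created <= END


def select_comments(comments):
    """Keep the first copy of each matching comment, skipping duplicate ids."""
    seen_ids = set()
    selected = []
    for comment in comments:
        if comment["id"] in seen_ids:
            continue
        seen_ids.add(comment["id"])
        if mentions_covid(comment["body"]) and in_study_window(comment["created_utc"]):
            selected.append(comment)
    return selected


def to_rows(comments):
    """Build date/title/body/num_comments rows, newest submission first."""
    rows = []
    for c in comments:
        posted = datetime.fromtimestamp(c["submission_created_utc"], tz=timezone.utc)
        rows.append({
            "date": posted.date().isoformat(),
            "title": c["submission_title"],
            "body": c["body"],
            "num_comments": c["submission_num_comments"],
        })
    # ISO dates sort correctly as strings; reverse gives descending order.
    return sorted(rows, key=lambda row: row["date"], reverse=True)


def render_table(rows):
    """Render the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_table(rows, path):
    """Write the rendered table to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(render_table(rows))
```

With records in this shape, `save_table(to_rows(select_comments(records)), "nc_comments.tsv")` would produce one table sorted by date in descending order; the output filename here is only a placeholder.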

Reference

Whitfield, C., Liu, Y., & Anwar, M. (2021, August). Surveillance of COVID-19 pandemic using social media: A Reddit study in North Carolina. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 1-8).
