Data Types and Approaches
The laboratory collects social media and forum data from a variety of platforms and conducts content, spatial, network, statistical, and machine learning analyses, as well as data mining and topic modeling, of social media text and images, along with analyses of the associated metadata. We also use other novel user-generated datasets, including Google Trends (a database of search queries), Ngram (a database of digitized books), and Google Street View (map-related data containing images).
The Human Sensor Project
Since 2015, the DS3 laboratory has been developing a comprehensive dataset of millions of geolocated tweets from the continental U.S. The database lets our research team search for topic-related keywords across a variety of research projects. It includes metadata attributes for both Twitter users and tweets, which allows our team to conduct advanced analytics on the data. We named this effort the Human Sensor Project, reflecting the idea that human activities, such as aggregate thoughts, behaviors, and collective actions, can be sensed online by analyzing Twitter activity and the approximate locations associated with tweets. To date, our team has used the data to study health crises and natural disasters, such as Hurricane Sandy, the Flint Water Crisis, and COVID-19, as well as other societal issues such as rape culture, victim blaming, and reactions to crime cases.
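As a rough illustration of what a keyword search over geolocated tweets can look like, the Python sketch below filters records by a bounding box and a keyword pattern. It is not the laboratory's actual pipeline: the `Tweet` record, the `search_tweets` function, and the bounding-box coordinates are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Approximate bounding box for the continental U.S.
# (coordinates are an assumption chosen for this sketch).
CONUS = {"min_lat": 24.5, "max_lat": 49.5, "min_lon": -125.0, "max_lon": -66.9}


@dataclass
class Tweet:
    """Hypothetical minimal record: tweet text plus approximate coordinates."""
    text: str
    lat: float
    lon: float


def in_conus(tweet):
    """Return True if the tweet's coordinates fall inside the bounding box."""
    return (CONUS["min_lat"] <= tweet.lat <= CONUS["max_lat"]
            and CONUS["min_lon"] <= tweet.lon <= CONUS["max_lon"])


def search_tweets(tweets, keywords):
    """Return tweets inside the box whose text contains any keyword.

    Matching is case-insensitive and on whole words or phrases; keywords
    are assumed to start and end with a letter or digit.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [t for t in tweets if in_conus(t) and pattern.search(t.text)]
```

A call such as `search_tweets(tweets, ["hurricane sandy", "flint water"])` would return only the in-box tweets that mention either phrase. A rectangular box is a coarse stand-in for a real boundary; a production system would more likely test points against state or county polygons.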
NSF RAPID COPE-ID Database
As part of NSF RAPID-funded project #2031246, social science and computer science research faculty and students at MSU developed a comprehensive database of 14,970,419 posts expressing emotions (e.g., anxiety, sadness, fear, hope) during the COVID-19 pandemic, drawn from 10 social media and forum platforms between January 2020 and April 2021. The database, titled the “COVID-19 Online Prevalence of Emotions in Institutions Database” (COPE-ID), was constructed using Application Programming Interfaces and Python code to collect data based on selected keywords from the following platforms: 1) Twitter, 2) YouTube, 3) Reddit, 4) Parler, 5) 4chan, 6) 8kun, 7) Gab, 8) Tumblr, 9) Flickr, and 10) Mastodon. COPE-ID allows researchers to identify how people coped during various phases of the pandemic and how emotions are connected to and influenced by various social institutions (e.g., work, religion, family). The team integrates social science, computer science, and data science research methods to address research questions in the topic areas of emotions, mental health, misinformation, countering misinformation, and reactions to public health intervention efforts (e.g., stay-at-home orders, gathering bans, mask mandates, and bar closures). More information about the COPE-ID database and resulting products can be found at copeid.ssrc.msstate.edu/.
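To make the idea of keyword-based emotion data concrete, the Python sketch below tags posts with emotion categories and tallies them by platform. It is an illustrative assumption, not the COPE-ID code: the four categories come from the description above, but the keyword lists, the `tag_emotions` and `count_by_platform` functions, and the `(platform, text)` input shape are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the project's actual selected keywords
# are not given in this description.
EMOTION_KEYWORDS = {
    "anxiety": {"anxious", "anxiety", "worried"},
    "sadness": {"sad", "grief"},
    "fear": {"afraid", "scared", "fear"},
    "hope": {"hope", "hopeful"},
}


def tag_emotions(text):
    """Return the set of emotion categories whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {emotion for emotion, words in EMOTION_KEYWORDS.items()
            if tokens & words}


def count_by_platform(posts):
    """Count posts per (platform, emotion) pair.

    `posts` is an iterable of (platform, text) tuples; a post that matches
    several emotions is counted once under each of them.
    """
    counts = Counter()
    for platform, text in posts:
        for emotion in tag_emotions(text):
            counts[(platform, emotion)] += 1
    return counts
```

Exact word matching like this is deliberately simple. It misses misspellings, negation ("not afraid"), and sarcasm, which is one reason keyword collection is usually paired with the content analysis and machine learning methods described earlier.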