Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news
Journal of Information Science
Published online on October 05, 2015
Abstract
The present study investigates topic coverage and sentiment dynamics of two different media sources, Twitter and news publications, on the hot health issue of Ebola. We conduct content and sentiment analysis by: (1) applying vocabulary control to collected datasets; (2) employing the n-gram LDA topic modeling technique; (3) adopting entity extraction and entity network; and (4) introducing the concept of topic-based sentiment scores. With the query term ‘Ebola’ or ‘Ebola virus’, we collected 16,189 news articles from 1006 different publications and 7,106,297 tweets with the Twitter stream API. The experiments indicate that topic coverage of Twitter is narrower and more blurry than that of the news media. In terms of sentiment dynamics, the life span and variance of sentiment on Twitter is shorter and smaller than in the news. In addition, we observe that news articles focus more on event-related entities such as person, organization and location, whereas Twitter covers more time-oriented entities. Based on the results, we report on the characteristics of Twitter and news media as two distinct news outlets in terms of content coverage and sentiment dynamics.