The outbreak of the novel Coronavirus Disease (COVID-19) has greatly influenced people's daily lives across the globe. Governments have taken emergency measures and policies (e.g., lockdown, social distancing) to combat this highly infectious disease. However, people's mental health is also at risk due to prolonged, strict social isolation rules. Hence, monitoring people's mental health across various events and topics is essential for policy makers to make appropriate decisions. Meanwhile, social media has been widely used as an outlet for people to publish and share their personal opinions and feelings. Large-scale social media posts (e.g., tweets) provide an ideal data source for inferring people's mental health during this pandemic period. In this work, we propose a novel framework to analyze the topic and sentiment dynamics due to COVID-19 from massive social media posts. Based on a collection of 13 million tweets related to COVID-19 over two weeks, we found that positive sentiment accounted for a higher ratio than negative sentiment during the study period. Zooming into the topic-level analysis, we find that different aspects of COVID-19 have been constantly discussed and show comparable sentiment polarities. Some topics, such as "stay safe home", are dominated by positive sentiment. Others, such as "people death", consistently show negative sentiment. Overall, the proposed framework yields insightful findings based on the analysis of topic-level sentiment dynamics.
Clustering short texts is one of the most important text analysis methods for extracting knowledge from online social media platforms, such as Twitter, Facebook, and Weibo. However, the instant features (such as abbreviations and informal expressions) and the limited length of short texts make the clustering task challenging. Fortunately, short texts about the same topic often share some common terms (or term stems) that can effectively represent a topic (i.e., one supported by a cluster of short texts); we call them topic representative terms. Taking advantage of topic representative terms, it is much easier to cluster short texts by assigning each short text to the most similar group of topic representative terms. This paper provides a novel topic representative term discovery (TRTD) method for short text clustering. In our TRTD method, we discover groups of closely bound topic representative terms by exploiting the closeness and significance of terms. The closeness of the topic representative terms is measured by their interdependent co-occurrence, and the significance is measured by their global term occurrences throughout the whole short text corpus. The experimental results on real-world datasets demonstrate that TRTD achieves better accuracy and efficiency in short text clustering than the state-of-the-art methods. Index terms: short text, clustering, topic representative terms.
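The two signals named in this abstract, co-occurrence closeness and global term occurrence, can be illustrated with a minimal sketch. The abstract does not give the TRTD formulas, so the definitions below are assumptions chosen for illustration: significance is taken as document frequency, and closeness as the smaller of the two conditional co-occurrence ratios, so a pair scores high only when each term tends to appear with the other. The function names `term_stats` and `closeness` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_stats(docs):
    """Count document frequency per term and co-occurrence per term pair."""
    df = Counter()  # number of short texts containing each term
    co = Counter()  # number of short texts containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        df.update(terms)
        co.update(combinations(terms, 2))  # pairs are stored in sorted order
    return df, co


def closeness(a, b, df, co):
    """Assumed closeness: min of P(a | b) and P(b | a), not the TRTD formula."""
    n_both = co[tuple(sorted((a, b)))]
    if n_both == 0:
        return 0.0
    return min(n_both / df[a], n_both / df[b])


docs = [
    "flu vaccine safe",
    "flu vaccine trial",
    "stock market crash",
    "market crash fears",
]
df, co = term_stats(docs)
print(closeness("flu", "vaccine", df, co))   # terms that always appear together
print(closeness("vaccine", "safe", df, co))  # one-sided dependence scores lower
print(closeness("flu", "market", df, co))    # never co-occur
```

Taking the minimum of both directions is one simple way to make the measure "interdependent"; a term that is frequent everywhere will not score as close to a rare term merely because the rare term always appears with it.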
The outbreak of the novel coronavirus disease (COVID-19) has been ongoing for almost two years and has had an unprecedented impact on the daily lives of people around the world. More recently, the emergence of the Delta variant of COVID-19 has once again put the world at risk. Fortunately, many countries and companies have developed vaccines for the coronavirus. As of 23 August 2021, more than 20 vaccines have been approved by the World Health Organization (WHO), bringing light to people besieged by the pandemic. The global rollout of the COVID-19 vaccine has sparked much discussion on social media platforms, such as the effectiveness and safety of the vaccine. However, there has not been much systematic analysis of public opinion on the COVID-19 vaccine. In this study, we conduct an in-depth analysis of the discussions related to the COVID-19 vaccine on Twitter. We analyze the hot topics discussed by people and the corresponding emotional polarity from the perspective of countries and vaccine brands. The results show that most people trust the effectiveness of vaccines and are willing to get vaccinated. In contrast, negative tweets tended to be associated with news reports of post-vaccination deaths, vaccine shortages, and post-injection side effects. Overall, this study uses popular Natural Language Processing (NLP) technologies to mine people’s opinions on the COVID-19 vaccine on social media and objectively analyze and visualize them. Our findings can improve the readability of the confusing information on social media platforms and provide effective data support for the government and policy makers.
The recent Coronavirus Disease 2019 (COVID-19) pandemic has caused an unprecedented impact across the globe. We have also witnessed millions of people with increased mental health issues, such as depression, stress, worry, fear, disgust, sadness, and anxiety, which have become one of the major public health concerns during this severe health crisis. Depression can cause serious emotional, behavioral, and physical health problems with significant consequences, including both personal and social costs. This article studies community depression dynamics due to the COVID-19 pandemic through user-generated content on Twitter. A new approach based on multimodal features from tweets and term frequency-inverse document frequency (TF-IDF) is proposed to build depression classification models. Multimodal features capture depression cues from emotion, topic, and domain-specific perspectives. We study the problem using recently scraped tweets from Twitter users in the state of New South Wales in Australia. Our novel classification model is capable of extracting depression polarities that may be affected by COVID-19 and related events during the COVID-19 period. The results show that people became more depressed after the outbreak of COVID-19. The measures implemented by the government, such as the state lockdown, also increased depression levels.
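For readers unfamiliar with TF-IDF, the following is a minimal sketch of the textbook weighting, assuming whitespace tokenization, relative term frequency, and idf = log(N / df). The abstract does not state which TF-IDF variant or preprocessing the authors used, and the multimodal features are not reproduced here; the function name `tfidf` and the example texts are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical example texts, not data from the study.
example = ["feeling sad today", "feeling happy today", "lockdown again"]
for vector in tfidf(example):
    print(vector)
```

Terms that appear in every document receive a weight of zero under this variant, which is why distinctive words such as "sad" outweigh shared words such as "today". Vectors like these could then be fed to any standard classifier.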
The outbreak of the novel Coronavirus Disease 2019 (COVID-19) has caused unprecedented impacts on people’s daily lives around the world. Governments have implemented various measures and policies, such as lockdown and social distancing, to combat the disease during the pandemic period. These measures and policies, as well as the virus itself, may cause mental health issues such as depression, anxiety, and sadness. In this paper, we exploit the massive text data posted by Twitter users to analyse the sentiment dynamics of people living in the state of New South Wales (NSW) in Australia during the pandemic period. Unlike existing work, which mostly focuses on country-level and static sentiment analysis, we analyse the sentiment dynamics at the fine-grained level of local government areas (LGAs). Based on the analysis of around 94 million tweets posted by around 183 thousand users located in different LGAs in NSW over 5 months, we found that people in NSW showed an overall positive sentiment polarity and that the COVID-19 pandemic decreased this overall positive polarity during the pandemic period. The fine-grained analysis of sentiment in LGAs found that, despite the dominant positive sentiment on most days during the study period, some LGAs experienced significant sentiment changes from positive to negative. This study also analysed the sentiment dynamics associated with hot topics on Twitter, such as government policies (e.g. Australia’s JobKeeper program, lockdown, social distancing) and high-profile social events (e.g. the Ruby Princess Cruise). The results showed that the policies and events did affect people’s overall sentiment, and that they affected it differently at different stages.
The current COVID-19 pandemic and its uncertainty have given rise to various myths and rumours. These myths spread incredibly fast through social media, which has caused massive panic in society. In this paper, we comprehensively examined the prevailing myths related to COVID-19 with regard to the diffusion of myths, people's engagement with myths, and people's subjective emotional responses to myths. First, we classified the myths into five categories: spread of infection, preventive measures, detection measures, treatment, and miscellaneous. We collected the tweets about each category of myths from 1 January to 7 July 2020. We found that the vast majority of the myth tweets were about the spread of the infection. Next, we fitted the spread of myths to the SIR epidemic model and calculated the basic reproduction number R0 for each category of myths. We observed that the myths about the spread of infection and preventive measures propagated faster than other categories of myths, and that more miscellaneous myths arose and spread quickly from late June 2020. We further analyzed people's emotions evoked by each category of myths and found that fear was the strongest emotion in all categories of myths, with around 64% of the collected tweets expressing fear. The study in this paper provides insights for authorities and governments to understand the myths during the eruption of the pandemic, and hence enables targeted and feasible measures to debunk the myths of greatest concern in due time.
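As background on the SIR model mentioned in this abstract, here is a minimal sketch of the standard equations integrated with a simple Euler step, together with the usual relation R0 = beta / gamma. The fitting procedure the authors used is not described in the abstract and is not reproduced; the parameter values, the population size, and the function names `simulate_sir` and `basic_reproduction_number` are assumptions for illustration only.

```python
def simulate_sir(beta, gamma, s0, i0, rec0, days, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I."""
    n = s0 + i0 + rec0
    s, i, r = float(s0), float(i0), float(rec0)
    history = [(s, i, r)]  # one (S, I, R) snapshot per day
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infected = beta * s * i / n * dt
            new_recovered = gamma * i * dt
            s -= new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """R0 for the basic SIR model: transmission rate over recovery rate."""
    return beta / gamma


# Hypothetical parameters, not values reported in the study.
beta, gamma = 0.5, 0.25
history = simulate_sir(beta, gamma, s0=999, i0=1, rec0=0, days=60)
print("R0 =", basic_reproduction_number(beta, gamma))
print("peak 'infected' count:", max(i for _, i, _ in history))
```

In the myth-spreading analogy, "infected" would correspond to users actively spreading a myth. An R0 above 1 means the number of spreaders initially grows, and a larger R0 indicates faster propagation.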
With the development of internet technologies, social media, and mobile devices, short texts have become an increasingly popular medium for users to communicate with friends, search for information, and review products. Measuring the similarity between short texts is a fundamental task due to its importance in many applications, such as text retrieval, topic discovery, and event detection. However, short texts generally comprise sparse, noisy, and ambiguous information. Hence, effectively measuring the distance between short texts is a challenging task. In this paper, we incorporate corpus-wide word co-occurrence information into document-level feature enrichment to mitigate the challenges that the sparseness of short texts poses for distance measurement. We propose a novel context-aware weighted Biterm method for short text Distance Measurement (BDM). In BDM, we extract biterms (i.e., word pairs) from a short text corpus and exploit a biterm topic model to determine the global weights of biterms in the corpus. We then determine the local importance of a biterm in different contexts (i.e., short texts) based on the corpus-level biterm weight. The distance between two short texts is computed using the context-aware weighted biterms. Experimental results on three real-world datasets demonstrate the better accuracy and effectiveness of the proposed BDM.
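To make the notion of a biterm concrete, the sketch below extracts unordered word pairs from each short text and compares two texts by their weighted biterm overlap. This is only an illustration under stated assumptions: in the abstract the weights come from a biterm topic model and are adjusted per context, whereas here an optional, hypothetical `weights` dictionary (defaulting to uniform weights) stands in for them, and the distance is a weighted Jaccard distance rather than the BDM formula.

```python
from itertools import combinations


def biterms(text):
    """Return the set of unordered word pairs that co-occur in one short text."""
    words = sorted(set(text.lower().split()))
    return set(combinations(words, 2))


def biterm_distance(text_a, text_b, weights=None):
    """Weighted Jaccard distance over biterms (a stand-in, not the BDM formula)."""
    a, b = biterms(text_a), biterms(text_b)

    def weight(bt):
        # Assumed lookup: unknown biterms default to a weight of 1.0.
        return 1.0 if weights is None else weights.get(bt, 1.0)

    total = sum(weight(bt) for bt in a | b)
    if total == 0:
        return 0.0
    shared = sum(weight(bt) for bt in a & b)
    return 1.0 - shared / total


print(sorted(biterms("stay home safe")))
print(biterm_distance("stay home safe", "stay home now"))
# Up-weighting a shared biterm pulls the two texts closer together.
print(biterm_distance("stay home safe", "stay home now",
                      weights={("home", "stay"): 4.0}))
```

Because a text with n distinct words yields n(n-1)/2 biterms, pairs give a denser representation than single words, which is the motivation for using them on sparse short texts.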