Background: The coronavirus disease (COVID-19) pandemic is considered the most daunting public health challenge in decades. With no effective treatments and with time needed to develop a vaccine, alternative approaches are being used to control the pandemic.

Objective: The objective of this paper was to identify topics, opinions, and recommendations about the COVID-19 pandemic discussed by medical professionals on the Twitter social media platform.

Methods: Using a mixed methods approach blending the capabilities of social media analytics and qualitative analysis, we analyzed COVID-19–related tweets posted by medical professionals and examined their content. We used qualitative coding to explore the collected data, identify relevant tweets, and uncover important concepts about the pandemic. Unsupervised and supervised machine learning techniques and text analysis were used to identify topics and opinions.

Results: Data about the coronavirus pandemic were collected from 119 medical professionals on Twitter. A total of 10,096 English tweets posted by the identified medical professionals between December 1, 2019 and April 1, 2020 were collected. We identified eight topics, namely actions and recommendations, fighting misinformation, information and knowledge, the health care system, symptoms and illness, immunity, testing, and infection and transmission. The tweets mainly focused on needed actions and recommendations (2827/10,096, 28%) to control the pandemic. Many tweets warned about misleading information (2019/10,096, 20%) that could lead to more people becoming infected with the virus. Other tweets discussed general knowledge and information (911/10,096, 9%) about the virus as well as concerns about health care systems and workers (909/10,096, 9%). The remaining tweets discussed symptoms associated with COVID-19 (810/10,096, 8%), immunity (707/10,096, 7%), testing (605/10,096, 6%), and virus infection and transmission (503/10,096, 5%).

Conclusions: Our findings indicate that Twitter and other social media platforms can help identify important and useful knowledge shared by medical professionals during a pandemic.
Background: Despite scientific evidence supporting the importance of wearing masks to curtail the spread of COVID-19, mask wearing has stirred up significant debate, particularly on social media.

Objective: This study aimed to investigate the topics associated with the public discourse against wearing masks in the United States. We also studied the relationship between the anti-mask discourse on social media and the number of new COVID-19 cases.

Methods: We collected a total of 51,170 English tweets between January 1, 2020, and October 27, 2020, by searching for hashtags against wearing masks. We used machine learning techniques to analyze the collected data and investigated the relationship between the volume of tweets against mask wearing and the daily volume of new COVID-19 cases using a Pearson correlation analysis between the two time series.

Results: The results and analysis showed that social media can help identify important insights related to wearing masks. Topic mining identified 10 categories or themes of user concerns, dominated by (1) constitutional rights and freedom of choice; (2) conspiracy theory, population control, and big pharma; and (3) fake news, fake numbers, and fake pandemic. Together, these three categories represent almost 65% of the volume of tweets against wearing masks. The relationship between the volume of tweets against wearing masks and newly reported COVID-19 cases showed a strong correlation, with rises in the volume of negative tweets leading rises in the number of new cases by 9 days.

Conclusions: These findings demonstrate the potential of mining social media for understanding public discourse about public health issues such as wearing masks during the COVID-19 pandemic. The results emphasize the relationship between discourse on social media and its potential impact on real events, such as changing the course of the pandemic. Policy makers are advised to proactively address public perception and work on shaping it by raising awareness, debunking negative sentiments, and prioritizing early policy intervention on the most prevalent topics.
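A lead/lag analysis like the one described can be sketched by computing the Pearson correlation between the two series at a range of lags and picking the lag that maximizes it. The `pearson` and `best_lead` functions below are a self-contained illustrative sketch under assumed toy data, not the authors' implementation or their 51,170-tweet dataset:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def best_lead(tweets, cases, max_lag=14):
    """Shift the tweet series forward by 0..max_lag days and return the
    lag with the strongest correlation against new cases. max_lag should
    stay well below the series length so each overlap remains meaningful."""
    return max(range(max_lag + 1),
               key=lambda k: pearson(tweets[:len(tweets) - k] if k else tweets,
                                     cases[k:]))
```

With synthetic series in which cases mirror tweet volume three days later, `best_lead` recovers a 3-day lead; the study's reported 9-day lead would emerge the same way from its real time series.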
Search engines are important outlets for information query and retrieval. They have to deal with the continual increase of information available on the web and provide users with convenient access to these huge amounts of information. An even more complex challenge, one that grows continually harder to eliminate, is spam in web pages. For several reasons, web spammers try to intrude into search results and inject artificially biased results in favour of their websites or pages. Spam pages are added to the internet on a daily basis, making it difficult for search engines to keep up with the fast-growing and dynamic nature of the web, especially since spammers tend to add more keywords to their websites to deceive the search engines and increase the rank of their pages. In this research, we investigated four different classification algorithms (naïve Bayes, decision tree, SVM, and k-NN) to detect Arabic web spam pages based on content. The three groups of datasets used, with 1%, 15%, and 50% spam content, were collected using a crawler customized for this study. Spam pages were labelled manually. Different tests and comparisons revealed that the decision tree was the best classifier for this purpose.
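Of the four algorithms compared, naïve Bayes is the simplest to sketch from scratch. The following is a minimal, self-contained multinomial naïve Bayes classifier with Laplace smoothing over token lists; the toy English training data and the function names are illustrative assumptions, not the study's Arabic dataset or code:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns log priors and
    Laplace-smoothed log likelihoods per label."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {lab: Counter() for lab in label_counts}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"priors": {}, "likelihoods": {}, "vocab": vocab}
    for lab, n in label_counts.items():
        model["priors"][lab] = math.log(n / len(docs))
        total = sum(word_counts[lab].values())
        model["likelihoods"][lab] = {
            w: math.log((word_counts[lab][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return model

def classify(model, tokens):
    """Return the label with the highest posterior log score."""
    def score(lab):
        s = model["priors"][lab]
        for w in tokens:
            if w in model["vocab"]:
                s += model["likelihoods"][lab][w]
        return s
    return max(model["priors"], key=score)
```

The decision tree, SVM, and k-NN classifiers compared in the study follow the same train/classify interface but differ in how they model the content features.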
The information world is rich in documents in different formats and applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the web. Much research on text classification has been applied to English, Dutch, Chinese, and other languages, whereas less has been applied to Arabic. This paper addresses the automatic classification of Arabic text documents, applying text classification to Arabic documents with stemming as part of the preprocessing steps. Results showed that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy in the two test modes, at 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: the SVM accuracy in the two test modes dropped to 84.49% and 86.35%.
Nowadays, huge amounts of data and information are available to everyone. Data can now be stored in many different kinds of databases and information repositories, in addition to being available on the Internet or in printed form. With such amounts of data, there is a need for powerful techniques that support interpretation, because the volume exceeds the human ability to comprehend it and make decisions effectively. To reveal the best tools for the classification task, which helps in decision making, this paper conducted a comparative study of several freely available data mining and knowledge discovery tools and software packages. Results showed that tool performance on the classification task is affected by the kind of dataset used and by the way the classification algorithms are implemented within the toolkits. On the applicability issue, the WEKA toolkit achieved the highest applicability, followed by Orange, Tanagra, and KNIME, respectively. Finally, WEKA achieved the highest improvement in classification performance when moving from the percentage-split test mode to the cross-validation test mode, followed by Orange, KNIME, and finally Tanagra.
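The two evaluation modes being compared (percentage split and cross-validation) can be sketched outside any toolkit. The following minimal Python example evaluates a 1-nearest-neighbour classifier both ways on toy one-dimensional data; the data, the 66% split ratio, and the function names are illustrative assumptions, not taken from the study:

```python
def one_nn_predict(train, x):
    """train: list of (value, label) pairs; return label of nearest value."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def holdout_accuracy(data, split=0.66):
    """Percentage-split mode: train on the first fraction, test on the rest."""
    cut = int(len(data) * split)
    train, test = data[:cut], data[cut:]
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)

def cv_accuracy(data, k=5):
    """k-fold cross-validation: every example is tested exactly once."""
    folds = [data[i::k] for i in range(k)]  # simple interleaved folds
    hits = total = 0
    for i in range(k):
        test = folds[i]
        train = [d for j in range(k) if j != i for d in folds[j]]
        hits += sum(one_nn_predict(train, x) == y for x, y in test)
        total += len(test)
    return hits / total
```

Cross-validation averages over every example, which is why moving from a single percentage split to cross-validation can change a toolkit's measured performance, as the comparison above reports.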