This paper presents the Multilingual COVID-19 Analysis Method (CMTA) for detecting and observing the spread of misinformation about COVID-19 in texts. CMTA proposes a data science (DS) pipeline that applies machine learning models to process, classify (Dense-CNN), and analyze (MBERT) multilingual (micro-)texts. The pipeline's data preparation tasks extract features from multilingual textual data and categorize the texts into specific information classes (i.e., 'false', 'partly false', 'misleading'). The CMTA pipeline was evaluated on multilingual micro-texts (tweets), showing how misinformation spreads across different languages. To assess the performance of CMTA and put it in perspective, we compared CMTA with eight monolingual models used for detecting misinformation. The comparison shows that CMTA surpasses various monolingual models and suggests that it can serve as a general method for detecting misinformation in multilingual micro-texts. The experimental results also show misinformation trends about COVID-19 in different languages during the first months of the pandemic.
Twitter is an active communication channel for spreading information during crises (e.g., earthquakes). To exploit this information, civilians need to explore the tweets produced during a crisis period, for instance to get information about crisis-related events (e.g., landslides, building collapses) and their associated relief actions (e.g., gathering food supplies, searching for victims). However, using Twitter this way demands significant effort, and answers must be accurate to support the coordination of actions in response to crisis events (e.g., avoiding a massive concentration of efforts in only one place). This requirement calls for efficient information classification so that people can carry out agile and useful relief actions. This paper introduces an approach based on classification and query expansion techniques for searching micro-texts (i.e., tweets). In our approach, a user's query is rewritten using a classified vocabulary derived from the top-k results, to better reflect her search intent. For classification, we study and compare different models to find the one that best provides answers to a user query. Our experimental results show that Multi-Task Deep Neural Network (MT-DNN) models further improve micro-text classification. The results also demonstrate that our query expansion method is effective and reduces noise in the expanded query terms when searching for crisis tweets in Twitter datasets.
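The query rewriting idea described above (expanding a query with vocabulary drawn from classified top-k results) can be illustrated with a minimal sketch. This is not the authors' implementation: the overlap-based ranking, the stopword list, the function names (`tokenize`, `expand_query`, `toy_classify`), and the keyword classifier standing in for a trained model such as MT-DNN are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "and", "to", "is", "at"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9#]+", text.lower()) if t not in STOPWORDS]


def expand_query(query, tweets, classify, target_class, k=3, n_terms=2):
    """Expand `query` with frequent terms from top-k tweets of the target class.

    `classify` is any callable mapping a tweet to a class label; here it is a
    placeholder for a trained classifier.
    """
    q_terms = tokenize(query)
    # Rank tweets by simple term overlap with the query (assumed retrieval step).
    ranked = sorted(
        tweets,
        key=lambda t: len(set(q_terms) & set(tokenize(t))),
        reverse=True,
    )
    counts = Counter()
    for tweet in ranked[:k]:
        # Only tweets classified into the intended class contribute vocabulary,
        # which filters noisy terms out of the expansion.
        if classify(tweet) == target_class:
            counts.update(t for t in tokenize(tweet) if t not in q_terms)
    return q_terms + [term for term, _ in counts.most_common(n_terms)]


def toy_classify(tweet):
    """Keyword rule used only as a stand-in for a learned classifier."""
    relief_words = {"food", "shelter", "volunteers", "donate", "supplies"}
    return "relief" if relief_words & set(tokenize(tweet)) else "other"


tweets = [
    "Earthquake relief: food supplies needed at the stadium",
    "Volunteers needed for earthquake food supplies distribution",
    "Earthquake memes are getting out of hand",
    "Traffic is terrible today",
]
expanded = expand_query("earthquake help", tweets, toy_classify, "relief")
print(expanded)
```

In this toy run the third tweet matches the query but is classified as 'other', so its terms never enter the expanded query; that is the noise-reduction effect the classification step is meant to provide.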