“…In Twitter research, pre-processing often leads to removing most of the data. For example, our previous research on Twitter regarding the Supreme Court (Sandhu et al., 2019) discarded 87–89% of the data, while our examination of Twitter and obesity discarded 73% of the data (Sandhu, Giabbanelli & Mago, 2019). The reason is that pre-processing has historically involved a series of filters (e.g., removing words that are not deemed informative in English, removing hashtags and emojis), which were necessary as the analysis model could not satisfactorily cope with raw data.…”
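The filter chain described above (dropping uninformative English words, hashtags, and emojis) can be sketched in a few lines. This is a minimal illustration, not the pipeline from the cited studies; the tiny stopword list and the sample tweet are assumptions for demonstration only.

```python
import re

# Illustrative stopword list; real pipelines use much larger lists (e.g., NLTK's).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and"}

def preprocess_tweet(text: str) -> str:
    """Apply the classic filter chain: strip URLs, mentions and hashtags,
    emojis/non-ASCII symbols, then uninformative stopwords."""
    text = re.sub(r"https?://\S+", "", text)         # remove URLs
    text = re.sub(r"[@#]\w+", "", text)              # remove mentions and hashtags
    text = text.encode("ascii", "ignore").decode()   # drop emojis / non-ASCII
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in STOPWORDS]
    return " ".join(tokens)

print(preprocess_tweet("The court is back! #SCOTUS 🎉 https://t.co/x @user"))
# -> "court back"
```

Note how little of the original tweet survives, which illustrates why such filtering historically discarded most of the data.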
Influencing and framing debates on Twitter provides power to shape public opinion. Bots have become essential tools of ‘computational propaganda’ on social media such as Twitter, often contributing a large fraction of the tweets regarding political events such as elections. Although analyses have been conducted regarding the first impeachment of former president Donald Trump, they have focused either on manual examination of relatively few tweets to emphasize rhetoric, or on Natural Language Processing (NLP) of a much larger corpus with respect to common metrics such as sentiment. In this paper, we complement existing analyses by examining the role of bots in the first impeachment with respect to three questions: (Q1) Are bots actively involved in the debate? (Q2) Do bots target one political affiliation more than another? (Q3) Which sources are used by bots to support their arguments? Our methods start with collecting over 13M tweets on six key dates, from October 6th 2019 to January 21st 2020. We used machine learning to evaluate the sentiment of each tweet (via BERT) and whether it originated from a bot. We then examined these sentiments with respect to a balanced sample of Democrats and Republicans directly relevant to the impeachment, such as House Speaker Nancy Pelosi, Senator Mitch McConnell, and (then former Vice President) Joe Biden. The content of posts from bots was further analyzed with respect to the sources used (with bias ratings from AllSides and Ad Fontes) and themes. Our first finding is that bots played a significant role in contributing to the overall negative tone of the debate (Q1). Bots targeted Democrats more than Republicans (Q2), as evidenced both by a difference in ratio (bots had more negative-to-positive tweets on Democrats than on Republicans) and in composition (use of derogatory nicknames).
Finally, the sources provided by bots were almost twice as likely to be from the right as from the left, with a noticeable use of hyper-partisan right and most extreme right sources (Q3). Bots were thus purposely used to promote a misleading version of events. Overall, this suggests an intentional use of bots as part of a strategy, providing further confirmation that computational propaganda is involved in defining political events in the United States. As with any empirical analysis, our work has several limitations. For example, Trump’s rhetoric on Twitter has previously been characterized by an overly negative tone, so tweets detected as negative may be echoing his message rather than acting against him. Previous works show that this possibility is limited, and its existence would only strengthen our conclusions. As our analysis is based on NLP, we focus on processing a large volume of tweets rather than manually reading all of them; future studies may therefore complement our approach with qualitative methods to assess the specific arguments used by bots.
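The ratio-based comparison behind Q2 reduces to simple arithmetic over sentiment counts. The sketch below uses purely hypothetical counts (not the study's actual numbers) to show how a negative-to-positive ratio per party would reveal asymmetric targeting once BERT has labeled each bot tweet.

```python
from collections import Counter

# Hypothetical sentiment counts for bot tweets mentioning figures of each
# party; the real study derived such labels from BERT over 13M+ tweets.
bot_tweets = {
    "Democrats":   Counter(negative=700, positive=200, neutral=100),
    "Republicans": Counter(negative=400, positive=350, neutral=250),
}

def neg_pos_ratio(counts: Counter) -> float:
    """Negative-to-positive ratio used to compare targeting across parties."""
    return counts["negative"] / counts["positive"]

for party, counts in bot_tweets.items():
    print(party, round(neg_pos_ratio(counts), 2))
```

A ratio well above 1 for one party but near 1 for the other is the kind of asymmetry the abstract reports, complemented by the compositional evidence (derogatory nicknames).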
“…In this paper, our interest is in (i) generating FCMs from text, and (ii) using them to craft scenarios. With regard to (i), we note that several works have extracted causal maps from text [26,61–63]; hence, they could generate the causal structure, but did not produce a complete FCM. Some works have focused on creating FCMs from summaries or large collections of documents [64,65], but they needed manual interventions (e.g., manual labeling, expert verification); hence, the process was only semi-automatic.…”
Creating ‘what-if’ scenarios to estimate possible futures is a key component of decision-making processes. However, this activity is labor-intensive, as it is primarily done manually by subject-matter experts who start by identifying relevant themes and their interconnections to build models, and then craft diverse and meaningful stories as scenarios to run on these models. Previous works have shown that text mining can automate the model-building aspect, for example, by using topic modeling to extract themes from a large corpus and employing variations of association rule mining to connect them in quantitative ways. In this paper, we propose to further automate the process of scenario generation by guiding pre-trained deep neural networks (i.e., BERT) through simulated conversations to extract a model from a corpus. Our case study on electric vehicles shows that our approach yields results similar to previous work while almost eliminating the need for manual involvement in model building, thus focusing human expertise on the final stage of crafting compelling scenarios. Specifically, using the same corpus as a previous study on electric vehicles, we show that the model created here either performs similarly to the previous study when there is a consensus in the literature, or differs by highlighting important gaps in domains such as government deregulation.
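Once a Fuzzy Cognitive Map has been extracted (concepts plus signed, weighted causal links), running a ‘what-if’ scenario amounts to iterating an activation update. The sketch below shows one common FCM formulation (a sigmoid-squashed weighted sum with self-memory); the three-concept map and its weights are invented for illustration and are not from the cited works.

```python
import math

def sigmoid(x: float, lam: float = 1.0) -> float:
    """Squashing function keeping concept activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights, lam=1.0):
    """One synchronous FCM update: each concept's next activation is the
    squashed sum of its own current value and its weighted causes."""
    n = len(state)
    return [sigmoid(state[i] + sum(weights[j][i] * state[j]
                                   for j in range(n) if j != i), lam)
            for i in range(n)]

# Tiny illustrative map: concept 0 increases concept 1, which dampens concept 2.
W = [[0.0, 0.8, 0.0],
     [0.0, 0.0, -0.5],
     [0.0, 0.0, 0.0]]
state = [1.0, 0.2, 0.5]       # a 'what-if' scenario sets the initial activations
state = fcm_step(state, W)    # iterate until activations stabilize
```

A scenario is then simply a choice of initial activations (e.g., clamping ‘government deregulation’ high) followed by iterating `fcm_step` to a fixed point.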
“…Data representativity is another issue associated with web mining. For instance, researchers have recently questioned the usage of social media data for inferring health-related outcomes due to issues of sampling bias (Cesare, Grant, & Nsoesie, 2019; Mooney & Garber, 2019), and effects on validating complex models compared with expert reports (Sandhu, Giabbanelli, & Mago, 2019). Likewise, the integrity of database elements refers to their accuracy.…”
Big data analysis has found applications in many industries due to its ability to turn huge amounts of data into insights for informed business and operational decisions. Advanced data mining techniques have been applied in many sectors of supply chains in the food industry. However, previous work has mainly focused on the analysis of instrument-generated data such as those from hyperspectral imaging, spectroscopy, and biometric receptors. The importance of digital text data in food and nutrition has only recently gained attention due to advancements in big data analytics. The purpose of this review is to provide an overview of the data sources, computational methods, and applications of text data in the food industry. Text mining techniques such as word-level analysis (e.g., frequency analysis), word-association analysis (e.g., network analysis), and advanced techniques (e.g., text classification, text clustering, topic modeling, information retrieval, and sentiment analysis) will be discussed. Applications of text data analysis will be illustrated with respect to food safety and food fraud surveillance, dietary pattern characterization, consumer-opinion mining, new-product development, food knowledge discovery, food supply-chain management, and online food services. The goal is to provide insights for intelligent decision-making to improve food production, food safety, and human nutrition.
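The first two technique families mentioned above, word-level frequency analysis and word-association analysis, can be sketched with standard-library tools. The three toy "documents" below are invented food-safety snippets for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus of food-related text snippets.
docs = [
    "food safety recall listeria cheese",
    "food fraud olive oil mislabeling",
    "food safety inspection recall",
]

# Word-level analysis: term frequencies across the corpus.
freq = Counter(w for d in docs for w in d.split())

# Word-association analysis: within-document co-occurrence counts,
# a simple precursor to network analysis or association rule mining.
pairs = Counter(p for d in docs
                for p in combinations(sorted(set(d.split())), 2))

print(freq.most_common(3))    # most frequent terms
print(pairs.most_common(2))   # most frequent term pairs
```

The co-occurrence counts can be read as edge weights of a word network, which is how frequency analysis feeds into the network analysis the review describes; the more advanced techniques (topic modeling, sentiment analysis) build on the same token counts.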