Adverse Events in Twitter-Development of a Benchmark Reference Dataset: Results from IMI WEB-RADR

Dietrich, Juergen; Gattepaille, Lucie M.; Grum, Britta Anne; Jiri, Letitia; Lerch, Magnus; Sartori, Daniele; Wisniewski, A

doi:10.1007/s40264-020-00912-9

Cited by 19 publications

(17 citation statements)

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Although efforts have been made for ensuring fair comparison between systems [ 11 , 35 ], additional publicly available annotated benchmark datasets, used solely for evaluation purposes, could help the field progress and allow for more comparisons across studies, notably on their ability to generalize to new data. In this study, by using the WEB-RADR reference dataset, a publicly available dataset [ 43 ], we identified a number of factors that could explain the poor transferability of the system we developed and of another published system aimed at classifying AE posts. The poor transferability offers a plausible explanation to why, despite almost a decade since the first AE recognition systems in social media have been published, such systems have not been adopted in routine pharmacovigilance practice.…”

Section: Discussionmentioning

confidence: 99%

“…To our knowledge, this study is the first to present the development of an AE recognition system together with a prospective evaluation of its performance outside of the universe of the data it has been trained on. We perform an external evaluation using a publicly available benchmark dataset manually curated and annotated by members of the WEB-RADR consortium [43]. The dataset is entirely independent from the dataset we used for training our system, which was provided to us by Epidemico, a health informatics company (later acquired by Booz Allen Hamilton) and former WEB-RADR partner.…”

Section: Key Pointsmentioning

confidence: 99%

“…A second dataset is used in this study, to provide an external prospective validation of the system and to provide an idea of its transferability: a publicly available set of 57,473 Tweets manually curated for AE relations, developed in the course of the WEB-RADR project and intended as a benchmark for the task [43]. In this dataset, only Tweets with valid AE relations are annotated for medicinal products of interest and medical events, as well as the AE relations.…”

Section: Datasetsmentioning

confidence: 99%

See 2 more Smart Citations

Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

et al. 2020

Self Cite

View full text Add to dashboard Cite

Introduction A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings. Objectives The aims of this study were to develop an AE recognition system, prospectively evaluate its performance on an external benchmark dataset and identify potential factors influencing the transferability of AE recognition systems. Methods A pipeline based on dictionary lookups and logistic regression classifiers was developed using a proprietary dataset of 196,533 Tweets manually annotated for AE relations and prospectively evaluated the system on the publicly available WEB-RADR reference dataset, exploring different aspects affecting transferability. Results Our system achieved 0.53 precision, 0.52 recall and 0.52 F1-score on the development test set; however, when applied to the WEB-RADR reference dataset, system performance dropped to 0.38 precision, 0.20 recall and 0.26 F1-score. Similarly, a previously published method aiming at automatically detecting adverse event posts reported 0.5 precision, 0.92 recall and 0.65 F1-score on thus another dataset, while performance on the WEB-RADR reference dataset was reduced to 0.37 precision, 0.63 recall and 0.46 F1-score. We identified four potential factors leading to poor transferability: overfitting, selection bias, label bias and prevalence. Conclusion We warn the community about a potentially large discrepancy between the expected performance of automated AE recognition systems based on published results and the actual observed performance on independent data. This study highlights the difficulty of implementing an all-purpose system for automatic adverse event recognition in Twitter, which could explain the lack of such systems in practical pharmacovigilance settings. Our recommendation is to use benchmark independent datasets, such as the WEB-RADR reference, to investigate the transferability of the adverse event recognition systems and ultimately enforce rigorous comparisons across studies on the task.

show abstract

Section: Discussionmentioning

confidence: 99%

Section: Key Pointsmentioning

confidence: 99%

Section: Datasetsmentioning

confidence: 99%

See 1 more Smart Citation

Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

et al. 2020

Self Cite

View full text Add to dashboard Cite

show abstract

“…Among tweets that mention medications, which is often a starting point for data collection, tweets that mention ADE are outnumbered 10:1 to 50:1 by tweets that do not contain ADEs. 7 , 8 , 12 , 13 From our preliminary analysis of datasets in shared tasks, the variability in the ratios could be largely attributed to the class of drugs being used for the study. Emerging medications are often promoted by bots as well as mentioned in news articles which overshadow firsthand reports of medication consumption by users.…”

Section: Introductionmentioning

confidence: 99%

DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

Magge

Tutubalina

Miftahutdinov

et al. 2021

Journal of the American Medical Informatics Association

View full text Add to dashboard Cite

Objective Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, it has been impeded largely by limited end-to-end solutions for large-scale analysis of social media reports for different drugs. Materials and Methods We present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average ‘natural balance’ with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks. Results The system presented achieved state-of-the-art performance on comparable datasets and scored a classification performance of F1 = 0.63, span extraction performance of F1 = 0.44 and an end-to-end entity resolution performance of F1 = 0.34 on the presented dataset. Discussion The performance of the models continues to highlight multiple challenges when deploying pharmacovigilance systems that use social media data. We discuss the implications of such models in the downstream tasks of signal detection and suggest future enhancements. Conclusion Mining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned based on input data imbalance in order to ensure optimal performance on the end-to-end resolution task.

show abstract

“…On the other hand, in the second category are tools used to analyze data that are published exclusively on social networks, and that seek to provide users, expert and inexpert, with sufficient elements to perform an easy and intuitive analysis of the results provided by their methods of identification of issues, polarity, etc. As examples of such tools are Spot, AnaliticPro, and Socialmention, which provide the user with various data visualization schemes that seek to highlight certain indicators allowing the analyst to evaluate and determine the reputation of a particular product or topic within a specific community [12].…”

Section: Related Studiesmentioning

confidence: 99%

Web Platform for the Identification and Analysis of Events on Twitter

Viloria

Varela

Vargas

et al. 2020

Computational Methods and Data Engineering

View full text Add to dashboard Cite

Due to the great popularity of social networks among people, businesses, public figures, etc., there is a need for automatic methods to facilitate the search, retrieval, and analysis of large amounts of information. Given this situation, the Online Reputation Analyst (ORA) faces the challenge of identifying relevant issues around an event, product and/or public figure, from which it can propose different strategies to strengthen and/or reverse trends. Therefore, this paper proposes and describes a web tool whose main objective is to support the tasks performed by an ORA. The proposed visualization techniques make it possible to immediately identify the relevance and scope of the opinions generated about an event that took place on Twitter.

show abstract

Adverse Events in Twitter-Development of a Benchmark Reference Dataset: Results from IMI WEB-RADR

Cited by 19 publications

References 13 publications

Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

Web Platform for the Identification and Analysis of Events on Twitter

Contact Info

Product

Resources

About