Background: Drug-induced suicide has been debated as a crucial issue in both clinical and public health research. Published research articles contain valuable data on the drugs associated with suicidal adverse events. An automated process that extracts such information and rapidly detects drugs related to suicide risk is essential but has not been well established. Moreover, few data sets are available for training and validating classification models on drug-induced suicide.
Objective: This study aimed to build a corpus of drug-suicide relations containing annotated entities for drugs, suicidal adverse events, and their relations. To confirm the effectiveness of the drug-suicide relation corpus, we evaluated the performance of a relation classification model trained on the corpus in conjunction with various embeddings.
Methods: We collected the abstracts and titles of research articles associated with drugs and suicide from PubMed and manually annotated drug and suicide entities along with their relations at the sentence level (adverse drug events, treatment, suicide means, or miscellaneous). To reduce the manual annotation effort, we preliminarily selected sentences either with a pretrained zero-shot classifier or by keeping only sentences that contained both drug and suicide keywords. We trained a relation classification model on the proposed corpus using various Bidirectional Encoder Representations from Transformers (BERT) embeddings. We then compared the performance of the model across the different BERT-based embeddings and selected the most suitable embedding for our corpus.
Results: Our corpus comprised 11,894 sentences extracted from the titles and abstracts of the PubMed research articles. Each sentence was annotated with drug and suicide entities and the relationship between these 2 entities (adverse drug events, treatment, means, and miscellaneous). All of the tested relation classification models that were fine-tuned on the corpus accurately detected sentences of suicidal adverse events, regardless of the pretrained model type and data set properties.
Conclusions: To our knowledge, this is the first and most extensive corpus of drug-suicide relations.
BACKGROUND: Drug-induced suicide has been debated as a crucial issue in clinical and public health research. Published research articles are valuable data resources for finding information on drugs associated with suicidal adverse events. An automated process that extracts such information and rapidly detects drugs related to suicide risk is essential. Still, such a process is not well established, and few datasets are available to train and validate models that classify drug-induced suicide.
OBJECTIVE: This study aims to build a drug-suicide relations (DSR) corpus that annotates drug entities, suicidal event entities, and their relations. To confirm the effectiveness of the DSR corpus, we evaluate the performance of a relation classification model trained on the corpus in conjunction with various embeddings.
METHODS: We collect abstracts and titles of research articles associated with drugs and suicide from PubMed. We manually annotate drug and suicide entities and their relation (adverse drug events, treatment, suicide means, or miscellaneous) at the sentence level. To reduce the manual annotation effort, we run the annotation procedure after selecting sentences with a pre-trained zero-shot classifier or selecting only sentences that contain both drug and suicide keywords. We train a relation classification model on the proposed corpus using various BERT (Bidirectional Encoder Representations from Transformers) embeddings. We then evaluate the performance of the model to determine the BERT-based embedding that best suits our corpus.
RESULTS: Our corpus comprises 11,894 sentences from titles and abstracts of research articles extracted from PubMed. Each sentence is annotated with (1) drug and suicide entities and (2) the relation between these two entities, including adverse drug events, treatment, means, miscellaneous, and others. All relation classification models that we evaluate are fine-tuned on our corpus. The models achieve F1 scores above 0.8 in detecting sentences of suicidal adverse events, regardless of the pre-trained model and dataset properties.
CONCLUSIONS: To the best of our knowledge, the proposed corpus is the first and the most extensive corpus targeting drug-suicide relations.
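The keyword-based preselection mentioned in the Methods (keeping only sentences that contain both a drug keyword and a suicide keyword) could look roughly like the minimal Python sketch below. The keyword lists, the naive sentence splitter, and the function names are all assumptions made for illustration; the abstract does not specify the actual keywords or tooling, and the example text is invented.

```python
import re

# Hypothetical keyword lists for illustration only; the abstract does not
# enumerate the keywords that were actually used.
DRUG_KEYWORDS = {"fluoxetine", "isotretinoin", "varenicline"}
SUICIDE_KEYWORDS = {"suicide", "suicidal", "suicidality"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Return the set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def select_candidates(text):
    """Keep sentences that contain at least one drug AND one suicide keyword."""
    selected = []
    for sentence in split_sentences(text):
        words = tokens(sentence)
        if words & DRUG_KEYWORDS and words & SUICIDE_KEYWORDS:
            selected.append(sentence)
    return selected


# Invented example text, not taken from the corpus.
example = (
    "The trial monitored suicidal ideation in patients receiving fluoxetine. "
    "Suicide rates vary by region. "
    "Varenicline is prescribed for smoking cessation."
)
candidates = select_candidates(example)
```

Only the first example sentence passes the filter, because it is the only one that mentions both kinds of keyword. Sentences selected this way would still go to manual annotation; the filter only reduces how many sentences annotators need to read.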