Extraction of Information Related to Drug Safety Surveillance From Electronic Health Record Notes: Joint Modeling of Entities and Relations Using Knowledge-Aware Neural Attentive Models
Abstract: Background: An adverse drug event (ADE) is commonly defined as “an injury resulting from medical intervention related to a drug.” Providing information related to ADEs and alerting caregivers at the point of care can reduce the risk of prescription and diagnostic errors and improve health outcomes. ADEs captured in structured data in electronic health records (EHRs) as either coded problems or allergies are often incomplete, leading to underrep…
“…A total of seven (24.1%) of the studies wrote about participation in the 2018 n2c2 challenge [ 34 , 43 , 44 , 48 , 53 , 56 , 57 ] and four (13.8%) described participation in the MADE 1.0 challenge [ 49 , 54 , 56 , 58 ] (see glossary). A further three studies (10.3%) did not participate in either challenge but used one or both of these challenge datasets [ 45 , 50 , 59 ]. Table 3 provides details on the datasets used in the studies.…”
Section: Results (mentioning)
confidence: 99%
“…Some commented on difficulties encountered when applying off-the-shelf generic pre-processing tools to clinical text. Dandala et al observed that sentence boundary detection and tokenization are difficult issues in clinical text as sentence ends are frequently denoted by newline characters rather than punctuation [ 45 ]. This was echoed in another paper where it was noted that several generic sentence segmentation tools did not perform well due to differences in punctuation patterns and the use of newline characters in formatting [ 43 ].…”
Section: Results (mentioning)
confidence: 99%
“…This was echoed in another paper where it was noted that several generic sentence segmentation tools did not perform well due to differences in punctuation patterns and the use of newline characters in formatting [ 43 ]. Four studies overcame this by building their own custom tokenizer or sentence splitter [ 36 , 45 , 48 , 54 ].…”
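The custom sentence splitters mentioned in these statements are not described here, so the following is only a minimal sketch of the general idea: treat newline characters as sentence boundaries in addition to sentence-final punctuation. The function name `split_clinical_sentences` and the two-rule pattern are assumptions made for illustration and are not taken from any of the cited studies.

```python
import re

# Hypothetical rule set (an assumption, not a cited system):
#   1. one or more newlines always end a sentence;
#   2. '.', '!' or '?' followed by whitespace and an uppercase letter ends a sentence.
_BOUNDARY = re.compile(r"\n+|(?<=[.!?])\s+(?=[A-Z])")


def split_clinical_sentences(text: str) -> list[str]:
    """Split a clinical note into sentences, treating newlines as boundaries."""
    pieces = _BOUNDARY.split(text)
    # Drop empty fragments and surrounding whitespace.
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    note = (
        "Pt started on warfarin 5 mg\n"
        "Developed rash on day 3. Drug stopped.\n\n"
        "Follow up in 2 wks"
    )
    for sentence in split_clinical_sentences(note):
        print(sentence)
```

A rule this simple will still split incorrectly after abbreviations such as "Dr." and will break sentences that are wrapped across lines, which is presumably why the studies describe richer, corpus-specific rule sets.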
To reduce adverse drug events (ADEs), hospitals need a system to support them in monitoring ADE occurrence routinely, rapidly, and at scale. Natural language processing (NLP), a computerized approach to analyze text data, has shown promising results for the purpose of ADE detection in the context of pharmacovigilance. However, a detailed qualitative assessment and critical appraisal of NLP methods for ADE detection in the context of ADE monitoring in hospitals is lacking. Therefore, we have conducted a scoping review to close this knowledge gap, and to provide directions for future research and practice. We included articles where NLP was applied to detect ADEs in clinical narratives within electronic health records of inpatients. Quantitative and qualitative data items relating to NLP methods were extracted and critically appraised. Out of 1,065 articles screened for eligibility, 29 articles met the inclusion criteria. Most frequent tasks included named entity recognition (n = 17; 58.6%) and relation extraction/classification (n = 15; 51.7%). Clinical involvement was reported in nine studies (31%). Multiple NLP modelling approaches seem suitable, with Long Short Term Memory and Conditional Random Field methods most commonly used. Although reported overall performance of the systems was high, it provides an inflated impression given a steep drop in performance when predicting the ADE entity or ADE relation class. When annotating corpora, treating an ADE as a relation between a drug and non-drug entity seems the best practice. Future research should focus on semi-automated methods to reduce the manual annotation effort, and examine implementation of the NLP methods in practice.
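The point about overall scores giving an inflated impression can be illustrated with a small arithmetic sketch. The counts below are invented purely for illustration (they are not results from any reviewed study); they show how a micro-averaged F1 over all entity types can stay high while the F1 of a rare class such as ADE is much lower.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive and false-negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical per-class counts (tp, fp, fn); made up for this example only.
counts = {
    "Drug": (950, 40, 50),  # frequent, easy class
    "ADE": (50, 40, 60),    # rare, context-dependent class
}

# Micro-averaging pools the counts over all classes before computing F1,
# so frequent classes dominate the overall score.
total_tp = sum(tp for tp, _, _ in counts.values())
total_fp = sum(fp for _, fp, _ in counts.values())
total_fn = sum(fn for _, _, fn in counts.values())
micro_f1 = f1(total_tp, total_fp, total_fn)

if __name__ == "__main__":
    for label, (tp, fp, fn) in counts.items():
        print(f"{label}: F1 = {f1(tp, fp, fn):.3f}")
    print(f"micro-averaged: F1 = {micro_f1:.3f}")
```

Because the frequent class contributes most of the pooled counts, the micro-averaged score sits close to that class's F1, which is why reporting per-class scores alongside the overall figure is more informative.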
“…We have listed the top performing methods from the 2018 n2c2 ADE challenge in Table 1. Dandala et al (2020) custom-trained biomedical ELMo embeddings using the MIMIC-III data-set (Johnson et al, 2016); they also used a rich set of sentence tokenization rules. Ju et al (2020) leveraged a tree-architecture to detect overlapping spans in addition to lexical and knowledge features (e.g., word shapes, Human Disease Ontology / MedDRA side-effect database information).…”
Section: Related Work (mentioning)
confidence: 99%
“…Among medication entities, ADE and Reason are challenging to disambiguate (Henry et al, 2020). Frequently, the specific reason for drug administration may appear in a subsequent sentence (Dandala et al, 2020). Besides, ADE data-sets include gold annotations for these entities, only if they are associated with a drug.…”
We evaluate several biomedical contextual embeddings (based on BERT, ELMo, and Flair) for the detection of medication entities such as Drugs and Adverse Drug Events (ADE) from Electronic Health Records (EHR) using the 2018 ADE and Medication Extraction (Track 2) n2c2 data-set. We identify best practices for transfer learning, such as language-model fine-tuning and scalar mix. Our transfer learning models achieve strong performance in the overall task (F1=92.91%) as well as in ADE identification (F1=53.08%). Flair-based embeddings out-perform in the identification of context-dependent entities such as ADE. BERT-based embeddings out-perform in recognizing clinical terminology such as Drug and Form entities. ELMo-based embeddings deliver competitive performance in all entities. We develop a sentence-augmentation method for enhanced ADE identification benefiting BERT-based and ELMo-based models by up to 3.13% in F1 gains. Finally, we show that a simple ensemble of these models outpaces most current methods in ADE extraction (F1=55.77%).