Contextualised word embeddings are a powerful tool for detecting contextual synonyms. However, most current state-of-the-art (SOTA) deep learning concept extraction methods remain supervised and underexploit the potential of context. In this paper, we propose a self-supervised pre-training approach that can detect contextual synonyms of concepts after being trained on data created by shallow matching. We apply our methodology in the sparse multi-class setting (over 15,000 concepts) to extract phenotype information from electronic health records. We further investigate data augmentation techniques to address the problem of class sparsity. Our approach achieves a new SOTA for unsupervised phenotype concept annotation on clinical text in F1 and Recall, outperforming the previous SOTA by up to 4.5 and 4.0 absolute points, respectively. After fine-tuning with as little as 20% of the labelled data, we also outperform BioBERT and ClinicalBERT. The extrinsic evaluation on three ICU benchmarks also shows the benefit of using the phenotypes annotated by our model as features.
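As a rough illustration of how shallow matching can create weakly labelled training data, the sketch below tags exact, case-insensitive mentions of concept names in a note. It is a minimal sketch under stated assumptions, not the authors' pipeline: the lexicon, the concept IDs (`C001`, `C002`) and the function name `shallow_match` are all hypothetical.

```python
import re

# Hypothetical toy lexicon: concept name -> concept ID (not from the paper).
LEXICON = {
    "fever": "C001",
    "high blood pressure": "C002",
}

def shallow_match(text, lexicon):
    """Return sorted (start, end, concept_id) spans found by exact,
    case-insensitive matching of concept names on word boundaries."""
    spans = []
    for name, concept_id in lexicon.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), concept_id))
    return sorted(spans)

note = "Patient has high blood pressure and fever."
weak_labels = shallow_match(note, LEXICON)
```

Spans produced this way miss paraphrases and contextual synonyms by construction, which is the gap a model pre-trained on such data is meant to close.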
This work presents an in-depth analysis of machine translation of morphologically rich Indo-Aryan and Dravidian languages under zero-resource conditions. It focuses on zero-shot systems for these languages and leverages transfer learning by exploiting target-side monolingual corpora and parallel translations from other languages. These systems are compared with direct translations using the BLEU and TER metrics. Further, the zero-shot systems are used as pre-trained models for fine-tuning with real human-generated data taken in different proportions, ranging from 100 sentences to the entire training set. The performance of the Indo-Aryan and Dravidian languages is compared with a focus on their morphological complexity, and a comparative analysis based on language families is provided. The systems with a Dravidian source language performed much better and came very close to the level of direct translations. This is likely due to the morphological richness and complexity of these languages, which provides more room for transfer learning. After further fine-tuning, these systems outperformed direct translations with just 500 parallel sentences for a Dravidian source language. However, systems with an Indo-Aryan source language reached similar performance after being fine-tuned with 10,000 sentences.
Objective: Clinical notes contain information that has not been documented elsewhere, including responses to treatment and clinical findings, which are crucial for predicting key outcomes in patients in acute care. In this study, we propose the automatic annotation of phenotypes from clinical notes as a method to capture essential information to predict outcomes in the intensive care unit (ICU). This information is complementary to typically used vital signs and laboratory test results. Methods: In this study, we developed a novel phenotype annotation model to extract the phenotypical features of patients, which were then used as input features of predictive models to predict ICU patient outcomes. We demonstrated and validated this approach by conducting experiments on three ICU prediction tasks, including in-hospital mortality, physiological decompensation and length of stay (LOS) for over 24 000 patients using the Medical Information Mart for Intensive Care (MIMIC-III) dataset. Results: The predictive models incorporating phenotypical information achieved 0.845 (area under the curve–receiver operating characteristic (AUC-ROC)) for in-hospital mortality, 0.839 (AUC-ROC) for physiological decompensation and 0.430 (kappa) for LOS, all of which consistently outperformed the baseline models using only vital signs and laboratory test results. Moreover, we conducted a thorough interpretability study showing that phenotypes provide valuable insights at both the patient and cohort levels. Conclusion: The proposed approach demonstrates that phenotypical information complements traditionally used vital signs and laboratory test results and significantly improves the accuracy of outcome prediction in the ICU.
Phenotypic information of patients, as expressed in clinical text, is important in many clinical applications such as identifying patients at risk of hard-to-diagnose conditions. Extracting and inferring some phenotypes from clinical text requires numerical reasoning; for example, a temperature of 102°F suggests the phenotype Fever. However, while current state-of-the-art phenotyping models using natural language processing (NLP) are generally very effective at extracting phenotypes, they struggle to extract phenotypes that require numerical reasoning. In this article, we propose a novel unsupervised method that leverages external clinical knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in different phenotypic contexts. Experiments show that the proposed method achieves significant improvement over unsupervised baseline methods, with absolute increases in generalized Recall and F1 scores of up to 79% and 71%, respectively. The proposed method also outperforms supervised baseline methods, with absolute increases in generalized Recall and F1 scores of up to 70% and 44%, respectively. In addition, we validate the methodology on clinical use cases where the detected phenotypes significantly contribute to patient stratification systems for a set of diseases, namely HIV and myocardial infarction (heart attack). Moreover, we find that these phenotypes from clinical text can be used to impute missing values in structured data, thereby enriching the data and improving its quality.
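The Fever example can be made concrete with a deliberately simple rule-based sketch: read a Fahrenheit value from the text and compare it with a reference threshold. This is not the proposed method, which relies on external clinical knowledge and ClinicalBERT embeddings; the 100.4°F cut-off is an assumed convention rather than a value from the paper, and the function name `detect_fever` is hypothetical.

```python
import re

# Assumed reference threshold in °F (a common clinical convention,
# not a value taken from the paper).
FEVER_THRESHOLD_F = 100.4

def detect_fever(text, threshold=FEVER_THRESHOLD_F):
    """Return True if any Fahrenheit temperature mentioned in the text
    is at or above the threshold."""
    for m in re.finditer(r"(\d+(?:\.\d+)?)\s*°\s*F", text):
        if float(m.group(1)) >= threshold:
            return True
    return False

has_fever = detect_fever("Patient presented with a temperature of 102°F.")
```

A fixed pattern like this only covers one unit and one phrasing, which illustrates why numerical reasoning across different phenotypic contexts needs more than hand-written rules.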