Abstract: Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical knowledge-enriched model, Clinical-BERT, has also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. A core limitation of these transformers is their substantial memory consumption, due to the full self-attention mechanism. To overcome this, long-sequence transformer models, e.g. Longformer, …
“…This strategy allows us to make better use of the available data since no data is discarded. In future work, these limitations can also be addressed by using a sliding window approach [27] or using another transformer-based architecture, such as Longformer [28], that is able to process longer text sequences.…”
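The sliding-window idea mentioned in the quoted passage can be sketched in a few lines: a token sequence longer than the model's input limit is split into fixed-size, overlapping windows so that no text is discarded. The window and stride values below are illustrative, not values from the cited work.

```python
def sliding_windows(tokens, window=512, stride=384):
    """Split a token list into overlapping windows of at most `window`
    tokens, advancing by `stride` tokens each step, so that no part of
    the sequence is discarded when it exceeds the model's input limit."""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

# Example: a 1000-token note with a 512-token model limit yields
# three overlapping chunks that together cover all 1000 tokens.
chunks = sliding_windows(list(range(1000)), window=512, stride=384)
```

Each chunk can then be encoded separately and the per-chunk predictions or embeddings aggregated (e.g. by averaging), which is one common way to apply a fixed-length model such as BERT to long clinical notes.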
Clinical prediction models are often based solely on the use of structured data in electronic health records, e.g. vital parameters and laboratory results, effectively ignoring potentially valuable information recorded in other modalities, such as free-text clinical notes. Here, we report on the development of a multimodal model that combines structured and unstructured data. In particular, we study how best to make use of a clinical language model in a multimodal setup for predicting 30-day all-cause mortality upon hospital admission in patients with COVID-19. We evaluate three strategies for incorporating a domain-specific clinical BERT model in multimodal prediction systems: (i) without fine-tuning, (ii) with unimodal fine-tuning, and (iii) with multimodal fine-tuning. The best-performing model leverages multimodal fine-tuning, in which the clinical BERT model is updated based also on the structured data. This multimodal mortality prediction model is shown to outperform unimodal models that are based on using either only structured data or only unstructured data. The experimental results indicate that clinical prediction models can be improved by including data in other modalities and that multimodal fine-tuning of a clinical language model is an effective strategy for incorporating information from clinical notes in multimodal prediction systems.
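The fusion step described in the abstract above — combining a note representation from a clinical BERT model with structured features — can be sketched as a simple late-fusion classifier. Everything here is illustrative: the embedding values, feature names, and weights are made up, and a real system would obtain the text vector from the (fine-tuned) language model rather than hard-coding it.

```python
import math

def predict_mortality(text_embedding, structured_features, weights, bias):
    """Late fusion: concatenate the note embedding with structured
    features (e.g. vital parameters, lab results) and apply a logistic
    output layer to estimate 30-day mortality risk."""
    fused = list(text_embedding) + list(structured_features)
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability in (0, 1)

# Toy example with a 4-dim text embedding and 2 structured features
# (hypothetical values; in practice the embedding comes from clinical BERT)
p = predict_mortality(
    text_embedding=[0.2, -0.1, 0.4, 0.05],
    structured_features=[0.8, 1.2],  # e.g. normalized vitals/labs
    weights=[0.5, -0.3, 0.1, 0.2, 0.7, 0.4],
    bias=-1.0,
)
```

The "multimodal fine-tuning" strategy in the abstract corresponds to backpropagating the loss of such a fused classifier all the way into the language model, so that the text encoder is updated in the presence of the structured features rather than trained in isolation.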
“…In addition, because it is trained on general-domain text such as that from Wikipedia, it can hardly cover the clinical expertise contained in the discharge summary (Lee et al. 2020). Instead of BERT, we used the Clinical-Longformer model (Li et al. 2022), which is trained on biomedical text and can embed long texts of up to 4096 tokens, with a bi-LSTM layer as the text encoder.…”
Section: Encoding Part Of PAAT
“…• Our encoder, composed of Clinical-Longformer (Li et al. 2022) and a bidirectional long short-term memory (bi-LSTM) layer, can be applied to long texts exceeding the maximum allowable input length of Longformer-based models, i.e., 4096 tokens, by segmentally encoding the input text.…”
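Segmental encoding, as described in the quoted passage, can be sketched as: split the over-length token sequence into consecutive segments no longer than the encoder's limit, encode each segment, and hand the per-segment representations to a downstream sequence layer (the bi-LSTM in the cited work). The `encode` callable below is a stand-in for the real Clinical-Longformer encoder.

```python
def encode_segments(tokens, encode, max_len=4096):
    """Encode a token sequence of arbitrary length by splitting it into
    consecutive segments of at most `max_len` tokens and encoding each
    segment independently; the resulting per-segment vectors can then be
    fed to a sequence layer such as a bi-LSTM."""
    segments = [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]
    return [encode(seg) for seg in segments]

# Stand-in encoder: a real system would call Clinical-Longformer here.
toy_encode = lambda seg: [len(seg)]  # 1-dim "embedding": segment length
reps = encode_segments(list(range(10000)), toy_encode, max_len=4096)
# 10000 tokens -> 3 segments of 4096, 4096, and 1808 tokens
```

Unlike the overlapping sliding-window approach, the segments here are disjoint; continuity across segment boundaries is recovered by the sequence layer that consumes the segment representations.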
The International Classification of Diseases (ICD) is a global medical classification system that provides unique codes for the diagnoses and procedures recorded in a patient's clinical record. However, manual coding by human coders is expensive and error-prone, and automatic ICD coding has the potential to solve this problem. With the advancement of deep learning technologies, many deep learning-based methods for automatic ICD coding are being developed. In particular, a label attention mechanism is effective for multi-label classification tasks such as ICD coding, as it obtains label-specific representations from the input clinical records. However, because the existing label attention mechanism finds key tokens in the entire text at once, important information dispersed across individual paragraphs may be omitted from the attention map. To overcome this, we propose a novel neural network architecture composed of two encoders and two kinds of label attention layers. The input text is segmentally encoded by the first encoder and integrated by the second. Then, the conventional and partition-based label attention mechanisms extract important global and local feature representations, respectively, and our classifier integrates them to enhance ICD coding performance. We verified the proposed method on MIMIC-III, a benchmark dataset for ICD coding. Our results show that the partition-based mechanism improves ICD coding performance.
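The label attention mechanism discussed in the abstract above can be sketched as follows: each ICD label has its own query vector, attention weights over the token representations are computed per label, and the weighted sum yields a label-specific document representation. Dimensions and values below are illustrative, not from the cited work.

```python
import math

def label_attention(hidden, label_queries):
    """For each label query q, compute attention weights softmax(H q)
    over the token representations H and return the label-specific
    representation sum_i a_i * h_i."""
    reps = []
    for q in label_queries:
        scores = [sum(h_d * q_d for h_d, q_d in zip(h, q)) for h in hidden]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        reps.append([sum(w * h[d] for w, h in zip(weights, hidden))
                     for d in range(len(hidden[0]))])
    return reps

# 3 tokens with 2-dim representations, 2 labels
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
reps = label_attention(H, Q)  # one 2-dim vector per label
```

The partition-based variant described in the paper would apply this same operation within each paragraph partition separately and then integrate the resulting local representations with the global ones.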
“…(For the sake of brevity, we refer readers to the original papers for more details.) In the biomedical area, Li et al [19] proposed two models based on the BigBird [10] and Longformer [11] architectures. However, their work retained the default vocabulary from the source model and did not consider the importance of a tokenizer tailored to the specific domain.…”
For this research, we propose a biomedical language model trained on publicly available biomedical datasets from Kaggle, the PubMed abstract baseline 2019, and MIMIC-III.