Proceedings of the 20th Workshop on Biomedical Language Processing 2021
DOI: 10.18653/v1/2021.bionlp-1.6
Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

Abstract: Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate in detail ICD co…
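The task framing in the abstract can be made concrete with a small sketch: each note is associated with a set of ICD codes, typically represented as a multi-hot vector over the code vocabulary. The codes and the note below are hypothetical illustrations, not drawn from the paper or from MIMIC-III.

```python
# Minimal sketch of the ICD coding task framing: each medical note is mapped
# to a multi-hot target vector over the ICD code vocabulary (multi-label setup).
# The code list and the note are hypothetical illustrations.
import numpy as np

icd_vocabulary = ["401.9", "428.0", "486", "250.00", "584.9"]  # toy label space
code_to_index = {code: i for i, code in enumerate(icd_vocabulary)}

note = "Patient admitted with acute decompensated heart failure and pneumonia."
gold_codes = {"428.0", "486"}  # codes assigned by a human coder (hypothetical)

# Multi-hot target: 1 for every code assigned to this note, 0 otherwise.
target = np.zeros(len(icd_vocabulary), dtype=np.float32)
for code in gold_codes:
    target[code_to_index[code]] = 1.0

print(target)  # [0. 1. 1. 0. 0.]
```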

Cited by 29 publications (17 citation statements). References 27 publications.
“…(multi-label) or SOFTMAX (multi-class) function to output logits. We mainly conduct our experiments on the MIMIC-III dataset (Johnson et al., 2016), where researchers still fail to transfer "the Magic of BERT" to medical code assignment tasks (Ji et al., 2021a; Pascual et al., 2021).…”
Section: Problem Formulation and Datasets
Citation type: mentioning; confidence: 99%
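The sigmoid/softmax distinction in the excerpt above can be sketched as two alternative output heads over a shared document representation. This is a generic PyTorch illustration, not the cited papers' exact architecture; the dimensions are arbitrary.

```python
# Sketch of the two output functions mentioned above: a sigmoid head for
# multi-label ICD coding vs. a softmax head for single-label (multi-class)
# classification. Generic illustration, not a specific paper's model.
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 50        # toy dimensions
doc_repr = torch.randn(4, hidden_size)   # batch of 4 document representations

head = nn.Linear(hidden_size, num_labels)
logits = head(doc_repr)

# Multi-label: independent per-code probabilities, trained with BCE.
multi_label_probs = torch.sigmoid(logits)
bce_loss = nn.BCEWithLogitsLoss()

# Multi-class: one mutually exclusive label per document, trained with CE.
multi_class_probs = torch.softmax(logits, dim=-1)
ce_loss = nn.CrossEntropyLoss()
```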
“…On datasets with longer documents, such as the MIMIC-III dataset (Johnson et al., 2016) with an average length of 2,000 words, it has been shown that multiple variants of BERT perform worse than a CNN or RNN-based model (Chalkidis et al., 2020; Vu et al., 2020; Dong et al., 2021; Ji et al., 2021a; Gao et al., 2021; Pascual et al., 2021). We believe there is a need to understand the performance of Transformer-based models on classifying documents that are actually long.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
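One common workaround for BERT's 512-token input limit on such long notes is to split the document into chunks, encode each chunk, and pool the chunk representations. The sketch below assumes the Hugging Face `transformers` library; the model name, overlap size, and mean pooling over chunk [CLS] vectors are illustrative assumptions, not the specific approach evaluated in the paper.

```python
# Sketch: encoding a long clinical note with a 512-token BERT by chunking.
# Model choice and pooling (mean over chunk [CLS] vectors) are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def encode_long_note(text, max_len=512, stride=128):
    # Tokenize once, then slide a window over the token ids; consecutive
    # windows overlap by `stride` tokens.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    step = max_len - 2 - stride  # leave room for [CLS] and [SEP]
    chunk_vecs = []
    for start in range(0, max(len(ids), 1), max(step, 1)):
        chunk = ids[start:start + max_len - 2]
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        with torch.no_grad():
            out = model(input_ids)
        chunk_vecs.append(out.last_hidden_state[:, 0])  # [CLS] vector
        if start + max_len - 2 >= len(ids):
            break
    # Pool the chunk representations into one document vector.
    return torch.cat(chunk_vecs, dim=0).mean(dim=0)

doc_vec = encode_long_note("Patient admitted with chest pain ..." * 50)
```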
“…Previous works (Ji et al., 2021; Pascual et al., 2021) have shown that pretrained language models like BERT (Devlin et al., 2019) cannot help the ICD coding performance, hence we use an LSTM (Hochreiter and Schmidhuber, 1997) as our encoder. We use pre-trained word embeddings to map words w_i to x_i.…”
Section: Encoding
Citation type: mentioning; confidence: 99%
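The encoder described in this excerpt can be sketched as an embedding lookup followed by an LSTM. The dimensions, the bidirectional choice, and the random stand-in for a pretrained embedding matrix are assumptions for illustration, not the cited paper's exact configuration.

```python
# Sketch of an LSTM encoder over pretrained word embeddings, mapping word ids
# w_i to vectors x_i and then to contextual hidden states. Dimensions, the
# bidirectional choice, and the random "pretrained" matrix are illustrative.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 100, 256
pretrained = torch.randn(vocab_size, embed_dim)  # stand-in for word2vec/GloVe

class LSTMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # freeze=False lets the embeddings be fine-tuned during training
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, word_ids):          # word_ids: (batch, seq_len)
        x = self.embed(word_ids)          # x_i: (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)               # (batch, seq_len, 2 * hidden_dim)
        return h

encoder = LSTMEncoder()
hidden = encoder(torch.randint(0, vocab_size, (2, 30)))  # toy batch
```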
“…Especially their ability to model long-range dependencies within an input sequence would potentially benefit the task of ICD-9 coding, since the information for a certain label prediction can be distributed across the whole text. Unlike other areas of natural language processing (NLP), little research on applying transformer-based architectures to the task of ICD-9 coding has been explored (Pascual et al., 2021; Biswas et al., 2021; Ji et al., 2021). Sun and Lu (2020) argue that attention scores are able to capture the global, absolute importance of word tokens and can thus provide some degree of explainability for text classification.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
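The explainability idea in this excerpt, using attention weights as token-importance scores, can be sketched with a label-wise attention layer over token representations. This is a generic illustration under assumed dimensions, not Sun and Lu's method or the paper's own architecture.

```python
# Sketch of label-wise attention over token representations: per-label
# attention weights double as token-importance scores that can be inspected
# for explainability. Generic illustration, not a specific paper's model.
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    def __init__(self, hidden_dim, num_labels):
        super().__init__()
        # one learned query vector per ICD code
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):                   # (batch, seq, hidden)
        scores = token_states @ self.label_queries.T   # (batch, seq, labels)
        attn = torch.softmax(scores, dim=1)             # token weights per label
        # label-specific document vectors: weighted sums of token states
        label_docs = torch.einsum("bsl,bsh->blh", attn, token_states)
        logits = self.classifier(label_docs).squeeze(-1)  # (batch, labels)
        return logits, attn  # attn can be inspected as token importance

layer = LabelAttention(hidden_dim=512, num_labels=50)
logits, attn = layer(torch.randn(2, 100, 512))
```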