Proceedings of the 20th Workshop on Biomedical Language Processing 2021
DOI: 10.18653/v1/2021.bionlp-1.6
Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

Abstract: Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate in detail ICD co…
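The task framing in the abstract can be made concrete with a small sketch: each note is associated with a set of ICD codes, typically represented as a multi-hot vector over the code vocabulary. The codes and the note below are hypothetical illustrations, not drawn from the paper or from MIMIC-III.

```python
# Minimal sketch of the ICD coding task framing: each medical note is mapped
# to a multi-hot target vector over the ICD code vocabulary (multi-label setup).
# The code list and the note are hypothetical illustrations.
import numpy as np

icd_vocabulary = ["401.9", "428.0", "486", "250.00", "584.9"]  # toy label space
code_to_index = {code: i for i, code in enumerate(icd_vocabulary)}

note = "Patient admitted with acute decompensated heart failure and pneumonia."
gold_codes = {"428.0", "486"}  # codes assigned by a human coder (hypothetical)

# Multi-hot target: 1 for every code assigned to this note, 0 otherwise.
target = np.zeros(len(icd_vocabulary), dtype=np.float32)
for code in gold_codes:
    target[code_to_index[code]] = 1.0

print(target)  # [0. 1. 1. 0. 0.]
```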

Cited by 29 publications (17 citation statements). References 27 publications.
“…(multi-label) or SOFTMAX (multi-class) function to output logits. We mainly conduct our experiments on the MIMIC-III dataset (Johnson et al., 2016), where researchers still fail to transfer "the Magic of BERT" to medical code assignment tasks (Ji et al., 2021a; Pascual et al., 2021).…”
Section: Problem Formulation and Datasets
Citation type: mentioning; confidence: 99%
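The sigmoid/softmax distinction in the excerpt above can be sketched as two alternative output heads over a shared document representation. This is a generic PyTorch illustration, not the cited papers' exact architecture; the dimensions are arbitrary.

```python
# Sketch of the two output functions mentioned above: a sigmoid head for
# multi-label ICD coding vs. a softmax head for single-label (multi-class)
# classification. Generic illustration, not a specific paper's model.
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 50        # toy dimensions
doc_repr = torch.randn(4, hidden_size)   # batch of 4 document representations

head = nn.Linear(hidden_size, num_labels)
logits = head(doc_repr)

# Multi-label: independent per-code probabilities, trained with BCE.
multi_label_probs = torch.sigmoid(logits)
bce_loss = nn.BCEWithLogitsLoss()

# Multi-class: one mutually exclusive label per document, trained with CE.
multi_class_probs = torch.softmax(logits, dim=-1)
ce_loss = nn.CrossEntropyLoss()
```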
“…On datasets with longer documents, such as the MIMIC-III dataset (Johnson et al., 2016) with an average length of 2,000 words, it has been shown that multiple variants of BERT perform worse than a CNN or RNN-based model (Chalkidis et al., 2020; Vu et al., 2020; Dong et al., 2021; Ji et al., 2021a; Gao et al., 2021; Pascual et al., 2021). We believe there is a need to understand the performance of Transformer-based models on classifying documents that are actually long.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
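One common workaround for BERT's 512-token input limit on such long notes is to split the document into chunks, encode each chunk, and pool the chunk representations. The sketch below assumes the Hugging Face `transformers` library; the model name, overlap size, and mean pooling over chunk [CLS] vectors are illustrative assumptions, not the specific approach evaluated in the paper.

```python
# Sketch: encoding a long clinical note with a 512-token BERT by chunking.
# Model choice and pooling (mean over chunk [CLS] vectors) are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def encode_long_note(text, max_len=512, stride=128):
    # Tokenize once, then slide a window over the token ids; consecutive
    # windows overlap by `stride` tokens.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    step = max_len - 2 - stride  # leave room for [CLS] and [SEP]
    chunk_vecs = []
    for start in range(0, max(len(ids), 1), max(step, 1)):
        chunk = ids[start:start + max_len - 2]
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        with torch.no_grad():
            out = model(input_ids)
        chunk_vecs.append(out.last_hidden_state[:, 0])  # [CLS] vector
        if start + max_len - 2 >= len(ids):
            break
    # Pool the chunk representations into one document vector.
    return torch.cat(chunk_vecs, dim=0).mean(dim=0)

doc_vec = encode_long_note("Patient admitted with chest pain ..." * 50)
```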
“…Previous works (Ji et al., 2021; Pascual et al., 2021) have shown that pretrained language models like BERT (Devlin et al., 2019) cannot help the ICD coding performance, hence we use an LSTM (Hochreiter and Schmidhuber, 1997) as our encoder. We use pre-trained word embeddings to map words w_i to x_i.…”
Section: Encoding
Citation type: mentioning; confidence: 99%
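The encoder described in this excerpt can be sketched as an embedding lookup followed by an LSTM. The dimensions, the bidirectional choice, and the random stand-in for a pretrained embedding matrix are assumptions for illustration, not the cited paper's exact configuration.

```python
# Sketch of an LSTM encoder over pretrained word embeddings, mapping word ids
# w_i to vectors x_i and then to contextual hidden states. Dimensions, the
# bidirectional choice, and the random "pretrained" matrix are illustrative.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 100, 256
pretrained = torch.randn(vocab_size, embed_dim)  # stand-in for word2vec/GloVe

class LSTMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # freeze=False lets the embeddings be fine-tuned during training
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, word_ids):          # word_ids: (batch, seq_len)
        x = self.embed(word_ids)          # x_i: (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)               # (batch, seq_len, 2 * hidden_dim)
        return h

encoder = LSTMEncoder()
hidden = encoder(torch.randint(0, vocab_size, (2, 30)))  # toy batch
```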
“…Especially their ability to model long-range dependencies within an input sequence would potentially benefit the task of ICD-9 coding, since the information for a certain label prediction can be distributed across the whole text. Unlike other areas of natural language processing (NLP), little research on applying transformer-based architectures to the task of ICD-9 coding has been explored (Pascual et al., 2021; Biswas et al., 2021; Ji et al., 2021). Sun and Lu (2020) argue that attention scores are able to capture the global, absolute importance of word tokens and can thus provide some degree of explainability for text classification.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
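The explainability idea in this excerpt, using attention weights as token-importance scores, can be sketched with a label-wise attention layer over token representations. This is a generic illustration under assumed dimensions, not Sun and Lu's method or the paper's own architecture.

```python
# Sketch of label-wise attention over token representations: per-label
# attention weights double as token-importance scores that can be inspected
# for explainability. Generic illustration, not a specific paper's model.
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    def __init__(self, hidden_dim, num_labels):
        super().__init__()
        # one learned query vector per ICD code
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):                   # (batch, seq, hidden)
        scores = token_states @ self.label_queries.T   # (batch, seq, labels)
        attn = torch.softmax(scores, dim=1)             # token weights per label
        # label-specific document vectors: weighted sums of token states
        label_docs = torch.einsum("bsl,bsh->blh", attn, token_states)
        logits = self.classifier(label_docs).squeeze(-1)  # (batch, labels)
        return logits, attn  # attn can be inspected as token importance

layer = LabelAttention(hidden_dim=512, num_labels=50)
logits, attn = layer(torch.randn(2, 100, 512))
```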