Shang-Chi Tsai scite author profile

Shang-Chi Tsai

3 Publications

18 Citation Statements Received

73 Citation Statements Given

Affiliations

National Taiwan University; Institute of Information Science, Academia Sinica; Tamkang University

Publications

PLM-ICD: Automatic ICD Coding with Pretrained Language Models

Huang¹, Tsai², Chen³

2022

Automatically classifying electronic health records (EHRs) into diagnostic codes has been challenging for the NLP community. State-of-the-art methods treat this as a multi-label classification problem and propose various architectures to model it. However, these systems do not leverage pretrained language models, which have achieved superb performance on natural language understanding tasks. Prior work has shown that pretrained language models underperform on this task under the regular fine-tuning scheme. This paper therefore analyzes the causes of the underperformance and develops a framework for automatic ICD coding with pretrained language models. Our experiments reveal three main issues: 1) a large label space, 2) long input sequences, and 3) domain mismatch between pretraining and fine-tuning. We propose PLM-ICD, a framework that tackles these challenges with various strategies. The experimental results show that the proposed framework overcomes the challenges and achieves state-of-the-art performance on multiple metrics on the benchmark MIMIC data.
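The abstract names long input sequences as one obstacle but does not say how PLM-ICD handles them. As a rough illustration only, one common workaround is to split a long token sequence into fixed-size segments that each fit a pretrained encoder's input limit. The function name `split_into_segments` and its parameters are hypothetical and are not taken from the paper.

```python
def split_into_segments(token_ids, max_len, stride=None):
    """Split a token sequence into segments of at most max_len tokens.

    Assumption: this is a generic chunking sketch, not the paper's method.
    stride controls how far the window moves; stride < max_len gives overlap.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    step = max_len if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")

    segments = []
    for start in range(0, len(token_ids), step):
        segments.append(token_ids[start:start + max_len])
        # Stop once a segment has reached the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return segments


if __name__ == "__main__":
    note = list(range(10))  # stand-in for the token ids of a clinical note
    print(split_into_segments(note, max_len=4))
    print(split_into_segments(note, max_len=4, stride=2))
```

Each segment could then be encoded separately and the per-segment outputs combined before predicting codes; how to combine them is a separate design choice that this sketch leaves open.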

Modeling Diagnostic Label Correlation for Automatic ICD Coding

Tsai¹, Huang², Chen³

2021

Given the clinical notes in electronic health records (EHRs), predicting the diagnostic codes, formulated as a multi-label classification task, is challenging. The large label set, the hierarchical dependencies, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlation. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for medical code prediction. In our experiments, the proposed framework improves upon the best-performing predictors on the benchmark MIMIC datasets.
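The two-stage idea in this abstract (a base predictor proposes label set candidates, then a label set estimator rescores them) can be sketched in a few lines. This is a toy illustration under stated assumptions: the candidate enumeration, the additive score combination, and the hand-written `toy_set_scorer` are all hypothetical stand-ins, whereas the paper trains its estimator.

```python
import math
from itertools import combinations


def generate_candidates(label_probs, top_n=3, max_size=3):
    """Enumerate candidate label sets from the top_n most probable labels."""
    top = sorted(label_probs, key=label_probs.get, reverse=True)[:top_n]
    return [frozenset(combo)
            for size in range(1, max_size + 1)
            for combo in combinations(top, size)]


def base_log_prob(label_set, label_probs):
    """Log-probability of a label set if every label is predicted independently."""
    total = 0.0
    for label, p in label_probs.items():
        p = min(max(p, 1e-9), 1 - 1e-9)  # avoid log(0)
        total += math.log(p) if label in label_set else math.log(1 - p)
    return total


def rerank(candidates, label_probs, set_scorer, weight=1.0):
    """Sort candidates by base score plus a weighted label-set score."""
    return sorted(
        candidates,
        key=lambda c: base_log_prob(c, label_probs) + weight * set_scorer(c),
        reverse=True,
    )


def toy_set_scorer(label_set):
    """Hypothetical estimator: penalize codes B and C appearing together."""
    return -3.0 if {"B", "C"} <= label_set else 0.0


if __name__ == "__main__":
    probs = {"A": 0.9, "B": 0.6, "C": 0.55}  # made-up base predictor outputs
    cands = generate_candidates(probs)
    print(rerank(cands, probs, lambda c: 0.0)[0])   # independent labels only
    print(rerank(cands, probs, toy_set_scorer)[0])  # with label-set rescoring
```

With independent scoring alone the top candidate contains all three labels; once the set scorer penalizes the unlikely pair, a smaller set ranks first. That change in ranking is the effect label correlation modeling is meant to capture.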

Leveraging Hierarchical Category Knowledge for Data-Imbalanced Multi-Label Diagnostic Text Understanding

Tsai¹, Chang², Chen³

2019

Clinical notes are essential medical documents that record each patient's symptoms. Each record is typically annotated with medical diagnostic codes, which indicate diagnosis and treatment. This paper focuses on predicting diagnostic codes from the description of the present illness in electronic health records by leveraging domain knowledge. We investigate various losses in a convolutional model that use the hierarchical category knowledge of diagnostic codes, allowing the model to share semantics across labels under the same category. The proposed model not only incorporates external domain knowledge but also addresses data imbalance. Experiments on the MIMIC3 benchmark show that the proposed methods effectively use category knowledge and provide informative cues, improving performance on the top-ranked diagnostic codes beyond the prior state of the art. The investigation and discussion demonstrate the potential of integrating domain knowledge into current machine-learning-based models and can guide future research directions.
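The abstract does not specify which losses were investigated. As one possible reading, a category-aware loss can add a coarse-level term on top of the usual per-code term, so that a wrong code in the right category costs less than a wrong code in an unrelated category. In the sketch below, treating the part before the dot as the category, taking the maximum child probability as the category probability, and the `category_weight` parameter are all assumptions made for illustration.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    p = min(max(prob, 1e-9), 1 - 1e-9)  # avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def category_of(code):
    """Assumption: the category is the part of the code before the dot."""
    return code.split(".")[0]


def hierarchical_loss(code_probs, gold_codes, category_weight=0.5):
    """Per-code loss plus a weighted loss over coarse categories."""
    code_loss = sum(bce(p, 1.0 if code in gold_codes else 0.0)
                    for code, p in code_probs.items())

    # Assumption: a category's probability is the max over its child codes.
    cat_probs = {}
    for code, p in code_probs.items():
        cat = category_of(code)
        cat_probs[cat] = max(cat_probs.get(cat, 0.0), p)

    gold_cats = {category_of(code) for code in gold_codes}
    cat_loss = sum(bce(p, 1.0 if cat in gold_cats else 0.0)
                   for cat, p in cat_probs.items())
    return code_loss + category_weight * cat_loss


if __name__ == "__main__":
    gold = {"250.00"}
    near_miss = {"250.00": 0.2, "250.01": 0.7, "401.9": 0.1}  # right category
    far_miss = {"250.00": 0.2, "250.01": 0.1, "401.9": 0.7}   # wrong category
    print(hierarchical_loss(near_miss, gold))
    print(hierarchical_loss(far_miss, gold))
```

Both predictions have the same per-code loss, but the category term separates them, which is one way sibling codes can share a training signal.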
