Selecting relevant features from the electronic health record for clinical code prediction

Scheurwegs, Elyne; Čule, Boris; Luyckx, Kim; Luyten, Léon; Daelemans, Walter

doi:10.1016/j.jbi.2017.09.004

Cited by 37 publications

(22 citation statements)

References 31 publications


“…Automatic ICD coding. ICD coding is a longstanding task in the medical informatics community, which has been approached with machine learning and handcrafted methods (Scheurwegs et al, 2015). Many recent approaches, like ours, use unstructured text data as the only source of information (e.g., Kavuluru et al, 2015; Subotin and Davis, 2014), though some incorporate structured data as well (e.g., Scheurwegs et al, 2017; Wang et al, 2016). Most previous methods have either evaluated only on a strict subset of the full ICD label space (Wang et al, 2016), relied on datasets that focus on a subset of medical scenarios, or evaluated on data that are not publicly available, making direct comparison difficult (Subotin and Davis, 2016).…”

Section: Related Work (mentioning)

confidence: 99%

Explainable Prediction of Medical Codes from Clinical Text

Mullenbach, Wiegreffe, Duke et al. 2018

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu


Clinical notes are text documents that are created by clinicians for each patient encounter. They are typically accompanied by medical codes, which describe the diagnosis and treatment. Annotating these codes is labor intensive and error prone; furthermore, the connection between the codes and the text is not annotated, obscuring the reasons and details behind specific diagnoses and treatments. We present an attentional convolutional network that predicts medical codes from clinical text. Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes. The method is accurate, achieving precision@8 of 0.71 and a Micro-F1 of 0.54, which are both better than the prior state of the art. Furthermore, through an interpretability evaluation by a physician, we show that the attention mechanism identifies meaningful explanations for each code assignment.


“…Variable selection and penalization methods along with sparse estimation strategies allow many predictors to be incorporated into statistical models, and there is an excellent opportunity for the use of such methods in the setting of EHR. Automated feature selection algorithms are often used within machine learning algorithms to determine which predictors to include, and this can also be combined with expert preprocessing of the candidate predictors. Regularization techniques, including LASSO, ridge regression, and elastic net, have been applied in the EHR setting…”

Section: Statistical Issues Related To Biobank Research (mentioning)

confidence: 99%

“…Automated feature selection algorithms are often used within machine learning algorithms to determine which predictors to include, and this can also be combined with expert preprocessing of the candidate predictors [153,154]. Regularization techniques, including LASSO, ridge regression, and elastic net, have been applied in the EHR setting [155,156]. Machine learning algorithms have also gained popularity in EHR data analysis, particularly in the development of risk prediction models.…”

Section: Modeling (mentioning)

confidence: 99%

The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities

Beesley, Salvatore, Fritsche et al. 2019

Statistics in Medicine


Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.


“…The authors have evaluated ICD coding performance on a dataset consisting of more than 70,000 textual Electronic Medical Records (EMRs) from the University of Kentucky (UKY) Medical Center tagged with ICD-9 codes. Integrating feature selection on both structured and unstructured data is researched by the authors of [9] and has proven to aid the classification process. Two approaches are evaluated in this setting: early and late integration of structured and unstructured data, the latter yielding the better results.…”

Section: Traditional Models For ICD Coding (mentioning)

confidence: 99%

A Comparison of Deep Learning Methods for ICD Coding of Clinical Records

et al. 2020


In this survey, we discuss the task of automatically classifying medical documents into the taxonomy of the International Classification of Diseases (ICD) using deep neural networks. The literature in this domain covers different techniques. We will assess and compare the performance of those techniques in various settings and investigate which combination yields the best results. Furthermore, we introduce a hierarchical component that exploits the knowledge of the ICD taxonomy. All methods and their combinations are evaluated on two publicly available datasets that represent ICD-9 and ICD-10 coding, respectively. The evaluation leads to a discussion of the advantages and disadvantages of the models.
