2017
DOI: 10.1016/j.jbi.2017.09.004

Selecting relevant features from the electronic health record for clinical code prediction

Abstract: A multitude of information sources is present in the electronic health record (EHR), each of which can contain clues to automatically assign diagnosis and procedure codes. These sources however show information overlap and quality differences, which complicates the retrieval of these clues. Through feature selection, a denser representation with a consistent quality and less information overlap can be obtained. We introduce and compare coverage-based feature selection methods, based on confidence and informati…

citations
Cited by 37 publications
(22 citation statements)
references
References 31 publications
“…Automatic ICD coding. ICD coding is a longstanding task in the medical informatics community, which has been approached with machine learning and handcrafted methods (Scheurwegs et al, 2015). Many recent approaches, like ours, use unstructured text data as the only source of information (e.g., Kavuluru et al, 2015; Subotin and Davis, 2014), though some incorporates structured data as well (e.g., Scheurwegs et al, 2017; Wang et al, 2016). Most previous methods have either evaluated only on a strict subset of the full ICD label space (Wang et al, 2016), relied on datasets that focus on a subset of medical scenarios, or evaluated on data that are not publicly available, making direct comparison difficult (Subotin and Davis, 2016).…”
Section: Related Work (mentioning)
confidence: 99%
“…Variable selection and penalization methods along with sparse estimation strategies allow many predictors to be incorporated into statistical models, and there is an excellent opportunity for the use of such methods in the setting of EHR. Automated feature selection algorithms are often used within machine learning algorithms to determine which predictors to include, and this can also be combined with expert preprocessing of the candidate predictors. Regularization techniques, including LASSO, ridge regression, and elastic net, have been applied in the EHR setting…”
Section: Statistical Issues Related To Biobank Research (mentioning)
confidence: 99%
“…Automated feature selection algorithms are often used within machine learning algorithms to determine which predictors to include, and this can also be combined with expert preprocessing of the candidate predictors. 153,154 Regularization techniques, including LASSO, ridge regression, and elastic net, have been applied in the EHR setting. 155,156 Machine learning algorithms have also gained popularity in EHR data analysis, particularly in the development of risk prediction models.…”
Section: Modeling (mentioning)
confidence: 99%
“…The authors have evaluated ICD coding performance on a dataset consisting of more than 70,000 textual Electronic Medical Records (EMRs) from the University of Kentucky (UKY) Medical Center tagged with ICD-9 codes. Integrating feature selection on both structured and unstructured data is researched by the authors of [9] and has proven to aid the classification process. Two approaches are evaluated in this setting: early and late integration of structured and unstructured data, the latter yielding the better results.…”
Section: Traditional Models For ICD Coding (mentioning)
confidence: 99%