2019
DOI: 10.1186/s12911-019-0985-7
Representation learning for clinical time series prediction tasks in electronic health records

Abstract: Background: Electronic health records (EHRs) provide opportunities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, such as temporality, high dimensionality, sparseness, noise, random error and systematic bias. In particular, temporal information is difficult for traditional machine learning methods to use effectively, even though the sequential information in EHRs is very useful. Method: In this paper, we propose a general-purpose patient repre…

Citations: Cited by 36 publications (30 citation statements)
References: 30 publications
“…The representation of how a patient's condition evolves over time using ML has been predominantly studied from the perspective of temporal disease trajectories through consecutive clinical encounters [19,33,34]. These representations are derived mainly from clinical diagnosis codes, which can be merged with other categorical information such as medications, demographics or non-numeric descriptions of laboratory test results [8,18]. In this work we focus on a narrower time-window, hospital admissions in a general inpatient population, and use laboratory blood tests and vital signs to form a numeric representation of patients' physiological status and illness acuity over time.…”
Section: Discussion (mentioning)
confidence: 99%
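The excerpt above describes deriving a numeric representation of physiological status from laboratory blood tests and vital signs over a hospital admission. As an illustration only (not the cited authors' code), the sketch below shows one common way to turn timestamped measurements into a regular per-admission time-series matrix; the column names, the 4-hour bin width, and the use of pandas are assumptions.

```python
# Minimal sketch: timestamped lab/vital-sign events for one admission
# -> regular numeric time-series matrix. Illustrative assumptions only.
import pandas as pd

def admission_to_matrix(events: pd.DataFrame, bin_hours: int = 4) -> pd.DataFrame:
    """events: columns ['charttime', 'variable', 'value'] for a single admission."""
    events = events.copy()
    events["charttime"] = pd.to_datetime(events["charttime"])
    # Pivot to one column per lab test / vital sign, one row per measurement time.
    wide = events.pivot_table(index="charttime", columns="variable",
                              values="value", aggfunc="mean")
    # Resample onto a fixed time grid and forward-fill, so every time step
    # carries a numeric snapshot of physiological status.
    return wide.resample(f"{bin_hours}h").mean().ffill()
```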
“…We considered as benchmarks five supervised methods using the labeled set alone: (i) LASSO-penalized logistic regression [16,17,34,37-39], (ii) random forest (RF) [40,41], (iii) linear discriminant analysis (LDA) [42], and (iv) an LSTM-gated recurrent neural network (RNN) [24,39,43,44] trained with raw feature counts C_{i,t}, as well as (v) LDA trained with patient-timepoint embeddings generated without weights, which we refer to as LDA_embed. In addition, we considered a semi-supervised benchmark: a hidden Markov model (HMM) [26-29,45,46] with a multivariate Gaussian emission trained with the weight-free embeddings.…”
Section: Methods (mentioning)
confidence: 99%
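As a rough, non-authoritative sketch of the benchmark setup described in this excerpt, the snippet below fits two of the supervised baselines on raw feature counts and a Gaussian-emission HMM on label-free embeddings. The data shapes, hyperparameters, and use of scikit-learn and hmmlearn are assumptions, not the cited study's implementation.

```python
# Illustrative benchmark sketch with synthetic data; not the cited study's code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
X_counts = rng.poisson(1.0, size=(200, 50))   # raw feature counts C_{i,t}, flattened (assumed)
y = rng.integers(0, 2, size=200)              # phenotype labels (assumed)
X_embed = rng.normal(size=(200, 16))          # weight-free patient-timepoint embeddings (assumed)

# (i) LASSO-penalized logistic regression and (ii) random forest on the labeled set.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_counts, y)
rf = RandomForestClassifier(n_estimators=200).fit(X_counts, y)

# Semi-supervised benchmark: HMM with multivariate Gaussian emissions,
# fit on the embeddings without using any labels.
hmm = GaussianHMM(n_components=2, covariance_type="full").fit(X_embed)
states = hmm.predict(X_embed)
```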
“…Recurrent neural networks (RNNs), which are designed for sequence data and well-conditioned to high feature dimensions, have enjoyed particularly widespread application to prediction using EHR data [21-25]. However, these models require large numbers of training labels to achieve stable performance, which is not feasible for phenotypes necessitating manual labeling. Consequently, existing applications of RNNs to EHR-based prediction all use readily available outcome measures such as discharge billing codes, limiting application to phenotypes with reliable codified proxies.…”
Section: Introduction (mentioning)
confidence: 99%
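For readers unfamiliar with the RNN models referenced above, the following is a minimal, illustrative PyTorch sketch of an LSTM classifier over per-visit EHR feature vectors; the dimensions and architecture are assumptions and do not reproduce any cited paper's model.

```python
# Minimal sketch of an LSTM classifier over EHR time series; assumed architecture.
import torch
import torch.nn as nn

class EHRSequenceClassifier(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, features); the final hidden state summarizes
        # the patient sequence before classification.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])

model = EHRSequenceClassifier(n_features=50)
logits = model(torch.randn(8, 20, 50))  # 8 patients, 20 time steps, 50 features
```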
“…The importance of diagnosis codes should be taken seriously. A recurrent neural network-based denoising autoencoder, proposed in [29], was employed to encode the in-hospital records of each patient into a low-dimensional dense vector. The patient representation they learned is used for the prediction of clinical events.…”
Section: Representation Learning in EHRs (mentioning)
confidence: 99%
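The excerpt above describes a recurrent denoising autoencoder that compresses a patient's in-hospital records into a low-dimensional dense vector. The sketch below illustrates that general idea under assumed architectural choices (GRU encoder/decoder, Gaussian input corruption); it is not the model from [29].

```python
# Hedged sketch of a recurrent denoising autoencoder; all details are assumptions.
import torch
import torch.nn as nn

class RNNDenoisingAutoencoder(nn.Module):
    def __init__(self, n_features: int, code_dim: int = 32):
        super().__init__()
        self.encoder = nn.GRU(n_features, code_dim, batch_first=True)
        self.decoder = nn.GRU(code_dim, n_features, batch_first=True)

    def forward(self, x: torch.Tensor):
        # Corrupt the input, encode to a low-dimensional dense patient vector,
        # then reconstruct the clean sequence from that vector.
        noisy = x + 0.1 * torch.randn_like(x)
        _, h = self.encoder(noisy)                       # h: (1, batch, code_dim)
        code = h[-1]                                     # dense patient representation
        repeated = code.unsqueeze(1).expand(-1, x.size(1), -1)
        recon, _ = self.decoder(repeated)
        return recon, code

model = RNNDenoisingAutoencoder(n_features=40)
recon, patient_vec = model(torch.randn(4, 30, 40))       # reconstruction loss taken vs. clean x
```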