2021
DOI: 10.48550/arxiv.2106.11360
Preprint

Hi-BEHRT: Hierarchical Transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records

Cited by 3 publications (4 citation statements). References 22 publications.
“…In this study, we compared the performance of PCB with standard DL models for HF risk prediction, to investigate the trade-off between counterfactual reasoning and prediction accuracy. For PCB models, we constructed the initial representation learning architecture m(·) by adopting a previous high-performing Transformer (Devlin et al, 2019) model architecture, Hi-BEHRT, and its parameters (Li et al, 2021) (see Supplementary, Method S2 for more). Next, a two-layer multi-layer perceptron g(·) and a vector quantization component h(·) were trained to map m(x) to c and l, respectively.…”
Section: Baseline Models
Citation type: mentioning (confidence: 99%)
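The vector-quantization component h(·) described in the quoted setup maps the continuous representation m(x) to a discrete code l. A minimal PyTorch sketch of such a nearest-codebook lookup follows; the codebook size, dimensionality, and straight-through gradient trick are illustrative assumptions, not details taken from the cited work.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-codebook lookup in the VQ-VAE style (illustrative sketch only)."""

    def __init__(self, num_codes: int = 64, dim: int = 128):
        super().__init__()
        # Learnable codebook; each row is one discrete latent prototype.
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) continuous representation m(x).
        dists = torch.cdist(z, self.codebook.weight)   # (batch, num_codes) distances
        codes = dists.argmin(dim=-1)                    # discrete latent l
        quantized = self.codebook(codes)                # (batch, dim) quantized vector
        # Straight-through estimator: gradients flow back to the encoder unchanged.
        return z + (quantized - z).detach(), codes
```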
“…Hi-BEHRT (Li et al, 2021): We used the identical Hi-BEHRT as mentioned above for latent representation learning. However, instead of pooling the first time step for classification, we used a two-layer multi-layer perceptron to map the representation to the high-level concepts, with 64 units for the first layer.…”
Section: S2.2 Hi-BEHRT
Citation type: mentioning (confidence: 99%)
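A minimal sketch of the two-layer perceptron head described in this quote is given below; only the 64-unit first layer comes from the quote, while the representation size and number of high-level concepts are placeholder assumptions.

```python
import torch.nn as nn

def concept_head(repr_dim: int = 256, num_concepts: int = 10) -> nn.Sequential:
    """Two-layer MLP mapping a pooled representation to high-level concepts."""
    return nn.Sequential(
        nn.Linear(repr_dim, 64),      # first layer: 64 hidden units (per the quote)
        nn.ReLU(),
        nn.Linear(64, num_concepts),  # second layer: one output per concept (assumed size)
    )
```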
“…Recently, more studies focused on the use of time-series information. Methods for such an approach include autoencoders, convolutional neural networks [3], or sequential models like recurrent neural networks (RNN) [4] or transformer-based models [5][6][7][8][9]. Transformer-based models originate from natural language processing (NLP) and have recently gained much attention since they have achieved excellent results in many areas [10][11][12][13][14].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Later, Li et al developed BERT for EHR (BEHRT), which generated a patient embedding based on the history of diagnoses and used it for disease prediction in different time windows [5]. Since BEHRT, like most transformer-based models, is limited with respect to the maximum sequence length, the authors later developed a hierarchical BEHRT variant (Hi-BEHRT), which can process longer medical histories [6]. Another model, called the Bidirectional Representation Learning model with a Transformer architecture on Multimodal EHR (BRLTM), was published by Meng et al in 2021 [8].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
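The hierarchical strategy alluded to in this quote, splitting a long record into segments, encoding each segment with a local transformer, and then aggregating the segment summaries with a second transformer, can be illustrated with a generic sketch. This is not the published Hi-BEHRT configuration; the segment length, layer counts, and dimensions are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalEncoder(nn.Module):
    """Generic two-level encoder: local transformer per segment, global transformer
    over segment summaries (illustrative sketch, not the published Hi-BEHRT)."""

    def __init__(self, dim: int = 128, seg_len: int = 64, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.local = nn.TransformerEncoder(layer, num_layers=2)    # within each segment
        self.global_ = nn.TransformerEncoder(layer, num_layers=2)  # across segment summaries
        self.seg_len = seg_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) embedded medical events; seq_len may be far longer
        # than a flat transformer could attend over in a single pass.
        b, t, d = x.shape
        x = F.pad(x, (0, 0, 0, (-t) % self.seg_len))      # pad sequence to full segments
        segments = x.view(b, -1, self.seg_len, d)          # (batch, n_seg, seg_len, dim)
        local_out = self.local(segments.flatten(0, 1))     # encode each segment independently
        summaries = local_out.mean(dim=1).view(b, -1, d)   # one pooled summary per segment
        return self.global_(summaries)                     # (batch, n_seg, dim)
```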