Multi-View Deep Learning Framework for Predicting Patient Expenditure in Healthcare

Zeng, Xianlong; Lin, Simon; Liu, Chang

doi:10.1109/ojcs.2021.3052518

Cited by 14 publications

(8 citation statements)

References 31 publications

(31 reference statements)

“…We can observe that the pre-training model can gain sufficient predictive power on the two downstream predictive tasks, adding demographic and utilization information can barely improve the model performance. This observation is consistent with experimental results found in the previously study 30 . The marginal improvement is likely due to the fact that medical codes often contain information that is overlapped with medical utilization information and demographic information.…”

Section: Results (supporting)

confidence: 94%

“…Xiang et al 29 predict the risk of asthma exacerbations and explore the potential risk factors involved in the progression of asthma via a time-sensitive attentive neural network. Zeng et al 30 developed a multi-view framework to predict the future medical expenses for better care delivery and care management. Choi et al 31 proposed RETAIN to estimate the future heart failure rate with explainable risk factors.…”

Section: Related Work (mentioning)

confidence: 99%

Pretrained transformer framework on pediatric claims data for population specific tasks

Zeng, Lin, Liu. Sci Rep, 2022. (Self-citation)

The adoption of electronic health records (EHR) has become universal during the past decade, which has afforded in-depth data-based research. By learning from the large amount of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as auto diagnosis and heart-attack prediction. Although EHR is abundant, the population that satisfies specific criteria for learning population-specific tasks is scarce, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset, followed by a discriminative fine-tuning on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed through the task-aware fine-tuning stage. The fine-tuning process requires minimal parameter modification without changing the model architecture, which mitigates the data scarcity issue and helps train the deep learning model adequately on small patient cohorts. We conducted experiments on a real-world pediatric dataset with more than one million patient records. Experimental results on two downstream tasks demonstrated the effectiveness of our method: our general task-agnostic pre-training framework outperformed tailored task-specific models, achieving more than 10% higher in model performance as compared to baselines. In addition, our framework showed a potential to transfer learned knowledge from one institution to another, which may pave the way for future healthcare model pre-training across institutions.

“…The accuracy, precision, recall, and F1-score were used as metrics to evaluate the prediction performance. Each metric was calculated, as shown in Equations (10) through (13), using the confusion matrix shown in Table 7.…”

Section: Metrics (mentioning)

confidence: 99%
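The citation statement above refers to Equations (10) through (13) and Table 7 of the citing paper, none of which are reproduced on this page. As a rough illustration only, the sketch below computes the four named metrics from a binary confusion matrix using their standard textbook definitions. The function name `classification_metrics` and the example counts are hypothetical, and the citing paper's exact equations may differ.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the true-positive, false-positive, false-negative,
    and true-negative counts. These are the textbook definitions, assumed
    here; they are not copied from the citing paper.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration; not taken from any dataset on this page.
example = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

On a heavily imbalanced dataset, accuracy can stay high even when precision and recall are poor, which is presumably why the statement lists all four metrics together.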

“…The development of deep learning-based NLP techniques has led to powerful data-driven approaches [9]-[11]. NLP techniques have been used to predict patient prognoses [12], AEs [4], and patient healthcare expenditures [13]. Although there have been some studies with imbalanced text data [14]-[16] or medical data analyses using NLP techniques [17]-[21], imbalances in medical data have not yet been addressed.…”

Section: Introduction (mentioning)

confidence: 99%

An NLP-Inspired Data Augmentation Method for Adverse Event Prediction Using an Imbalanced Healthcare Dataset

2022

This paper proposes a data augmentation method for imbalanced healthcare datasets. This method was inspired by a data augmentation method in natural language processing (NLP) that generates synthetic sentences for training by replacing some words with similar words. The proposed method generates synthetic patient records by replacing patient backgrounds with similar backgrounds. In this paper, the cosine similarity of the distributed representations was used as the similarity metric between patient backgrounds. The distributed representations of the patient backgrounds were generated by the skip-gram model. To confirm the performance improvement with the proposed data augmentation method, the prediction performance of adverse events (AEs) caused by drug administration was experimentally evaluated on a real-world medical dataset with 1,510,137 records. The combination of the proposed data augmentation method and a conventional undersampling method resulted in an 80.0 % improvement in accuracy and a 40.0 % improvement in the precision and F1-score. The multifaceted evaluation demonstrated that the proposed method is effective, especially for predicting AEs with positive ratios ranging from 1.0 % to 2.1 %, which are difficult to predict with conventional machine learning methods but should be predictable in the medical field.

Index terms: Adverse event prediction, data augmentation, distributed representation, healthcare dataset, imbalanced dataset.
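The replacement step described in the abstract can be sketched roughly as follows: given a vector per patient-background token, swap one token in a record for its nearest neighbour by cosine similarity. This is a minimal illustration under stated assumptions, not the authors' implementation. The token names, the three-dimensional vectors in `TOY_EMBEDDINGS`, and the helper names `most_similar` and `augment_record` are all hypothetical, and the vectors are hand-written rather than trained with a skip-gram model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(token, embeddings):
    """Return the other token whose vector is closest to `token`'s vector."""
    target = embeddings[token]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in embeddings.items()
        if other != token
    ]
    return max(scored)[1]


def augment_record(record, position, embeddings):
    """Copy `record`, replacing the token at `position` with its neighbour."""
    synthetic = list(record)
    synthetic[position] = most_similar(record[position], embeddings)
    return synthetic


# Hand-made toy vectors (assumption): a real pipeline would learn these,
# for example with a skip-gram model as the abstract describes.
TOY_EMBEDDINGS = {
    "hypertension": [0.9, 0.1, 0.0],
    "high_blood_pressure": [0.85, 0.15, 0.05],
    "asthma": [0.1, 0.9, 0.2],
    "fracture": [0.0, 0.2, 0.95],
}

original = ["asthma", "hypertension"]
synthetic = augment_record(original, 1, TOY_EMBEDDINGS)
```

The original record is left unmodified, so both it and the synthetic copy can be added to the training set for the minority class.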

“…GRAM [15] and PRIME [16] were developed to incorporate medical domain knowledge for estimating patients' heart failure risk based on their prior medical visits. Zeng et al [17] and Morid et al [18] proposed a sequential deep learning model to predict next year's medical cost for estimating a patient's risk level. Yang et al [19] evaluate the explainability and fidelity of the current deep learning models with respect to predicting future medical costs.…”

Section: Supervised Risk Prediction Models (mentioning)

confidence: 99%

Transformer-based unsupervised patient representation learning based on medical claims for risk stratification and analysis

Zeng, Lin, Liu. Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2021. (Self-citation)

The claims data, containing medical codes, services information, and incurred expenditure, can be a good resource for estimating an individual's health condition and medical risk level. In this study, we developed Transformer-based Multimodal AutoEncoder (TMAE), an unsupervised learning framework that can learn efficient patient representation by encoding meaningful information from the claims data. TMAE is motivated by the practical needs in healthcare to stratify patients into different risk levels for improving care delivery and management. Compared to previous approaches, TMAE is able to 1) model inpatient, outpatient, and medication claims collectively, 2) handle irregular time intervals between medical events, 3) alleviate the sparsity issue of the rare medical codes, and 4) incorporate medical expenditure information. We trained TMAE using a real-world pediatric claims dataset containing more than 600,000 patients and compared its performance with various approaches in two clustering tasks. Experimental results demonstrate that TMAE has superior performance compared to all baselines. Multiple downstream applications are also conducted to illustrate the effectiveness of our framework. The promising results confirm that the TMAE framework is scalable to large claims data and is able to generate efficient patient embeddings for risk stratification and analysis.
