2023
DOI: 10.1038/s41746-023-00840-9

A foundational vision transformer improves diagnostic performance for electrocardiograms

Abstract: The electrocardiogram (ECG) is a ubiquitous diagnostic modality. Convolutional neural networks (CNNs) applied towards ECG analysis require large sample sizes, and transfer learning approaches for biomedical problems may result in suboptimal performance when pre-training is done on natural images. We leveraged masked image modeling to create a vision-based transformer model, HeartBEiT, for electrocardiogram waveform analysis. We pre-trained this model on 8.5 million ECGs and then compared performance vs. standa…
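The pre-training objective described in the abstract is masked image modeling over ECG images split into patches. A minimal sketch of that objective is below, simplified to pixel-level reconstruction of masked patches rather than the discrete-token prediction used by BEiT; the patch size, embedding width, and network depth are illustrative assumptions, not the HeartBEiT configuration.

```python
# Minimal masked-image-modeling sketch (simplified: pixel reconstruction rather
# than BEiT's discrete-token objective). Shapes and hyperparameters are
# illustrative, not the HeartBEiT configuration.
import torch
import torch.nn as nn

class MaskedPatchModel(nn.Module):
    def __init__(self, patch_dim=16 * 16 * 3, embed_dim=256, num_patches=196):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)           # patch embedding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(embed_dim, patch_dim)            # reconstruct masked patches

    def forward(self, patches, mask):
        # patches: (B, N, patch_dim); mask: (B, N) bool, True where a patch is masked
        x = self.embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x) + self.pos
        return self.head(self.encoder(x))

def mim_loss(model, patches, mask_ratio=0.4):
    """Self-supervised objective: reconstruct the content of the masked patches."""
    mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
    pred = model(patches, mask)
    return ((pred - patches) ** 2)[mask].mean()
```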

Cited by 23 publications (9 citation statements)
References 20 publications
“…Our group has demonstrated that CNN architectures continue to increase performance with increasing amounts of ECG data even above 500 000 training examples. 18 Given these prior works, we did not seek to demonstrate the additive value of UKBB data, which would require multiple model retrainings at different training sample sizes. This is an interesting direction of research in the data science realm and would be useful to establish minimum sample size estimates for neural network training in the ECG analysis.…”
Section: Discussion
confidence: 99%
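The retraining-at-multiple-sample-sizes experiment mentioned in this statement amounts to a learning-curve analysis. A hedged sketch of such a sweep is shown below; `train_model` and `evaluate_auroc` are hypothetical helpers standing in for whichever training and evaluation pipeline is used.

```python
# Hypothetical learning-curve sweep: retrain at increasing training-set sizes
# and record held-out AUROC to estimate a minimum useful sample size.
# `train_model` and `evaluate_auroc` are placeholders, not a published pipeline.
import numpy as np

def learning_curve(train_ecgs, train_labels, test_ecgs, test_labels,
                   sizes=(1_000, 10_000, 100_000, 500_000), seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        idx = rng.choice(len(train_ecgs), size=min(n, len(train_ecgs)), replace=False)
        model = train_model(train_ecgs[idx], train_labels[idx])     # placeholder
        results[n] = evaluate_auroc(model, test_ecgs, test_labels)  # placeholder
    return results  # e.g. {1000: 0.71, 10000: 0.78, ...}
```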
“…Neural network architectures are uniquely able to extract information‐rich features from complex waveform data and form the basis of several published DL‐ECG models. 14 , 15 , 16 , 17 , 18 We trained the network using the Adam optimizer with cross‐entropy as the loss function for classification and mean absolute error for regression tasks, respectively. To minimize overfitting, we monitored train‐test loss over training epochs against an internal validation set of 5% of ECGs within the training cohort.…”
Section: Methods
confidence: 99%
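The training setup quoted above (Adam optimizer, cross-entropy for classification, mean absolute error for regression, and a 5% internal validation split monitored for overfitting) could look roughly like the PyTorch sketch below; the learning rate, batch size, and epoch count are illustrative assumptions, not the cited paper's settings.

```python
# Illustrative training loop matching the quoted setup: Adam optimizer,
# cross-entropy for classification (or L1/MAE for regression), with train and
# validation loss monitored per epoch on a 5% internal validation split.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

def train(model, dataset, task="classification", epochs=30, lr=1e-4, batch_size=64):
    n_val = int(0.05 * len(dataset))                      # 5% internal validation split
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)

    criterion = nn.CrossEntropyLoss() if task == "classification" else nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * len(x)

        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                val_loss += criterion(model(x), y).item() * len(x)

        # Compare train vs. validation loss across epochs to detect overfitting
        print(f"epoch {epoch}: train={train_loss / len(train_set):.4f} "
              f"val={val_loss / len(val_set):.4f}")
    return model
```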
“…Another avenue worth exploring is the replacement of the Fourier method in spectrogram creation with a signature-based approach [90]. The application of self-supervised learning to incorporate more domain-specific data shows promise, as evidenced by a recent paper that introduced HeartBEiT, a vision-based transformer model for ECG analysis [91]. HeartBEiT demonstrated significantly superior performance at lower sample sizes compared to standard CNNs.…”
Section: Discussion
confidence: 99%
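The Fourier-based spectrogram creation referenced in this statement can be sketched with SciPy's short-time Fourier transform; the sampling rate, window length, and overlap below are illustrative assumptions, and the synthetic sinusoid merely stands in for a real ECG lead.

```python
# Fourier-based spectrogram of a single-lead signal via a short-time Fourier
# transform; parameters are illustrative, not the cited paper's settings.
import numpy as np
from scipy import signal

fs = 500                               # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)           # 10 s synthetic stand-in for an ECG lead
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)

f, times, Sxx = signal.spectrogram(ecg, fs=fs, nperseg=256, noverlap=128)
log_spec = 10 * np.log10(Sxx + 1e-12)  # log-power image usable as vision-model input
print(log_spec.shape)                  # (frequency bins, time frames)
```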
“…Finally, the model leverages both spatial understanding of features from image data and features from clinical data using a fusion layer for a more holistic feature representation. Recently, the Vision Transformer (ViT) model [12] was developed for processing ECG for diagnosis of hypertrophic cardiomyopathy, low left ventricular ejection fraction, and ST elevation myocardial infarction and achieved state-of-the-art performance [12]. For comparative benchmarking on the PCI dataset, we evaluated the predictive performance of the CNN model with block attention which preserves local and semi-global dependencies, against a Vision Transformer (ViT) model which preserves long-range dependency, presenting an extensive analysis of their respective results in the context of post-PCI prognosis.…”
Section: Introduction
confidence: 99%
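The fusion described in this statement (image-derived features concatenated with clinical variables before a prediction head) can be sketched as below; the encoder, feature sizes, and number of clinical variables are placeholder assumptions rather than the cited architecture.

```python
# Sketch of a late-fusion model: an image encoder's features are concatenated
# with clinical variables and passed through a fusion layer for prediction.
# Encoder choice and dimensions are placeholders, not the cited architecture.
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, image_encoder, image_dim=768, clinical_dim=12, num_classes=2):
        super().__init__()
        self.image_encoder = image_encoder                 # e.g. a ViT or CNN backbone
        self.clinical_net = nn.Sequential(
            nn.Linear(clinical_dim, 64), nn.ReLU(), nn.Linear(64, 64)
        )
        self.fusion = nn.Sequential(
            nn.Linear(image_dim + 64, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, image, clinical):
        img_feat = self.image_encoder(image)               # (B, image_dim) feature vector
        clin_feat = self.clinical_net(clinical)            # (B, 64)
        return self.fusion(torch.cat([img_feat, clin_feat], dim=1))
```

Any backbone that maps an image batch to a flat feature vector can be passed in as `image_encoder`; the concatenation-plus-MLP fusion is one simple way to combine the two modalities into a single representation.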