Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification

Park, Sangjoon; Kim, Gwanghyun; Oh, Yung-Hwan; Seo, Joon Beom; Lee, Sang Min; Kim, Jin Hwan; Moon, Sung-Jun; Lim, Jae‐Kwang; Ye, Jong Chul

doi:10.48550/arxiv.2104.07235

Cited by 2 publications

(3 citation statements)

References 39 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…They used an encoder-decoder design. ViT was recently used to diagnose and predict the severity of COVID-19, demonstrating its SOTA performance [24].…”

Section: Vision Transformer Modelsmentioning

confidence: 99%

Peer-to-Peer Federated Learning for COVID-19 Detection Using Transformers

Chetoui

Akhloufi

2023

Computers

View full text Add to dashboard Cite

The simultaneous advances in deep learning and the Internet of Things (IoT) have benefited distributed deep learning paradigms. Federated learning is one of the most promising frameworks, where a server works with local learners to train a global model. The intrinsic heterogeneity of IoT devices, or non-independent and identically distributed (Non-I.I.D.) data, combined with the unstable communication network environment, causes a bottleneck that slows convergence and degrades learning efficiency. Additionally, the majority of weight averaging-based model aggregation approaches raise questions about learning fairness. In this paper, we propose a peer-to-peer federated learning (P2PFL) framework based on Vision Transformers (ViT) models to help solve some of the above issues and classify COVID-19 vs. normal cases on Chest-X-Ray (CXR) images. Particularly, clients jointly iterate and aggregate the models in order to build a robust model. The experimental results demonstrate that the proposed approach is capable of significantly improving the performance of the model with an Area Under Curve (AUC) of 0.92 and 0.99 for hospital-1 and hospital-2, respectively.

show abstract

“…They used an encoder-decoder design. ViT was recently used to diagnose and predict the severity of COVID-19, demonstrating its SOTA performance [24].…”

Section: Vision Transformer Modelsmentioning

confidence: 99%

Peer-to-Peer Federated Learning for COVID-19 Detection Using Transformers

Chetoui

Akhloufi

2023

Computers

View full text Add to dashboard Cite

show abstract

“…Recently, ViT was successfully used for diagnosis and severity prediction of COVID-19, showing the SOTA performance [43]. Specifically, to alleviate the overfitting problem with limited data available, the overall framework is decomposed into two steps: the pre-trained backbone network to classify common low-level CXR features, which was leveraged in the second step by Transformer for high-level diagnosis and severity prediction of COVID-19.…”

Section: Related Workmentioning

confidence: 99%

“…As suggested in Park et al [43], the head for classification was first initialized with pre-trained weights from the CheXpert dataset. We minimized the cross-entropy loss for the classification task.…”

Section: Implementation Detailsmentioning

confidence: 99%

Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training

Park¹,

Kim²,

Kim³

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunately, the exchange of the weights quickly consumes the network bandwidth if highly expressive network architecture is employed. So-called split learning partially solves this problem by dividing a neural network into a client and a server part, so that the client part of the network takes up less extensive computation resources and bandwidth. However, it is not clear how to find the optimal split without sacrificing the overall network performance. To amalgamate these methods and thereby maximize their distinct strengths, here we show that the Vision Transformer, a recently developed deep learning architecture with straightforward decomposable configuration, is ideally suitable for split learning without sacrificing performance. Even under the non-independent and identically distributed data distribution which emulates a real collaboration between hospitals using CXR datasets from multiple sources, the proposed framework was able to attain performance comparable to data-centralized training. In addition, the proposed framework along with heterogeneous multi-task clients also improves individual task performances including the diagnosis of COVID-19, eliminating the need for sharing large weights with innumerable parameters. Our results affirm the suitability of Transformer for collaborative learning in medical imaging and pave the way forward for future real-world implementations.

show abstract

Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification

Cited by 2 publications

References 39 publications

Peer-to-Peer Federated Learning for COVID-19 Detection Using Transformers

Peer-to-Peer Federated Learning for COVID-19 Detection Using Transformers

Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training

Contact Info

Product

Resources

About