Multi-center data analysis in healthcare is difficult because of privacy protection requirements and data heterogeneity. In this study, we propose the Distributed Synthetic Learning (DSL) architecture, which learns across multiple medical centers without leaking sensitive personal information. DSL builds a homogeneous data center made up entirely of synthetic medical images through a form of GAN-based synthetic learning. The DSL architecture is extensible, with three key variants: multi-modality learning, missing-modality completion learning, and continuous learning over time. We systematically evaluate DSL on different medical applications using cardiac computed tomography angiography (CTA), brain tumor MRI, and histopathology nuclei datasets. Extensive experiments show that DSL serves as a high-quality synthetic medical image provider for various imaging tasks, as measured by a synthetic image quality metric called Dist-FID. We show that our model can be adapted to heterogeneous data and outperforms the segmentation model trained on real misaligned modalities by 55% and the segmentation model trained on real temporal datasets by 8%. The proposed DSL framework demonstrates its potential for integrating multi-center heterogeneous data to support downstream clinical decision making.
Data that are adequate both statistically and in information content play a critical role in training a robust deep learning model. However, collecting sufficient medical data to train a centralized model remains challenging because of constraints such as privacy regulations and security. In this work, we develop a novel privacy-preserving federated-discriminator GAN, named FedD-GAN, that can learn from heterogeneous datasets residing in multiple data centers, whose data cannot be transferred or shared, and synthesize high-quality, diverse medical images regardless of their type. We trained and evaluated FedD-GAN on three essential classes of medical data, each involving a different type of medical image: cardiac CTA, brain MRI, and histopathology. We show that images synthesized with our method are of better quality than those produced with a standard federated learning method, and are realistic and accurate enough to train accurate segmentation models in downstream tasks. A segmentation model trained only on the synthetic data performs comparably to one trained on an all-in-one real-image dataset pooled from multiple data centers, were such pooling possible. FedD-GAN can learn to generate a scalable and diverse synthetic database without compromising data privacy. This synthetic database could help to boost machine learning techniques in medical data analytics.
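The abstract does not spell out how the federated discriminators feed back into image synthesis. As a rough illustration only, the sketch below assumes one plausible reading: each data center keeps its own discriminator next to its private images, and a shared generator receives only the discriminators' scores on synthetic images. The function name `aggregated_generator_loss`, the non-saturating loss, and the weighting by dataset size are all assumptions made for this example and are not taken from the paper.

```python
import math


def aggregated_generator_loss(center_scores, center_sizes):
    """Combine per-center discriminator feedback into one generator loss.

    center_scores: one list per data center, holding that center's
        discriminator outputs D_k(G(z)) in (0, 1) for a batch of
        synthetic images. Real images never leave the center.
    center_sizes: number of real images held by each center, used here
        as an assumed weighting scheme.
    """
    total = sum(center_sizes)
    loss = 0.0
    for scores, size in zip(center_scores, center_sizes):
        # Non-saturating generator loss for this center: -mean(log D_k(G(z))).
        center_loss = -sum(math.log(s) for s in scores) / len(scores)
        # Weight each center by its share of the overall data (assumption).
        loss += (size / total) * center_loss
    return loss


# Hypothetical scores from two centers holding 100 and 300 images.
undecided = aggregated_generator_loss([[0.5, 0.5], [0.5]], [100, 300])
fooled = aggregated_generator_loss([[0.9, 0.9], [0.9]], [100, 300])
```

In this toy setup the loss falls as the per-center discriminators rate the synthetic images as more realistic, which is the signal a shared generator would follow without ever seeing real patient data.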