2022
DOI: 10.1177/14604582221077000

A method for machine learning generation of realistic synthetic datasets for validating healthcare applications

Abstract: Digital health applications can improve the quality and effectiveness of healthcare by offering a number of new tools to users, and they are often considered medical devices. Assuring their safe operation requires, among other things, clinical validation, which needs large datasets to test them in realistic clinical scenarios. Access to such datasets is challenging due to patient privacy concerns. Development of synthetic datasets is seen as a potential alternative. The objective of the paper is the development of a method for…

Citations: cited by 17 publications (8 citation statements)
References: 25 publications
“…Similar methods have been performed in other studies to generate health data for the evaluation of healthcare solutions [24,25]. Synthetic datasets that both preserved the statistical properties of the original real data and prevented any disclosure of patient information were generated [26].…”
Section: Data Compilation (mentioning)
confidence: 98%
“…The idea that synthetic data augmentation can support deflating inherent bias in large-scale image datasets is presented by Jaipuria et al [13] as an approach consisting of mixing GAN and gaming-engine simulations, creating semantically consistent data of targeted task-specific scenarios [13]. Similarly, a GAN trained over six experiments with a mix of numerical and categorical variables originated from three datasets is discussed by Arvanitis et al [14]. The generated synthetic dataset was validated across correlation matrices of real and generated data by using the Jaccard similarity.…”
Section: Related Work (mentioning)
confidence: 99%
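
The statement above says the synthetic dataset was validated by comparing correlation matrices of real and generated data using Jaccard similarity, but this page does not say how the comparison was set up. The following is a minimal sketch of one possible reading, not the cited authors' procedure. It assumes each correlation matrix is reduced to the set of variable pairs whose absolute Pearson correlation meets a threshold, and that the two sets are then compared with the Jaccard index. The function names, the 0.5 threshold, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strong_pairs(columns, threshold=0.5):
    """Set of column-name pairs whose |correlation| meets the threshold.

    ASSUMPTION: thresholding is our way of turning a correlation matrix
    into a set; the source does not specify this step.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    }


def jaccard(set_a, set_b):
    """Jaccard similarity |A & B| / |A | B|; 1.0 when both sets are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Toy data invented for illustration only; not taken from any study.
real = {
    "age": [30, 40, 50, 60, 70],
    "bp": [110, 120, 130, 140, 150],
    "hr": [80, 62, 75, 66, 71],
}
synthetic = {
    "age": [32, 41, 52, 58, 69],
    "bp": [112, 118, 133, 139, 148],
    "hr": [70, 77, 64, 79, 68],
}

score = jaccard(strong_pairs(real), strong_pairs(synthetic))
print(f"Jaccard similarity of strong-correlation pairs: {score:.2f}")
```

A score near 1 would indicate that the same variable pairs are strongly correlated in both datasets. The result depends on the chosen threshold, so the score should be read alongside the threshold used.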
“…The last field -the use of synthetic data sets in training machine learning algorithms also has a long history of research related to it. Synthetically generated data sets do not necessarily have to be images and have been used in many areas, ranging from sociology [31], finance [32], medicine [33], to the issues related to computer vision. Due to the fact that sets of correctly labeled data are necessary in the training of ML algorithms, and their manual collection is extremely time-consuming, automatic generators of synthetic data were also developed, thus further reducing laboriousness of building a data sets [34].…”
Section: Related Work (mentioning)
confidence: 99%