A method for machine learning generation of realistic synthetic datasets for validating healthcare applications

Arvanitis, Theodoros N.; White, Sarahlouise; Harrison, Stuart; Chaplin, R.; Despotou, George

doi:10.1177/14604582221077000

Cited by 17 publications

(8 citation statements)

References 25 publications


“…Similar methods have been performed in other studies to generate health data for the evaluation of healthcare solutions 24,25. Synthetic datasets that both preserved the statistical properties of the original real data and prevented any disclosure of patient information were generated 26.…”

Section: Data Compilation (mentioning)

confidence: 98%

Entropy Removal of Medical Diagnostics

Chong, Yoon, et al. 2023

Preprint


Shannon entropy is a core concept in machine learning and information theory, particularly in decision tree modeling. Decision tree representations of medical decision-making tools can be generated using diagnostic metrics found in literature and entropy removal can be calculated for these tools. This analysis was done for 623 diagnostic tools and provided unique insights into the utility of such tools. This concept of clinical entropy removal has significant potential for further use to bring forth healthcare innovation, such as the quantification of the impact of clinical guidelines and value of care and applications to Emergency Medicine scenarios where diagnostic accuracy in a limited time window is paramount. For studies that provided detailed data on medical decision-making algorithms, bootstrapped datasets were generated from source data in order to perform comprehensive machine learning analysis on these algorithms and their constituent steps, which revealed a novel thorough evaluation of medical diagnostic algorithms.
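As a rough illustration of the idea in the abstract above, the sketch below computes how much Shannon entropy a binary diagnostic test removes, given a pre-test probability, sensitivity, and specificity. This is the standard information-gain calculation used in decision trees. The preprint's exact definition of "entropy removal" is not given here, so the function names and the formula choice are assumptions made for illustration only.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_removed(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Expected entropy reduction (information gain) from one binary test.

    Assumption: "entropy removal" is read here as pre-test entropy minus
    the expected post-test entropy, as in a single decision tree split.
    """
    # Probability that the test comes back positive or negative.
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_neg = 1 - p_pos

    # Post-test probability of disease after each result (Bayes' rule).
    p_disease_given_pos = sensitivity * prevalence / p_pos if p_pos > 0 else 0.0
    p_disease_given_neg = (1 - sensitivity) * prevalence / p_neg if p_neg > 0 else 0.0

    # Weighted average entropy remaining after the test result is known.
    entropy_after = (
        p_pos * binary_entropy(p_disease_given_pos)
        + p_neg * binary_entropy(p_disease_given_neg)
    )
    return binary_entropy(prevalence) - entropy_after


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the cited study.
    print(entropy_removed(prevalence=0.1, sensitivity=0.9, specificity=0.8))
```

Under this reading, a perfect test removes all pre-test entropy, while a test with sensitivity and specificity of 0.5 removes none.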


“…The idea that synthetic data augmentation can support deflating inherent bias in large-scale image datasets is presented by Jaipuria et al [13] as an approach consisting of mixing GAN and gaming-engine simulations, creating semantically consistent data of targeted task-specific scenarios [13]. Similarly, a GAN trained over six experiments with a mix of numerical and categorical variables originated from three datasets is discussed by Arvanitis et al [14]. The generated synthetic dataset was validated across correlation matrices of real and generated data by using the Jaccard similarity.…”
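The statement above mentions comparing correlation matrices of real and generated data using the Jaccard similarity, without saying how the matrices were turned into sets. One possible reading, sketched below, is to collect the variable pairs whose absolute Pearson correlation exceeds a threshold in each dataset and compare those two sets. The threshold, the helper names, and the toy columns are all hypothetical and are not taken from the cited paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0 here.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def strong_pairs(columns, threshold=0.5):
    """Set of variable pairs whose |correlation| meets the threshold.

    Assumption: thresholding is one way to reduce a correlation matrix
    to a set; the cited work may have used a different reduction.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    }


def jaccard(set_a, set_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Toy data standing in for a real and a generated dataset.
real = {"age": [1, 2, 3, 4, 5], "bp": [2, 4, 6, 8, 10], "noise": [3, 1, 4, 1, 5]}
synthetic = {"age": [2, 3, 3, 5, 6], "bp": [5, 6, 7, 9, 13], "noise": [4, 4, 1, 2, 3]}

similarity = jaccard(strong_pairs(real), strong_pairs(synthetic))
print(similarity)
```

A score near 1 means the two datasets share the same strongly correlated variable pairs; it says nothing about whether the correlation magnitudes match, so it would normally be read alongside other checks.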

Section: Related Work (mentioning)

confidence: 99%

A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

et al. 2022


The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset, where bias can be modeled by using a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset to highlight the impact of tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.


“…The last field -the use of synthetic data sets in training machine learning algorithms also has a long history of research related to it. Synthetically generated data sets do not necessarily have to be images and have been used in many areas, ranging from sociology [31], finance [32], medicine [33], to the issues related to computer vision. Due to the fact that sets of correctly labeled data are necessary in the training of ML algorithms, and their manual collection is extremely time-consuming, automatic generators of synthetic data were also developed, thus further reducing laboriousness of building a data sets [34].…”

Section: Related Work (mentioning)

confidence: 99%

Computer Vision Based Inspection on Post-Earthquake With UAV Synthetic Dataset

et al. 2022


The area affected by an earthquake is vast and often difficult to cover entirely, and the earthquake itself is a sudden event that causes multiple defects simultaneously, which cannot be effectively traced using traditional manual methods. This article presents an approach to detecting damage after sudden events using an interconnected set of deep machine learning models organized in a single pipeline, which allows models to be modified and swapped easily. Models in the pipeline were trained with a synthetic dataset and were adapted to be further evaluated and used with unmanned aerial vehicles (UAVs) in real-world conditions. With the methods presented in the article, it is possible to obtain high accuracy in detecting building defects, segmenting constructions into their components, and estimating their technical condition on the basis of a single drone flight.

Index terms: Structural health monitoring, machine learning, defect detection, synthetic dataset.

This article has been accepted for publication in IEEE Access.
