2022
DOI: 10.3390/math10152733

Survey on Synthetic Data Generation, Evaluation Methods and GANs

Abstract: Synthetic data consists of artificially generated data. When data are scarce or of poor quality, synthetic data can be used, for example, to improve the performance of machine learning models. Generative adversarial networks (GANs) are state-of-the-art deep generative models that can generate novel synthetic samples that follow the underlying data distribution of the original dataset. Reviews on synthetic data generation and on GANs have already been written. However, none in the relevant literature, to the…

Cited by 100 publications (50 citation statements)
References: 93 publications
“…Once the training was finished, we used the data generation models to generate synthetic tabular data of 5000 rows. The rationale for this size of the synthetic datasets is twofold: i) we aimed at a computationally feasible data set size, which would allow us to examine the effect of different ratios of real to synthetic data on the evaluation measures, ii) synthetic dataset of such size was used in an earlier similar study [18]. The synthetic dataset and the original training dataset were then separately used to train the above-mentioned three machine learning algorithms and the resulting models were evaluated on the original test dataset.…”
Section: Machine Learning Utility Evaluation (mentioning)
confidence: 99%
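The evaluation described in the statement above (train one model on synthetic data and one on the original training data, then score both on the original test set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the citing authors' code: the per-class Gaussian sampler is a toy stand-in for a trained data generation model, the nearest-centroid classifier stands in for the three machine learning algorithms, and every function name here is hypothetical. Only the 5000-row synthetic dataset size comes from the quoted text.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy two-class 'real' data (illustrative only): two features per row."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 3.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def fit_toy_generator(rows):
    """Stand-in for a generative model: per-class, per-feature mean and std."""
    params = {}
    for label in {y for _, y in rows}:
        cols = list(zip(*[x for x, y in rows if y == label]))
        params[label] = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return params


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted per-class Gaussians."""
    labels = sorted(params)
    rows = []
    for _ in range(n):
        label = rng.choice(labels)
        rows.append(([rng.gauss(m, s) for m, s in params[label]], label))
    return rows


def fit_centroids(rows):
    """Nearest-centroid classifier: store the mean feature vector per class."""
    centroids = {}
    for label in {y for _, y in rows}:
        cols = list(zip(*[x for x, y in rows if y == label]))
        centroids[label] = [statistics.mean(c) for c in cols]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


def tstr_experiment(n_synth=5000, seed=0):
    """Train on real vs. synthetic data; evaluate both on the real test set."""
    rng = random.Random(seed)
    real = make_real_data(1000, rng)
    train, test = real[:800], real[800:]
    synth = sample_synthetic(fit_toy_generator(train), n_synth, rng)
    return {
        "n_synthetic": len(synth),
        "trained_on_real": accuracy(fit_centroids(train), test),
        "trained_on_synthetic": accuracy(fit_centroids(synth), test),
    }


if __name__ == "__main__":
    print(tstr_experiment())
```

Comparing the two accuracy values gives a rough machine learning utility score: the closer the synthetic-trained model comes to the real-trained one on the original test set, the more useful the synthetic data is for that task.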
“…GANs have undergone several modifications since they were first proposed to solve several different problems in different domains, e.g., physics [18], healthcare [19], or object detection [20]. To analyze the state-of-the-art in what concerns GANs used for synthetic data generation, as well as synthetic data generation methods, we reviewed recently published scientific papers [21, 22, 23]. Pose-driven attention-guided image generation for person re-Identification proposed in [24] by Amena et al. introduces attentive learning and transferring the subject pose through an attention mechanism based on GAN.…”
Section: Related Work (mentioning)
confidence: 99%
“…A few surveys in the field have examined various aspects of synthetic data generation [19,20]. Figueira et al. [19] provide an extensive description of multiple generation methods while Hernandez et al. [20] explored evaluation methods and compared them to determine the best-performing ones.…”
Section: Introduction (mentioning)
confidence: 99%
“…A few surveys in the field have examined various aspects of synthetic data generation [19,20]. Figueira et al. [19] provide an extensive description of multiple generation methods while Hernandez et al. [20] explored evaluation methods and compared them to determine the best-performing ones. In contrast to these prior studies, our approach differs in how we identify the obstacles hindering the adoption of synthetic data as we place a greater emphasis on the evaluation process and the privacy-utility trade-off dilemma by having a systematic look at how synthetic data is evaluated across 92 studies.…”
Section: Introduction (mentioning)
confidence: 99%