2021
DOI: 10.2196/preprints.35734
Preprint

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study (Preprint)

Abstract: BACKGROUND A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. OBJECTIVE This study evaluates the ability of common utility metrics to rank SDG method…

citations
Cited by 27 publications
(37 citation statements)
references
References 30 publications
“…The original data is always at risk of being compromised or exposed as long as it is in play, especially in business, where data sharing is severely limited within and outside the company [63]. Therefore, it is vital to investigate methods for generating financial datasets that incorporate the same properties as the "real data" while still respecting the privacy of the parties involved.…”
Section: Business (mentioning)
confidence: 99%
“…Therefore, it is vital to investigate methods for generating financial datasets that incorporate the same properties as the "real data" while still respecting the privacy of the parties involved. [63].…”
Section: Business (mentioning)
confidence: 99%
“…A common interpretation is that as long as the real data remains in a secure environment during the generation of synthetic data, there is little to no risk to the original subjects [103]. As a consequence, the use of synthetic data can help prevent researchers from inadvertently using and possibly exposing patients identifiable data. Synthetic data can also lessen the controls imposed by Institutional Review Boards (IRBs) and based on international regulations by ensuring data is never mapped to real individuals.…”
Section: Legal Framework For Sharing Of Synthetic and Real Patient Data (mentioning)
confidence: 99%
“…The analysis of the performance of the algorithm requires a reasonable amount of empirical data. One of the best practices is to generate synthetic data to have a sound foundation for the data analytics, in order to verify the model of the data analysis [100]. Therefore, it is a feasible approach to generate a set of data that represents Business Processes that can be categorized as well-formed or with erroneous behavior.…”
Section: Model Checking For Dynamically Modified Business Processes (mentioning)
confidence: 99%