2021
DOI: 10.1093/jamiaopen/ooab012

Evaluating the utility of synthetic COVID-19 case data

Abstract: Background: Concerns about patient privacy have limited access to COVID-19 datasets. Data synthesis is one approach for making such data broadly available to the research community in a privacy protective manner. Objectives: Evaluate the utility of synthetic data by comparing analysis results between real and synthetic data. Methods: A gradient boosted classific…

Cited by 32 publications (22 citation statements). References: 59 publications.

“…The data used in this manuscript do not reflect the current size nor state of the N3C LDS. Other statistical techniques such as equivalence testing, Bhattacharyya distance [50,51], or adversarial challenges [28] could be used in the future to compare similarity between epidemic curves. The Wilcoxon signed-rank and paired t-tests assume the null hypothesis that the original and synthetic datasets are equivalent.…”
Section: Discussion (mentioning; confidence: 99%)
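The excerpt above names the Bhattacharyya distance as one way to compare epidemic curves from original and synthetic data. The following is a minimal sketch, assuming each curve is a list of non-negative case counts over the same time bins and that each curve is treated as a discrete distribution after dividing by its total. The overlap is measured by the Bhattacharyya coefficient BC = Σ sqrt(p_i · q_i), and the distance is D_B = -ln(BC). The function name, inputs, and example counts are hypothetical and are not taken from the cited work or from N3C tooling.

```python
import math


def bhattacharyya_distance(curve_a, curve_b):
    """Bhattacharyya distance between two epidemic curves.

    Hypothetical helper: each curve is a sequence of non-negative case counts
    over the same time bins (e.g. days). Both curves are normalized to sum to 1,
    so the result reflects shape and timing, not total case volume.
    """
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must cover the same time bins")
    if any(c < 0 for c in curve_a) or any(c < 0 for c in curve_b):
        raise ValueError("case counts must be non-negative")
    total_a, total_b = sum(curve_a), sum(curve_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each curve needs at least one positive count")

    # Bhattacharyya coefficient: overlap of the two normalized curves, in [0, 1].
    bc = sum(
        math.sqrt((a / total_a) * (b / total_b))
        for a, b in zip(curve_a, curve_b)
    )
    if bc == 0:
        return math.inf  # no time bin has cases in both curves
    # Clamp float round-off so identical curves give 0 rather than a tiny negative.
    return max(0.0, -math.log(min(bc, 1.0)))


# Illustrative counts only, not data from the paper or N3C.
original = [2, 5, 9, 14, 11, 6, 3]
synthetic = [3, 4, 10, 13, 12, 5, 2]
print(round(bhattacharyya_distance(original, synthetic), 4))
```

Because both curves are normalized, a synthetic curve that is a uniform rescaling of the original scores 0; differences in absolute case volume would need a separate comparison.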
“…Therefore, it is important to evaluate N3C synthetic data in a manner that can inform users with a wide range of intended use cases and definitions for synthetic data fitness for use [25]. The utility of synthetic health data has been evaluated in other work [15,19,20,26-30] outside of N3C which applied a variety of the ways one can validate synthetic data [31]. However, N3C synthetic data utility has only been evaluated once before.…”
Section: Background and Significance (mentioning; confidence: 99%)
“…The advancements made are maturing so rapidly that we should carefully understand what control we cede if we allow for 'spurious imitations' to gain a foothold in healthcare decision-making. For instance, since the start of the COVID-19 pandemic, there has been an explosion of interest around the development of synthetic data, with use cases such as the training of AI algorithms [17,52], epidemiological modelling and digital contact tracing [53-55], and data sharing between hospitals [56]. Because synthetic data will undoubtedly soon be used to solve pressing problems in healthcare, it is urgent to develop and refine regulatory frameworks involving synthetic data and the monitoring of their impact in society.…”
Section: Paths Forward (mentioning; confidence: 99%)
“…Access to the data can enable exploration of alternative strategies to combat COVID-19 spread [8,9]. Synthetic COVID-19 healthcare datasets can come to our rescue and have been shown to be useful as proxies for real data [10].…”
Section: Introduction (mentioning; confidence: 99%)