2019
DOI: 10.1007/978-3-030-29196-9_25

Realistic Synthetic Data Generation: The ATEN Framework

Abstract: In secondary uses of data, access to real data is problematic due to data being non-existent, incomplete, or avoiding privacy and confidentiality breaches. Synthetic data (SD) are best replacements for real data but must be verifiably realistic. There is little or no investigation into systematically achieving realism in SD. This work investigates this problem, and contributes the ATEN framework, which incorporates three component approaches: (1) THOTH for synthetic data generation (SDG); (2) RA for characteri…

Citations: cited by 8 publications (6 citation statements)
References: 65 publications (93 reference statements)
“…Other data were also anonymised, including replacing the date of birth with an age in years, and delta shifting all dates in the patient record using a consistent randomly assigned number of days for each patient. All examples provided in the tables in this work are representative of the real anonymised data collected and used during the project, but were synthetically generated for this paper using the method described in [37]. Table 2 shows an example of raw patient demographics data.…”
Section: Identification Of Variables and Parameters (mentioning)
confidence: 99%
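The anonymisation steps named in the statement above (replacing the date of birth with an age in years, and shifting every date in a patient's record by one consistent random number of days) can be sketched as follows. This is a minimal illustration only, not the citing authors' implementation: the field names (`patient_id`, `date_of_birth`, `admitted`, `discharged`), the shift range of up to 365 days, and the choice to compute age at admission are all assumptions made for the example.

```python
import random
from datetime import date, timedelta


def make_patient_offsets(patient_ids, max_shift_days=365, seed=0):
    """Assign each patient one random, non-zero day offset.

    The +/- max_shift_days range is a hypothetical choice for this sketch.
    """
    rng = random.Random(seed)
    choices = [d for d in range(-max_shift_days, max_shift_days + 1) if d != 0]
    return {pid: rng.choice(choices) for pid in patient_ids}


def age_in_years(date_of_birth, on_date):
    """Whole years elapsed between date_of_birth and on_date."""
    had_birthday = (on_date.month, on_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)


def anonymise_record(record, offsets, date_fields=("admitted", "discharged")):
    """Return a copy with the date of birth replaced by an age and dates shifted."""
    shift = timedelta(days=offsets[record["patient_id"]])
    out = {"patient_id": record["patient_id"]}
    # Age is computed from the unshifted dates; the date of birth is not copied.
    out["age_years"] = age_in_years(record["date_of_birth"], record["admitted"])
    # The same per-patient shift is applied to every date in the record.
    for field in date_fields:
        out[field] = record[field] + shift
    return out


record = {
    "patient_id": "P1",
    "date_of_birth": date(1990, 6, 15),
    "admitted": date(2020, 6, 14),
    "discharged": date(2020, 6, 20),
}
offsets = make_patient_offsets(["P1", "P2"])
anonymised = anonymise_record(record, offsets)
```

Because one offset is reused for all of a patient's dates, the intervals between events (here, the six days from admission to discharge) are unchanged, while the absolute dates no longer match the real record.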
“…Content models are a modelling method for synthetic data generated via electronic health records (EHR) and was originally proposed by McLachlan and colleagues to develop synthetic EHRs based on publicly available health information statistics coupled with the expertise of clinicians [14].…”
Section: Content Models (mentioning)
confidence: 99%
“…The Aten framework was proposed by McLachlan et al in 2019 [43] to generate synthetic Labour and Birth EHRs whilst characterizing and validating its realism by gathering the necessary knowledge, identifying realistic properties from real data and validating the realism of SD.…”
Section: Aten Framework (mentioning)
confidence: 99%