2018
DOI: 10.1111/rssa.12360

Generating Partially Synthetic Geocoded Public Use Data with Decreased Disclosure Risk by Using Differential Smoothing

Abstract: When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies before making data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, where sensitive values are replaced with draws from models designed to capture distributional features in the data collected. The issues associated with spatially outlying observations in the data, however, have received relatively little attention. Ou…

Cited by 12 publications (9 citation statements)
References: 29 publications
“…For some applications, such as Hu et al (2014) where fully synthetic individual records were generated, the attribute disclosure risks are in the form of guessing correctly of the attributes of a record. In other applications, for example when generating synthetic geolocation applications, researchers had created attribute disclosure risk measures based on distance between synthesized geolocations and actual geolocations (Wang and Reiter, 2012; Paiva et al, 2014; Quick et al, 2015, 2018). Moreover, for synthetic business establishment data applications, researchers had created attribute disclosure risk measures based on percentages of closest match and their variations (Domingo-Ferrer et al, 2001; Kim et al, 2015), and other measures based on relative difference between the true largest value and the intruder's estimate (Kim et al, 2018).…”
Section: Overview Of Synthetic Data Risks Evaluation
Citation type: mentioning
confidence: 99%
“…This data is generated by reproducing the original data's statistical properties [73] and this solution become an appealing alternative when there's an issue in the availability of representative data [74]. Nevertheless, the adopted approach is to generate a partially [75] synthetic data set comprising two phases :…”
Section: Figure 10: Portion Of the Dataset
Citation type: mentioning
confidence: 99%
“…While the risk of disclosure associated with the release of synthetic data is an active area of research (e.g. Hu, 2019; Quick et al., 2018; Reiter & Mitra, 2009), the drawback of many of these methods is the lack of formal privacy guarantees; for example, those implied by the use of a mechanism, p(z | y, ψ), that satisfies the requirements of differential privacy (Dwork et al., 2006).…”
Section: Introduction
Citation type: mentioning
confidence: 99%