2008
DOI: 10.1198/016214507000001328

Assessing Identification Risk in Survey Microdata Using Log-Linear Models

Abstract: This article considers the assessment of the risk of identification of respondents in survey microdata, in the context of applications at the United Kingdom (UK) Office for National Statistics (ONS). The threat comes from the matching of categorical 'key' variables between microdata records and external data sources and from the use of log-linear models to facilitate matching. While the potential use of such statistical models is well-established in the literature, little consideration has been given…

citations
Cited by 67 publications
(121 citation statements)
references
References 20 publications
“…A simple extension of this argument also applies under Poisson sampling where the inclusion probability π_k may vary with respect to the key variables, for example if a stratifying variable is included among the key variables. In this case, we have […]. Skinner and Shlomo (2008) discuss methods for the specification of the model in (10). Skinner (2007) discusses the possible dependence of the measure on the search method employed by the intruder.…”
Section: File-level Measures Of Identification Risk
mentioning
confidence: 99%
“…Identification risk is defined in terms of the probability that such a link is correct (Bethlehem et al., 1990; Reiter, 2005b; Skinner and Shlomo, 2008). If it were the case that (i) no sampling occurs; (ii) the combination of values of the key variables for the target unit is unique in the population and (iii) the key values, as recorded in the microdata, are known by the adversary for the target unit, then the adversary could deduce the correct link and the probability of identification risk might be taken to be unity.…”
Section: Identification Risk
mentioning
confidence: 99%
“…A log-linear model for the µ_j is expressed as: log(µ_j) = z_j β, where z_j is a design vector which denotes the main effects and interactions of the model for the key variables. The maximum likelihood estimator (MLE) β̂ may be obtained by solving the score equations: […]. Skinner and Shlomo (2008) discuss goodness of fit criteria to ensure unbiased estimation of µ_j.…”
Section: Identification Risk
mentioning
confidence: 99%
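The log-linear model quoted above can be illustrated with a small numerical sketch. The quoted statement elides the score equations, so the code below assumes the standard Poisson form, sum over j of z_j (f_j − µ_j) = 0, where f_j is the observed count in cell j. The function name `fit_loglinear`, the 2×2 table, and the main-effects design matrix are hypothetical choices for illustration, not taken from the cited papers.

```python
import numpy as np

def fit_loglinear(Z, f, n_iter=100, tol=1e-10):
    """Fit log(mu_j) = z_j beta to counts f by Newton-Raphson.

    Assumes independent Poisson counts, so the score is Z'(f - mu)
    and the Fisher information is Z' diag(mu) Z.
    """
    Z = np.asarray(Z, dtype=float)
    f = np.asarray(f, dtype=float)
    # Starting value: least squares on log(f + 0.5), a common GLM start.
    beta, *_ = np.linalg.lstsq(Z, np.log(f + 0.5), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)
        score = Z.T @ (f - mu)            # assumed Poisson score equations
        info = Z.T @ (mu[:, None] * Z)    # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.exp(Z @ beta)

# Hypothetical 2x2 table of two binary key variables, cells in the order
# (0,0), (0,1), (1,0), (1,1); columns of Z: intercept, row effect, column effect.
Z = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
f = np.array([12.0, 18.0, 28.0, 62.0])

beta_hat, mu_hat = fit_loglinear(Z, f)
print(np.round(mu_hat, 4))
```

For this main-effects (independence) model the fitted values reproduce the row and column totals of the table, which gives a simple check that the score equations are satisfied at the solution.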
“…The conditional probability may be estimated by estimating the log-linear model parameters and plugging these estimates into the expression for the conditional probability. E F f , may be estimated by applying the methodology of [19] to the observed microdata. The misclassification probability θ_jj might be estimated by making some approximating assumptions and using external evidence on the misclassification process.…”
Section: Estimation
mentioning
confidence: 99%
“…However, in many disclosure problems of interest this will not be the case. In these circumstances, a modelling approach such as using log-linear models [19] […] θ. We suggest that it will normally not be realistic to expect that the intruder will be able to estimate this parameter reliably from the available data (although the mixture model approach merits further investigation).…”
mentioning
confidence: 99%