Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this end, a widely adopted approach is to isolate the categorical variables that are key to identification and analyze multi-way contingency tables of such variables. Common disclosure risk measures focus on sample-unique cells in these tables and adopt parametric log-linear models as the standard statistical tools for the problem. Such models often have to deal with large and extremely sparse tables that pose a number of challenges to risk estimation. This paper proposes to overcome these problems by studying nonparametric alternatives based on Dirichlet process random effects. The main finding is that the inclusion of such random effects considerably reduces the number of fixed effects required to achieve reliable risk estimates. Applications to real data suggest, in particular, that our mixed models with main effects only produce estimates roughly equivalent to those of the all two-way interactions models, and are effective in defusing potential shortcomings of traditional log-linear models. This paper adopts a fully Bayesian approach that accounts for all sources of uncertainty, including that about the population frequencies, and supplies unconditional (posterior) variances and credible intervals.
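The notion of a sample-unique cell mentioned in the abstract above can be illustrated with a minimal sketch. It cross-classifies records on a set of key variables and lists the cells with a sample count of exactly one. The toy records, the variable names (`sex`, `age_band`, `region`) and the helper `sample_unique_keys` are hypothetical, chosen only for illustration; this is not the paper's model and it performs no risk estimation.

```python
from collections import Counter


def sample_unique_keys(records, key_vars):
    """Return the key-variable combinations that occur exactly once in the sample.

    Each combination corresponds to one cell of the multi-way contingency
    table formed by cross-classifying the records on `key_vars`.
    """
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return [cell for cell, n in counts.items() if n == 1]


# Hypothetical toy microdata sample (not real data).
records = [
    {"sex": "F", "age_band": "30-39", "region": "North"},
    {"sex": "F", "age_band": "30-39", "region": "North"},
    {"sex": "M", "age_band": "60-69", "region": "South"},
    {"sex": "F", "age_band": "20-29", "region": "East"},
]

uniques = sample_unique_keys(records, ["sex", "age_band", "region"])
for cell in uniques:
    print(cell)
```

Finding the sample-unique cells is only the starting point. A cell that is unique in the sample need not be unique in the population, and estimating that is the modelling problem the abstract addresses; the sketch does not attempt it.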
In this paper we consider a particular Bayes factor B for comparing a fixed parametric model against a nonparametric alternative, and we investigate its local sensitivity to the sampling distribution. The nonparametric alternative is constructed by embedding the parametric model, characterized by a d.f. F_θ known up to a real parameter θ, into a mixture of Dirichlet processes. More precisely, conditionally on θ, F_θ represents the mean of a random d.f. which is assumed to be a Dirichlet process. So, for the Bayes factor B, sensitivity to perturbations of the sampling distribution F_θ and sensitivity to small departures from the fixed Dirichlet process parameter are the same problem. Here we consider B as a (non ratio-linear) functional defined on a set of sampling d.f.'s and maximize its first von Mises derivative over this set. In particular, mixture and density bounded sets are considered.