We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.
SummaryIn most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical models designed to mimic the distributions of the underlying real microdata. In this article, we describe an application of this strategy to create a public use file for the Longitudinal Business Database, an annual economic census of establishments in the United States comprising more than 20 million records dating back to 1976. The U.S. Bureau of the Census and the Internal Revenue Service recently approved the release of these synthetic microdata for public use, making the synthetic Longitudinal Business Database the first-ever business microdata set publicly released in the United States. We describe how we created the synthetic data, evaluated analytical validity, and assessed disclosure risk.
The research program of the Center for Economic Studies (CES) produces a wide range of economic analyses to improve the statistical programs of the U.S. Census Bureau. Many of these analyses take the form of CES research papers. The papers have not undergone the review accorded Census Bureau publications and no endorsement should be inferred. Any opinions and conclusions expressed herein are those of the author(s) and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. Republication in whole or part must be cleared with the authors.To obtain information about the series, see www.census.gov/ces or contact Cheryl Grim, Editor,
Risk-utility formulations for problems of statistical disclosure limitation are now common. We argue that these approaches are powerful guides to official statistics agencies in regard to how to think about disclosure limitation problems, but that they fall short in essential ways from providing a sound basis for acting upon the problems. We illustrate this position in three specific contextstransparency, tabular data and survey weights, with shorter consideration of two key emerging issues-longitudinal data and the use of administrative data to augment surveys.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.