We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or other category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings default standard errors can greatly overstate estimator precision. Instead, if the number of clusters is large, statistical inference after OLS should be based on cluster-robust standard errors. We outline the basic method as well as many complications that can arise in practice. These include cluster-specific fixed effects, few clusters, multi-way clustering, and estimators other than OLS.
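As a rough illustration of the cluster-robust standard errors this abstract refers to, the following NumPy sketch computes OLS coefficients with a one-way cluster-robust (sandwich) variance. The function name `ols_cluster_robust` and the particular finite-sample factor are assumptions made for the example; the abstract does not specify either.

```python
import numpy as np

def ols_cluster_robust(X, y, clusters):
    """OLS with one-way cluster-robust (sandwich) standard errors.

    Illustrative sketch only: X is an (n, k) design matrix that already
    contains a constant column, and `clusters` holds one label per row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    labels = np.unique(clusters)
    G = len(labels)

    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'.
    meat = np.zeros((k, k))
    for g in labels:
        idx = clusters == g
        score_g = X[idx].T @ resid[idx]
        meat += np.outer(score_g, score_g)

    # Assumed finite-sample factor (one common choice, not from the source).
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))
```

When every observation is its own cluster, this expression reduces to a heteroskedasticity-robust variance with an n/(n-k) scaling, which gives a convenient sanity check.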
Researchers have increasingly realized the need to account for within-group dependence in estimating standard errors of regression parameter estimates. The usual solution is to calculate cluster-robust standard errors that permit heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. Standard asymptotic tests can over-reject, however, with few (5-30) clusters. We investigate inference using cluster bootstrap-t procedures that provide asymptotic refinement. These procedures are evaluated using Monte Carlos, including the example of Bertrand, Duflo and Mullainathan (2004). Rejection rates of ten percent using standard methods can be reduced to the nominal size of five percent using our methods.
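One member of the bootstrap-t family mentioned in this abstract can be sketched as a pairs cluster bootstrap-t: whole clusters are resampled with replacement, and the t-statistic is recomputed in each draw, centered at the original estimate. This is a minimal sketch under stated assumptions, not the authors' implementation; the function names, the symmetric p-value, and the default `B=399` are illustrative choices.

```python
import numpy as np

def _ols_cluster(X, y, clusters):
    """OLS coefficients and cluster-robust SEs (no finite-sample factor)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        idx = clusters == g
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

def pairs_cluster_bootstrap_t(X, y, clusters, j, beta0=0.0, B=399, seed=0):
    """Symmetric bootstrap-t p-value for H0: beta[j] == beta0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    rng = np.random.default_rng(seed)

    beta, se = _ols_cluster(X, y, clusters)
    t_obs = (beta[j] - beta0) / se[j]

    labels = np.unique(clusters)
    rows = [np.flatnonzero(clusters == g) for g in labels]
    G = len(labels)

    t_star = np.empty(B)
    for b in range(B):
        draw = rng.integers(0, G, size=G)  # resample whole clusters
        idx = np.concatenate([rows[d] for d in draw])
        # Relabel so a cluster drawn twice is treated as two clusters.
        new_ids = np.concatenate(
            [np.full(len(rows[d]), i) for i, d in enumerate(draw)]
        )
        bb, sb = _ols_cluster(X[idx], y[idx], new_ids)
        # Center at the original estimate, not at beta0.
        t_star[b] = (bb[j] - beta[j]) / sb[j]

    p_value = np.mean(np.abs(t_star) >= np.abs(t_obs))
    return t_obs, p_value
```

The sketch assumes each resampled design matrix has full column rank; a regressor that is constant within clusters and takes few distinct values could violate this in some draws.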
In this paper we propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g. Liang and Zeger (1986), Arellano (1987)) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state-year effects example of Bertrand et al. (2004) to two dimensions; and by application to two studies in the empirical public/labor literature where two-way clustering is present.
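The abstract does not write out the estimator. A commonly cited form for two-way clustering adds the one-way cluster-robust variances for each dimension and subtracts the variance clustered on their intersection. The NumPy sketch below assumes that form for OLS; the names `cluster_meat` and `twoway_cluster_vcov` are hypothetical, and no finite-sample correction is applied.

```python
import numpy as np

def cluster_meat(X, resid, groups):
    """Sum over groups of (X_g' u_g)(X_g' u_g)'."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    return meat

def twoway_cluster_vcov(X, y, a, b):
    """OLS with a two-way cluster-robust variance (illustrative sketch).

    `a` and `b` are per-observation labels for the two clustering
    dimensions (for example state and year).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    b = np.asarray(b)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Integer codes per dimension, combined into one id per (a, b) pair.
    _, a_codes = np.unique(a, return_inverse=True)
    _, b_codes = np.unique(b, return_inverse=True)
    ab = a_codes * (b_codes.max() + 1) + b_codes

    # Add the two one-way terms, subtract the double-counted intersection.
    meat = (cluster_meat(X, resid, a)
            + cluster_meat(X, resid, b)
            - cluster_meat(X, resid, ab))
    V = XtX_inv @ meat @ XtX_inv
    return beta, V
```

Because one term is subtracted, the resulting matrix is not guaranteed to be positive semi-definite in finite samples, so diagonal entries should be checked before taking square roots.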
Students in both social and natural sciences often seek regression methods to explain the frequency of events, such as visits to a doctor, auto accidents, or new patents awarded. This book, now in its second edition, provides the most comprehensive and up-to-date account of models and methods to interpret such data. The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences. The new material includes new theoretical topics, an updated and expanded treatment of cross-section models, coverage of bootstrap-based and simulation-based inference, expanded treatment of time series, multivariate and panel data, expanded treatment of endogenous regressors, coverage of quantile count regression, and a new chapter on Bayesian methods.
This paper deals with specification, estimation and tests of single equation reduced form type equations in which the dependent variable takes only non-negative integer values. Beginning with Poisson and compound Poisson models, which involve strong assumptions, a variety of possible stochastic models and their implications are discussed. A number of estimators and their properties are considered in the light of uncertainty about the data generation process. The paper also considers the role of tests in sequential revision of the model specification beginning with the Poisson case and provides a detailed application of the estimators and tests to a model of the number of doctor consultations.
Researchers have increasingly realized the need to account for within-group dependence in estimating standard errors of regression parameter estimates. The usual solution is to calculate cluster-robust standard errors that permit heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. Standard asymptotic tests can over-reject, however, with few (five to thirty) clusters. We investigate inference using cluster bootstrap-t procedures that provide asymptotic refinement. These procedures are evaluated using Monte Carlos, including the example of Bertrand, Duflo, and Mullainathan (2004). Rejection rates of 10% using standard methods can be reduced to the nominal size of 5% using our methods. Copyright by the President and Fellows of Harvard College and the Massachusetts Institute of Technology.