Yu-Sung Su scite author profile

Data Analysis Using Regression and Multilevel/Hierarchical Models, first published in 2007, is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models. The book introduces a wide variety of models, whilst at the same time instructing the reader in how to fit these models using available software packages. The book illustrates the concepts by working through scores of real data examples that have arisen from the authors' own applied research, with programming codes provided for each one. Topics covered include causal inference, including regression, poststratification, matching, regression discontinuity, and instrumental variables, as well as multilevel logistic regression and missing-data imputation. Practical tips regarding building, fitting, and understanding are provided throughout.


A weakly informative default prior distribution for logistic and other regression models

Gelman, Jakulin, Pittau, et al. 2008. Ann. Appl. Stat.

1,686

1,650


We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set. © Institute of Mathematical Statistics
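The recipe in this abstract (rescale each nonbinary predictor to mean 0 and standard deviation 0.5, then place independent Cauchy priors with center 0 and scale 2.5 on the coefficients) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract describes an approximate EM step inside iteratively weighted least squares in R, whereas this sketch swaps in plain gradient ascent on the log posterior to find a posterior mode. The helper names (`rescale`, `fit_map`), the toy data, the step size, and the wider scale of 10 used for the intercept are all assumptions made for illustration.

```python
import math

def rescale(values):
    """Center to mean 0 and scale to (sample) standard deviation 0.5."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / (2.0 * sd) for v in values]

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_posterior_gradient(beta, X, y, scales):
    """Gradient of the logistic log-likelihood plus independent Cauchy(0, scale) log-priors."""
    grad = [0.0] * len(beta)
    for row, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        for j, x in enumerate(row):
            grad[j] += (yi - p) * x
    for j, (b, s) in enumerate(zip(beta, scales)):
        # d/db of -log(1 + (b/s)^2), the Cauchy log-density up to a constant
        grad[j] -= 2.0 * b / (s * s + b * b)
    return grad

def fit_map(X, y, scales, step=0.1, iters=5000):
    """Posterior mode by fixed-step gradient ascent (a stand-in, not the paper's EM/IWLS)."""
    beta = [0.0] * len(scales)
    for _ in range(iters):
        grad = log_posterior_gradient(beta, X, y, scales)
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Toy data with complete separation: y is 0 whenever x <= 3 and 1 otherwise,
# so the unpenalized maximum-likelihood slope would diverge.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
z = rescale(x)
X = [[1.0, zi] for zi in z]          # intercept column plus rescaled predictor
beta = fit_map(X, y, scales=[10.0, 2.5])  # intercept scale 10 is an assumption
print("intercept, slope:", beta)
```

Because the Cauchy prior penalizes very large coefficients, the slope settles at a finite value even though the data are completely separated, which is the "always giving answers" property the abstract highlights.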


Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box

Su, Gelman, Hill, et al. 2011. J. Stat. Soft.

419

343


Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: choice of predictors, models, and transformations for chained imputation models; standard and binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.


On the stationary distribution of iterative imputations

Liu, Gelman, Hill, et al. 2013. Biometrika

140


Iterative imputation, in which variables are imputed one at a time each given a model predicting from all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modeling problem with relatively simple univariate regressions. In this paper, we begin to characterize the stationary distributions of iterative imputations and their statistical properties. More precisely, when the conditional models are compatible (defined in the text), we give a set of sufficient conditions under which the imputation distribution converges in total variation to the posterior distribution of a Bayesian model. When the conditional models are incompatible but are valid, we show that the combined imputation estimator is consistent.

arXiv:1012.2902v2 [math.ST], 3 Apr 2012

[...] imputation algorithms are not well understood. Even if, as we would prefer, the fitting of each imputation model and the imputations themselves are performed using conditional Bayesian inference, the stationary distribution of the algorithm (if it exists) does not in general correspond to Bayesian inference on any specified multivariate distribution. Key questions are: (1) Under what conditions does the algorithm converge to a stationary distribution? (2) What statistical properties does the procedure admit given that a unique stationary distribution exists?

Regarding the first question, researchers have long known that the Markov chain may be nonrecurrent ("blowing up" to infinity or drifting like a nonstationary random walk), even if each of [...]
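The iterative scheme described in this abstract, where each variable is imputed in turn from a univariate regression on the others, can be sketched in a few lines of Python. This is a simplified illustration with two variables and hypothetical helper names (`fit_line`, `iterative_impute`); it plugs in least-squares estimates and adds Gaussian noise rather than drawing regression parameters from a posterior, so it is not the fully Bayesian procedure the paper analyzes, and it is not the API of any imputation package.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual standard deviation of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (n - 2))

def iterative_impute(data, n_sweeps=20, seed=0):
    """Impute a two-column dataset; None marks a missing entry. Returns a completed copy."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]
    # Start from observed column means so every regression has complete predictors.
    for j in range(2):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    # Each sweep re-imputes column 0 given column 1, then column 1 given column 0.
    for _ in range(n_sweeps):
        for j in range(2):
            k = 1 - j
            xs = [r[k] for r, m in zip(filled, missing) if not m[j]]
            ys = [r[j] for r, m in zip(filled, missing) if not m[j]]
            intercept, slope, sd = fit_line(xs, ys)
            for r, m in zip(filled, missing):
                if m[j]:
                    r[j] = intercept + slope * r[k] + rng.gauss(0.0, sd)
    return filled

# Synthetic data (an assumption for illustration): b depends linearly on a,
# and roughly 15% of each column is set to missing.
gen = random.Random(1)
data = []
for _ in range(200):
    a = gen.gauss(0.0, 1.0)
    b = 2.0 * a + gen.gauss(0.0, 0.5)
    u = gen.random()
    if u < 0.15:
        a = None
    elif u < 0.30:
        b = None
    data.append([a, b])

completed = iterative_impute(data)
print("rows completed:", len(completed))
```

The sequence of completed datasets across sweeps forms a Markov chain; the questions raised in the abstract concern whether that chain has a stationary distribution and what that distribution represents.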


The effect of institutional proximity in non-local university–industry collaborations: An analysis based on Chinese patent data

Wei 2013. Research Policy

190


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

