“…The DPelicit function implements methods for eliciting the DP prior using exact and approximated formulas for the mean and variance of the number of clusters given the total mass parameter and the number of subjects (see, e.g. Jara, García-Zattera, and Lesaffre, 2007). The function computes pseudo-Bayes factors for model comparison.…”
Section: Implemented Models
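The quantities behind DPelicit can be made concrete with a minimal Python sketch. It is not DPelicit's code or interface. Under a DP prior with total mass $\alpha$, the $i$-th of $n$ subjects starts a new cluster with probability $\alpha/(\alpha + i - 1)$, independently of the others. The number of clusters $K_n$ therefore has exact mean $\sum_{i=1}^{n} \alpha/(\alpha+i-1)$ and variance $\sum_{i=1}^{n} \alpha(i-1)/(\alpha+i-1)^2$. The function names `dp_cluster_moments`, `approx_cluster_mean` and `elicit_alpha` are hypothetical. Inverting the exact mean by bisection is assumed here as one simple way to elicit $\alpha$ from a prior guess of the number of clusters.

```python
import math
from typing import Tuple


def dp_cluster_moments(alpha: float, n: int) -> Tuple[float, float]:
    """Exact prior mean and variance of the number of clusters K_n under a DP with total mass alpha.

    K_n is a sum of independent Bernoulli indicators: the i-th subject (1-based)
    opens a new cluster with probability alpha / (alpha + i - 1).
    """
    if alpha <= 0 or n < 1:
        raise ValueError("alpha must be positive and n must be at least 1")
    mean, var = 0.0, 0.0
    for j in range(n):  # j = i - 1 for subjects i = 1, ..., n
        p = alpha / (alpha + j)
        mean += p
        var += p * (1.0 - p)  # Bernoulli variance, equal to alpha * j / (alpha + j)^2
    return mean, var


def approx_cluster_mean(alpha: float, n: int) -> float:
    """Commonly quoted large-n approximation E[K_n] ~ alpha * log(1 + n / alpha)."""
    return alpha * math.log1p(n / alpha)


def elicit_alpha(target_mean: float, n: int, tol: float = 1e-10) -> float:
    """Hypothetical helper: solve E[K_n] = target_mean for alpha by bisection.

    E[K_n] increases with alpha, from 1 (alpha -> 0) towards n (alpha -> infinity),
    so a solution exists whenever 1 < target_mean < n.
    """
    if not 1.0 < target_mean < n:
        raise ValueError("target_mean must lie strictly between 1 and n")
    lo, hi = 1e-8, 1e8
    for _ in range(200):
        if hi / lo - 1.0 <= tol:
            break
        mid = math.sqrt(lo * hi)  # bisect on the log scale, since alpha spans many orders of magnitude
        if dp_cluster_moments(mid, n)[0] < target_mean:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    n = 100
    alpha = elicit_alpha(target_mean=10.0, n=n)
    mean, var = dp_cluster_moments(alpha, n)
    print(f"alpha = {alpha:.4f}, E[K_n] = {mean:.4f}, Var[K_n] = {var:.4f}")
    print(f"approximate E[K_n] = {approx_cluster_mean(alpha, n):.4f}")
```

The exact sums cost $O(n)$, which is negligible for realistic sample sizes. The logarithmic approximation is shown only for comparison.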
“…The choice of $a_0$ and $b_0$ needs some careful thought, as the parameter $\alpha$ directly controls the number of distinct components. Kottas, Müller, and Quintana (2005), referred to as the KMQ approach, and Jara et al (2007), referred to as the JGL approach, proposed strategies for the specification of these hyperparameters.…”
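A prior predictive check on the number of clusters is one generic way to see what a given choice of $a_0$ and $b_0$ implies. The sketch below assumes $\alpha \sim \mathrm{Gamma}(a_0, b_0)$ with shape $a_0$ and rate $b_0$. It averages the exact conditional moments of $K_n$ over Monte Carlo draws of $\alpha$ and combines them with the law of total variance. It is not the KMQ or JGL procedure, and the function names and default settings are hypothetical.

```python
import random
from typing import Tuple


def conditional_moments(alpha: float, n: int) -> Tuple[float, float]:
    """Exact mean and variance of K_n given alpha (the first subject always opens a cluster)."""
    mean, var = 1.0, 0.0
    for j in range(1, n):
        p = alpha / (alpha + j)
        mean += p
        var += p * (1.0 - p)
    return mean, var


def implied_cluster_moments(
    a0: float, b0: float, n: int, draws: int = 20000, seed: int = 1
) -> Tuple[float, float]:
    """Monte Carlo mean and variance of K_n when alpha ~ Gamma(shape=a0, rate=b0)."""
    rng = random.Random(seed)
    cond_means, cond_vars = [], []
    for _ in range(draws):
        alpha = rng.gammavariate(a0, 1.0 / b0)  # gammavariate expects a scale, i.e. 1 / rate
        m, v = conditional_moments(alpha, n)
        cond_means.append(m)
        cond_vars.append(v)
    mean = sum(cond_means) / draws
    # Law of total variance: Var(K_n) = E[Var(K_n | alpha)] + Var(E[K_n | alpha])
    var = sum(cond_vars) / draws + sum((m - mean) ** 2 for m in cond_means) / draws
    return mean, var


if __name__ == "__main__":
    # Hypothetical hyperparameters: a vague and a more concentrated Gamma prior with the same mean.
    for a0, b0 in [(1.0, 1.0), (10.0, 10.0)]:
        m, v = implied_cluster_moments(a0, b0, n=100)
        print(f"a0={a0}, b0={b0}: E[K_n] ~ {m:.2f}, sd(K_n) ~ {v ** 0.5:.2f}")
```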
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian non- and semi-parametric models in R, DPpackage. Currently DPpackage includes models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison, and for eliciting the precision parameter of the Dirichlet process prior. To maximize computational efficiency, the actual sampling for each model is carried out using compiled FORTRAN.
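Sethuraman's stick-breaking representation of the Dirichlet process makes the idea of a prior on the space of all probability distributions concrete. In this representation, $G = \sum_k w_k \delta_{\theta_k}$ with $w_k = V_k \prod_{l<k}(1 - V_l)$, $V_k \sim \mathrm{Be}(1, \alpha)$ and $\theta_k \sim G_0$. The Python sketch below draws a truncated approximation of one random distribution $G$. The truncation level, the base measure $G_0 = N(0, 1)$ and the function name are assumptions for illustration, and the sketch does not reproduce DPpackage's compiled samplers.

```python
import random
from typing import Callable, List, Optional, Tuple


def stick_breaking_dp(
    alpha: float,
    base_sampler: Callable[[random.Random], float],
    truncation: int = 500,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Approximate draw G ~ DP(alpha, G0) via truncated stick-breaking.

    Returns (weights, atoms). V_k ~ Beta(1, alpha) breaks off a fraction of the
    remaining stick; the final weight absorbs the leftover so the weights sum to one.
    """
    if truncation < 1:
        raise ValueError("truncation must be at least 1")
    if rng is None:
        rng = random.Random()
    weights: List[float] = []
    atoms: List[float] = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(base_sampler(rng))  # atom locations are i.i.d. draws from G0
    weights.append(remaining)
    atoms.append(base_sampler(rng))
    return weights, atoms


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical choices: total mass alpha = 2 and base measure G0 = N(0, 1).
    weights, atoms = stick_breaking_dp(2.0, lambda r: r.gauss(0.0, 1.0), truncation=200, rng=rng)
    sample = rng.choices(atoms, weights=weights, k=100)  # 100 i.i.d. draws from the realized G
    print("distinct values among 100 draws:", len(set(sample)))
```

A realized $G$ is discrete, so repeated draws from it tie. Larger values of $\alpha$ spread the mass over more atoms and produce more distinct values, which is the sense in which the precision parameter controls clustering.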
“…Of course, the two are mathematically equivalent, as can be seen from (5.9) (Jara et al, 2006). Table 5.4 shows some posterior summaries of the regression coefficients for the four models considered here. The centering distribution $F_0$ plays a key role in the reported inferences.…”
We discuss nonparametric Bayesian methods that are suitable for inference with binary, ordinal and general categorical data. Modeling for such data becomes particularly interesting in the presence of covariates, when non- and semiparametric Bayesian models can generalize the link function in a generalized linear model setup, the regression on covariates, or both. An important application arises in inference for diagnostic screening and related inference for ROC (receiver operating characteristic) curves. We include some discussion of a rapidly growing literature on nonparametric Bayesian inference for ROC curves.
Categorical Responses Without Covariates
Binomial Responses
We start with an example to illustrate a number of key issues of a BNP approach for binary outcomes.
Example 9 (Baseball Data). Albright (1993) describes a dataset involving the complete sequence of hits and outs for a number of players from both the American and National Baseball Leagues over the 1987 to 1990 seasons. The data are available from: http://www.kelley.iu.edu/albright/Free_downloads.htm. Albright adopts an operational definition of success as a player moving through the bases. We stick to that definition; therefore, a success consists of either a hit, a walk or a sacrifice. From this large dataset, we now consider the total number of successes for the subset of $n = 129$ players from both leagues who were at bat on at least 500 occasions during the 1987 season. Denote by $y_i$, $i = 1, \ldots, n$, the number of successes for the $i$th player. The simplest possible model for these data would assume just a single success probability $\theta$, common to all players, that is, $y_1, \ldots, y_n \mid \theta \stackrel{ind}{\sim} \mathrm{Bin}(\ell_i, \theta)$, $\theta \sim \mathrm{Be}(a, b)$, where $\ell_i$ is the total number of at-bats for player $i$ and $(a, b)$ are fixed hyperparameters, e.g.,
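The beta prior is conjugate under the single-probability model, so the posterior is available in closed form: $\theta \mid y \sim \mathrm{Be}(a + \sum_i y_i,\; b + \sum_i (\ell_i - y_i))$. The Python sketch below computes it. The default $a = b = 1$, the function name and the counts in the usage example are hypothetical; they are not the hyperparameters or data used in the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BetaPosterior:
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        s = self.a + self.b
        return self.a * self.b / (s * s * (s + 1.0))


def common_rate_posterior(
    successes: Sequence[int], at_bats: Sequence[int], a: float = 1.0, b: float = 1.0
) -> BetaPosterior:
    """Posterior of a single success probability theta shared by all players.

    With y_i | theta ~ Bin(l_i, theta) independently and theta ~ Be(a, b),
    theta | y ~ Be(a + sum_i y_i, b + sum_i (l_i - y_i)).
    """
    if len(successes) != len(at_bats):
        raise ValueError("successes and at_bats must have the same length")
    if any(not 0 <= y <= l for y, l in zip(successes, at_bats)):
        raise ValueError("each y_i must lie between 0 and l_i")
    total_successes = sum(successes)
    total_failures = sum(l - y for y, l in zip(successes, at_bats))
    return BetaPosterior(a + total_successes, b + total_failures)


if __name__ == "__main__":
    # Hypothetical counts for three players (not the Albright data).
    post = common_rate_posterior([180, 210, 195], [520, 600, 560])
    print(f"posterior Be({post.a}, {post.b}), mean {post.mean:.4f}, sd {post.variance ** 0.5:.4f}")
```

Because every player shares one $\theta$, this model has no way to represent differences in success rates between players, however many at-bats are observed.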
“…For instance, Bayesian Markov Chain Monte Carlo (MCMC) techniques have been developed to estimate the heterogeneity model (Ho and Hu, 2008). Furthermore, extensions of the heterogeneity model based on penalised Gaussian mixtures (Komarek and Lesaffre, 2008) and Dirichlet processes (Kleinman and Ibrahim, 1998; Jara et al, 2007) have also been developed. With the increasing accessibility of Bayesian methods and the increasing computational power to analyse longitudinal panel data, it would be valuable to explore the practicality and performance of Bayesian approaches to account for multimodal distributions within potential mover-stayer scenarios.…”
Section: Limitations and Scope for Further Research
“…One approach is a penalized Gaussian mixture distribution, where the weights of the mixture components are estimated using a penalized approach and the parameters of the model are estimated using Markov Chain Monte Carlo (MCMC) techniques (Komarek and Lesaffre, 2008). Another approach fits an infinite mixture model within the Bayesian framework by incorporating a Dirichlet process mixture of normals prior as the random effects distribution (Jara et al, 2007). These approaches will not be considered further, but they highlight the feasibility of Bayesian techniques to estimate the heterogeneity model.…”
Section: Addressing Misspecification of the Random Effects Distribution
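The Python sketch below illustrates what a Dirichlet process mixture of normals prior implies for the random effects. It draws random intercepts from the prior using the Pólya urn (Chinese restaurant) representation. The base measure $N(\mu_0, \tau_0^2)$ for cluster means, the common within-cluster standard deviation $\sigma$, the default values and the function name are all hypothetical. This is prior simulation only, not the posterior MCMC used to fit such models.

```python
import random
from typing import List, Optional, Tuple


def dpm_random_intercepts(
    n: int,
    alpha: float,
    mu0: float = 0.0,
    tau0: float = 2.0,
    sigma: float = 0.5,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[int]]:
    """Prior draw of n random intercepts from a DP mixture of normals (Polya urn scheme).

    Cluster means are drawn from the base measure N(mu0, tau0^2); each intercept is
    normal around its cluster mean with common sd sigma. Returns (intercepts, labels).
    """
    rng = random.Random(seed)
    cluster_means: List[float] = []
    cluster_sizes: List[int] = []
    intercepts: List[float] = []
    labels: List[int] = []
    for i in range(n):
        # Subject i + 1 opens a new cluster with probability alpha / (alpha + i) and
        # otherwise joins existing cluster k with probability size_k / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            cluster_means.append(rng.gauss(mu0, tau0))
            cluster_sizes.append(1)
            k = len(cluster_means) - 1
        else:
            k = rng.choices(range(len(cluster_sizes)), weights=cluster_sizes, k=1)[0]
            cluster_sizes[k] += 1
        labels.append(k)
        intercepts.append(rng.gauss(cluster_means[k], sigma))
    return intercepts, labels


if __name__ == "__main__":
    intercepts, labels = dpm_random_intercepts(n=500, alpha=1.0, seed=7)
    print("number of occupied clusters:", len(set(labels)))
```

Because the number of occupied clusters is random and unbounded, this prior can place multiple modes in the random effects distribution without fixing the number of components in advance.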
Logistic mixed models for binary longitudinal panel data typically assume normally distributed random effects, and appropriately account for correlated data, unobserved heterogeneity and missing data due to attrition. However, this normality assumption may be too restrictive to capture unobserved heterogeneity. The motivating case study is a longitudinal analysis of women's employment participation using data from the Household Income and Labour Force Dynamics in Australia (HILDA) survey. Multimodality of the random effects was identified, potentially due to an underlying mover-stayer scenario.
This study focuses on logistic mixed models applied to the HILDA case study and on simulation studies motivated by the case study, and aims to investigate:
1. the robustness of random intercept logistic models to the assumed normal random effects distribution when the true distribution is multimodal;
2. whether relaxing the parametric assumption on the random effects distribution can provide a practical solution to reduce the impact of distributional misspecification;
3. the impact of misspecification and the performance of logistic mixed models in the presence of missing data due to attrition.
Random intercept logistic models applied to the case study demonstrate that the assumed normal distribution may not adequately capture the underlying heterogeneity due to a potential mover-stayer scenario. An asymmetric three-component mixture of normal distributions provided a more appropriate fit, potentially representing three sub-populations: those with an extremely low, a moderate, or an extremely high propensity to be constantly employed.
Two simulation studies motivated by the HILDA study considered a three-component mixture of normal distributions for the random intercept. The inferential impact of incorrectly assuming a normal distribution depended on the severity of the departure of the true distribution from normality.
In the first study, simulating a potential mover-stayer scenario, misspecification produced biased estimates of the intercept constant and the random effect variance. More severely asymmetric and skewed multimodal distributions produced larger bias. The second study considered a range of true symmetric multimodal distributions, with increasing severity in departures from normality. The random intercept logistic model assuming normality was robust to minor deviations. However, for larger departures characterised by three distinct modes, misspecification produced biased parameter estimates and poor coverage rates for the intercept constant, time-invariant explanatory variables and those time-varying explanatory variables exhibiting minimal within-individual variability. For both simulation studies, estimates of the random effect variance were extremely sensitive to distributional misspecification, resulting in biased parameter estimates, poor coverage rates and inaccurate standard errors.
Non-parametric estimation techniques, which leave the distribution completely unspecified, reduced the risks associated with misspecification o...
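The simulation design described in the abstract can be sketched in Python: a random intercept logistic model whose intercepts come from a three-component normal mixture. This minimal version assumes a single time-varying covariate (the wave index $t$) with hypothetical coefficients. The mixture weights, means and standard deviations in the usage example are illustrative, not the values used in the study. Fitting a normal-intercept model to such data is the step that would expose the bias, and it is not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def expit(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def simulate_panel(
    n_subjects: int,
    n_waves: int,
    weights: Sequence[float],
    means: Sequence[float],
    sds: Sequence[float],
    beta0: float = 0.0,
    beta_time: float = 0.1,
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[int, int, int]], List[float]]:
    """Simulate y_it ~ Bernoulli(expit(beta0 + b_i + beta_time * t)).

    The random intercept b_i is drawn from a finite mixture of normals with the
    given component weights, means and standard deviations.
    """
    rng = random.Random(seed)
    rows: List[Tuple[int, int, int]] = []  # (subject, wave, binary response)
    intercepts: List[float] = []
    for i in range(n_subjects):
        comp = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        b_i = rng.gauss(means[comp], sds[comp])
        intercepts.append(b_i)
        for t in range(n_waves):
            p = expit(beta0 + b_i + beta_time * t)
            rows.append((i, t, 1 if rng.random() < p else 0))
    return rows, intercepts


if __name__ == "__main__":
    n_subjects, n_waves = 1000, 8
    # Hypothetical asymmetric mixture: low, moderate and high propensity groups.
    rows, _ = simulate_panel(
        n_subjects, n_waves,
        weights=(0.2, 0.5, 0.3), means=(-4.0, 0.0, 4.0), sds=(0.5, 1.0, 0.5),
        seed=11,
    )
    always_one = sum(
        all(y == 1 for _, _, y in rows[i * n_waves:(i + 1) * n_waves])
        for i in range(n_subjects)
    )
    print(f"share of subjects with y = 1 at every wave: {always_one / n_subjects:.3f}")
```

The well-separated outer components generate many subjects whose responses are constant across waves. This is the stayer-like pattern that a single normal random intercept has difficulty reproducing.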