A Dirichlet process mixture model for the analysis of correlated binary responses

Jara, Alejandro; García-Zattera, María José; Lesaffre, Emmanuel

doi:10.1016/j.csda.2006.09.010

Cited by 33 publications

(33 citation statements)

References 25 publications


“…The DPelicit function implements methods for eliciting the DP prior using exact and approximated formulas for the mean and variance of the number of clusters given the total mass parameter and the number of subjects (see, e.g. Jara, García-Zattera, and Lesaffre, 2007). The function computes pseudo-Bayes factors for model comparison.…”

Section: Implemented Models (mentioning)

confidence: 99%


DPpackage: Bayesian Semi- and Nonparametric Modeling in R

Hanson², et al. 2011

Self Cite


Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian non- and semi-parametric models in R, DPpackage. Currently DPpackage includes models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison, and for eliciting the precision parameter of the Dirichlet process prior. To maximize computational efficiency, the actual sampling for each model is carried out using compiled FORTRAN.
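The citation statement above refers to formulas for the mean and variance of the number of clusters given the total mass parameter and the number of subjects. The sketch below evaluates the standard textbook expressions for a Dirichlet process, E[K] = Σ α/(α+i) and Var[K] = Σ α·i/(α+i)², with i running from 0 to n−1, together with the common logarithmic approximation to the mean. It is an illustration only: the function names are hypothetical, the code is written in Python rather than R, and it is not taken from DPelicit or DPpackage.

```python
import math


def dp_cluster_moments(alpha, n):
    """Exact prior mean and variance of the number of clusters K among
    n subjects under a Dirichlet process with total mass alpha.

    K is a sum of independent Bernoulli indicators: subject i+1 opens a
    new cluster with probability alpha / (alpha + i), i = 0, ..., n-1.
    """
    if alpha <= 0 or n < 1:
        raise ValueError("alpha must be positive and n at least 1")
    mean = sum(alpha / (alpha + i) for i in range(n))
    var = sum(alpha * i / (alpha + i) ** 2 for i in range(n))
    return mean, var


def dp_cluster_mean_approx(alpha, n):
    """Logarithmic approximation to E[K]: alpha * log(1 + n / alpha)."""
    return alpha * math.log(1.0 + n / alpha)


# Hypothetical settings, chosen only to show the call pattern.
for alpha in (0.5, 1.0, 5.0):
    mean, var = dp_cluster_moments(alpha, 100)
    approx = dp_cluster_mean_approx(alpha, 100)
    print(f"alpha={alpha}: mean={mean:.3f}, var={var:.3f}, approx mean={approx:.3f}")
```

Comparing the exact and approximate means for a few values of α gives a quick sense of how strongly the total mass parameter drives the prior number of clusters.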



“…The choice of $a_0$ and $b_0$ needs some careful thoughts, as the parameter α directly controls the number of distinct components. Kottas, Müller, and Quintana (2005), referred to as the KMQ approach, and Jara et al (2007), referred to as the JGL approach, proposed strategies for the specification of these hyperparameters.…”

Section: Implemented Models (mentioning)

confidence: 99%

DPpackage: Bayesian Semi- and Nonparametric Modeling in R

Hanson², et al. 2011

Self Cite
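To see why the hyperparameters of the prior on α deserve care, one can look at the prior expected number of clusters they imply. The sketch below assumes α follows a gamma prior with shape a0 and rate b0 (the parameterisation is an assumption, since the quoted statement does not spell it out) and averages the conditional mean of the number of clusters over Monte Carlo draws of α. It reproduces neither the KMQ nor the JGL strategy; all function names are hypothetical.

```python
import random


def expected_clusters_given_alpha(alpha, n):
    """E[K | alpha] for n subjects; the first subject always opens a cluster."""
    return 1.0 + sum(alpha / (alpha + i) for i in range(1, n))


def prior_expected_clusters(a0, b0, n, draws=20000, seed=0):
    """Monte Carlo estimate of E[K] when alpha ~ Gamma(shape=a0, rate=b0).

    random.gammavariate takes shape and SCALE, so the rate b0 is inverted.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        alpha = rng.gammavariate(a0, 1.0 / b0)
        total += expected_clusters_given_alpha(alpha, n)
    return total / draws


# Hypothetical hyperparameter pairs, for illustration only.
for a0, b0 in ((1.0, 1.0), (2.0, 4.0), (5.0, 1.0)):
    estimate = prior_expected_clusters(a0, b0, n=50)
    print(f"a0={a0}, b0={b0}: prior E[K] is about {estimate:.2f}")
```

Running this for a few candidate pairs before fitting a model is one simple way to check whether a given choice is more or less informative about the number of components than intended.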


“…Of course, the two are mathematically equivalent, as can be seen from (5.9). Jara et al (2006) Table 5.4 shows some posterior summaries of the regression coefficients for the four models considered here. The centering distribution $F_0$ plays a key role in the reported inferences.…”

Section: Models For Latent Scores (mentioning)

confidence: 99%

Categorical Data

Müller¹, Quintana², Jara³ et al. 2015

Springer Series in Statistics

Self Cite


We discuss nonparametric Bayesian methods that are suitable for inference with binary, ordinal and general categorical data. Modeling for such data becomes particularly interesting in the presence of covariates, when non- and semiparametric Bayesian models can generalize the link function in a generalized linear model setup, the regression on covariates or both. An important application arises in inference for diagnostic screening and related inference for ROC (receiver-operator characteristic) curves. We include some discussion of a rapidly growing literature on non-parametric Bayesian inference for ROC curves.

Categorical Responses Without Covariates

Binomial Responses

We start with an example to illustrate a number of key issues of a BNP approach for binary outcomes.

Example 9 (Baseball Data) Albright (1993) describes a dataset involving the complete sequence of hits and outs for a number of players from both American and National Baseball Leagues over the 1987–1990 seasons. The data are available from: http://www.kelley.iu.edu/albright/Free_downloads.htm. Albright assumes the operational definition of a success to mean a player moving through the bases. We stick to that definition, and therefore, a success consists of either a hit, walk or sacrifice. From this large dataset, we consider now the total number of successes for the subset of $n = 129$ players from both leagues who were at bat at least on 500 occasions during the 1987 season. Denote by $y_i$, $i = 1, \ldots, n$, the number of successes for the $i$th player. The simplest possible model for these data would assume just a single success probability, common to all players, that is, $y_1, \ldots, y_n \mid \theta \overset{\text{ind}}{\sim} \mathrm{Bin}(\ell_i, \theta)$, $\theta \sim \mathrm{Be}(a, b)$, where $\ell_i$ is the total number of at-bats for player $i$ and $(a, b)$ are fixed hyperparameters, e.g.,
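The single-probability model in the abstract above is conjugate: with a Be(a, b) prior and binomial counts, the posterior for the common success probability is again a beta distribution, with the total successes added to a and the total failures added to b. The sketch below carries out that update. The function name and the small data set are hypothetical and are not the baseball data; the default a = b = 1 is an arbitrary choice, since the abstract is cut off before naming its hyperparameters.

```python
def beta_binomial_posterior(successes, trials, a=1.0, b=1.0):
    """Posterior Beta parameters and mean for a common success probability.

    Model: y_i | theta ~ Bin(l_i, theta) independently, theta ~ Be(a, b).
    Posterior: theta | y ~ Be(a + sum(y_i), b + sum(l_i - y_i)).
    """
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    if any(y < 0 or y > l for y, l in zip(successes, trials)):
        raise ValueError("each count must lie between 0 and its number of trials")
    total_successes = sum(successes)
    total_failures = sum(trials) - total_successes
    a_post = a + total_successes
    b_post = b + total_failures
    return a_post, b_post, a_post / (a_post + b_post)


# Made-up counts for three players, for illustration only.
y = [180, 210, 195]   # successes
l = [520, 560, 540]   # at-bats
a_post, b_post, post_mean = beta_binomial_posterior(y, l)
print(f"posterior: Be({a_post}, {b_post}), mean {post_mean:.4f}")
```

Because every player shares the same probability, the posterior depends on the data only through the pooled totals, which is the restriction a nonparametric prior on player-specific probabilities would relax.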


“…For instance, Bayesian Markov Chain Monte Carlo (MCMC) techniques have been developed to estimate the heterogeneity model (Ho and Hu, 2008). Furthermore, extensions of the heterogeneity model based on penalised Gaussian mixtures (Komarek and Lesaffre, 2008) and Dirichlet processes (Kleinman and Ibrahim, 1998; Jara et al, 2007) have also been developed. With the increasing accessibility of Bayesian methods and the increasing computational power to analyse longitudinal panel data, it would be valuable to explore the practicality and performance of Bayesian approaches to account for multimodal distributions within potential mover-stayer scenarios.…”

Section: Limitations and Scope For Further Research (mentioning)

confidence: 99%

“…One approach is a penalized Gaussian mixture distribution where the weights of the mixture components are estimated using a penalized approach and parameters of the model are estimated using Markov Chain Monte Carlo (MCMC) techniques (Komarek and Lesaffre, 2008). Another approach fits an infinite mixture model within the Bayesian framework by incorporating a Dirichlet process mixture of a normal prior as the random effects distribution (Jara et al, 2007). These approaches will not be considered further, but highlight the feasibility of Bayesian techniques to estimate the heterogeneity model.…”

Section: Addressing Misspecification Of the Random Effects Distributi… (mentioning)

confidence: 99%

Misspecification and flexible random effect distributions in logistic mixed effects models applied to panel survey data

Marquart-Wilson¹


Logistic mixed models for binary longitudinal panel data typically assume normal distributed random effects, and appropriately account for correlated data, unobserved heterogeneity and missing data due to attrition. However, this normality assumption may be too restrictive to capture unobserved heterogeneity. The motivating case study is a longitudinal analysis of women's employment participation using data from the Household Income and Labour Force Dynamics in Australia (HILDA) survey. Multimodality of the random effects was identified, potentially due to an underlying mover-stayer scenario.

This study focuses on logistic mixed models applied to the HILDA case study and simulation studies motivated by the case study, and aims to investigate:

1. robustness of random intercept logistic models to the assumed normal random effects distribution when the true distribution is multimodal
2. whether relaxing the parametric assumption of the random effects distribution can provide a practical solution to reduce the impact of distributional misspecification
3. impact of misspecification and performance of logistic mixed models in the presence of missing data due to attrition.

Random intercept logistic models applied to the case study demonstrate that the assumed normal distribution may not adequately capture the underlying heterogeneity due to a potential mover-stayer scenario. An asymmetric three component mixture of normal distributions provided a more appropriate fit, potentially representing three sub-populations: those with an extremely low, moderate, or extremely high propensity to be constantly employed.

Two simulation studies motivated by the HILDA study considered a three component mixture of normal distributions for the random intercept. The inferential impact of incorrectly assuming a normal distribution was dependent on the severity of departure of the true distribution from normality. In the first study, simulating a potential mover-stayer scenario, misspecification produced biased estimates of the intercept constant and random effect variance. More severely asymmetric and skewed multimodal distributions produced larger bias. The second study considered a range of true symmetric multimodal distributions, with increasing severity in departures from normality. The random intercept logistic model assuming normality was robust to minor deviations. However, for larger departures characterised by three distinct modes, misspecification produced biased parameter estimates and poor coverage rates for the intercept constant, time-invariant explanatory variables and those time-varying explanatory variables exhibiting minimal within-individual variability. For both simulation studies, estimates of the random effect variance were extremely sensitive to distributional misspecification, resulting in biased parameter estimates, poor coverage rates and inaccurate standard errors.

Non-parametric estimation techniques, which leave the distribution completely unspecified, reduced the risks associated with misspecification o...
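The data-generating mechanism described in the abstract, a random intercept logistic model whose intercepts come from a three component mixture of normal distributions, can be sketched in a few lines. Everything specific below is an assumption made for illustration: the mixture weights, means and standard deviations, the number of subjects and waves, and the function names are hypothetical and are not the settings used in the thesis.

```python
import math
import random


def logistic(x):
    """Numerically stable inverse-logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_panel(n_subjects, n_waves, weights, means, sds, intercept=0.0, seed=0):
    """Simulate binary panel data with a mixture-of-normals random intercept.

    For each subject: draw a mixture component, draw the random intercept
    from that component, then draw n_waves Bernoulli responses with success
    probability logistic(intercept + random intercept).
    Returns a list of (component, random_intercept, responses) tuples.
    """
    rng = random.Random(seed)
    components = range(len(weights))
    panel = []
    for _ in range(n_subjects):
        k = rng.choices(components, weights=weights)[0]
        b = rng.gauss(means[k], sds[k])
        p = logistic(intercept + b)
        y = [1 if rng.random() < p else 0 for _ in range(n_waves)]
        panel.append((k, b, y))
    return panel


# Hypothetical mover-stayer style settings: two extreme groups and a middle one.
panel = simulate_panel(500, 8, weights=[0.3, 0.4, 0.3],
                       means=[-4.0, 0.0, 4.0], sds=[1.0, 1.0, 1.0])
stayers = sum(1 for _, _, y in panel if sum(y) in (0, len(y)))
print(f"{stayers} of {len(panel)} simulated subjects never change state")
```

Fitting a normal random intercept model to data simulated this way, and comparing the estimates with the known generating values, is the general shape of the robustness check the abstract describes.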
