Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering which, in turn, may result from repeatedly measuring the outcome on the same subject, from measuring various members of the same family, etc. The first issue is dealt with through a variety of overdispersion models, such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Conventionally, though not always, such random effects are assumed to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. Implications for the derivation of marginal correlation functions are discussed.
The methodology is applied to data from a study in epileptic seizures, a clinical trial on the toenail infection onychomycosis, and survival data on children with asthma.
MOLENBERGHS, VERBEKE, DEMÉTRIO AND VIEIRA
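For the count case, the combination of a conjugate (gamma) random effect at the level of the mean and a normal random effect in the linear predictor has closed-form marginal moments, which is where the implications for marginal correlation functions start. The sketch below assumes a single observation with linear predictor `eta`, one scalar normal effect b ~ N(0, `sigma2`), and a gamma effect with mean 1 and variance 1/`alpha`; these parametrization choices and the function name are illustrative, not taken from the paper.

```python
import math


def combined_poisson_moments(eta: float, sigma2: float, alpha: float) -> tuple[float, float]:
    """Marginal mean and variance of Y under the assumed model
        Y | theta, b ~ Poisson(theta * exp(eta + b)),
        theta ~ Gamma(shape=alpha, scale=1/alpha)   (mean 1, variance 1/alpha),
        b ~ N(0, sigma2).
    """
    # For normal b: E[exp(b)] = exp(sigma2 / 2) and E[exp(2b)] = exp(2 * sigma2).
    mean = math.exp(eta + sigma2 / 2.0)
    # E[theta^2] = Var(theta) + E(theta)^2 = 1 + 1/alpha; theta and b are independent.
    second_moment_of_rate = (1.0 + 1.0 / alpha) * math.exp(2.0 * eta + 2.0 * sigma2)
    # Law of total variance: Var(Y) = E[rate] + Var(rate), where rate = theta * exp(eta + b).
    variance = mean + second_moment_of_rate - mean ** 2
    return mean, variance


m, v = combined_poisson_moments(eta=0.5, sigma2=0.3, alpha=2.0)
print(m, v, v / m)  # variance-to-mean ratio exceeds 1: overdispersion from both effects
```

Algebraically the variance equals mean + mean² · [(1 + 1/alpha) · exp(sigma2) − 1], so it collapses to the Poisson mean-variance link only when both random effects vanish (sigma2 → 0 and alpha → ∞).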
Count data often show a higher incidence of zero counts than would be expected if the data were Poisson distributed. Zero-inflated Poisson regression models are a useful class of models for such data, but parameter estimates may be seriously biased if the nonzero counts are overdispersed in relation to the Poisson distribution. We therefore provide a score test for testing zero-inflated Poisson regression models against zero-inflated negative binomial alternatives.
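To make the two competing models concrete, here is a minimal sketch of the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) probability mass functions. It assumes the common parametrization in which the negative binomial has variance mu + dispersion · mu², so ZINB reduces to ZIP as the dispersion goes to 0; that boundary value is the null hypothesis a ZIP-versus-ZINB score test examines. The score statistic itself is not reproduced here, and the function names are hypothetical.

```python
import math


def zip_pmf(k: int, mu: float, omega: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability omega,
    otherwise a Poisson(mu) count."""
    poisson = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    return omega * (k == 0) + (1.0 - omega) * poisson


def zinb_pmf(k: int, mu: float, omega: float, dispersion: float) -> float:
    """Zero-inflated negative binomial with mean mu and variance mu + dispersion * mu**2
    for the non-structural part."""
    r = 1.0 / dispersion  # negative-binomial "size" parameter
    log_nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return omega * (k == 0) + (1.0 - omega) * math.exp(log_nb)


# Probability of a zero under plain Poisson, ZIP and ZINB with the same mean mu = 3.
print(math.exp(-3.0), zip_pmf(0, 3.0, 0.2), zinb_pmf(0, 3.0, 0.2, 0.5))
```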
This paper reviews many different estimators of intraclass correlation that have been proposed for binary data and compares them in an extensive simulation study. Some of the estimators are very specific, while others result from general methods such as pseudo-likelihood and extended quasi-likelihood estimation. The simulation study identifies several useful estimators, one of which does not seem to have been considered previously for binary data. Estimators based on extended quasi-likelihood are found to have a substantial bias in some circumstances.
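As an illustration of the kind of estimator being compared, the sketch below computes the classical one-way analysis-of-variance estimator of the intraclass correlation for clustered binary data, given the number of successes and the size of each cluster. It is only one of the many candidates such a study covers; the abstract does not say which estimators performed best, and the function name is hypothetical.

```python
def anova_icc(successes: list[int], sizes: list[int]) -> float:
    """One-way ANOVA estimator of the intraclass correlation for binary data.

    successes[i] is the number of 1s in cluster i, sizes[i] its number of members.
    """
    k = len(sizes)
    n_total = sum(sizes)
    y_total = sum(successes)
    sum_sq_over_n = sum(y * y / n for y, n in zip(successes, sizes))
    # Between- and within-cluster mean squares (for 0/1 data, sum of squares = sum).
    msb = (sum_sq_over_n - y_total ** 2 / n_total) / (k - 1)
    msw = (y_total - sum_sq_over_n) / (n_total - k)
    # "Average" cluster size adjusted for unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


print(anova_icc([3, 1, 4, 0, 2], [4, 5, 6, 3, 4]))
```

Note that this estimator can be negative (down to about −1/(n0 − 1)), which is one reason so many alternatives have been proposed.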
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. The Poisson model for count data falls within this tradition. The family in general, and the Poisson model in particular, are convenient because they are mathematically elegant, but they often need extending because they can be restrictive. Two of the main rationales for existing extensions are (1) the occurrence of overdispersion, in the sense that the variability in the data is not adequately captured by the model's prescribed mean-variance link, and (2) the accommodation of data hierarchies owing to, for example, repeatedly measuring the outcome on the same subject, recording information from various members of the same family, etc. There is a variety of overdispersion models for count data, such as the negative-binomial model. Hierarchies are often accommodated through the inclusion of subject-specific random effects. Conventionally, though not always, such random effects are assumed to be normally distributed. While both of these issues may occur simultaneously, models accommodating them at once are uncommon. This paper proposes a generalized linear model accommodating overdispersion and clustering through two separate sets of random effects, of gamma and normal type, respectively. This is in line with the proposal by Booth et al. (Stat Model 3:179-181, 2003). The model extends both classical overdispersion models for count data (Breslow, Appl Stat 33:38-44, 1984), in particular the negative binomial model, and the generalized linear mixed model (Breslow and Clayton, J Am Stat Assoc 88:9-25, 1993). Apart from model formulation, we briefly discuss several estimation options and then settle for maximum likelihood estimation, using both fully analytic integration and a hybrid of analytic and numerical integration. The latter is implemented in the SAS procedure NLMIXED.
The methodology is applied to data from a study in epileptic seizures.
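The hybrid strategy can be sketched as follows for a Poisson model with an observation-level gamma effect and a single normal random intercept per cluster: the gamma effects are integrated out analytically, which turns each Poisson term into a negative binomial, and the shared normal intercept is integrated numerically. This is a Python stand-in, not the authors' SAS NLMIXED code. It assumes gamma effects with shape `alpha` and mean 1, and it uses plain (non-adaptive) Gauss-Hermite quadrature via NumPy's `hermgauss`, whereas NLMIXED's default is adaptive Gaussian quadrature. The helper names are hypothetical.

```python
import math

import numpy as np


def nb_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of Poisson(theta * mu) after integrating theta ~ Gamma(shape=alpha,
    scale=1/alpha) analytically: a negative binomial with mean mu and
    variance mu + mu**2 / alpha."""
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(alpha / (alpha + mu)) + y * math.log(mu / (alpha + mu)))


def cluster_loglik(y, x, beta, alpha, sigma, n_nodes=20):
    """Log-likelihood of one cluster: analytic integration over the gamma effects,
    Gauss-Hermite integration over the shared intercept b ~ N(0, sigma**2)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    eta = np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float)
    log_terms = []
    for node, w in zip(nodes, weights):
        # Substituting b = sqrt(2) * sigma * node maps the normal density onto the
        # Hermite weight exp(-node**2); the leftover constant is 1 / sqrt(pi).
        b = math.sqrt(2.0) * sigma * node
        ll = sum(nb_logpmf(int(yj), math.exp(e + b), alpha) for yj, e in zip(y, eta))
        log_terms.append(math.log(w) + ll)
    # Log-sum-exp over the quadrature nodes for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms)) - 0.5 * math.log(math.pi)


# One cluster with three repeated counts, intercept plus a time covariate.
print(cluster_loglik([2, 0, 5], [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [0.3, 0.2], 2.0, 0.8))
```

The full log-likelihood is the sum of `cluster_loglik` over clusters and would be maximized over `beta`, `alpha` and `sigma` with a general-purpose optimizer.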
Count and proportion data may present overdispersion, i.e., greater variability than expected under the Poisson and binomial models, respectively. Various extensions of generalized linear models that allow for overdispersion can be used to analyze this type of data, such as models that use a generalized variance function, random-effects models, zero-inflated models and compound distribution models. Assessing goodness-of-fit and verifying the assumptions of these models is not easy; half-normal plots with a simulated envelope offer one possible solution. These plots are a useful indicator of goodness-of-fit that may be used with any generalized linear model and its extensions. Functions that generated these plots were widely used by GLIM users; however, such functions were not yet available for the open-source software R on the Comprehensive R Archive Network (CRAN). We describe a new R package, hnp, that may be used to generate the half-normal plot with a simulated envelope for residuals from different types of models. The function hnp() can be used together with a range of model-fitting packages in R that extend the basic generalized linear model fitting in glm(), and it is written so that it is relatively easy to extend to new model classes and different diagnostics. We illustrate its use on a range of examples, including continuous and discrete responses, and show how it can be used to inform model selection and diagnose overdispersion.
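The idea behind a half-normal plot with a simulated envelope can be sketched in a few lines. This is not the hnp package's code (hnp is written in R); it is a minimal Python illustration under simplifying assumptions: an intercept-only Poisson model, absolute deviance residuals, the plotting position Φ⁻¹((i + n − 1/8)/(2n + 1/2)), a common choice for half-normal plots, and a pointwise min/max envelope over simulated refits. Plotting is left out; the function returns the coordinates plus the fraction of observed residuals falling outside the envelope. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_deviance_residual(y: int, mu: float) -> float:
    term = y * math.log(y / mu) if y > 0 else 0.0  # y log(y/mu) is 0 when y = 0
    return math.copysign(math.sqrt(max(2.0 * (term - (y - mu)), 0.0)), y - mu)


def sample_poisson(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method; adequate for the moderate means used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sorted_abs_residuals(y):
    mu = sum(y) / len(y)  # intercept-only Poisson fit: fitted mean = sample mean
    return sorted(abs(poisson_deviance_residual(v, mu)) for v in y)


def half_normal_envelope(y, n_sim=99, seed=1):
    n = len(y)
    quantiles = [NormalDist().inv_cdf((i + n - 0.125) / (2 * n + 0.5)) for i in range(1, n + 1)]
    observed = sorted_abs_residuals(y)
    mu_hat = sum(y) / n
    rng = random.Random(seed)
    # Simulate from the fitted model, refit, and keep the sorted residuals of each refit.
    sims = [sorted_abs_residuals([sample_poisson(mu_hat, rng) for _ in range(n)])
            for _ in range(n_sim)]
    lower = [min(s[i] for s in sims) for i in range(n)]
    upper = [max(s[i] for s in sims) for i in range(n)]
    outside = sum(not (lo <= o <= hi) for o, lo, hi in zip(observed, lower, upper))
    return quantiles, observed, lower, upper, outside / n


# Strongly overdispersed counts: most residuals fall outside the Poisson envelope.
print(half_normal_envelope([0] * 20 + [30] * 20)[-1])
```

Many points outside the envelope, as with the overdispersed counts above, suggest that the Poisson model is inadequate.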
The objective of this study was to evaluate the factors that may affect conception rates (CR) following artificial insemination (AI) or embryo transfer (ET) in lactating Holstein cows. Estrous-cycling cows producing 33.1 ± 7.2 kg of milk/d received PGF2α injections and were assigned randomly to 1 of 2 groups (AI or ET). Cows detected in estrus (n = 387) between 48 and 96 h after the PGF2α injection received AI (n = 227) 12 h after detection of estrus, or ET (n = 160) 6 to 8 d later (1 fresh embryo, grade 1 or 2, produced from nonlactating cows). Pregnancy was diagnosed at 28 and 42 d after estrus, and embryonic loss was defined as a cow being pregnant on d 28 but not pregnant on d 42. Ovulation, conception, and embryonic loss were analyzed with logistic models to evaluate the effects of covariates [days in milk (DIM), milk yield, body temperature (BT) at d 7 and 14 post-AI, and serum concentration of progesterone (P4) at d 7 and 14 post-AI] on the probability of success. The first analysis included all cows that were detected in estrus. The CR after AI and ET differed on d 28 (AI, 32.6% vs. ET, 49.4%) and d 42 (AI, 29.1% vs. ET, 38.8%) and were negatively influenced by high BT (d 7) and DIM. The second analysis included only cows with a corpus luteum on d 7. The ovulation rate was 84.8% and was negatively affected only by DIM. The CR after AI and ET differed on d 28 (AI, 37.9% vs. ET, 59.4%) and d 42 (AI, 33.8% vs. ET, 46.6%) and were negatively influenced by high BT (d 7). The third analysis included only ovulating cows that were 7 d postestrus. The CR after AI and ET differed on d 28 (AI, 37.5% vs. ET, 63.2%) and d 42 (AI, 31.7% vs. ET, 51.7%) and were negatively influenced by high BT (d 7). Serum concentration of P4 had a positive effect, and milk production a negative effect, on the probability of conception in the AI group but not in the ET group. The fourth analysis addressed embryonic loss (AI, 10.8% vs. ET, 21.5%).
The transfer of fresh embryos is an important tool for increasing the probability of conception in lactating Holstein cows because it can bypass the negative effects of milk production and low P4 on the early embryo. The superiority of ET over AI is more evident in high-producing cows. High BT measured on d 7 had a negative effect on CR and embryonic retention.
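In a logistic model, a group effect is expressed on the log-odds scale. As a rough illustration, the crude (unadjusted) odds ratios of conception for ET versus AI can be computed directly from the first-analysis rates reported above. In a logistic model containing only the group indicator, exp(coefficient) would equal this crude odds ratio; the study's models also adjust for DIM, milk yield, BT and P4, so their estimates will differ. The helper names are hypothetical.

```python
import math


def odds(p: float) -> float:
    return p / (1.0 - p)


def crude_odds_ratio(p_treat: float, p_ref: float) -> float:
    """Odds of success in the treated group relative to the reference group."""
    return odds(p_treat) / odds(p_ref)


# Conception rates from the first analysis (all cows detected in estrus): (ET, AI).
rates = {"d28": (0.494, 0.326), "d42": (0.388, 0.291)}
for day, (et, ai) in rates.items():
    ratio = crude_odds_ratio(et, ai)
    # log(odds ratio) is the group coefficient on the logit scale in a group-only model.
    print(day, round(ratio, 2), round(math.log(ratio), 3))
```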
Entomological data are often overdispersed, characterised by a larger variance than assumed by simple standard models. It is important to model overdispersion properly in order to avoid incorrect and misleading inferences. Outcomes of interest are often in the form of counts or proportions, and we present extended models that incorporate overdispersion, methods to assess its impact and model goodness-of-fit, and techniques to test treatment differences in the presence of overdispersion.

Keywords: Overdispersion • Statistical models • Count data • Proportion data • Zero-inflated data

Introduction

Outcomes of interest for entomological data are often in the form of counts or proportions, and as a first step we might analyse these using standard Poisson and binomial models. These are both specific examples of generalized linear models (McCullagh and Nelder 1989), hence our focus here on this class of models. However, in general the data are overdispersed, characterised by a larger variance than assumed by these simple standard models. It is important to adapt models to take account of overdispersion in order to avoid incorrect and misleading inferences. In this chapter we will consider some general approaches for doing this and illustrate them with specific examples.

There are many possible causes of overdispersion, and in a specific situation several of them could be involved. Some common possibilities in entomological studies are:

1. Variability of experimental material: this can be thought of as individual variability of the experimental units, which may give an additional component of variability that is not accounted for by the basic response model. For example, in dose-response experiments, the insects used will typically differ in their susceptibility to the substance, which will affect their propensity to respond.
2. Correlation between individual responses: in biological assays involving batches of insects, we may expect some correlation between insects from the same batch, since they may be genetically similar. There may also be correlation due to shared experimental environments or through observing a group of insects over time.
3. Cluster and multistage sampling: often, instead of a simple random sample, the insects under study are structured into a hierarchy, with sampling carried out sequentially at each level. For example, we may consider insects within metapopulations within ecosystems. We may take a random sample of ecosystems, then pick a random sample of metapopulations from the selected ecosystems, and finally take our observational units from a random sample of insects in the selected metapopulations. This hierarchical sampling can lead to complex dependencies between the individual-level responses, and we are likely to see correlation between the responses within a given metapopulation.
4. Aggregation: here the individual-level responses are grouped into a response at a higher, aggregate, level. The aggregation process may be known, but more...
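Two of the ideas above can be made concrete with a short sketch: a common way to gauge overdispersion in proportion data (Pearson's X² divided by its residual degrees of freedom, as used in quasi-likelihood fitting), and the standard variance inflation factor 1 + (m − 1)ρ that arises when the m binary responses in a batch share a pairwise correlation ρ. The sketch assumes a binomial model with one common response probability; it is an illustration, not necessarily the chapter's own method, and the function names are hypothetical.

```python
def pearson_dispersion(successes: list[int], trials: list[int]) -> float:
    """Pearson X^2 / (n - p) for a binomial model with a single common
    probability (p = 1 parameter); values well above 1 suggest overdispersion."""
    pi_hat = sum(successes) / sum(trials)
    x2 = sum((y - m * pi_hat) ** 2 / (m * pi_hat * (1.0 - pi_hat))
             for y, m in zip(successes, trials))
    return x2 / (len(successes) - 1)


def batch_variance_inflation(m: int, rho: float) -> float:
    """Var(Y) / [m * pi * (1 - pi)] for the number responding in a batch of m
    insects whose binary responses have pairwise correlation rho."""
    return 1.0 + (m - 1) * rho


# Four batches of 10 insects with very uneven responses.
print(pearson_dispersion([2, 8, 0, 10], [10, 10, 10, 10]))
print(batch_variance_inflation(10, 0.1))
```

Even a modest within-batch correlation inflates the variance noticeably when batches are large, which is why correlated responses are a common source of overdispersion.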