The likelihood ratio test statistic G²(dif) is widely used for comparing the fit of nested models in categorical data analysis. In large samples, this statistic is distributed as a chi-square with degrees of freedom equal to the difference in degrees of freedom between the tested models, but only if the least restrictive model is correctly specified. Yet this statistic is often used in applications without assessing the adequacy of the least restrictive model. This may result in incorrect substantive conclusions, as the above large-sample reference distribution for G²(dif) is then no longer appropriate; rather, its large-sample distribution will depend on the degree of misspecification of the least restrictive model. To illustrate this, a simulation study is performed in which this statistic is used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G²(dif) was found to be robust only under small misspecification of the least restrictive model. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before employing G²(dif) to assess relative model fit.

The two most widely used statistics for assessing the goodness of fit of a model fitted to a contingency table are Pearson's X² statistic and the likelihood ratio statistic G². Under the null hypothesis that the tested model holds in the popula-

MULTIVARIATE BEHAVIORAL RESEARCH, 41(1), 55–64
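The nested-model comparison described above can be sketched numerically. The following is a minimal illustration, not the authors' simulation code: the observed counts, the two sets of fitted expected counts, and the degrees-of-freedom difference are all hypothetical. It computes G² for each model and refers G²(dif) to a chi-square distribution with df equal to the difference in model degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def g2(observed, expected):
    """Likelihood ratio statistic G^2 for a contingency table."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return 2.0 * np.sum(observed * np.log(observed / expected))

# Hypothetical observed counts and fitted (expected) counts under two
# nested models; the numbers below are illustrative only.
observed = np.array([30, 70, 50, 50])
expected_restrictive = np.array([25, 75, 55, 45])  # more restrictive model
expected_general = np.array([29, 71, 51, 49])      # least restrictive model

g2_restrictive = g2(observed, expected_restrictive)
g2_general = g2(observed, expected_general)

# G^2(dif): the drop in fit from the least restrictive to the more
# restrictive model, referred to a chi-square whose df equals the
# difference in degrees of freedom between the two models.
g2_dif = g2_restrictive - g2_general
df_dif = 2  # hypothetical difference in degrees of freedom
p_value = chi2.sf(g2_dif, df_dif)
```

Note that the validity of `p_value` as computed here rests on the least restrictive model being correctly specified; as the article argues, when that model is misspecified, the chi-square reference distribution for G²(dif) no longer applies.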