A classic data-analytic problem is the statistical evaluation of the distributional form of interval-scale scores. The investigator may need to know whether the scores originate from a single Gaussian distribution or from a mixture of Gaussian distributions or from a different probability distribution. The relative merits of extant goodness-of-fit metrics are discussed. Monte Carlo power analyses are provided for several of the more powerful goodness-of-fit metrics.

The goodness-of-fit problem in statistics is a general issue that has relevance for research both in psychology and in other disciplines. In this paper, a review of this problem will be outlined. In part, the need for such a review is to close the gap that exists between current practice in psychological research and what is known about the goodness-of-fit problem in statistics. Recent developments now make possible the utilization of more powerful tools for evaluating the goodness-of-fit. In particular, the focus of this paper will be on cases in which the researcher has interval- or ratio-scale measures.

Given interval- or ratio-scale measurements, $X_1, \ldots, X_n$, the goodness-of-fit problem is concerned with the question of whether or not these scores originate from a particular probability distribution function. For example, a set of $n$ reaction time measurements might be modeled by a log-Gaussian distribution or an ex-Gaussian distribution or a mixture distribution of different stochastic processes (see Luce, 1986). We need an answer to this type of question in order to understand the underlying psychological processes. Moreover, if the reaction time values are to be used to fit a psychological model, the likelihood function of the reaction times needs to be specified because all methods of parameter estimation require knowledge of the likelihood function (i.e., the distributional nature of the interval-scale scores).
Consequently, the goodness-of-fit evaluation of any proposal about the distribution is a very important initial step toward understanding the psychological processes.

Correspondence concerning this article should be addressed to R. A. Chechile, Psychology Department, Tufts University, Medford, MA 02155 (e-mail: rchechil@emerald.tufts.edu).

There is a large body of statistical research on the goodness-of-fit problem, but many of these developments unfortunately have yet to be exported to psychology. There are a large number of goodness-of-fit procedures. All of these methods control for the Type I error rate, but there are considerable differences among the methods in regard to power. Generally, current practice in psychology employs only the older goodness-of-fit metrics, which have relatively low power. It is not convincing to argue that the hypothesized model (i.e., the null hypothesis in a goodness-of-fit test) is reasonable because of a nonsignificant test statistic that is known to be low in power.
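The concern about low power can be made concrete with a small Monte Carlo sketch. The code below estimates how often two goodness-of-fit statistics reject a Gaussian null hypothesis when the scores actually come from an ex-Gaussian distribution. Everything specific in it is an assumption chosen for illustration and is not taken from this paper: the two statistics (a five-bin Pearson $X^2$ and a Kolmogorov-Smirnov distance, both computed with estimated mean and standard deviation), the ex-Gaussian parameter values, the sample size, and all function names. Critical values are calibrated by simulation under the null rather than read from tables, so that both statistics are held to the same nominal Type I error rate.

```python
import random
from statistics import NormalDist, mean, stdev


def pearson_x2(scores, n_bins=5):
    """Pearson X^2 against a fitted Gaussian, using equiprobable bins."""
    n = len(scores)
    fitted = NormalDist(mean(scores), stdev(scores))
    counts = [0] * n_bins
    for x in scores:
        # The fitted CDF maps each score into [0, 1]; cut that into equal bins.
        counts[min(int(fitted.cdf(x) * n_bins), n_bins - 1)] += 1
    # Algebraically equal to sum((O - E)^2 / E) with E = n / n_bins, but the
    # integer numerator keeps tied values of the statistic exactly equal.
    return sum((c * n_bins - n) ** 2 for c in counts) / (n_bins * n)


def ks_distance(scores):
    """Largest gap between the empirical CDF and a fitted Gaussian CDF."""
    n = len(scores)
    fitted = NormalDist(mean(scores), stdev(scores))
    d = 0.0
    for i, x in enumerate(sorted(scores)):
        f = fitted.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def critical_value(stat, n, rng, reps=2000, alpha=0.05):
    """Simulated upper-alpha critical value of `stat` under a Gaussian null.

    Both statistics are location-scale invariant because the mean and SD
    are estimated, so standard normal samples suffice for calibration.
    """
    null_stats = sorted(
        stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    return null_stats[min(int((1 - alpha) * reps), reps - 1)]


def power(stat, crit, sampler, n, rng, reps=2000):
    """Proportion of simulated samples whose statistic exceeds `crit`."""
    rejections = sum(
        stat([sampler(rng) for _ in range(n)]) > crit for _ in range(reps)
    )
    return rejections / reps


def ex_gaussian(rng, mu=0.4, sigma=0.05, tau=0.2):
    """One ex-Gaussian draw: Gaussian plus independent exponential (mean tau).

    The parameter values are hypothetical, loosely on a seconds scale.
    """
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 40  # hypothetical sample size
    for name, stat in [("Pearson X^2", pearson_x2), ("KS distance", ks_distance)]:
        crit = critical_value(stat, n, rng)
        estimate = power(stat, crit, ex_gaussian, n, rng)
        print(f"{name}: estimated power = {estimate:.2f}")
```

Passing a Gaussian sampler such as `lambda r: r.gauss(0.0, 1.0)` to `power` gives a check on the Type I error rate, which should land near the nominal `alpha`. The estimated power values depend on the assumed alternative, sample size, and bin count, so they describe only this hypothetical setup.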
Nominal-Scale Goodness-of-Fit Measures

The two old goodness-of-fit warriors are the $X^2$ (K. Pearson, 1900) and the $G^2$ (Neyman & E. S. Pearson, ...