Tests for departure from normality: Comparison of powers

Pearson, E. S.; D’Agostino, Ralph B.; Bowman, K. O.

doi:10.1093/biomet/64.2.231

Cited by 234 publications

(59 citation statements)

References 15 publications


“…Some of these tests are constructed to be applied under certain conditions or assumptions. Extensive studies on the Type I error rate and power of these normality tests have been discussed in [1][2][3][4][5][6][7][8][9]. Most of these comparisons were carried out using selected normality tests and selected small sample sizes.…”

Section: Introduction (mentioning; confidence: 99%)

Comparisons of various types of normality tests

Wah, Sim (2011). Journal of Statistical Computation and Simulation

613

321


Normality tests can be classified into tests based on chi-squared, moments, the empirical distribution, spacings, regression and correlation, and other special tests. This paper studies and compares the power of eight selected normality tests: the Shapiro-Wilk test, the Kolmogorov-Smirnov test, the Lilliefors test, the Cramér-von Mises test, the Anderson-Darling test, the D'Agostino-Pearson test, the Jarque-Bera test and the chi-squared test. Power comparisons of these eight tests were obtained via Monte Carlo simulation of sample data generated from symmetric short-tailed, symmetric long-tailed and asymmetric alternative distributions. Our simulation results show that for symmetric short-tailed distributions, the D'Agostino and Shapiro-Wilk tests have better power. For symmetric long-tailed distributions, the power of the Jarque-Bera and D'Agostino tests is quite comparable with that of the Shapiro-Wilk test. For asymmetric distributions, the Shapiro-Wilk test is the most powerful test, followed by the Anderson-Darling test.
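The abstract above estimates power by Monte Carlo simulation. The same loop gives the Type I error rate when the samples are drawn from a normal distribution. The following is a minimal Python sketch, not the study's code, covering two of the eight tests: Jarque-Bera (moment based) and Anderson-Darling (empirical-distribution based). It rests on three assumptions:

- The JB cutoff is the asymptotic chi-squared(2) 5% point, which is only approximate at small n.
- The AD cutoff 0.752 is Stephens' 5% point for the modified statistic A*² when the mean and variance are estimated.
- The uniform, Laplace and exponential samplers are hypothetical stand-ins for the three alternative families, not the distributions used in the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Assumed 5% critical values (see lead-in).
# JB: upper 5% point of chi-squared with 2 df; for 2 df it equals -2 ln(0.05).
JB_CRIT = -2.0 * math.log(0.05)
# Modified Anderson-Darling A*^2 with mean and sd estimated (Stephens).
AD_CRIT = 0.752


def jarque_bera_rejects(sample):
    """Reject normality if JB = n/6 * (S^2 + (K - 3)^2 / 4) exceeds JB_CRIT."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb > JB_CRIT


def anderson_darling_rejects(sample):
    """Reject normality if the modified A*^2 statistic exceeds AD_CRIT."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    # Fitted normal CDF at the ordered data, clamped so the logs stay finite.
    u = [min(max(STD_NORMAL.cdf((x - mean) / sd), 1e-15), 1 - 1e-15)
         for x in sorted(sample)]
    # A^2 = -n - (1/n) * sum (2i - 1) [ln u_i + ln(1 - u_{n+1-i})], 0-based here.
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return a2_star > AD_CRIT


# Hypothetical samplers: the null plus one alternative per family in the abstract.
SAMPLERS = {
    "normal (null)": lambda rng: rng.gauss(0.0, 1.0),
    "uniform (symmetric short-tailed)": lambda rng: rng.uniform(0.0, 1.0),
    "Laplace (symmetric long-tailed)":
        lambda rng: rng.expovariate(1.0) - rng.expovariate(1.0),
    "exponential (asymmetric)": lambda rng: rng.expovariate(1.0),
}


def rejection_rate(rejects, draw, n, reps=2000, seed=1):
    """Fraction of simulated size-n samples for which the test rejects.
    Under the normal sampler this estimates the Type I error rate;
    under an alternative it estimates power."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if rejects(sample):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    tests = {"Jarque-Bera": jarque_bera_rejects,
             "Anderson-Darling": anderson_darling_rejects}
    for name, draw in SAMPLERS.items():
        for test_name, rejects in tests.items():
            rate = rejection_rate(rejects, draw, n=30)
            print(f"{name:34s} {test_name:16s} {rate:.3f}")
```

Each rate is a binomial proportion. With 2000 replications its Monte Carlo standard error is therefore at most about 0.011, that is, sqrt(0.25/2000).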



“…For an empirical comparison of the performances of our tests we use some alternatives and tests choosing from Pearson et al [4] with numbering using there (cf. Table 1) and studied in Morris and Szynal [2].…”

Section: Simulation Results (mentioning; confidence: 99%)

“…Tests and alternatives are taken from Pearson et al [4] as it was done in Morris and Szynal [2] where tests for normality are based on characterizations involving moments of order statistics.…”

Section: Introduction (mentioning; confidence: 99%)

Simulation Supplement to Goodness-of-Fit Tests Derived From Characterizations of Continuous Distributions via Record Values

Szynal¹, …ński² (2013)

Int. J. of Pure and Appl. Math.


“…A more successful approach was that of the Wilk (1965, 1972) W statistic and the Shapiro and Francia (1972) W' statistic. These statistics have been shown to result in high power (see E. S. Pearson, D' Agostino, & Bowman, 1977;Shapiro, Wilk, & Chen, 1968;and Stephens, 1974). Unfortunately, these statistical tests require the evaluation of expected-order statistics, and those calculations are computationally elaborate.…”

Section: Interval-scale Metrics (mentioning; confidence: 99%)
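The quoted passage notes that W and W' depend on expected normal order statistics, which are expensive to compute. A common shortcut is Blom's approximation, m_i ≈ Φ⁻¹((i − 3/8)/(n + 1/4)); this is an assumption here, not something taken from the cited papers. Below is a minimal Python sketch of the Shapiro-Francia W' built on that approximation. The names blom_scores and shapiro_francia are illustrative.

```python
from statistics import NormalDist


def blom_scores(n):
    """Blom's approximation to the expected values of standard normal
    order statistics: m_i = Phi^{-1}((i - 3/8) / (n + 1/4)), i = 1..n."""
    inv = NormalDist().inv_cdf
    return [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia(sample):
    """W' = (sum m_i x_(i))^2 / (sum m_i^2 * sum (x_i - mean)^2).
    The Blom scores are symmetric about 0, so this equals the squared
    correlation between the ordered sample and the scores."""
    n = len(sample)
    x = sorted(sample)
    m = blom_scores(n)
    mean = sum(x) / n
    num = sum(mi * xi for mi, xi in zip(m, x)) ** 2
    den = sum(mi * mi for mi in m) * sum((xi - mean) ** 2 for xi in x)
    return num / den


if __name__ == "__main__":
    print(shapiro_francia([2.1, 2.9, 3.2, 3.3, 3.8, 4.0, 4.4, 5.1]))
    print(shapiro_francia(list(range(19)) + [1000]))  # one gross outlier
```

Values close to 1 are consistent with normality, and smaller values indicate a departure. The sketch stops at the statistic. Turning W' into a p-value needs tabulated percentage points or a published approximation, neither of which is reproduced here.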

Reexamining the goodness-of-fit problem for interval-scale scores

Chechile (1998). Behavior Research Methods, Instruments, & Computers


A classic data-analytic problem is the statistical evaluation of the distributional form of interval-scale scores. The investigator may need to know whether the scores originate from a single Gaussian distribution, from a mixture of Gaussian distributions, or from a different probability distribution. The relative merits of extant goodness-of-fit metrics are discussed. Monte Carlo power analyses are provided for several of the more powerful goodness-of-fit metrics.

The goodness-of-fit problem in statistics is a general issue that has relevance for research both in psychology and in other disciplines. In this paper, a review of this problem will be outlined. In part, the need for such a review is to close the gap that exists between current practice in psychological research and what is known about the goodness-of-fit problem in statistics. Recent developments now make possible the utilization of more powerful tools for evaluating the goodness-of-fit. In particular, the focus of this paper will be on cases in which the researcher has interval- or ratio-scale measures.

Given interval- or ratio-scale measurements X₁, ..., Xₙ, the goodness-of-fit problem is concerned with the question of whether or not these scores originate from a particular probability distribution function. For example, a set of n reaction time measurements might be modeled by a log-Gaussian distribution, an ex-Gaussian distribution, or a mixture distribution of different stochastic processes (see Luce, 1986). We need an answer to this type of question in order to understand the underlying psychological processes. Moreover, if the reaction time values are to be used to fit a psychological model, the likelihood function of the reaction times needs to be specified, because all methods of parameter estimation require knowledge of the likelihood function (i.e., the distributional nature of the interval-scale scores). Consequently, the goodness-of-fit evaluation of any proposal about the distribution is a very important initial step toward understanding the psychological processes.

There is a large body of statistical research on the goodness-of-fit problem, but many of these developments unfortunately have yet to be exported to psychology. There are a large number of goodness-of-fit procedures. All of these methods control for the Type I error rate, but there are considerable differences among the methods in regard to power. Generally, current practice in psychology employs only the older goodness-of-fit metrics, which have relatively low power. It is not convincing to argue that the hypothesized model (i.e., the null hypothesis in a goodness-of-fit test) is reasonable because of a nonsignificant test statistic that is known to be low in power.

Nominal-Scale Goodness-of-Fit Measures

The two old goodness-of-fit warriors are the X² (K. Pearson, 1900) and the G² (Neyman & E. S. Pearson, ...

Correspondence concerning this article should be addressed to R. A. Chechile, Psychology Department, Tufts University, Medford, MA 02155 (e-mail: rchechil@emerald.tufts.edu).
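The excerpt names Pearson's X² and the likelihood-ratio G² as the two older goodness-of-fit measures for nominal (binned) data. The sketch below assumes the scores have already been grouped into cells with observed counts O_i and expected counts E_i. It then computes X² = Σ (O_i − E_i)²/E_i and G² = 2 Σ O_i ln(O_i/E_i). The helper equiprobable_counts is hypothetical. It shows one way to bin interval-scale scores into k cells that are equally likely under a normal distribution fitted by the sample mean and standard deviation.

```python
import bisect
import math
from statistics import NormalDist, fmean, stdev


def pearson_x2(observed, expected):
    """Pearson's X^2 = sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_g2(observed, expected):
    """G^2 = 2 * sum O * ln(O / E); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def equiprobable_counts(sample, k):
    """Hypothetical helper: bin the sample into k cells that are equally
    likely under a normal fitted by the sample mean and sd.
    Returns (observed counts, expected counts)."""
    fitted = NormalDist(fmean(sample), stdev(sample))
    cuts = [fitted.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in sample:
        observed[bisect.bisect_right(cuts, x)] += 1
    expected = [len(sample) / k] * k
    return observed, expected


if __name__ == "__main__":
    obs, exp = [18, 22, 30, 30], [25.0, 25.0, 25.0, 25.0]
    print(pearson_x2(obs, exp))
    print(likelihood_ratio_g2(obs, exp))
```

For the counts in the usage example, X² = (49 + 9 + 25 + 25)/25 = 4.32. When the normal mean and standard deviation are estimated from the ungrouped scores, the usual k − 3 degrees of freedom is only an approximation. For that reason the sketch stops at the statistics rather than reporting p-values.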
