Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in science for want of viable alternatives. NHST, together with its significance level p, is inadequate for most of the uses to which it is put. This flaw is of particular concern to educational practitioners, who too often must rely on it to sanctify their research. In this article, we review the failures of NHST and propose p_rep, the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making.

Statistics can address three different types of questions (Royall, 1997):
1. What should I believe?
2. How should I evaluate this evidence?
3. What should I do?

The first two questions are of great importance to scientists: answering them defines their praxis. The last question, on which we focus here, is more relevant to practitioners, who must make decisions that have practical consequences. In its simplest form, a decision is a choice between two alternative courses of action. All other things being equal, optimal decisions favor courses of action that are expected to yield higher returns (e.g., better indices of class attendance) at lower cost over those that are expected to yield lower returns at higher cost. Choices in which one alternative dominates the other are trivial. Practical choices become dilemmas, calling for decision committees and statisticians, when costs and expected returns covary in the same direction. In that case, costly actions must be justified by returns that exceed some minimum standard of expected improvement. Once that minimum improvement is defined, researchers collect data from which statisticians, in turn, are expected to determine whether the minimum improvement is real. Null hypothesis significance tests are the conventional tool for making this evaluation. The privileged status of NHST is most clearly reflected in its prevalence as a diagnostic tool in the psychological and educational literature and in its entrenchment in the statistical training of education and psychology professionals. We illustrate its use by applying the NHST routine to a practical binary-choice situation and demonstrate its inadequacy for informing a decision in that scenario. We argue that the probability of obtaining a minimum cost-effective return is more informative than arbitrary decisions about statistical significance, and we provide the rationale and an algorithm for estimating it.
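The idea of a replication probability can be made concrete with a short calculation. The sketch below is a minimal Python illustration under stated assumptions, not necessarily the estimation procedure developed in this article. It assumes a normally distributed effect estimate, no prior information about the effect, and a replication with the same sample size as the original study, so that the difference between the original and the replicate effect carries sampling error from both studies (variance 2 times the squared standard error). Under those assumptions, a one-tailed p value converts to a replication probability through z = Phi^-1(1 - p) and p_rep = Phi(z / sqrt(2)). The function names `p_rep_from_p` and `prob_replicate_exceeds`, and the parameter `d_min` standing in for the minimum cost-effective improvement, are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1), used for Phi and its inverse.
_STD_NORMAL = NormalDist()
_SQRT2 = 2 ** 0.5


def p_rep_from_p(p_one_tailed: float) -> float:
    """Probability that a same-size replication yields an effect in the same direction.

    Assumed formulation: z = Phi^-1(1 - p), p_rep = Phi(z / sqrt(2)).
    """
    if not 0.0 < p_one_tailed < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(1.0 - p_one_tailed)
    return _STD_NORMAL.cdf(z / _SQRT2)


def prob_replicate_exceeds(d_obs: float, se: float, d_min: float = 0.0) -> float:
    """Probability that a same-size replication's effect exceeds d_min.

    Assumes the replicate effect is normal with mean d_obs and variance
    2 * se**2 (sampling error in both the original study and the replication).
    With d_min = 0 this reduces to p_rep_from_p(1 - Phi(d_obs / se)).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return _STD_NORMAL.cdf((d_obs - d_min) / (se * _SQRT2))
```

For example, `p_rep_from_p(0.05)` returns about 0.878. With a hypothetical observed effect of 0.5 standard deviations and a standard error of 0.25, `prob_replicate_exceeds(0.5, 0.25)` gives about 0.92, but `prob_replicate_exceeds(0.5, 0.25, d_min=0.3)` drops to about 0.71. The second number, the chance of clearing a minimum worth paying for, is the kind of quantity that bears on a cost-justified decision.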
The Null Hypothesis Significance Testing Routine

Imagine that you are asked to evaluate a method for teaching English as a second language (ESL). How would you decide whether this new method is better than the traditional one? First, you would collect data from two groups of students learning ESL, matched as closely as possible on all potentially relevant variables: one taught with the old method (Group OLD) and the other with the new method (Group NEW). At...