We need to make substantial changes to how we conduct research. First, in response to heightened concern that our published research literature is incomplete and untrustworthy, we need new requirements to ensure research integrity. These include prespecification of studies whenever possible, avoidance of selection and other inappropriate data-analytic practices, complete reporting, and encouragement of replication. Second, in response to renewed recognition of the severe flaws of null-hypothesis significance testing (NHST), we need to shift from reliance on NHST to estimation and other preferred techniques. The new statistics refers to recommended practices, including estimation based on effect sizes, confidence intervals, and meta-analysis. The techniques are not new, but adopting them widely would be new for many researchers, as well as highly beneficial. This article explains why the new statistics are important and offers guidance for their use. It describes an eight-step new-statistics strategy for research with integrity, which starts with formulation of research questions in estimation terms, has no place for NHST, and is aimed at building a cumulative quantitative discipline.

Keywords: research integrity, the new statistics, estimation, meta-analysis, replication, statistical analysis, research methods
Wider use in psychology of confidence intervals (CIs), especially as error bars in figures, is a desirable development. However, psychologists seldom use CIs and may not understand them well. The authors discuss the interpretation of figures with error bars and analyze the relationship between CIs and statistical significance testing. They propose 7 rules of eye to guide the inferential use of figures with error bars. These include general principles: Seek bars that relate directly to effects of interest, be sensitive to experimental design, and interpret the intervals. They also include guidelines for inferential interpretation of the overlap of CIs on independent group means. Wider use of interval estimation in psychology has the potential to improve research communication substantially.
Error bars commonly appear in figures in publications, but experimental biologists are often unsure how they should be used and interpreted. In this article we illustrate some basic features of error bars and explain how they can help communicate data and assist correct interpretation. Error bars may show confidence intervals, standard errors, standard deviations, or other quantities. Different types of error bars give quite different information, and so figure legends must make clear what error bars represent. We suggest eight simple rules to assist with effective use and interpretation of error bars.
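Because the same-looking bars can represent quite different quantities, here is a minimal sketch in Python (sample values are made up; NumPy and SciPy assumed) of how the three most common quantities are computed for a single sample, and how different their lengths are:

```python
# Illustrative sketch (invented data) of the three common error-bar quantities.
import numpy as np
from scipy.stats import t

data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4])
n = len(data)
sd = data.std(ddof=1)              # descriptive: spread of the data
se = sd / np.sqrt(n)               # inferential: precision of the mean
ci_arm = t.ppf(0.975, n - 1) * se  # arm length of the 95% CI on the mean
print(f"mean={data.mean():.2f}  SD={sd:.2f}  SE={se:.2f}  95% CI arm={ci_arm:.2f}")
```

The SD bar is several times longer than the SE bar, and the 95% CI arm is roughly twice the SE, which is exactly why a figure legend must say which quantity the bars show.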
When 95% confidence intervals (CIs) on independent means do not overlap, the two-tailed p value is less than .05 and there is a statistically significant difference between the means. However, p for non-overlapping 95% CIs is actually considerably smaller than .05: If the two CIs just touch, p is about .01, and the intervals can overlap by as much as about half the length of one CI arm before p becomes as large as .05. Keeping in mind this rule (overlap of half the length of one arm corresponds approximately to statistical significance at p = .05) can be helpful for a quick appreciation of figures that display CIs, especially if precise p values are not reported. The author investigated the robustness of this and similar rules, and found them sufficiently accurate when sample sizes are at least 10 and the two intervals do not differ in width by more than a factor of 2. The author reviewed previous discussions of CI overlap and extended the investigation to p values other than .05 and .01. He also studied 95% CIs on two proportions, and on two Pearson correlations, and found that similar rules apply to overlap of these asymmetric CIs for a very broad range of cases. Wider use of figures with 95% CIs is desirable, and these rules may assist easy and appropriate understanding of such figures.
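The rule can be checked directly in the large-sample normal case. The sketch below is mine, not the author's code; it assumes two independent means with equal-width 95% CIs and uses SciPy:

```python
# Quick check of the overlap rule for two independent means with
# equal-width 95% CIs, in the large-sample normal approximation.
from scipy.stats import norm

z95 = norm.ppf(0.975)  # ~1.96; each CI arm has length w = z95 * SE
for overlap, label in [(0.0, "just touching"), (0.5, "half-arm overlap")]:
    # Gap between the means, in SE units: two arms minus the overlap.
    gap = (2 - overlap) * z95
    # SE of the difference of two independent means with equal SE is sqrt(2)*SE.
    z = gap / 2 ** 0.5
    p = 2 * norm.sf(z)
    print(f"{label}: two-tailed p = {p:.3f}")
# just touching: p ≈ 0.006 (about .01, as the rule states)
# half-arm overlap: p ≈ 0.038 (close to .05)
```

With t distributions and small samples the exact values shift somewhat, which is why the article frames these as rules of thumb and checks their robustness across sample sizes and interval widths.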
Replication is fundamental to science, so statistical analysis should give information about replication. Because p values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is "Surprisingly little." In one simulation of 25 repetitions of a typical experiment, p varied from <.001 to .76, thus illustrating that p is a very unreliable measure. This article shows that, if an initial experiment results in two-tailed p = .05, there is an 80% chance the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. Remarkably, the interval (termed a p interval) is this wide however large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference. Confidence intervals, however, give much better information about replication. Researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
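Under the assumptions the article works with (normal sampling, a replication of the same size, and no prior knowledge of the true effect), the 80% p interval for an initial two-tailed p = .05 can be reproduced in a few lines. A minimal sketch assuming SciPy:

```python
# Reproduce the 80% p interval for an initial two-tailed p = .05.
from math import sqrt
from scipy.stats import norm

z0 = norm.ppf(1 - 0.05 / 2)  # initial result corresponds to z = 1.96
# Given z0 and no prior knowledge of the true effect, a replication's z
# is distributed N(z0, sqrt(2)): one SE of noise from each experiment.
z_lo = z0 - norm.ppf(0.9) * sqrt(2)
z_hi = z0 + norm.ppf(0.9) * sqrt(2)
# One-tailed replication p values at the 90th and 10th percentiles of z:
print(norm.sf(z_hi), norm.sf(z_lo))  # ~0.00008 and ~0.44
```

Note that nothing in the calculation depends on n: a larger sample shrinks the SE of both experiments equally, so the p interval stays just as wide, which is the article's central point.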
Elicitation of expert opinion is important for risk analysis when only limited data are available. Expert opinion is often elicited in the form of subjective confidence intervals; however, these are prone to substantial overconfidence. We investigated the influence of elicitation question format, in particular the number of steps in the elicitation procedure. In a 3-point elicitation procedure, an expert is asked for a lower limit, upper limit, and best guess, the two limits creating an interval of some assigned confidence level (e.g., 80%). In our 4-step interval elicitation procedure, experts were also asked for a realistic lower limit, upper limit, and best guess, but no confidence level was assigned; the fourth step was to rate their anticipated confidence in the interval produced. In our three studies, experts made interval predictions of rates of infectious diseases (Study 1, n = 21 and Study 2, n = 24: epidemiologists and public health experts) or marine invertebrate populations (Study 3, n = 34: ecologists and biologists). We combined the results from our studies using meta-analysis, which found average overconfidence of 11.9%, 95% CI [3.5, 20.3] (a hit rate of 68.1% for 80% intervals), a substantial decrease in overconfidence compared with previous studies. Studies 2 and 3 suggest that the 4-step procedure is more likely to reduce overconfidence than the 3-point procedure (Cohen's d = 0.61, [0.04, 1.18]).
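As a toy illustration of the scoring (the data below are invented), overconfidence is simply the assigned confidence level minus the observed hit rate, i.e., the fraction of elicited intervals that capture the realized value:

```python
# Hypothetical scoring of 80% intervals elicited from one expert.
intervals = [(2, 9), (1, 4), (10, 30), (0, 3), (5, 8)]  # elicited (lower, upper)
truth = [7, 6, 22, 2, 11]                               # realized values
hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, truth))
hit_rate = hits / len(truth)
print(f"hit rate = {hit_rate:.0%}, overconfidence = {0.80 - hit_rate:+.0%}")
# hit rate = 60%, overconfidence = +20%
```

On this scale, the meta-analytic result above (hit rate 68.1% against a nominal 80%) corresponds to the reported 11.9% average overconfidence.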
Little is known about researchers' understanding of confidence intervals (CIs) and standard error (SE) bars. Authors of journal articles in psychology, behavioral neuroscience, and medicine were invited to visit a Web site where they adjusted a figure until they judged 2 means, with error bars, to be just statistically significantly different (p < .05). Results from 473 respondents suggest that many leading researchers have severe misconceptions about how error bars relate to statistical significance, do not adequately distinguish CIs and SE bars, and do not appreciate the importance of whether the 2 means are independent or come from a repeated measures design. Better guidelines for researchers and less ambiguous graphical conventions are needed before the advantages of CIs for research communication can be realized.
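The independence point is easy to demonstrate: in the made-up paired data below, 95% CIs on the two individual means would overlap almost completely, yet the repeated measures test on the paired differences is decisive. A sketch assuming NumPy and SciPy:

```python
# Why design matters: heavily overlapping CIs on two means, tiny paired p.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
before = rng.normal(50, 10, 20)           # large between-person variability
after = before + rng.normal(2, 1, 20)     # small but highly consistent gain
print(ttest_rel(after, before).pvalue)    # very small p despite overlapping CIs
```

For a repeated measures design, inference rests on the variability of the within-person differences, so error bars on the separate means are simply not the relevant intervals, which is one of the misconceptions the survey documents.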
Reform of statistical practice in the social and behavioral sciences requires wider use of confidence intervals (CIs), effect size measures, and meta-analysis. The authors discuss four reasons for promoting use of CIs: They (a) are readily interpretable, (b) are linked to familiar statistical significance tests, (c) can encourage meta-analytic thinking, and (d) give information about precision. The authors discuss calculation of CIs for a basic standardized effect size measure, Cohen’s δ (also known as Cohen’s d), and contrast these with the familiar CIs for original score means. CIs for δ require use of noncentral t distributions, which the authors apply also to statistical power and simple meta-analysis of standardized effect sizes. They provide the ESCI graphical software, which runs under Microsoft Excel, to illustrate the discussion. Wider use of CIs for δ and other effect size measures should help promote highly desirable reform of statistical practice in the social sciences.
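The sketch below is not ESCI itself, but a minimal illustration of the noncentral-t construction the authors describe, for the two-independent-groups case; the function name and sample values are mine, and SciPy is assumed:

```python
# CI for Cohen's d (two independent groups) via noncentral t distributions.
from scipy.stats import nct
from scipy.optimize import brentq

def cohens_d_ci(d, n1, n2, conf=0.95):
    """Invert the noncentral t CDF to bracket the noncentrality parameter."""
    df = n1 + n2 - 2
    scale = (n1 * n2 / (n1 + n2)) ** 0.5  # maps delta to noncentrality
    t_obs = d * scale
    alpha = (1 - conf) / 2
    # Lower limit: noncentrality for which t_obs sits at the upper alpha tail.
    lo = brentq(lambda nc: nct.sf(t_obs, df, nc) - alpha, -50, 50)
    # Upper limit: noncentrality for which t_obs sits at the lower alpha tail.
    hi = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha, -50, 50)
    return lo / scale, hi / scale

print(cohens_d_ci(0.5, 30, 30))  # roughly (-0.02, 1.02)
```

Unlike the familiar symmetric CI on an original-score mean, this interval is not centered on d, because the noncentral t distribution is skewed; that asymmetry is exactly why the noncentral machinery is needed.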