2017
DOI: 10.1525/collabra.71

Too Good to be False: Nonsignificant Results Revisited

Abstract: Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This might be unwarranted, since reported statistically nonsignificant findings may just be 'too good to be false'. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher test to detect the presence of at least one false negative …
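The adapted Fisher test mentioned in the abstract combines nonsignificant p-values after rescaling them to the unit interval. A minimal sketch, assuming the rescaling p* = (p − α)/(1 − α) with α = .05 and the standard Fisher combination (χ² with 2k degrees of freedom); the function name and example p-values are illustrative, not the authors' exact implementation:

```python
import math

from scipy import stats

def fisher_nonsignificant(p_values, alpha=0.05):
    """Test for evidence of at least one false negative among k
    nonsignificant results (sketch of the adapted Fisher method).

    Each nonsignificant p-value is rescaled to the unit interval,
    p* = (p - alpha) / (1 - alpha), and the rescaled values are
    combined with Fisher's method: chi2 = -2 * sum(ln p*), which is
    chi-square distributed with 2k degrees of freedom when all k
    results are true negatives.
    """
    nonsig = [p for p in p_values if p > alpha]
    if not nonsig:
        raise ValueError("no nonsignificant p-values to combine")
    rescaled = [(p - alpha) / (1 - alpha) for p in nonsig]
    chi2 = -2 * sum(math.log(p) for p in rescaled)
    df = 2 * len(rescaled)
    return chi2, df, stats.chi2.sf(chi2, df)

# Illustrative values: three nonsignificant p-values from one paper.
chi2, df, p = fisher_nonsignificant([0.060, 0.120, 0.250])
print(f"chi2({df}) = {chi2:.2f}, p = {p:.3f}")
```

With these illustrative inputs the combined test is significant (p ≈ .008): the nonsignificant p-values cluster closer to .05 than chance would predict, which is the sense in which results can be 'too good to be false'.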

Cited by 39 publications (47 citation statements)
References 69 publications (94 reference statements)

Citation statements:
“…Despite repeated recommendations to change this, there seems to have been no overall improvement (Fraley & Vazire, 2014; Hartgerink et al., 2017; Marszalek et al., 2011; Maxwell, 2004; Stanley, Carter, & Doucouliagos, 2017; Szucs & Ioannidis, 2017; but see Maddock & Rossi, 2001; Rossi, 1990).…”
Section: Power in Intelligence Research
mentioning
confidence: 99%
“…For instance, researchers might be interested in showing that the different groups of an experiment do not differ in a crucial confounding variable or in their pre-test scores (e.g., Hilgard, Engelhardt, Bartholow, & Rouder, 2017), or that suicide rates associated with a drug treatment for depression are no higher than in a placebo control group (Fergusson et al., 2006). Finding out whether these null results are true negatives is just as important as assessing the reliability of positive findings (Hartgerink, Wicherts, & van Assen, 2017), perhaps even more so, given that Null Hypothesis Significance Testing (NHST), as regularly implemented in psychological research, is poorly suited to assess the plausibility of the null hypothesis (Dienes, 2011, 2015; Hoekstra, Finch, Kiers, & Johnson, 2016). Furthermore, given the average low power of psychological research (Sedlmeier & Gigerenzer, 1989; Smaldino & McElreath, 2016), finding a null result is hardly surprising and often uninformative as to the veracity of the null hypothesis.…”
mentioning
confidence: 99%
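The quoted point that a null result is 'hardly surprising' under low power can be made concrete with a quick calculation. A minimal sketch using the normal approximation to a two-sided, two-sample t test; the effect size d = 0.4 and n = 30 per group are illustrative values, not taken from any cited study:

```python
import math

from scipy.stats import norm

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t test via the
    normal approximation: the test statistic is roughly
    N(d * sqrt(n/2), 1) under the alternative."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # approximate noncentrality
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

power = approx_power_two_sample(d=0.4, n_per_group=30)
print(f"power = {power:.2f}; P(nonsignificant | real effect) = {1 - power:.2f}")
```

Under these assumptions power is only about .34, so roughly two out of three such studies would return a nonsignificant result even though the effect is real, which is exactly why a single null result says little about the truth of the null hypothesis.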
“…This happened for instance for all primary studies that did not contain enough information to reproduce the effect size, for which we copied the reported effect size. The number of effect sizes we calculated that were larger than reported (k = 162) was approximately equal to those that we calculated that were smaller than reported (k = …). The most common reason for not being able to reproduce a primary study effect size was missing or unclear information in the meta-analysis (i.e., ambiguous effect sizes, k = 96, 19%).…”
[Figure residue: original vs. reproduced effect sizes (Hedges' g); degrees of freedom of reported test statistics in eight major psychology journals (Hartgerink et al., 2017); legend: Reproducible, k = 107, 43%; Incomplete, k = 40, 16%; Incorrect, k = 40, 16%; Ambiguous, k = 60, 24%.]
Section: Results
mentioning
confidence: 55%
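Reproducing a primary-study effect size from reported statistics, as described in the quote above, typically amounts to converting a reported test statistic into a standardized effect size. A minimal sketch for one common case, an independent-samples t statistic converted to Hedges' g; the function name and example values are illustrative:

```python
import math

def hedges_g_from_t(t, n1, n2):
    """Convert an independent-samples t statistic into Hedges' g.

    Cohen's d from t: d = t * sqrt(1/n1 + 1/n2). Hedges' g applies the
    usual small-sample correction J = 1 - 3 / (4*df - 1), df = n1+n2-2.
    """
    d = t * math.sqrt(1 / n1 + 1 / n2)
    df = n1 + n2 - 2
    return (1 - 3 / (4 * df - 1)) * d

# Illustrative values: t(62) = 2.10 with 32 participants per group.
print(f"g = {hedges_g_from_t(2.10, 32, 32):.3f}")
```

A reproduced g that differs from the reported one, as with the k = 162 cases counted above, can then stem from rounding, from using a different conversion formula, or from outright reporting errors.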
“…Next to checking whether certain primary study effect sizes were irreproducible due to incomplete, erroneous, or ambiguous reporting, we were also interested in quantifying whether the discrepancy between the reported and reproduced effect size estimates were small, moderate, or large. Effect sizes in various psychological fields (e.g., personality, social, developmental, clinical psychology, intelligence) show small, moderate, and large effect sizes corresponding approximately to r = 0.10, 0.25, and 0.35 (Gignac & Szodorai, 2016; Hartgerink, Wicherts, & van Assen, 2017). Which discrepancies were classified as small, moderate, or large depends on the type of effect size. We transformed our discrepancy measures for correlations r (small [≥ .025 and < .075], moderate [≥ .075 and < .125], and large [≥ .125]) to other types of effect sizes based on N = 64, relating to the 50th percentile of the…”
Section: N Y
mentioning
confidence: 99%
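The discrepancy cutoffs quoted above translate directly into a small classifier on the correlation scale. A minimal sketch, assuming absolute differences in r; the 'negligible' label for differences below .025 and the function name are my additions, and the quoted transformation to other effect-size types at N = 64 is omitted:

```python
def classify_discrepancy_r(r_reported, r_reproduced):
    """Classify |reported - reproduced| on the correlation scale using
    the quoted cutoffs: small [.025, .075), moderate [.075, .125),
    large >= .125."""
    diff = abs(r_reported - r_reproduced)
    if diff >= 0.125:
        return "large"
    if diff >= 0.075:
        return "moderate"
    if diff >= 0.025:
        return "small"
    return "negligible"  # below .025; label is my addition, not from the quote

print(classify_discrepancy_r(0.30, 0.21))  # |diff| = 0.09 -> "moderate"
```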