Repeated investigations of the same phenomenon typically yield effect sizes that vary more than one would expect from sampling error alone. Such variation is found even in exact replication studies, suggesting that it is due not only to identifiable moderators but also to subtler random variation across studies. This heterogeneity of effect sizes is typically ignored, with unfortunate consequences. We consider its implications for power analyses, the precision of estimated effects, and the planning of original and replication research. When effects are heterogeneous and the goal is to generalize to a population of studies, the usual power calculations and confidence intervals are likely to be misleading, and the preference for single definitive large-N studies is misguided. Researchers and methodologists need to recognize that effects are often heterogeneous and plan accordingly.
Translational Abstract
Repeated investigations of the same phenomenon typically yield somewhat different results that vary more than one would expect from the fact that the investigations have different participants. Such variation is found even when the very same phenomenon is studied, which implies that there may be subtle random variation across studies. This heterogeneity of effects is typically ignored, with unfortunate consequences. We consider its implications for determining the chances that a study will have a statistically significant effect, the statistical precision of estimated effects, and the planning of original and replication research. Researchers and methodologists need to recognize that effects are often heterogeneous and plan accordingly.
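The claim that the usual power calculations mislead under heterogeneity can be illustrated with a small calculation. The sketch below assumes a simple normal random-effects model that is not spelled out in the abstract. Each study's true standardized effect is drawn from a normal distribution with mean `delta` and standard deviation `tau`. The observed effect has a sampling variance of roughly 4/N for a two-group design with equal groups. The analyst tests that effect against its sampling error alone, ignoring `tau`. The function names and the example values (`delta = 0.4`, `tau = 0.2`) are hypothetical choices for illustration. Under these assumptions, power in the predicted direction no longer rises toward 1 as N grows. Instead it levels off near Φ(delta / tau), which is one way to see why a single very large study cannot be assumed to settle the question.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def power_with_heterogeneity(delta: float, tau: float, n_total: int,
                             alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate probability that one two-group study
    yields a significant effect in the predicted (positive) direction.

    Illustrative assumptions, not taken from the source:
      - the study's true effect is drawn from N(delta, tau**2);
      - the observed d has sampling variance of about 4 / n_total
        (equal group sizes, ignoring the small d**2 correction);
      - the analyst runs a two-sided z test using the sampling SE only.
    """
    se = (4.0 / n_total) ** 0.5                  # sampling SE of d
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    total_sd = (tau ** 2 + se ** 2) ** 0.5       # SD of observed d across studies
    # Significant in the predicted direction when the observed d exceeds z_crit * se.
    return 1 - STD_NORMAL.cdf((z_crit * se - delta) / total_sd)


def power_ceiling(delta: float, tau: float) -> float:
    """Hypothetical helper: limit of the power above as n_total grows without bound."""
    if tau == 0:
        return 1.0  # without heterogeneity, power approaches 1 for any delta > 0
    return STD_NORMAL.cdf(delta / tau)


if __name__ == "__main__":
    # Compare power without (tau = 0) and with (tau = 0.2) heterogeneity.
    for n in (100, 400, 1600, 6400):
        print(n,
              round(power_with_heterogeneity(0.4, 0.0, n), 3),
              round(power_with_heterogeneity(0.4, 0.2, n), 3))
    print("ceiling with tau = 0.2:", round(power_ceiling(0.4, 0.2), 3))
```

Running the script shows power increasing steadily with N when `tau = 0`. When `tau = 0.2`, power flattens toward the ceiling instead. The normal approximation keeps the sketch dependency-free. An exact analysis would use the noncentral t distribution.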