Measurement practices in large-scale replications: Insights from Many Labs 2.

2020. DOI: 10.1037/cap0000220

Abstract: Validity of measurement is integral to the interpretability of research endeavours and any subsequent replication attempts. To assess current measurement practices and the construct validity of measures in large-scale replication studies, we conducted a systematic review of measures used in Many Labs 2: Investigating Variation in Replicability Across Samples and Settings (Klein et al., 2018). To evaluate the psychometric properties of the scales used in Many Labs 2, we conducted factor and reliability analyses …
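The abstract truncates before the analytic details. As a minimal sketch of the kind of reliability analysis it describes (not the authors' code; the data and item counts here are made up), Cronbach's alpha for an item-score matrix can be computed as:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total (sum) score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-item Likert scale (1-7) answered by 200 respondents.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))  # a shared latent trait drives all items
scores = np.clip(np.round(4 + latent + rng.normal(scale=0.8, size=(200, 5))), 1, 7)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```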

Cited by 40 publications (28 citation statements). References 55 publications.

Citation statements (ordered by relevance):
“…This has been documented in literature on emotions (69% of scales sampled; Weidman et al., 2017), education and behavior (40–90% of articles sampled; Barry et al., 2014), and social psychology and personality (40% of scales sampled; Flake et al., 2017). Most recently, Shaw et al. (2020) reviewed all of the measures used in the Many Labs 2 studies; 34 of the 43 (79%) item-based scales appeared to be ad hoc, having been created by the study authors and used without supporting validity information. If a study uses undeveloped measures with no validity evidence, many questions remain, and the validity of the study conclusions is cast in serious doubt.…”
Section: Using Questions That Promote Validity of Measure Use (mentioning)
confidence: 99%
“…When scientists lack validity evidence for measures, they lack the necessary information to evaluate the overall validity of a study’s conclusions. Further, recent research on commonly used measures in social and personality psychology showed that measures with less published validity evidence were less likely to show strong evidence for construct validity when evaluated in new data (Hussey & Hughes, 2020; Shaw, Cloos, Luong, Elbaz, & Flake, 2020). The lack of information about measures is a critical problem that could stem from underreporting, ignorance, negligence, misrepresentation, or some combination of these factors.…”
(mentioning)
confidence: 99%
“…The measures of problematic SMU often lack construct validity and may produce inflated effect sizes. For example, scales of problematic SMU often entail "negative outcomes" as a diagnostic criterion and may prime respondents to report negative effects (e.g., Mieczkowski et al., 2020; Shaw et al., 2020). Purely descriptive measures of SMU, on the other hand, can be equally problematic because, depending on the researcher's focus, the same variable (e.g., the objective number of followers) can be used as a proxy for popularity or anxiety (cf.…”
Section: Operationalization of Social Media Use (mentioning)
confidence: 99%
“…For example, with psychometrically sound scales, our alternative ways of computing scale composites (unweighted average score across items, sum score, or first component from PCA) should have been very nearly equivalent (e.g., McNeish & Wolf, 2020). Yet they resulted in a considerable UMV, which may reflect inadequate attention to psychometric scale development amongst our (non-random) sample of social and cognitive multi-lab studies (see also Shaw et al., 2020). That said, these multi-lab projects were replication projects, and there is an understandable tension between exact replication and design improvements.…”
(mentioning)
confidence: 99%
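The statement above contrasts three common ways of computing a scale composite. As a minimal illustration of that comparison (hypothetical simulated data, not the cited study's code), the following computes all three for a unidimensional scale and checks how closely they agree:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical 6-item scale, 300 respondents, driven by a single latent trait.
latent = rng.normal(size=(300, 1))
items = latent + rng.normal(scale=0.7, size=(300, 6))

mean_score = items.mean(axis=1)                               # unweighted average across items
sum_score = items.sum(axis=1)                                 # sum score
pca_score = PCA(n_components=1).fit_transform(items).ravel()  # first principal component

# For a psychometrically sound, unidimensional scale the three composites should
# be near-interchangeable: absolute correlations close to 1. (The sign of a PCA
# component is arbitrary, so its raw correlation may come out negative.)
print(np.abs(np.corrcoef([mean_score, sum_score, pca_score])).round(3))
```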