Measurement invariance—the notion that the measurement properties of a scale are equal across groups, contexts, or time—is an important assumption underlying much of psychology research. The traditional approach for evaluating measurement invariance is to fit a series of nested measurement models using multiple-group confirmatory factor analyses. However, traditional approaches are strict, vary across the field in implementation, and present multiplicity challenges, even in the simplest case of two groups under study. The alignment method was recently proposed as an alternative approach. This method is more automated, requires fewer decisions from researchers, and accommodates two or more groups. However, its assumptions, estimation techniques, and limitations differ from those of traditional approaches. To address the lack of accessible resources that explain the methodological differences and complexities between the two approaches, we introduce and illustrate both, comparing them side by side. First, we overview the concepts, assumptions, advantages, and limitations of each approach. Based on this overview, we propose four key considerations to help researchers decide which approach to choose and how to document their analytical decisions in a preregistration or analysis plan. We then demonstrate our key considerations on an illustrative research question using an open dataset and provide an example of a completed preregistration. Our illustrative example is accompanied by an annotated analysis report that shows readers, step by step, how to conduct measurement invariance tests using R and Mplus. Finally, we provide recommendations for deciding between and using each approach, as well as next steps for methodological research.
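To make the traditional approach concrete, the sketch below shows how the nested configural, metric, and scalar models might be fit in R. It assumes the lavaan package (a common choice for multiple-group CFA in R; the abstract itself names only R and Mplus), and the one-factor model, item names, data frame (dat), and grouping variable (country) are hypothetical placeholders rather than details from the illustrative example.

    # Minimal sketch of the traditional nested-model approach, assuming the
    # lavaan package; the model, items, data, and grouping variable are
    # hypothetical placeholders.
    library(lavaan)

    model <- "f =~ item1 + item2 + item3 + item4"

    # Configural model: same factor structure, all parameters free per group
    fit_configural <- cfa(model, data = dat, group = "country")

    # Metric model: factor loadings constrained equal across groups
    fit_metric <- cfa(model, data = dat, group = "country",
                      group.equal = "loadings")

    # Scalar model: loadings and item intercepts constrained equal
    fit_scalar <- cfa(model, data = dat, group = "country",
                      group.equal = c("loadings", "intercepts"))

    # Likelihood-ratio tests comparing each pair of nested models
    lavTestLRT(fit_configural, fit_metric, fit_scalar)

Nonsignificant chi-square differences between successive models are typically taken as support for the stricter level of invariance; changes in approximate fit indices (e.g., CFI) are also commonly consulted.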
Validity of measurement is integral to the interpretability of research endeavours and any subsequent replication attempts. To assess current measurement practices and the construct validity of measures in large-scale replication studies, we conducted a systematic review of measures used in Many Labs 2: Investigating Variation in Replicability Across Samples and Settings (Klein et al., 2018). To evaluate the psychometric properties of the scales used in Many Labs 2, we conducted factor and reliability analyses on the publicly available data. We report that measures in Many Labs 2 were often short, with little validity evidence reported in the original study; that measures with more validity evidence in the original study had stronger psychometric properties in the replication sample; and that translated versions of scales had lower reliability. We discuss the implications of these findings for interpreting replication results and make recommendations for improving measurement practices in future replications.
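As a rough illustration of the kind of factor and reliability analyses described above, the sketch below uses the psych package in R; the package choice is an assumption on our part, and scale_items is a hypothetical data frame of item responses for a single scale, not the actual Many Labs 2 data.

    # Minimal sketch, assuming the psych package; scale_items is a
    # hypothetical data frame of one scale's item responses.
    library(psych)

    # Internal consistency: Cronbach's alpha plus item-total statistics
    alpha(scale_items)

    # One-factor exploratory factor analysis to inspect item loadings
    fa(scale_items, nfactors = 1, fm = "ml")

Repeating such analyses within each language or site subsample is one way to check whether translated scale versions retain acceptable reliability.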
Background: Online crowdsourcing platforms, such as Amazon Mechanical Turk (MTurk), have become popular alternatives to the ubiquitous student samples used in psychology research. r/SampleSize, an alternative participant pool on the website Reddit, allows for online participant recruitment without compulsory or immediate payment, making it potentially useful for students, research trainees, and course instructors. Objective: The current study sought to assess the viability of using r/SampleSize as a participant pool by comparing its data characteristics to those of MTurk and existing lab samples. Method: Two hundred and fifty-six MTurk workers and 277 r/SampleSize participants completed identical questionnaires on demographics, participation motivations, and standard psychology scales. Results: Participants recruited through r/SampleSize reported diverse ages, education levels, incomes, and employment statuses, although White ethnic backgrounds and US residence predominated. r/SampleSize participants were more internally motivated to participate in research than MTurk workers and reported a greater need for cognition, but the groups did not differ significantly in altruism or in motivation to gain self-knowledge. r/SampleSize data reliability and quality were comparable to those of the MTurk and lab samples across most analyses. Teaching Implications: r/SampleSize can be used to recruit relatively large and diverse samples for undergraduate research projects with minimal setup, labor, and cost. Conclusion: The findings suggest that r/SampleSize is a diverse and viable participant pool.
Because of the misspecification of models and the specificity of operationalizations, many studies produce claims of limited utility. We suggest a path forward that requires taking a few steps back. Researchers can retool large-scale replications to conduct the descriptive research that assesses the generalizability of constructs. Large-scale construct validation is feasible and a necessary next step in addressing the generalizability crisis.
Yarkoni describes a grim state of psychological science in which the gross misspecification of our models and the specificity of our operationalizations produce claims so narrow in generality that no one would be interested in them. We consider this an issue of the generalizability of construct validity and discuss how construct validation research should precede large-scale replication research. We provide ideas for a path forward by suggesting that psychologists take a few steps back. By retooling large-scale replication studies, psychologists can execute the descriptive research needed to assess the generalizability of constructs. We provide examples of reusing large-scale replication data to conduct construct validation research post hoc. We also discuss proof-of-concept research that is ongoing at the Psychological Science Accelerator. Big-team psychology makes large-scale construct validity and generalizability research feasible and worthwhile. We assert that no one needs to quit the field; in fact, there is plenty of work to do. The optimistic interpretation is that if psychologists focus less on generating new ideas and more on organizing, synthesizing, measuring, and assessing constructs from existing ideas, we can keep busy for at least 100 years.