Coefficient alpha is the most popular measure of reliability (and certainly of internal consistency reliability) reported in psychological research. This is noteworthy given the numerous deficiencies of coefficient alpha documented in the psychometric literature. This mismatch between theory and practice appears to arise partly because users of psychological scales are unfamiliar with the psychometric literature on coefficient alpha and partly because alternatives to alpha are not widely known. We present a brief review of the psychometric literature on coefficient alpha, followed by a practical alternative in the form of coefficient omega. To facilitate the shift from alpha to omega we also present a brief guide to the calculation of point and interval estimates of omega using a free, open source software environment.

The construction and application of psychometric scales has become accepted best practice when attempting to measure human performance and behaviour. The implications of test "quality" for the individual and society are unquestioned. Statistical procedures that attempt to assess reliability have acquired the status of ingrained conventions, with certain types of analyses being routinely adopted. The predominant framework under which most such procedures fall is Classical Test Theory (CTT; e.g., see Lord & Novick, 1968), the most popular way of conceptualising how a scale should perform and function. In recent years improved approaches to reliability estimation have been advocated by psychometricians. Yet, despite widespread dissemination and publication of alternatives, there remains a staunch resistance to advances in the interpretation, application, and reporting of a scale's reliability, particularly when it comes to internal consistency.

The APA Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical Inference, 1999) placed emphasis on the correct use and treatment of reliability estimates. The most common type of reliability estimate reported in articles published by the American Psychological Association is the internal consistency estimate (as opposed to test-retest or parallel-forms estimates); such estimates accounted for 75% of all reported reliabilities (Hogan et al., 2000). The most common means of assessing internal consistency in the social sciences is coefficient alpha, also termed Cronbach's alpha (following Cronbach's influential 1951 paper). This has become a routinely relied upon statistic for estimating a scale's internal consistency. A recent search by the current authors (via Google Scholar®, 2012) confirms its prevalence, showing it to have been cited some 17,608 times since its original publication. However, as Cronbach himself stated, "The numerous citations to my paper by no means indicate that the person who cited it had read it, and does not even demonstrate that he had looked at it" (Cronbach & Shavelson, 2004, p. 392). Reflective of Cronbach's comment, researchers' understanding of reliability analysis is g...
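The paper's own guide uses a free, open-source environment; as a rough illustration of the two quantities themselves (not a substitute for that guide), here is a minimal Python sketch. The item responses and factor loadings below are hypothetical, and the code assumes a unidimensional scale with standardized loadings:

```python
import numpy as np

# Hypothetical responses: 6 participants x 4 Likert-type items.
# These numbers are illustrative only, not taken from the paper.
X = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
    [3, 3, 2, 3],
], dtype=float)

k = X.shape[1]

# Coefficient alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total score).
item_vars = X.var(axis=0, ddof=1)
total_var = X.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Coefficient omega from a one-factor model:
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).
# The loadings below are hypothetical; in practice they come from a fitted
# factor model estimated in statistical software.
loadings = np.array([0.70, 0.60, 0.80, 0.65])  # hypothetical standardized loadings
error_vars = 1 - loadings**2                   # implied uniquenesses (standardized)
omega = loadings.sum()**2 / (loadings.sum()**2 + error_vars.sum())

print(f"alpha = {alpha:.3f}, omega = {omega:.3f}")
```

Unlike alpha, omega does not assume that all items measure the latent variable equally well, which is why it is computed from the individual loadings rather than from item variances alone.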
It is regarded as best practice for psychologists to report effect size when disseminating quantitative research findings. Reporting of effect size in the psychological literature is patchy (though this may be changing), and when effect sizes are reported it is far from clear that appropriate effect size statistics are employed. This paper considers the practice of reporting point estimates of standardized effect size and explores factors such as reliability, range restriction, and differences in design that distort standardized effect size unless suitable corrections are employed. For most purposes simple (unstandardized) effect size is more robust and versatile than standardized effect size. Guidelines for deciding what effect size metric to use and how to report it are outlined. Foremost among these are: i) a preference for simple effect size over standardized effect size, and ii) the use of confidence intervals to indicate a plausible range of values the effect might take. Deciding on the appropriate effect size statistic to report always requires careful thought and should be influenced by the goals of the researcher, the context of the research, and the potential needs of readers.

Reporting effect size serves three main purposes: i) understanding of the importance of an effect, in particular its practical importance (see Kirk, 1996); ii) comparison of effect sizes within or between studies; and iii) secondary analysis (e.g., power calculations or meta-analysis). The practice of reporting effect size is complicated, however, by the large number of different measures of effect size from which to select. There is a growing literature on what measure ought to be selected (e.g., Kirk, 1996; Olejnik & Algina, 2000), but it would be unrealistic to expect many researchers to keep up with the full range of available effect size metrics. The aim of this paper is to consider how best to report effect size, with particular focus on the choice between standardized and simple effect size.

Standardized measures of effect size

A standardized measure of effect is one which has been scaled in terms of the variability of the sample or population from which the measure was taken. In contrast, simple effect size (Frick, 1999) is unstandardized and expressed in the original units of analysis. Rosenthal (1994) classifies standardized effect sizes into one of two main families: the r family and the d family. An important distinction between r and d is that in a two-group independent design, when both are applicable, d (but not r) is insensitive to the base rates (n₁ and n₂) of the groups.
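The base-rate sensitivity of r can be made concrete with the standard large-sample conversion between d and the point-biserial r. A minimal Python sketch, where the fixed d value and the base rates are hypothetical choices for illustration:

```python
import math

def r_from_d(d: float, p: float) -> float:
    """Point-biserial r implied by a given d at base rate p = n1 / (n1 + n2).

    Uses the standard large-sample conversion r = d / sqrt(d^2 + 1 / (p * (1 - p))).
    """
    return d / math.sqrt(d**2 + 1.0 / (p * (1.0 - p)))

d = 0.8  # a fixed standardized mean difference (hypothetical value)

for p in (0.5, 0.3, 0.1):
    print(f"base rate p = {p:.1f}: d = {d:.2f}, implied r = {r_from_d(d, p):.3f}")

# The same d maps to r ≈ 0.37 at p = .5 but only r ≈ 0.23 at p = .1,
# illustrating that r, unlike d, is sensitive to group base rates.
```

This is one reason comparisons of r-family effect sizes across studies can mislead when the studies sampled their groups in different proportions, even though the underlying mean difference is identical.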
In response to recommendations to redefine statistical significance to p ≤ .005, we propose that researchers should transparently report and justify all choices they make when designing a study, including the alpha level.
Information from faces and voices combines to provide multimodal signals about a person. Faces and voices may offer redundant, overlapping (backup signals), or complementary information (multiple messages). This article reports two experiments which investigated the extent to which faces and voices deliver concordant information about dimensions of fitness and quality. In Experiment 1, participants rated faces and voices on scales for masculinity/femininity, age, health, height, and weight. The results showed that people make similar judgments from faces and voices, with particularly strong correlations for masculinity/femininity, health, and height. If, as these results suggest, faces and voices constitute backup signals for various dimensions, it is hypothetically possible that people would be able to accurately match novel faces and voices for identity. However, previous investigations into novel face-voice matching offer contradictory results. In Experiment 2, participants saw a face and heard a voice and were required to decide whether the face and voice belonged to the same person. Matching accuracy was significantly above chance level, suggesting that judgments made independently from faces and voices are sufficiently similar that people can match the two. Both sets of results were analyzed using multilevel modeling and are interpreted as being consistent with the backup signal hypothesis.