Despite their popularity and capacity to predict performance, there is no clear consensus on the internal measurement characteristics of situational judgement tests (SJTs). Contemporary propositions in the literature focus on treating SJTs as methods, as measures of dimensions, or as measures of situational responses. However, empirical evidence relating to the internal structure of SJT scores is lacking. Using generalizability theory, we decomposed multiple sources of variance for three different SJTs used with different samples of job candidates (N1 = 2,320; N2 = 989; N3 = 7,934). Results consistently indicated that (1) the vast majority of reliable observed score variance reflected SJT‐specific candidate main effects, analogous to a general judgement factor, and that (2) the contribution of dimensions and situations to reliable SJT variance was, in relative terms, negligible. These findings do not align neatly with any of the proposals in the contemporary literature; however, they do suggest an internal structure for SJTs.
Practitioner points
To help optimize reliable variance, overall‐level aggregation should be used when scoring SJTs.
The majority of reliable variance in SJTs reflects a general performance factor rather than specific dimensions or situations.
SJT‐based developmental feedback should be delivered in terms of general SJT performance rather than on performance relating to specific dimensions or situations.
Generalizability theory, although underutilized in organizational multifaceted measurement, offers an approach to evaluating the psychometric properties of SJTs that is well suited to the complexities of SJT measurement designs.
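
As a rough illustration of the kind of variance decomposition that generalizability theory involves, the sketch below estimates variance components for a fully crossed candidates × items design with one score per cell. This is a deliberately simplified one-facet design, not the design used in the studies summarized above, which also separated dimensions and situations. The function name `variance_components`, the toy data, and the choice to truncate negative estimates at zero are assumptions made for illustration only.

```python
def variance_components(scores):
    """Estimate variance components for a crossed persons x items design.

    `scores` is a list of rows, one row per candidate and one column per item.
    Hypothetical helper: uses the standard expected-mean-square equations for
    a one-facet random-effects design with a single observation per cell.
    """
    n_p = len(scores)        # number of candidates (persons)
    n_i = len(scores[0])     # number of items

    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Mean squares for persons, items, and the residual
    # (person-by-item interaction confounded with error).
    ms_person = n_i * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_item = n_p * sum((m - grand) ** 2 for m in item_means) / (n_i - 1)
    ss_resid = sum(
        (scores[p][j] - person_means[p] - item_means[j] + grand) ** 2
        for p in range(n_p)
        for j in range(n_i)
    )
    ms_resid = ss_resid / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # truncated at zero (an assumption, though a common convention).
    var_person = max((ms_person - ms_resid) / n_i, 0.0)
    var_item = max((ms_item - ms_resid) / n_p, 0.0)
    var_resid = ms_resid

    # Relative generalizability coefficient for a total score over n_i items.
    denom = var_person + var_resid / n_i
    g_coef = var_person / denom if denom > 0 else float("nan")

    return {
        "person": var_person,
        "item": var_item,
        "residual": var_resid,
        "g_coefficient": g_coef,
    }


# Toy data (hypothetical): four candidates answering three items.
toy_scores = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 5, 5],
]
components = variance_components(toy_scores)
total = components["person"] + components["item"] + components["residual"]
for name in ("person", "item", "residual"):
    print(f"{name}: {components[name] / total:.1%} of estimated variance")
```

In this simplified setting, a large share of variance attributed to the `person` component corresponds to the kind of candidate main effect that the findings above describe as dominating reliable SJT variance. A multifaceted design would add further components, for example for dimensions and situations and their interactions with candidates.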