IRT modeling of forced choice

Abstract

Multidimensional forced-choice formats can significantly reduce the impact of numerous response biases typically associated with rating scales. However, if scored with classical methodology these questionnaires produce ipsative data, which leads to distorted scale relationships and makes comparisons between individuals problematic. This research demonstrates how Item Response Theory (IRT) modeling may be applied to overcome these problems. A multidimensional IRT model based on Thurstone's framework for comparative data is introduced, which is suitable for use with any forced-choice questionnaire composed of items fitting the dominance response model, with any number of measured traits, and any block size (i.e., pairs, triplets, quads, etc.). Thurstonian IRT models are normal ogive models with structured factor loadings, structured uniquenesses, and structured local dependencies. These models can be straightforwardly estimated using the structural equation modeling (SEM) software Mplus. A number of simulation studies are performed to investigate how latent traits are recovered under various forced-choice designs, and to provide guidelines for optimal questionnaire design. An empirical application is given to illustrate how the model may be applied in practice. It is concluded that when the recommended design guidelines are met, scores estimated from forced-choice questionnaires with the proposed methodology reproduce the latent traits well.

Keywords: forced-choice format, forced-choice questionnaires, ipsative data, comparative judgment, multidimensional IRT

Item response modeling of forced-choice questionnaires

The most popular way of presenting questionnaire items is through rating scales (Likert scales), where participants are asked to rate a statement using some given categories (for example, ranging from "strongly disagree" to "strongly agree", or from "never" to "always", etc.).
It is well known that this format (the single-stimulus format) can lead to various response biases, for instance because participants do not interpret the rating categories in the same way (Friedman & Amoo, 1999). In the forced-choice format, statements are instead presented in blocks, and respondents must choose between statements according to the extent these statements describe their preferences or behavior. When there are 2 statements in a block, respondents are simply asked to select the statement that better describes them. For blocks of 3, 4, or more statements, respondents may be asked to rank-order the statements, or to select one statement which is "most like me" and one which is "least like me" (i.e., to provide a partial ranking).

Because it is impossible to endorse every item, the forced-choice format eliminates uniform biases such as acquiescence responding (Cheung & Chan, 2002), and can increase operational validity by reducing "halo" effects (Bartram, 2007). However, there are serious problems with the way forced-choice questionnaires have been scored traditionally.

Typically, rank orders of items in a block are reversed ...
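To see why the classical scoring described here yields ipsative data, consider a minimal sketch (not from the paper; item labels and point values are illustrative): each item in a block receives its reversed rank as points, so every respondent's points across a block sum to the same constant, regardless of how they ranked the items.

```python
# Illustrative sketch of classical forced-choice scoring: the rank order
# of the items in a block is reversed and used as points. Because every
# respondent distributes the same fixed pool of points, total scores are
# constant across respondents -- the defining property of ipsative data.

def classical_block_scores(ranking):
    """ranking: item labels ordered from 'most like me' to 'least like me'.
    Returns points per item, with the top-ranked item scoring highest."""
    n = len(ranking)
    return {item: n - position for position, item in enumerate(ranking)}

# Two respondents rank the same block of 4 items (A-D) in opposite orders.
r1 = classical_block_scores(["A", "B", "C", "D"])
r2 = classical_block_scores(["D", "C", "B", "A"])

print(r1)  # {'A': 4, 'B': 3, 'C': 2, 'D': 1}
print(r2)  # {'D': 4, 'C': 3, 'B': 2, 'A': 1}

# Both totals equal 4 + 3 + 2 + 1 = 10: scores only convey within-person
# (relative) information, which distorts scale intercorrelations and
# makes between-person comparisons problematic.
print(sum(r1.values()), sum(r2.values()))  # 10 10
```

The constant total is what makes classical forced-choice scores relative rather than absolute, motivating the IRT treatment introduced in this paper.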