GGUM-RANK Statement and Person Parameter Estimation With Multidimensional Forced Choice Triplets

2018. DOI: 10.1177/0146621618768294

Abstract: Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement …
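For context, GGUM-RANK-style scoring treats a triplet ranking through the pairwise preferences it implies, with each statement's endorsement probability given by the generalized graded unfolding model (GGUM). A minimal sketch in standard GGUM/MUPP notation; the symbols and the Luce-style triplet normalization shown here follow common presentations and are assumptions, not taken verbatim from the article:

\[
P_s(1 \mid \theta) =
\frac{\exp\{\alpha_s[(\theta - \delta_s) - \tau_s]\} + \exp\{\alpha_s[2(\theta - \delta_s) - \tau_s]\}}
{1 + \exp\{3\alpha_s(\theta - \delta_s)\} + \exp\{\alpha_s[(\theta - \delta_s) - \tau_s]\} + \exp\{\alpha_s[2(\theta - \delta_s) - \tau_s]\}}
\]

\[
P(s \succ t) =
\frac{P_s(1 \mid \theta_{d_s})\, P_t(0 \mid \theta_{d_t})}
{P_s(1 \mid \theta_{d_s})\, P_t(0 \mid \theta_{d_t}) + P_s(0 \mid \theta_{d_s})\, P_t(1 \mid \theta_{d_t})}
\]

\[
P(s \succ t \succ u) =
\frac{P(s \succ t)\, P(s \succ u)\, P(t \succ u)}
{\sum_{\pi \in \text{rank orders}} \prod_{(x \succ y) \in \pi} P(x \succ y)}
\]

Here \alpha_s, \delta_s, and \tau_s are the discrimination, location, and threshold parameters of statement s, and \theta_{d_s} is the respondent's trait level on the single dimension that statement s measures; each statement contributes a single-peaked (ideal point) endorsement curve.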

Cited by 30 publications (38 citation statements). References 24 publications.

Selected citation contexts:
“…Wang & Chen). Specifically, a ranking item in this study is composed of a group of statements, and each statement loads on only one dimension (e.g., a 10-dimensional test uses triplet ranking items; each item consists of three statements involving three of 10 dimensions; Lee, Joo, Stark, & Chernyshenko). Unlike cognitive tests, noncognitive tests often have many dimensions and items (e.g., SNAP-2 has 15 dimensions and 390 items; Clark, Simms, Wu, & Casillas), requiring long wait times before an item is selected.…”
Section: Computation Time Resulting From High Dimensionality and A Hu…
confidence: 99%
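To make the triplet design concrete, here is a minimal Python sketch (the pool layout, names, and counts are illustrative, not from the cited studies) of assembling blocks in which each statement loads on exactly one dimension and each block spans three distinct dimensions:

import random

def build_triplet_blocks(pool, n_blocks, n_dims=10, seed=0):
    # pool: dict mapping dimension index -> list of statement IDs,
    # where each statement loads on exactly one dimension.
    # Each block draws three statements from three distinct dimensions,
    # mirroring the 10-dimensional triplet design described above.
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        dims = rng.sample(range(n_dims), 3)               # three distinct dimensions
        block = tuple(rng.choice(pool[d]) for d in dims)  # one statement per dimension
        blocks.append((dims, block))
    return blocks

# Example: 10 dimensions with 5 statements each.
pool = {d: [f"d{d}_s{i}" for i in range(5)] for d in range(10)}
print(build_triplet_blocks(pool, n_blocks=3))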
“…To our knowledge, however, only a few studies have done this. Among those few studies, most have examined the equivalence of dominance-model-based SS and FC formats and found generally supportive evidence (Brown & Maydeu-Olivares, 2011; Guenole, Brown, & Cooper, 2016; Lee, Lee, & Stark, 2018). However, as mentioned above, evidence has been accumulating that shows ideal point models more accurately capture the response processes underlying various psychological measures.…”
Section: Psychometric Equivalence Between FC and SS
confidence: 93%
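The dominance-versus-ideal-point distinction at issue here is visible in the response functions themselves: a dominance model (e.g., the 2PL) is monotone in the trait, whereas an ideal point model (e.g., the dichotomous GGUM) is single-peaked near the statement's location. A minimal Python sketch with illustrative, hypothetical parameter values:

import math

def p_2pl(theta, a=1.5, b=0.0):
    # Dominance model (2PL): endorsement probability rises monotonically with theta.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_ggum(theta, alpha=1.5, delta=0.0, tau=-1.0):
    # Ideal point model (dichotomous GGUM): endorsement peaks near theta = delta.
    t1 = math.exp(alpha * ((theta - delta) - tau))
    t2 = math.exp(alpha * (2 * (theta - delta) - tau))
    denom = 1.0 + math.exp(3 * alpha * (theta - delta)) + t1 + t2
    return (t1 + t2) / denom

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(f"theta={theta:+.1f}  2PL={p_2pl(theta):.3f}  GGUM={p_ggum(theta):.3f}")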
“…Our focus on pairwise similarity was thus natural to provide a direct comparison with the mean difference index. Recently, a format consisting of three items (i.e., triads or triplets) per forced-choice block has been gaining in popularity (e.g., Guenole et al., 2018; Lee et al., 2019; Murano et al., 2020; Ng et al., 2020; Walton et al., 2020; Watrin et al., 2019), because it seems to provide an optimal balance between the information gained and the cognitive burden placed on test takers (Brown & Maydeu-Olivares, 2013). That being said, future research should also consider evaluating indices of similarity involving more than two items.…”
Section: Discussion
confidence: 99%
“…Based on the discussion so far, it is clear that when constructing a forced-choice measure resistant to SDR, it is essential to identify item combinations for which differences in desirability evaluations between items are sufficiently similar across test takers. Typically, item desirability matching relies on empirically obtained item desirability ratings (e.g., Chernyshenko et al., 2009; Christiansen et al., 2005; Converse et al., 2010; Drasgow et al., 2012; Guenole et al., 2018; Lee et al., 2019; Naemi et al., 2014; Usami et al., 2016; Vasilopoulos et al., 2006; Watrin et al., 2019). In one common approach, a "desirability sample" is asked to explicitly rate the desirability of each item under consideration (e.g., Chernyshenko et al., 2009; Christiansen et al., 2005; Usami et al., 2016).…”
Section: Desirability Matching Procedures
confidence: 99%
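As a rough illustration of the matching step just described, the following Python sketch (the tolerance, ratings, and function name are hypothetical) pairs statements whose mean desirability ratings from a desirability sample differ by no more than a set tolerance:

from itertools import combinations

def match_by_desirability(ratings, tol=0.25):
    # ratings: dict mapping statement ID -> mean desirability rating
    # obtained from a "desirability sample".
    # Returns candidate pairs whose ratings are close enough that the
    # resulting forced-choice blocks should be more resistant to
    # socially desirable responding.
    return [
        (s, t)
        for s, t in combinations(sorted(ratings), 2)
        if abs(ratings[s] - ratings[t]) <= tol
    ]

# Hypothetical mean ratings on a 1-7 desirability scale.
ratings = {"s1": 5.1, "s2": 5.3, "s3": 3.2, "s4": 5.2, "s5": 3.0}
print(match_by_desirability(ratings))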