2019
DOI: 10.1177/0013164419865316
Methods of Detecting Insufficient Effort Responding: Comparisons and Practical Recommendations

Abstract: Insufficient effort responding (IER) affects many forms of assessment in both educational and psychological contexts. Much research has examined different types of IER, IER’s impact on the psychometric properties of test scores, and preprocessing procedures used to detect IER. However, there is a gap in the literature in terms of practical advice for applied researchers and psychometricians when evaluating multiple sources of IER evidence, including the best strategy or combination of strategies when preproces…

Cited by 59 publications (127 citation statements)
References 45 publications
“…When evaluating response times, a best practice is to exclude participants who complete the task in less than one or two seconds per item (Wood et al., 2017). Finally, the most effective statistical tools that can be employed include: (a) long-string index (in which participant response patterns in choosing the same response for multiple items are analyzed for frequency and length, and a threshold is developed based on the data to indicate potentially invalid responses; Hong, Steedle, & Cheng, 2020; Johnson, 2005; Maniaci & Rogge, 2014); and (b) within-session response consistency (which calculates the level of similarity in a participant's responses to items they have rated twice and excludes responses that score below 0.25; Wood et al., 2017). At least two of the aforementioned recommendations should be used to screen data (Buchanan & Scofield, 2018).…”
Section: Implementation Stage
confidence: 99%
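The screens named in this excerpt translate directly into code. The following is a minimal Python sketch, assuming a pandas DataFrame with one row per respondent, Likert-type item columns, and a total completion time in seconds; the column names, the two-seconds-per-item cutoff, and the long-string cutoff of half the test length are illustrative assumptions rather than values prescribed by the article.

```python
# A minimal sketch (not the article's own procedure) of two IER screens
# from the excerpt: the long-string index and a response-time cutoff.
# Column names and thresholds are illustrative assumptions.
import pandas as pd

def long_string_index(row):
    """Length of the longest run of identical consecutive responses in one row."""
    values = list(row)
    longest = current = 1
    for prev, curr in zip(values, values[1:]):
        current = current + 1 if curr == prev else 1
        longest = max(longest, current)
    return longest

def flag_ier(df, item_cols, time_col, min_sec_per_item=2.0, max_run=None):
    """Flag respondents on two screens; flags are reviewed before removing cases."""
    n_items = len(item_cols)
    if max_run is None:
        max_run = n_items // 2          # assumed cutoff: a run longer than half the items
    out = pd.DataFrame(index=df.index)
    out["long_string"] = df[item_cols].apply(long_string_index, axis=1)
    out["sec_per_item"] = df[time_col] / n_items
    out["flag_long_string"] = out["long_string"] > max_run
    out["flag_too_fast"] = out["sec_per_item"] < min_sec_per_item
    return out
```

A respondent flagged by either screen would then be inspected or removed, in keeping with the recommendation to apply at least two screening methods before cleaning the data.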
“…First, the approach to defining noneffortful responses in this model requires that the probability of a correct response from noneffortful responding is unrelated to both the characteristics of items and the ability levels of examinees, and that noneffortful responses can be correctly identified. Although prior literature has proposed methods for identifying noneffortful responses in survey data (e.g., Hong et al., 2020; Meade & Craig, 2012), the EM-IRT model has been predominantly applied to computer-administered cognitive assessments composed of multiple-choice questions, due to the availability of both keyed responses and log file information (response time and response accuracy data).…”
Section: The EM-IRT Model
confidence: 99%
“…First, noneffortful responses (i.e., invalid item responses that are provided with disregard to the item content due to low test-taking effort) are generally identified by relying on response time and/or accuracy data to detect examinee responses that do not reflect the examinees’ underlying knowledge, abilities, or skills, due to a failure to expend full effort (for a greater discussion of classifying noneffortful responses using response time and/or accuracy data, see Wise, 2017). Second, data are purified either by removing individual examinees found to engage in noneffortful responding (examinee-level filtering; see Hong et al., 2020) or by treating noneffortful responses as missing data so that ability is estimated based solely on item responses judged to be effortful (response-level filtering). Rios et al. (2017) compared these two approaches and found that the former led to greater bias in ability estimates when noneffortful responding was related to examinees’ underlying ability.…”
confidence: 99%
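For concreteness, the two purification strategies contrasted in this excerpt can be sketched as follows. This is an illustrative Python/NumPy example with an invented scored response matrix and flag arrays, not the procedure used by Rios et al. (2017) or Hong et al. (2020).

```python
# Illustrative sketch of examinee-level versus response-level filtering.
# The response matrix and flags below are invented for demonstration only.
import numpy as np

def examinee_level_filter(responses, examinee_flagged):
    """Drop every examinee (row) flagged as noneffortful."""
    return responses[~examinee_flagged]

def response_level_filter(responses, response_flagged):
    """Set flagged responses to NaN so that scoring treats them as missing."""
    cleaned = responses.astype(float)
    cleaned[response_flagged] = np.nan
    return cleaned

responses = np.array([[1, 0, 1, 1],
                      [1, 1, 0, 0],
                      [0, 0, 0, 1]])                 # examinees x items (0/1 scored)
examinee_flagged = np.array([False, True, False])    # second examinee dropped entirely
response_flagged = np.zeros_like(responses, dtype=bool)
response_flagged[1, 2:] = True                       # only two responses treated as missing

filtered_people = examinee_level_filter(responses, examinee_flagged)  # shape (2, 4)
filtered_items = response_level_filter(responses, response_flagged)   # NaNs at [1, 2] and [1, 3]
```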
“…The indices in the current study involve Mahalanobis distance (MD), long string, and the person-fit index (standardized log-likelihood, or l_z). MD detects response patterns that deviate from the multivariate normal distribution (Hong et al., 2019). The long-string approach detects responses to items of diverse content that are suspiciously similar.…”
Section: Methods
confidence: 99%
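As a rough illustration of the MD index mentioned in this excerpt, the sketch below computes each respondent's squared Mahalanobis distance from the sample centroid and flags values exceeding a chi-square cutoff. The alpha level and the use of a pseudo-inverse are assumptions for the example, not settings reported by the cited studies.

```python
# Sketch of a Mahalanobis distance (MD) screen for aberrant response vectors.
# The alpha level is an illustrative assumption.
import numpy as np
from scipy import stats

def mahalanobis_screen(X, alpha=0.001):
    """Squared MD of each row from the sample centroid, plus outlier flags."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse guards against singular covariance
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])   # MD^2 ~ chi-square(p) under multivariate normality
    return d2, d2 > cutoff
```

The long-string and person-fit (l_z) indices from the same passage would be computed separately and considered alongside MD when screening for insufficient effort responding.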