2019
DOI: 10.1177/0013164419865316
Methods of Detecting Insufficient Effort Responding: Comparisons and Practical Recommendations

Abstract: Insufficient effort responding (IER) affects many forms of assessment in both educational and psychological contexts. Much research has examined different types of IER, IER’s impact on the psychometric properties of test scores, and preprocessing procedures used to detect IER. However, there is a gap in the literature in terms of practical advice for applied researchers and psychometricians when evaluating multiple sources of IER evidence, including the best strategy or combination of strategies when preproces…

Cited by 59 publications (127 citation statements)
References 45 publications
“…When evaluating response times, a best practice is to exclude participants who complete the task in less than one or two seconds per item (Wood et al., 2017). Finally, the most effective statistical tools that can be employed include: (a) long-string index (in which participant response patterns in choosing the same response for multiple items are analyzed for frequency and length, and a threshold is developed based on the data to indicate potentially invalid responses; Hong, Steedle, & Cheng, 2020; Johnson, 2005; Maniaci & Rogge, 2014); and (b) within-session response consistency (which calculates the level of similarity in a participant's responses to items they have rated twice and excludes responses that score below 0.25; Wood et al., 2017). At least two of the aforementioned recommendations should be used to screen data (Buchanan & Scofield, 2018).…”
Section: Implementation Stage
confidence: 99%
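The screens named in this excerpt translate directly into code. The following is a minimal Python sketch, assuming a pandas DataFrame with one row per respondent, Likert-type item columns, and a total completion time in seconds; the column names, the two-seconds-per-item cutoff, and the long-string cutoff of half the test length are illustrative assumptions rather than values prescribed by the article.

```python
# A minimal sketch (not the article's own procedure) of two IER screens
# from the excerpt: the long-string index and a response-time cutoff.
# Column names and thresholds are illustrative assumptions.
import pandas as pd

def long_string_index(row):
    """Length of the longest run of identical consecutive responses in one row."""
    values = list(row)
    longest = current = 1
    for prev, curr in zip(values, values[1:]):
        current = current + 1 if curr == prev else 1
        longest = max(longest, current)
    return longest

def flag_ier(df, item_cols, time_col, min_sec_per_item=2.0, max_run=None):
    """Flag respondents on two screens; flags are reviewed before removing cases."""
    n_items = len(item_cols)
    if max_run is None:
        max_run = n_items // 2          # assumed cutoff: a run longer than half the items
    out = pd.DataFrame(index=df.index)
    out["long_string"] = df[item_cols].apply(long_string_index, axis=1)
    out["sec_per_item"] = df[time_col] / n_items
    out["flag_long_string"] = out["long_string"] > max_run
    out["flag_too_fast"] = out["sec_per_item"] < min_sec_per_item
    return out
```

A respondent flagged by either screen would then be inspected or removed, in keeping with the recommendation to apply at least two screening methods before cleaning the data.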
“…First, the approach to defining noneffortful responses in this model requires that the probability of a correct response from noneffortful responding is unrelated to both the characteristics of items and the ability levels of examinees, and that noneffortful responses can be correctly identified. Although prior literature has proposed methods for identifying noneffortful responses in survey data (e.g., Hong et al., 2020; Meade & Craig, 2012), the EM-IRT model has been predominantly applied to computer-administered cognitive assessments composed of multiple-choice questions, due to the availability of both keyed responses and log file information (response time and response accuracy data).…”
Section: The EM-IRT Model
confidence: 99%
“…First, noneffortful responses (i.e., invalid item responses that are provided with disregard to the item content due to low test-taking effort) are generally identified by relying on response time and/or accuracy data to detect examinee responses that do not reflect the examinees’ underlying knowledge, abilities, or skills, due to a failure to expend full effort (for a greater discussion of classifying noneffortful responses using response time and/or accuracy data, see Wise, 2017). Second, data are purified either by removing individual examinees found to engage in noneffortful responding (examinee-level filtering; see Hong et al., 2020) or by treating noneffortful responses as missing data so that ability is estimated based solely on item responses judged to be effortful (response-level filtering). Rios et al. (2017) compared these two approaches and found that the former led to greater bias in ability estimates when noneffortful responding was related to examinees’ underlying ability.…”
confidence: 99%
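For concreteness, the two purification strategies contrasted in this excerpt can be sketched as follows. This is an illustrative Python/NumPy example with an invented scored response matrix and flag arrays, not the procedure used by Rios et al. (2017) or Hong et al. (2020).

```python
# Illustrative sketch of examinee-level versus response-level filtering.
# The response matrix and flags below are invented for demonstration only.
import numpy as np

def examinee_level_filter(responses, examinee_flagged):
    """Drop every examinee (row) flagged as noneffortful."""
    return responses[~examinee_flagged]

def response_level_filter(responses, response_flagged):
    """Set flagged responses to NaN so that scoring treats them as missing."""
    cleaned = responses.astype(float)
    cleaned[response_flagged] = np.nan
    return cleaned

responses = np.array([[1, 0, 1, 1],
                      [1, 1, 0, 0],
                      [0, 0, 0, 1]])                 # examinees x items (0/1 scored)
examinee_flagged = np.array([False, True, False])    # second examinee dropped entirely
response_flagged = np.zeros_like(responses, dtype=bool)
response_flagged[1, 2:] = True                       # only two responses treated as missing

filtered_people = examinee_level_filter(responses, examinee_flagged)  # shape (2, 4)
filtered_items = response_level_filter(responses, response_flagged)   # NaNs at [1, 2] and [1, 3]
```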
“…The indices in the current study involve Mahalanobis distance (MD), long string, and the person-fit index (standardized log-likelihood, or l_z). MD detects response patterns that deviate from the multivariate normal distribution (Hong et al., 2019). The long-string approach detects responses to items of diverse content that are suspiciously similar.…”
Section: Methods
confidence: 99%
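As a rough illustration of the MD index mentioned in this excerpt, the sketch below computes each respondent's squared Mahalanobis distance from the sample centroid and flags values exceeding a chi-square cutoff. The alpha level and the use of a pseudo-inverse are assumptions for the example, not settings reported by the cited studies.

```python
# Sketch of a Mahalanobis distance (MD) screen for aberrant response vectors.
# The alpha level is an illustrative assumption.
import numpy as np
from scipy import stats

def mahalanobis_screen(X, alpha=0.001):
    """Squared MD of each row from the sample centroid, plus outlier flags."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse guards against singular covariance
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])   # MD^2 ~ chi-square(p) under multivariate normality
    return d2, d2 > cutoff
```

The long-string and person-fit (l_z) indices from the same passage would be computed separately and considered alongside MD when screening for insufficient effort responding.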