Careless or inattentive responding is frequently observed in questionnaire or survey data, and it jeopardizes test validity and the generalizability of research findings. Detecting such response behavior is therefore important. The most frequently encountered type of careless response behavior is back random responding (BRR). The literature suggests that BRR is challenging to detect, with reported detection power around .5 or lower. Change point analysis (CPA), a widely used statistical process control method, can be applied to item response data to detect whether aberrant behavior exists in a response pattern. Existing CPA methods, however, may not be suitable for detecting BRR because the change may not be directional. In this article we propose a weighted-residual-based CPA procedure to detect BRR behavior. The performance of the proposed method was evaluated in a comprehensive simulation study and compared against three existing CPA methods. Results indicated that the proposed residual-based CPA procedure can detect BRR with high power for tests of 20 items or longer, while keeping the Type-I error rate well under control. Compared with the three existing CPA methods, it yields a comparable empirical Type-I error rate and a gain in power of 17%–42%. An empirical study further illustrated the utility of the proposed method for detecting BRR in a real dataset. Implications of the proposed method, its limitations, and future directions are discussed at the end.
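The core idea of a residual-based CPA can be sketched as follows: scan candidate change points and compare mean model residuals before and after each point, flagging a pattern when the largest difference is extreme. Everything in this sketch is a simplifying assumption, not the authors' exact statistic: a Rasch model, known ability and item difficulties, and an unweighted mean-difference criterion.

```python
import math

def rasch_prob(theta, b):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def cpa_statistic(responses, theta, difficulties):
    """Illustrative residual-based CPA statistic.

    Computes residuals (observed score minus Rasch-expected score) and
    returns the maximum, over all candidate change points, of the
    absolute difference in mean residual before vs. after the point.
    Back random responding drags post-change residuals toward 0.5 minus
    the expected score, so the statistic grows when a change exists.
    """
    residuals = [x - rasch_prob(theta, b)
                 for x, b in zip(responses, difficulties)]
    n = len(residuals)
    best = 0.0
    for k in range(1, n):
        mean_before = sum(residuals[:k]) / k
        mean_after = sum(residuals[k:]) / (n - k)
        best = max(best, abs(mean_before - mean_after))
    return best
```

For a high-ability examinee answering a 20-item test, a pattern whose second half is effectively random produces a much larger statistic than a consistent pattern; in practice the cutoff would come from a null distribution, which this sketch omits.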
In a cognitive diagnostic assessment (CDA), attributes refer to fine-grained knowledge points or skills. The Q-matrix, which specifies the relationship between items and attributes, is a central component of CDA. Oftentimes, the attributes and the Q-matrix are defined by subject-matter experts and assumed to be correctly specified. However, this assumption does not always hold in real applications. To address this concern, this paper proposes a residual-based statistic for validating the Q-matrix. Its performance is evaluated in a simulation study and compared against that of an existing method proposed in Liu, Xu, and Ying (2012, Applied Psychological Measurement, 36, 548). Simulation results indicate that the proposed method leads to a higher recovery rate of the Q-matrix and is computationally more efficient. The advantage in computational efficiency is particularly pronounced when the number of attributes measured by the test reaches five or more. Results also suggest that the two methods have different tendencies when estimating the attribute vector for each item. In cases where the methods fail to recover the correct Q-matrix, the method of Liu et al. (2012) tends to overestimate the number of attributes measured by the items, whereas our method does not show that bias.
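To make residual-based Q-matrix validation concrete, here is a minimal sketch under the DINA model with known attribute profiles and no slipping or guessing: for one item, search all candidate q-vectors and keep the one minimizing the squared residual between observed responses and the DINA ideal responses. The exhaustive search and the squared-residual criterion are illustrative simplifications, not the paper's actual statistic.

```python
from itertools import product

def ideal_response(alpha, q):
    """DINA ideal response: 1 iff the examinee masters every attribute
    the item requires (no slipping or guessing assumed)."""
    return int(all(a >= qk for a, qk in zip(alpha, q)))

def best_q_vector(responses, alphas, n_attributes):
    """Pick the q-vector minimizing the sum of squared residuals
    between observed responses and DINA ideal responses (illustrative).

    responses : 0/1 responses of each examinee to one item
    alphas    : known attribute profile (tuple of 0/1) per examinee
    """
    best_q, best_rss = None, float("inf")
    for q in product([0, 1], repeat=n_attributes):
        if not any(q):
            continue  # an item must measure at least one attribute
        rss = sum((x - ideal_response(a, q)) ** 2
                  for x, a in zip(responses, alphas))
        if rss < best_rss:
            best_q, best_rss = q, rss
    return best_q, best_rss
```

The abstract's computational-efficiency point is visible even here: the candidate space has 2^K − 1 q-vectors, so naive search grows exponentially in the number of attributes K.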
This research presents a method for automatically building a Chinese-English semantic lexicon for translation selection, which introduces WordNet similarity measures to filter out misaligned Chinese-English word pairs. Six WordNet-based similarity measures were experimentally compared and evaluated using WordNet and the software package WordNet::Similarity. The res and lch measures performed best, followed by wup, with lin in fourth place, then jcn, and random performing worst.
Most tests are administered within an allocated time. Due to the time limit, examinees may make different speed-accuracy trade-offs on different items. In educational testing, the traditional hierarchical model cannot adequately account for the tradeoff between response time and accuracy. For this reason, several joint models were developed as covariance-based extensions of the traditional hierarchical model; however, they cannot directly reflect the dynamic relationship between response time and accuracy. In contrast, response moderation models take the residual response time as an independent variable in the response model, but these models exaggerate the time effect. Alternatively, the speed-accuracy tradeoff (SAT) model outperforms other experimental models in SAT experiments. This paper therefore incorporates the SAT model into the traditional hierarchical model to establish an SAT hierarchical model. Simulation results demonstrated that a Bayesian Markov chain Monte Carlo (MCMC) algorithm recovered the parameters of the SAT hierarchical model well. Finally, on empirical data, the deviance information criterion (DIC) favored the SAT hierarchical model over the other models. This suggests that including the effect of response time on accuracy is indispensable, but that this effect should also be bounded when fitting empirical data.
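The speed-accuracy tradeoff component can be illustrated with the classic exponential SAT function, in which accuracy is at floor until processing time reaches an intercept, then rises toward an asymptote. The parameterization below (asymptote `lam`, rate `beta`, intercept `delta`) is a standard textbook form from SAT experiments, not necessarily the exact function used in the proposed hierarchical model.

```python
import math

def sat_curve(t, lam, beta, delta):
    """Exponential speed-accuracy tradeoff curve.

    Accuracy (e.g., d') is 0 until processing time t reaches the
    intercept delta, then rises toward the asymptote lam at rate beta:
        a(t) = lam * (1 - exp(-beta * (t - delta)))  for t > delta.
    """
    if t <= delta:
        return 0.0
    return lam * (1.0 - math.exp(-beta * (t - delta)))
```

The curve captures the tradeoff in the abstract: short response times yield low accuracy, accuracy grows monotonically with time, and the gain saturates, which is one way to "limit" the effect of response time on accuracy.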