2016
DOI: 10.1111/jedm.12104
Monitoring Items in Real Time to Enhance CAT Security

Abstract: An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed CTT-based procedure through simulation studies. The results show that when the total number of examinees is fixed, both procedures can control the rate of type I errors at a…
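The abstract, and the moving-sample approach described in the citing statements below, monitor an item by repeatedly testing whether its observed performance still matches its IRT model as responses arrive during CAT administration. The following Python sketch only illustrates that general idea and is not the paper's procedure: the 3PL response function, the window of 25 responses, and the one-sided z-test on number-correct are all assumptions made here for illustration.

```python
import numpy as np
from scipy.stats import norm

def irt_prob(theta, a, b, c=0.0):
    """3PL item response function: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def monitor_item(thetas, responses, a, b, c=0.0, window=25, alpha=0.001):
    """Flag an item whose observed performance drifts above its IRT model.

    thetas    : ability estimates of the examinees who saw the item, in arrival order
    responses : 0/1 responses to the item, same order
    window    : moving-sample size (e.g., 25 or 50 responses)
    alpha     : per-test significance level for each z-test

    Returns the index (in arrival order) at which the item is first flagged,
    or None if no window triggers the test.
    """
    thetas = np.asarray(thetas, dtype=float)
    responses = np.asarray(responses, dtype=float)
    critical = norm.ppf(1.0 - alpha)
    for end in range(window, len(responses) + 1):
        p = irt_prob(thetas[end - window:end], a, b, c)   # model-expected probabilities
        observed = responses[end - window:end].sum()
        expected = p.sum()
        variance = (p * (1.0 - p)).sum()
        z = (observed - expected) / np.sqrt(variance)
        if z > critical:          # one-sided: compromised items look too easy
            return end - 1
    return None
```

Because the test is repeated on many overlapping windows, the per-test level alpha would in practice be chosen to control the overall type I error rate, which is the issue the sequential procedure in the paper is designed to handle.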

Cited by 15 publications (31 citation statements)
References 38 publications
“…To facilitate hypothesis testing, the authors developed asymptotic properties of the item index and recommended its application to groups with 400 or more test takers. Research has also been conducted to monitor item performance in real time for computerized adaptive testing (CAT; Zhang, 2014; Zhang & Li, 2016). These methods assume that, for a given item, individual test takers’ responses come in sequentially during the CAT administration; to monitor this item’s performance in real time, they make use of moving samples of some constant size (e.g., 25 or 50) through a series of hypothesis tests.…”
Section: Introduction (mentioning, confidence: 99%)
“…Few studies on the second type, detecting compromised items, have been conducted in the same context. Several studies used statistical quality control methods to detect statistical changes in an item without considering subset partition, such as the sequential analysis of the change-point problem by Zhang (2014) and Zhang and Li (2016) as well as the cumulative sum procedure (e.g., Montgomery, 2008) by Veerkamp and Glas (2000) and Lee et al. (2014). Other methods include calculating the difference between an item’s observed proportion correct and its expected value (Zhu et al., 2002) as well as summarizing the difference in an item’s proportion correct between preidentified cheater and noncheater groups (McLeod & Schnipke, 1999).…”
Section: Introduction (mentioning, confidence: 99%)
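The cumulative sum (CUSUM) procedure mentioned in the statement above accumulates small, persistent deviations between observed and model-expected responses and signals once the running sum crosses a decision threshold. A minimal one-sided CUSUM sketch follows; the reference value k and threshold h are placeholders chosen for illustration, not values from the cited studies.

```python
def cusum_flag(residuals, k=0.05, h=1.0):
    """One-sided upper CUSUM on per-examinee residuals (observed - expected).

    residuals : sequence of r_i - P_i values for successive examinees
    k         : reference value (allowance) subtracted at each step
    h         : decision threshold; the item is flagged when the sum exceeds it

    Returns the index at which the chart signals, or None.
    """
    s = 0.0
    for i, e in enumerate(residuals):
        s = max(0.0, s + e - k)   # accumulate only upward drift
        if s > h:
            return i
    return None
```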
“…Given a lack of research on using subset partition in item-level preknowledge detection, we propose a residual-based method that compares the observed proportion correct of an S2 item with the model-implied value in the null condition, the latter of which is computed using the population ability distribution estimated from S1. In fact, the residual statistic has been used in previous studies (Zhang, 2014; Zhang & Li, 2016; Zhu et al., 2002), and we add to the existing framework by deriving the standard error of the residual statistic given the subset partition. We assume that the item parameters have been previously estimated when we conduct the evaluation, which is typically true in continuous testing programs.…”
Section: Introduction (mentioning, confidence: 99%)
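The residual-based method described in the statement above compares an item’s observed proportion correct with the model-implied value obtained by integrating the item response function over an estimated population ability distribution. The sketch below assumes a 2PL item and a normal ability distribution; it does not reproduce the standard-error derivation for the subset partition that the citing study adds.

```python
import numpy as np

def expected_p(a, b, mu=0.0, sigma=1.0, n_nodes=61):
    """Model-implied proportion correct: E[P(theta)] with theta ~ N(mu, sigma),
    approximated on a normalized grid over mu +/- 4 sigma."""
    nodes = np.linspace(mu - 4.0 * sigma, mu + 4.0 * sigma, n_nodes)
    weights = np.exp(-0.5 * ((nodes - mu) / sigma) ** 2)
    weights /= weights.sum()
    p = 1.0 / (1.0 + np.exp(-a * (nodes - b)))   # 2PL item response function
    return float((weights * p).sum())

def residual_statistic(responses, a, b, mu=0.0, sigma=1.0):
    """Observed minus model-implied proportion correct for one item."""
    return float(np.mean(responses)) - expected_p(a, b, mu, sigma)
```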
“…Yearly increases, or long‐term trends, in test scores may be less clear to interpret because they can result from true improvement in performance or scale/item drift. To tease out one effect from the other, it may be worthwhile to use a special equating design to examine scale drift specifically (e.g., Petersen et al., 1983; Puhan, 2008); methods for detecting item drift (Bock, Muraki, & Pfeiffenberger, 1988; Donoghue & Isham, 1998; Guo, Robin, & Dorans, 2017; Zhang & Li, 2016) may be considered as well. If there are different assessments with similar target populations, then true improvement in performance is likely to be observed in more than one of these assessments.…”
Section: Discussion (mentioning, confidence: 99%)