Monitoring Items in Real Time to Enhance CAT Security

Zhang, Jinming; Li, Jie

doi:10.1111/jedm.12104

Cited by 15 publications

(31 citation statements)

References 38 publications

“…To facilitate hypothesis testing, the authors developed asymptotic properties of the item index and recommended its application to groups with 400 or more test takers. Research has also been conducted to monitor item performance in real time for computerized adaptive testing (CAT; Zhang, 2014; Zhang & Li, 2016). These methods assume that, for a given item, individual test takers’ responses come in sequentially during the CAT administration; to monitor this item’s performance in real time, they make use of moving samples of some constant size (e.g., 25 or 50) through a series of hypothesis tests.…”

Section: Introduction (mentioning)

confidence: 99%

Monitoring Item Performance With CUSUM Statistics in Continuous Testing

Lee, Lewis

2021

Journal of Educational and Behavioral Statistics

In many educational assessments, items are reused in different administrations throughout the life of the assessments. Ideally, a reused item should perform relatively similarly over time. In reality, an item may become easier with exposure, especially when item preknowledge has occurred. This article presents a novel cumulative sum procedure for detecting item preknowledge in continuous testing where data for each reused item may be obtained from small and varying sample sizes across administrations. Its performance is evaluated with simulations and analytical work. The approach is effective in detecting item preknowledge quickly with group size at least 10 and is easy to implement with varying item parameters. In addition, it is robust to the ability estimation error introduced in the simulations.

“…Few studies on the second type, detecting compromised items, have been conducted in the same context. Several studies used statistical quality control methods to detect statistical change of an item without considering subset partition, such as the sequential analysis in change-point problem by Zhang (2014) and Zhang and Li (2000) as well as the cumulative sum procedure (e.g., Montgomery, 2008) by Veerkamp and Glas (2000) and Lee et al (2014). Other methods include calculating the difference between an item’s observed proportion correct and its expected value (Zhu et al, 2002) as well as summarizing the difference in an item’s proportion correct between preidentified cheater and noncheater groups (McLeod & Schnipke, 1999).…”

Section: Introduction (mentioning)

confidence: 99%

“…Given a lack of research on using subset partition in item-level preknowledge detection, we propose a residual-based method that compares the observed proportion correct of an S2 item with the model-implied value in the null condition, the latter of which is computed using the population ability distribution estimated from S1. In fact, the residual statistic has been used in previous studies (Zhang, 2014; Zhang & Li, 2000; Zhu et al, 2002), and we add to the existing framework by deriving the standard error of the residual statistic given the subset partition. We assume that the item parameters have been previously estimated when we conduct the evaluation, which is typically true in continuous testing programs.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Compromised Items Using Information From Secure Items

Wang, Liu

2020

Journal of Educational and Behavioral Statistics

In continuous testing programs, some items are repeatedly used across test administrations, and statistical methods are often used to evaluate whether items become compromised due to examinees’ preknowledge. In this study, we proposed a residual method to detect compromised items when a test can be partitioned into two subsets of items: secure items and possibly compromised items. We derived the standard error of the residual statistic by taking the sampling error in both ability and item parameter estimate into account. The simulation results suggest that the Type I error is close to the nominal level when both sources of error are adjusted, and item parameter error can be ignored only when the item calibration sample size is much larger than the evaluation sample size. We also investigated the performance of the residual method when not using information from secure items in both simulation and real data analyses.

“…Yearly increases, or long‐term trends, in test scores may be less clear to interpret because they can result from true improvement in performance or scale/item drift. To tease out one effect from the other, it may be worthwhile to conduct a special equating design to examine scale drift specifically (e.g., Petersen et al., 1983; Puhan, 2008); methods for detecting item drift (Bock, Muraki, & Pfeiffenberger, 1988; Donoghue & Isham, 1998; Guo, Robin, & Dorans, 2017; Zhang & Li, 2016) may be considered as well. If there are different assessments with similar target populations, then true improvement in performance is likely observed in more than one of these assessments.…”

Section: Discussion (mentioning)

confidence: 99%

Studying Score Stability with a Harmonic Regression Family: A Comparison of Three Approaches to Adjustment of Examinee‐Specific Demographic Data

Lee, Haberman

2020

Journal of Educational Measurement

For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some are expected while others may be unforeseen. The situation is more challenging for assessments that assemble many different forms and deliver frequent administrations per year. Harmonic regression, a seasonal‐adjustment method, has been found useful in achieving the goal of differentiating between possible known sources of variability and unknown sources so as to study score stability for such assessments. As an extension, this paper presents a family of three approaches that incorporate examinees' demographic data into harmonic regression in different ways. A generic evaluation method based on jackknifing is developed to compare the approaches within the family. The three approaches are compared using real data from an international language assessment. Results suggest that all approaches perform similarly and are effective in meeting the goal. The paper also discusses the properties and limitations of the three approaches, along with inferences about score (in)stability based on the harmonic regression results.
