Writing a high-quality multiple-choice test item is a complex process. Creating plausible but incorrect options for each item poses a significant challenge for the content specialist because this task is often undertaken without a systematic method. In the current study, we describe and demonstrate a systematic method for creating plausible but incorrect options, also called distractors, based on students’ misconceptions extracted from labeled written responses. A total of 1,515 written responses from Grade 10 students to an existing constructed-response item in Biology were used to demonstrate the method. Using latent Dirichlet allocation, a topic modeling procedure commonly used in machine learning and natural language processing, 22 plausible misconceptions were identified in the students’ written responses and used to produce a list of plausible distractors. These distractors, in turn, were used as part of new multiple-choice items. Implications for item development are discussed.
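The abstract does not spell out the modeling pipeline, but a minimal sketch of an LDA-based misconception-mining step, using scikit-learn on a toy set of placeholder responses, might look like the following. The example responses, preprocessing choices, and topic count are illustrative assumptions, not the study's actual data or settings (the study fitted 22 topics on 1,515 real responses).

```python
# Sketch of LDA-based misconception mining; all responses below are
# hypothetical placeholders, not data from the study.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

responses = [
    "plants get their food from the soil",
    "plants eat soil and water to grow big",
    "the soil gives plants all their food",
    "photosynthesis happens only during the night",
    "plants make food at night when it is dark",
    "plants breathe in oxygen just like animals do",
    "plants take in oxygen to breathe like animals",
]

# Bag-of-words representation; stop-word removal is an illustrative choice.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(responses)

# The study identified 22 misconceptions; a toy corpus warrants fewer topics.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

# Inspect each topic's top words as a candidate misconception theme.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Misconception theme {k}: {', '.join(top)}")
```

In practice, a content specialist would review each topic's top words, interpret the underlying misconception, and rephrase it as a distractor statement for the new multiple-choice item.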
Automated essay scoring (AES) has emerged as a secondary or sole marker for many high-stakes educational assessments, in both native and non-native language testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural algorithms. The purpose of this study is to compare the effectiveness and performance of two AES frameworks: one based on machine learning with deep (i.e., complex) language features, and the other based on deep neural algorithms. More specifically, support vector machines (SVMs) in conjunction with Coh-Metrix features were used to develop a traditional AES model, and convolutional neural networks (CNNs) were used to develop a more contemporary deep neural model. The strengths and weaknesses of the traditional and contemporary models were then tested under different conditions (e.g., rubric type, essay length, and essay type). The results were evaluated using the quadratic weighted kappa (QWK) score and compared with the agreement between human raters. The results indicated that the CNN model performed better, producing results more comparable to the human raters than the Coh-Metrix + SVM model. Moreover, the CNN model achieved state-of-the-art performance on most of the essay sets, with a high average QWK score.
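As a rough illustration of the traditional pipeline described here, the sketch below trains an SVM regressor on a synthetic matrix standing in for Coh-Metrix features and measures agreement with human ratings via quadratic weighted kappa. All data are random placeholders, and the feature count, rubric range, and hyperparameters are assumptions for illustration only; Coh-Metrix itself is a separate feature-extraction tool not reproduced here.

```python
# Sketch of the traditional AES pipeline: SVM regression over linguistic
# features, scored against human ratings with quadratic weighted kappa (QWK).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))      # placeholder: 40 Coh-Metrix-like features
y = rng.integers(0, 6, size=500)    # placeholder: human scores on a 0-5 rubric

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = SVR(kernel="rbf", C=1.0)
model.fit(X_tr, y_tr)

# Round continuous predictions back to the rubric's integer score points.
pred = np.clip(np.rint(model.predict(X_te)), 0, 5).astype(int)

# QWK: chance-corrected agreement that penalizes large score disagreements
# more heavily than near-misses.
qwk = cohen_kappa_score(y_te, pred, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```

The same QWK computation applies to the deep neural model's predictions, which is what allows the two frameworks (and the human raters) to be compared on a common scale.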
The introduction of computerized formative assessments in the classroom has opened a new avenue for effective progress monitoring with more accessible test administrations. With computerized formative assessments, all students can be tested at the same time and with the same number of test administrations within a school year. Alternatively, teachers can decide the number and frequency of such tests based on their observations and personal judgments about students. However, this often results in rigid test scheduling that fails to account for the pace at which students acquire knowledge. To administer computerized formative assessments efficiently, teachers should be given systematic guidance on effective test scheduling based on each student's level of progress. In this study, we introduce an intelligent recommendation system that can determine the optimal number and timing of tests for each student. We discuss how to build such a system using a reinforcement learning approach and then present a case study with a large sample of students' test results from a computerized formative assessment. We show that the intelligent recommendation system can significantly reduce the number of tests administered by eliminating unnecessary administrations in which students do not show significant progress (i.e., growth). The proposed system is also capable of identifying the optimal test time for students to demonstrate adequate progress from one administration to the next. Implications for future research on personalized assessment scheduling are discussed.
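The abstract does not specify the study's reinforcement learning formulation; the toy tabular Q-learning sketch below illustrates the general idea under assumed definitions of states (coarse progress levels), actions ("wait" vs. "administer a test"), and rewards (a penalty for testing a student who has not grown). The dynamics are simulated placeholders, not the study's data or model.

```python
# Toy Q-learning sketch of personalized test scheduling. All state, action,
# and reward definitions here are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2           # progress levels 0-4; actions: 0=wait, 1=test
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    # Assumed dynamics: students progress faster at lower levels.
    grow_p = max(0.8 - 0.2 * state, 0.1)
    grew = state < n_states - 1 and rng.random() < grow_p
    next_state = state + 1 if grew else state
    if action == 1:                      # administer a test
        reward = 1.0 if grew else -1.0   # penalize tests that capture no growth
    else:                                # wait
        reward = -0.1                    # small cost of delaying information
    return next_state, reward

for episode in range(2000):
    state = 0
    for _ in range(20):                  # one simulated school year
        if rng.random() < epsilon:       # epsilon-greedy exploration
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward = step(state, action)
        # Standard Q-learning update toward the bootstrapped target.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                     - Q[state, action])
        state = next_state

print("Learned policy (0=wait, 1=test) by progress level:", Q.argmax(axis=1))
```

The learned policy recommends testing only at progress levels where growth is likely, which mirrors the abstract's goal of eliminating administrations that would not show significant progress.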