The use of automated systems in second-language learning could substantially reduce the workload of human teachers and test creators. This study proposes a novel method for automatically generating distractors for multiple-choice English vocabulary questions. The proposed method introduces new sources for collecting distractor candidates and utilises semantic similarity and collocation information when ranking the collected candidates. We evaluated the proposed method by administering the questions to real English learners. We further asked an expert to judge the quality of the distractors generated by the proposed method, a baseline method and humans. The results show that the proposed method produces fewer problematic distractors than the baseline method. Furthermore, the generated distractors have a quality that is comparable with that of human-made distractors.
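The abstract's ranking step (semantic similarity plus collocation information) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual method: the word vectors and collocation counts below are invented toy values, and the scoring weights `alpha` and `beta` are hypothetical parameters. The intuition assumed here is that a good distractor should be semantically close to the correct answer but should not collocate strongly with the surrounding context (otherwise it might also fit the blank).

```python
import math

# Toy word vectors, invented for illustration only; the paper's actual
# embeddings are not specified here.
VECTORS = {
    "error":   [0.9, 0.1, 0.2],
    "mistake": [0.85, 0.15, 0.25],
    "fault":   [0.7, 0.3, 0.2],
    "banana":  [0.1, 0.9, 0.4],
}

# Hypothetical counts of how often each candidate collocates with the
# carrier sentence's context words.
COLLOCATION_WITH_CONTEXT = {"mistake": 12, "fault": 3, "banana": 0}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_distractors(answer, candidates, alpha=1.0, beta=0.05):
    """Score each candidate: semantic similarity to the correct answer,
    minus a penalty for collocating with the question context."""
    scored = []
    for c in candidates:
        sim = cosine(VECTORS[answer], VECTORS[c])
        penalty = beta * COLLOCATION_WITH_CONTEXT.get(c, 0)
        scored.append((alpha * sim - penalty, c))
    return [word for _, word in sorted(scored, reverse=True)]

ranking = rank_distractors("error", ["mistake", "fault", "banana"])
```

Under these toy values, "fault" ranks first: it is semantically close to "error" but collocates only weakly with the context, whereas "mistake" is closer semantically but is penalised for its strong collocation.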
The present study investigates the most effective factor for controlling the item difficulty of multiple-choice English vocabulary questions generated by an automatic question generation system. Three factors are considered for controlling item difficulty: (1) reading passage difficulty, (2) semantic similarity between the correct answer and the distractors, and (3) the distractor word difficulty level. An experiment was conducted by administering machine-generated items to three groups of English learners, determined based on their standardised English test scores. In total, 120 items, generated using combinations of the above three factors, were tested. The results reveal that the distractor word difficulty level had the greatest impact on item difficulty, although this tendency varied with the proficiency of the test takers. These results will be of use when implementing a fully automatic system for administering tests.
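The finding that distractor word difficulty drives item difficulty suggests a simple selection rule: to make an item harder, choose distractors from a higher difficulty band. The sketch below illustrates that idea under stated assumptions; the word-level table is invented for illustration and does not reflect the study's actual difficulty scale or word lists.

```python
# Hypothetical word-difficulty levels (1 = easiest, 5 = hardest); invented
# for illustration, not taken from the study.
WORD_LEVEL = {
    "glad": 1, "content": 2, "jovial": 4, "euphoric": 5,
}

def pick_distractors(candidates, target_level, n=3):
    """Prefer candidates whose difficulty level is closest to the target
    level, so the item's difficulty tracks its distractors' difficulty."""
    return sorted(candidates, key=lambda w: abs(WORD_LEVEL[w] - target_level))[:n]

pool = ["glad", "content", "jovial", "euphoric"]
easy_item = pick_distractors(pool, target_level=1)  # easier distractors
hard_item = pick_distractors(pool, target_level=5)  # harder distractors
```

With the same candidate pool, the target level alone shifts the selection toward easier or harder words, which is the control knob the study identifies as most influential.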
This paper describes the evaluation experiments for questions created by an automatic question generation system. Given a target word and one of its word senses, the system generates a multiple-choice English vocabulary question asking for the word closest in meaning to the target word as used in the reading passage. Two kinds of evaluation were conducted, considering two aspects: (1) how well the questions measure English learners' proficiency and (2) their similarity to human-made questions. The first evaluation is based on the responses of English learners to the machine-generated and human-made questions administered to them, and the second is based on subjective judgements by English teachers. Both evaluations showed that the machine-generated questions achieved a level comparable to that of the human-made questions, both in measuring English proficiency and in similarity.
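Response-based evaluation of test items, as described in the first experiment, is commonly summarised with item-analysis statistics. One standard choice is the point-biserial correlation between an item's right/wrong responses and learners' total scores; whether the paper uses exactly this statistic is an assumption here, and the response data below are invented for illustration.

```python
import math

def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between one item's 0/1 responses and
    total test scores: a standard item-discrimination statistic.
    A high value means stronger learners tend to answer the item correctly."""
    n = len(item_correct)
    p = sum(item_correct) / n          # proportion answering correctly
    q = 1 - p
    mean_all = sum(total_scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)
    mean_correct = (sum(s for c, s in zip(item_correct, total_scores) if c)
                    / sum(item_correct))
    return (mean_correct - mean_all) / sd * math.sqrt(p / q)

# Invented responses from six learners: correct answers cluster among
# those with higher total scores, so the item discriminates well.
r = point_biserial([1, 1, 0, 1, 0, 0], [9, 8, 4, 7, 5, 3])
```

An item with a correlation near zero (or negative) fails to separate stronger from weaker learners, which is the kind of defect this evaluation would surface in machine-generated questions.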