2019
DOI: 10.3389/fpsyg.2019.00884
Automatic Generation of Number Series Reasoning Items of High Difficulty

Abstract: Number series reasoning items have been frequently used in educational assessment. This study reviewed the previous literature investigating features relating to item difficulty and developed a new automatic generator for number series reasoning items. Ninety-two items were generated and administered to 466 students. The results showed that the test achieved acceptable reliability. Items requiring two arithmetic operations were of particularly high difficulty. All stimulus features implemented in the automatic…

Cited by 7 publications (4 citation statements). References 35 publications (43 reference statements).
“…Additionally, the literature includes studies of template-based automatic item generation in the fields of medicine and mathematics (Colvin, 2014; Lai et al., 2016; Singley & Bennett, 2002; Sun et al., 2019). AIG should be used in psychological testing areas that involve cognitive models and measure individuals' reasoning skills (Hommel et al., 2022; Sun et al., 2019; Yang et al., 2021), and cognitive ability items can be developed in the reasoning domains (Freund et al., 2008; Poinstingl, 2009). In the previous study, AIG was used to generate items in the field of Turkish literature.…”
Section: Discussion
confidence: 99%
“…When the value of βj is less than 0.20, the test item is described as extremely difficult and should be reviewed in subsequent tests. The optimal item difficulty is 0.50, which ensures maximum discrimination between high- and low-ability examinees [52-54]. To maximize item discrimination, the desired difficulty level is slightly higher than halfway between the probability of answering correctly by chance (1.00 divided by the number of alternatives for the item) and the ideal score for the item (1.00) [55-58].…”
Section: A. Item Difficulty
confidence: 99%
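The difficulty target described in the statement above (halfway between the chance level and a perfect score) reduces to simple arithmetic. A minimal sketch, with hypothetical helper names not taken from the cited works:

```python
def chance_level(n_alternatives: int) -> float:
    """Probability of answering correctly by guessing alone:
    1.00 divided by the number of alternatives for the item."""
    return 1.0 / n_alternatives

def target_difficulty(n_alternatives: int) -> float:
    """Midpoint between the chance level and the ideal score of 1.00.
    The quoted guideline places the desired difficulty slightly above
    this midpoint to maximize item discrimination."""
    return (chance_level(n_alternatives) + 1.0) / 2.0

# For a four-alternative multiple-choice item:
# chance level = 0.25, so the midpoint target is (0.25 + 1.0) / 2 = 0.625
```

So, for instance, a four-option item would aim for a proportion-correct somewhat above 0.625 rather than the 0.50 that is optimal for free-response items where guessing plays no role.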
“…Moreover, further consideration should be given to any item that was answered correctly more often by those who generally performed poorly on the test than by those who performed well on the test as a whole. Such an item may be confusing in some way to top-performing respondents [52, 53, 58, 59]. It is recommended to delete item VI directly.…”
Section: B. Item Discrimination
confidence: 99%
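The screening rule in the statement above is the classical upper-lower discrimination index: compare the proportion correct in the top-scoring group with that in the bottom-scoring group, and flag items where the sign is negative. A minimal sketch (the function name is illustrative, not from the cited works):

```python
def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """Classical upper-lower discrimination index: proportion correct
    in the top-scoring group minus proportion correct in the
    bottom-scoring group (both groups of equal size)."""
    return (upper_correct - lower_correct) / group_size

# A negative index signals the situation described above: low scorers
# outperform high scorers on the item, so it should be reviewed or deleted.
# e.g. 10 of 20 top scorers correct vs. 18 of 20 bottom scorers correct
# gives (10 - 18) / 20 = -0.40
```

Items with an index near zero discriminate poorly; a clearly negative value is the strongest deletion signal, matching the recommendation quoted above.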
“…The difficulty of mathematics items is important in assessing the quality of examinations and the value of educational outcomes [1]. It is generally determined by several features, including the knowledge required, the depth of thinking, the problem-solving ability, and the time constraints [2-4]. Understanding item difficulty has practical implications for intelligent educational applications such as knowledge tracking [5, 6], automatic test item generation [7-9], intelligent paper generation [10] and personalized recommendations [11, 12].…”
Section: Introduction
confidence: 99%