“…Kane proposes four inferences: 1) Scoring, which concerns how an item is constructed and administered, ranging from the format of the test (e.g., multiple-choice questions, skills evaluation) to the procedures planned to administer it (e.g., training of raters, facilities needed); 2) Generalization, which refers to the degree to which what is assessed (e.g., ten multiple-choice questions based on the cardiology module) represents what should be assessed (e.g., the material of the cardiology module); this inference may be supported by a test blueprint or by reliability indices; 3) Extrapolation, which is the relation between test performance and real-world performance; this inference requires that the test reflect real-world performance either theoretically (e.g., by evaluating the test with content experts) or empirically (e.g., by identifying the correlation between the test and workplace assessments); and 4) Implications, which measures the real-world impact of the assessment using a cost-effectiveness approach. To further understand the application of Kane’s validity framework, a recent review of the use of Automatic Item Generation (AIG) may be helpful (Falcão et al., 2022). In his seminal work, Falcão clearly delimits scoring as the procedures used to develop AIG, generalization as the difficulty measured in the test, and extrapolation as the discrimination of the items.…”