2022
DOI: 10.3389/frai.2022.903077

The interactive reading task: Transformer-based automatic item generation

Abstract: Automatic item generation (AIG) has the potential to greatly expand the number of items for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems), and depends on highly skilled content experts to create each model. In this paper we describe the interactive reading task, a transformer-based deep language m…

Cited by 21 publications (21 citation statements)
References 43 publications
“…As an example, Figure 1 describes the automated content generation process with human‐in‐the‐loop used in the Duolingo English Test (Attali et al., 2022). Human experts stay in the loop of construct definition, task design, item generation, and refinement to review item quality, fairness, and bias.…”
Section: LLM, Generative AI, and Human-Centered AI (mentioning)
confidence: 99%
“…Von Davier (2018) carried out a study examining automatically generated items through an online survey and used experts' opinions for the validation. Similarly, Attali et al. (2022) used GPT-3 to create interactive reading passages involving human reviewers. Bezirhan and von Davier (2023) also sought expert opinions to assess the quality of the texts generated with GPT.…”
Section: Text Analysis Cognitive Model (mentioning)
confidence: 99%
“…This issue is, for example, illustrated by Shi and Aryadoust’s (2022) systematic review of automated writing evaluation (AWE) systems, which concluded that domain definition inferences are underrepresented in AWE research. Similarly, even the most sophisticated automatic item generation (AIG) systems (e.g., Attali et al., 2022), while making great advances, are currently still trained on more “traditional,” “conventional” language use, and restricted in the kind of tasks (input and questions) they can produce (as well as continuing to require considerable human reviewing). Thus, a challenge for future work on technology in language testing and assessment will be to reflect the progress made in (applied) linguistics regarding the nature of language and language use in society, and to prioritize construct operationalization insights, in order to avoid restricting domain representation through technology-mediated testing.…”
Section: Looking Ahead (mentioning)
confidence: 99%