2021
DOI: 10.1007/s11336-021-09823-9
Transformer-Based Deep Neural Language Modeling for Construct-Specific Automatic Item Generation

Abstract: Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not ideal for the creation of items for non-cognitive constructs. Progress in this area has been made recently by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires […]

Cited by 20 publications (18 citation statements); references 32 publications.
“…Future applications may yield even better results by retraining the GPT-2 on custom-tailored text corpora, such as psychological scale databases. Indeed, first examples of targeted fine-tuning are already producing encouraging initial results and foreshadow the minute task-specific calibration that may be accomplished in the future (Hommel et al., 2021; Howard & Ruder, 2018; von Davier, 2019), while other scholars suggest that the ever-increasing powers of coming generations of generative language models will soon remove the need and incentive for retraining, as peak performance can be accomplished without requiring any further fine-tuning (Han et al., 2021; Launay et al., 2021; Yang, 2021).…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
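As a concrete illustration of the targeted fine-tuning discussed in this statement, the following is a minimal sketch of retraining GPT-2 on a plain-text corpus of item statements with the Hugging Face transformers and datasets libraries. The file name personality_items.txt, the output directory gpt2-items, and all hyperparameters are illustrative assumptions, not settings reported in the cited studies.

```python
# Minimal sketch: causal-LM fine-tuning of GPT-2 on a small item corpus.
# personality_items.txt (one item statement per line) and the hyperparameters
# below are illustrative assumptions.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 ships without a pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the corpus and tokenize each line, e.g. "I enjoy meeting new people."
dataset = load_dataset("text", data_files={"train": "personality_items.txt"})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True, remove_columns=["text"],
)

# mlm=False -> standard next-token (causal) language-modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
args = TrainingArguments(output_dir="gpt2-items",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=dataset["train"])
trainer.train()
trainer.save_model("gpt2-items")                   # reusable checkpoint
tokenizer.save_pretrained("gpt2-items")
```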
“…Recent work in psychometrics has capitalized on the availability of transformer models to improve psychological measurement (Demszky et al., 2023; Hao et al., 2024; Hommel et al., 2022). Continuing this line of work, Guenole, Samo, & Sun (2024) introduced the idea of pseudo-discrimination using transformer-based large language models (LLMs).…”
Section: Pseudo Factor Analysis of Language Embedding Similarity Matr...
Citation type: mentioning (confidence: 99%)
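The pseudo factor analysis this statement refers to can be sketched as follows: embed item texts with a sentence encoder, treat the resulting cosine-similarity matrix as if it were an inter-item correlation matrix, and read pseudo-loadings off its eigendecomposition. The example items, the all-MiniLM-L6-v2 encoder, and the two-dimension extraction are illustrative assumptions, not the cited authors' materials or exact method.

```python
# Minimal sketch: "pseudo" factor analysis of a language-embedding similarity matrix.
# Items and encoder choice are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

items = [
    "I enjoy being the center of attention.",  # extraversion-flavored
    "I feel comfortable around people.",
    "I worry about things.",                   # neuroticism-flavored
    "I get stressed out easily.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(items, normalize_embeddings=True)

# With unit-norm embeddings, the Gram matrix is the cosine-similarity matrix,
# which here stands in for an inter-item correlation matrix.
sim = emb @ emb.T

# Principal-axis-style extraction: eigenvectors scaled by sqrt(eigenvalues)
# act as pseudo-loadings of each item on each latent dimension.
vals, vecs = np.linalg.eigh(sim)
order = np.argsort(vals)[::-1][:2]
loadings = vecs[:, order] * np.sqrt(vals[order])
print(np.round(loadings, 2))                       # rows = items, cols = dimensions
```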
“…For instance, earlier generation automated item generation (AIG) tools (e.g., Gierl & Haladyna, 2012) required test developers to construct item blueprints with articulated factors from prespecified cognitive models. Large language models (LLMs), part of the ML toolkit, can help test developers generate hundreds of thousands of items quickly, at scale, and with a large degree of variability (e.g., Belzak et al., 2023; Bezirhan & von Davier, 2023; Hommel et al., 2021; Laverghetta & Licato, 2023; von Davier, 2018). These items can be administered to candidates on large-scale standardized tests, leading to high-dimensional datasets with large degrees of sparsity across items of various response formats.…”
Section: Figure
Citation type: mentioning (confidence: 99%)
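To make the at-scale generation this passage describes concrete, here is a minimal sampling sketch with the transformers text-generation pipeline. The checkpoint name gpt2-items (the hypothetical fine-tuned model from the earlier sketch), the seed prompt, and the sampling settings are illustrative assumptions; a production workflow would add filtering and review of the drafts before any administration to candidates.

```python
# Minimal sketch: drafting many candidate items by sampling from a generative LM.
# "gpt2-items" is the hypothetical checkpoint from the fine-tuning sketch above.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-items")

drafts = generator(
    "I ",                      # seed tokens typical of self-report item stems
    max_new_tokens=20,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    num_return_sequences=50,   # raise this to draft thousands of candidates
)
candidates = {d["generated_text"].strip() for d in drafts}
print(len(candidates), "unique item drafts")
```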