A multistage adaptive testing (MST) design was implemented for the Programme for the International Assessment of Adult Competencies (PIAAC) starting in 2012 for about 40 countries and was implemented for the 2018 cycle of the Programme for International Student Assessment (PISA) for more than 80 countries. Using examples from PISA and PIAAC, this article addresses the advantages of an MST design and the considerations it raises in the context of international large-scale assessments (ILSAs). It illustrates and discusses the unique features of the designs implemented in PISA and PIAAC and the expected gains in test efficiency and accuracy, as well as the limitations and challenges of MST designs for cross-country surveys. Practical aspects of, and insights into, using MST to measure complex constructs in cross-cultural surveys are provided.
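To make the routing mechanics concrete, the sketch below shows the core idea of a hypothetical two-stage MST design: performance on a first-stage routing module determines which second-stage module a respondent receives. The function name, cut scores, and module labels are illustrative assumptions, not the operational PISA or PIAAC routing rules.

```python
# Hypothetical two-stage MST router: respondents first take a routing module,
# then receive an easy, medium, or hard second-stage module depending on
# their routing score (cut scores are illustrative, not PISA/PIAAC values).
def route_stage2(routing_score: int, n_routing_items: int = 10) -> str:
    proportion_correct = routing_score / n_routing_items
    if proportion_correct < 0.4:
        return "stage2_easy"
    if proportion_correct < 0.7:
        return "stage2_medium"
    return "stage2_hard"

print(route_stage2(3))   # -> stage2_easy
print(route_stage2(8))   # -> stage2_hard
```

Because each respondent sees items closer to their own proficiency level, an MST design of this kind can achieve higher measurement precision per administered item than a fixed linear booklet.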
In the Programme for International Student Assessment (PISA), item response theory (IRT) scaling is used to examine the psychometric properties of items and scales and to provide comparable test scores across participating countries and over time. To balance the cross-country comparability of IRT item parameter estimates with the best possible model fit, PISA uses a partial invariance approach: international (common) item parameters are estimated for the majority of items, while unique (country-specific) item parameters are allowed for item-by-country combinations that show misfit to the common parameters. The goal of the current study is to establish item fit statistic thresholds for identifying such misfit. We investigated the impact of various thresholds on scale and score estimation by systematically examining the number of unique item parameters and the country performance distributions, and by comparing overall model fit statistics using data from PISA 2015 and 2018. Results showed that a root mean square deviation (RMSD) threshold of .10 provides the best-fitting model while still yielding stable parameter estimates and sufficient comparability across groups. Applications and implications of the results are discussed.
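As a rough illustration of the fit statistic involved, the following sketch computes an RMSD-type discrepancy between an observed and a model-implied item characteristic curve over a fixed quadrature grid. It is a minimal, simplified version assuming a 2PL item model and a standard normal ability distribution; it is not the operational PISA scaling implementation, and all function names, parameters, and data are hypothetical.

```python
import numpy as np

def rmsd_item_fit(p_obs, p_model, theta_density):
    """RMSD between an observed and a model-implied item characteristic
    curve, weighted by the latent ability density on a quadrature grid."""
    w = theta_density / theta_density.sum()          # normalized quadrature weights
    return np.sqrt(np.sum(w * (p_obs - p_model) ** 2))

theta = np.linspace(-4, 4, 41)                       # quadrature grid
density = np.exp(-0.5 * theta ** 2)                  # standard normal (unnormalized)
a, b = 1.2, 0.3                                      # hypothetical 2PL parameters
p_model = 1 / (1 + np.exp(-a * (theta - b)))         # model-implied curve
p_obs = np.clip(p_model + 0.05 * np.sin(theta), 0, 1)  # toy "observed" curve

# Flag this item-country combination for unique parameters if RMSD > .10.
print(round(rmsd_item_fit(p_obs, p_model, density), 4))
```

Under a threshold rule like the one studied, items whose RMSD in a given country exceeds .10 would receive country-specific parameters, while all others retain the common international parameters.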
Careless and insufficient effort responding (C/IER) poses a major threat to the quality of large-scale survey data. Traditional indicator-based procedures for detecting it are limited: they are sensitive only to specific types of C/IER behavior, such as straightlining or rapid responding; they rely on arbitrary threshold settings; and they do not allow the uncertainty of C/IER classification to be taken into account. To overcome these limitations, we develop a two-step, screen-time-based weighting procedure for computer-administered surveys. The procedure accounts for uncertainty in C/IER identification, is agnostic to the specific type of C/IE response pattern, and can feasibly be integrated with common analysis workflows for large-scale survey data. In Step 1, we use mixture modeling to identify subcomponents of log screen time distributions that presumably stem from C/IER. In Step 2, the analysis model of choice is applied to the item response data, with respondents' posterior class probabilities employed to downweight response patterns according to their probability of stemming from C/IER. We illustrate the approach on a sample of more than 400,000 respondents who were administered 48 scales of the PISA 2018 background questionnaire. We gather supporting validity evidence by investigating relationships between C/IER proportions and screen characteristics that entail higher cognitive burden, such as screen position and text length; by relating identified C/IER proportions to other indicators of C/IER; and by investigating the rank-order consistency of C/IER behavior across screens. Finally, in a re-analysis of the PISA 2018 background questionnaire data, we investigate the impact of the C/IER adjustments on country-level comparisons.
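A minimal sketch of the two-step logic follows, under strong simplifying assumptions: a single screen, a two-component Gaussian mixture on log screen times, and a toy weighted scale mean standing in for the "analysis model of choice." The simulated data and variable names are illustrative only, and the authors' actual mixture specification may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated screen times (seconds): attentive respondents plus a fast C/IER subgroup.
attentive = rng.lognormal(mean=3.5, sigma=0.4, size=900)
careless = rng.lognormal(mean=1.5, sigma=0.3, size=100)
screen_times = np.concatenate([attentive, careless])

# Step 1: fit a two-component mixture to the log screen time distribution.
log_t = np.log(screen_times).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(log_t)
cier = int(np.argmin(gmm.means_.ravel()))        # low-mean component = presumed C/IER
p_cier = gmm.predict_proba(log_t)[:, cier]       # posterior probability of C/IER

# Step 2: downweight each response pattern by its probability of stemming from C/IER.
weights = 1.0 - p_cier
item_scores = rng.integers(1, 6, size=(1000, 6))  # toy 6-item, 5-point Likert scale
scale_scores = item_scores.mean(axis=1)
print(round(np.average(scale_scores, weights=weights), 3))
```

Because the posterior class probabilities enter as continuous weights rather than a hard cutoff, the classification uncertainty is carried directly into the downstream analysis instead of being discarded at an arbitrary threshold.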