A multistage adaptive testing (MST) design was implemented for the Programme for the International Assessment of Adult Competencies (PIAAC) starting in 2012 for about 40 countries and was implemented for the 2018 cycle of the Programme for International Student Assessment (PISA) for more than 80 countries. Using examples from PISA and PIAAC, this article addresses the advantages of an MST design and the considerations it raises in the context of international large-scale assessments (ILSAs). It illustrates and discusses the unique features of the designs implemented in PISA and PIAAC and the expected gains in test efficiency and accuracy, as well as the limitations and challenges of MST designs for cross-country surveys. Practical aspects of, and insights into, using MST to measure complex constructs in cross-cultural surveys are provided.
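To make the routing mechanics concrete, the sketch below shows the core idea of a hypothetical two-stage MST design: performance on a first-stage routing module determines which second-stage module a respondent receives. The function name, cut scores, and module labels are illustrative assumptions, not the operational PISA or PIAAC routing rules.

```python
# Hypothetical two-stage MST router: respondents first take a routing module,
# then receive an easy, medium, or hard second-stage module depending on
# their routing score (cut scores are illustrative, not PISA/PIAAC values).
def route_stage2(routing_score: int, n_routing_items: int = 10) -> str:
    proportion_correct = routing_score / n_routing_items
    if proportion_correct < 0.4:
        return "stage2_easy"
    if proportion_correct < 0.7:
        return "stage2_medium"
    return "stage2_hard"

print(route_stage2(3))   # -> stage2_easy
print(route_stage2(8))   # -> stage2_hard
```

Because each respondent sees items closer to their own proficiency level, an MST design of this kind can achieve higher measurement precision per administered item than a fixed linear booklet.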
In the Programme for International Student Assessment (PISA), item response theory (IRT) scaling is used to examine the psychometric properties of items and scales and to provide comparable test scores across participating countries and over time. To balance the cross-country comparability of IRT item parameter estimates with the best possible model fit, PISA uses a partial invariance approach: international (common) item parameters are estimated for the majority of items, while unique (country-specific) item parameters are allowed for item-by-country combinations that show misfit to the common parameters. The goal of the current study is to establish item fit statistic thresholds for identifying such misfit. We investigated the impact of various thresholds on scale and score estimation by systematically examining the number of unique item parameters and the country performance distributions, and by comparing overall model fit statistics using data from PISA 2015 and 2018. Results showed that a root mean square deviation (RMSD) threshold of .10 provides the best-fitting model while still yielding stable parameter estimates and sufficient comparability across groups. Applications and implications of the results are discussed.
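As a rough illustration of the fit statistic involved, the following sketch computes an RMSD-type discrepancy between an observed and a model-implied item characteristic curve over a fixed quadrature grid. It is a minimal, simplified version assuming a 2PL item model and a standard normal ability distribution; it is not the operational PISA scaling implementation, and all function names, parameters, and data are hypothetical.

```python
import numpy as np

def rmsd_item_fit(p_obs, p_model, theta_density):
    """RMSD between an observed and a model-implied item characteristic
    curve, weighted by the latent ability density on a quadrature grid."""
    w = theta_density / theta_density.sum()          # normalized quadrature weights
    return np.sqrt(np.sum(w * (p_obs - p_model) ** 2))

theta = np.linspace(-4, 4, 41)                       # quadrature grid
density = np.exp(-0.5 * theta ** 2)                  # standard normal (unnormalized)
a, b = 1.2, 0.3                                      # hypothetical 2PL parameters
p_model = 1 / (1 + np.exp(-a * (theta - b)))         # model-implied curve
p_obs = np.clip(p_model + 0.05 * np.sin(theta), 0, 1)  # toy "observed" curve

# Flag this item-country combination for unique parameters if RMSD > .10.
print(round(rmsd_item_fit(p_obs, p_model, density), 4))
```

Under a threshold rule like the one studied, items whose RMSD in a given country exceeds .10 would receive country-specific parameters, while all others retain the common international parameters.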
Careless and insufficient effort responding (C/IER) poses a major threat to the quality of large-scale survey data. Traditional indicator-based procedures for detecting it are limited: they are sensitive only to specific types of C/IER behavior, such as straightlining or rapid responding; they rely on arbitrary threshold settings; and they do not allow the uncertainty of C/IER classification to be taken into account. To overcome these limitations, we develop a two-step, screen-time-based weighting procedure for computer-administered surveys. The procedure accounts for uncertainty in C/IER identification, is agnostic to the specific type of C/IE response pattern, and can feasibly be integrated with common analysis workflows for large-scale survey data. In Step 1, we use mixture modeling to identify subcomponents of log screen time distributions that presumably stem from C/IER. In Step 2, the analysis model of choice is applied to the item response data, with respondents' posterior class probabilities employed to downweight response patterns according to their probability of stemming from C/IER. We illustrate the approach on a sample of more than 400,000 respondents who were administered 48 scales of the PISA 2018 background questionnaire. We gather supporting validity evidence by investigating relationships between C/IER proportions and screen characteristics that entail higher cognitive burden, such as screen position and text length; by relating identified C/IER proportions to other indicators of C/IER; and by investigating the rank-order consistency of C/IER behavior across screens. Finally, in a re-analysis of the PISA 2018 background questionnaire data, we investigate the impact of the C/IER adjustments on country-level comparisons.
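A minimal sketch of the two-step logic follows, under strong simplifying assumptions: a single screen, a two-component Gaussian mixture on log screen times, and a toy weighted scale mean standing in for the "analysis model of choice." The simulated data and variable names are illustrative only, and the authors' actual mixture specification may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated screen times (seconds): attentive respondents plus a fast C/IER subgroup.
attentive = rng.lognormal(mean=3.5, sigma=0.4, size=900)
careless = rng.lognormal(mean=1.5, sigma=0.3, size=100)
screen_times = np.concatenate([attentive, careless])

# Step 1: fit a two-component mixture to the log screen time distribution.
log_t = np.log(screen_times).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(log_t)
cier = int(np.argmin(gmm.means_.ravel()))        # low-mean component = presumed C/IER
p_cier = gmm.predict_proba(log_t)[:, cier]       # posterior probability of C/IER

# Step 2: downweight each response pattern by its probability of stemming from C/IER.
weights = 1.0 - p_cier
item_scores = rng.integers(1, 6, size=(1000, 6))  # toy 6-item, 5-point Likert scale
scale_scores = item_scores.mean(axis=1)
print(round(np.average(scale_scores, weights=weights), 3))
```

Because the posterior class probabilities enter as continuous weights rather than a hard cutoff, the classification uncertainty is carried directly into the downstream analysis instead of being discarded at an arbitrary threshold.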