Item response theory (IRT) has become a popular methodological framework for modeling response data from assessments in education and health; however, its use is not widespread among psychologists. This paper aims to provide a didactic application of IRT and to highlight some of its advantages for psychological test development. IRT was applied to two scales (a positive and a negative affect scale) of a self-report test, answered by 853 university students (57% women) between the ages of 17 and 35. IRT analyses revealed that the positive affect scale has items with moderate discrimination that measure respondents below the average trait level most effectively. The negative affect scale also presented items with moderate discrimination that evaluate respondents across the trait continuum, although with much less precision. Several features of IRT are used to show how such results can improve the measurement of the scales. The authors illustrate and emphasize how knowledge of these features may allow test makers to refine other psychological measures and increase their validity and reliability.
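To make the notions of item discrimination and conditional precision concrete, the sketch below computes 2PL item and test information functions with numpy. The item parameters are hypothetical, chosen only to mimic a scale whose items cluster below the trait mean (as described for the positive affect scale); they are not the paper's estimates.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

# Hypothetical parameters: moderate discriminations (a around 1) and
# difficulties concentrated below the trait mean.
a_params = np.array([0.9, 1.1, 1.0, 0.8, 1.2])
b_params = np.array([-1.5, -1.0, -0.8, -0.3, 0.1])

theta_grid = np.linspace(-3, 3, 121)
test_info = sum(item_information(theta_grid, a, b)
                for a, b in zip(a_params, b_params))

# Test information peaks where measurement is most precise; its
# reciprocal square root is the conditional standard error.
peak = theta_grid[np.argmax(test_info)]
print(f"Information peaks at theta = {peak:.2f}")
print(f"Conditional SE at the peak: {1.0 / np.sqrt(test_info.max()):.2f}")
```

Because the hypothetical difficulties sit below zero, information peaks below the trait mean, which is exactly the pattern the abstract reports for the positive affect scale.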
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths (low, middle, and high). When creating 2-stage MST panels (i.e., forms), we manipulated 2 assembly conditions in each module, namely difficulty level and module length, to see whether any interaction existed between IRT estimation methods and MST panel designs. For each panel, we compared the accuracy of examinees' proficiency levels derived from 7 IRT proficiency estimators. We found that the choice of Bayesian (prior) versus non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring. For the extreme proficiency levels, the decrease in standard error compensated for the increase in bias in the Bayesian estimates, resulting in smaller total error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for examinees at the extreme proficiency levels. The impact of misrouting at Stage 1 was minimal under the MST design used in this study.
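The practical difference between Bayesian and non-Bayesian estimators is easy to see numerically. The sketch below contrasts a maximum-likelihood (no prior) estimate with an EAP (standard normal prior) estimate for a single high-performing response pattern under the 2PL model. The item parameters and responses are hypothetical illustrations, not the study's panels or operational estimators.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_likelihood(theta, responses, a, b):
    p = p_2pl(theta, a, b)
    return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Hypothetical 2PL parameters for a short routing-plus-Stage-2 path.
a = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])

# A high-performing examinee: everything correct except the hardest item.
responses = np.array([1, 1, 1, 1, 1, 0])

grid = np.linspace(-4, 4, 401)
loglik = np.array([log_likelihood(t, responses, a, b) for t in grid])

# Non-Bayesian (ML): the grid point maximizing the likelihood.
theta_ml = grid[np.argmax(loglik)]

# Bayesian (EAP): posterior mean under a standard normal prior,
# computed on the same grid (weights normalized, so dx cancels).
prior = np.exp(-0.5 * grid**2)
posterior = np.exp(loglik - loglik.max()) * prior
posterior /= posterior.sum()
theta_eap = np.sum(grid * posterior)

print(f"ML estimate:  {theta_ml:.2f}")   # larger in magnitude
print(f"EAP estimate: {theta_eap:.2f}")  # shrunk toward the prior mean
```

The gap between the two estimates is largest for extreme response patterns like this one, which mirrors the abstract's finding that estimator choice matters most for low- and high-performing examinees: the prior pulls EAP toward the mean (more bias) while damping its standard error.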
This study assessed the factor structure of the Test of English for International Communication (TOEIC®) Listening and Reading test, and its invariance across subgroups of test-takers. The subgroups were defined by (a) gender, (b) age, (c) employment status, (d) time spent studying English, and (e) having lived in a country where English is the main language. The study results indicated that a correlated two-factor model corresponding to the two language abilities of listening and reading best accounted for the factor structure of the test. In addition, the underlying construct had the same structure across the test-taker subgroups studied. There were, however, significant differences in the means of the latent construct across the subgroups. This study provides empirical support for the current score reporting practice for the TOEIC test, suggests that the test scores have the same meaning across studied test-taker subgroups, and identifies possible test-taker background characteristics that affect English language abilities as measured by the TOEIC test.
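A correlated two-factor model of the kind retained in this study can be expressed as a model-implied covariance structure, Sigma = Lambda Phi Lambda' + Theta, and fit by minimizing the maximum-likelihood discrepancy. The sketch below does this with scipy for six hypothetical observed scores (three per factor); all values are illustrative stand-ins, not TOEIC data or results.

```python
import numpy as np
from scipy.optimize import minimize

p, per_factor = 6, 3  # six observed scores, three loading on each factor

def implied_cov(params):
    """Model-implied covariance: Sigma = Lambda Phi Lambda' + Theta."""
    loadings, uniquenesses, phi = params[:p], params[p:2 * p], params[-1]
    lam = np.zeros((p, 2))
    lam[:per_factor, 0] = loadings[:per_factor]   # listening indicators
    lam[per_factor:, 1] = loadings[per_factor:]   # reading indicators
    Phi = np.array([[1.0, phi], [phi, 1.0]])      # factor correlation
    return lam @ Phi @ lam.T + np.diag(uniquenesses)

def f_ml(params, S):
    """ML discrepancy: F = log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    Sigma = implied_cov(params)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        return np.inf
    return logdet + np.trace(S @ np.linalg.inv(Sigma)) - np.linalg.slogdet(S)[1] - p

# A population covariance built from known values stands in for sample data:
# loadings 0.7, uniquenesses 0.51, factor correlation 0.6.
true = np.concatenate([np.full(p, 0.7), np.full(p, 0.51), [0.6]])
S = implied_cov(true)

start = np.concatenate([np.full(p, 0.5), np.full(p, 0.5), [0.3]])
res = minimize(f_ml, start, args=(S,), method="L-BFGS-B",
               bounds=[(0.05, 1.5)] * (2 * p) + [(-0.95, 0.95)])
print("Estimated factor correlation:", round(res.x[-1], 3))  # approx 0.6
```

Invariance testing of the kind the abstract describes would then refit this model in each test-taker subgroup, progressively constraining loadings and intercepts to be equal and comparing fit at each step; dedicated SEM software handles that bookkeeping in practice.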