What are the issues and techniques involved in protecting the integrity of item pools in computerized testing? How can item exposure be limited? How do security issues differ in computerized testing and paper‐and‐pencil testing?
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change, test level, test content, and item format. As a follow-up to the real data analysis, a simulation study was performed to assess the effect of item position change on equating. Results from this study indicate that item position change significantly affects change in RID. In addition, although the test construction procedures used in the investigated state seem to somewhat mitigate the impact of item position change, equating results might be impacted in testing programs where other test construction practices or equating methods are utilized.
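To make the quantity under study concrete, the sketch below shows how a shift in Rasch item difficulty (RID) translates into a change in response probability. The Rasch model itself is standard; the specific numbers (a drift of 0.3 logits when an item moves to a later position) are illustrative assumptions, not values from the study.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model:
    P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical example: an item field-tested at difficulty b = 0.0
# that drifts to b = 0.3 when placed later on the live form.
theta = 0.5
p_field = rasch_prob(theta, 0.0)   # probability at field-test position
p_live = rasch_prob(theta, 0.3)    # probability after the position change
print(f"{p_field:.3f} -> {p_live:.3f}")
```

If such drift is not uniform across items, the field-test and live calibrations no longer agree, which is the threat to parameter invariance (and hence to equating) that the study investigates.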
The extensive computer simulation work done in developing the computer adaptive versions of the Graduate Record Examinations (GRE) Board General Test and the College Board Admissions Testing Program (ATP) SAT is described in this report. Both the GRE General and SAT computer adaptive tests (CATs) are fixed length and were developed from pools of items calibrated using the three‐parameter logistic IRT model. Item selection was based on the recently developed weighted deviations algorithm (see Swanson and Stocking, 1992), which simultaneously handles content, statistical, and other constraints in the item selection process. For the GRE General CATs (Verbal, Quantitative, and Analytical), item exposure was controlled by using an extension of an approach originally developed by Sympson and Hetter (1988). For the SAT CATs (Verbal and Mathematical), item exposure was controlled by using a less complex randomization approach. Lengths of the CATs were determined so that CAT reliabilities matched or exceeded comparable full-length paper‐and‐pencil test reliabilities.
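The Sympson–Hetter idea referenced above can be sketched as a probabilistic filter on the item selection algorithm: an item that the selection rule wants to administer is actually given only with probability k[item], its exposure-control parameter. This is a simplified illustration, not the operational GRE procedure; in practice the k values are tuned iteratively through simulation so that no item's exposure rate exceeds a target.

```python
import random

def select_with_exposure_control(ranked_items, k, rng=random.random):
    """Walk down the candidate list (best candidate first, e.g. from a
    weighted-deviations ranking) and administer the first item that
    passes its exposure lottery with probability k[item].

    Falls back to the last candidate if every lottery fails."""
    for item in ranked_items:
        if rng() < k[item]:
            return item
    return ranked_items[-1]

# Hypothetical pool: item 7 is heavily over-selected, so its exposure
# parameter has been driven down; item 12 may always be administered.
k = {7: 0.25, 12: 1.0}
chosen = select_with_exposure_control([7, 12], k)
print(chosen)  # item 7 roughly a quarter of the time, else item 12
```

Because the filter sits between selection and administration, it caps how often popular items reach examinees without changing the selection rule itself.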
This study compared the effects of using a unidimensional IRT model with two-dimensional data generated by noncompensatory and compensatory multidimensional IRT models. Within each model, simulated datasets differed according to the degree of correlation between two vectors of θ parameters, ranging from 0 to .95. Results showed that the number-correct distributions for each group of datasets were generally comparable, although factor analyses of tetrachoric correlations suggested that differences existed in the structure of the data from the two models. For the unidimensional parameter estimates, it was found that the â values from the noncompensatory model appeared to be averages of the a1 and a2 values, while the â values from the compensatory model were best considered as an estimate of the sum of the a1 and a2 values. Conversely, the b values for the noncompensatory data were consistently greater than the b1 values, while the b values from the compensatory model were best considered as the average of the b1 and b2 values. For both models the θ estimates were most highly related to the average of the two θ parameters. However, for the noncompensatory model there was a general increase in the strength of this relationship with increases in ρ(θ1, θ2). For the compensatory model, the strength of this relationship did not show a great deal of change with differences in ρ(θ1, θ2). Index terms: Compensatory multidimensional IRT models, Item response theory, Multidimensional IRT models, Noncompensatory multidimensional IRT models, Parameter estimation, Violations of unidimensionality.
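The contrast between the two model families can be made explicit with their response functions. In the common two-dimensional formulations, the compensatory model passes a single linear combination of the abilities through a logistic, so strength on one dimension can offset weakness on the other, while the noncompensatory model multiplies per-dimension probabilities, so a deficit on either dimension caps overall success. The parameterization below is a standard textbook form, given as an assumption rather than the exact models used in the study.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(theta, a, d):
    """Compensatory 2D model: P = logistic(a1*t1 + a2*t2 + d).
    Dimensions trade off through one linear combination."""
    return logistic(sum(ai * ti for ai, ti in zip(a, theta)) + d)

def p_noncompensatory(theta, a, b):
    """Noncompensatory 2D model: P = prod_k logistic(a_k*(t_k - b_k)).
    A deficit on one dimension cannot be offset by the other."""
    p = 1.0
    for ti, ai, bi in zip(theta, a, b):
        p *= logistic(ai * (ti - bi))
    return p

# An examinee high on dimension 1 but low on dimension 2:
theta = (2.0, -2.0)
print(p_compensatory(theta, a=(1.0, 1.0), d=0.0))        # abilities cancel
print(p_noncompensatory(theta, a=(1.0, 1.0), b=(0.0, 0.0)))  # weak dim dominates
```

The example shows why fitting a unidimensional model to these data yields different distortions: the compensatory model behaves like a single composite ability, whereas the noncompensatory product makes the harder dimension act as a bottleneck.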
The TOEFL testing program is currently exploring a change in Section 3 of the TOEFL® test that would replace the vocabulary subpart with additional reading comprehension questions. This change has been proposed by internal test development specialists and is supported by external experts in the field of English as a second language. The purpose of this study was to investigate the proposed revision to Section 3 in terms of the length and timing that would be necessary to address concerns about test speededness of the section. The study was carried out using an experimental design with test length and testing time defined as independent variables, and examinee test performance defined as the dependent variable. In addition, several psychometric issues relating to the proposed revision to Section 3 were investigated as part of the study. The results of the study supported the implementation of a revised TOEFL Section 3 consisting of five reading passages with a total of 50 items. The results also suggested that a total testing time of no less than 55 minutes should be allowed for the revised Section 3. Additional psychometric analyses indicated that the current TOEFL score scale can be maintained with the revised Section 3, and that the proposed revisions will not appreciably affect the reliability and validity of Section 3 of the TOEFL test. ETS administers the TOEFL program under the general direction of a Policy Council that was established by, and is affiliated with, the sponsoring organizations.