A Model and Heuristic For Solving Very Large Item Selection Problems

Swanson, Len; Stocking, Martha L.

doi:10.1177/014662169301700205

Cited by 138 publications

(92 citation statements)

References 5 publications


“…Such models have been proposed earlier, for example, to assemble tests to match a target information function (Swanson & Stocking, 1993; Theunissen, 1985; van der Linden & Boekkooi-Timminga, 1989), to assemble sets of parallel test forms (Adema, 1992; Armstrong, Jones & Wu, 1992; Boekkooi-Timminga, 1987, 1990), to maximize classical test reliability (Adema & van der Linden, 1989; Armstrong, Jones & Wang, 1994), to match tests item by item (van der Linden & Boekkooi-Timminga, 1988), or to implement constrained adaptive testing (van der Linden & Reese, in press). In addition, these models allow for all other test specifications typically constraining the selection of items in a testing program.…”

Section: Test Assembly Model

mentioning

confidence: 99%

Observed-score equating as a test assembly problem

Linden

Luecht

1998

Psychometrika


A set of linear conditions on the item response functions is derived that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly that assembles a new test form to have an observed-score distribution optimally equated to the distribution of the old form. For a well-designed item pool, use of the model results in observed-score pre-equating and removes the need for post hoc equating by a conventional observed-score equating method. An empirical example illustrates the use of the model for an item pool from the Law School Admission Test (LSAT).


“…The weighted deviations model and heuristic of Swanson and Stocking (1993) was used to select items for the artificial tests. The WDM is similar to many models in the decision sciences and is used to select items from a pool of items in such a way as to minimize the weighted sum of deviations from constraints reflecting desirable test properties, as established by test specialists.…”

Section: Weighted Deviations Model (WDM)

mentioning

confidence: 99%

An Investigation of the Simultaneous Moderation of Average Gender and African‐american Score Differences on a Test of Mathematical Reasoning

Stocking

Jirele

Lewis

et al. 1998

ETS Research Report Series

Self Cite


A pool of items from operational tests of mathematical reasoning was constructed to investigate the feasibility of using automated test assembly methods to simultaneously moderate possibly irrelevant differences between the performance of women and men and African-American and White test takers. None of the artificial tests investigated exhibited substantial impact moderation, although the estimated mean scaled score differences for the relevant population indicated a modest move in the intended direction: the difference between scaled score means was reduced by about 20% for women and men and about 9% for African-American and White test takers. Although many issues in the implementation of this methodology remain to be solved, the consideration of impact in automated test assembly along with the maintenance of the detailed test plan appears to be a potential method of moderating possibly irrelevant mean test score differences.
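The citation statement above describes the weighted deviations model as selecting items so as to minimize the weighted sum of deviations from constraints. The following is a minimal sketch of that idea only, not the heuristic published by Swanson and Stocking (1993): it uses a plain greedy rule, and the item identifiers, attribute names, bounds, and weights are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical constraint format: attribute -> (lower bound, upper bound, weight).
Constraint = Tuple[int, int, float]


def weighted_deviation(counts: Dict[str, int],
                       constraints: Dict[str, Constraint]) -> float:
    """Weighted sum of shortfalls below and excesses above each bound."""
    total = 0.0
    for attr, (lower, upper, weight) in constraints.items():
        n = counts.get(attr, 0)
        total += weight * (max(0, lower - n) + max(0, n - upper))
    return total


def greedy_select(pool: Dict[str, Set[str]],
                  constraints: Dict[str, Constraint],
                  test_length: int) -> Tuple[List[str], float]:
    """Repeatedly add the item that leaves the smallest weighted deviation.

    This is a simplified greedy rule chosen for illustration (an assumption),
    with ties broken by item identifier.
    """
    selected: List[str] = []
    counts: Dict[str, int] = {}
    remaining = dict(pool)
    while len(selected) < test_length and remaining:
        best_id, best_score = None, float("inf")
        for item_id in sorted(remaining):
            trial = dict(counts)
            for attr in remaining[item_id]:
                trial[attr] = trial.get(attr, 0) + 1
            score = weighted_deviation(trial, constraints)
            if score < best_score:
                best_id, best_score = item_id, score
        for attr in remaining.pop(best_id):
            counts[attr] = counts.get(attr, 0) + 1
        selected.append(best_id)
    return selected, weighted_deviation(counts, constraints)


# Hypothetical five-item pool; each item lists the attributes it carries.
example_pool = {
    "i1": {"algebra"},
    "i2": {"algebra", "word_problem"},
    "i3": {"geometry"},
    "i4": {"geometry", "word_problem"},
    "i5": {"algebra"},
}
example_constraints = {
    "algebra": (1, 1, 2.0),
    "geometry": (1, 1, 2.0),
    "word_problem": (1, 1, 1.0),
}
example_items, example_deviation = greedy_select(
    example_pool, example_constraints, test_length=2)
```

Because constraints are treated as weighted targets rather than hard requirements, a selection is always returned even when no subset of the pool satisfies every bound; the residual deviation reports how far the result falls short.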


“…A New Methodology. The foundation of this new methodology for incorporating expert test development practices in the construction of adaptive tests is the application of a weighted deviations model (WDM) and algorithm for item selection (Swanson & Stocking, 1993). This WDM and algorithm were developed in the context of conventional test assembly paradigms that have been proposed in the literature over the last 10 years.…”

mentioning

confidence: 99%

“…The weighted deviations algorithm was developed and investigated in many conventional test construction problems using real item pools (Stocking, Swanson, & Pearlman, 1991, 1993; Swanson & Stocking, 1993). Swanson & Stocking (1993) (1986, p. 387) suggested that sets of items could be incorporated into a maximum information adaptive testing paradigm by using a set information function, which is the sum of the item information functions for the items comprising that set.…”

mentioning

confidence: 99%

“…Swanson & Stocking (1993) (1986, p. 387) suggested that sets of items could be incorporated into a maximum information adaptive testing paradigm by using a set information function, which is the sum of the item information functions for the items comprising that set. This approach is effective if the tests being constructed are made up entirely of item sets and the number of items to be administered from each set is known in advance.…”

mentioning

confidence: 99%
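The set information function mentioned in the statement above is the sum of the item information functions of the items in a set. The sketch below illustrates that sum under the assumption that items follow the three-parameter logistic (3PL) model with scaling constant D = 1.7; the quoted text does not name a model, and the function names and the (a, b, c) tuple layout are hypothetical.

```python
import math
from typing import Iterable, Tuple

D = 1.7  # conventional logistic scaling constant (assumed here)


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """3PL item information: (D*a)^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def set_information(theta: float,
                    items: Iterable[Tuple[float, float, float]]) -> float:
    """Set information: sum of item information over (a, b, c) tuples."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)
```

As the quoted passage notes, ranking sets by this sum is workable when the number of items to be administered from each set is known in advance, since the sum is taken over exactly those items.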


A Method for Severely Constrained Item Selection in Adaptive Testing

Stocking

Swanson

1992

ETS Research Report Series

Self Cite


Conventional tests administered using paper-and-pencil to large numbers of examinees simultaneously have been a fixture of educational testing and measurement for many years. This testing strategy represents vastly reduced unit costs over tests administered individually, which were used during the early part of this century.

However, interest in restoring some of the advantages of individualized testing has never completely disappeared. Turnbull suggested investigations in this direction in 1951 and coined the phrase tailored testing to describe this mode of test administration (Lord, 1980, p. 151). Possibilities for constructing individualized tests became likely with the advent of item response theory (IRT; Lord, 1952, 1980). In the 1960s, Lord (1970, 1971a) began to explore this application of IRT by investigating various item selection strategies borrowed from the bioassay field. Later work by Lord (1977, 1980) and Weiss (1976, 1978) laid the foundation for the application of adaptive/tailored testing as an alternative to conventional testing.

Adaptive tests are tests in which items are selected to be appropriate for the examinee; the test adapts to the examinee, usually by selecting items of appropriate difficulty. Computerized adaptive testing (CAT) has received increasing attention as a practical alternative to paper-and-pencil (Zara, 1990; Zara, Bosma, & Kaplan, 1987) … (see Lord, 1970, 1971a, 1971b). Such investigations eventually led to IRT-based algorithms that were fast, efficient, and psychometrically sound. A review of the most frequently used algorithms is given in Wainer et al. (1990, chap. 5) and Lord (1980, chap. 9). The fundamental philosophy underlying these algorithms is as follows: 1. An item is selected on some basis and administered to the examinee. … (Ward, 1988) in which there are 10 to 15 item types. The same kind of control is used in the CAT-ASVAB (Segall, 1987). This type of content control has been called a constrained CAT (c-CAT) by Kingsbury & Zara (1989).

A major disadvantage of this approach is that it assumes that the item features of interest partition the item pool into mutually exclusive subsets. Given the number of item features that may be of interest to test specialists, the number of mutually exclusive partitions can become very large and the number of items in each partition can become quite small. Moreover, incorporating considerations of overlap and item sets requires further partitioning by overlap group and by set, thereby further enlarging the number of mutually exclusive partitions. Wainer & Kiely (1987) hypothesized that the use of testlets could overcome these problems. They suggested that an adaptive test be constructed from testlets by using the testlet rather than an item as the branching point. They hypothesized that this would enable test specialists to enforce constraints on intrinsic item features, overlap, and item sets in the same manner as is currently done with conventional tests.

Kingsbury & Zara (1991) compared ...

