A model for solving very large item selection problems is presented. The model builds on previous work in binary programming applied to test construction. Expert test construction practices are applied to situations in which all specifications for item selection cannot necessarily be met. A heuristic for selecting items that satisfy the constraints in the model is also presented. The heuristic is particularly useful when the size of the test construction problem exceeds the limits of current implementations of linear programming algorithms. A variety of test construction problems involving real test specifications and item data from actual test assemblies were investigated using the model and the heuristic.

There has been considerable interest recently in methods of automated item selection in test construction (e.g., Ackerman). These methods, frequently implemented on microcomputers, offer several potential benefits to test specialists, including reduced labor costs and more efficient test construction. They can also increase the consistency of successive forms of a test through the development and codification of specifications that are refined and perfected by test experts.

Much of the recent interest in automated item selection methods has been stimulated by the development of large-scale computerized item banks (e.g., Boekkooi-Timminga, 1989; Hsu & Sadock, 1985). These item banks often contain very detailed statistical and content-related information about hundreds or, in many cases, thousands of test items. Computerized item banks make it possible for the test specialist to rapidly assemble representative test forms, review the psychometric characteristics and content attributes of those forms, refine the test specifications as needed, and repeat the process until a satisfactory test is assembled.

Research in this area has also been stimulated by developments in item response theory (IRT; Lord, 1952, 1980). Under IRT, the test information function (TIF) is the sum of the independent item information functions for the items in a test (Lord, 1980). The additivity of item information makes it possible to construct tests to target test information functions (TTIFs), as suggested by Birnbaum (1968) and Lord (1980).

Test construction problems have been formulated as binary programming problems, which offers several advantages. The problems are binary in the sense that an item either is or is not included in the test; this is not to be confused with binary response models, in which items are scored correct or incorrect. The methods described in this paper are independent of the way in which items are scored. Viewing test construction as a binary programming problem allows statistical and nonstatistical test specifications alike to be expressed in mathematical terms, as constraints on test optimization.
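The additivity of item information and the binary formulation can be written out explicitly. The display below is a generic maximin-style assembly model of the kind common in the optimal test assembly literature, not necessarily the exact model of this paper; the symbols (items i = 1, ..., N, decision variables x_i, target ability points theta_k, content categories C_j with bounds b_j, test length n) are illustrative notation.

```latex
% With x_i = 1 if item i is selected and 0 otherwise, test information
% is additive in the items:  I(\theta) = \sum_i x_i I_i(\theta).
% A generic maximin assembly model over target points \theta_1,\dots,\theta_K:
\begin{align*}
\max_{x,\,y}\ & y \\
\text{s.t.}\ & \textstyle\sum_{i=1}^{N} x_i I_i(\theta_k) \ge y,
  \quad k = 1,\dots,K \quad \text{(TIF at each target point)} \\
 & \textstyle\sum_{i \in C_j} x_i \le b_j,
  \quad j = 1,\dots,J \quad \text{(content and other categorical constraints)} \\
 & \textstyle\sum_{i=1}^{N} x_i = n
  \quad \text{(test length)} \\
 & x_i \in \{0,1\}
  \quad \text{(an item is in the test or not).}
\end{align*}
```

The binary variables are what make nonstatistical specifications expressible: any feature coded on the items can be turned into a linear constraint on the x_i.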
Conventional tests administered by paper and pencil to large numbers of examinees simultaneously have been a fixture of educational testing and measurement for many years. This testing strategy offers vastly reduced unit costs compared with the individually administered tests used during the early part of this century. However, interest in restoring some of the advantages of individualized testing has never completely disappeared. Turnbull suggested investigations in this direction in 1951 and coined the phrase "tailored testing" to describe this mode of test administration (Lord, 1980, p. 151). Constructing individualized tests became feasible with the advent of item response theory (IRT; Lord, 1952, 1980). In the 1960s, Lord (1970, 1971a) began to explore this application of IRT by investigating various item selection strategies borrowed from the bioassay field. Later work by Lord (1977, 1980) and Weiss (1976, 1978) laid the foundation for the application of adaptive/tailored testing as an alternative to conventional testing.

Adaptive tests are tests in which items are selected to be appropriate for the examinee: the test adapts to the examinee, usually by selecting items of appropriate difficulty. Computerized adaptive testing (CAT) has received increasing attention as a practical alternative to paper-and-pencil testing (Zara, 1990; Zara, Bosma, & Kaplan, 1987; see also Lord, 1970, 1971a, 1971b). Such investigations eventually led to IRT-based item selection algorithms that were fast, efficient, and psychometrically sound. Reviews of the most frequently used algorithms are given in Wainer et al. (1990, chap. 5) and Lord (1980, chap. 9). The fundamental philosophy underlying these algorithms is as follows:

1. An item is selected on some basis and administered to the examinee. ...

One way to control test content under this philosophy is to partition the item pool by item type and constrain selection within each type, as in the test described by Ward (1988), in which there are 10 to 15 item types. The same kind of control is used in the CAT-ASVAB (Segall, 1987). This type of content control has been called constrained CAT (c-CAT) by Kingsbury & Zara (1989); a minimal sketch of such a selection loop follows this passage.

A major disadvantage of this approach is that it assumes that the item features of interest partition the item pool into mutually exclusive subsets. Given the number of item features that may be of interest to test specialists, the number of mutually exclusive partitions can become very large, and the number of items in each partition can become quite small. Moreover, incorporating overlap and item sets requires further partitioning by overlap group and by set, enlarging the number of mutually exclusive partitions still further.

Wainer & Kiely (1987) hypothesized that the use of testlets could overcome these problems. They suggested that an adaptive test be constructed from testlets, with the testlet rather than the item serving as the branching point. They hypothesized that this would enable test specialists to enforce constraints on intrinsic item features, overlap, and item sets in the same manner as is currently done with conventional tests. Kingsbury & Zara (1991) compared ...
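To make the selection step and c-CAT style content control concrete, here is a minimal sketch in Python. It assumes the three-parameter logistic (3PL) information function (Lord, 1980) and a simple quota per content category; the Item fields, the quota scheme, and the function names are illustrative, not an implementation from any of the papers cited above.

```python
import math
from dataclasses import dataclass

@dataclass
class Item:
    a: float       # discrimination
    b: float       # difficulty
    c: float       # pseudo-guessing
    content: str   # content category used for c-CAT style control

def info_3pl(item: Item, theta: float) -> float:
    """Fisher information of a 3PL item at ability theta (Lord, 1980)."""
    p = item.c + (1.0 - item.c) / (1.0 + math.exp(-1.7 * item.a * (theta - item.b)))
    return (1.7 * item.a) ** 2 * ((1.0 - p) / p) * ((p - item.c) / (1.0 - item.c)) ** 2

def next_item(pool, administered, theta, quotas, counts):
    """Most informative unused item whose content quota is still open."""
    eligible = [it for it in pool
                if it not in administered
                and counts.get(it.content, 0) < quotas[it.content]]
    return max(eligible, key=lambda it: info_3pl(it, theta)) if eligible else None
```

In a full loop, each administered item would update counts[it.content] and the ability estimate theta before the next call, and a stopping rule (fixed length or target precision) would end the test.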
Methods of optimal test assembly have served as the foundation for methods of assembling adaptive tests. A model and a heuristic that facilitate the assembly of adaptive tests have been developed (Stocking & Swanson, 1993; Swanson & Stocking, 1993). Similar methods can be used to assemble item banks for adaptive testing, where an optimal design must balance two goals: reducing the exposure of heavily used items to enhance item security, and increasing the exposure of rarely used items to make more efficient use of the pool. In this study, optimal design methods were applied to the design of item banks for adaptive testing.
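One widely cited way to balance these two exposure goals is probabilistic exposure control in the style of Sympson and Hetter: a selected item is actually administered only with a precomputed probability, so overexposed items are passed over some of the time and selection falls through to less used items. The sketch below illustrates that general idea only; it is not the optimal design procedure of this study, and the select_best callback, item dictionaries, and k_params table are hypothetical.

```python
import random

def administer_with_exposure_control(select_best, pool, theta, k_params):
    """Sympson-Hetter style filter: the selected item is administered only
    with probability k_i; otherwise it is set aside and the next-best item
    is considered, shifting exposure toward less used items in the pool."""
    candidates = list(pool)
    while candidates:
        item = select_best(candidates, theta)        # e.g., maximum information
        if random.random() < k_params[item["id"]]:   # exposure parameter k_i in (0, 1]
            return item
        candidates.remove(item)                      # passed over this draw
    return None  # pool exhausted without an administration
```

The k_i values are typically calibrated by simulation so that each item's administration rate stays below a chosen ceiling; rarely selected items effectively keep k_i = 1.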
A method of automatically selecting items for inclusion in a test, subject to constraints on item content and statistical properties, was applied to real data. Two tests were assembled by test specialists who assemble such tests on a routine basis. Using the same pool of items and the same ...