Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task, and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning—many of which were consistent across testing programs. Implications of the results for related areas of research are discussed.
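The abstract does not name the DIF statistic used, but the Mantel–Haenszel procedure is the standard choice in admissions-testing research. As a minimal sketch, assuming Mantel–Haenszel and using invented 2×2 tables, the following shows how the common odds ratio and the ETS delta-scale index (MH D-DIF) are computed from score-matched strata:

```python
import math

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel common odds ratio and ETS delta-scale DIF index.

    `tables` is a list of 2x2 tables, one per matching-score stratum:
    ((a, b), (c, d)), where a/b are the reference group's correct/incorrect
    counts and c/d are the focal group's.
    """
    num = 0.0
    den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    alpha = num / den                   # common odds ratio, alpha_MH
    mh_d_dif = -2.35 * math.log(alpha)  # ETS MH D-DIF (delta scale)
    return alpha, mh_d_dif

# Hypothetical counts at three levels of the matching score.
tables = [((40, 10), (30, 20)),
          ((55, 15), (45, 25)),
          ((70, 5), (60, 15))]
print(mantel_haenszel_dif(tables))
```

Negative MH D-DIF values indicate the item is harder for the focal group than for score-matched reference-group examinees; relating such indices to coded item features is the analysis the study describes.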
Better understanding of sources of difficulty in test items would improve the test development process by bringing the functioning of items more under the control of the test developer. To help increase this understanding, a study was undertaken to evaluate the effects of various aspects of prose complexity on the difficulty of achievement test items. The items of interest were those that presented a verbal stimulus followed by a question about the stimulus and a standard set of multiple‐choice options. Items were selected for study from two tests with differing demands on an examinee's knowledge base, NTE Communications Skills and the GRE Subject Test in Psychology. Standard multiple regression analyses and Embretson's model-fitting procedures were used to evaluate the contribution of various complexity factors to the prediction of difficulty. These factors, which included measures of item structure, readability, semantic content, cognitive demand, and knowledge demand, were found to be successful in predicting item difficulty for these items. The immediate usefulness of the results for test development practice, however, is limited by the fact that only a single item type was studied and by the time required to develop the complexity measures.
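As a rough illustration of the regression step only, the sketch below fits ordinary least squares to a few invented items. The feature columns (readability grade level, proposition count, cognitive-demand rating) and the proportion-correct difficulties are hypothetical stand-ins for the study's measures; Embretson's model-fitting procedures are not reproduced here.

```python
import numpy as np

# Hypothetical complexity features, one row per item:
# [readability grade level, proposition count, cognitive-demand rating]
X = np.array([
    [8.2, 14, 2],
    [10.5, 22, 3],
    [6.9, 9, 1],
    [11.1, 25, 3],
    [9.4, 17, 2],
])
y = np.array([0.62, 0.41, 0.78, 0.35, 0.55])  # proportion correct

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ coef
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print("coefficients:", coef)
print("R^2:", round(r_squared, 3))
```

The R² from a fit like this is the "success in predicting item difficulty" the abstract refers to, though the study's actual features, items, and estimates differ.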
As part of the development of the academic skills portion of The Praxis Series: Professional Assessments for Beginning Teachers™ (the successor to the NTE examinations), a sample of undergraduate students each wrote at least two 50‐minute essays. For each essay, the topic was selected by the examinee from a pair of topics. Before writing, all students indicated their preferences for each of 20 possible topics from which their topics would be drawn. Subsequently, comparisons were made between performance on examinees' high and low preferred topics. The relationship of selected variables (e.g., undergraduate grades and admissions test scores) to performance on each kind of topic was also compared. Student preferences varied considerably across topics, and topics that were most preferred by some examinees were often least preferred by others. Preferences, however, exhibited little if any relationship to essay scores. Finally, scores based on high and low preferred topics had similar patterns of correlations with other variables.
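A minimal sketch of the kind of comparison described, run on simulated data (the study's actual scores are not reproduced): a paired comparison of essay scores on high- versus low-preference topics, followed by the correlation of each score with one external variable, here labeled GPA for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data for illustration only: each examinee has an essay score
# on a high- and a low-preference topic, plus undergraduate GPA.
rng = np.random.default_rng(seed=1)
n = 120
high_pref = rng.normal(4.0, 0.8, n)
low_pref = high_pref + rng.normal(0.0, 0.6, n)  # preference adds ~no signal
gpa = 2.0 + 0.25 * high_pref + rng.normal(0.0, 0.3, n)

# Paired comparison: does performance differ on preferred topics?
t_stat, p_value = stats.ttest_rel(high_pref, low_pref)
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")

# Do both kinds of scores show similar correlations with GPA?
r_high = np.corrcoef(high_pref, gpa)[0, 1]
r_low = np.corrcoef(low_pref, gpa)[0, 1]
print(f"r(high-preference score, GPA) = {r_high:.2f}")
print(f"r(low-preference score, GPA)  = {r_low:.2f}")
```

A nonsignificant paired test and similar correlation patterns for the two score types would correspond to the abstract's finding that topic preference had little bearing on performance or on relationships with other variables.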