Performance assessments have become popular in education and credentialing, and performance standards are common for interpreting and reporting scores. However, because of the unique characteristics of these assessments compared to multiple-choice tests (such as polytomous scoring), new and valid standard-setting methods are needed. Well-known standard-setting methods are no longer applicable. A number of promising methods for setting performance standards are described, and their strengths and weaknesses are discussed. Suggestions for additional research are offered.
A study was conducted to evaluate four goodness-of-fit procedures using data simulation techniques. The procedures were evaluated using data generated according to three different item response theory models and a factor analytic model. Three different distributions of ability were used, as were three different sample sizes. It was concluded that the likelihood ratio chi-square procedure yielded the fewest erroneous rejections of the hypothesis of fit, whereas Bock's chi-square procedure yielded the fewest erroneous acceptances of fit. Sample sizes between 500 and 1,000 were found to be best. Shifts in the mean of the ability distribution caused minor fluctuations but did not appear to be a major issue.
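To illustrate the kind of simulation described in this abstract, here is a minimal sketch of a likelihood ratio chi-square check for a single item. It is not the study's procedure: the two-parameter logistic model, the parameter values, the ten equal-size ability groups, and the function names `p_2pl`, `simulate_item`, and `lr_chi_square` are all assumptions made for the example, and it uses the true abilities and item parameters where a real analysis would use estimates.

```python
import math
import random


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_item(thetas, a, b, rng):
    """Draw one 0/1 response per examinee from the 2PL model."""
    return [1 if rng.random() < p_2pl(t, a, b) else 0 for t in thetas]


def lr_chi_square(thetas, responses, a, b, n_groups=10):
    """G^2 = 2 * sum of O * ln(O / E) over ability groups and both outcomes."""
    # Sort examinees by ability and split them into equal-size groups.
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    size = len(order) // n_groups
    g2 = 0.0
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[start:end]
        n = len(idx)
        obs_correct = sum(responses[i] for i in idx)
        exp_correct = sum(p_2pl(thetas[i], a, b) for i in idx)
        cells = ((obs_correct, exp_correct), (n - obs_correct, n - exp_correct))
        for observed, expected in cells:
            if observed > 0:  # an empty cell contributes nothing to G^2
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# Hypothetical setup: 1,000 examinees with standard normal ability.
rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(1000)]
responses = simulate_item(thetas, a=1.2, b=0.0, rng=rng)

# Fit statistic under the generating model and under a misspecified difficulty.
g2_true = lr_chi_square(thetas, responses, a=1.2, b=0.0)
g2_wrong = lr_chi_square(thetas, responses, a=1.2, b=1.5)
```

Repeating the simulation over many replications, sample sizes, and ability distributions, and counting how often each statistic exceeds its critical value, would give the rates of erroneous rejections and acceptances that the abstract refers to.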