refers to a class of mathematical models relating individual differences on one or more latent variables to the probability of responding to a scale item in a specific response category. A response of 3 on a 5-point personality item, a correct answer on a multiple-choice item, and a clinician's rating of an adolescent's anxiety are all item responses that can potentially be related (probabilistically) to a latent variable. IRT models, which focus on characterizing how individual differences on a latent variable interact with item properties to produce a response, contrast sharply with classical test theory (Lord & Novick, 1968) procedures, which focus on understanding the statistical properties of a composite scale score (e.g., estimating reliability of a test score). The development of IRT models and associated methods (Birnbaum, 1968; Lord, 1952; Thurstone, 1925) was originally motivated by applied problems in large-scale, multiple-choice aptitude testing (e.g., how to efficiently administer different test items to individuals but still compare them on the same scale, how to link different sets of items measuring the same construct onto the same scale). However, applications of IRT models