The likelihood ratio test statistic G²(dif) is widely used for comparing the fit of nested models in categorical data analysis. In large samples, this statistic is distributed as a chi-square with degrees of freedom equal to the difference in degrees of freedom between the tested models, but only if the least restrictive model is correctly specified. Yet this statistic is often used in applications without assessing the adequacy of the least restrictive model. This may result in incorrect substantive conclusions, as the above large-sample reference distribution for G²(dif) is then no longer appropriate; rather, its large-sample distribution will depend on the degree of misspecification of the least restrictive model. To illustrate this, a simulation study is performed in which this statistic is used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G²(dif) was found to be robust only under small misspecification of the least restrictive model. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before employing G²(dif) to assess relative model fit.

The two most widely used statistics for assessing the goodness of fit of a model fitted to a contingency table are Pearson's X² statistic and the likelihood ratio statistic G². Under the null hypothesis that the tested model holds in the popula-

MULTIVARIATE BEHAVIORAL RESEARCH, 41(1), 55–64
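The nested-model comparison described above can be sketched numerically. The following is a minimal illustration, not the authors' simulation code: the observed counts, the two sets of fitted expected counts, and the degrees-of-freedom difference are all hypothetical. It computes G² for each model and refers G²(dif) to a chi-square distribution with df equal to the difference in model degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def g2(observed, expected):
    """Likelihood ratio statistic G^2 for a contingency table."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return 2.0 * np.sum(observed * np.log(observed / expected))

# Hypothetical observed counts and fitted (expected) counts under two
# nested models; the numbers below are illustrative only.
observed = np.array([30, 70, 50, 50])
expected_restrictive = np.array([25, 75, 55, 45])  # more restrictive model
expected_general = np.array([29, 71, 51, 49])      # least restrictive model

g2_restrictive = g2(observed, expected_restrictive)
g2_general = g2(observed, expected_general)

# G^2(dif): the drop in fit from the least restrictive to the more
# restrictive model, referred to a chi-square whose df equals the
# difference in degrees of freedom between the two models.
g2_dif = g2_restrictive - g2_general
df_dif = 2  # hypothetical difference in degrees of freedom
p_value = chi2.sf(g2_dif, df_dif)
```

Note that the validity of `p_value` as computed here rests on the least restrictive model being correctly specified; as the article argues, when that model is misspecified, the chi-square reference distribution for G²(dif) no longer applies.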