In behavior genetics, as in many fields, researchers must decide whether their models adequately explain their data, that is, whether the models “fit” at some satisfactory level. Well-fitting models are compelling, whereas poorly fitting models are not (Rodgers & Rowe, 2002). Researchers often evaluate model fit with “universal” rules of thumb (e.g., Hu & Bentler, 1999). However, these rules are not universal and should be treated as model-specific (Kang et al., 2016). Accordingly, we focused on developing fit criteria, in the spirit of Hu and Bentler (1999), for classic univariate twin models (ACE, CE, AE) by fitting simulated twin data with correctly and incorrectly specified models. Ideal criteria should consistently accept correct models and reject incorrect ones. Classic ACE models were indistinguishable, and virtually all fit indices were uninformative because (or especially when) they were obtained from saturated models. For non-ACE models, the criteria were informative. Nevertheless, every fit index we examined, except the TLI, differed markedly across models and/or conditions. Universal solutions remain elusive, but promising and valid approaches include nested model comparisons, increasing degrees of freedom, and ruthless skepticism.
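To make the simulation design concrete, the sketch below simulates twin pairs under a known ACE model and then fits a correctly specified (ACE) and two deliberately misspecified (AE, CE) models by maximum likelihood. This is a minimal illustration, not the authors' code: the variance components (a² = 0.5, c² = 0.3, e² = 0.2), the sample sizes, the AIC comparison, and the helper names `simulate_twins` and `neg2_loglik` are all assumptions introduced here.

```python
# Illustrative sketch of the study design (assumed parameters, not the paper's code).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2016)

def simulate_twins(a2, c2, e2, n_mz=500, n_dz=500):
    """Draw zero-mean twin pairs with the ACE-implied covariance structure:
    cov(MZ) = a2 + c2, cov(DZ) = 0.5*a2 + c2, var = a2 + c2 + e2."""
    v = a2 + c2 + e2
    mz = rng.multivariate_normal([0, 0], [[v, a2 + c2], [a2 + c2, v]], n_mz)
    dz = rng.multivariate_normal([0, 0], [[v, 0.5 * a2 + c2], [0.5 * a2 + c2, v]], n_dz)
    return mz, dz

def neg2_loglik(params, mz, dz, model):
    """-2 log-likelihood of the twin data under an ACE, AE, or CE variance model.
    For AE/CE the dropped component is fixed to zero inside the likelihood."""
    a2, c2, e2 = params
    if model == "AE":
        c2 = 0.0
    elif model == "CE":
        a2 = 0.0
    total = 0.0
    for data, r in ((mz, 1.0), (dz, 0.5)):  # r = genetic relatedness of the pair
        v = a2 + c2 + e2
        cov = np.array([[v, r * a2 + c2], [r * a2 + c2, v]])
        _, logdet = np.linalg.slogdet(cov)
        inv = np.linalg.inv(cov)
        quad = np.einsum("ij,jk,ik->", data, inv, data)  # sum of x' inv(cov) x
        total += len(data) * (logdet + 2 * np.log(2 * np.pi)) + quad
    return total

mz, dz = simulate_twins(a2=0.5, c2=0.3, e2=0.2)  # data-generating ("true") model
for model in ("ACE", "AE", "CE"):
    fit = minimize(neg2_loglik, x0=[0.3, 0.3, 0.3], args=(mz, dz, model),
                   bounds=[(1e-6, None)] * 3, method="L-BFGS-B")
    k = 3 if model == "ACE" else 2  # free variance components
    print(f"{model}: -2lnL = {fit.fun:.1f}, AIC = {fit.fun + 2 * k:.1f}")
```

The sketch also illustrates why absolute fit indices are uninformative for the classic ACE model: with equal variances assumed across twins and zygosity groups, the data supply only three summary statistics (a pooled variance plus the MZ and DZ covariances), and the ACE model spends three parameters on them, so it reproduces them essentially perfectly by construction. Dropping a component, as in the AE and CE models, frees a degree of freedom and makes misfit detectable, which is where criteria such as nested-model comparisons can do real work.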