A framework based on mixture methods is proposed for evaluating goodness of fit in the analysis of contingency tables. For a given model H applied to a contingency table P, we consider the two-point mixture $P = (1-\pi)H_1 + \pi H_2$, with $\pi$ the mixing proportion ($0 \le \pi \le 1$) and $H_1$ and $H_2$ the tables of probabilities for each latent class or component. In the unstructured approach recommended here, the mixture model applies H to $H_1$ but does not impose any restrictions on $H_2$. A contingency table P can generally be represented as such a two-point mixture for an interval of $\pi$-values. We define our index of lack of fit, $\pi^*$, to be the smallest such $\pi$; that is, $\pi^*$ is the fraction of the population that cannot be described by model H. This approach can be contrasted with the structured approach, which applies model H to both $H_1$ and $H_2$ and leads to conventional latent class models when H is the hypothesis of independence. The case where H is the hypothesis of row-column independence and P is a two-way contingency table is covered in detail, but the procedure is quite general.
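For row-column independence in a table with only two columns (an I×2 table), $\pi^*$ can be computed exactly by a short argument. The code below is a minimal sketch restricted to that case, not the general procedure. Because $H_2$ is unrestricted, $P = (1-\pi)H_1 + \pi H_2$ holds exactly when $(1-\pi)H_1 \le P$ cellwise. So $1-\pi^*$ is the largest total mass of a product-form table $a_i b_j$ lying under P. For fixed column weights $b = (t, 1-t)$, the best $a_i$ is $\min_j p_{ij}/b_j$. The resulting total has the form $A/t + B/(1-t)$ between breakpoints and is convex there. The maximum therefore occurs at some $t = p_{i1}/(p_{i1}+p_{i2})$ or at $t \in \{0, 1\}$. The function name `pi_star_two_columns` is hypothetical.

```python
def pi_star_two_columns(table):
    """Mixture index of fit pi* for row-column independence in an I x 2 table.

    `table` is a list of rows [p_i1, p_i2] of nonnegative probabilities summing to 1.
    Hypothetical helper: exact only for two-column tables.
    """
    def covered_mass(t):
        # Column weights b = (t, 1 - t); a column with zero weight imposes no bound.
        b = (t, 1.0 - t)
        total = 0.0
        for row in table:
            # Largest row weight a_i with a_i * b_j <= p_ij in every column j.
            total += min(row[j] / b[j] for j in range(2) if b[j] > 0)
        # b sums to 1, so this is the total mass of the product table a_i * b_j.
        return total

    # Candidate maximizers: the two endpoints plus each row's breakpoint.
    candidates = {0.0, 1.0}
    for p1, p2 in table:
        if p1 + p2 > 0:
            candidates.add(p1 / (p1 + p2))
    best = max(covered_mass(t) for t in candidates)
    # pi* is the share of the population the independence component cannot absorb.
    return 1.0 - best


print(pi_star_two_columns([[0.3, 0.2], [0.1, 0.4]]))
```

For the example table, the best product table is $[[0.05, 0.2], [0.1, 0.4]]$. It leaves a residual of 0.25 in the first cell only, so $\pi^* \approx 0.25$. Under this sketch, an exactly independent table gives $\pi^* = 0$.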
The paper considers general multiplicative models for complete and incomplete contingency tables that generalize log-linear and several other models and are entirely coordinate-free. Sufficient conditions for the existence of maximum likelihood estimates under these models are given, and it is shown that the usual equivalence between multinomial and Poisson likelihoods holds if and only if an overall effect is present in the model. If such an effect is not assumed, the model becomes a curved exponential family, and a related mixed parameterization is given that relies on non-homogeneous odds ratios. Several examples are presented to illustrate the properties and use of such models.
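Here is a toy illustration of the role of the overall effect, built on assumptions of our own. Take three cells with the multiplicative model $\mu_1 = \alpha$, $\mu_2 = \beta$, $\mu_3 = \alpha\beta$. This is $\log \mu = A\theta$ with rows $(1,0)$, $(0,1)$, $(1,1)$ in A. The constant vector is not in the column space of A, so the model has no overall effect. The Poisson likelihood equations $A^\top \mu = A^\top y$ reduce to a quadratic, solved below in closed form. The fitted values reproduce the sufficient statistics $y_1+y_3$ and $y_2+y_3$ but not the total count, consistent with the statement above. The counts and the function name are hypothetical.

```python
import math


def poisson_mle_no_overall_effect(y1, y2, y3):
    """Poisson MLE for the hypothetical 3-cell model mu = (alpha, beta, alpha * beta).

    Likelihood equations: mu1 + mu3 = y1 + y3 and mu2 + mu3 = y2 + y3.
    """
    a = y1 + y3  # sufficient statistic for log(alpha)
    b = y2 + y3  # sufficient statistic for log(beta)
    # Subtracting the equations gives alpha = beta + (a - b); substituting into
    # beta + alpha * beta = b yields beta**2 + (1 + a - b) * beta - b = 0.
    c = 1 + a - b
    beta = (-c + math.sqrt(c * c + 4 * b)) / 2  # the nonnegative root
    alpha = beta + (a - b)
    return alpha, beta, alpha * beta


mu = poisson_mle_no_overall_effect(10, 20, 30)
# Marginal statistics 40 and 50 are matched, but the fitted total is about 52.9, not 60.
print([round(m, 2) for m in mu], round(sum(mu), 2))
```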
Models defined by a set of conditional independence restrictions play an important role in statistical theory and applications, especially, but not only, in graphical modeling. In this paper we identify a subclass of these models consisting of hierarchical marginal log-linear models, as defined by Bergsma and Rudas (2002a). Such models are smooth, which implies the applicability of standard asymptotic theory and simplifies interpretation. Furthermore, we give a marginal log-linear parameterization and a minimal specification of the models in the subclass. This implies the applicability of standard methods for computing maximum likelihood estimates and simplifies the calculation of the degrees of freedom of chi-squared statistics for testing goodness of fit. The utility of the results is illustrated by applying them to certain block-recursive Markov models associated with chain graphs.
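The simplest model of this kind is a single restriction $A \perp B \mid C$ in an $I \times J \times K$ table. It has the closed-form MLE $\hat m_{ijk} = n_{i+k}\, n_{+jk} / n_{++k}$. When no stratum margins are zero, its goodness-of-fit statistic has $(I-1)(J-1)K$ degrees of freedom. The sketch below computes $G^2$ and that df. It is a textbook special case, not the marginal log-linear parameterization of the paper, and the function name is hypothetical.

```python
import math


def g2_conditional_independence(n):
    """G^2 and df for A independent of B given C; n[i][j][k] are counts (hypothetical helper).

    The df formula assumes no zero margins; sparse tables would need an adjustment.
    """
    I, J, K = len(n), len(n[0]), len(n[0][0])
    g2 = 0.0
    for k in range(K):
        n_pk = sum(n[i][j][k] for i in range(I) for j in range(J))  # stratum size
        for i in range(I):
            n_ik = sum(n[i][j][k] for j in range(J))  # AC margin
            for j in range(J):
                n_jk = sum(n[ii][j][k] for ii in range(I))  # BC margin
                obs = n[i][j][k]
                if obs > 0:  # zero cells contribute nothing to G^2
                    fitted = n_ik * n_jk / n_pk
                    g2 += 2 * obs * math.log(obs / fitted)
    df = (I - 1) * (J - 1) * K
    return g2, df


# Each C-stratum of this 2 x 2 x 2 table is an outer product, so G^2 is 0 with df = 2.
print(g2_conditional_independence([[[2, 2], [2, 8]], [[6, 2], [6, 8]]]))
```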
The paper describes a generalized iterative proportional fitting procedure that can be used for maximum likelihood estimation in a special class of the general log-linear model. The models in this class, called relational, apply to multivariate discrete sample spaces that do not necessarily have a Cartesian product structure and may not contain an overall effect. When applied to the cell probabilities, the models without the overall effect are curved exponential families, and the values of the sufficient statistics are reproduced by the MLE only up to a constant of proportionality. The paper shows that Iterative Proportional Fitting, Generalized Iterative Scaling, and Improved Iterative Scaling fail to work for such models. The algorithm proposed here is based on iterated Bregman projections. As a by-product, estimates of the multiplicative parameters are also obtained. An implementation of the algorithm is available as an R package.
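For contrast, here is a minimal sketch of classical IPF in a setting where it does work. The example is an $I \times J \times K$ table under the no-three-factor-interaction model [AB][AC][BC], which contains an overall effect. Each step rescales the current fit to match one observed two-way margin. That rescaling is a Kullback-Leibler projection, the special case of a Bregman projection that underlies IPF. This is not the generalized algorithm of the paper, which also handles relational models without the overall effect. The function names, starting table and tolerance are our own choices.

```python
def margin(table, axes):
    """Sum a table stored as {(i, j, k): value} down to the given pair of axes."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out


# The three two-way margins fitted by the model [AB][AC][BC].
AXES = [(0, 1), (0, 2), (1, 2)]


def ipf_no_three_way(n, max_cycles=1000, tol=1e-10):
    """Classical IPF for [AB][AC][BC] on an I x J x K table of counts n[i][j][k]."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    obs = {c: float(n[c[0]][c[1]][c[2]]) for c in cells}
    fit = {c: 1.0 for c in cells}  # the uniform start lies inside the model
    for _ in range(max_cycles):
        for axes in AXES:
            target, current = margin(obs, axes), margin(fit, axes)
            for c in cells:
                key = tuple(c[a] for a in axes)
                if current[key] > 0:
                    # Rescale so this margin matches the observed one (a KL projection).
                    fit[c] *= target[key] / current[key]
        # Stop once every fitted two-way margin matches the observed margin.
        gap = 0.0
        for axes in AXES:
            fitted_m, observed_m = margin(fit, axes), margin(obs, axes)
            gap = max(gap, max(abs(fitted_m[key] - observed_m[key]) for key in observed_m))
        if gap < tol:
            break
    return fit


fit = ipf_no_three_way([[[10, 20], [30, 40]], [[15, 25], [35, 5]]])
```

Because every update multiplies cells by factors that depend on only two indices, the fit stays in the model throughout. The conditional AB odds ratio is therefore the same in both levels of C.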
It was recently demonstrated that performing median splits on both of two predictor variables can sometimes produce spurious statistical significance rather than lower power. Not only is the conventional wisdom that dichotomization always lowers power incorrect, but the current article also demonstrates that inflation of apparent effects can occur in certain cases where only one of the two predictor variables is dichotomized. In addition, we show that previously published formulas claiming that correlations are necessarily reduced by bivariate dichotomization are incorrect. Although the difference between the correct and incorrect formulas is small for small or moderate correlations, it is important to correct the misunderstanding of partial correlations that led to the error in the previous derivations. We do this by considering the relationship between partial correlation and conditional independence in the context of dichotomized predictor variables.
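A small simulation shows the mechanism under a data-generating model we assume for illustration. $X_1$ and $X_2$ are standard normal with correlation 0.5, and $Y = X_1 + \varepsilon$. Y is then conditionally independent of $X_2$ given $X_1$, and their partial correlation given $X_1$ is zero. Splitting only $X_1$ at its median leaves variation in $X_1$ within each half, and $X_2$ still tracks it. The partial correlation of Y and $X_2$ given the dichotomized $X_1$ is therefore clearly positive, roughly 0.17 in the population under this model. The sample size, seed and helper name are arbitrary.

```python
import numpy as np


def partial_corr(y, x, z):
    """Correlation of y and x after removing a linear fit on z plus an intercept."""
    Z = np.column_stack([np.ones_like(z), z])
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return np.corrcoef(ry, rx)[0, 1]


rng = np.random.default_rng(0)
n, rho = 200_000, 0.5
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = x1 + rng.standard_normal(n)  # Y depends on X1 only
d1 = (x1 > np.median(x1)).astype(float)  # median split of X1 only

r_cont = partial_corr(y, x2, x1)  # near 0: Y and X2 are independent given X1
r_dich = partial_corr(y, x2, d1)  # clearly positive: a spurious X2 "effect"
print(round(r_cont, 3), round(r_dich, 3))
```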