SUMMARY The interpretation of a fitted statistical model such as the classical linear or the generalized linear model is substantially clarified by a full partitioning of the maximized log‐likelihood ratio test statistic into additive elements. This method generalizes the regression elements of Newton and Spurrell (1967) used to aid interpretation of regression equations. The primary elements measure the unique contribution of each explanatory variable, whereas the secondary and higher order elements measure the effective balance in the observed design. It is remarked herein that the elements correspond to the parameters of a saturated factorial model fitted to the likelihood function. This permits a coherent computational procedure. Examples are taken from some well‐analysed data sets, illustrating the interpretation of regression and log‐linear models.
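To make the idea concrete, here is a minimal sketch in Python of an all-subsets decomposition in the Newton and Spurrell (1967) spirit for an ordinary regression: a fit statistic is computed for every subset of explanatory variables, and each element is recovered as a factorial (inclusion-exclusion) contrast over subsets, mirroring the fitting of a saturated factorial model to the likelihood. The synthetic data, variable names, and the use of R-squared in place of the maximized log-likelihood ratio statistic are illustrative assumptions, not the paper's exact procedure.

# Sketch: all-subsets decomposition of explained variation.
# The element of a subset S is an alternating (inclusion-exclusion) sum of
# the fit statistic over subsets of S, analogous to a factorial contrast.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, names = 100, ["x1", "x2", "x3"]
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]                # induce some collinearity
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

def r2(cols):
    """R^2 of OLS of y on the selected columns (intercept always included)."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

subsets = [s for r in range(4) for s in combinations(range(3), r)]
fit = {s: r2(s) for s in subsets}

# Element of subset S: sum over T subsets of S of (-1)^{|S|-|T|} R^2(T).
for s in subsets[1:]:
    element = sum((-1) ** (len(s) - len(t)) * fit[t]
                  for r in range(len(s) + 1) for t in combinations(s, r))
    print(f"element({'*'.join(names[j] for j in s)}) = {element:+.4f}")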
SUMMARY Simulation is a standard technique for investigating the sampling distribution of parameter estimators. The bootstrap is a distribution‐free method of assessing sampling variability based on resampling from the empirical distribution; the parametric bootstrap resamples from a fitted parametric model. However, if the parameters of the model are constrained, and the application of these constraints depends on the realized sample, then the resampling distribution obtained from the parametric bootstrap may become badly biased and overdispersed. Here we discuss such problems in the context of estimating parameters from a bilinear model that incorporates the singular value decomposition (SVD) and in which the parameters are identified by the standard orthogonality relationships of the SVD. Possible effects of the SVD parameter identification are arbitrary changes in the sign of singular vectors, inversion of the order of singular values and rotation of the plotted co‐ordinates. This paper proposes inverse transformation or ‘filtering’ techniques to avoid these problems. The ideas are illustrated by assessing the variability of the location of points in a principal co‐ordinates diagram and in the marginal sampling distribution of singular values. An application to the analysis of a biological data set is described. In the discussion it is pointed out that several exploratory multivariate methods may benefit from resampling with filtering.
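A minimal sketch of the filtering idea follows, assuming a Gaussian parametric model with known error standard deviation and a greedy matching rule for aligning each bootstrap replicate's singular vectors to the original decomposition; the paper's exact inverse-transformation procedure may differ.

# Sketch: parametric bootstrap of an SVD with 'filtering' of the sign and
# ordering indeterminacies. The noise model and the greedy matching rule
# are illustrative assumptions, not the authors' exact method.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 30, 5, 2
M = rng.normal(size=(n, p))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
sigma = 0.3                              # assumed known error s.d. (parametric model)

def filtered_svd(Y, U_ref):
    """SVD of Y with components reordered and sign-flipped to match U_ref."""
    Ub, sb, Vbt = np.linalg.svd(Y, full_matrices=False)
    C = U_ref.T @ Ub                     # cosine similarities with the reference
    order = np.abs(C).argmax(axis=1)     # greedy match; Hungarian matching for ties
    signs = np.sign(C[np.arange(len(order)), order])
    return Ub[:, order] * signs, sb[order], Vbt[order] * signs[:, None]

boot_s = []
for _ in range(500):
    Y = M + sigma * rng.normal(size=(n, p))   # resample from the fitted model
    _, sb, _ = filtered_svd(Y, U[:, :k])
    boot_s.append(sb)

boot_s = np.array(boot_s)
print("filtered bootstrap s.e. of first", k, "singular values:", boot_s.std(axis=0))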
Including covariates in log-linear models of population registers improves population size estimates for two reasons. First, it is possible to take heterogeneity of inclusion probabilities over the levels of a covariate into account; second, it allows subdivision of the estimated population by the levels of the covariates, giving insight into characteristics of individuals who are not included in any of the registers. The issue of whether marginalizing the full table of registers and covariates over one or more covariates leaves the population size estimate invariant is intimately related to collapsibility of contingency tables [Biometrika 70 (1983) 567-578]. We show that, with information from two registers, population size invariance is equivalent to the simultaneous collapsibility of each margin consisting of one register and the covariates. We give a short path characterization of the log-linear model which describes when marginalizing over a covariate leads to different population size estimates. Covariates that are collapsible are called passive, to distinguish them from covariates that are not collapsible, which are termed active. We make the case that it can be useful to include passive covariates in the estimation model, because they allow a finer description of the population in terms of these covariates. As an example, we discuss the estimation of the population size of people born in the Middle East but residing in the Netherlands.
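A minimal sketch of the two-register case with a single covariate, using invented counts: under the log-linear model in which the registers are independent within each covariate level, the unobserved cell in each stratum is estimated as m10*m01/m11 (the stratified dual-system estimator), and comparing the stratified estimate with the estimate from the collapsed table shows whether the covariate is active or passive.

# Sketch: dual-register population size estimation with one covariate.
# Counts are illustrative, not from the Netherlands application.
import numpy as np

# counts[k] = (n11, n10, n01): in both registers, register 1 only, register 2 only
counts = {"level A": (60, 40, 30), "level B": (20, 50, 45)}

def dual_system_estimate(n11, n10, n01):
    """Estimated total including the unobserved (0,0) cell: n11+n10+n01 + n10*n01/n11."""
    return n11 + n10 + n01 + n10 * n01 / n11

# Covariate-stratified estimate: estimate the missing cell within each level.
stratified = sum(dual_system_estimate(*c) for c in counts.values())

# Collapsed estimate: marginalize over the covariate before estimating.
collapsed = dual_system_estimate(*np.sum(list(counts.values()), axis=0))

print(f"stratified: {stratified:.1f}  collapsed: {collapsed:.1f}")
# If the two differ, the covariate is active (not collapsible); if they
# coincide, it is passive and can still be kept for finer description.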
A statistical analysis of a bank's credit card database is presented. The database is a snapshot of accounts whose holders have missed a payment in a given month but who do not subsequently default. The variables on which there is information are observable measures on the account (such as profit and activity) and whether actions that are available to the bank (such as letters and telephone calls) have been taken. A primary objective for the bank is to gain insight into the effect that collections activity has on ongoing account usage. A neglog transformation is introduced that highlights features hidden on the original scale and improves the joint distribution of the covariates. Quantile regression, a methodology that is novel to the credit scoring industry, is used because it is relatively assumption free and because it is suspected that different relationships may be manifest in different parts of the response distribution. The large size of the database is handled by selecting relatively small subsamples for training and then building empirical distributions from repeated samples for validation. In the application to the database of clients who have missed a single payment, a substantive finding is that the predictor of the median of the target variable contains different variables from those of the predictor of the 30% quantile. This suggests that different mechanisms may be at play in different parts of the distribution.
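A minimal sketch of the neglog transformation, sign(x) log(1+|x|), combined with quantile regression at the 30% quantile and the median via statsmodels' QuantReg; the data-generating process and variable names are invented for illustration and are not the bank's database.

# Sketch: neglog transform plus quantile regression on synthetic data.
import numpy as np
import statsmodels.api as sm

def neglog(x):
    """sign(x) * log(1 + |x|): log-like compression handling zeros and negative values."""
    return np.sign(x) * np.log1p(np.abs(x))

rng = np.random.default_rng(2)
n = 2000
profit = rng.standard_cauchy(n) * 50          # heavy-tailed, both signs
activity = rng.poisson(5, n).astype(float)
X = sm.add_constant(np.column_stack([neglog(profit), neglog(activity)]))
# Let the lower tail be driven by a different covariate than the centre:
y = 1.0 + 0.8 * neglog(profit) + rng.gumbel(0, 1 + 0.5 * neglog(activity), n)

for q in (0.30, 0.50):
    res = sm.QuantReg(y, X).fit(q=q)
    print(f"q={q}: coefficients = {np.round(res.params, 3)}")
# Differing coefficient patterns across quantiles echo the paper's finding that
# the 30% quantile and the median are predicted by different variables.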