Mixtures of Gaussian factors are powerful tools for modeling an unobserved heterogeneous population, offering dimension reduction and model-based clustering at the same time. The high prevalence of spurious solutions and the disturbing effects of outlying observations in maximum likelihood estimation may cause biased or misleading inferences. Restrictions on the component covariances are considered in order to avoid spurious solutions, and trimming is also adopted to provide robustness against violations of the normality assumptions on the underlying latent factors. A detailed AECM algorithm for this new approach is presented. Simulation results and an application to the AIS dataset illustrate the aim and effectiveness of the proposed methodology.
To Professor Joseph L. Gastwirth, whose creativity and fondness for indices have been matchlessly inspiring.
L-statistics play prominent roles in various research areas and applications, including the development of robust statistical methods and the measurement of economic inequality and insurance risks. In many applications the score functions of L-statistics depend on parameters (e.g., the distortion parameter in insurance, the risk aversion parameter in econometrics), which turn the L-statistics into functions that we call L-functions. A simple example of an L-function is the Lorenz curve. Ratios of L-functions play equally important roles, with the Zenga curve being a prominent example. To illustrate real-life uses of these functions and curves, we analyze a data set from the Bank of Italy's 2006 sample survey on household budgets. Naturally, empirical counterparts of the population L-functions need to be employed and, importantly, adjusted and modified in order to meaningfully capture situations well beyond those based on simple random sampling designs. In the course of our investigations, we also introduce the L-process, on which statistical inferential results about the population L-function hinge. Hence, we provide notes and references facilitating ways of deriving asymptotic properties of the L-process.
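As a rough illustration of the two curves named in this abstract, the sketch below computes an empirical Lorenz curve value L(p) and the corresponding Zenga curve value, taken here as Z(p) = 1 - [L(p)/p] * [(1 - p)/(1 - L(p))], i.e. one minus the ratio of the mean of the poorest share p to the mean of the remaining share. It assumes simple random sampling with equal weights, which is exactly the setting the abstract says must be adjusted for survey data; the function names and the toy incomes are hypothetical.

```python
def lorenz(incomes, p):
    """Empirical Lorenz curve: income share held by the poorest fraction p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(incomes)
    k = int(len(xs) * p)  # number of units in the poorest fraction (assumed rule)
    return sum(xs[:k]) / sum(xs)


def zenga(incomes, p):
    """Zenga curve: 1 - (mean of the poorest p) / (mean of the richest 1 - p)."""
    l = lorenz(incomes, p)
    return 1.0 - (l / p) * ((1.0 - p) / (1.0 - l))


# Toy data, not taken from the Bank of Italy survey.
toy_incomes = [1.0, 2.0, 3.0, 4.0]
print(lorenz(toy_incomes, 0.5), zenga(toy_incomes, 0.5))
```

With perfectly equal incomes the Lorenz value equals p and the Zenga value is zero, which is a quick sanity check on both definitions.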
Mixtures of multivariate t distributions provide a robust parametric extension to the fitting of data with respect to normal mixtures. In the presence of a noise component, potential outliers, or data with longer-than-normal tails, one way to broaden the model is to consider t distributions. In this framework, the degrees of freedom can act as a robustness parameter, tuning the heaviness of the tails and downweighting the effect of outliers on parameter estimation. The aim of this paper is to extend to mixtures of multivariate elliptical distributions some theoretical results about likelihood maximization on constrained parameter spaces. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness. Monte Carlo numerical simulations and a real data study illustrate the better performance of the algorithm compared with earlier proposals.
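The downweighting role of the degrees of freedom can be made concrete with the standard EM weight for a p-dimensional t component, u = (nu + p) / (nu + delta), where delta is the squared Mahalanobis distance of an observation from the component centre. The sketch below is a minimal univariate illustration of that weight only, not the constrained algorithm proposed in the paper; the function names and toy data are hypothetical.

```python
def t_weight(delta, nu, p):
    """EM weight for a t component: small when the squared distance delta is large."""
    return (nu + p) / (nu + delta)


def weighted_mean_update(xs, mu, sigma2, nu):
    """One weighted-mean update for a single univariate (p = 1) t component."""
    weights = [t_weight((x - mu) ** 2 / sigma2, nu, 1) for x in xs]
    new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return new_mu, weights


# Toy sample with one gross outlier (12.0); values are illustrative only.
data = [-0.5, 0.1, 0.4, 0.0, 12.0]
new_mu, weights = weighted_mean_update(data, mu=0.0, sigma2=1.0, nu=3.0)
print(new_mu, weights)
```

As nu grows, every weight tends to 1 and the update approaches the ordinary sample mean, which is the normal-mixture limit.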
In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations, namely outliers and data with incorrect labels, can strongly undermine the classifier performance, especially if the training size is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles the presence of noise in both response and explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
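The two robustness devices named in this abstract can be sketched in a few lines. Impartial trimming discards the fraction alpha of observations with the lowest fitted density, and the eigenvalue-ratio constraint keeps max/min of the scatter-matrix eigenvalues at or below a constant c. The version below is a deliberate simplification: it trims floor(n * alpha) units and enforces the ratio by raising small eigenvalues to max/c, whereas constrained estimation in the literature typically solves for an optimal truncation level. All names and toy values are hypothetical.

```python
import math


def trim_indices(log_densities, alpha):
    """Return indices kept after dropping the floor(n * alpha) least plausible units."""
    n = len(log_densities)
    n_trim = math.floor(n * alpha)
    order = sorted(range(n), key=lambda i: log_densities[i])  # least plausible first
    return sorted(order[n_trim:])


def constrain_eigenvalues(eigenvalues, c):
    """Raise small eigenvalues so that max / min <= c (simplified rule, c >= 1)."""
    if c < 1:
        raise ValueError("c must be at least 1")
    lower = max(eigenvalues) / c
    return [max(ev, lower) for ev in eigenvalues]


kept = trim_indices([-1.0, -50.0, -2.0, -1.5], alpha=0.25)
constrained = constrain_eigenvalues([10.0, 1.0, 0.01], c=5.0)
print(kept, constrained)
```

Setting c = 1 forces all eigenvalues to be equal (spherical, equal-volume groups), while a large c leaves the scatter matrices nearly unconstrained.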