Taking advantage of the S4 class system of the programming environment R, which facilitates the creation and maintenance of reusable and modular components, an object-oriented framework for robust multivariate analysis was developed. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis, as well as robust linear and quadratic discriminant analysis. The design of these methods follows common patterns, which we call statistical design patterns in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis, as well as possible extensions through the development of new methods, is demonstrated on examples that are themselves part of the package rrcov.
High-dimensional data often contain many variables that are irrelevant for predicting a response or for an accurate group assignment. The inclusion of such variables in a regression or classification model leads to a loss in performance, even if the contribution of the variables to the model is small. Sparse methods for regression and classification are able to suppress these variables. This is possible by adding an appropriate penalty term to the objective function of the method. An overview of recent sparse methods for regression and classification is provided. The methods are applied to several high-dimensional data sets from chemometrics. A comparison with the non-sparse counterparts gives insight into their performance.
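As a minimal sketch of how a penalty term can suppress variables, consider the L1 (lasso) penalty. This is an assumed example, since the abstract does not say which penalties are covered. Under the additional assumption of an orthonormal design matrix, minimizing (1/2)||y - Xb||^2 + lam * ||b||_1 amounts to soft-thresholding each least-squares coefficient. The coefficient values and the value of `lam` below are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients (illustrative values, not real data).
ols_coefs = [2.0, -1.5, 0.3, -0.1]
lam = 0.5  # assumed penalty weight

# With an orthonormal design, the lasso solution is the soft-thresholded
# least-squares solution: small coefficients are set to zero, large ones shrink.
sparse_coefs = [soft_threshold(b, lam) for b in ols_coefs]
print(sparse_coefs)
```

The two small coefficients are set exactly to zero, which is the sense in which a sparse method removes variables rather than merely down-weighting them.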
The discordancy measure in terms of the sample L-moment ratios (L-CV, L-skewness, L-kurtosis) of the at-site data is widely recommended for screening atypical sites in regional frequency analysis (RFA). The sample mean and the covariance matrix of the L-moment ratios, on which the discordancy measure is based, are not robust against outliers in the data, and consequently this measure can be strongly affected by the discordant sites present in the region. We propose to replace the classical mean and covariance matrix estimates by their robust alternatives based on the minimum covariance determinant estimator. The performance of the classical and robust measures for identifying discordant sites is assessed in a series of Monte Carlo simulation experiments within the framework of the RFA. The simulation study shows that the robust discordancy measure outperforms the classical one and is consistent with the heterogeneity measure H. Thus we recommend its use as a tool for detecting discordant sites and forming homogeneous regions in RFA.
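The contrast between the two measures can be sketched as follows. The classical part uses the usual form D_i = (N/3) (u_i - u_bar)^T A^{-1} (u_i - u_bar), where A is the sum of outer products of the centered L-moment ratio vectors. The robust part is only an assumed illustration: it uses a brute-force raw minimum covariance determinant estimate with no consistency or small-sample correction, and it reports squared robust distances without any rescaling, so it is not necessarily the exact measure proposed in the paper. The site values are made up.

```python
from itertools import combinations

P = 3  # each site is described by (L-CV, L-skewness, L-kurtosis)


def mean_vec(rows):
    return [sum(r[k] for r in rows) / len(rows) for k in range(P)]


def cov3(rows):
    """Sample covariance matrix (divisor n - 1) of 3-dimensional rows."""
    m, n = mean_vec(rows), len(rows)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(P)] for a in range(P)]


def det3(s):
    return (s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
            - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
            + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]))


def sq_dist(u, loc, s):
    """Squared Mahalanobis distance (u - loc)^T s^{-1} (u - loc)."""
    d = [u[k] - loc[k] for k in range(P)]
    det = det3(s)
    x = []
    for k in range(P):  # solve s x = d by Cramer's rule
        sk = [row[:] for row in s]
        for r in range(P):
            sk[r][k] = d[r]
        x.append(det3(sk) / det)
    return sum(d[k] * x[k] for k in range(P))


def classical_discordancy(sites):
    # A = (n - 1) * S, hence D_i = n / (3 (n - 1)) * squared Mahalanobis distance.
    n = len(sites)
    loc, s = mean_vec(sites), cov3(sites)
    return [n / (3 * (n - 1)) * sq_dist(u, loc, s) for u in sites]


def raw_mcd(sites, h):
    """Exhaustive raw MCD: the h-subset whose covariance has the smallest
    determinant. Feasible only for a small number of sites."""
    best = min(combinations(sites, h), key=lambda sub: det3(cov3(sub)))
    return mean_vec(best), cov3(best)


def robust_sq_distances(sites):
    h = (len(sites) + P + 1) // 2  # a common default subset size
    loc, s = raw_mcd(sites, h)
    return [sq_dist(u, loc, s) for u in sites]


# Made-up L-moment ratios for eight sites; the last one is deliberately atypical.
sites = [(0.20, 0.15, 0.12), (0.22, 0.18, 0.14), (0.19, 0.14, 0.15),
         (0.21, 0.20, 0.11), (0.23, 0.16, 0.16), (0.18, 0.17, 0.13),
         (0.20, 0.19, 0.17), (0.45, 0.50, 0.40)]

classical = classical_discordancy(sites)
robust = robust_sq_distances(sites)
for i, (c, r) in enumerate(zip(classical, robust)):
    print(f"site {i}: classical D = {c:.2f}, robust squared distance = {r:.1f}")
```

Because the atypical site also enters the classical mean and covariance, it pulls both toward itself; the classical D_i is bounded above by (N - 1)/3, whereas the robust distances are computed from a subset that can exclude it.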
Compositional tables, a continuous counterpart to contingency tables, carry relative information about relationships between row and column factors; thus, for their analysis, only ratios between cells of a table are informative. Consequently, the standard Euclidean geometry should be replaced by the Aitchison geometry on the simplex, which enables decomposition of the table into its independent and interactive parts. The aim of the paper is to find an interpretable coordinate representation for independent and interaction tables (in the sense of balances and odds ratios of cells, respectively), in which further statistical processing of compositional tables can be performed. Theoretical results are applied to real-world problems from a health survey and from macroeconomics.
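The decomposition mentioned above can be sketched under the standard construction in the Aitchison geometry, which is assumed here because the abstract does not spell it out: the independent table is the closed product of row-wise and column-wise geometric means, and the interaction table is the original table divided cell by cell by the independent one and then closed. The function names and the example table are illustrative.

```python
import math


def closure(table):
    """Rescale all cells so that the whole table sums to 1."""
    total = sum(sum(row) for row in table)
    return [[x / total for x in row] for row in table]


def decompose(table):
    """Split a strictly positive I x J table into independent and interaction parts."""
    n_rows, n_cols = len(table), len(table[0])
    logs = [[math.log(x) for x in row] for row in table]
    # Logs of the geometric means of each row and each column.
    row_means = [sum(logs[i]) / n_cols for i in range(n_rows)]
    col_means = [sum(logs[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    independent = closure([[math.exp(row_means[i] + col_means[j])
                            for j in range(n_cols)] for i in range(n_rows)])
    # Perturbation difference: cell-wise ratio followed by closure.
    interaction = closure([[table[i][j] / independent[i][j]
                            for j in range(n_cols)] for i in range(n_rows)])
    return independent, interaction


# Made-up 2 x 3 table of positive values.
table = [[10.0, 20.0, 30.0], [40.0, 10.0, 50.0]]
independent, interaction = decompose(table)
print(independent)
print(interaction)
```

If the cells of a table factor exactly into a row effect times a column effect, the interaction table comes out uniform, which is the neutral element of the geometry.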