In this article, we present a simple multivariate outlier-detection procedure and a robust estimator for the covariance matrix, based on the use of information obtained from projections onto the directions that maximize and minimize the kurtosis coefficient of the projected data. The properties of this estimator (computational cost, bias) are analyzed and compared with those of other robust estimators described in the literature through simulation studies. The performance of the outlier-detection procedure is analyzed by applying it to a set of well-known examples.

KEY WORDS: Kurtosis; Linear projection; Multivariate statistics.

The detection of outliers in multivariate data is recognized to be an important and difficult problem in the physical, chemical, and engineering sciences. Whenever multiple measurements are obtained, there is always the possibility that changes in the measurement process will generate clusters of outliers. Most standard multivariate analysis techniques rely on the assumption of normality and require the use of estimates for both the location and scale parameters of the distribution. The presence of outliers may arbitrarily distort the values of these estimators and render meaningless the results of the application of these techniques. According to Rocke and Woodruff (1996), the problem of the joint estimation of location and shape is one of the most difficult in robust statistics. Wilks (1963) proposed identifying sets of outliers of size j in normal multivariate data by checking the minimum values of the ratios |A(I)|/|A|, where |A(I)| is the internal scatter of a modified sample in which the set of observations I of size j has been deleted and |A| is the internal scatter of the complete sample. The internal scatter is proportional to the determinant of the covariance matrix, and the ratios are computed for all possible sets of size j. Wilks computed the distribution of the statistic for j equal to 1 and 2.
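To make the projection idea concrete, the following is a minimal numpy sketch (synthetic data; the function name and the contamination scheme are illustrative, not the authors' implementation) of the kurtosis coefficient of a projected sample. A small cluster of outliers placed along one axis tends to inflate the kurtosis of the projection onto that axis; note that the effect of contamination on kurtosis depends on the contamination level, which is why the procedure examines directions that both maximize and minimize it.

```python
import numpy as np

def projected_kurtosis(X, d):
    """Kurtosis coefficient of the sample X projected onto direction d."""
    z = X @ (d / np.linalg.norm(d))   # univariate projection
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:10, 0] += 6.0                      # 5% contamination along the first axis

k_contaminated = projected_kurtosis(X, np.array([1.0, 0.0, 0.0]))
k_clean = projected_kurtosis(X, np.array([0.0, 1.0, 0.0]))
```

For a normal sample the coefficient is close to 3; here the contaminated direction gives a markedly larger value than the clean one.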
It is well known that this procedure is a likelihood ratio test and that for j = 1 the method is equivalent to selecting the observation with the largest Mahalanobis distance from the center of the data.

Because a direct extension of this idea to sets of outliers larger than 2 or 3 is not practical, Gnanadesikan and Kettenring (1972) proposed reducing the multivariate detection problem to a set of univariate problems by looking at projections of the data onto some direction. They chose the direction of maximum variability of the data and therefore proposed obtaining the principal components of the data and searching for outliers in these directions. Although this method provides the correct solution when the outliers are located close to the directions of the principal components, it may fail to identify outliers in the general case.

An alternative approach is to use robust location and scale estimators. Maronna (1976) studied affinely equivariant M estimators for covariance matrices, and Campbell (1980) proposed using the Mahalanobis distance computed using M estimators for the mean...
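The j = 1 case can be checked numerically. The sketch below (numpy; the helper name and the planted outlier are illustrative) computes the Wilks ratios |A(I)|/|A| for all single-point deletions and confirms that the deletion minimizing the ratio is the planted outlier, i.e., the point farthest from the center in the Mahalanobis sense.

```python
import numpy as np

def scatter_det(X):
    """Internal scatter: determinant of the centered cross-product matrix."""
    Xc = X - X.mean(axis=0)
    return np.linalg.det(Xc.T @ Xc)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
X[0] = [8.0, 8.0]                     # planted outlier

# Wilks ratio |A(I)|/|A| for every single-observation deletion
ratios = np.array([scatter_det(np.delete(X, i, axis=0))
                   for i in range(len(X))]) / scatter_det(X)
suspect = int(np.argmin(ratios))      # deletion that shrinks the scatter most
```

Deleting the outlier collapses the scatter far more than deleting any inlier, so `suspect` identifies the planted point.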
The efficient solution of large-scale linear and nonlinear optimization problems may require exploiting any special structure in them in an efficient manner. We describe and analyze some cases in which this special structure can be used with very little cost to obtain search directions from decomposed subproblems. We also study how to correct these directions using (decomposable) preconditioned conjugate gradient methods to ensure local convergence in all cases. The choice of appropriate preconditioners results in a natural manner from the structure in the problem. Finally, we conduct computational experiments to compare the resulting procedures with direct methods, as well as to study the impact of different preconditioner choices.
Abstract-This paper considers a profit-maximizing thermal producer that participates in a sequence of spot markets, namely, day-ahead, automatic generation control (AGC), and balancing markets. The producer behaves as a price-taker in both the day-ahead market and the AGC market but as a potential price-maker in the volatile balancing market. The paper provides a stochastic programming methodology to determine the optimal bidding strategies for the day-ahead market. Uncertainty sources include prices for the day-ahead and AGC markets and balancing market linear price variations with the production of the thermal producer. Results from a realistic case study are reported and analyzed. Conclusions are duly drawn.
This article describes a procedure to identify clusters in multivariate data using information obtained from the univariate projections of the sample data onto certain directions. The directions are chosen as those that minimize and maximize the kurtosis coefficient of the projected data. It is shown that, under certain conditions, these directions provide the largest separation for the different clusters. The projected univariate data are used to group the observations according to the values of the gaps or spacings between consecutive ordered observations. These groupings are then combined over all projection directions. The behavior of the method is tested on several examples, and compared to k-means, MCLUST, and the procedure proposed by Jones and Sibson in 1987. The proposed algorithm is iterative, affine equivariant, flexible, robust to outliers, fast to implement, and seems to work well in practice.

KEY WORDS: Classification; Kurtosis; Multivariate analysis; Robustness; Spacings.

INTRODUCTION

Let us suppose we have a sample of multivariate observations generated from several different populations. One of the most important problems of cluster analysis is the partitioning of the points of this sample into nonoverlapping clusters. The most commonly used algorithms assume that the number of clusters, G, is known and the partition of the data is carried out by maximizing some optimality criterion. These algorithms start with an initial classification of the points into clusters and then reassign each point in turn to increase the criterion. The process is repeated until a local optimum of the criterion is reached. The most often used criteria can be derived from the application of likelihood ratio tests to mixtures of multivariate normal populations with different means.
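The gap-based grouping mentioned in the abstract above can be sketched in a few lines of numpy (synthetic two-cluster data; this is an illustration of the spacing idea, not the authors' full algorithm): after projection, sort the univariate values, compute the spacings between consecutive ordered observations, and split at the largest gap.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated univariate clusters, as produced by a good projection
z = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(10.0, 1.0, 50)])

zs = np.sort(z)
gaps = np.diff(zs)            # spacings between consecutive ordered points
split = int(np.argmax(gaps))  # the largest spacing separates the clusters
left, right = zs[:split + 1], zs[split + 1:]
```

With well-separated clusters, the between-cluster gap dominates all within-cluster spacings, so the split recovers the two groups exactly.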
It is well known that (i) when all the covariance matrices are assumed to be equal to the identity matrix, the criterion obtained corresponds to minimizing tr(W), where W is the within-groups covariance matrix; this is the criterion used in the standard k-means procedure; (ii) when the covariance matrices are assumed to be equal, without other restrictions, the criterion obtained is minimizing |W| (Friedman and Rubin 1967); (iii) when the covariance matrices are allowed to be different, the criterion obtained is minimizing ∑_{j=1}^{G} n_j log |W_j / n_j|, where W_j is the sample cross-product matrix for the jth cluster (see Seber 1984, and Gordon 1994, for other criteria). These algorithms may present two main limitations: (i) we have to choose the criterion a priori, without knowing the covariance structure of the data, and different criteria can lead to very different answers; and (ii) they usually require large amounts of computer time, which makes them difficult to apply to large data sets.

Banfield and Raftery (1993) and Dasgupta and Raftery (1998) have proposed a model-based approach to clustering that has several advantages over previous procedures. They assume a mixture model and use the EM algorithm to estimate the parameters. The initial esti...
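The three criteria above are straightforward to evaluate for a given partition. The numpy sketch below (function name and test data are illustrative) computes tr(W), |W|, and ∑ n_j log |W_j / n_j|, and shows that the true partition of two separated clusters scores better under the k-means criterion than labels that mix the clusters.

```python
import numpy as np

def clustering_criteria(X, labels):
    """Return tr(W), |W|, and sum_j n_j * log|W_j / n_j| for a partition."""
    p = X.shape[1]
    W = np.zeros((p, p))
    log_term = 0.0
    for g in np.unique(labels):
        Xg = X[labels == g]
        Xc = Xg - Xg.mean(axis=0)
        Wg = Xc.T @ Xc                # within-group cross-product matrix
        W += Wg
        log_term += len(Xg) * np.log(np.linalg.det(Wg / len(Xg)))
    return np.trace(W), np.linalg.det(W), log_term

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(8, 1, (40, 2))])
good = np.repeat([0, 1], 40)          # the true partition
bad = np.tile([0, 1], 40)             # labels that mix the two clusters

tr_good, _, _ = clustering_criteria(X, good)
tr_bad, _, _ = clustering_criteria(X, bad)
```

Because the mixed labels place points from both clusters in each group, their within-group scatter is far larger, so tr_good < tr_bad.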
Abstract. We describe an efficient implementation of an interior-point algorithm for non-convex problems that uses directions of negative curvature. These directions should ensure convergence to second-order KKT points and improve the computational efficiency of the procedure. Some relevant aspects of the implementation are the strategy to combine a direction of negative curvature and a modified Newton direction, and the conditions to ensure feasibility of the iterates with respect to the simple bounds. The use of multivariate barrier and penalty parameters is also discussed, as well as the update rules for these parameters. We analyze the convergence of the procedure; both the linesearch and the update rule for the barrier parameter behave appropriately. As the main goal of the paper is the practical usage of negative curvature, a set of numerical results on small test problems is presented. Based on these results, the relevance of using directions of negative curvature is discussed.