2013
DOI: 10.1016/j.jmva.2012.10.007

Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Abstract: Sparse Principal Component Analysis (PCA) methods are efficient tools for reducing the dimension (i.e. the number of variables) of complex data. Sparse principal components (PCs) are easier to interpret than conventional PCs because most of their loadings are zero. We study the asymptotic properties of these sparse PC directions in scenarios with fixed sample size and increasing dimension (i.e. High Dimension, Low Sample Size (HDLSS)). Under the previously studied spike covariance assumption, we show that Sparse PCA remains …
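To make the abstract's setting concrete, here is a small, purely illustrative simulation (our construction, not the paper's experiment): HDLSS data (n much smaller than d) drawn from a single-spike model whose leading eigenvector is sparse, fit with scikit-learn's SparsePCA. Note that scikit-learn's estimator uses a dictionary-learning formulation, which is not necessarily the sparse PCA estimator analyzed in the paper; the names n, d, s below are ours.

```python
# Illustrative sketch: sparse PCA on simulated HDLSS data (n << d) from a
# single-spike model whose leading eigenvector has only s nonzero entries.
# Most estimated loadings should come out exactly zero.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
n, d, s = 25, 500, 10                                   # samples, dimension, sparsity
v = np.zeros(d); v[:s] = 1 / np.sqrt(s)                 # sparse true PC direction
scores = rng.standard_normal((n, 1)) * np.sqrt(50.0)    # spike variance 50
X = scores @ v[None, :] + rng.standard_normal((n, d))   # plus unit-variance noise

spca = SparsePCA(n_components=1, alpha=1.0, random_state=0).fit(X)
loadings = spca.components_[0]
print("nonzero loadings:", np.count_nonzero(loadings), "of", d)
```

With the l1 penalty active, only a handful of the 500 loadings remain nonzero, which is exactly the interpretability advantage the abstract describes.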

Cited by 76 publications (57 citation statements)
References 37 publications
“…In recent years, substantial work has been done on PCA consistency under the spiked covariance model (e.g., Johnstone (2001) and Paul (2007)), and this has been extended to the HDLSS context by Ahn et al (2007), Jung and Marron (2009) and Shen et al (2013a). In a high-dimensional factor model y_t = Λ f_t + u_t, let Σ = cov(y_t) be the p × p covariance matrix of y_t.…”
Section: Projected-PCA Consistency in the HDLSS Context (mentioning)
confidence: 99%
“…Most of this work is asymptotics based, and OODA makes it clear that there are several important modes of asymptotics that can be considered. Even for Euclidean PCA, as made clear in Shen et al (2012a), several distinct asymptotic domains have been considered, ranging from the classical n → ∞ with d fixed, through Random Matrix Theory where n ∼ d → ∞ (see Johnstone (2006) and Johnstone and Lu (2009) for access to this literature), to High Dimension Low Sample Size (HDLSS) asymptotics where d → ∞ while n is fixed (see Jung and Marron (2009) for discussion of PCA in that context). In many OODA problems, the HDLSS type of asymptotics is informative and relevant, because sample sizes are frequently relatively small, while the complexity of objects and their representations is frequently very large.…”
Section: Open Problems in Other Areas of Statistics (mentioning)
confidence: 99%
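The HDLSS regime is easy to see numerically. Below is a small, purely illustrative simulation (our construction, not from any of the cited papers) of conventional PCA under a single-spike covariance Σ = diag(d^α, 1, …, 1), with n fixed while d grows; pc_angle and α are hypothetical names we introduce here.

```python
# Illustrative HDLSS simulation: angle between the true and sample leading
# eigenvectors under Sigma = diag(d**alpha, 1, ..., 1), n fixed, d growing.
import numpy as np

rng = np.random.default_rng(0)

def pc_angle(d, n=20, alpha=1.5):
    """Angle (degrees) between the true and sample first PC directions."""
    v = np.zeros(d); v[0] = 1.0              # true leading eigenvector e_1
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)           # spike eigenvalue d^alpha
    Xc = X - X.mean(axis=0)                  # center the data
    vhat = np.linalg.svd(Xc, full_matrices=False)[2][0]  # sample first PC
    cos = abs(vhat @ v)
    return np.degrees(np.arccos(min(cos, 1.0)))

for d in (100, 1000, 10000):
    print(d, round(pc_angle(d), 2))
```

Consistent with Jung and Marron (2009), one expects the angle to shrink toward 0 when α > 1 (consistency) and to drift toward 90° when α < 1 (strong inconsistency), even though n never grows.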
“…The main question to be answered in sparse PCA is whether there is an algorithm that is not only asymptotically consistent but also computationally efficient. Theoretical research on sparse PCA from the statistical-guarantees viewpoint includes consistency [2, 8, 14, 38, 41, 50, 53, 55], minimax risk bounds for estimating eigenvectors [40, 42, 43, 45, 61], optimal sparsity level detection [4, 44, 48, 59], and principal subspace estimation [5, 9, 15, 16, 36, 40, 51, 57], all established under various statistical models. Because most of these methods are based on the spiked covariance model, we first give an introduction to the spiked covariance model and then review the theoretical analysis of high-dimensional sparse PCA from the above aspects.…”
Section: Theoretical Analysis of High-Dimensional Sparse PCA (mentioning)
confidence: 99%
“…Shen et al [41] established conditions for consistency of the sparse PCA method of [11] when p → ∞ and n is fixed. Yuan [98] also derived the convergence rate of the TPower method.…”
mentioning
confidence: 99%
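Since TPower comes up in the last statement, here is a minimal sketch of a truncated power iteration in the spirit of that method (an illustration under our own simplifications, not the reference implementation; tpower and its defaults are names we introduce).

```python
# Illustrative truncated power iteration for sparse PCA: power steps with a
# hard-truncation that keeps only the k largest-magnitude coordinates.
import numpy as np

def tpower(S, k, iters=200, seed=0):
    """Approximate the k-sparse leading eigenvector of a covariance matrix S."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    x = rng.standard_normal(p)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = S @ x                          # power step
        idx = np.argsort(np.abs(y))[:-k]   # indices of the p-k smallest entries
        y[idx] = 0.0                       # truncate to the top-k support
        x = y / np.linalg.norm(y)          # renormalize
    return x
```

Each iteration costs one matrix-vector product plus a sort for the hard-truncation step, which is what makes this family of methods computationally attractive in high dimensions while still targeting a sparse leading eigenvector.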