Debashis Paul scite author profile

In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer.
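The screening-then-PCA idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `supervised_pc`, the parameter `n_keep`, and the use of a standardized univariate regression score for screening are all hypothetical choices made for this example.

```python
import numpy as np

def supervised_pc(X, y, n_keep=10, n_components=1):
    """Sketch: keep the predictors most associated with y, then run PCA on them."""
    Xc = X - X.mean(axis=0)   # center each predictor
    yc = y - y.mean()         # center the outcome

    # Assumed screening score: |x_j . y| / ||x_j|| for each centered predictor.
    norms = np.sqrt((Xc ** 2).sum(axis=0))
    norms[norms == 0] = 1.0   # guard against constant columns
    scores = np.abs(Xc.T @ yc) / norms

    # Indices of the n_keep highest-scoring predictors, in ascending order.
    keep = np.sort(np.argsort(scores)[-n_keep:])

    # Principal components of the reduced matrix via the SVD.
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]

    # Least-squares regression of the centered outcome on the leading component(s).
    beta, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    return keep, Z, beta
```

In practice `n_keep` (or an equivalent score threshold) would typically be chosen by cross-validation; here it is simply a fixed argument.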


Minimax bounds for sparse PCA with noisy high-dimensional data

Birnbaum, Johnstone, Nadler et al. 2013

Ann. Statist.

158

136


We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the ℓ2 loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.


A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data

Peng, Paul 2009

Journal of Computational and Graphical Statistics

100

132


In this paper, we consider the problem of estimating the eigenvalues and eigenfunctions of the covariance kernel (i.e., the functional principal components) from sparse and irregularly observed longitudinal data. We approach this problem through a maximum likelihood method assuming that the covariance kernel is smooth and finite dimensional. We exploit the smoothness of the eigenfunctions to reduce dimensionality by restricting them to a lower dimensional space of smooth functions. The estimation scheme is developed based on a Newton-Raphson procedure using the fact that the basis coefficients representing the eigenfunctions lie on a Stiefel manifold. We also address the selection of the right number of basis functions, as well as that of the dimension of the covariance kernel by a second order approximation to the leave-one-curve-out cross-validation score that is computationally very efficient. The effectiveness of our procedure is demonstrated by simulation studies and an application to a CD4 counts data set. In the simulation studies, our method performs well on both estimation and model selection. It also outperforms two existing approaches: one based on a local polynomial smoothing of the empirical covariances, and another using an EM algorithm.


Random matrix theory in statistics: A review

Paul, Aue 2014

Journal of Statistical Planning and Inference

164

115


We give an overview of random matrix theory (RMT) with the objective of highlighting the results and concepts that have a growing impact in the formulation and inference of statistical models and methodologies. This paper focuses on a number of application areas, especially within the field of high-dimensional statistics, and describes how the development of the theory and practice in high-dimensional statistical inference has been influenced by the corresponding developments in the field of RMT.


A Regularized Hotelling’s T² Test for Pathway Analysis in Proteomic Studies

Chen, Paul, Prentice et al. 2011

Journal of the American Statistical Association

109

117


Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector μ of a p-dimensional random vector X equals μ₀. Proteins within the same biological pathway may correlate with one another in a complicated way, and type I error rates can be inflated if such correlations are incorrectly assumed to be absent. The inflation tends to be more pronounced when the sample size is very small or there is a large amount of missingness in the data, as is frequently the case in proteomic discovery studies. To tackle these challenges, we propose a regularized Hotelling’s T² statistic together with a non-parametric testing procedure, which effectively controls the type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the statistic under pertinent assumptions and compare the test performance with four existing methods through simulation examples. We apply the test to a hormone therapy proteomics data set, and identify several interesting biological pathways for which blood serum concentrations changed following hormone therapy initiation.
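To make the testing problem in this abstract concrete, here is a rough sketch of a regularized one-sample Hotelling-type statistic. Several things are assumptions for illustration only: the ridge-type form n (x̄ − μ₀)ᵀ (S + λI)⁻¹ (x̄ − μ₀), the function names, the fixed λ, and the sign-flip resampling used for the p-value. The abstract does not say which non-parametric procedure the authors use, and sign-flipping (valid when the data are symmetric about μ₀ under the null) is swapped in here as one simple option. Missing data are not handled.

```python
import numpy as np

def rht_statistic(X, mu0, lam):
    """Assumed ridge-regularized Hotelling-type statistic for H0: mean(X) = mu0."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.atleast_2d(np.cov(X, rowvar=False))   # p x p sample covariance
    # Adding lam * I keeps the matrix invertible even when n < p.
    return float(n * d @ np.linalg.solve(S + lam * np.eye(p), d))

def sign_flip_pvalue(X, mu0, lam, n_resamples=500, seed=0):
    """Sign-flip p-value (an assumed stand-in for the paper's procedure)."""
    rng = np.random.default_rng(seed)
    D = X - mu0                                  # deviations from the null mean
    zero = np.zeros(X.shape[1])
    observed = rht_statistic(D, zero, lam)
    null = np.empty(n_resamples)
    for b in range(n_resamples):
        # Flip the sign of each observation's deviation vector at random.
        signs = rng.choice([-1.0, 1.0], size=(X.shape[0], 1))
        null[b] = rht_statistic(D * signs, zero, lam)
    # Add-one correction so the p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_resamples)
```

A call such as `sign_flip_pvalue(X, np.zeros(X.shape[1]), lam=1.0)` returns a p-value for an n-by-p data matrix `X`; how λ should be chosen is left open in this sketch.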




Prediction by Supervised Principal Components
