Dan Shen scite author profile

Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Sparse Principal Component Analysis (PCA) methods are efficient tools to reduce the dimension (or the number of variables) of complex data. Sparse principal components (PCs) are easier to interpret than conventional PCs, because most loadings are zero. We study the asymptotic properties of these sparse PC directions for scenarios with fixed sample size and increasing dimension (i.e., the High Dimension, Low Sample Size (HDLSS) setting). Under the previously studied spike covariance assumption, we show that Sparse PCA remains consistent under the same large spike condition that was previously established for conventional PCA. Under a broad range of small spike conditions, we find a large set of sparsity assumptions where Sparse PCA is consistent but PCA is strongly inconsistent. The boundaries of the consistent region are clarified using an oracle result.
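
The contrast between conventional and sparse PCA in the small-spike regime can be illustrated with a short simulation. This is an assumption-laden sketch, not the paper's estimator or proof: it uses a single spike of size lam = d^alpha with alpha < 1 on a k-sparse direction, and it swaps in diagonal thresholding (keep the k highest-variance coordinates, then run PCA on them) as a stand-in sparse PCA method, with k assumed known. The function names and parameter values are hypothetical.

```python
import numpy as np


def angle_deg(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in degrees between the lines spanned by a and b (sign-invariant)."""
    c = abs(float(a @ b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(min(1.0, c))))


def leading_direction(X: np.ndarray) -> np.ndarray:
    """First right singular vector of the column-centered n x d data matrix (sample PC1)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def diagonal_thresholding_pc(X: np.ndarray, k: int) -> np.ndarray:
    """Stand-in sparse PC: keep the k highest-variance coordinates, run PCA on them only."""
    support = np.argsort(X.var(axis=0))[-k:]
    v = np.zeros(X.shape[1])
    v[support] = leading_direction(X[:, support])
    return v


def simulate(n: int = 20, d: int = 5000, k: int = 10, alpha: float = 0.7, seed: int = 0):
    """One draw from a hypothetical single-spike model with a k-sparse spike direction."""
    rng = np.random.default_rng(seed)
    u = np.zeros(d)
    u[:k] = 1.0 / np.sqrt(k)                  # sparse population PC direction
    lam = float(d) ** alpha                    # small spike: alpha < 1
    z = rng.standard_normal(n)                 # scores along the spike
    X = np.sqrt(lam) * np.outer(z, u) + rng.standard_normal((n, d))  # Cov = I + lam * u u^T
    return angle_deg(leading_direction(X), u), angle_deg(diagonal_thresholding_pc(X, k), u)


if __name__ == "__main__":
    pca_angle, sparse_angle = simulate()
    print(f"conventional PC1 angle: {pca_angle:.1f} deg, thresholded PC1 angle: {sparse_angle:.1f} deg")
```

Diagonal thresholding is used here only because it is short and easy to check; it relies on the spike being visible in individual coordinate variances, which holds in this example because lam / k is large relative to the unit noise variance. One would expect the conventional angle to be large and the thresholded angle to be small, and increasing d with alpha < 1 fixed should push the conventional angle toward 90 degrees.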

The statistics and mathematics of high dimension low sample size asymptotics

Shen, Shen, Zhu, et al. 2017

Statistica Sinica

The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio of the dimension to the product of the sample size and the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone that makes a certain angle with its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
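
The role of the ratio d / (n * lam) can be checked numerically in the single-spike special case. The sketch below assumes x ~ N(0, I_d + lam * e_1 e_1^T) and compares one simulated angle between the sample PC1 and e_1 with the heuristic limit arctan(sqrt(d / (n * lam))), equivalently cos^2 of the angle near 1 / (1 + d / (n * lam)). That approximation comes from a first-order calculation for d much larger than n and is our assumption, not a formula quoted from the paper; the function names are hypothetical.

```python
import numpy as np


def sample_pc_angle(n: int, d: int, lam: float, rng: np.random.Generator) -> float:
    """Angle (degrees) between the sample PC1 and the spike direction e_1 for one draw.

    Assumed model: x ~ N(0, I_d + lam * e_1 e_1^T), a single-spike special case.
    """
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(1.0 + lam)          # inflate the variance along e_1
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    cos = min(1.0, abs(vt[0, 0]))          # |<v_1, e_1>|
    return float(np.degrees(np.arccos(cos)))


def heuristic_angle(n: int, d: int, lam: float) -> float:
    """Approximate limiting angle arctan(sqrt(d / (n * lam))), an assumed heuristic."""
    return float(np.degrees(np.arctan(np.sqrt(d / (n * lam)))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 100, 10_000
    for lam in (10_000.0, 100.0, 25.0):   # d / (n * lam) = 0.01, 1, 4
        print(lam, sample_pc_angle(n, d, lam, rng), heuristic_angle(n, d, lam))
```

When d / (n * lam) is small the angle should be close to 0 (consistency); when the ratio is of order 1 the angle should settle near a nonzero value, in line with the cone picture. Holding n small and rerunning with different seeds should show the angle staying random, which is the behavior the abstract describes as a limiting distribution in the HDLSS case.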

Study on the Hole Conduction Phenomenon in Carbon Fiber-Reinforced Concrete

Sun, Mao, et al. 1998

Cement and Concrete Research

137

Epoxy resin flame-retarded via a novel melamine-organophosphinic acid salt: Thermal stability, flame retardance and pyrolysis behavior

Shen, Long, et al. 2017

Journal of Analytical and Applied Pyrolysis

122

Thermoelectric percolation phenomena in carbon fiber-reinforced concrete

Sun, Mao, et al. 1998

Cement and Concrete Research

102

Functional Data Analysis of Tree Data Objects

Shen, Shen, Bhamidi, et al. 2014

Journal of Computational and Graphical Statistics

Data analysis on non-Euclidean spaces, such as tree spaces, can be challenging. The main contribution of this paper is the establishment of a connection between tree data spaces and the well-developed area of Functional Data Analysis (FDA), where the data objects are curves. This connection comes through two tree representation approaches, the Dyck path representation and the branch length representation. These representations of trees in Euclidean spaces enable us to exploit the power of FDA to explore statistical properties of tree data objects. A major challenge in the analysis is the sparsity of tree branches in a sample of trees. We overcome this issue by using a tree pruning technique that focuses the analysis on important underlying population structures. This method parallels scale-space analysis in the sense that it reveals statistical properties of tree-structured data over a range of scales. The effectiveness of these new approaches is demonstrated by some novel results obtained in the analysis of brain artery trees. The scale-space analysis reveals a deeper relationship between structure and age. These methods are the first to find a statistically significant gender difference.
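
The Dyck path idea can be sketched for a rooted, ordered tree without branch lengths: a depth-first walk records the depth after every step down or up an edge, giving a nonnegative path that starts and ends at 0 and has 2m steps for a tree with m edges. The nested-list tree encoding, the function names, and the rescaling to a common grid (so that trees of different sizes become comparable curves for FDA) are assumptions for illustration; the paper's construction may differ, for example by incorporating branch lengths.

```python
from typing import List

import numpy as np

# Hypothetical encoding: a node is the ordered list of its children; a leaf is [].
Tree = List["Tree"]


def dyck_path(tree: Tree) -> List[int]:
    """Depth after each step of a depth-first walk: +1 down an edge, -1 back up."""
    heights = [0]

    def visit(node: Tree, depth: int) -> None:
        for child in node:
            heights.append(depth + 1)   # step down to the child
            visit(child, depth + 1)
            heights.append(depth)       # step back up to the parent

    visit(tree, 0)
    return heights


def to_curve(heights: List[int], m: int = 101) -> np.ndarray:
    """Rescale a path to m equally spaced points on [0, 1] by linear interpolation."""
    x = np.linspace(0.0, 1.0, len(heights))
    return np.interp(np.linspace(0.0, 1.0, m), x, np.asarray(heights, dtype=float))


if __name__ == "__main__":
    # Root with two children; the second child has two leaf children.
    tree = [[], [[], []]]
    path = dyck_path(tree)
    print(path)  # [0, 1, 0, 1, 2, 1, 2, 1, 0]
    print(to_curve(path, 5))
```

Because every tree maps to a curve on the same grid, standard FDA tools (means, principal components of curves) can then be applied to a sample of trees.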

A survey of high dimension low sample size asymptotics

Aoshima, Shen, et al. 2018

Australian & New Zealand Journal of Statistics

Peter Hall's work illuminated many aspects of statistical thought, some of which are very well known, including the bootstrap and smoothing. However, he also explored many other, lesser-known aspects of mathematical statistics. This is a survey of one of those areas, high dimension low sample size asymptotics, which was initiated by a seminal paper in 2005. An interesting characteristic of that first paper, and of many of the papers that followed, is that they contain deep and insightful concepts which are frequently surprising and counter-intuitive, yet have mathematical underpinnings which tend to be direct and not difficult to prove.

A novel algorithm for analyzing drug-drug interactions from MEDLINE literature

Yin, Shen, Pietsch, et al. 2015

Scientific Reports

Drug–drug interaction (DDI) is becoming a serious clinical safety issue as the use of multiple medications becomes more common. Searching the MEDLINE database for journal articles related to DDI produces over 330,000 results. It is impossible to read and summarize these references manually. As the volume of biomedical references in the MEDLINE database continues to expand at a rapid pace, automatic identification of DDIs from the literature is becoming increasingly important. In this article, we present a random-sampling-based statistical algorithm to identify possible DDIs and their underlying mechanisms from the substances field of MEDLINE records. The substance terms are essentially carriers of compound (including protein) information in a MEDLINE record. Four case studies on warfarin, ibuprofen, furosemide and sertraline indicated that our method was able to rank possible DDIs with high accuracy (90.0% for warfarin, 83.3% for ibuprofen, 70.0% for furosemide and 100% for sertraline in the top 10% of a list of compounds ranked by p-value). A social network analysis of substance terms was also performed to construct networks between proteins and drug pairs to elucidate how the two drugs could interact.
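
The abstract does not spell out the algorithm, so the following is only a generic resampling sketch of the kind of test it describes, not the authors' method: for a drug, count how many records that mention it also list a given substance term, compare that count with the counts in random samples of the same number of records, and rank terms by the resulting empirical p-value. The record format (one set of substance terms per record), the function names, and the toy corpus are hypothetical.

```python
import random
from typing import List, Set, Tuple


def cooccurrence_pvalue(records: List[Set[str]], drug: str, compound: str,
                        n_samples: int = 999, seed: int = 0) -> float:
    """Empirical p-value for `compound` being over-represented among records that mention `drug`."""
    drug_records = [r for r in records if drug in r]
    observed = sum(compound in r for r in drug_records)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        # Null: a random set of records of the same size, drawn without replacement.
        sample = rng.sample(records, len(drug_records))
        if sum(compound in r for r in sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_samples + 1)


def rank_compounds(records: List[Set[str]], drug: str,
                   n_samples: int = 999, seed: int = 0) -> List[Tuple[float, str]]:
    """Rank every term co-occurring with `drug` by its empirical p-value (smallest first)."""
    candidates = set().union(*(r for r in records if drug in r)) - {drug}
    scored = [(cooccurrence_pvalue(records, drug, c, n_samples, seed), c) for c in candidates]
    return sorted(scored)


if __name__ == "__main__":
    # Toy corpus with hypothetical terms: compoundX is enriched among drugA records, compoundY is not.
    corpus = []
    for i in range(1000):
        terms = set()
        if i < 50:
            terms |= {"drugA", "compoundX"}
        elif i % 10 == 0:
            terms.add("compoundX")
        if i % 2 == 0:
            terms.add("compoundY")
        corpus.append(terms)
    for p, term in rank_compounds(corpus, "drugA"):
        print(term, p)
```

Sampling the null from the whole corpus ties it to each term's overall frequency, so a term that is merely common (compoundY in the toy corpus) is not ranked highly just because it appears often.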
