Regularized variants of Principal Components Analysis, especially Sparse PCA and Functional PCA, are among the most useful tools for the analysis of complex high-dimensional data. Many examples of massive data have both sparse and functional (smooth) aspects and may benefit from a regularization scheme that can capture both forms of structure. For example, in neuroimaging data, the brain's response to a stimulus may be restricted to a discrete region of activation (spatial sparsity), while exhibiting a smooth response within that region. We propose a unified approach to regularized PCA which can induce both sparsity and smoothness in both the row and column principal components. Our framework generalizes much of the previous literature, with sparse, functional, two-way sparse, and two-way functional PCA all being special cases of our approach. Our method permits flexible combinations of sparsity and smoothness that lead to improvements in feature selection and signal recovery, as well as more interpretable PCA factors. We demonstrate the efficacy of our method on simulated data and a neuroimaging example on EEG data.
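The two-way regularization described above can be illustrated with a simplified alternating scheme for a single factor pair: an l1 soft-threshold induces sparsity in each component, while multiplication by the inverse of (I + alpha * Omega), with Omega a squared second-difference matrix, induces smoothness. This is a minimal sketch, not the exact SFPCA algorithm; the function names, penalty choices, and normalization are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(z, lam):
    # Elementwise soft-thresholding: induces sparsity.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def second_diff_penalty(p):
    # Omega = D'D with D the second-difference operator: penalizes roughness.
    D = np.diff(np.eye(p), n=2, axis=0)
    return D.T @ D

def rank_one_sfpca(X, lam_u=0.1, lam_v=0.1, alpha_u=1.0, alpha_v=1.0,
                   n_iter=100):
    """Alternating sparse+smooth updates for one factor pair (u, v) -- a sketch."""
    n, p = X.shape
    Su_inv = np.linalg.inv(np.eye(n) + alpha_u * second_diff_penalty(n))
    Sv_inv = np.linalg.inv(np.eye(p) + alpha_v * second_diff_penalty(p))
    # Initialize from the leading singular vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        u = Su_inv @ soft_threshold(X @ v, lam_u)      # sparsify, then smooth
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = Sv_inv @ soft_threshold(X.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    d = u @ X @ v                                      # estimated singular value
    return u, v, d
```

Setting the lambda or alpha parameters to zero recovers sparse-only, smooth-only, or plain rank-one PCA as special cases, mirroring the unification claimed in the abstract.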
Convex clustering is a promising new approach to the classical problem of clustering, combining strong performance in empirical studies with rigorous theoretical foundations. Despite these advantages, convex clustering has not been widely adopted, due to its computationally intensive nature and its lack of compelling visualizations. To address these impediments, we introduce Algorithmic Regularization, an innovative technique for obtaining high-quality estimates of regularization paths using an iterative one-step approximation scheme. We justify our approach with a novel theoretical result, guaranteeing global convergence of the approximate path to the exact solution under easily-checked non-data-dependent assumptions. The application of algorithmic regularization to convex clustering yields the Convex Clustering via Algorithmic Regularization Paths (CARP) algorithm for computing the clustering solution path. On example data sets from genomics and text analysis, CARP delivers over a 100-fold speedup over existing methods, while attaining a finer approximation grid than standard methods. Furthermore, CARP enables improved visualization of clustering solutions: the fine solution grid returned by CARP can be used to construct a convex clustering-based dendrogram, as well as forming the basis of a dynamic path-wise visualization based on modern web technologies. Our methods are implemented in the open-source R package clustRviz, available at https://github.com/DataSlingers/clustRviz.
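The one-step approximation idea can be conveyed on a simpler problem than convex clustering: instead of solving each regularization problem to convergence, take a single warm-started optimization step per grid point along a fine geometric grid of penalty values. The sketch below does this for the lasso with one proximal-gradient step per lambda; this is an illustrative stand-in under assumed names and defaults, not the CARP algorithm itself, which applies the same principle with ADMM steps to the convex clustering problem.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def approx_lasso_path(X, y, lam_max=None, t=1.05, n_grid=200):
    """Algorithmic-regularization-style path: ONE proximal-gradient step per
    grid point, warm-started from the previous estimate, on a geometric grid."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1/L for the smooth part
    if lam_max is None:
        lam_max = np.max(np.abs(X.T @ y))        # smallest lambda with beta = 0
    lam = lam_max / t ** n_grid                  # start nearly unregularized
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # warm start near the OLS fit
    path = []
    for _ in range(n_grid):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)  # one step only
        path.append((lam, beta.copy()))
        lam *= t                                 # multiplicative grid refinement
    return path
```

Because each grid point costs a single iteration, the grid can be made very fine at little expense, which is what enables the dendrogram and path-wise visualizations described in the abstract.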
Background and Aims Therapeutic, clinical trial entry, and stratification decisions for hepatocellular carcinoma (HCC) are made based on prognostic assessments, using clinical staging systems based on small numbers of empirically selected variables that insufficiently account for differences in biological characteristics of individual patients’ disease. Approach and Results We propose an approach for constructing risk scores from circulating biomarkers that produce a global biological characterization of individual patient’s disease. Plasma samples were collected prospectively from 767 patients with HCC and 200 controls, and 317 proteins were quantified in a Clinical Laboratory Improvement Amendments–certified biomarker testing laboratory. We constructed a circulating biomarker aberration score for each patient, a score between 0 and 1 that measures the degree of aberration of his or her biomarker panel relative to normal, which we call HepatoScore. We used log‐rank tests to assess its ability to substratify patients within existing staging systems/prognostic factors. To enhance clinical application, we constructed a single‐sample score, HepatoScore‐14, which requires only a subset of 14 representative proteins encompassing the global biological effects. Patients with HCC were split into three distinct groups (low, medium, and high HepatoScore) with vastly different prognoses (median overall survival 38.2/18.3/7.1 months; P < 0.0001). Furthermore, HepatoScore accurately substratified patients within levels of existing prognostic factors and staging systems (P < 0.0001 for nearly all), providing substantial and sometimes dramatic refinement of expected patient outcomes with strong therapeutic implications. These results were recapitulated by HepatoScore‐14, rigorously validated in repeated training/test splits, concordant across Myriad RBM (Austin, TX) and enzyme‐linked immunosorbent assay kits, and established as an independent prognostic factor.
Conclusions HepatoScore‐14 augments existing HCC staging systems, dramatically refining patient prognostic assessments and therapeutic decision making and enrollment in clinical trials. The underlying strategy provides a global biological characterization of disease, and can be applied broadly to other disease settings and biological media.
Co-Clustering, the problem of simultaneously identifying clusters across multiple aspects of a data set, is a natural generalization of clustering to higher-order structured data. Recent convex formulations of bi-clustering and tensor co-clustering, which shrink estimated centroids together using a convex fusion penalty, allow for global optimality guarantees and precise theoretical analysis, but their computational properties have been less well studied. In this note, we present three efficient operator-splitting methods for the convex co-clustering problem: a standard two-block ADMM, a Generalized ADMM which avoids an expensive tensor Sylvester equation in the primal update, and a three-block ADMM based on the operator splitting scheme of Davis and Yin. Theoretical complexity analysis suggests, and experimental evidence confirms, that the Generalized ADMM is far more efficient for large problems.
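The standard two-block ADMM mentioned above can be sketched for the simplest (matrix, fully connected, unit-weight) convex clustering problem: split the pairwise differences into an auxiliary variable, solve a linear system for the centroids, group-soft-threshold each edge, and update the duals. This is a generic illustration under assumed defaults, not the paper's tensor co-clustering solvers; in particular the dense matrix inverse stands in for the cached factorization (or Sylvester-equation solve) a serious implementation would use.

```python
import numpy as np
from itertools import combinations

def convex_clustering_admm(X, gamma=1.0, rho=1.0, n_iter=500):
    """Two-block ADMM for  min_U  0.5||X - U||_F^2 + gamma * sum_{i<j} ||U_i - U_j||_2."""
    n, p = X.shape
    edges = list(combinations(range(n), 2))
    D = np.zeros((len(edges), n))                  # edge-incidence matrix
    for k, (i, j) in enumerate(edges):
        D[k, i], D[k, j] = 1.0, -1.0
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # cache the system solve
    V = D @ X                                      # auxiliary edge differences
    Z = np.zeros_like(V)                           # scaled dual variables
    for _ in range(n_iter):
        U = A_inv @ (X + rho * D.T @ (V - Z))      # primal (centroid) update
        W = D @ U + Z
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - (gamma / rho) / np.maximum(norms, 1e-12), 0.0)
        V = shrink * W                             # group soft-threshold per edge
        Z = Z + D @ U - V                          # dual ascent
    return U
```

When an edge's difference is shrunk exactly to zero, the corresponding centroids fuse, which is how the fusion penalty produces clusters; the three splitting schemes in the note differ in how they organize these same updates.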
We consider the problem of estimating multiple principal components using the recently-proposed Sparse and Functional Principal Components Analysis (SFPCA) estimator. We first propose an extension of SFPCA which estimates several principal components simultaneously using manifold optimization techniques to enforce orthogonality constraints. While effective, this approach is computationally burdensome so we also consider iterative deflation approaches which take advantage of efficient algorithms for rank-one SFPCA. We show that alternative deflation schemes can more efficiently extract signal from the data, in turn improving estimation of subsequent components. Finally, we compare the performance of our manifold optimization and deflation techniques in a scenario where orthogonality does not hold and find that they still lead to significantly improved performance.
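Two standard deflation schemes of the kind compared above can be stated in a few lines: Hotelling's deflation subtracts the fitted rank-one component, while projection deflation projects the estimated directions out of the data entirely. The latter guarantees that subsequent components are orthogonal to (u, v) even when, as with regularized estimates, they are not exact singular vectors. The function names here are illustrative; the paper's specific deflation variants may differ.

```python
import numpy as np

def hotelling_deflation(X, u, v, d):
    # Subtract the estimated rank-one component d * u v'.
    return X - d * np.outer(u, v)

def projection_deflation(X, u, v):
    # Project u out of the rows and v out of the columns; annihilates (u, v)
    # exactly even when they are not true singular vectors of X.
    n, p = X.shape
    Pu = np.eye(n) - np.outer(u, u) / (u @ u)
    Pv = np.eye(p) - np.outer(v, v) / (v @ v)
    return Pu @ X @ Pv
```

After projection deflation, rank-one SFPCA can be re-run on the deflated matrix to extract the next component, which is the iterative strategy the abstract contrasts with joint manifold optimization.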