This paper is concerned with approximating the dominant left singular vector space of a real matrix A of arbitrary dimension, from block Krylov spaces generated by the matrix AA^T and the block vector AX. Two classes of results are presented. First are bounds on the distance, in the two-norm and the Frobenius norm, between the Krylov space and the target space. The distance is expressed in terms of principal angles. Second are quality-of-approximation bounds, relative to the best approximation in the Frobenius norm. For starting guesses X of full column rank, the bounds depend on the tangent of the principal angles between X and the dominant right singular vector space of A. The results presented here form the structural foundation for the analysis of randomized Krylov space methods. The innovative feature is a combination of traditional Lanczos convergence analysis with optimal approximations via least squares problems.

This paper. We consider block Krylov space methods for computing dominant left singular vector spaces of general rectangular matrices, and we present structural, deterministic bounds on the quality of the subspaces for essentially general starting guesses. The innovative feature is a fusion of eigenvalue and singular value technology: we combine a traditional Lanczos convergence analysis [38] with optimal approximations via least squares problems [10, 11].

Our long-term goal is to put randomized Krylov space approximations on a firm numerical footing. At this preliminary first step, however, we make a few idealized assumptions:
1. The block Krylov spaces have maximal dimension.
2. The analysis assumes exact arithmetic and does not address the implementation of numerically stable recursions.
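The objects the abstract talks about can be made concrete with a short NumPy sketch, which builds an orthonormal basis for the block Krylov space generated by AA^T and AX and measures its principal angles to the dominant left singular vector space; these are exactly the angles the paper's distance bounds control. This is an illustrative sketch under idealized assumptions (dense arithmetic, no deflation), not the paper's algorithm; all function names and dimensions are invented for illustration.

```python
import numpy as np

def block_krylov_basis(A, X, q):
    """Orthonormal basis for the block Krylov space
    span([AX, (AA^T)AX, ..., (AA^T)^(q-1) AX]).
    A thin QR is used for orthonormalization; the paper's analysis
    assumes exact arithmetic and maximal Krylov space dimension."""
    blocks = []
    Z = A @ X
    for _ in range(q):
        blocks.append(Z)
        Z = A @ (A.T @ Z)          # multiply by AA^T for the next block
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q

def principal_angles(Q, U):
    """Principal angles between subspaces with orthonormal bases Q and U."""
    s = np.linalg.svd(Q.T @ U, compute_uv=False)
    return np.arccos(np.clip(s, 0.0, 1.0))

# Illustrative example: angles between K_q and the dominant k-dim
# left singular vector space of a random A.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))
k = 5
U = np.linalg.svd(A, full_matrices=False)[0][:, :k]  # dominant left sing. vecs
X = rng.standard_normal((60, k))                     # full column-rank guess
Q = block_krylov_basis(A, X, q=3)
theta = principal_angles(Q, U)
# sin of the largest principal angle is the two-norm distance
# from the dominant subspace to the Krylov space.
```

Since the Krylov space for q = 3 contains the one for q = 1, each principal angle to the target space can only shrink as q grows, which is the monotonicity underlying the convergence bounds.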
Future work will need to deal with the challenging issues of finite precision arithmetic and viable numerical implementations, including recursions, numerical stability, maintaining orthogonality, deflation, adaptation of block size, and restarting. Empirical evaluations will have to assess whether the bounds are tight enough to be informative in practice.

Overview. We start with a brief summary of our contributions (Section 2), followed by a comparison to existing work (Section 3). Auxiliary results (Section 4) set the stage for the proofs of the main theorems (Sections 5, 6, 7, and Appendix A). We end the main part of the paper with a perspective on open problems (Section 8).

2. Results. After setting the context (Section 2.1), we give a brief summary of our bounds for: the distance between the Krylov space and the dominant left singular space (Section 2.2); a particular dominant subspace approximation from the Krylov space (Section 2.3); and the polynomials appearing in the approximation (Section 2.4). We end this section with a discussion of options for bounding the distance between the initial guess and the dominant right singular vector space (Section 2.5).

2.1. Setting. To approximate the dominant left singular vector subspace of...
We introduce a novel algorithm for approximating the logarithm of the determinant of a symmetric positive definite (SPD) matrix. The algorithm is randomized and approximates the traces of a small number of matrix powers of a specially constructed matrix, using the method of Avron and Toledo [AT11]. From a theoretical perspective, we present additive and relative error bounds for our algorithm. Our additive error bound works for any SPD matrix, whereas our relative error bound works for SPD matrices whose eigenvalues lie in the interval (θ1, 1), with 0 < θ1 < 1; the latter setting was proposed in [HMS15]. From an empirical perspective, we demonstrate that a C++ implementation of our algorithm can approximate the logarithm of the determinant of large matrices very accurately in a matter of seconds.
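The abstract does not spell out the algorithm, but the general approach it describes (estimating traces of matrix powers to approximate log det) can be sketched as follows, assuming the relative-error setting where the eigenvalues of B lie in (0, 1). The sketch uses log det(B) = tr(log B) = -Σ_{k≥1} tr(C^k)/k with C = I - B, truncates the series at m terms, and estimates each trace with Rademacher probe vectors in the spirit of the Avron–Toledo estimator; parameter names and defaults are illustrative, not the paper's.

```python
import numpy as np

def logdet_estimate(B, m=50, p=30, rng=None):
    """Hedged sketch of a randomized log-determinant estimator for an
    SPD matrix B with eigenvalues in (0, 1):
      log det(B) = -sum_{k=1..inf} tr((I - B)^k) / k,
    truncated at m terms, with each trace estimated by averaging
    g^T (I-B)^k g over p Rademacher probe vectors g."""
    n = B.shape[0]
    rng = np.random.default_rng(rng)
    G = rng.choice([-1.0, 1.0], size=(n, p))   # Rademacher probes
    est = 0.0
    V = G.copy()
    for k in range(1, m + 1):
        V = V - B @ V                           # now V = (I - B)^k G
        est -= np.einsum('ij,ij->', G, V) / (k * p)  # -tr((I-B)^k)/k
    return est
```

Each iteration costs one matrix-block product, so the estimator touches B only through matrix-vector products, which is what makes the approach viable for large matrices.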
Motivation: Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets become increasingly large, traditional approaches based on loading the entire dataset into system memory (Random Access Memory) become impractical, and out-of-core implementations are the only viable alternative.

Results: We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method to perform Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and is able to operate successfully even on commodity hardware with a system memory of just a few gigabytes. Moreover, TeraPCA has minimal dependencies on external libraries and only requires a working installation of the BLAS and LAPACK libraries. When applied to a dataset containing a million individuals genotyped on a million markers, TeraPCA requires <5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task.

Availability and implementation: Source code and documentation are both available at https://github.com/aritra90/TeraPCA.

Supplementary information: Supplementary data are available at Bioinformatics online.
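TeraPCA itself is a C++/BLAS out-of-core code, but the randomized subspace iteration it builds on can be sketched in a few lines of NumPy. This is a minimal in-core sketch of the generic method, not TeraPCA's API; the function name, oversampling default, and iteration count are illustrative assumptions.

```python
import numpy as np

def randomized_subspace_iteration(A, k, n_iter=5, oversample=10, rng=None):
    """Sketch of randomized subspace iteration: approximate the top-k
    left singular vectors and singular values of A by power iterations
    on a random (k + oversample)-column starting block, followed by a
    Rayleigh-Ritz step on the small projected matrix."""
    rng = np.random.default_rng(rng)
    Q = rng.standard_normal((A.shape[1], k + oversample))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A @ Q)       # re-orthonormalize each half-step
        Q, _ = np.linalg.qr(A.T @ Q)     # to avoid collapse of the block
    Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                          # small (k+oversample) x n matrix
    Ub, s, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k]
```

In an out-of-core setting such as TeraPCA's, the products A @ Q and A.T @ Q would be accumulated block-by-block while streaming A from disk, which is why the method needs only a few gigabytes of RAM regardless of dataset size.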
We present and analyze a simple, two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves an ℓ1-penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding strategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive error approximation and provides a tradeoff between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.
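The abstract does not give the details of the rounding step, but one standard randomized rounding scheme of this kind keeps each coordinate of the dense relaxation solution independently with probability proportional to its magnitude (capped at 1) and rescales kept entries so the rounded vector is unbiased. The sketch below illustrates that generic scheme; the paper's actual strategy and guarantees may differ in detail, and all names here are illustrative.

```python
import numpy as np

def randomized_round(x, s, rng=None):
    """Generic randomized rounding sketch for sparsifying a dense vector x
    (e.g. the solution of an l1-penalized PCA relaxation):
      keep entry i with prob p_i = min(1, s*|x_i| / ||x||_1),
      and rescale kept entries by 1/p_i, so E[z] = x.
    The parameter s controls the expected number of nonzeros
    (E[nnz(z)] = sum_i p_i <= s), trading sparsity for accuracy."""
    rng = np.random.default_rng(rng)
    p = np.minimum(1.0, s * np.abs(x) / np.sum(np.abs(x)))
    keep = rng.random(x.shape) < p       # entries with p_i = 0 never kept
    z = np.zeros_like(x)
    z[keep] = x[keep] / p[keep]          # unbiased rescaling
    return z
```

The unbiasedness E[z] = x is what makes an additive-error analysis of the rounded vector tractable: the error concentrates around zero, with variance controlled by the sparsity parameter s.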