2019
DOI: 10.1093/bioinformatics/btz157

TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes

Abstract: Motivation: Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets become increasingly larger in size, traditional approaches based on loading the entire dataset in the system memory (Random Access Memory) become impractical and out-of-core implementations are the only viable alternative. Results: We present TeraPCA, a C++ implementation of the Randomized Subspace I…

Citations: cited by 35 publications (41 citation statements)
References: 30 publications
“…There are also many focuses of recent PCA algorithms (Additional file 19). The randomized subspace iteration algorithm, which is a hybrid of Krylov and Rand methodologies, was developed based on randomized SVD [133,134]. In pass-efficient or one-pass randomized SVD, some tricks to reduce the number of passes have been considered [135,136].…”
Section: Future Perspective (mentioning)
confidence: 99%
“…Recently, the advent of large population-scale genetic datasets, such as the UK biobank data, has prompted research on developing scalable algorithms to compute PCA on very large data (Bycroft et al 2018). It is now possible to efficiently approximate PCA on very large datasets thanks to software such as FastPCA (fast mode of EIGENSOFT), FlashPCA2, PLINK 2.0 (approx mode), bigstatsr/bigsnpr, TeraPCA and ProPCA (Galinsky et al 2016; Abraham et al 2017; Chang et al 2015; Privé et al 2018; Bose et al 2019; Agrawal et al 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…There are also many focuses of recent PCA algorithms (Additional file 23). The randomized subspace iteration algorithm, which is a hybrid of Krylov and Rand methodologies, was developed based on randomized SVD [133,134]. In pass-efficient or one-pass randomized SVD, some tricks to reduce the number of passes have been considered [135,136].…”
Section: Future Perspective (mentioning)
confidence: 99%