2016
DOI: 10.1101/094714
Preprint

FlashPCA2: principal component analysis of biobank-scale genotype datasets

Abstract: Motivation: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer computationally feasible. We present FlashPCA2, a tool that can perform PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory.

Cited by 108 publications (137 citation statements)
References 11 publications
“…We compared three different principal component analysis (PCA) methods using our simulated genotype data, namely flashPCA2 (or pruned PCA, with a recommended pruning step and a projection step, see URL and Ref. 19 ), exact PCA (implemented in GCTA using all the variants without pruning, see Ref. 20 ), and projection PCA (proj.…”
Section: Supplementary Note 6 Principal Component Analysis
Citation type: mentioning
confidence: 99%
“…Genome-wide association analyses were conducted on the simulated data with 6 different methods. The simulated phenotypes were pre-adjusted by the top 10 PCs computed from a set of LD-pruned variants using flashPCA2 67 (Supplementary Note 6 and Supplementary Figure 20).…”
Section: Assessing False Positive Rate and Statistical Power
Citation type: mentioning
confidence: 99%
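The pre-adjustment described in the statement above (regressing a phenotype on the top 10 PCs and keeping the residuals) can be sketched as follows. This is a minimal illustration on toy data, not the cited study's pipeline and not FlashPCA2's algorithm: the function names `top_principal_components` and `adjust_for_pcs` are hypothetical, the genotypes are randomly generated, and the PCs come from an exact NumPy SVD that would not scale to biobank-sized data.

```python
import numpy as np


def top_principal_components(genotypes: np.ndarray, n_pcs: int = 10) -> np.ndarray:
    """Return the top `n_pcs` PC scores of an (individuals x variants) matrix."""
    # Standardize each variant (column); constant columns are left at zero.
    centered = genotypes - genotypes.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0
    standardized = centered / scale
    # Exact thin SVD: acceptable for toy data only (assumption for illustration).
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def adjust_for_pcs(phenotype: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Regress the phenotype on an intercept plus the PCs; return the residuals."""
    design = np.column_stack([np.ones(len(phenotype)), pcs])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return phenotype - design @ coef


# Toy data: 200 individuals, 500 variants coded 0/1/2 (randomly generated).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 500)).astype(float)
pcs = top_principal_components(genotypes, n_pcs=10)

# Hypothetical phenotype that partly tracks the first PC.
phenotype = 0.5 * pcs[:, 0] + rng.normal(size=200)
adjusted = adjust_for_pcs(phenotype, pcs)
```

The adjusted phenotype is uncorrelated with each of the 10 PCs by construction, which is the property the pre-adjustment step relies on before running the association tests.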
“…We compare our method to GWAS using logistic regression, defining hypertension cases as belonging to Stage 2 or higher as done in Warren et al (). We recalculated principal components using FlashPCA after filtering individuals and SNPs through quality control (QC) filters, because the subset of individuals we analyze is exclusively of British ancestry, and the original principal components were calculated before filtering (Abraham, Qiu, & Inouye, ). Our hypertension GWAS analysis includes the following covariates: sex, center, age, age², body mass index (BMI), and the top 10 principal components to adjust for ancestry/relatedness.…”
Section: Results
Citation type: mentioning
confidence: 99%