2016
DOI: 10.1038/ng.3583

Haplotype estimation for biobank-scale data sets

Abstract: The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale data sets and results in switch error rates as low as ~0.3%. The method exhibits O(N log N) scaling in sample size (N), enabling fast and accurate phasing of even larger cohorts.
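The abstract reports accuracy as a switch error rate without defining it. A common definition is the fraction of consecutive pairs of heterozygous sites whose relative phase in the inferred haplotypes disagrees with the true haplotypes. The sketch below illustrates that definition only; the function name and the per-site `(allele_on_hap1, allele_on_hap2)` representation are assumptions made for illustration and are not taken from SHAPEIT3.

```python
from typing import List, Sequence, Tuple

Site = Tuple[int, int]  # (allele on haplotype 1, allele on haplotype 2)


def switch_error_rate(truth: Sequence[Site], inferred: Sequence[Site]) -> float:
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Illustrative helper (hypothetical name): both inputs describe one
    individual over the same ordered sites.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")

    # For each heterozygous site, record whether the inferred allele order
    # matches the true order (True) or is flipped (False).
    orientation: List[bool] = []
    for t, g in zip(truth, inferred):
        if sorted(t) != sorted(g):
            raise ValueError("genotypes differ; phasing must not change genotypes")
        if t[0] != t[1]:  # only heterozygous sites carry phase information
            orientation.append(t == g)

    if len(orientation) < 2:
        return 0.0  # no adjacent heterozygous pair, so no switch is possible

    # A switch occurs wherever the orientation changes between neighbours.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)
```

Under this definition, a haplotype pair that is flipped as a whole counts as zero switches, because only changes in orientation between neighbouring heterozygous sites are penalized.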

Citations: cited by 210 publications (206 citation statements)
References: 24 publications
“…The first steps of the quality control have been previously described (SI Materials and Methods, URLs). Phasing and imputation were performed using SHAPEIT and IMPUTE2 (SI Materials and Methods, URLs), respectively, as previously described (35). After imputation, we selected 9,493,148 autosomal SNPs with imputation quality r² > 0.3, MAF > 1%, and Hardy-Weinberg equilibrium test P value > 10⁻⁶.…”
Section: Methods
mentioning
confidence: 99%
“…Over the past decade, phasing has most commonly been performed via statistical methods applied within a genotyped cohort [2-14]. Wet-lab technologies for direct phasing have also generated considerable recent interest, but these methods are currently much less scalable [15].…”
mentioning
confidence: 99%
“…In general, the accuracy of statistical phasing methods increases steadily with sample size due to improved modeling of linkage disequilibrium and increasing prevalence of identity-by-descent. We and others have recently developed methods that achieve very high statistical phasing accuracy in cohorts comprising a large fraction of a population [8] or containing >100,000 samples [13,14]. However, for smaller cohorts, accuracy of cohort-based statistical phasing is fundamentally limited by the quantity of data available.…”
mentioning
confidence: 99%