2018
DOI: 10.1038/s41467-018-05513-w

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations

Abstract: The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population. Using 10× Genomics (10×G) “Linked-Read” technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so…

Cited by 76 publications (85 citation statements)
References 46 publications (51 reference statements)
“…Finally, we ran our method on the well-known CEPH/HapMap NA12878 diploid genome (sequence coverage 52x) and compared it with a recent de novo assembly-based method (Wong et al (2018), which we will refer to as NUI-pipeline). We obtained high-coverage 10x Chromium Linked-Reads from the publicly available Genome In A Bottle (GIAB) data set (Zook et al (2018)).…”
Section: The NA12878 Diploid Genome
type: mentioning
confidence: 99%
“…In what follows, we introduce an integrated mapping-based and assembly-based method, which is significantly more accurate than existing short-read methods for novel insertion discovery. While our method is less efficient than existing short-read methods, it is indeed more efficient compared to the recent Linked-Read algorithms that use whole-genome de novo assembly, such as (Weisenfeld et al (2017); Wong et al (2018)), because it uses only a very small fraction of informative Linked-Reads as we describe below. While long-read sequencing is technically impractical for large-scale screening of whole genomes, our Linked-Read method is able to characterize one of the most challenging classes of SVs with a reasonable additional cost to standard short-read sequencing.…”
Section: Introduction
type: mentioning
confidence: 99%
“…Many studies have pointed out that a single genome is inadequate for a variety of reasons, such as inherent bias towards the reference genome (Need and Goldstein 2009, Popejoy and Fullerton 2016, Ballouz et al 2019). The availability of reference genomes from multiple human populations would greatly aid attempts to find genetic causes of traits that are over- or under-represented in those populations, including susceptibility to disease (Wong et al 2018). Another drawback of relying on a single reference genome is that it almost certainly contains minor alleles at some loci, which in turn confounds studies focused on variant discovery and association of those variants with disease (Ferrarini et al 2015, Magi et al 2015, Barbitoff et al 2018, Wong et al 2018).…”
Section: Introduction
type: mentioning
confidence: 99%
“…Geographic and local population genetic stratification and variation complicate the ability to diagnose and treat medical conditions [32] (for additional exposition, see Addendum A.1). The predictive utility of GWAS and GWAS PRSs also varies broadly if the risk score is applied to a population other than the one for which the score was initially determined [33][34][35]. At the same time, there are many indications of the commonality of causal gene variants for polygenic diseases among geographically distinct populations [36,37], while admixed populations present an intermediate liability to diseases [38][39][40].…”
Section: Introduction
type: mentioning
confidence: 99%