2022
DOI: 10.1101/2022.07.09.499321
Preprint

A Draft Human Pangenome Reference

Abstract: The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 m…

Cited by 117 publications (255 citation statements)
References 120 publications
“…The HPRC effort, which began more than a year later, focused exclusively on CCS data (n=94) generated from diploid samples assembled using trio-based hifiasm (Cheng et al 2021). Here, parent–child data were directly used to aid assembly phasing of all HPRC samples (Wang et al 2022; Liao et al 2022) allowing for both platform and methodology comparisons.…”
Section: Results (mentioning)
confidence: 99%
“…Unfortunately, there are regions in current genome assemblies that are still completely missing, incorrectly assembled, or otherwise pose challenges for the construction of such pangenome graphs. A set of regions, termed “brnn” regions, were identified and “trimmed” during the construction of the minigraph-cactus graph (Liao et al 2022). These regions were excluded at least once but, in some instances, up to 88 times and mapped predictably to satellite DNA (n=149 regions or ~149.7 Mbp; ~28.9 Mbp in acrocentrics) and SD regions (n=301 regions or ~65.7 Mbp) but also correspond to protein-coding genes (n=171) as well as common inversion polymorphisms (n=49) ( Fig.…”
Section: Discussion (mentioning)
confidence: 99%
“…Moreover, we only interrogated regions where 1:1 synteny could be established. As more of the genome is assessed in the context of a pangenome reference framework, it is likely that the proportion of IGC will increase especially in regions such as the centromere and acrocentric, which currently are not well assembled or characterized (Liao et al 2022).…”
Section: Discussion (mentioning)
confidence: 99%
“…This exclusion has translated into a fundamental lack of understanding in mutational processes precisely in regions predicted to be more mutable due to the action of ectopic or interlocus gene conversion (IGC) (Teshima and Innan 2012). Leveraging high-quality phased genome assemblies generated as part of the Human Pangenome Reference Consortium (HPRC) (Liao et al 2022), we compare the SNV landscape of duplicated and unique DNA in the human genome.…”
mentioning
confidence: 99%
“…A pangenome graph can serve as a cornerstone for analyzing repeat structure variation in population [17,19]. It is usually hard to compare multiple sequences with complicated repeat structure by examining pairwise sequence alignments directly.…”
Section: Results (mentioning)
confidence: 99%