2021
DOI: 10.1101/2021.07.12.452063
Preprint

A complete reference genome improves analysis of human genetic variation

Abstract: Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 Mbp of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome to clinical and functional study. Here we demonstrate how the new reference universally improves read mapping and variant calling for 3,202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of novel variants per sample - a new frontier for evolution…

Citations: cited by 44 publications (71 citation statements)
References: 96 publications (95 reference statements)

“…These false duplications exist only in GRCh38 and not in other human reference genome versions or in the broader population. A new telomere-to-telomere reference genome eliminates these false duplications and fixes collapsed duplications that prevented us from creating a benchmark for medically relevant genes like KCNJ18 and MAP2K3, and a similar CMRG benchmark for HG002 is now available on the new reference 25 . Future work will include using phased, diploid assemblies to form benchmarks for more genic and non-genic regions of the genome, eventually using genomes that are assembled telomere-to-telomere.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…A companion manuscript from the Telomere-to-Telomere Consortium demonstrates that the new T2T-CHM13 reference corrects these and additional false duplications affecting 1.2 Mbp and 74 genes. 27 We worked with the Genome Reference Consortium to use a new masking file that changes the sequence in the falsely duplicated regions of chromosome 21 on GRCh38 to N's. Masking in this way maintains the same coordinates but dramatically improves variant calling in the genes.…”
Section: Identifying and Resolving False Duplications of Important Genes in the Reference
Citation type: mentioning
confidence: 99%
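The coordinate-preserving property of masking mentioned in this statement can be illustrated with a minimal sketch. It assumes a sequence held as a plain string and intervals given as 0-based, half-open `(start, end)` pairs. The function name `mask_regions` and the toy sequence are hypothetical and are not the masking file or tooling used by the authors or the Genome Reference Consortium.

```python
def mask_regions(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Replace each 0-based, half-open [start, end) interval with 'N'.

    Hypothetical helper for illustration only. Because every masked base
    is swapped for exactly one 'N', the sequence length and therefore all
    downstream coordinates stay unchanged.
    """
    masked = list(sequence)
    for start, end in regions:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"interval ({start}, {end}) is out of bounds")
        # One 'N' per masked base keeps positions aligned with the original.
        masked[start:end] = "N" * (end - start)
    return "".join(masked)


# Toy example: mask positions 2, 3 and 4 of an 8-base sequence.
toy = "ACGTACGT"
masked_toy = mask_regions(toy, [(2, 5)])
```

Since the masked string has the same length as the input, annotations defined on the original coordinates still point at the same positions after masking.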
“…We performed sequencing analysis after aligning the reads to the complete telomere-to-telomere human reference genome T2T-CHM13 v.1.1 (Nurk et al, 2021). The T2T-CHM13 reference genome improves coverage of complex regions and variant calling (Aganezov et al, 2021), delivering consensus sequences without the use of alternative contigs. IGV software was used to visualize alignment tracks and assess read coverage (Robinson et al, 2011).…”
Section: Whole-genome Sequencing
Citation type: mentioning
confidence: 99%
“…Here, we use linked reads and long reads to expand GIAB's benchmark to include challenging genomic regions for the GIAB pilot genome NA12878 and the GIAB Ashkenazi and Han Chinese trios from the Personal Genome Project, which are more broadly consented for genome sequencing and commercial redistribution of reference samples. 18 We more carefully exclude segmental duplications that are copy number variable in the GIAB samples 19 or missing copies in GRCh37 or GRCh38, 20,21 because these currently cannot be reliably benchmarked for small variants. We also refined the methods used to produce the diploid assembly-based MHC benchmark 17 to include most of the MHC region in each member of the trio.…”
Citation type: mentioning
confidence: 99%