2020
DOI: 10.1186/s12915-020-00894-1

Insertion variants missing in the human reference genome are widespread among human populations

Abstract: Background: Structural variants comprise diverse genomic arrangements including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified variants, mainly due to ascertainment bias in the reference genome, lack of previous sequence knowledge, and low complexity of typical insertion sequences. Though recent developments in long-read …

citations
Cited by 8 publications
(11 citation statements)
references
References 65 publications
(91 reference statements)
“…Over the past decade, the sharp decrease in sequencing costs has led to a flood of large-scale and population-specific sequencing projects, revealing unique sequences missing from the current reference genome [8,9], population-specific differences in common genetic variants [10,11], and increasing recognition that some populations, particularly Indigenous, are at risk of being left behind [12]. The GRCh38 reference assembly has been expanded to include alternate loci (ALT loci), most of which are similar to the primary assembly but contain many small variants that commonly occur together.…”
Section: The Golden Era Of the Reference Genome (mentioning)
confidence: 99%
“…Others refer to NRS variants as insertions (Delage et al., 2020; Wong et al., 2020) because the variants describe novel sequence with respect to the reference genome. However, the majority of NRS appears to be ancestral rather than novel because they can be found in other primate genomes (Kehr et al., 2017; Lee et al., 2020). A convincing explanation for the existence of NRS is that the genomes used to construct the reference genome lacked these sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some pipelines for moderate numbers of genomes create whole-genome assemblies prior to NRS variant calling, such as the pipelines that were applied to 50 Danish trios (Liu et al., 2015; Maretty et al., 2017), 275 Han Chinese genomes (Duan et al., 2019), 1000 Swedish genomes (Eisfeldt et al., 2020) and 338 genomes from genetically divergent human populations (Wong et al., 2018, 2020). Finally, pipelines developed for the 1000 genomes project data (Lee et al., 2020) and for the TOPMed program (Taliun et al., 2021) search for NRS variants that match related genomes like other primates’ genomes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some pipelines for moderate numbers of genomes create whole-genome assemblies prior to NRS variant calling, such as the pipelines that were applied to 50 Danish trios [Maretty et al., 2017, Liu et al., 2015], 275 Han Chinese genomes [Duan et al., 2019], 1000 Swedish genomes [Eisfeldt et al., 2020], and 338 genomes from genetically divergent human populations [Wong et al., 2018, Wong et al., 2020]. Finally, pipelines developed for the 1000 genomes project data [Lee et al., 2020] and for the TOPMed program [Taliun et al., 2021] search for NRS variants that match related genomes like other primates’ genomes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Others refer to NRS variants as insertions [Wong et al., 2020, Delage et al., 2020] because the variants describe novel sequence with respect to the reference genome. However, the majority of NRS appears to be ancestral rather than novel because they can be found in other primate genomes [Lee et al., 2020, Kehr et al., 2017]. A convincing explanation for the existence of NRS is that the genomes used to construct the reference genome lacked these sequences.…”
Section: Introduction (mentioning)
confidence: 99%