2017
DOI: 10.1038/ng.3801

Diversity in non-repetitive human sequences not found in the reference genome

Abstract: Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chim…

Citations: cited by 68 publications (107 citation statements)
References: 52 publications
“…In line with prior observations suggesting that the vast majority of human non-reference sequence is present in the assembled genomes of non-human primates 48,49, we find that our assemblies likely represent retained ancestral sequences that have been deleted in some human lineages, including on the reference haplotype. Consistent with this, the frequencies of the newly assembled alleles (Supplementary Figure 10) are higher than those observed for SNVs and indels, with 78.3% of the events present in >5% of the samples and only 6% having a frequency <0.5%.…”
Section: Beyond SNVs and Indels; citation type: supporting
confidence: 89%
“…Consistent with this, the frequencies of the newly assembled alleles (Supplementary Figure 10) are higher than those observed for SNVs and indels, with 78.3% of the events present in >5% of the samples and only 6% having a frequency <0.5%. Comparing our findings to two previous studies on different smaller datasets 48,49, 243 sequences (164,099 bp retained sequence) are wholly novel. Additionally, we have resolved length and both breakpoints for 137 events (170,133 bp) for which only one breakpoint was previously known (Figure 3D).…”
Section: Beyond SNVs and Indels; citation type: supporting
confidence: 72%
“…Non-reference human sequences may represent causal variants underlying disease associations or may even harbor novel genes. For example, by analyzing WGS data of ~15K Icelanders, Kehr et al 17 were able to identify 3,791 NRNR sequence variants, and demonstrated an association between a 766-bp NRNR (NRNR1361) and decreased risk of myocardial infarction.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…A smaller fraction consists of NRNR sequences. Although a portion of NRSs may have no functional impact at the molecular or phenotypic level, some, especially NRNR sequences may represent causal variants underlying diseases 16,17, representing a useful resource in genetic medicine. Human non-reference catalogues can also be highly valuable in metagenomics and microbiome research, as they can be used to aid the removal of human contaminant DNA from genomic and metagenomic datasets.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…However, most of these effects are associated with just two well-known inversions. Despite attempts to associate inversions with gene-expression and phenotypic variation in large datasets, the analyses have been limited exclusively to those with simple breakpoints, and only a couple of additional candidates have been identified so far (Chiang et al, 2017; Kehr et al, 2017; Sudmant et al, 2015). Thus, specific genotyping studies of a diverse range of inversions in a high number of individuals are necessary to have a more global idea of their functional and evolutionary impact in the human genome.…”
Section: Introduction; citation type: mentioning
confidence: 99%