2018
DOI: 10.3390/genes9100486

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

Abstract: The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have a…

Citations: cited by 48 publications (59 citation statements)
References: 37 publications (68 reference statements)
“…This brings the canine reference genome quality in line with other key mammalian species e.g. human 43, mouse 44, and gorilla 45. For both human and mouse projects, the de novo sequence assembly of multiple individuals from different population backgrounds has revealed novel sequence not found in the single (hybrid in the case of human) species reference, and facilitated the search for population specific variants which likely contribute to traits of interest, including within the highly polymorphic immune gene clusters 43,44.…”
Section: Results
confidence: 90%
“…human 43, mouse 44, and gorilla 45. For both human and mouse projects, the de novo sequence assembly of multiple individuals from different population backgrounds has revealed novel sequence not found in the single (hybrid in the case of human) species reference, and facilitated the search for population specific variants which likely contribute to traits of interest, including within the highly polymorphic immune gene clusters 43,44. While this type of de novo collection is on-going within the canine community, GSD_1.0 is the first genome of reference quality that is further annotated with novel long read RNA-sequencing data, allowing for the resolution of transcript complexity through regions with high GC context, or "dark" regions 28.…”
Section: Results
confidence: 99%
“…Nevertheless, for most NGS applications, namely in clinical genetics, mapping against a reference sequence is the first choice. As for de novo assembly, it is still mostly confined to more specific projects, especially targeting to correct inaccuracies in the reference genome [74] and to improve the identification of SV and other complex rearrangements [36].…”
Section: Secondary Analysis
confidence: 99%
“…For example, the human genome was assembled using the DNA of ∼50 individuals with just one of them accounting for ∼70% of the sequence, while the yeast reference genome was produced from a single laboratory strain (namely S288C) and its derivatives [9,10]. Recently, high-quality panels of reference sequences [11,12,13] and novel standards for genome assembly [14] have been reported, while graph-based models have been suggested to overcome the limits imposed by reference bias [15,16,17]. Nevertheless, using a single reference sequence is a convenient simplification [8] and current technologies are boosting genome quality [18].…”
Section: Introduction
confidence: 99%