2019
DOI: 10.1186/s13059-019-1707-2

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight

Abstract: Background: The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions. …
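The two categories named in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the `Position` record, the `classify` function, the label strings, and both thresholds are hypothetical choices made only for illustration, and it assumes per-position read depth and a count of low-mapping-quality reads have already been extracted from an alignment.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Per-position summary taken from an alignment (hypothetical structure)."""
    depth: int           # number of reads covering this position
    low_mapq_reads: int  # reads here whose mapping quality falls below some cutoff


def classify(pos: Position,
             max_dark_depth: int = 5,
             min_low_mapq_fraction: float = 0.9) -> str:
    """Label a position using illustrative thresholds (assumptions, not from the text above).

    - "dark_by_depth": few mappable reads cover the position.
    - "ambiguous_alignment": enough reads, but most align ambiguously;
      such positions are candidates for the "camouflaged" category.
    - "ok": neither condition holds.
    """
    if pos.depth == 0 or pos.depth <= max_dark_depth:
        return "dark_by_depth"
    if pos.low_mapq_reads / pos.depth >= min_low_mapq_fraction:
        return "ambiguous_alignment"
    return "ok"


if __name__ == "__main__":
    examples = [Position(3, 0), Position(40, 38), Position(40, 4)]
    for p in examples:
        print(p, classify(p))
```

Depth is checked first so that a position with almost no reads is never judged on the mapping quality of the handful that do align there.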

Cited by 158 publications (207 citation statements)
References 113 publications
“…It is worth pointing out that deep learning tools such as Clairvoyante may not necessarily achieve better results given higher quality input data until they have been re-trained on similar higher quality data. Secondly, we would expect more uniform coverage of the genome, with fewer 'dark' regions 26 as a result of longer read lengths and the removal of the need for PCR amplification. This would enable variant calling within these previously 'dark' regions, which contain a substantial number of disease-relevant genes, and would allow us to phase across longer regions.…”
Section: Discussion (mentioning)
confidence: 99%
“…Long read sequencing technologies, such as those developed by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have proved invaluable for overcoming these challenges 24,25. Thorough comparative studies have shown that long reads reduce the number of 'dark' or 'camouflaged' regions of the genome 26 and improve the sensitivity of structural variant (SV) detection 27. Of course, both technologies have their pros and cons, but with fast turn-around times and lower start-up costs, and despite higher error rates of >10%, ONT WGS has already been used to resolve SVs in clinical cases 28,29.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, this goal is much harder to achieve for larger gene panels containing difficult to sequence genes. This is the case for the neuromuscular disorder field, since 33 out of 203 genes on the consensus myopathy gene lists [4] contain "dark" regions of the genome that are not easily accessible using standard short-read sequencing approaches [1]. Disease-causing variants in these regions can therefore be overlooked leading to a false negative molecular diagnostic result.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, many regions of human genome remain difficult to analyze using standard short-read sequencing approaches. These "dark" genome regions contain a number of genes responsible for human diseases [1]. For example, several neuromuscular disease-causing genes, such as NEBULIN (NEB) and SELENON (SEPN1), overlap these difficult to sequence regions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although this method yield reads with high per‐base consensus accuracy, the relatively short individual read length can prevent unambiguous mapping to the genomic reference sequence. Targets disproportionately affected by this limitation include families of gene and pseudogene homologs (Ebbert et al, ). Other genomic regions are intractable to analysis due to difficulties caused by the sequencing chemistry itself.…”
Section: Introduction (mentioning)
confidence: 99%