2011
DOI: 10.1038/nrg3117

Repetitive DNA and next-generation sequencing: computational challenges and solutions

Abstract: Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when inter…

Citations: cited by 1,470 publications (1,296 citation statements)
References: 78 publications
“…An opposing interpretation is that sequence datasets for larger genomes are less complete than for smaller genomes, due to the inherent difficulties in sequencing and assembly of highly repetitive genomes (e.g. [23]). The presence of more non-coding DNA, much of which will be tightly compacted as heterochromatin, can also make it challenging to achieve truly complete coverage of a genome sequence.…”
Section: Discussion (A) Sequence and Size (mentioning)
confidence: 99%
“…Some NGS techniques, such as Ion Torrent (Life Technologies/ThermoFisher, Waltham, Massachusetts, USA), rely on single-nucleotide additions and can have a high error rate for indel detection (1%) [37]. Illumina platforms have high sensitivity (0.1%); however, false-positive errors have also been reported [37,38]. AT-rich regions and GC-rich regions are well known to be problematic in conventional PCR and Sanger sequencing [38].…”
Section: NGS-based Platform and Processing (mentioning)
confidence: 99%
“…Illumina platforms have high sensitivity (0.1%); however, false-positive errors have also been reported [37,38]. AT-rich regions and GC-rich regions are well known to be problematic in conventional PCR and Sanger sequencing [38]. These areas can also be challenging for capture by target and WES probes and, therefore, tend to be underrepresented by NGS.…”
Section: NGS-based Platform and Processing (mentioning)
confidence: 99%
“…They can be distinguished by their definition of discordant read, their clustering/grouping techniques, and their details in deriving predictions from groups of discordant reads. Note that handling of reads that became multiply mapped due to repetitive sequence often plays a major role [29], see for example [6] for a combinatorially principled approach.…”
Section: Internal-segment-size Based Approaches (mentioning)
confidence: 99%