2012
DOI: 10.1139/g11-088

Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element

Abstract: It is widely accepted in a conceptual framework that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases, because of the difficulty of assembling sequence reads into contigs. Although this is easily inferred, there is no quantitative illustration of this phenomenon. An example using a currently used database is expected to contribute to the intuitive understanding of how serious the under-representation is. …

citations
Cited by 7 publications
(4 citation statements)
references
References 18 publications
“…It is worth noting that the relationship between total coverage of a genome and the size of the discrepancy was weakly negative but marginally non-significant (r = −0.117, p = 0.088, n = 212 contrasts), suggesting that at least some of the information that may be missing from many sequence datasets is indeed difficult to obtain. As an example, Koga [24] showed that harder to assemble internal regions of the Tol2 DNA transposon in the Oryzias latipes genome were missing from the assembly but were nonetheless quantifiable using a Southern blot analysis of genomic DNA. Perhaps more important is the disparity in the sizes of genomes that have been examined using sequencing versus traditional methods.…”
Section: Discussion (A) Sequence and Size
Citation type: mentioning
confidence: 99%
“…CENP-B boxes are embedded in centromeric repetitive DNA. In genome sequence databases constructed through next-generation sequencing, repetitive DNA is generally underrepresented and susceptible to artificial alterations because of the difficulty in assembling contigs [17,18]. The centromere regions are still left as large gaps even in the human sequence databases.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In this previous study, however, our BLAST search of the tammar wallaby genome database resulted in the detection of only one sequence that was aligned to the LTR portion of walb. This is not a contradiction, considering that repetitive sequences tend to be underrepresented or even missing in genome databases constructed through the assembly of short sequence reads (Koga, 2012; Weissensteiner et al, 2017). In the present study, as an approach with a lower risk of representation bias, we performed NCBI BLAST searches of the Illumina short‐read data collections of two tammar wallaby individuals (Figure 2).…”
Section: Results
Citation type: mentioning
confidence: 99%