2013
DOI: 10.1371/journal.pone.0059484

SeqEntropy: Genome-Wide Assessment of Repeats for Short Read Sequencing

Abstract:
Background: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths.
Methodology/Principal Findings: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative los…
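The abstract is truncated before it gives the exact form of H(k), so the following is only a minimal sketch of one plausible reading: the Shannon entropy, in bits, of the empirical distribution of all length-k substrings of a sequence. The function name `kmer_entropy` and the choice of base-2 logarithm are assumptions made for illustration, not the paper's definition.

```python
from collections import Counter
from math import log2


def kmer_entropy(sequence: str, k: int) -> float:
    """Shannon entropy (bits) of the empirical k-mer distribution.

    Assumption: every length-k substring counts as one idealized,
    error-free read. This is an illustrative reading of H(k), not the
    definition used in the paper.
    """
    n = len(sequence) - k + 1  # number of k-mers in the sequence
    if k <= 0 or n <= 0:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(sequence[i:i + k] for i in range(n))
    return -sum((c / n) * log2(c / n) for c in counts.values())


# A repeat-rich sequence yields fewer distinct k-mers, hence lower entropy.
repetitive = kmer_entropy("ACACACAC", 2)
diverse = kmer_entropy("ACGTTGCA", 2)
```

Under this reading, entropy is highest when every k-mer occurs once, so repeats longer than k show up as a shortfall from that maximum.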

Citations: cited by 4 publications (1 citation statement)
References: 19 publications (22 reference statements)
“…K=k/4), a hash table to count all k-mers in the human genome would require 3K GByte RAM, which quickly becomes implausible when k is greater than 100. Using a solution that is similar to other applications where the hard disk (22)(23)(24) or computing time (25) is traded with RAM, we use a new public-domain program DSK which utilizes the less expensive hard disk or longer CPU time to compensate a lack of RAM (26). Other efficient k-mer count procedures have been proposed in (27)(28)(29).…”
Section: Introduction (mentioning)
confidence: 99%
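The "3K GByte" figure in the quoted statement can be checked with a small calculation. This sketch assumes a 2-bit encoding per base (so a k-mer takes k/4 bytes, matching K = k/4 in the quote) and roughly 3 billion k-mers in the human genome. It counts key storage only and ignores counters and hash-table overhead. The function name and default value are hypothetical.

```python
def kmer_table_gbytes(k: int, genome_kmers: float = 3e9) -> float:
    """Rough RAM (in GB) needed to store every k-mer key of a genome.

    Assumptions: 2 bits per base, about 3e9 k-mers for the human genome,
    keys only (no counts, no hash-table overhead).
    """
    bytes_per_kmer = k / 4  # 2 bits per base -> k/4 bytes, i.e. K in the quote
    return genome_kmers * bytes_per_kmer / 1e9


# Memory grows linearly with k under these assumptions.
for k in (32, 100, 200):
    print(f"k={k}: about {kmer_table_gbytes(k):.0f} GB")
```

At k = 100 this gives K = 25 and therefore about 75 GB for the keys alone, which is consistent with the statement that an in-memory hash table becomes implausible beyond that read length.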