2014
DOI: 10.1186/s12859-014-0357-3

NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly

Abstract: Background: Deep shotgun sequencing on next generation sequencing (NGS) platforms has contributed significant amounts of data to enrich our understanding of genomes, transcriptomes, amplified single-cell genomes, and metagenomes. However, deep coverage variations in short-read data sets and high sequencing error rates of modern sequencers present new computational challenges in data interpretation, including mapping and de novo assembly. New lab techniques such as multiple displacement amplification (MDA) of sin…

Cited by 18 publications (15 citation statements)
References: 26 publications
“…Although coverage reduction has been primarily used for unbalanced data (Brown et al 2012), we have shown in Lonardi et al (2015) that in the presence of ultra-deep sequencing data, the assembly of a random sample of the input reads only marginally improves the assembly quality compared with the assembly of entire dataset. Diginorm (Brown et al 2012) and NeatFreq (McCorrison et al 2014) are two examples of down-sampling methods aimed to produce a more uniform coverage. They both reduce coverage by selecting representative reads binned by their median k-mer frequency.…”
Section: Introduction (mentioning)
confidence: 99%
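
The idea of selecting representative reads binned by median k-mer frequency, as described in the statement above, can be illustrated with a toy sketch. This is not the NeatFreq or Diginorm implementation: the function names, the default k-mer length `k`, and the `per_bin` cap are hypothetical choices made only for illustration, and real tools use far more memory-efficient counting.

```python
from collections import Counter, defaultdict
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def median_kmer_frequency(read, counts, k):
    """Median abundance of the read's k-mers across the whole data set."""
    freqs = [counts[km] for km in kmers(read, k)]
    return median(freqs) if freqs else 0


def bin_and_select(reads, k=4, per_bin=2):
    """Toy down-sampling: bin reads by median k-mer frequency and keep
    at most `per_bin` reads from each bin (hypothetical selection rule)."""
    # Count every k-mer occurrence over all reads.
    counts = Counter(km for r in reads for km in kmers(r, k))
    bins = defaultdict(list)
    for r in reads:
        bins[median_kmer_frequency(r, counts, k)].append(r)
    kept = []
    for freq in sorted(bins):
        # Low-frequency bins come first, so rare reads are never crowded out.
        kept.extend(bins[freq][:per_bin])
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
selected = bin_and_select(reads, k=4, per_bin=2)
```

In this example the five identical high-coverage reads share one bin and are cut down to two, while the single low-coverage read is retained, which is the flattening effect the citing authors describe.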
“…By randomly merging counters, the CMS provides an accurate frequency estimate for the most frequent items of a data stream. Diginorm [5], Bignorm [28] and NeatFreq [20] are sketching algorithms for genetic data normalization that use the CMS. These methods reduce the size of genetic datasets by eliminating redundant reads.…”
Section: Composable Coresets (mentioning)
confidence: 99%
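
The count-min sketch (CMS) mentioned in the statement above can be sketched in a few lines. This is a generic, minimal illustration of the data structure, not code taken from Diginorm, Bignorm, or NeatFreq; the class name, the `width` and `depth` defaults, and the use of salted BLAKE2b as the hash family are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: `depth` rows of `width` shared counters."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        # Distinct items may collide on a counter; that is the "merging".
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # is an upper bound that is closest to the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


cms = CountMinSketch(width=64, depth=3)
for kmer in ["ACGT", "ACGT", "ACGT", "TTTT"]:
    cms.add(kmer)
```

Because counters are shared, an estimate can exceed the true count but never fall below it, which is why the structure is most reliable for the most frequent items.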
“…NeatFreq [15] clusters and selects short reads based on the median k-mer frequency. However, the main innovation in the work is the inclusion of methods for the use of paired reads alongside with preferential selection of regions with extremely low coverage.…”
Section: Related Work (mentioning)
confidence: 99%