2015
DOI: 10.1101/024679
Preprint

Cookiecutter: a tool for kmer-based read filtering and extraction

Abstract: Motivation: Kmer-based analysis is a powerful method used in read error correction and implemented in various genome assembly tools. A number of read processing routines include extracting or removing sequence reads from the results of high-throughput sequencing experiments prior to further analysis. Here we present a new approach to sorting or filtering of raw reads based on a provided list of kmers. Results: We developed Cookiecutter, a computational tool for rapid read extraction or removal according to a pr…
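
The idea of sorting reads against a provided kmer list can be illustrated with a minimal Python sketch. This is not Cookiecutter's implementation: the function names `kmers` and `split_reads` are hypothetical, reads are assumed to be plain strings already parsed from FASTQ, and reverse complements and base qualities are ignored.

```python
def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def split_reads(reads, kmer_set, k):
    """Partition reads by whether they contain at least one listed kmer.

    Returns (matched, unmatched). Keeping `matched` corresponds to
    extraction; keeping `unmatched` corresponds to filtering (removal).
    """
    matched, unmatched = [], []
    for read in reads:
        if any(km in kmer_set for km in kmers(read, k)):
            matched.append(read)
        else:
            unmatched.append(read)
    return matched, unmatched
```

Storing the kmer list in a set keeps each lookup at roughly constant time, so the cost grows with the total number of bases scanned rather than with the size of the kmer list.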

Cited by 24 publications (24 citation statements)
References: 11 publications
“…We used the Jellyfish software [ 74 ] for computing 23-mer frequencies and choosing a subset of 23-mers with coverage greater than 1,000. We used the Cookiecutter package [ 103 ] for extraction of raw reads containing subset of 23-mers with coverage greater than 1000. The selected reads were used to manually assemble tandem repeat monomer consensus sequences with the help of the targeted de novo short-read assembler PRICE [ 104 ].…”
Section: Methods (mentioning)
confidence: 99%
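
The kmer-selection step described in this statement (count 23-mers, keep those with coverage above 1,000) can be sketched in Python. The cited work used Jellyfish for counting; here `collections.Counter` stands in for it as an assumption, and the helper names `count_kmers` and `high_coverage_kmers` are hypothetical. An in-memory counter like this would only be practical for small inputs.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every occurrence of each k-length substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_coverage_kmers(counts, min_count):
    """Return the set of kmers observed strictly more than min_count times."""
    return {kmer for kmer, n in counts.items() if n > min_count}


# With real data one would call count_kmers(reads, 23) and
# high_coverage_kmers(counts, 1000); the resulting set is the kmer list
# that a read-extraction tool would then be given.
```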
“…Spa-1, a total of 52,358,830 paired-end reads were generated, equating to approximately 13.09 Gb of sequence data. For five southern solenodons (S. p. woodi), an average of 151,783,327 paired-end reads were generated, equating to an average of 15. were also assembled using a de Bruijn graph-based algorithm that considers coverage, based on the software Cookiecutter (Starostina et al 2015). Both produced identical results outside of the control region and open reading frames were present in all coding regions which might otherwise indicate assembly of numts (Lopez et al 1994).…”
Section: Mitogenome Sequence Generation Assembly and Annotation (mentioning)
confidence: 99%
“…Machine learning will help to select which measure of sequence complexity is more predictive of read alignment performance. Some read trimming, masking, or filtering software uses sequence complexity Porter and Zhang (2017); Starostina et al (2015). The bisulfite software BatMeth has a low complexity filter using Shannon entropy Lim et al (2012), and BLAST can use a sequence complexity mask with the DUST score Morgulis et al (2006); Altschul et al (1990).…”
Section: Related Work and Motivation (mentioning)
confidence: 99%
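
A Shannon-entropy complexity filter of the kind this statement mentions can be sketched as follows. This is a generic illustration, not the filter used by BatMeth or any other cited tool; the function names and the `threshold` parameter are hypothetical, and a suitable cutoff would have to be chosen per dataset.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the character frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_low_complexity(seq, threshold):
    """Flag a sequence whose entropy falls below a caller-chosen threshold."""
    return shannon_entropy(seq) < threshold
```

For DNA the value ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent), so a homopolymer run scores lowest.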