2020
DOI: 10.1073/pnas.1903436117

Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters

Abstract: Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification becau…
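To make the abstract's point about mismatch tolerance concrete, the following is a minimal sketch of how a spaced seed can match two words that an exact k-mer comparison would reject. It is not the paper's implementation and involves no Bloom filter: the seed patterns in `SEEDS`, the helper names `apply_seed` and `seed_matches`, and the 8-base example sequences are all hypothetical and chosen only for illustration. A `1` in a seed marks a position that must match; a `0` marks a position that is ignored.

```python
from typing import Sequence


def apply_seed(word: str, seed: str) -> str:
    """Keep bases at '1' positions of the seed and mask '0' positions with '*'."""
    if len(word) != len(seed):
        raise ValueError("word and seed must have the same length")
    return "".join(base if bit == "1" else "*" for base, bit in zip(word, seed))


def seed_matches(a: str, b: str, seeds: Sequence[str]) -> bool:
    """Return True if the two words agree under at least one seed in the set."""
    return any(apply_seed(a, s) == apply_seed(b, s) for s in seeds)


# Hypothetical seed patterns (assumption; not taken from the paper).
SEEDS = ["11011011", "10110111"]

reference = "ACGTACGT"
mismatch_at_2 = "ACCTACGT"  # differs from reference at index 2
mismatch_at_1 = "AGGTACGT"  # differs from reference at index 1
mismatch_at_0 = "TCGTACGT"  # differs from reference at index 0

# An exact k-mer comparison rejects every one of these.
exact_hits = [w == reference for w in (mismatch_at_2, mismatch_at_1, mismatch_at_0)]

# With multiple seeds, a mismatch is tolerated whenever some seed ignores that position.
seed_hits = [
    seed_matches(reference, w, SEEDS)
    for w in (mismatch_at_2, mismatch_at_1, mismatch_at_0)
]
```

In this sketch the first two words are recovered because one of the two seeds has a `0` at the mismatching position, while the third is still rejected because both seeds require a match at index 0. Using several seeds with different ignored positions widens the set of mismatches that can be tolerated.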

citations
Cited by 11 publications
(11 citation statements)
references
References 52 publications
“…Like the counting Bloom filter, the race conditions are minimized for multithreaded insertion. Code adapted from the Multi-index Bloom filter publication (Chu et al, 2020). • Indexlr An optimized and versatile minimizer calculator.…”
Section: Design and Implementation
Citation type: mentioning
confidence: 99%
“…In contrast, the alignment-free methods for biological sequence classification have proven to be efficient and accurate. In terms of memory utilization, a machine-learning model for sequence classification can be more efficient than an alignment-based method (Chu et al, 2020). Recently, researchers have used a machine-learning method for biological sequence classification but their method was limited for kingdom level classification (Nugent and Adamowicz, 2020).…”
Section: Model Training and Optimization
Citation type: mentioning
confidence: 99%
“…From each gap, we extracted 500 bp flanks from both sides to construct a FASTA file using a combination of in-house scripts, SAMtools (v1.9) [24], and BEDtools (v2.27.1) [25]. Finally, we used the BioBloomMIMaker utility from BioBloom Tools (v2.3.2) [26] to construct a multi-index Bloom filter for each flank. Next, using BioBloomMICategorizer [26] we built a FASTQ file by selecting any read, along with its mate, that mapped to a gap flank sequence.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Finally, we used the BioBloomMIMaker utility from BioBloom Tools (v2.3.2) [26] to construct a multi-index Bloom filter for each flank. Next, using BioBloomMICategorizer [26] we built a FASTQ file by selecting any read, along with its mate, that mapped to a gap flank sequence. For each gap, this pair of FASTA and FASTQ files was the input used to run GapPredict.…”
Section: Methods
Citation type: mentioning
confidence: 99%