2016
DOI: 10.1371/journal.pcbi.1005107

rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

Abstract: Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related …
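To make the filtering idea in the abstract concrete, here is a minimal sketch in Python. It assumes a pattern is written as a string in which '1' marks a match position and '0' a don't-care position; the function name `spaced_words` is hypothetical and is not taken from rasbhari.

```python
def spaced_words(sequence: str, pattern: str) -> list[str]:
    """Return the spaced word found at every start position of `sequence`.

    Assumption: `pattern` is a binary string where '1' is a match position
    and '0' is a don't-care position. Only characters under a '1' are kept.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of '0' and '1'")

    # Offsets inside the pattern window that are actually compared.
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]

    words = []
    for start in range(len(sequence) - len(pattern) + 1):
        words.append("".join(sequence[start + i] for i in match_positions))
    return words


print(spaced_words("ACGTAC", "101"))  # ['AG', 'CT', 'GA', 'TC']
```

With the pattern "101", the middle character of each three-letter window is ignored, so two windows that differ only at that position yield the same spaced word.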

Cited by 39 publications
(49 citation statements)
References 51 publications
“…The results of these programs therefore depend on the underlying patterns. Both programs use the software rasbhari [24] to calculate patterns. rasbhari uses a probabilistic algorithm, so different program runs usually return different patterns and, as a result, different program runs with FSWM and Multi-SpaM may produce slightly different distance estimates, even if the same parameter setting is used.…”
Section: Test Results (mentioning)
confidence: 99%
“…by default the pattern has 10 match positions and 100 don't-care positions, but other values for and w can be chosen by the user. Given these parameters, P is calculated by running our previously developed software tool rasbhari [24]. As a basis for phylogeny reconstruction, we are using four-way alignments consisting of occurrences of the same spaced word with respect to P in four different sequences.…”
Section: Spaced Words and P-Blocks (mentioning)
confidence: 99%
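The statement above can be illustrated with a rough sketch, which is not the citing program's implementation. It draws a random pattern with a given number of match and don't-care positions, then collects spaced words that occur in at least four sequences. The function names, the '1'/'0' pattern encoding, and the choice to fix the first and last positions as match positions are assumptions made for illustration. The pattern here is drawn at random, whereas the citing authors compute it with rasbhari.

```python
import random
from collections import defaultdict


def random_pattern(weight: int, dont_care: int, rng: random.Random) -> str:
    """Random binary pattern with `weight` '1's and `dont_care` '0's.

    Assumption: the first and last positions are always match positions.
    """
    if weight < 2:
        raise ValueError("weight must be at least 2")
    length = weight + dont_care
    inner = rng.sample(range(1, length - 1), weight - 2)
    ones = {0, length - 1, *inner}
    return "".join("1" if i in ones else "0" for i in range(length))


def shared_spaced_words(sequences, pattern, min_sequences=4):
    """Map each spaced word to {sequence index: [start positions]}.

    Only words that occur in at least `min_sequences` different sequences
    are returned; these are candidates for a four-way alignment.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    occurrences = defaultdict(lambda: defaultdict(list))
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - len(pattern) + 1):
            word = "".join(seq[start + i] for i in match_positions)
            occurrences[word][idx].append(start)
    return {
        word: dict(hits)
        for word, hits in occurrences.items()
        if len(hits) >= min_sequences
    }


# Default-like setting from the quoted text: 10 match, 100 don't-care positions.
pattern = random_pattern(10, 100, random.Random(0))
print(len(pattern), pattern.count("1"))  # 110 10
```

Any further filtering that the real program applies to such candidate blocks is omitted here.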
“…By default, our program uses a set of 5 patterns. To find good patterns sets, we integrated the tool rasbhari [20] into our implementation. rasbhari uses a hill climbing algorithm to reduce the overlap complexity [28] of pattern sets.…”
Section: Methods (mentioning)
confidence: 99%
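The hill-climbing idea mentioned in this statement can be sketched as follows. This is a toy version under stated assumptions, not rasbhari's actual procedure. Overlap complexity is computed in one common formulation: for each pair of patterns, sum 2^(number of aligned match positions) over all relative shifts. Including each pattern paired with itself is an assumption. All function names are hypothetical.

```python
import random
from itertools import combinations_with_replacement


def pair_overlap(p: str, q: str) -> int:
    """Sum over all shifts of q against p of 2**(aligned match positions)."""
    ones_p = {i for i, c in enumerate(p) if c == "1"}
    ones_q = [i for i, c in enumerate(q) if c == "1"]
    total = 0
    for shift in range(1 - len(q), len(p)):
        shared = sum(1 for j in ones_q if j + shift in ones_p)
        total += 2 ** shared
    return total


def set_overlap(patterns) -> int:
    """Overlap complexity of a pattern set (assumption: self-pairs included)."""
    return sum(
        pair_overlap(p, q)
        for p, q in combinations_with_replacement(patterns, 2)
    )


def hill_climb(patterns, steps: int, rng: random.Random):
    """Swap one inner match and one don't-care position; keep improvements."""
    best = list(patterns)
    best_score = set_overlap(best)
    for _ in range(steps):
        k = rng.randrange(len(best))
        chars = list(best[k])
        # First and last positions stay fixed as match positions.
        ones = [i for i in range(1, len(chars) - 1) if chars[i] == "1"]
        zeros = [i for i in range(1, len(chars) - 1) if chars[i] == "0"]
        if not ones or not zeros:
            continue
        i, j = rng.choice(ones), rng.choice(zeros)
        chars[i], chars[j] = "0", "1"
        candidate = best[:k] + ["".join(chars)] + best[k + 1:]
        score = set_overlap(candidate)
        if score < best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score


start = ["11110001", "11100011"]
improved, score = hill_climb(start, steps=200, rng=random.Random(1))
print(set_overlap(start), score, improved)
```

Because the swaps are chosen at random, different seeds can return different pattern sets. Each swap preserves the weight and length of the pattern it modifies.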
“…Fig S3) according to metrics like overlap complexity (46), yet we still have not yet scratched the surface in terms of optimal spaced seed design. Optimal seed sensitivity computation [is] NP-hard (47), and although faster approximations exist (48), they are still quite slow and infeasible because our seed can be any length. Optimal design for sequence classification is a function of sequencing error rate, homology detection tolerance, sequence length, and mutation/error types.…”
Section: Future Work (mentioning)
confidence: 99%