2021
DOI: 10.1016/j.isci.2021.102687

Dynamic partitioning of search patterns for approximate pattern matching using search schemes

Abstract: Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented. We discuss implemen…

Cited by 6 publications (4 citation statements)
References 15 publications
“…This means we report all occurrences within a pre-specified Hamming or edit distance. This functionality is similar to that provided by lossless read-mapper Columba (developed in the same research group) [25, 23, 24], but Columba’s index is based on the bidirectional FM-index (O(n) memory requirements). To ensure practical efficiency in b-move, we incorporated several optimizations originally developed for Columba: optimized edit distance to reduce redundancy, superior search schemes replacing pigeonhole methods, a lookup-table to bypass matching the first 10-mers, dynamic pattern partitioning, and bit-parallel pattern matching.…”
Section: Results (mentioning)
confidence: 99%
“…For example, approximate matches of a pattern can be sought using the pigeonhole principle or, more generally, search schemes [16]. The demonstrated efficiency of search schemes, for example in lossless read-mapper Columba [25, 23, 24], motivates the need for fast bidirectional character extensions.…”
Section: Preliminaries (mentioning)
confidence: 99%
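
The pigeonhole principle mentioned in the statement above can be illustrated with a minimal sketch. It assumes Hamming distance only and uses plain substring search instead of a bidirectional FM-index, so it is not how Columba or any cited tool is implemented; the function name `hamming_matches` is hypothetical. The idea: if a pattern is split into k + 1 parts and at most k mismatches are allowed, at least one part must occur exactly, so every exact hit of a part gives a candidate position to verify.

```python
def hamming_matches(text, pattern, k):
    """Return sorted start positions where pattern occurs in text
    with at most k mismatches (Hamming distance), found losslessly
    via the pigeonhole principle. Hypothetical helper for illustration."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    last_start = len(text) - m
    if k >= m:
        # Every window of length m trivially has at most m <= k mismatches.
        return list(range(last_start + 1))

    parts = k + 1
    # Boundaries of k + 1 non-empty, near-equal parts (non-empty since k + 1 <= m).
    bounds = [i * m // parts for i in range(parts + 1)]
    hits = set()
    for p in range(parts):
        start, end = bounds[p], bounds[p + 1]
        piece = pattern[start:end]
        # Exact occurrences of this part are the only candidates it can yield.
        pos = text.find(piece)
        while pos != -1:
            cand = pos - start  # implied start of the whole pattern
            if 0 <= cand <= last_start:
                window = text[cand:cand + m]
                mismatches = sum(a != b for a, b in zip(window, pattern))
                if mismatches <= k:
                    hits.add(cand)
            pos = text.find(piece, pos + 1)
    return sorted(hits)


print(hamming_matches("ACGTACGTTACG", "ACGA", 1))
```

Search schemes generalize this idea: rather than requiring one part to match exactly and verifying the rest afterwards, they prescribe the order in which parts are matched and how many errors are tolerated in each, which is where the choice of partitioning affects the size of the search space.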
“…Briefly, search schemes are a new class of sequence alignment algorithms that define how a pattern is matched using a bidirectional full-text index such that unsuccessful branches are discarded as quickly as possible, and runtime is minimized. Their excellent performance has been demonstrated for linear reference genomes [35,36,37,38]. In contrast to lossy heuristics (that often rely on the seed-and-extend paradigm), search schemes are lossless: they guarantee to retrieve all occurrences within a pre-specified number of errors.…”
Section: Contributions (mentioning)
confidence: 99%
“…The construction process of the underlying bidirectional FM-index is based on the implementation of Columba [38,41]. The construction of components G and B is similar to the algorithms described in [31].…”
Section: Building the Data Structure (mentioning)
confidence: 99%