2020
DOI: 10.1007/978-3-030-38919-2_40

Fast Indexes for Gapped Pattern Matching

Abstract: We describe indexes for searching large data sets for variable-length gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap constraint specifying upper and lower bounds on the distance allowed between subpatterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commo…
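To make the abstract's definition concrete, here is a minimal Python sketch of what a VLG query computes. It is a naive scan, not one of the indexes the paper describes. It assumes a gap constraint (lo, hi) bounds the number of characters strictly between the end of one subpattern and the start of the next; the paper's exact convention may differ. The function names `occurrences` and `vlg_match` are hypothetical.

```python
from typing import List, Set, Tuple


def occurrences(text: str, pattern: str) -> List[int]:
    """Return every (possibly overlapping) start position of pattern in text."""
    positions: List[int] = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def vlg_match(text: str, subpatterns: List[str],
              gaps: List[Tuple[int, int]]) -> List[int]:
    """Naive VLG matcher (a full scan, not an index).

    Assumed convention: gaps[i] = (lo, hi) bounds the number of characters
    strictly between the end of subpatterns[i] and the start of subpatterns[i+1].
    Returns the start positions of subpatterns[0] that begin a full match.
    """
    if len(gaps) != len(subpatterns) - 1:
        raise ValueError("need exactly one gap constraint per adjacent pair")
    occ: List[Set[int]] = [set(occurrences(text, p)) for p in subpatterns]
    # Work right to left: 'viable' holds starts of subpatterns[i] from which
    # all remaining subpatterns can still be matched within their gaps.
    viable = occ[-1]
    for i in range(len(subpatterns) - 2, -1, -1):
        lo, hi = gaps[i]
        width = len(subpatterns[i])
        viable = {s for s in occ[i]
                  if any(s + width + g in viable for g in range(lo, hi + 1))}
    return sorted(viable)


# "ab", then 0 to 2 arbitrary characters, then "cd".
print(vlg_match("abxxcdabcd", ["ab", "cd"], [(0, 2)]))  # [0, 6]
```

Checking the subpatterns from right to left means each earlier occurrence needs only one set lookup per allowed gap value; an index would replace the full-text scans in `occurrences`.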

References: 20 publications
Cited by 6 publications (8 citation statements):
“…Cáceres et al. [7] provided indexes in the context of huge data sets for variable-length gapped (VLG) patterns. These variable-length gapped patterns include two or more subpatterns.…”
Section: Background Work (mentioning)
confidence: 99%
“…Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). An important variant of this problem is the gapped string indexing problem [6,8,10,14,27,28,31]. Here, the goal is to compactly represent the string such that, given two patterns $P_1$ and $P_2$ and a gap range $[\alpha, \beta]$, we can quickly find occurrences of $P_1$ and $P_2$ with distance in $[\alpha, \beta]$.…”
Section: Introduction (mentioning)
confidence: 99%
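The three query types quoted above carry over to the gapped problem. The Python sketch below is a naive, index-free baseline that only illustrates what each query returns, not the compact representation the quoted passage targets. It assumes the distance between an occurrence i of $P_1$ and an occurrence j of $P_2$ is j - i; the cited works may measure it differently. The names `gapped_exists`, `gapped_count`, `gapped_report` and `_window` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def occurrences(text: str, pattern: str) -> List[int]:
    """Sorted start positions of all (possibly overlapping) occurrences."""
    positions: List[int] = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def _window(occ2: List[int], i: int, alpha: int, beta: int) -> Tuple[int, int]:
    """Index range [lo, hi) of sorted occ2 holding every j with alpha <= j - i <= beta."""
    return bisect_left(occ2, i + alpha), bisect_right(occ2, i + beta)


def gapped_exists(text: str, p1: str, p2: str, alpha: int, beta: int) -> bool:
    """Existential query: does some pair (i, j) satisfy the assumed gap range?"""
    occ2 = occurrences(text, p2)
    return any(lo < hi for lo, hi in
               (_window(occ2, i, alpha, beta) for i in occurrences(text, p1)))


def gapped_count(text: str, p1: str, p2: str, alpha: int, beta: int) -> int:
    """Counting query: number of qualifying pairs, without listing them."""
    occ2 = occurrences(text, p2)
    return sum(hi - lo for lo, hi in
               (_window(occ2, i, alpha, beta) for i in occurrences(text, p1)))


def gapped_report(text: str, p1: str, p2: str,
                  alpha: int, beta: int) -> List[Tuple[int, int]]:
    """Reporting query: every pair (i, j) with alpha <= j - i <= beta."""
    occ2 = occurrences(text, p2)
    pairs: List[Tuple[int, int]] = []
    for i in occurrences(text, p1):
        lo, hi = _window(occ2, i, alpha, beta)
        pairs.extend((i, j) for j in occ2[lo:hi])
    return pairs


print(gapped_report("abcab", "a", "b", 1, 1))  # [(0, 1), (3, 4)]
```

Because the occurrences of $P_2$ are sorted, counting needs only two binary searches per occurrence of $P_1$, while reporting additionally pays for each pair it outputs.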
“…Here, the goal is to compactly represent the string such that, given two patterns $P_1$ and $P_2$ and a gap range $[\alpha, \beta]$, we can quickly find occurrences of $P_1$ and $P_2$ with distance in $[\alpha, \beta]$. Searching and indexing with gaps is frequently used in computational biology applications [6,11,13,14,19,21,22,32,35,38].…”
Section: Introduction (mentioning)
confidence: 99%
“…The goal is to obtain a compact data structure while supporting fast queries in terms of the length of the pattern P and the number of reported occurrences k. For an example, see Figure 1. [Figure 1: … 4, 7, 11, 22, 24, 26, 30, 39 and 41 in S. The top 5 close consecutive occurrences are (22, 24), (24, 26), (39, 41), (4, 7), and (7, 11), with the tie between (7, 11) and (26, 30) broken arbitrarily.]…”
Section: Introduction (mentioning)
confidence: 99%
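The Figure 1 example can be reproduced with a short Python sketch. It assumes "consecutive" means no other occurrence of P lies strictly between the two positions, and that the distance of a pair (i, j) is j - i. `top_k_close_consecutive` is a hypothetical name, and the sketch simply sorts all consecutive pairs instead of using the compact data structure the quoted work aims for. Python's stable sort breaks the tie between (7, 11) and (26, 30) by position, which happens to match the figure's arbitrary choice.

```python
from typing import List, Tuple


def top_k_close_consecutive(occ: List[int], k: int) -> List[Tuple[int, int]]:
    """Return the k consecutive occurrence pairs (i, j) with the smallest j - i.

    Assumes occ lists the start positions of a pattern P in S. A pair is
    consecutive when no other occurrence lies strictly between i and j.
    Ties are broken left to right because list.sort is stable.
    """
    ordered = sorted(occ)
    pairs = list(zip(ordered, ordered[1:]))  # all consecutive occurrence pairs
    pairs.sort(key=lambda p: p[1] - p[0])    # by distance; equal distances keep position order
    return pairs[:k]


# Occurrence positions taken from the Figure 1 example.
occ = [4, 7, 11, 22, 24, 26, 30, 39, 41]
print(top_k_close_consecutive(occ, 5))
# [(22, 24), (24, 26), (39, 41), (4, 7), (7, 11)]
```

This costs O(n log n) time for n occurrences on every query; the point of the quoted work is to answer such queries faster, in terms of |P| and k.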