2020
DOI: 10.1145/3428286

Regex matching with counting-set automata

Abstract: We propose a solution to the problem of efficient matching of regular expressions (regexes) with bounded repetition, such as (ab){1,100}, using deterministic automata. For this, we introduce novel counting-set automata (CsAs), automata with registers that can hold sets of bounded integers and can be manipulated by a limited portfolio of constant-time operations. We present an algorithm that compiles a large sub-class of regexes to deterministic CsAs. This includes (1) a novel Antimirov-st…
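
To make the idea of a register holding a set of bounded integers more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the class name `CountingSet`, its methods, the queue-plus-shared-offset representation, and the example regex `.*a.{k}` are all assumptions chosen for illustration. The point it tries to show is that "increment every counter", "start a new counter", and "read the largest counter" can each take constant time, so a single deterministic pass can track many overlapping counts at once.

```python
from collections import deque


class CountingSet:
    """Hypothetical register: a set of distinct counters in [0, upper].

    Each counter is stored as (offset at insertion time); its current value
    is `self.offset - stored`. Bumping the shared offset therefore
    increments every counter at once.
    """

    def __init__(self, upper):
        self.upper = upper
        self.offset = 0
        self.items = deque()  # leftmost entry is the oldest, i.e. largest, counter

    def is_empty(self):
        return not self.items

    def increment_all(self):
        """Add 1 to every counter; drop the one counter that may exceed `upper`."""
        self.offset += 1
        if self.items and self.offset - self.items[0] > self.upper:
            self.items.popleft()

    def add_zero(self):
        """Insert a fresh counter with value 0 (no-op if 0 is already present)."""
        if not self.items or self.items[-1] != self.offset:
            self.items.append(self.offset)

    def max(self):
        """Largest counter value; the set must be non-empty."""
        return self.offset - self.items[0]


def matches_a_then_k(text, k):
    """Full-match `text` against `.*a.{k}` using one counting set.

    Each counter records how many characters have been read since
    some earlier 'a'. The input matches when a counter equals k at the end.
    """
    counters = CountingSet(upper=k)
    for ch in text:
        counters.increment_all()
        if ch == "a":
            counters.add_zero()
    return not counters.is_empty() and counters.max() == k


if __name__ == "__main__":
    print(matches_a_then_k("xxaxy", 2))  # True: the third character from the end is 'a'
```

A plain deterministic automaton for `.*a.{k}` has to remember which of the last k+1 characters were 'a', which takes on the order of 2^k states; in this sketch the same information sits in one register whose size is at most k+1 entries.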

Cited by 14 publications (42 citation statements)
References 41 publications

“…DFAs and NFAs have been extended by [23] and [8] respectively by introducing counting operations and guards as an alternative to unfolding for large repetition bounds. An implementation of a class of counter automata, proposed in [59], is based on queues for representing sets of counter values. A variety of software regex matchers, including RE2 [18,41], Rust's Regex [44], PCRE [37], SRM [45], and Hyperscan [65] support the matching of regexes with counting.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, note that, since k is written as a decadic numeral, its value is exponential in the size of the regex. This makes matching with already moderately high k prone to significant slowdowns and ReDoS vulnerabilities with virtually every mainstream matcher (see [23,13]). At the same time, repetition bounds easily reach thousands, in extreme tens of millions (in real-life XML [24]).…”
Section: Introduction (mentioning)
confidence: 99%
“…The problem of matching with bounded repetition has been addressed from the theoretical as well as from the practical perspective by a number of authors [25,24,26,27,28,29,30,23]. From these, the recent work [23] is the only one offering fast matching for a practically significant class of regexes.…”
Section: Introduction (mentioning)
confidence: 99%