POSIX Regular Expression Parsing with Derivatives

Sulzmann, Martin; Lu, Kenny Zhuo Ming

doi:10.1007/978-3-319-07151-0_13

Cited by 22 publications (77 citation statements)

References 11 publications

“…The experience of doing our proofs has been that this mechanical checking was absolutely essential: this subject area has hidden snares. This was also noted by Kuklewicz [7] who found that nearly all POSIX matching implementations are "buggy" [11, Page 203] and by Grathwohl et al. [5, Page 36] who wrote:…”

Section: Introduction (mentioning, confidence: 60%)

“…There are two commonly used disambiguation strategies to generate a unique answer: one is called GREEDY matching [4] and the other is POSIX matching [7,11,13]. For example consider the string xy and the regular expression (x + y + xy)…”

Section: Introduction (mentioning, confidence: 99%)
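To make the ambiguity concrete, here is a small Python sketch built on an example of our own (not the one in the quote above, whose regular expression is cut off): the star-free expression (x + xy)·(y + 1) matches the string xy in two different ways. The sketch enumerates every value (parse tree) in backtracking order, trying left alternatives before right ones, so the first full match is the answer a GREEDY matcher would return. The constructor names One, Chr, Alt, Seq and the tuple encoding of values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union

# Star-free regular expressions (hypothetical constructor names).
@dataclass(frozen=True)
class One:          # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr:          # matches the single character c
    c: str

@dataclass(frozen=True)
class Alt:          # r1 + r2
    r1: "Rexp"
    r2: "Rexp"

@dataclass(frozen=True)
class Seq:          # r1 . r2
    r1: "Rexp"
    r2: "Rexp"

Rexp = Union[One, Chr, Alt, Seq]
Value = tuple  # ("Empty",), ("Char", c), ("Left", v), ("Right", v), ("Seq", v1, v2)

def parses(r: Rexp, s: str, i: int) -> Iterator[Tuple[Value, int]]:
    """Yield (value, end) for every way r matches s[i:end], in backtracking
    order: left alternatives are tried before right ones."""
    if isinstance(r, One):
        yield ("Empty",), i
    elif isinstance(r, Chr):
        if i < len(s) and s[i] == r.c:
            yield ("Char", r.c), i + 1
    elif isinstance(r, Alt):
        for v, j in parses(r.r1, s, i):
            yield ("Left", v), j
        for v, j in parses(r.r2, s, i):
            yield ("Right", v), j
    elif isinstance(r, Seq):
        for v1, j in parses(r.r1, s, i):
            for v2, k in parses(r.r2, s, j):
                yield ("Seq", v1, v2), k

def all_values(r: Rexp, s: str) -> list:
    """All values of r covering the whole of s; the first one is the GREEDY answer."""
    return [v for v, end in parses(r, s, 0) if end == len(s)]

# Example (ours, not from the quoted paper): (x + xy) . (y + 1) against "xy".
example = Seq(Alt(Chr("x"), Seq(Chr("x"), Chr("y"))), Alt(Chr("y"), One()))
for v in all_values(example, "xy"):
    print(v)
```

The first value splits xy into x followed by y, which is the GREEDY answer. POSIX matching instead selects the second value, because its first component matches the longer prefix xy.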

“…Sulzmann and Lu [11] extended this matcher to allow generation not just of a YES/NO answer but of an actual matching, called a [lexical] value. They give a simple algorithm to calculate a value that appears to be the value associated with POSIX matching.…”

Section: Introduction (mentioning, confidence: 99%)
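The value-producing extension can be illustrated with a minimal Python sketch of the derivative-and-injection lexer, in the form in which Sulzmann and Lu's algorithm is usually presented (for example by Ausaf, Dyckhoff and Urban): take the Brzozowski derivative for each input character, build a value for the empty string with mkeps, then inject the characters back one by one in reverse order with inj. This sketch omits any simplification of derivatives, and the constructor names and the tuple encoding of values (Empty, Char, Left, Right, Seq, Stars) are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Regular expressions (hypothetical constructor names).
@dataclass(frozen=True)
class Zero: pass            # matches nothing
@dataclass(frozen=True)
class One: pass             # matches the empty string
@dataclass(frozen=True)
class Chr:                  # a single character
    c: str
@dataclass(frozen=True)
class Alt:                  # r1 + r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Seq:                  # r1 . r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Star:                 # r*
    r: object

def nullable(r) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (Zero, Chr)): return False
    if isinstance(r, (One, Star)): return True
    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq): return nullable(r.r1) and nullable(r.r2)
    raise TypeError(r)

def der(c: str, r):
    """Brzozowski derivative of r with respect to character c."""
    if isinstance(r, (Zero, One)): return Zero()
    if isinstance(r, Chr): return One() if r.c == c else Zero()
    if isinstance(r, Alt): return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star): return Seq(der(c, r.r), r)
    raise TypeError(r)

def mkeps(r) -> tuple:
    """Value for a nullable r matching the empty string (prefers Left)."""
    if isinstance(r, One): return ("Empty",)
    if isinstance(r, Alt):
        return ("Left", mkeps(r.r1)) if nullable(r.r1) else ("Right", mkeps(r.r2))
    if isinstance(r, Seq): return ("Seq", mkeps(r.r1), mkeps(r.r2))
    if isinstance(r, Star): return ("Stars", ())
    raise ValueError("mkeps called on a non-nullable regex")

def inj(r, c: str, v: tuple) -> tuple:
    """Turn a value v for der(c, r) into a value for r by re-inserting c."""
    if isinstance(r, Chr):                  # v is ("Empty",)
        return ("Char", c)
    if isinstance(r, Alt):
        tag, inner = v
        return (tag, inj(r.r1 if tag == "Left" else r.r2, c, inner))
    if isinstance(r, Seq):
        if v[0] == "Seq":                   # derivative was (der c r1) . r2
            return ("Seq", inj(r.r1, c, v[1]), v[2])
        if v[0] == "Left":                  # derivative was ((der c r1) . r2) + der c r2
            _, v1, v2 = v[1]
            return ("Seq", inj(r.r1, c, v1), v2)
        return ("Seq", mkeps(r.r1), inj(r.r2, c, v[1]))  # Right: r1 matched ""
    if isinstance(r, Star):                 # derivative was (der c r) . r*
        _, v1, (_, vs) = v
        return ("Stars", (inj(r.r, c, v1),) + vs)
    raise ValueError("no injection possible")

def lexer(r, s: str) -> Optional[tuple]:
    """Return a value for r matching all of s, or None if s is not in L(r)."""
    if not s:
        return mkeps(r) if nullable(r) else None
    v = lexer(der(s[0], r), s[1:])
    return None if v is None else inj(r, s[0], v)
```

On the expression (x + xy)·(y + 1) and the string xy, this sketch returns the value whose first component matches the whole of xy, i.e. the POSIX answer rather than the GREEDY one.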

“…The answer given by Sulzmann and Lu [11] is to define a relation (called an "order relation") on the set of values of r, and to show that (once a string to be matched is chosen) there is a maximum element and that it is computed by their derivative-based algorithm. This proof idea is inspired by work of Frisch and Cardelli [4] on a GREEDY regular expression matching algorithm.…”

Section: Introduction (mentioning, confidence: 99%)

“…However, we were not able to establish transitivity and totality for the "order relation" by Sulzmann and Lu. In Section 5 we identify some inherent problems with their approach (of which some of the proofs are not published in [11]); perhaps more importantly, we give a simple inductive (and algorithm-independent) definition of what we call being a POSIX value for a regular expression r and a string s; we show that the algorithm computes such a value and that such a value is unique. Our proofs are both done by hand and checked in Isabelle/HOL.…”

Section: Introduction (mentioning, confidence: 99%)
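An algorithm-independent definition of a POSIX value can also be read as a (very inefficient) reference function: prefer the left alternative whenever it matches the string, and in sequences and stars make the first component match as long a prefix as possible while the remainder still matches (for stars, each iteration must be non-empty). The Python sketch below follows that reading under our own assumptions; it decides language membership with a plain Brzozowski matcher and tries every split point, so it is exponential and only suited to checking small examples. The function names and the value encoding are hypothetical and are not taken from the Isabelle/HOL formalisation.

```python
from dataclasses import dataclass

# Regular expressions (hypothetical constructor names).
@dataclass(frozen=True)
class Zero: pass
@dataclass(frozen=True)
class One: pass
@dataclass(frozen=True)
class Chr:
    c: str
@dataclass(frozen=True)
class Alt:
    r1: object
    r2: object
@dataclass(frozen=True)
class Seq:
    r1: object
    r2: object
@dataclass(frozen=True)
class Star:
    r: object

def nullable(r) -> bool:
    if isinstance(r, (Zero, Chr)): return False
    if isinstance(r, (One, Star)): return True
    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
    return nullable(r.r1) and nullable(r.r2)          # Seq

def der(c, r):
    if isinstance(r, (Zero, One)): return Zero()
    if isinstance(r, Chr): return One() if r.c == c else Zero()
    if isinstance(r, Alt): return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Star): return Seq(der(c, r.r), r)
    left = Seq(der(c, r.r1), r.r2)                    # Seq
    return Alt(left, der(c, r.r2)) if nullable(r.r1) else left

def matches(r, s: str) -> bool:
    """Language membership test: s in L(r)."""
    for c in s:
        r = der(c, r)
    return nullable(r)

def posix(r, s: str):
    """The POSIX value of r for s, or None if s is not in L(r)."""
    if not matches(r, s):
        return None
    if isinstance(r, One): return ("Empty",)
    if isinstance(r, Chr): return ("Char", r.c)
    if isinstance(r, Alt):
        # Left whenever r1 matches s; Right only if s is not in L(r1).
        if matches(r.r1, s):
            return ("Left", posix(r.r1, s))
        return ("Right", posix(r.r2, s))
    if isinstance(r, Seq):
        # Longest prefix for r1 such that the rest is still in L(r2).
        for i in range(len(s), -1, -1):
            if matches(r.r1, s[:i]) and matches(r.r2, s[i:]):
                return ("Seq", posix(r.r1, s[:i]), posix(r.r2, s[i:]))
    if isinstance(r, Star):
        if s == "":
            return ("Stars", ())
        # Longest non-empty first iteration such that the rest is in L(r*).
        for i in range(len(s), 0, -1):
            if matches(r.r, s[:i]) and matches(r, s[i:]):
                return ("Stars", (posix(r.r, s[:i]),) + posix(r, s[i:])[1])
    return None
```

Comparing this reference function against a derivative-based lexer on small expressions and strings is one cheap way to spot disagreements before attempting a proof.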

POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl)

Ausaf; Dyckhoff; Urban (2016), Interactive Theorem Proving

Abstract. Brzozowski introduced the notion of derivatives for regular expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying disambiguation strategy for regular expressions needed in lexers. Sulzmann and Lu have made available on-line what they call a "rigorous proof" of the correctness of their algorithm w.r.t. their specification; regrettably, it appears to us to have unfillable gaps. In the first part of this paper we give our inductive definition of what a POSIX value is and show (i) that such a value is unique (for given regular expression and string being matched) and (ii) that Sulzmann and Lu's algorithm always generates such a value (provided that the regular expression matches the string). We also prove the correctness of an optimised version of the POSIX matching algorithm. Our definitions and proof are much simpler than those by Sulzmann and Lu and can be easily formalised in Isabelle/HOL. In the second part we analyse the correctness argument by Sulzmann and Lu and explain why the gaps in this argument cannot be filled easily.
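The "very simple regular expression matching algorithm" based on Brzozowski derivatives can be sketched in a few lines of Python: differentiate the expression by each input character in turn, then accept if the final expression is nullable (matches the empty string). The constructor names are assumptions, and no simplification is applied, so the intermediate derivatives may grow in size.

```python
from dataclasses import dataclass

# Regular expressions (hypothetical constructor names).
@dataclass(frozen=True)
class Zero: pass            # matches nothing
@dataclass(frozen=True)
class One: pass             # matches the empty string
@dataclass(frozen=True)
class Chr:                  # a single character
    c: str
@dataclass(frozen=True)
class Alt:                  # r1 + r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Seq:                  # r1 . r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Star:                 # r*
    r: object

def nullable(r) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (Zero, Chr)): return False
    if isinstance(r, (One, Star)): return True
    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
    return nullable(r.r1) and nullable(r.r2)          # Seq

def der(c: str, r):
    """Derivative: matches s exactly when r matches c followed by s."""
    if isinstance(r, (Zero, One)): return Zero()
    if isinstance(r, Chr): return One() if r.c == c else Zero()
    if isinstance(r, Alt): return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Star): return Seq(der(c, r.r), r)
    left = Seq(der(c, r.r1), r.r2)                    # Seq
    return Alt(left, der(c, r.r2)) if nullable(r.r1) else left

def matches(r, s: str) -> bool:
    """YES/NO matcher: take one derivative per character, then test nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)
```

For example, with (ab)* written as Star(Seq(Chr("a"), Chr("b"))), the sketch accepts "abab" and the empty string but rejects "aba".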

Efficient POSIX submatch extraction on nondeterministic finite automata

Borsotti; Trofimovich (2020), Softw Pract Exp

Summary. In this paper we study the performance of POSIX submatch extraction algorithms based on nondeterministic finite automata (NFA). We propose an algorithm that combines Laurikari tagged NFA and extended Okui‐Suzuki disambiguation. The algorithm works in worst‐case O(n m² t) time and O(m²) space (including preprocessing), where n is the length of input, m is the size of the regular expression with bounded repetition expanded and t is the number of capturing groups and subexpressions that contain them. On real‐world benchmarks our algorithm performs close to the O(n m t) complexity of leftmost‐greedy matching, although on artificial benchmarks it can be significantly slower. We propose a lazy version of the algorithm that runs much faster, but requires O(n m²) space. We show that the Kuklewicz algorithm is slower in practice, and the backward matching algorithm proposed by Cox is incorrect.
