Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems 2018
DOI: 10.1145/3196959.3196967

Joining Extractions of Regular Expressions

Abstract: Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relat…
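
To make the abstract's setting concrete, here is a minimal Python sketch of span relations and a join over them. It is a brute-force illustration, not the paper's algorithm. The helper names (`spans`, `adjacent_pairs`, `join_on_y`), the sample document, and the use of half-open 0-based spans `(i, j)` are all assumptions made for this example.

```python
import re

def spans(doc, pattern):
    """All half-open spans (i, j) whose substring doc[i:j] fully matches pattern.

    Roughly the relation extracted by a formula of the shape
    "anything, then x{pattern}, then anything" (hypothetical helper).
    """
    rx = re.compile(pattern)
    n = len(doc)
    return {
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if rx.fullmatch(doc, i, j)
    }

def adjacent_pairs(doc, px, py):
    """Binary relation R(x, y): x matches px, y matches py, y starts where x ends."""
    return {
        (x, y)
        for x in spans(doc, px)
        for y in spans(doc, py)
        if x[1] == y[0]
    }

def join_on_y(r, s):
    """Natural join of R(x, y) with a unary relation S(y) on the shared variable y."""
    return {(x, y) for (x, y) in r if y in s}

doc = "aabb"                          # assumed sample document
r = adjacent_pairs(doc, "a+", "b+")   # R(x, y): a run of a's followed by a run of b's
s = spans(doc, "bb")                  # S(y): spans containing exactly "bb"
result = join_on_y(r, s)              # conjunctive query R(x, y) AND S(y)
print(sorted(result))
```

The join keeps only tuples whose `y` span is the same interval in both relations, which is how shared variables behave in a conjunctive query over spans. Enumerating every span pair is quadratic or worse in the document length, so this sketch only illustrates the semantics.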

Cited by 24 publications (15 citation statements)
References 36 publications (63 reference statements)
“…A new aspect that has not been considered in formal language theory is that of enumerating all query results (i.e., span-tuples). This has been considered in [12,9,2] and it is a major result that constant delay enumeration is possible after linear preprocessing (even if the spanners are given by non-deterministic automata); see especially the survey [3]. The algorithmic approach is to construct the product graph of the automaton that represents the spanner (e.g., the one of Figure 2) and the input document (treated as a path).…”
Section: Regular Spanner Evaluation
mentioning
confidence: 99%
“…Clearly, ≺ μ can be computed in polynomial time from μ. This approach was used by Freydenberger, Kimelfeld, and Peterfreund [17] to develop a polynomial delay algorithm for regular spanners.…”
Section: Vstk-automata
mentioning
confidence: 99%
“…For the formal construction, we use the following observation. Freydenberger et al. [12] observed that we can partition the state set Q_P of P into three sets Q have closed all variables. Using these subsets, we can deduce from the current state of P whether some variable is open or not, and if all variables have been closed.…”
mentioning
confidence: 99%
“…Freydenberger et al. [12, Lemma 3.4] showed that every regex formula can be transformed into an equivalent functional VSet-automaton in linear time. This means that we only need to prove the Theorem for C = VSA.…”
mentioning
confidence: 99%