Provenance for nested subqueries

Glavic, Boris; Alonso, Gustavo

doi:10.1145/1516360.1516472

Cited by 24 publications

(16 citation statements)

References 15 publications


“…This minimizes the main problem of the technique presented in [4], which was the huge number of tuples that the user must consider in order to determine the validity of the result produced by a relation. Previous works deal with the problem of tracking provenance information for query results [9,7], but to the best of our knowledge, none of them treat the case of missing tuples, which is important in our setting. The proposed algorithm looks for particular but common error sources, like tuples missed in the from section or in and conditions (that is, intersect components in our representation).…”

Section: Discussion (mentioning)

confidence: 99%

Declarative Debugging of Wrong and Missing Answers for SQL Views

Caballero

García-Ruiz

Sáenz-Pérez

2012

Functional and Logic Programming


Abstract. This paper presents a debugging technique for diagnosing errors in SQL views. The debugger allows the user to specify the error type, indicating if there is either a missing answer (a tuple was expected but it is not in the result) or a wrong answer (the result contains an unexpected tuple). This information is employed for slicing the associated queries, keeping only those parts that might be the cause of the error. The validity of the results produced by sliced queries is easier to determine, thus facilitating the location of the error. Although based on the ideas of declarative debugging, the proposed technique does not use computation trees explicitly. Instead, the logical relations among the nodes of the trees are represented by logical clauses that also contain the information extracted from the specific questions provided by the user. The atoms in the body of the clauses correspond to questions that the user must answer in order to detect an incorrect relation. The resulting logic program is executed by selecting at each step the unsolved atom that yields the simplest question, repeating the process until an erroneous relation is detected. Soundness and completeness results are provided. The theoretical ideas have been implemented in a working prototype included in the Datalog system DES.



“…We select 11 out of the 22 TPC-H queries to evaluate optimization of provenance capture for complex queries. The technique [31] we are using supports all TPC-H queries, but instrumentations for nested subqueries have not been implemented in GProM yet.…”

Section: TPC-H Queries (mentioning)

confidence: 99%

Heuristic and Cost-Based Optimization for Diverse Provenance Tasks

Niu

Kapoor

Glavic

et al. 2019

IEEE Trans. Knowl. Data Eng.

Self Cite


A well-established technique for capturing database provenance as annotations on data is to instrument queries to propagate such annotations. However, even sophisticated query optimizers often fail to produce efficient execution plans for instrumented queries. We develop provenance-aware optimization techniques to address this problem. Specifically, we study algebraic equivalences targeted at instrumented queries and alternative ways of instrumenting queries for provenance capture. Furthermore, we present an extensible heuristic and cost-based optimization framework utilizing these optimizations. Our experiments confirm that these optimizations are highly effective, improving performance by several orders of magnitude for diverse provenance tasks.

INTRODUCTION

Database provenance, information about the origin of data and the queries and/or updates that produced it, is critical for debugging queries, auditing, establishing trust in data, and many other use cases. The de facto standard for database provenance [1], [2] is to model provenance as annotations on data and define a query semantics that determines how annotations propagate. Under such a semantics, each output tuple t of a query Q is annotated with its provenance, i.e., a combination of input tuple annotations that explains how these inputs were used by Q to derive t.

Database provenance systems such as Perm [3], GProM [4], DBNotes [5], LogicBlox [2], declarative Datalog debugging [6], ExSPAN [7], and many others use a relational encoding of provenance annotations. These systems typically compile queries with annotated semantics into relational queries that produce this encoding of provenance annotations following the process outlined in Fig. 23a. We refer to this reduction from annotated to standard relational semantics as provenance instrumentation or instrumentation for short. The example below introduces a relational encoding of provenance polynomials [1] and the instrumentation approach for this model implemented in Perm [3].

Example 1. Consider a query over the database in Fig. 1 returning shops that sell items which cost more than $20:

The query's result is shown in Fig. 1d. Using provenance ...

The instrumentation we are using here is defined for any SPJ (Select-Project-Join) query (and beyond) based on a set of algebraic rewrite rules (see [3] for details).

The present paper extends [8]. Additional details are presented in the appendix.

Instrumentation Pipelines

In this work, we focus on optimizing instrumentation pipelines such as the one from Example 1. These pipelines divide the compilation of a frontend language to a target language into multiple compilation steps using one or more intermediate languages. We now introduce a subset of the pipelines supported by our approach to illustrate the breadth of applications supported by instrumentation. Our approach can be applied to any data management task that can be expressed as instrumentation. Notably, our implementation already supports additional pipelines, e.g., for summarizing provenance and managing ...
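The instrumentation idea in the excerpt above can be illustrated with a minimal sketch. The query and figures of Example 1 are not reproduced in this excerpt, so everything concrete here is an assumption made for illustration: the three-table schema (`shop`, `item`, `sale`), the sample rows, and the `prov_*` column names are hypothetical, and the rewrite is written by hand in the spirit of a relational provenance encoding rather than generated by Perm or GProM.

```python
import sqlite3

# Hypothetical schema and data (not taken from the cited paper's Fig. 1).
def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shop (name TEXT PRIMARY KEY);
        CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER);
        CREATE TABLE sale (shop_name TEXT, item_id INTEGER);
        INSERT INTO shop VALUES ('Alpha'), ('Beta');
        INSERT INTO item VALUES (1, 10), (2, 25), (3, 40);
        INSERT INTO sale VALUES ('Alpha', 1), ('Alpha', 2),
                                ('Alpha', 3), ('Beta', 1);
    """)
    return conn

# Original query: shops that sell at least one item costing more than 20.
ORIGINAL = """
SELECT DISTINCT s.name
FROM shop s
JOIN sale sa ON sa.shop_name = s.name
JOIN item i  ON i.id = sa.item_id
WHERE i.price > 20
ORDER BY s.name
"""

# Hand-written instrumented variant (an assumption, not a tool's output):
# duplicate elimination is dropped and the attributes of every contributing
# input tuple are carried along under renamed prov_* columns, so each
# result row appears once per combination of inputs that derives it.
INSTRUMENTED = """
SELECT s.name,
       s.name       AS prov_shop_name,
       sa.shop_name AS prov_sale_shop_name,
       sa.item_id   AS prov_sale_item_id,
       i.id         AS prov_item_id,
       i.price      AS prov_item_price
FROM shop s
JOIN sale sa ON sa.shop_name = s.name
JOIN item i  ON i.id = sa.item_id
WHERE i.price > 20
ORDER BY s.name, i.id
"""

conn = build_db()
original_rows = conn.execute(ORIGINAL).fetchall()
provenance_rows = conn.execute(INSTRUMENTED).fetchall()

for row in provenance_rows:
    print(row)
```

In this sketch, projecting the instrumented result onto its first column and removing duplicates gives back the original answer, while the remaining columns identify which input rows were combined to produce each answer. Real instrumentation systems derive such rewrites automatically from algebraic rules; the hand-written form is only meant to make the encoding concrete.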


“…The closest to our work is the Perm System [81][84] [85][86] that extends the PostgreSQL DBMS and rewrites queries to obtain a provenance query that determines the source data. The reduction rules described in this paper also compute the source data for the coverage rules, but the implementation does not depend on a given DBMS and adds additional information needed to allow the reduction procedures to select a subset of the source data.…”

Section: Testing Database Applications (mentioning)

confidence: 99%

Coverage-Aware Test Database Reduction

Tuya

Riva

Suárez-Cabal

et al. 2016

IEEE Trans. Software Eng.


Abstract. Functional testing of applications that process the information stored in databases often requires a careful design of the test database. The larger the test database, the more difficult it is to develop and maintain tests as well as to load and reset the test data. This paper presents an approach to reduce a database with respect to a set of SQL queries and a coverage criterion. The reduction procedures search the rows in the initial database that contribute to the coverage in order to find a representative subset that satisfies the same coverage as the initial database. The approach is automated and efficiently executed against large databases and complex queries. The evaluation is carried out over two real life applications and a well-known database benchmark. The results show a very large degree of reduction as well as scalability in relation to the size of the initial database and the time needed to perform the reduction.

