We study two succinct representation systems for relational data based on relational algebra expressions with unions, Cartesian products, and singleton relations: f-representations, which employ algebraic factorisation using distributivity of product over union, and d-representations, which are f-representations where further succinctness is brought by explicit sharing of repeated subexpressions. In particular we study such representations for results of conjunctive queries. We derive tight asymptotic bounds for representation sizes and present algorithms to compute representations within these bounds. We compare the succinctness of f-representations and d-representations for results of equi-join queries, and relate them to fractional edge covers and fractional hypertree decompositions of the query hypergraph. Recent work showed that f-representations can significantly boost the performance of query evaluation in centralised and distributed settings and of machine learning tasks. ACM Reference Format: Dan Olteanu and Jakub Závodný. 2015. Size bounds for factorised representations of query results. ACM Trans.
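To make the idea of factorisation by distributivity concrete, the following minimal sketch compares a flat and a factorised expression for the same small relation. The nested-tuple encoding and the function names `size` and `tuples` are assumptions made for illustration, not the paper's data structures, and d-representations (sharing of subexpressions) are not modelled.

```python
from itertools import product

# Hypothetical encoding (an assumption for illustration, not from the paper):
#   ("val", attribute, value)   a singleton relation
#   ("union", [e1, e2, ...])    union of expressions over the same schema
#   ("prod", [e1, e2, ...])     Cartesian product over disjoint schemas

def size(e):
    """Number of singletons occurring in the expression."""
    if e[0] == "val":
        return 1
    return sum(size(child) for child in e[1])

def tuples(e):
    """Enumerate the represented relation as a set of frozensets of (attribute, value)."""
    if e[0] == "val":
        return {frozenset([(e[1], e[2])])}
    if e[0] == "union":
        return set().union(*(tuples(child) for child in e[1]))
    # product: combine one tuple from each factor
    factors = [tuples(child) for child in e[1]]
    return {frozenset().union(*combo) for combo in product(*factors)}

A_VALUES, B_VALUES = (1, 2), (1, 2, 3)

# Flat form: a union of one product per tuple.
flat = ("union", [("prod", [("val", "A", a), ("val", "B", b)])
                  for a in A_VALUES for b in B_VALUES])

# Factorised form: (A=1 ∪ A=2) × (B=1 ∪ B=2 ∪ B=3).
fact = ("prod", [("union", [("val", "A", a) for a in A_VALUES]),
                 ("union", [("val", "B", b) for b in B_VALUES])])

print(size(flat), size(fact))  # 12 5
```

Both expressions represent the same six tuples, but the factorised one uses 5 singletons where the flat one uses 12.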
This paper settles the optimality of sorting networks given in The Art of Computer Programming vol. 3 more than 40 years ago. The book lists efficient sorting networks with n ≤ 16 inputs. In this paper we give general combinatorial arguments showing that if a sorting network with a given depth exists then there exists one with a special form. We then construct propositional formulas whose satisfiability is necessary for the existence of such a network. Using a SAT solver we conclude that the listed networks have optimal depth. For n ≤ 10 inputs where optimality was known previously, our algorithm is four orders of magnitude faster than those in prior work.
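As background for the notions of sorting network and depth used above, the sketch below checks by brute force whether a comparator network sorts, using the zero-one principle (a network sorts all inputs if it sorts all binary inputs). This is not the SAT-based method of the paper; the layer encoding and the names `apply_network`, `is_sorting_network`, and `NETWORK_4` are assumptions for illustration.

```python
from itertools import product

def apply_network(layers, values):
    """Run a comparator network given as a list of layers of (i, j) pairs with i < j."""
    v = list(values)
    for layer in layers:
        for i, j in layer:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

def is_sorting_network(layers, n):
    """Zero-one principle: testing all 2**n binary inputs is sufficient."""
    return all(
        apply_network(layers, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )

# A standard 4-input network with 5 comparators arranged in 3 layers.
NETWORK_4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

print(is_sorting_network(NETWORK_4, 4))      # True
print(is_sorting_network(NETWORK_4[:2], 4))  # False: the last layer is needed
```

The depth of a network in this encoding is simply `len(layers)`. Exhaustive checking of this kind grows as 2**n per candidate network, which is one reason a search over all networks of a given depth needs stronger techniques.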
A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on the main-memory query engine FDB for select-project-join queries on such databases. In this paper, we extend FDB to support a larger class of practical queries with aggregates and ordering. This requires novel optimisation and evaluation techniques. We show how factorisation coupled with partial aggregation can effectively reduce the number of operations needed for query evaluation. We also show how factorisations of query results can support enumeration of tuples in desired orders as efficiently as listing them from the unfactorised, sorted results. We experimentally observe that FDB can outperform off-the-shelf relational engines by orders of magnitude.
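One way to see why aggregation can be cheaper on a factorisation is that COUNT and SUM can be pushed through unions and products without enumerating tuples. The sketch below is an illustration under assumptions, not FDB's implementation: the nested-tuple encoding, the function `count_and_sum`, and the example data are hypothetical, and the branches of a union are assumed to represent disjoint relations.

```python
from math import prod

# Hypothetical encoding (an assumption for illustration):
#   ("val", attribute, value), ("union", [children]), ("prod", [children])

def count_and_sum(e, attr):
    """Return (number of tuples, sum of numeric `attr` over all tuples)
    without enumerating the tuples. Assumes union branches are disjoint."""
    kind = e[0]
    if kind == "val":
        _, a, v = e
        return 1, (v if a == attr else 0)
    parts = [count_and_sum(child, attr) for child in e[1]]
    if kind == "union":
        return sum(c for c, _ in parts), sum(s for _, s in parts)
    # product: counts multiply; each factor's partial sum is repeated once
    # per combination of tuples from the other factors
    total = prod(c for c, _ in parts)
    if total == 0:
        return 0, 0
    return total, sum(s * (total // c) for c, s in parts)

# (customer=ann ∪ customer=bob) × (price=5 ∪ price=7 ∪ price=9)
orders = ("prod", [
    ("union", [("val", "customer", "ann"), ("val", "customer", "bob")]),
    ("union", [("val", "price", 5), ("val", "price", 7), ("val", "price", 9)]),
])

print(count_and_sum(orders, "price"))  # (6, 42)
```

The computation visits each of the 5 singletons once, whereas the flat relation has 6 tuples; the gap grows with the number and size of the factors.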
FDB is an in-memory query engine for factorised databases, which are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance. We demonstrate FDB using real data sets from IMDB, DBLP, and the NELL repository of facts learned from Web pages. The users can inspect factorisations as well as plans used by FDB to compute factorised results of select-project-join queries on factorised databases.
FACTORISED DATABASES
The thesis underlying factorised databases is that relational databases can admit compact representations by algebraic factorisation using distributivity of product over union. This is similar in spirit to the relationship between logic functions in disjunctive normal form and their equivalent nested forms obtained by algebraic factorisation. In earlier work [7] we give a complete characterisation of the compactness of factorised results for select-project-join queries on relational databases and show that the gap between the sizes of query results and of their factorised representations can be exponential. In particular, there are arbitrarily large queries for which the query results have sizes exponential in the query size yet their factorised representations only have sizes bounded by the input database size. A similar exponential gap holds between the times needed to compute from the input relational database the query results and their factorised representations. Furthermore, the succinctness and performance gaps widen when we consider factorised databases as input. Experiments with our in-memory engine FDB for select-project-join queries on factorised databases show that FDB can be up to six orders of magnitude faster than relational engines such as PostgreSQL, SQLite, and a home-bred in-memory relational engine, for a wide range of queries on data sets with many-to-many relationships [3].
Factorised databases have applications beyond relational query evaluation. Factorised provenance polynomials are used as compact encoding of provenance [6] and for efficient query evaluation in probabilistic databases [8]. Factorised representations are a natural fit whenever we deal with a large space of possibilities and can be used to represent, e.g., AND/OR trees used in design specification [5] and world-set decompositions used for incomplete information [1]. They can also be used to compactly represent the space of feasible solutions to configuration problems in constraint satisfaction, where we need to connect a set of components so as to meet an objective while respecting given constraints [2].
The focus of this demonstration is our query engine FDB. The audience will experiment with FDB on several data sets including the NELL knowledge base learned from a large corpus of Web pages [4], will explore visually FDB evaluation plans as well as factorised intermediate and final query results, and will compare the time and space requirements of FDB to those of PostgreSQL and SQLite. FDB will be introduced to the audience by exa...
Query tractability has traditionally been defined as a function of input database and query sizes, or of both input and output sizes, where the query result is represented as a bag of tuples. In this report, we introduce a framework that allows us to investigate tractability beyond this setting. The key insight is that, although the cardinality of a query result can be exponential, its structure can be very regular and thus factorisable into a nested representation whose size is only polynomial in the size of both the input database and query. For a given query result, there may be several equivalent representations, and we quantify the regularity of the result by its readability, which is the minimum over all its representations of the maximum number of occurrences of any tuple in that representation. We give a characterisation of select-project-join queries based on the bounds on readability of their results for any input database. We complement it with an algorithm that can find asymptotically optimal upper bounds and corresponding factorised representations.
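The inner quantity in the definition of readability, the maximum number of occurrences within one fixed representation, can be sketched as follows. The nested-tuple encoding and the names `singleton_counts` and `max_occurrences` are assumptions for illustration, and counting occurrences per singleton value is an assumption about how occurrences are measured. The sketch does not search over all equivalent representations, so it yields only an upper bound on readability.

```python
from collections import Counter

# Hypothetical encoding (an assumption for illustration):
#   ("val", attribute, value), ("union", [children]), ("prod", [children])

def singleton_counts(e):
    """Count how often each (attribute, value) singleton occurs in the expression."""
    if e[0] == "val":
        return Counter([(e[1], e[2])])
    total = Counter()
    for child in e[1]:
        total += singleton_counts(child)
    return total

def max_occurrences(e):
    """Maximum number of occurrences of any singleton in this one representation."""
    return max(singleton_counts(e).values())

# The relation {1, 2} × {1, 2} over attributes A and B, in two equivalent forms.
flat = ("union", [("prod", [("val", "A", a), ("val", "B", b)])
                  for a in (1, 2) for b in (1, 2)])
fact = ("prod", [("union", [("val", "A", 1), ("val", "A", 2)]),
                 ("union", [("val", "B", 1), ("val", "B", 2)])])

print(max_occurrences(flat), max_occurrences(fact))  # 2 1
```

In the flat form every value is repeated once per tuple it participates in, while the factorised form mentions each value exactly once, so this relation admits a representation with maximum occurrence count 1.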