Smooth Scan: Statistics-oblivious access paths

Borovica-Gajić, Renata; Idreos, Stratos; Ailamaki, Anastasia; Żukowski, Marcin; Fraser, Campbell

doi:10.1109/icde.2015.7113294

Cited by 16 publications

(9 citation statements)

References 29 publications

(34 reference statements)


“…Each query consists of one select-project-join block 4 . The join graph of the query is shown in Figure 2.…”

Section: The Job Queries (mentioning)

confidence: 99%

“…The query runtime heavily depends on how the system's optimizer uses the estimates and how much trust it puts into these numbers. A sophisticated engine may employ adaptive operators (e.g., [4,8]) and thus mitigate the impact of misestimations. The results do, however, demonstrate that the state-of-the-art in cardinality estimation is far from perfect.…”

Section: Estimates For Joins (mentioning)

confidence: 99%

“…The problem with fixed hash table sizes for PostgreSQL illustrates that cost misestimation can often be mitigated by making the runtime behavior of the query engine more "performance robust". This links to a body of work to make systems adaptive to estimation mistakes, e.g., dynamically switch sides in a join, or change between hashing and sorting (GJoin [15]), switch between sequential scan and index lookup (smooth scan [4]), adaptively reordering join pipelines during query execution [24], or change aggregation strategies at runtime depending on the actual number of group-by values [31] or partition-by values [3].…”

Section: Related Work (mentioning)

confidence: 99%


How good are query optimizers, really?

Leis, Gubichev, Mirchev et al. 2015

Proc. VLDB Endow.


Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality estimates.


“…On the contrary, our join processing does not need to rely on extra statistics as in zone-maps. Adaptivity during run-time regarding the decision of scanning a base relation or use a secondary index has been studied in [10,11] for disk-based systems.…”

Section: RDF and SPARQL (mentioning)

confidence: 99%

In-memory parallelization of join queries over large ontological hierarchies

Bilidas, Koubarakis 2020

Distrib Parallel Databases


The Resource Description Framework (RDF) data model enables the construction of knowledge graphs over various domains, using ontologies in order to encode information about the domain, and simple statements in the form of subject-predicate-object triples for data representation, facilitating the interlinking and exchange of Web data. However, this simplicity comes with the cost of having to execute a large number of joins in order to get the desirable query results, while at the same time large ontological hierarchies complicate the query answering process even more, for systems that provide complete answers with respect to such ontological axioms. In this work we present PARJ, an in-memory RDF store which takes into consideration ontological hierarchies during join processing with very low performance overhead, avoiding expensive preprocessing and materialization of implications, and is also amenable to straightforward parallelization. Specifically, we present a join implementation that allows to achieve any desired degree of parallelism on arbitrary join queries and RDF graphs stored in memory using compact vertical partitioning. We use an adaptive join processing approach, such that we take advantage of complete or even partial ordering of RDF data, which is compactly stored in order to increase spatial locality and keep memory consumption low, coupled with an ID-to-Position vector index used when ordering does not allow for efficient scanning of the input relation. Finally, we experimentally show the efficiency and scalability of our proposal.


“…F1 Query eliminates both of these cliffs from its implementation of sorting and aggregation. Successful examples of cliff avoidance or removal include SmoothScan [16] and dynamic destaging in hash joins [52]. Dynamic re-optimization would introduce a huge cliff if a single row "too many" will stop execution and re-start the compile-time optimizer.…”

Section: Robust Performance (mentioning)

confidence: 99%

F1 query

et al. 2018


F1 Query is a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats as well as different storage systems at Google (e.g., Bigtable, Spanner, Google Spreadsheets, etc.). F1 Query eliminates the need to maintain the traditional distinction between different types of data processing workloads by simultaneously supporting: (i) OLTP-style point queries that affect only a few records; (ii) low-latency OLAP querying of large amounts of data; and (iii) large ETL pipelines. F1 Query has also significantly reduced the need for developing hard-coded data processing pipelines by enabling declarative queries integrated with custom business logic. F1 Query satisfies key requirements that are highly desirable within Google: (i) it provides a unified view over data that is fragmented and distributed over multiple data sources; (ii) it leverages datacenter resources for performant query processing with high throughput and low latency; (iii) it provides high scalability for large data sizes by increasing computational parallelism; and (iv) it is extensible and uses innovative approaches to integrate complex business logic in declarative query processing. This paper presents the end-to-end design of F1 Query. Evolved out of F1, the distributed database originally built to manage Google's advertising data, F1 Query has been in production for multiple years at Google and serves the querying needs of a large number of users and systems.
