On the effect of join operations on relation sizes

Gardy, Danièle; Puech, Claude

doi:10.1145/76902.76907

Cited by 27 publications

(4 citation statements)

References 18 publications

(13 reference statements)

“…This formula assumes that the attributes values are uniformly distributed. For the attribute values of the skew distribution, Grady [27] and Haas [28] gave corresponding estimation formulas.…”

Section: B Sequential Optimization (mentioning)

confidence: 99%

Optimization for Multi-Join Queries on the GPU

Tang

2020

IEEE Access

Multi-join queries are important operations in data management systems and data integration systems, and their efficiency has attracted the attention of researchers. In recent years, graphics processing units (GPUs) have developed rapidly and become a powerful tool for parallel computing, providing a new idea for multi-join query optimization. This paper studies the use of GPU technology to optimize multi-join queries and focuses on two points: 1) a multi-phase optimization strategy and 2) optimization methods of each stage. For the first point, we discuss a two-phase optimization strategy on the GPU and prove the effectiveness of this strategy. For the second point, we provide an establishment method of a minimum cost join tree on the GPU, the parallel execution methods of intra-join and inter-join on the GPU, and a strategy of scheduling multiple joins to execute in parallel on the GPU. Experimental results show that the multi-join query optimization proposed in this paper improves the efficiency of multi-join queries, especially in the case of high load and complex join queries, achieving higher throughput than that of previous optimization algorithms.

“…Selectivity factors are statistical values stored in the data dictionary of the DBMS. Many research efforts tackled the problem of join size estimation [34][35][36]; Mannino et al give a survey on the suggested statistical values to store, how to maintain them, and how to use them to predict the result sizes of various database operations [37]. Most projects make the same simplifying assumptions as we do: uniformity of attribute values and independence of attribute values [38].…”

Section: Related Work On Completeness (mentioning)

confidence: 99%

Completeness of integrated information sources

Naumann

Freytag

Leser

2004

Information Systems

For many information domains there are numerous World Wide Web data sources. The sources vary both in their extension and their intension: They represent different real world entities with possible overlap and provide different attributes of these entities. Mediator-based information systems allow integrated access to such sources by providing a common schema against which the user can pose queries. Given a query, the mediator must determine which participating sources to access and how to integrate the incoming results. This article describes how to support mediators in their source selection and query planning process. We propose three new merge operators, which formalize the integration of multiple source responses. A completeness model describes the usefulness of a source to answer a query. The completeness measure incorporates both extensional value (called coverage) and intensional value (called density) of a source. We show how to determine the completeness of single sources and of combinations of sources under the new merge operators. Finally, we show how to use the measure for source selection and query planning.

“…The number of I/Os is a function of the access plan (e.g., nested loop join, merge join, hybrid join) chosen by the query optimizer of the DBMS to perform the select, of the existence of indexes and type of access method (e.g., b-tree, hashing) used in each table, the buffer size and buffer management policies (e.g., LRU), and of parameters such as page sizes, data and index page fill factors, and others. For relevant previous work on access plans, join processing, and query optimization see [2], [5], [29], [37], [38], [42], [44], [45]. Some of these papers describe the operation of access plans and others concentrate on estimating the resulting size of joins between relations, for various types of joins.…”

Section: Step 8-b: Db Modeling (mentioning)

confidence: 99%

A method for design and performance modeling of client/server systems

Menascé

Gomaa

2000

IEEE Trans. Software Eng.

Designing complex distributed client/server applications that meet performance requirements may prove extremely difficult in practice if software developers are not willing or do not have the time to help software performance analysts. This paper advocates the need to integrate both design and performance modeling activities so that one can help the other. We present a method developed and used by the authors in the design of a fairly large and complex client/server application. The method is based on a software performance engineering language developed by one of the authors. Use cases were developed and mapped to a performance modeling specification using the language. A compiler for the language generates an analytic performance model for the system. Service demand parameters at servers, storage boxes, and networks are derived by the compiler from the system specification. A detailed model of DBMS query optimizers allows the compiler to estimate the number of I/Os and CPU time for SQL statements. The paper concludes with some results of the application that prompted the development of the method and language.
