Join operations in temporal databases

Gao, Dengfeng; Jensen, Christian S.; Snodgrass, Richard T.; Soo, Michael D.

doi:10.1007/s00778-003-0111-3

Cited by 77 publications

(51 citation statements)

References 37 publications


“…Such applications require to join large relations using inequalities only, such as in temporal and spatial databases, and data cleaning applications. For example, in data analysis in a temporal database, one may want to find all employees and managers that overlapped while working in a certain company [12]. In data cleaning, when detecting violations based on denial constraints, one may want to find all pairs of tuples such that one individual (represented in the tuple) pays more taxes but earns less than another individual [7].…”

Section: Figure 1: East-coast and West-coast Transactions (mentioning)

confidence: 99%
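As a rough illustration of the employee/manager overlap query mentioned in the statement above, the sketch below joins two lists of employment periods using inequalities only. The `Period` type, its field names, the half-open `[start, end)` convention, and the `overlap_join` helper are assumptions made for this example and do not come from the cited papers. It is the naive nested-loop form, i.e. a Cartesian product followed by a filter, not an optimized algorithm.

```python
from typing import List, NamedTuple, Tuple


class Period(NamedTuple):
    # Hypothetical record layout: a person and a half-open period [start, end).
    name: str
    start: int  # first day of the period (inclusive)
    end: int    # day the period ended (exclusive)


def overlap_join(employees: List[Period], managers: List[Period]) -> List[Tuple[str, str]]:
    """Pair every employee with every manager whose period overlaps theirs.

    Two half-open periods overlap exactly when each one starts before the
    other ends, so the join condition consists of two inequalities.
    """
    return [
        (e.name, m.name)
        for e in employees
        for m in managers
        if e.start < m.end and m.start < e.end
    ]


employees = [Period("ann", 1, 5), Period("bo", 5, 9)]
managers = [Period("max", 4, 6), Period("zed", 9, 12)]
pairs = overlap_join(employees, managers)
```

Every employee is compared with every manager, so the cost grows with the product of the two input sizes, which is the behaviour the citing paper sets out to avoid.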

“…It also sets an offset variable to distinguish inequality operators with or without equality (lines 9-10). It then visits the values in L2 in the desired order, which is to sequentially scan the permutation array from left to right (lines 11-16). For each tuple visited in L2, it needs to find all tuples whose X values satisfy the join condition.…”

Section: Ieselfjoin (mentioning)

confidence: 99%
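The scan described in the statement above can be approximated by the following sketch of a two-condition inequality self-join. It is a simplified reading under stated assumptions, not the citing paper's algorithm: the function name `inequality_self_join`, the `(x, y)` tuple layout, and the fixed predicate `a.x > b.x AND a.y < b.y` are made up for illustration; a plain Python list of booleans stands in for the bit array; and `bisect_right` plus grouping on equal `y` values play the role of the offset that separates strict from non-strict operators.

```python
from bisect import bisect_right
from itertools import groupby


def inequality_self_join(rows):
    """Return index pairs (a, b) with rows[a].x > rows[b].x and rows[a].y < rows[b].y.

    Each row is an (x, y) tuple. All names here are hypothetical.
    """
    n = len(rows)
    # Row ids sorted by x, the sorted x values, and each row's position there
    # (the position lookup acts as the permutation array).
    by_x = sorted(range(n), key=lambda i: rows[i][0])
    xs = [rows[i][0] for i in by_x]
    pos_in_x = [0] * n
    for p, i in enumerate(by_x):
        pos_in_x[i] = p
    # Row ids in ascending y: the order in which rows are visited.
    by_y = sorted(range(n), key=lambda i: rows[i][1])

    visited = [False] * n  # bit array indexed by position in the x order
    out = []
    for _, group in groupby(by_y, key=lambda i: rows[i][1]):
        group = list(group)
        for b in group:
            # Bits set so far belong to rows with strictly smaller y.
            # Positions from `start` onward hold strictly larger x values.
            start = bisect_right(xs, rows[b][0])
            for p in range(start, n):
                if visited[p]:
                    out.append((by_x[p], b))
        # Mark the whole group only after querying it, so ties on y never match.
        for b in group:
            visited[pos_in_x[b]] = True
    return out


rows = [(3, 1), (1, 2), (2, 3), (3, 3)]
result = inequality_self_join(rows)
```

The inner loop still scans the bit array linearly, so this version shows the data layout rather than the performance of the published technique.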


Lightning fast and space efficient inequality joins

Khayyat

Lucia

Singh

et al. 2015

Proc. VLDB Endow.


Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there have been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join, to various indices such as B+-tree, R*-tree and Bitmap, inequality joins have received little attention and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.

1. ONCE UPON A TIME . . .

Bob, a data analyst working for an international provider of cloud services, wanted to analyze revenue and utilization trends from different regions. In particular, he wanted to find out all those transactions from the West-Coast that last longer and produce smaller revenues than any transaction in the East-Coast. In other words, he was looking for any customer from the West-Coast who rented a virtual machine for more hours than any customer from the East-Coast, but who paid less. Figure 1 illustrates a data instance for both tables. He wrote the following join query for such a task:

[Figure 1: East-Coast and West-Coast transactions]

Bob first ran Qt over 200K transactions on the distributed system storing the data (System-X). Given that the input dataset is ∼1GB, he expected to have his answer in a minute or so. However, he waited for more than three hours without seeing any result. He immediately thought that this problem comes from System-X and killed the query. He then used an open-source DBMS-X to run his query. Although join is by far the most important and most studied operator in the relational algebra [1], Bob had to wait for over two hours until DBMS-X returned the results. He found that Qt is processed by DBMS-X as a Cartesian product followed by a selection predicate, which is problematic due to the huge number of unnecessary intermediate results.

In the meantime, Bob heard that a big DBMS vendor was in town to highlight the power of their recently released distributed DBMS to process big data (DBMS-Y). So he visited them with a small (few KBs) dataset sample of the tables to run Qt. Surprisingly, DBMS-Y could not run Qt for even that small sample! He spent 45 minutes waiting while one of the DBMS-Y experts was trying to solve the issue. Bob left the query running and the vendor never contacted him again. In fact, DBMS-Y is using underneath the same open-source DBMS-X that Bob tried before. He t...


“…To ensure that our transformations were correct, we compared the result of evaluating each nontemporal query on a timeslice of the temporal database on each day with the result of a timeslice on that day of the result of both transformations of the temporal version of the query on the temporal database, termed commutativity [23]. We also ensured that the results of maximal slicing and per-statement slices were equivalent, and were also equivalent to the union of slices produced by their nontemporal variant.…”

Section: B Experiments (mentioning)

confidence: 99%
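The commutativity check described in the statement above can be illustrated with a small sketch: for each day, the timeslice of the temporal query's result should equal the nontemporal query evaluated on the timeslice of the database. Everything here is an assumption made for illustration and not the benchmark from the cited paper: the tuple layout `(fact, start, end)` with half-open periods, the toy salary filter, and the function names `timeslice`, `snapshot_query`, `sequenced_query`, and `commutes`.

```python
def timeslice(relation, day):
    """Snapshot at `day`: the facts whose half-open period [start, end) contains it."""
    return {fact for fact, start, end in relation if start <= day < end}


def snapshot_query(facts):
    """Toy nontemporal query: keep (name, salary) facts with salary above 50."""
    return {fact for fact in facts if fact[1] > 50}


def sequenced_query(relation):
    """Toy temporal counterpart: the same filter, with each tuple's period preserved."""
    return [(fact, start, end) for fact, start, end in relation if fact[1] > 50]


def commutes(relation, days, nontemporal, temporal):
    """True if slicing then querying equals querying then slicing on every day."""
    return all(
        timeslice(temporal(relation), day) == nontemporal(timeslice(relation, day))
        for day in days
    )


relation = [
    (("ann", 40), 1, 5),
    (("ann", 60), 5, 10),
    (("bo", 70), 3, 8),
]
ok = commutes(relation, range(0, 12), snapshot_query, sequenced_query)
```

A temporal query that alters periods, for example by shifting every start day, would fail this check on at least one day, which is the kind of transformation error such a test is meant to expose.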

Temporal Support for Persistent Stored Modules

Snodgrass

Gao

Zhang

et al. 2012

2012 IEEE 28th International Conference on Data Engineering

Self Cite


Abstract: We show how to extend temporal support of SQL to the Turing-complete portion of SQL, that of persistent stored modules (PSM). Our approach requires minor new syntax beyond that already in SQL/Temporal to define and to invoke PSM procedures and functions, thereby extending the current, sequenced, and non-sequenced semantics of queries to such routines. Temporal upward compatibility (existing applications work as before when one or more tables are rendered temporal) is ensured. We provide a transformation that converts Temporal SQL/PSM to conventional SQL/PSM. To support sequenced evaluation of stored functions and procedures, we define two different slicing approaches, maximal slicing and per-statement slicing. We compare these approaches empirically using a comprehensive benchmark and provide a heuristic for choosing between them.


“…Since Snodgrass' definition of the temporal data model [14], there has been a large body of work in this area, summarized in [12,4]. This related work covers proposals for index structures (e.g., multi-version B-trees [1]) and algorithms for certain kinds of queries (e.g., temporal aggregation [10,2] and temporal joins [4,15]). In most related work the focus was on disk-based structures, optimizing for I/O behavior.…”

Section: Introduction (mentioning)

confidence: 99%

Storing and processing temporal data in a main memory column store

Kaufmann

Kossmann

2013

Proc. VLDB Endow.


Managing and accessing temporal data is of increasing importance in industry. So far, most companies model the time dimension on the application layer rather than pushing down the operators to the database, which leads to a significant performance overhead. The goal of this PhD thesis is to develop a native support of temporal features for SAP HANA, which is a commercial in-memory column store database system. We investigate different alternatives to store temporal data physically and analyze the tradeoffs arising from different memory layouts which cluster the data either by time or by space dimension. Taking into account the underlying physical representation, different temporal operators such as temporal aggregation, time travel and temporal join have to be executed efficiently. We present a novel data structure called Timeline Index and algorithms based on this index, which have a very competitive performance for all temporal operators beating existing best-of-breed approaches by factors, sometimes even by orders of magnitude. The results of this thesis are currently being integrated into HANA, with the goal of being shipped to the customers as a productive release within the next few months.
