2014
DOI: 10.14778/2735496.2735499

Memory-efficient hash joins

Abstract: We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has 100% fill factor, and uses a sparse bitmap with embedded population counts to almost entirely avoid collisions. This bitmap also serves as a Bloom filter for use in multi-table joins. W…
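
To make the abstract's description concrete, the following is a minimal sketch of the idea, not the paper's implementation. It assumes unique build keys, a bitmap of 8 bits per key, 32-bit bitmap words, and unbounded linear probing over bit positions; the class name, method names, and those parameters are all illustrative assumptions. Each key claims one bit in a sparse bitmap, and its position in the dense array is the number of set bits before it, computed from a per-word prefix population count.

```python
class ConciseHashTableSketch:
    """Illustrative only: sparse bitmap + prefix popcounts over a dense array."""

    WORD_BITS = 32      # assumed bitmap word width
    BITS_PER_KEY = 8    # assumed bitmap sparsity (hypothetical choice)

    def __init__(self, items):
        items = list(items)  # (key, payload) pairs; keys assumed unique
        n = len(items)
        w = self.WORD_BITS
        num_words = max(1, (n * self.BITS_PER_KEY + w - 1) // w)
        self.num_bits = num_words * w
        self.words = [0] * num_words

        # Pass 1: each key claims the first free bit at or after its hash position.
        placed = []
        for key, payload in items:
            pos = self._home(key)
            while self._is_set(pos):
                pos = (pos + 1) % self.num_bits
            self.words[pos // w] |= 1 << (pos % w)
            placed.append((pos, key, payload))

        # Prefix population counts: number of set bits before each word.
        self.prefix = []
        running = 0
        for word in self.words:
            self.prefix.append(running)
            running += bin(word).count("1")

        # Pass 2: the dense array has exactly n slots, so it is 100% full.
        self.slots = [None] * n
        for pos, key, payload in placed:
            self.slots[self._rank(pos)] = (key, payload)

    def _home(self, key):
        return hash(key) % self.num_bits

    def _is_set(self, pos):
        return (self.words[pos // self.WORD_BITS] >> (pos % self.WORD_BITS)) & 1

    def _rank(self, pos):
        # Set bits strictly before pos = dense array index of the entry at pos.
        word, offset = divmod(pos, self.WORD_BITS)
        below = self.words[word] & ((1 << offset) - 1)
        return self.prefix[word] + bin(below).count("1")

    def get(self, key):
        pos = self._home(key)
        while self._is_set(pos):
            stored_key, payload = self.slots[self._rank(pos)]
            if stored_key == key:
                return payload
            pos = (pos + 1) % self.num_bits
        return None  # an unset bit means the key is definitely absent
```

In this sketch a probe for an absent key usually stops at the first bitmap test, because most bits are unset. That is the sense in which the bitmap can double as a membership filter ahead of the dense array.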

Cited by 67 publications (35 citation statements)
References: 20 publications
“…Hash table lookup throughput is the main bottleneck of the join operation, and its performance strictly depends on the number of dependent memory accesses (i.e., number of pointers chased) required to locate an item. A lookup in the hash table can result in an arbitrary number of memory accesses as state-of-the-art hash tables offer a tradeoff between performance (i.e., number of chained memory accesses) and space efficiency [4,6,7]. Moreover, when the build relation keys follow a skewed value distribution, hash collisions are unavoidable as some build keys are identical but carry different payloads.…”
Section: Hash Tables (mentioning)
confidence: 99%
“…The discovered solution applies a hash-join bloom filter in the HSJOIN (#2). A bloom filter is a space-efficient, probabilistic data structure to test whether an element is a member of a set by hashing the values and performing a bit comparison between them [3]. False positives can occur; however, false negatives never occur.…”
Section: Learning Engine (mentioning)
confidence: 99%
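
The Bloom filter behaviour described in the statement above can be sketched as follows. This is an illustrative example only, not the HSJOIN implementation from the citing paper; the class name, the default sizes, and the use of SHA-256 with double hashing to derive bit positions are all assumptions made for the sketch.

```python
import hashlib


class BloomFilterSketch:
    """Illustrative Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # assumed size, not tuned
        self.num_hashes = num_hashes  # assumed number of hash functions
        self.bits = 0                 # a Python int used as a bit array

    def _positions(self, value):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def prefilter_probe_keys(build_keys, probe_keys):
    """Drop probe keys that cannot match any build key before the join proper."""
    bloom = BloomFilterSketch()
    for key in build_keys:
        bloom.add(key)
    return [key for key in probe_keys if bloom.might_contain(key)]
```

Because a negative answer is always correct, every probe key that truly matches a build key survives the prefilter. Some non-matching keys may survive as false positives, and the join itself rejects those.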
“…As the acronym suggests, the language is able to retrieve data stored in the RDF format. A SPARQL query consists of a set of triple patterns similar to RDF triples. In the query, each of the subject, predicate, and object may be a variable.…”
Section: Matching Engine (mentioning)
confidence: 99%
“…Wildfire also uses non-partitioned hash joins and Concise Hash Tables, as described in [5]. In addition to column scans, hash joins, and inserts, Wildfire has support for many other evaluators, such as hash-based group by, predicate evaluation, expression evaluation, and updates of in-memory (non-persistent) indexes.…”
Section: Wildfire Engine: Storage and Processing (mentioning)
confidence: 99%