2020
DOI: 10.1007/978-3-030-63924-2_11

A Comparative Study of Join Algorithms in Spark

Cited by 4 publications
(5 citation statements)
References 15 publications
“…We evaluate the algorithms based on general cost models and experiments in Spark. This research extends our previous work [22]. The new contributions include a more complete and systematic presentation on the two-way join algorithms; and a comparative study on complexly recursive join algorithms using theory and empirical models in Spark.…”
Section: Introduction (mentioning)
confidence: 56%
“…Multiway joins with different entropy theories should be examined in the future. Besides, multiway join algorithms that considered data skewness in different distributed computing architectures such as Apache Spark [43] can be further studied on the basis of our research. Nonetheless, this study provides a novel method using MapReduce to achieve logically flexible partitions for join algorithms on Hadoop.…”
Section: Discussion (mentioning)
confidence: 99%
“…Experimental results demonstrated that the most effective and efficient distributed spatial join algorithm depends on the characteristics of the two input datasets; broadcast join is generally fastest when one of the datasets is modest in size (and only one is large) but cannot complete when both datasets are large. In [35], a comparative study of common join algorithms in MapReduce was provided. The join algorithms (map-side join, reduce-side join, broadcast join, bloom join and intersection bloom join) based on general cost model and experiments in Spark were evaluated.…”
Section: Spatial Analytics System (mentioning)
confidence: 99%
“…Table 2 shows the syntheses of the implementations directly on Apache Spark of distributed algorithms with sophisticated processing techniques for other spatial queries, not using the previous SASs. Generic framework using clustering methods; [28] In-memory partitioning and indexing system (SparkNN); SJQ: [33] Spatial Join with Spark (SJS), uniform grid partitioning; [34] Distributed join methods: Broadcast Join and Bin Join; [35] Comparative study of common join algorithms in Spark; TKSJQ: [36] Uniform grid partitioning and improved plane-sweeping; KNNJQ: [37] Locality-Sensitive Hashing (LSH) algorithm in Spark; MwSJQ: [38] Multiway Spatial Join algorithm in Spark (MSJS), using cascaded pairwise join technique; STSQ: [39] Spark-based spatio-textual skyline query alg. (Multi-PSS); KCPQ, DJQ: [40] SliceNBound (SnB), parent-child and common-merged strip partitioning and plane-sweep technique; [41] Strip-based partitioning and plane-sweep technique; [42] Binary Space Partitioning (BSP).…”
Section: Spatial Analytics System (mentioning)
confidence: 99%