Published: 2015
DOI: 10.14257/ijdta.2015.8.3.06

An Efficient Parallel Top-k Similarity Join for Massive Multidimensional Data Using Spark


Cited by 10 publications (6 citation statements)
References 19 publications

“…For the distributed in-memory framework, i.e., Spark, some work [17]–[19] has been performed on similarity join. Chen et al. [17] proposed an approximate similarity join method using a locality-sensitive hashing (LSH)-based distance function. Sun et al. [18] proposed a similarity-based query processing system called Dima.…”
Section: Related Work
confidence: 99%
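
The excerpt above credits Chen et al. [17] with an approximate similarity join built on an LSH-based distance function. As a rough, single-machine illustration of that general idea only (not the authors' actual algorithm), the Python sketch below buckets vectors by a sign-random-projection signature and restricts candidate pairs to colliding buckets; the function names, bit width, and data are all assumptions made for the example.

import numpy as np
from collections import defaultdict
from itertools import combinations

def simhash_signature(vec, planes):
    # Sign-random-projection LSH: one bit per random hyperplane.
    return tuple(bool(b) for b in (vec @ planes.T) > 0)

def approximate_similarity_join_candidates(vectors, num_bits=8, seed=0):
    # Group vectors by signature and only pair up vectors that collide,
    # instead of scoring all O(n^2) pairs.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_bits, vectors.shape[1]))
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[simhash_signature(vec, planes)].append(idx)
    candidates = []
    for members in buckets.values():
        candidates.extend(combinations(members, 2))
    return candidates  # likely-similar pairs; verify them with the real distance

vectors = np.random.default_rng(1).standard_normal((100, 10))
print(len(approximate_similarity_join_candidates(vectors)), "candidate pairs vs", 100 * 99 // 2)
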
“…[6]–[16]: Hadoop, no, Similarity; [17]–[19]: Spark, no, Similarity; [20]–[27]: N/A, yes, Equi; [28]: N/A, yes, Similarity; [29]: Spark, yes, Equi; DSim-Join: Spark, yes, Similarity.…”
Section: Related Work
confidence: 99%
“…Kim et al. [30] and Ma et al. [31] proposed top-k similarity join solutions for massive high-dimensional vectors using the MapReduce framework. Chen et al. [32] proposed an LSH-based distance for high-dimensional data and converted it into the Hamming distance between the high-dimensional data signatures. On this basis, they designed a top-k similarity join algorithm using Spark.…”
Section: B. Vector Similarity Join
confidence: 99%
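
This statement summarizes the indexed paper (Chen et al. [32]): an LSH-based distance is mapped to the Hamming distance between bit signatures, and a top-k similarity join is then run on Spark. Below is only a minimal, single-machine sketch of ranking pairs by signature Hamming distance, not the paper's distributed algorithm; the helper names and the brute-force pair enumeration are illustrative assumptions.

import heapq
import numpy as np
from itertools import combinations

def bit_signatures(vectors, num_bits=32, seed=0):
    # Random-hyperplane signatures: similar vectors tend to agree on more bits.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_bits, vectors.shape[1]))
    return (vectors @ planes.T) > 0

def topk_pairs_by_hamming(vectors, k, num_bits=32):
    # Rank every candidate pair by the Hamming distance of its signatures
    # and keep the k smallest (a cheap proxy for the original distance).
    sigs = bit_signatures(vectors, num_bits)
    hamming = lambda i, j: int(np.count_nonzero(sigs[i] != sigs[j]))
    return heapq.nsmallest(k, combinations(range(len(vectors)), 2),
                           key=lambda pair: hamming(*pair))

data = np.random.default_rng(2).standard_normal((50, 16))
print(topk_pairs_by_hamming(data, k=5))
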
“…On this basis, they designed a top-k similarity join algorithm using Spark. Compared with Hadoop-based solutions, the approach of Chen et al. [32] is faster and scales better. Rong et al. [33] proposed a new similarity join algorithm called symbolic aggregation and vertical decomposition (SAVD) using Spark.…”
Section: B. Vector Similarity Join
confidence: 99%
“…In [8], Spark is used to compute the top-k similarity join over large multidimensional data. The data are partitioned into buckets so that points that are close to each other are grouped into the same bucket with high probability.…”
Section: Related Work
confidence: 99%
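
The last excerpt describes how [8] (the indexed paper) partitions points into buckets so that nearby points land in the same bucket with high probability before the top-k join is computed on Spark. The PySpark sketch below is an assumed illustration of that bucket-then-join pattern using the plain RDD API, sign-random-projection bucketing, and Euclidean distance; it is not the paper's actual partitioning scheme, and every name and parameter here is hypothetical.

import numpy as np
from itertools import combinations
from pyspark import SparkContext

sc = SparkContext("local[*]", "bucketed-topk-similarity-join")

k, num_bits, dim = 10, 6, 8
rng = np.random.default_rng(0)
planes = rng.standard_normal((num_bits, dim))            # shared random hyperplanes
points = [(i, rng.standard_normal(dim)) for i in range(1000)]

def bucket_of(vec):
    # Sign-random-projection bucket id: nearby points collide with high probability.
    return tuple(bool(b) for b in (vec @ planes.T) > 0)

def pairs_in_bucket(members):
    # Score only the pairs that share a bucket.
    for (i, u), (j, v) in combinations(list(members), 2):
        yield (float(np.linalg.norm(u - v)), (i, j))

top_pairs = (sc.parallelize(points)
               .map(lambda p: (bucket_of(p[1]), p))       # key each point by its bucket
               .groupByKey()                              # gather points per bucket
               .flatMap(lambda kv: pairs_in_bucket(kv[1]))
               .takeOrdered(k, key=lambda x: x[0]))       # globally closest k pairs

print(top_pairs)
sc.stop()

Because only colliding points are compared, the result is approximate; repeating the bucketing with several independent sets of hyperplanes, as LSH schemes typically do, would reduce the chance of missing truly close pairs that fall into different buckets.
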