2020
DOI: 10.1109/access.2020.3007028

Projection Based Large Scale High-Dimensional Data Similarity Join Using MapReduce Framework

Abstract: Similarity join has been widely used in many data analysis and data mining applications. We mainly focus on the scalability and performance problems of similarity join queries on massive high-dimensional data sets. A p-stable distribution based projection scheme can implement dimension reduction effectively. Three novel approaches based on the projection scheme are proposed to deal with the massive high-dimensional data similarity join problem: the Single projection method, the Multiple projection method and Projection space partitio…
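To make the projection idea in the abstract concrete, here is a minimal sketch of a single p-stable projection for the Euclidean case (p = 2, where the stable distribution is the standard Gaussian). It is an illustration under stated assumptions, not the paper's algorithm: the helper names `make_projection`, `project`, and `euclidean` are hypothetical, and the sample points are made up.

```python
import math
import random

def make_projection(dim, seed=0):
    """Draw one projection vector with i.i.d. N(0, 1) entries (the 2-stable case)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def project(a, x):
    """Map a high-dimensional point x to the single real number a . x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def euclidean(x, y):
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Hypothetical sample data, for illustration only.
a = make_projection(dim=4, seed=42)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.0, 2.5, 4.0]

# By 2-stability, project(a, x) - project(a, y) is distributed as
# euclidean(x, y) * N(0, 1), so nearby points tend to have a small gap.
gap = abs(project(a, x) - project(a, y))
```

Because the gap between two projected values scales with the original distance, a join could compare projected values first and compute full distances only for pairs whose gap is small. How the paper's three methods actually use this property is not visible in the truncated abstract.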

Cited by 4 publications (2 citation statements)
References 47 publications
“…Approximate similarity join [25] needs two datasets. It then compares the entities in these datasets and returns pairs of entities whose distance is less than the threshold (t) provided by the user.…”
Section: Experimental Methodology (mentioning)
confidence: 99%
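
The join described in this statement can be sketched as a brute-force baseline. This is an exact, quadratic-time version under assumed conventions (points as tuples, Euclidean distance, strict "less than" comparison); the function name `similarity_join` and the sample data are hypothetical, and an approximate or MapReduce-based join would replace the nested loop with a cheaper candidate-generation step.

```python
import math

def similarity_join(left, right, t):
    """Return index pairs (i, j) with Euclidean distance(left[i], right[j]) < t."""
    pairs = []
    for i, x in enumerate(left):
        for j, y in enumerate(right):
            # math.dist requires Python 3.8 or newer.
            if math.dist(x, y) < t:
                pairs.append((i, j))
    return pairs

# Hypothetical sample data, for illustration only.
left = [(0.0, 0.0), (5.0, 5.0)]
right = [(0.1, 0.0), (5.0, 4.0), (9.0, 9.0)]
result = similarity_join(left, right, t=0.5)
```

With these inputs only the first point of each dataset falls within the threshold; raising `t` to 1.5 would also admit the pair at distance 1.0.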
“…To handle the problem that existing MapReduce-based filtering methods require multiple MapReduce jobs to improve the join performance, the adaptive filter-based join algorithm is proposed [17]. The MapReduce framework is also suitable to handle large-scale high-dimensional data similarity joins [18]. For example, spatial join is able to perform data analysis in geospatial applications, which contain massive geographical information in a high-dimensional form [19].…”
Section: Related Work (mentioning)
confidence: 99%