2017
DOI: 10.14778/3099622.3099624

Leveraging set relations in exact set similarity join

Abstract: Exact set similarity join, which finds all the similar set pairs from two collections of sets, is a fundamental problem with a wide range of applications. The existing solutions for set similarity join follow a filtering-verification framework, which generates a list of candidate pairs through scanning indexes in the filtering phase, and reports those similar pairs in the verification phase. Though much research has been conducted on this problem, set relations, which we find out is quite effective on improvin…
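
The filtering-verification framework described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method and does not use set relations; it is a generic prefix-filter join, assuming Jaccard similarity, a rarest-token-first global ordering, and tokens that are mutually comparable (for example, all strings). The names `similarity_join`, `prefix`, and `jaccard` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def similarity_join(R, S, threshold):
    """Return index pairs (i, j) with jaccard(R[i], S[j]) >= threshold.

    Illustrative prefix-filter join (an assumption for this sketch,
    not the algorithm proposed in the paper).
    """
    # Global token order: rarest tokens first, ties broken by token value.
    freq = defaultdict(int)
    for s in list(R) + list(S):
        for tok in s:
            freq[tok] += 1

    def prefix(s):
        ordered = sorted(s, key=lambda tok: (freq[tok], tok))
        n = len(ordered)
        # Prefix length n - ceil(t * n) + 1; the small epsilon guards
        # against floating-point products such as 0.7 * 10 > 7.
        k = n - ceil(threshold * n - 1e-9) + 1
        return ordered[:k]

    # Filtering phase, part 1: inverted index over the prefixes of S.
    index = defaultdict(list)
    for j, s in enumerate(S):
        for tok in prefix(s):
            index[tok].append(j)

    results = []
    for i, r in enumerate(R):
        # Filtering phase, part 2: probe the index with the prefix of r.
        candidates = set()
        for tok in prefix(r):
            candidates.update(index[tok])
        # Verification phase: compute the exact similarity per candidate.
        for j in sorted(candidates):
            if jaccard(r, S[j]) >= threshold:
                results.append((i, j))
    return results
```

The filter is safe because two sets whose Jaccard similarity reaches the threshold must share at least one token within these prefixes, so only pairs that collide in the index need exact verification.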

Cited by 37 publications (22 citation statements)
References 27 publications
“…The discrepancy between the two studies was attributed to the more efficient process that performed the verification step in [84]; reducing the cost of verification means that employing a complex filter to reduce the number of candidate pairs may not pay off. Another interesting finding is that leveraging set relations can improve the performance of the filtering algorithms; SKJ was found to consistently outperform PPJoin, PPJoin+, AdaptJoin and PTJ in [159].…”
Section: Discussion (mentioning)
confidence: 94%
“…Firstly, given the collection of objects and a query object, the similarity search retrieves all objects similar to a query object [20][21][22]. Secondly, given two sets of objects, the exact set similarity join finds all pairs of similar objects [23][24][25]. The operations are used for data cleaning, information integration, entity detection, near duplicate detection and personalized recommendation.…”
Section: The Motivation (mentioning)
confidence: 99%
“…Sets are used in Internet applications for the representation of the properties of objects, sparse vector data, text files, itemsets, tags and the neighbours in graphs. Two types of queries attracted research attention: the similarity search [20,22] and the set similarity join [23][24][25][41]. We present each of the above stated related areas in more detail in the following Sections 2.1-2.5.…”
Section: Related Work (mentioning)
confidence: 99%
“…Instead of using a fixed-length prefix as done by Bayardo et al., Wang et al. [29] designed an adaptive prefix filter framework. Wang et al. [30] further improved Xiao et al.'s work by leveraging the relation between sets.…”
Section: Related Work (mentioning)
confidence: 99%