2019 IEEE 35th International Conference on Data Engineering (ICDE) 2019
DOI: 10.1109/icde.2019.00048

GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search

Abstract: In this paper, we study the problem of approximate containment similarity search. Given two records Q and X, the containment similarity between Q and X with respect to Q is |Q ∩ X| / |Q|. Given a query record Q and a set of records S, the containment similarity search finds a set of records from S whose containment similarity regarding Q is not less than the given threshold. This problem has many important applications in commercial and scientific fields such as record matching and domain search. Existing solutio…
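
The definition in the abstract can be illustrated with a minimal exact (non-sketched) example. This is only a sketch of the problem statement, not the paper's GB-KMV method; the function names `containment` and `containment_search` and the toy records are hypothetical and chosen for illustration.

```python
def containment(q, x):
    """Containment similarity of Q in X: |Q ∩ X| / |Q|."""
    q, x = set(q), set(x)
    if not q:
        raise ValueError("query record Q must be non-empty")
    return len(q & x) / len(q)


def containment_search(q, records, threshold):
    """Return the names of records whose containment similarity
    with respect to Q is not less than the threshold."""
    return [name for name, x in records.items()
            if containment(q, x) >= threshold]


# Toy data (hypothetical): three records and one query.
records = {
    "a": {1, 2, 3},
    "b": {4, 5},
    "c": {1, 2, 3, 4, 9},
}
query = {1, 2, 3, 4}
matches = containment_search(query, records, threshold=0.75)
```

Note that the measure is asymmetric: it is normalised by the size of the query Q only, so a large record X that fully covers Q scores 1.0 regardless of how many extra elements X holds.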

Cited by 19 publications (16 citation statements)
References 44 publications
“…The estimation variance by G-KMV method is smaller than that of simple KMV method under reasonable assumptions as analysed in [31].…”
Section: KMV Synopses (mentioning)
confidence: 92%
“…The estimation variance by G-KMV method is smaller than that of simple KMV method under reasonable assumptions as analyzed in [35].…”
Section: KMV Synopses (mentioning)
confidence: 93%
“…Recent works [22,71] incorporate ideas similar to the strategy used in this paper and in KMV sketches family: they use a random hashing function to map join values to the unit range and then select tuples based on some selection strategy. For instance, the strategy adopted by the correlated sampling algorithm [71] is equivalent to the strategy of the G-KMV sketch [77], where tuples are selected if the hashed keys are smaller than a probability threshold. In contrast, Correlation Sketches includes tuples in the sketch up to a fixed number, which avoids assigning too much space to large datasets and leads to more predictable performance for query evaluation.…”
Section: Related Work (mentioning)
confidence: 99%
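
The contrast drawn in the statement above, a fixed number of retained hashes versus a hash threshold, can be sketched as follows. This is an illustration based only on that description, not the construction from the GB-KMV paper or from Correlation Sketches; the helper names `unit_hash`, `kmv_sketch`, and `threshold_sketch`, the parameter `tau`, and the choice of SHA-256 as the hash are all assumptions.

```python
import hashlib


def unit_hash(value):
    """Map a value to [0, 1) via SHA-256 (an assumed hash choice)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(values, k):
    """Fixed-size selection: keep the k smallest distinct hash values."""
    return sorted({unit_hash(v) for v in values})[:k]


def threshold_sketch(values, tau):
    """Threshold selection: keep every distinct hash value below tau
    (tau is a hypothetical name for the threshold)."""
    return sorted(h for h in {unit_hash(v) for v in values} if h < tau)


small = range(100)
large = range(10_000)

# The fixed-size sketch never exceeds k entries, whatever the input size.
fixed_small = kmv_sketch(small, k=16)
fixed_large = kmv_sketch(large, k=16)

# The threshold sketch grows with the number of distinct values.
thresh_small = threshold_sketch(small, tau=0.1)
thresh_large = threshold_sketch(large, tau=0.1)
```

With a shared threshold, the sketch of a subset is always contained in the sketch of its superset, which is convenient for intersection estimates, while the fixed-size variant bounds the space spent on large datasets.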
“…Recent research proposes methods that support dataset-oriented queries to retrieve datasets that can be concatenated [56] or joined with a given dataset [20,77,84]. However, neither supports the discovery tasks illustrated in the examples above.…”
Section: Introduction (mentioning)
confidence: 99%