2018
DOI: 10.48550/arxiv.1810.04361
Preprint

Semi-supervised clustering for de-duplication

Cited by 2 publications (3 citation statements)
References: 0 publications
“…Gamlath, Huang and Svensson extended the above result when approximation is allowed [15]. Ailon et al [2] studied correlation clustering with same-cluster queries and showed that there exists an (1 + ε) approximation for correlation clustering where the number of queries is a (large) polynomial in k. Our algorithms are different from those in [2] in that our guarantees are parameterized by C_OPT rather than by k. Kushagra et al [19] study a restricted version of correlation clustering where the valid clusterings are provided by a set of hierarchical trees and provide an algorithm using same-cluster queries for a related setting, giving guarantees in terms of the size of the input instance (or the VC dimension of the input instance) rather than C_OPT. [20] studied, among other clustering problems, a random instance of correlation clustering under same-cluster queries.…”
Section: Related Work (mentioning)
confidence: 91%
“…where k is the number of non-singleton clusters and k_1 and k_2 are known. In Section B.3, we describe a principled approach to select the right value of k based on the framework of SSC (semi-supervised clustering) introduced in [19,20] and describe our complete sampling approach.…”
Section: LSH-based Sampling (mentioning)
confidence: 99%
“…Note that each C_{k_i} is a clustering of the given dataset. We then use the SSC framework to select the best clustering from G. Owing to space constraints, we describe the details of the SSC algorithm (almost identical to the algorithm in [20]) and related proofs in the appendix section. We describe our "clustering and hashing" based sampling algorithm and then prove the main result from this section.…”
Section: Semi-supervised Clustering (mentioning)
confidence: 99%