2012
DOI: 10.17562/pb-45-3
|View full text |Cite
|
Sign up to set email alerts
|

String Distances for Near-duplicate Detection

Abstract: Abstract-Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the Rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, for the problem of near-duplicate detection.

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 12 publications
(13 reference statements)
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?