2011 IEEE Conference on Open Systems 2011
DOI: 10.1109/icos.2011.6079233

A threshold-based similarity measure for duplicate detection

Cited by 15 publications (12 citation statements)
References 17 publications
“…The data sets used in our study are gold standard data sets which are available from the Hasso Plattner Institute (HP) website under the Duplicate Detection Project (DuDe) [1]. These data sets have been frequently used in duplicate detection research works [40, 44–53].…”
Section: Data Sets (mentioning)
confidence: 99%
“…Detection of Duplicate plays an important role in record linkage, near duplicate detection and filtering queue [16]. Duplication detection is used to identify the same real world entities which exist in different format or representation in database [17,18]. It is very common to find some non-identical fields or records that refer the same entity [19].…”
Section: Duplication Records Detection and Types (mentioning)
confidence: 99%
“…Ektefa M et al [3] have proposed a threshold-based method which takes into account both string and semantic similarity measures for comparing record pairs. The threshold-based method is experimented on a real world dataset, namely Restaurant and its effectiveness is measured based on several standard evaluation metrics.…”
Section: Review Of Related Work (mentioning)
confidence: 99%
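
The statement above describes the idea only in outline: combine string and semantic similarity for each record pair, then apply a threshold. The following is a minimal sketch of that general idea, not the method of the cited paper. The synonym groups, the per-field `max` combination, the averaging across fields, and the 0.8 threshold are all assumptions chosen for illustration. The field names and sample records are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical synonym groups standing in for a real semantic resource
# (an assumption; the cited paper's semantic measure is not specified here).
SYNONYMS = [{"american", "us"}, {"bbq", "barbecue"}]


def string_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_sim(a: str, b: str) -> float:
    """1.0 if the values are equal or share a synonym group, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return 1.0 if any(a in group and b in group for group in SYNONYMS) else 0.0


def field_sim(a: str, b: str) -> float:
    """Assumed combination rule: take the better of the two measures."""
    return max(string_sim(a, b), semantic_sim(a, b))


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.8) -> bool:
    """Flag a record pair as duplicate when the mean field score meets the threshold."""
    scores = [field_sim(r1[key], r2[key]) for key in r1]
    return sum(scores) / len(scores) >= threshold


# Illustrative records (made up, not taken from the Restaurant data set).
rec_a = {"name": "Art's Deli", "cuisine": "american"}
rec_b = {"name": "Arts Deli", "cuisine": "us"}
rec_c = {"name": "Spago", "cuisine": "californian"}
```

With these sample records, `is_duplicate(rec_a, rec_b)` flags the pair because the names are nearly identical and the cuisines fall in the same synonym group, while `is_duplicate(rec_a, rec_c)` does not. In practice the threshold would be tuned against labelled pairs rather than fixed in advance.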
“…Sooner than a threshold based method is implemented in [3], a characteristic based technique is described in the [2] for executing the deduplication in databases. Unlike from the other approaches, Elhadi M et al [4] implemented a process based on combined part of speech and improved longest common subsequence.…”
Section: Review Of Related Work (mentioning)
confidence: 99%