Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2004
DOI: 10.1145/1008694.1008697
Iterative record linkage for cleaning and integration

Abstract: Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multiple sources. Traditional approaches use a similarity measure that compares tuples' attribute values; tuples with similarity scores above a certain threshold are declared to be matches. While this method can perform quite well in many domains, particularly domains where there is not a large amount of noise in the data, in some domains lo…
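The traditional approach the abstract describes — compare attribute values pairwise and declare a match above a threshold — can be sketched as follows. This is a minimal illustration, not the paper's method: the records, the attribute names, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.85 cutoff are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical records; the field names and values are illustrative.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Boston"},
    {"id": 2, "name": "John Smith", "city": "Boston"},
    {"id": 3, "name": "Jane Doe", "city": "Chicago"},
]

def similarity(a, b):
    """Average string similarity over the shared attributes."""
    fields = ["name", "city"]
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)

THRESHOLD = 0.85  # assumed cutoff; the paper does not prescribe a value

# Compare every pair once; pairs above the threshold are declared matches.
matches = [
    (a["id"], b["id"])
    for i, a in enumerate(records)
    for b in records[i + 1:]
    if similarity(a, b) >= THRESHOLD
]
print(matches)  # [(1, 2)]
```

As the abstract notes, this works well in low-noise domains; the paper's motivation is the case where a single static threshold on attribute similarity is not enough.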

Cited by 143 publications (122 citation statements)
References 20 publications
“…Iterative approaches [8,14] identified the need to transitively compare merged records to discover more matches, for merges that are simple groupings of the data in merged records. Our approach allows richer, "custom" merges.…”
Section: Related Work
confidence: 99%
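The iterative idea this statement attributes to [8,14] — re-compare merged records (simple groupings of their members' values) so that merging can expose further, transitive matches — can be sketched as a fixed-point loop. This is an illustrative toy, not the paper's algorithm: the records and the share-a-value similarity test are assumptions.

```python
def similar(a, b, min_shared=1):
    # Toy measure: two clusters match if they share at least `min_shared` values.
    return len(a & b) >= min_shared

def iterative_link(records):
    """Merge matching clusters until no further merges occur."""
    clusters = [set(r) for r in records]
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if similar(clusters[i], clusters[j]):
                    clusters[i] |= clusters[j]  # merge = union of the grouped values
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart comparisons against the newly merged cluster
    return clusters

records = [
    {"J. Smith", "SIGMOD"},
    {"John Smith", "SIGMOD"},  # matches record 0 (shares "SIGMOD")
    {"John Smith", "VLDB"},    # shares nothing with record 0, but joins the
                               # cluster once records 0 and 1 have merged
]
print(len(iterative_link(records)))  # 1: all three records end in one cluster
```

The third record never matches the first directly; it is only discovered after the first merge, which is the transitive effect the citing paper describes.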
“…The static active learning and weakly labeled non-duplicates methods were used for training data (Singla and Domingos, 2005). An algorithm for discriminative learning of MLN parameters by combining the voted perceptron with a weighted satisfiability solver was proposed by Bhattacharya and Getoor (2004). An iterative deduplication algorithm was proposed by Bilenko and Mooney (2003), which is used to detect and remove duplicate entities from heterogeneous data sources.…”
Section: Related Work
confidence: 99%
“…Meanwhile, on criminal [131], epidemiology [130], financial [124], and linked data networks [141] [125] [128], node-related techniques have been used. As for link-related approaches, they also examined the data management [133], digital libraries [137], and lexical networks [134]. Besides the biological [143] [144] [147] and social networks [145] [146], graph-related tasks have been applied also on software behavior networks [142].…”
Section: Development and Tasks
confidence: 99%