1995
DOI: 10.1145/568271.223807

The merge/purge problem for large databases

Abstract: Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases, typically in an inconsistent and often incorrect fashion. The problem we study here is the task of merging data from multiple sources in as efficient a manner as possible, while maximizing the accuracy of the result. We call this the …

Citations: cited by 306 publications (247 citation statements)
References: 9 publications
“…Extended-Manual manually specifies the matching rules (e.g., "if similarity(name1,name2) ≥ 0.8 but position=student then the two tuples do not match"). Thus, in a sense this method extends the manual method described in [46], which would exploit only shared attributes such as "name1" and "name2". Extended-AR is similar to Extended-Manual, but uses the association rule classification method of [63] to guide the process of generating rules.…”
Section: Algorithms and Methodologies (mentioning)
confidence: 99%
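
The example rule quoted in this statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the citing paper's implementation: the `similarity` function (a `difflib` ratio on lowercased strings), the dictionary field names `name` and `position`, and the choice to treat name pairs below the threshold as non-matches are all hypothetical, since the quote does not specify them.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for similarity(name1, name2): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tuples_match(t1: dict, t2: dict, threshold: float = 0.8) -> bool:
    """Apply a manually specified rule in the spirit of the quoted example."""
    # Assumption: names below the threshold are treated as non-matches;
    # the quoted rule does not say what happens in that case.
    if similarity(t1["name"], t2["name"]) < threshold:
        return False
    # Quoted rule: similar names, but position=student, means no match.
    # Assumption: "position" is a non-shared attribute that either tuple may carry.
    if t1.get("position") == "student" or t2.get("position") == "student":
        return False
    return True


if __name__ == "__main__":
    a = {"name": "Mike Smith"}
    b = {"name": "Mike Smith", "position": "student"}
    print(tuples_match(a, b))
```

The non-shared `position` attribute is what lets the rule veto a pair that a name-only comparison would have accepted.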
“…Object matching is often used to consolidate information about entities and to remove duplicates when merging multiple information sources. As such, it plays an important role in many information processing contexts, including information integration, data warehousing, information extraction, and text join in databases (e.g., [95,21,72,98,7,57,1,90,42,46]). …”
Section: Object Matching Across Disparate Data Sources (mentioning)
confidence: 99%
“…Identification of tuples representing the same individual is accomplished by the unique object identifier ID. The problem of assigning these object identifiers is not considered within this paper, i.e., we assume a preceding duplicate detection step (see for example Hernandez and Stolfo (1995)). Note that we are only interested in finding update operations that introduce conflicts between the overlapping parts of databases.…”
Section: Reproducing Conflict Generation (mentioning)
confidence: 99%
“…Record linkage (RL), also known as the merge-purge [12] or object identity [24] problem, is one of the key tasks in data cleaning [10] and integration [9]. Its goal is to identify related records that are associated with the same entity from multiple databases.…”
Section: Introduction (mentioning)
confidence: 99%