2015
DOI: 10.1007/978-3-319-23540-0_27

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) Classifiers for duplic…

Citations: cited by 8 publications (11 citation statements)
References: 43 publications
“…The database community has proposed declarative matching and resolution rules to express the domain knowledge about matching and resolution [5,7,9,13,24,26,32,33,51]. Matching dependencies (MD) are a popular type of such declarative rules, which provide a powerful method of expressing domain knowledge on matching values [8,10,23,24,38]. Let S be the schema of the original database and R1 and R2 two distinct relations in S. Attributes A1 and A2 from relations R1 and R2, respectively, are comparable if they share the same domain.…”
Section: Matching Dependencies (mentioning)
confidence: 99%
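
The statement above only sketches what a matching dependency is, so a small illustration may help. The following is a minimal sketch, not the formalism of the cited paper or of ERBlox: it assumes an MD of the general form "if two records are similar on one comparable attribute, then their values on another attribute should be identified". The helper names (`similar`, `apply_md`), the 0.8 threshold, the use of `difflib` as the similarity measure, and the "keep the longer value" merge policy are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity predicate: case-insensitive string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def apply_md(r1: dict, r2: dict, compare_attr: str, merge_attr: str):
    """Apply one illustrative MD to a pair of records.

    If the records are similar on compare_attr, both receive the same
    value for merge_attr (here: the longer of the two, an assumed policy).
    Otherwise the records are returned unchanged.
    """
    if similar(r1[compare_attr], r2[compare_attr]):
        merged = max(r1[merge_attr], r2[merge_attr], key=len)
        return {**r1, merge_attr: merged}, {**r2, merge_attr: merged}
    return r1, r2


# Made-up records from two relations with comparable attributes.
r1 = {"name": "John A. Smith", "affiliation": "Carleton Univ."}
r2 = {"name": "John A Smith", "affiliation": "Carleton University"}
r3 = {"name": "Jane Doe", "affiliation": "Elsewhere"}

m1, m2 = apply_md(r1, r2, "name", "affiliation")
u1, u3 = apply_md(r1, r3, "name", "affiliation")
```

In this sketch the names of `r1` and `r2` are similar, so both end up with the same affiliation value, while `r1` and `r3` are left as they were.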
“…We equip LACE with a 'dynamic' and 'global' semantics. In line with approaches to ER based on matching dependencies (MDs) [9,20,22] and extensions thereof, such as relational MDs [4,6], LACE adopts a dynamic semantics in which rule bodies are evaluated on induced instances resulting from applying the already 'derived' merges. It is thanks to the dynamic nature of the semantics that we obtain a collective yet justifiable framework, in which merges can trigger further merges, possibly in a recursive fashion, while still being able to trace back the origins of each merge.…”
Section: Introduction (mentioning)
confidence: 99%
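
The dynamic semantics described in the statement above, where rules are evaluated on the instance induced by earlier merges, can be illustrated with a toy fixpoint loop. This is a hedged sketch only: it is not LACE or ERBlox, and the two rules, the attribute names, and the records are hypothetical. Each cluster carries the union of its members' attribute values, and the trace records which rule justified each merge.

```python
from itertools import combinations


def shares(x: dict, y: dict, attr: str) -> bool:
    """True if two clusters have at least one common value for attr."""
    return bool(x[attr] & y[attr])


# Hypothetical merge rules, evaluated on the current (induced) clusters.
RULES = {
    "same_email": lambda x, y: shares(x, y, "email"),
    "same_name_and_phone": lambda x, y: shares(x, y, "name") and shares(x, y, "phone"),
}


def resolve(records: dict):
    """Merge clusters until no rule fires; return (clusters, trace)."""
    clusters = {
        frozenset([rid]): {attr: set(vals) for attr, vals in rec.items()}
        for rid, rec in records.items()
    }
    trace = []
    while True:
        # Find the first pair of clusters that satisfies some rule.
        firing = next(
            (
                (c1, c2, name)
                for c1, c2 in combinations(clusters, 2)
                for name, rule in RULES.items()
                if rule(clusters[c1], clusters[c2])
            ),
            None,
        )
        if firing is None:
            return set(clusters), trace
        c1, c2, name = firing
        v1, v2 = clusters.pop(c1), clusters.pop(c2)
        # The merged cluster holds the union of both value sets.
        clusters[c1 | c2] = {attr: v1[attr] | v2[attr] for attr in v1}
        trace.append((name, sorted(c1), sorted(c2)))


records = {
    "a": {"name": {"J. Smith"}, "email": {"js@example.org"}, "phone": set()},
    "b": {"name": {"John Smith"}, "email": {"js@example.org"}, "phone": {"555"}},
    "c": {"name": {"J. Smith"}, "email": {"smith@example.org"}, "phone": {"555"}},
}
clusters, trace = resolve(records)
```

On the original records, no pair satisfies `same_name_and_phone`: `a` and `c` share a name but `a` has no phone, and `b` and `c` share a phone but not a name. Only after `a` and `b` are merged by `same_email` does the merged cluster match `c`, which is the sense in which one merge triggers another while the trace still shows where each merge came from.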
“…ER is a significant and common data cleaning problem, and it consists of detecting data duplicate representations for the same external entities, and merging them into single representations [43]. This problem can be applied to many different domains, such as deduplication in databases [44], duplicate detection in data or hierarchical data [45], cross-document co-reference resolution methods and tools [46], blocking techniques [43, 47], bug reports [48], customer recognition [31], and E-health [49]. Most of the existing studies have been validated using real-world datasets, but very few of them have applied their proposal in a real case in the industry [42].…”
Section: Introduction (mentioning)
confidence: 99%
“…Authors in [31] used Levenshtein Edit Distance for feature selection in combination with weights based on the Inverse Document Frequency (IDF) of terms. Matching dependencies, a new class of semantic constraints for data quality and cleaning, has been shown to be profitably integrated with traditional machine learning methods for developing classification models for ER [43].…”
Section: Introduction (mentioning)
confidence: 99%