2012
DOI: 10.14778/2350229.2350276

Learning expressive linkage rules using genetic programming

Abstract: A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which specify the conditions that entities must fulfill in order to be considered to describe the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm…

Cited by 124 publications (130 citation statements)
References 30 publications
“…The approach is based on the Silk rule learning framework [7], which is able to identify matching products based on their attributes. To do so, different combination of features from the product descriptions are used, e.g., bag of words, attribute-value pairs extracted using a dictionary, features extracted using manually written regular expressions, and combination of all.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unstructured Product Offers - WDC Microdata Dataset: The latest extraction of WebDataCommons includes over 5 billion entities marked up by one of the three main HTML markup languages (i.e., Microdata, Microformats and RDFa) and has been retrieved from the CommonCrawl 2014 corpus. From this dataset we focus on product entities annotated with Microdata using the schema.org vocabulary.…”
Section: Datasets (mentioning)
confidence: 99%
“…Fitness evaluation assigns a fitness to each element in the population (lines 13-16). In our case, the fitness of an individual (i.e.…”
Section: Evolutionary Search (mentioning)
confidence: 99%
“…It is used to find out the potential comparison pairs, as shown in [5][6][7]. That is, from which classes and properties values should be compared.…”
Section: State of the Art (mentioning)
confidence: 99%