2010
DOI: 10.14778/1920841.1920904

Evaluation of entity resolution approaches on real-world match problems

Abstract: Despite the large body of recent research on entity resolution (matching), there has not yet been a comparative evaluation of the relative effectiveness and efficiency of alternative approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and without machine learning for finding a suitable parameterization and combination of similarity functions. In addition to approaches from the research community, we also co…
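
The abstract contrasts approaches that combine similarity functions with a hand-chosen configuration against approaches that use machine learning to find that configuration. The following is a minimal sketch of that distinction. It is not any of the evaluated systems. The attribute names (`title`, `authors`), the two similarity functions, and the weights are hypothetical. The threshold search in `learn_threshold` is a deliberately simple stand-in for learning-based configuration.

```python
from difflib import SequenceMatcher


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities (hypothetical attributes)."""
    scores = {
        "title": token_jaccard(r1["title"], r2["title"]),
        "authors": char_ratio(r1["authors"], r2["authors"]),
    }
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def is_match(r1: dict, r2: dict, weights: dict, threshold: float) -> bool:
    """Non-learning decision: hand-set weights and threshold."""
    return combined_similarity(r1, r2, weights) >= threshold


def learn_threshold(labeled_pairs, weights, candidates):
    """Pick the candidate threshold with the best F-measure on labeled pairs.

    labeled_pairs: iterable of (record1, record2, is_true_match) tuples.
    Returns (best_threshold, best_f_measure); ties keep the earlier candidate.
    """
    best_t, best_f = None, -1.0
    for t in candidates:
        tp = fp = fn = 0
        for r1, r2, label in labeled_pairs:
            pred = is_match(r1, r2, weights, t)
            if pred and label:
                tp += 1
            elif pred and not label:
                fp += 1
            elif label and not pred:
                fn += 1
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

With equal weights such as `{"title": 0.5, "authors": 0.5}`, `is_match` applies a fixed rule. `learn_threshold` instead chooses the threshold from training pairs. Real learning-based matchers typically also learn the weights or a full classifier.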

Cited by 344 publications (288 citation statements)
References 17 publications

“…In this experiment, we investigate ER strategies for the real-world dataset GS from [16] containing 64,263 publication records from the Google Scholar search engine. The publication records are relatively unclean due to the high heterogeneity of paper citations and errors in extracting the bibliographic information from PDF files.…”
Section: Comparative Evaluation of Different ER Strategies (mentioning)
confidence: 99%
“…The publication records are relatively unclean due to the high heterogeneity of paper citations and errors for extracting the bibliographic information from PDF files. To evaluate match quality in terms of precision, recall, and F-measure, we derived a gold standard from the manually determined, perfect match result from [16] that is based on a mapping of the GS publications to corresponding DBLP publications.…”
Section: Comparative Evaluation of Different ER Strategies (mentioning)
confidence: 99%
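
The citing statement evaluates match quality with precision, recall, and F-measure against a gold standard derived from a GS-to-DBLP mapping. The following is a minimal sketch of that evaluation. It assumes that two GS records count as a true match when they map to the same DBLP publication. The statement does not spell out the derivation, so `gold_pairs_from_mapping` and the record ids are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def gold_pairs_from_mapping(mapping: dict) -> set:
    """Assumed derivation: GS records mapped to the same DBLP id are duplicates."""
    groups = defaultdict(list)
    for gs_id, dblp_id in mapping.items():
        groups[dblp_id].append(gs_id)
    gold = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            gold.add(frozenset((a, b)))  # unordered correspondence
    return gold


def evaluate(predicted, gold):
    """Return (precision, recall, F-measure) of predicted match pairs."""
    pred = {frozenset(p) for p in predicted}  # (a, b) and (b, a) count once
    truth = {frozenset(p) for p in gold}
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Here the F-measure is the harmonic mean of precision and recall. Representing pairs as `frozenset`s prevents a symmetric prediction from being counted as two distinct matches.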
“…Entity resolution is the task of identifying different records that represent the same entity, and is an important step in many data integration and data cleaning applications [11,21]. A single entity may come to be represented using different records for many reasons; for example, data may be integrated from independently developed sources that have overlapping collections (e.g., different retailers may have overlapping product lines), or a single organization may capture the same data repeatedly (e.g., a police force may encounter the same individual or address many times, in situations where it may be difficult to be confident of the quality of the data).…”
Section: Introduction (mentioning)
confidence: 99%
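
Entity resolution ultimately groups records that represent the same entity. A common way to turn pairwise match decisions into such groups is the transitive closure of the match pairs, computed with a union-find structure. The following is a minimal sketch of that step. The technique is not taken from the quoted text, and the record ids are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_entities(record_ids, match_pairs):
    """Group records into entities via transitive closure of match pairs.

    Unmatched records become singleton entities. Clusters are returned as
    sorted lists, ordered by their smallest record id.
    """
    uf = UnionFind(record_ids)
    for a, b in match_pairs:
        uf.union(a, b)
    clusters = defaultdict(set)
    for r in record_ids:
        clusters[uf.find(r)].add(r)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

Transitive closure is simple but can chain unrelated records through a single false match. For example, the matches `r1~r2` and `r2~r3` place `r1` and `r3` in the same entity.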