2010
DOI: 10.14778/1920841.1920871

Evaluating entity resolution results

Abstract: Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used for evaluating ER results. However, ER measures tend to be chosen in an ad-hoc fashion without careful thought as to what defines a good result for the specific application at hand. In this paper, our contributions are twofold. First, we conduct an analysis on existing ER measures, showing that they can often conflict with each other by…
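The pairwise F1 measure named in the abstract can be illustrated with a minimal sketch. It assumes an ER result and a gold standard are each given as a list of clusters (sets of record identifiers); the function names `matched_pairs` and `pairwise_f1` are hypothetical and are not taken from the paper, and the handling of results with no matched pairs is an assumption of this sketch.

```python
from itertools import combinations


def matched_pairs(clusters):
    """Return the set of unordered record pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each unordered pair one canonical (a, b) form.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f1(predicted, gold):
    """Harmonic mean of pairwise precision and recall (hypothetical helper)."""
    pred_pairs = matched_pairs(predicted)
    gold_pairs = matched_pairs(gold)
    # Assumption: identical pair sets (including two empty ones) score 1.0.
    if pred_pairs == gold_pairs:
        return 1.0
    true_pos = len(pred_pairs & gold_pairs)
    # Assumption: no shared pairs scores 0.0, which also avoids dividing by zero.
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


gold = [{"a", "b", "c"}, {"d"}]
predicted = [{"a", "b"}, {"c", "d"}]
score = pairwise_f1(predicted, gold)
```

In this toy input the prediction proposes two pairs, of which one is correct, and the gold standard contains three pairs, so precision is 1/2 and recall is 1/3. A cluster-level measure could rank the same two results differently, which is the kind of conflict between measures the abstract refers to.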

Cited by 50 publications (44 citation statements)
References 22 publications
“…An alternative to a precision-recall curve is Receiver Operating Characteristic (ROC), which plots true positives against false positives (Hanley & McNeil, 1982). Historically, and currently, precision-recall curves dominate ROC curves in the instance matching community (Menestrina, Whang & Garcia-Molina, 2010; Köpcke, Thor & Rahm, 2010). In keeping with existing trends in the literature, precision-recall curves are favored over ROC curves for similarity evaluations in this dissertation.…”
Section: Evaluating Similarity
type: mentioning
confidence: 93%
“…Note that there are other metrics defined based on different purposes. In entity resolution, there are also several cluster-based metrics, such as K-measure [27], GMD measure [36] and Rand-index [49]. For strings, similarity-based metrics including Jaccard, Dice, Cosine and Edit Distances are defined and used [11,55,58].…”
Section: Related Work
type: mentioning
confidence: 99%
“…To evaluate duplicate detection results, a variety of evaluation metrics exists [21,22]. As we want to evaluate algorithms that select candidate pairs, we do not use a similarity function in Sec.…”
Section: Data Sets and Configuration
type: mentioning
confidence: 99%