Evaluating entity resolution results

Menestrina, David; Whang, Steven Euijong; García-Molina, Héctor

doi:10.14778/1920841.1920871

Cited by 50 publications

(44 citation statements)

References 22 publications


“…An alternative to a precision-recall curve is Receiver Operating Characteristic (ROC), which plots true positives against false positives (Hanley & McNeil, 1982). Historically, and currently, precision-recall curves dominate ROC curves in the instance matching community (Menestrina, Whang & Garcia-Molina, 2010; Köpcke, Thor & Rahm, 2010). In keeping with existing trends in the literature, precision-recall curves are favored over ROC curves for similarity evaluations in this dissertation.…”

Section: Evaluating Similarity (mentioning)

confidence: 93%

Populating a linked data entity name system

Kejriwal

2017

AI Matters


Resource Description Framework (RDF) is a graph-based data model used to publish data as a Web of Linked Data (Bizer et al. 2009). RDF is an emergent foundation for large-scale data integration, the problem of providing a unified view over multiple data sources. The structure in RDF data can be conveniently visualized using directed labeled graphs, as illustrated in the real-world graph fragments in Figure 1. Nodes in the graph represent entities (e.g. the node with label dbpedia:Allen,_Paul represents the entity Paul Allen in the DBpedia knowledge graph) and edges represent either attributes of an entity (e.g. '01/21/1953' is the birthdate of Paul Allen) or relationships between two entities (e.g. Paul Allen is the co-founder of the company entity, Microsoft). Facts in the knowledge base are formally represented as a set of triples, with a triple comprising a labeled edge (denoted as a property) in the RDF graph along with its incoming and outgoing nodes.
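The triple representation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the `ex:` labels, the property names `ex:birthDate` and `ex:coFounderOf`, and the helper `is_relationship` are all hypothetical, chosen to mirror the Paul Allen example.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str  # node the labeled edge leaves
    prop: str     # the labeled edge (the property)
    obj: str      # node or literal value the edge points to


# Hypothetical labels mirroring the Paul Allen example in the abstract.
graph: Set[Triple] = {
    Triple("ex:Paul_Allen", "ex:birthDate", "01/21/1953"),
    Triple("ex:Paul_Allen", "ex:coFounderOf", "ex:Microsoft"),
}


def is_relationship(t: Triple) -> bool:
    # Assumption for this sketch only: an object carrying the "ex:" prefix
    # is another entity; anything else is a literal attribute value.
    return t.obj.startswith("ex:")


# Split the facts into attribute edges and entity-to-entity edges.
attributes = sorted(t for t in graph if not is_relationship(t))
relationships = sorted(t for t in graph if is_relationship(t))
```

Storing the facts as a set matches the abstract's wording ("a set of triples"): duplicate facts collapse automatically and order carries no meaning.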



“…Note that there are other metrics defined based on different purposes. In entity resolution, there are also several cluster-based metrics, such as K-measure [27], GMD measure [36] and Rand-index [49]. For strings, similarity-based metrics including Jaccard, Dice, Cosine and Edit Distances are defined and used [11,55,58].…”

Section: Related Work (mentioning)

confidence: 99%

QASCA

Zheng, Wang, et al.

2015

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data



A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform for a large number of questions to be answered by Internet workers. Such systems have been shown to be useful to solve problems that are difficult for computers, including entity resolution, sentiment analysis, and image recognition. In this paper, we investigate the online task assignment problem: Given a pool of n questions, which of the k questions should be assigned to a worker? A poor assignment may not only waste time and money, but may also hurt the quality of a crowdsourcing application that depends on the workers' answers. We propose to consider quality measures (also known as evaluation metrics) that are relevant to an application during the task assignment process. Particularly, we explore how Accuracy and F-score, two widely-used evaluation metrics for crowdsourcing applications, can facilitate task assignment. Since these two metrics assume that the ground truth of a question is known, we study their variants that make use of the probability distributions derived from workers' answers. We further investigate online assignment strategies, which enable optimal task assignments. Since these algorithms are expensive, we propose solutions that attain high quality in linear time. We develop a system called the Quality-Aware Task Assignment System for Crowdsourcing Applications (QASCA) on top of AMT. We evaluate our approaches on five real crowdsourcing applications. We find that QASCA is efficient, and attains better result quality (of more than 8% improvement) compared with existing methods.
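Accuracy and F-score, the two metrics this abstract names, have standard ground-truth definitions that can be sketched as follows. This shows the textbook form only; the probabilistic variants the paper studies are not reproduced here, and the function names, the `beta` parameter, and the toy labels are assumptions made for illustration.

```python
from typing import Sequence


def accuracy(truth: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of questions whose predicted label equals the ground truth."""
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def f_score(truth: Sequence[bool], predicted: Sequence[bool],
            beta: float = 1.0) -> float:
    """Standard F-beta score; beta = 1 weighs precision and recall equally."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    if tp == 0:
        return 0.0  # convention used here when nothing is correctly matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels (hypothetical): five questions with known ground truth.
truth = [True, True, False, False, True]
predicted = [True, False, False, True, True]
acc = accuracy(truth, predicted)
f1 = f_score(truth, predicted)
```

Both functions need the ground truth for every question, which is the assumption the abstract says the paper relaxes by working with probability distributions derived from workers' answers.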


“…To evaluate duplicate detection results, a variety of evaluation metrics exists [21,22]. As we want to evaluate algorithms that select candidate pairs, we do not use a similarity function in Sec.…”

Section: Data Sets and Configuration (mentioning)

confidence: 99%