2013
DOI: 10.1007/s00778-013-0324-z

Large-scale linked data integration using probabilistic reasoning and crowdsourcing

Abstract: We tackle the problems of semiautomatically matching linked data sets and of linking large collections of Web pages to linked data. Our system, ZenCrowd, (1) uses a three-stage blocking technique in order to obtain the best possible instance matches while minimizing both computational complexity and latency, and (2) identifies entities from natural language text using state-of-the-art techniques and automatically connects them to the linked open data cloud. First, we use structured inverted indices to quickly …
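The inverted-index blocking mentioned in the abstract can be sketched as follows. This is a minimal, token-based illustration of the general technique, not the paper's three-stage implementation; all names and structure here are assumptions:

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each token to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for rid, text in records.items():
        for token in text.lower().split():
            index[token].add(rid)
    return index

def candidate_pairs(records):
    """Blocking: only records sharing at least one token become candidate
    matches, avoiding the quadratic all-pairs comparison."""
    index = build_inverted_index(records)
    pairs = set()
    for ids in index.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs
```

Records that share no token are never compared, which is what keeps the candidate set (and thus crowd cost and latency) small.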

Cited by 78 publications (38 citation statements)
References 39 publications (61 reference statements)
“…The other approach is to use a machine to narrow down the possible options and then employ the crowd to validate or choose the best-matching one. As an example, the work presented in [23] employs a machine-based algorithm to classify entities while also computing a confidence score. The authors suggest that crowdsourced labeling is required only for entities with low confidence scores produced by the classifier.…”
Section: Entity Annotation and Classification
Confidence: 99%
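The routing strategy described in this citation statement — accept high-confidence machine labels, crowdsource only the rest — can be sketched as below. The threshold value and function names are illustrative assumptions, not from the cited work:

```python
def route_entities(entities, classify, threshold=0.8):
    """Split classifier outputs: keep high-confidence labels automatically,
    and queue low-confidence ones for crowd verification.

    `classify` returns a (label, confidence) pair for each entity."""
    auto, crowd = [], []
    for entity in entities:
        label, conf = classify(entity)
        target = auto if conf >= threshold else crowd
        target.append((entity, label, conf))
    return auto, crowd
```

Only the `crowd` list incurs human-labeling cost, so the threshold directly trades accuracy against budget.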
“…We used this level for all three workflows. Aggregation: for the T2 tasks we used the default option (aggregation='agg'), as the task is to choose from a set of pre-defined options. For T1, we looked at the first three answers (aggregation='agg_3') based on 11 judgments.…”
Section: Quality Control
Confidence: 99%
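Aggregating redundant crowd judgments by frequency, as this statement describes (one winning answer for T2, the top three answers for T1), can be sketched as a simple vote count. This is a generic illustration under assumed names, not the cited platform's aggregation code:

```python
from collections import Counter

def aggregate(judgments, top_k=1):
    """Count identical crowd judgments and return the top_k most frequent
    answers, most common first."""
    counts = Counter(judgments)
    return [answer for answer, _ in counts.most_common(top_k)]
```

With `top_k=1` this is plain majority voting; `top_k=3` over 11 judgments mirrors the 'first three answers' setting mentioned above.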
“…ZenCrowd [9] identifies matching pairs of instances in linked data, using two levels of blocking to select candidate pairs for confirmation by the crowd. A probabilistic factor graph accumulates evidence from different sources, from which a probability that a candidate pair is correct is derived.…”
Section: Related Work
Confidence: 99%
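The evidence accumulation described here can be illustrated with a naive log-odds combination of independent per-source match probabilities. This is only a stand-in for intuition; ZenCrowd's actual model is a full probabilistic factor graph with inference over workers and links, which this sketch does not reproduce:

```python
import math

def combine_evidence(probs, prior=0.5):
    """Combine independent evidence probabilities that a candidate pair is a
    correct match by summing log-odds on top of a prior, then mapping back
    to a probability with the logistic function."""
    logit = math.log(prior / (1 - prior))
    for p in probs:
        logit += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit))
```

Two moderately confident, agreeing sources (e.g. 0.9 and 0.8) already push the combined probability above either individual one, which is the qualitative behavior the factor graph exploits.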
“…Many projects have already demonstrated substantial success in applying this idea to crowdsourcing settings; this applies most prominently to games-with-a-purpose (GWAPs) [27], which build a game narrative around human computation tasks such as image labeling [26], protein folding, or language translation. Similarly to the concerns raised in the context of external rewards and incentivisation [18], gamification has been seen, in some contexts, to undermine intrinsic benefits by subjugating and trivialising contributions into simple game goals and achievements. This effect has been called overjustification and has been the subject of various studies with intriguing results; while some negative effects of overjustification have been recurrently reproduced, current research acknowledges that its prevalence seems to be highly context-dependent and that, in most cases, extrinsic rewards complement rather than hamper intrinsic motivations for participating [5,22].…”
Section: Theories of External Reward and Incentivisation
Confidence: 99%