2018
DOI: 10.3233/sw-160239

Detecting Linked Data quality issues via crowdsourcing: A DBpedia study

Abstract: In this paper we examine the use of crowdsourcing as a means to detect Linked Data quality problems that are difficult to uncover automatically. We base our approach on the analysis of the most common errors encountered in the DBpedia dataset, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and study different crowdsourcing approaches to identify these Linked Data quality issues, employing DBpedia as our use case: (i) a cont…

Cited by 30 publications (25 citation statements); references 51 publications. Citation statements are listed below, ordered by relevance.
“…MOVIE, based on IMDb [2] and Wikidata [4], is a knowledge base with entertainment-related facts mostly pertaining to actors, directors, movies, TV series, musicals, etc. It contains more than 2 million factual triples.…”
Section: Dataset Description
confidence: 99%
“…This special issue attracted a total of 10 submissions, of which three papers [1,9,14] were accepted for publication, as summarized in the next sections. A fourth paper was under review at the time this editorial was written.…”
Section: Special Issue Papers
confidence: 99%
“…This paper focuses on the problem of verifying the quality of Linked Data, in particular data from DBpedia [1]. As such it is illustrative of the scenario where HC&C is used for knowledge validation and enhancement (HC4SW-Kn.Validation).…”
Section: Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study
confidence: 99%
“…An example of an RDF triple is (dbr:Birmingham dbo:populationTotal "1123000"^^xsd:integer), which represents the fact that the city of Birmingham has a total population of 1123000 (dbr and dbo are the namespace prefixes of DBpedia repositories). In recent years, several large-scale knowledge graphs have been constructed, such as DBpedia, YAGO, Freebase, Wikidata, and others. Many of these knowledge graphs were created by extracting Web content or through crowdsourcing.…”
Section: Introduction
confidence: 99%
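The triple quoted above can be reproduced programmatically; the following is a minimal sketch using the rdflib library, assuming the standard DBpedia namespace URIs for dbr and dbo. It is offered only as an illustration of the (subject, predicate, object) structure being discussed, not as code from the cited paper.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

# Assumed DBpedia namespaces for resources (dbr) and ontology properties (dbo).
DBR = Namespace("http://dbpedia.org/resource/")
DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
# The quoted example: Birmingham's total population as an xsd:integer literal.
g.add((DBR.Birmingham, DBO.populationTotal, Literal(1123000, datatype=XSD.integer)))

# Serialize to N-Triples to show the raw subject-predicate-object form.
print(g.serialize(format="nt"))
```

Serializing to N-Triples makes the typed literal explicit, which is the same level of detail at which crowd workers are asked to judge whether a stated value is correct.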
“…These processes can be very noisy, and the created knowledge graphs are unlikely to be fully correct. There is increasing interest in quality assessment for knowledge graphs [1] [4] [17] [20] [8] [6]. Some approaches focus on completing or correcting entity type information, while others target relations between entities, or interlinks between different knowledge graphs.…”
Section: Introduction
confidence: 99%