Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection

Fleischhacker, Daniel; Paulheim, Heiko; Bryl, Volha; Völker, Johanna; Bizer, Christian

doi:10.1007/978-3-319-11964-9_23

Cited by 40 publications

(36 citation statements)

References 16 publications

“…To separate the fundamental operations from the complex syntax of querying languages, the generated SQL and SPARQL queries are presented in the form of relational algebra and SPARQL algebra 7 respectively. Note that although the SPARQL algebra has not yet become a W3C standard, it is already supported by several frameworks for querying LOD: all algebraic expressions of SPARQL queries in this section are generated by Apache Jena API.…”

Section: B Results and Discussion (mentioning)

confidence: 99%

Designing multidimensional cubes from warehoused data and linked open data

Ravat

Song

Teste

2016

2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS)

OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited version published in: http://oatao.univ-toulouse.fr/ (Eprints ID: 17167). The contribution was presented at RCIS 2016: http://www.sense-brighton.eu/rcis2016/

Abstract. A Data Warehouse (DW) is widely used as a consistent and integrated data repository in Business Intelligence systems. Under today's dynamic and competitive business context, warehoused data alone no longer provide enough information for decision-making processes. Business analyses should be enhanced by including Linked Open Data (LOD) to offer multiple perspectives to decision-makers. This paper provides a new multidimensional model, named Unified Cube, which offers a generic representation for both warehoused data and LOD at the conceptual level. A two-stage process is proposed to build a Unified Cube according to decision-makers' needs. As a first step, schemas published with specific modeling languages are transformed into a common conceptual representation. The second step is to associate together related data to form a Unified Cube containing all useful information about an analysis subject. A high-level declarative language is provided to enable nonexpert users to define the relevance between data according to their analysis needs. To demonstrate the feasibility of the proposed concepts, we show how analyses over data from different sources can be carried out through a Unified Cube.

“…As the authors of [6] mention, the QB format is not always correctly used in some real-world application cases. To avoid bringing inaccurate information to decision-makers, error-detection methods such as those presented in [7] should be applied to a QB schema before applying the proposed algorithm.…”

Section: From a Qb Schema To An Exportation (mentioning)

confidence: 99%

Designing multidimensional cubes from warehoused data and linked open data

Ravat

Song

Teste

2016

2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS)

“…Under the assumption that KBs are likely to be noisy and incomplete [18], there exist several approaches that aim to enhance or refine their quality and completeness (see Paulheim [16] for a recent survey). Among the most relevant to our work are: [31] which applies unsupervised numerical outlier detection methods to DBpedia for detecting wrong values that are used as literal objects of a property; and [4] that builds upon [31] by identifying sub-populations of instances where the outlier detection works more accurately, and by using external datasets accessible from the owl:sameAs links. Our work differs from theirs in that they focus on missing property values, not on their cardinality or multiplicity.…”

Section: Related Work (mentioning)

confidence: 99%

Mining Cardinalities from Knowledge Bases

Muñoz

Nickles

2017

Lecture Notes in Computer Science

Abstract. Cardinality is an important structural aspect of data that has not received enough attention in the context of RDF knowledge bases (KBs). Information about cardinalities can be useful for data users and knowledge engineers when writing queries, reusing or engineering KBs. Such cardinalities can be declared using OWL and RDF constraint languages as constraints on the usage of properties over instance data. However, their declaration is optional and consistency with the instance data is not ensured. In this paper, we address the problem of mining cardinality bounds for properties to discover structural characteristics of KBs, and use these bounds to assess completeness. Because KBs are incomplete and error-prone, we apply statistical methods for filtering property usage and for finding accurate and robust patterns. Accuracy of the cardinality patterns is ensured by properly handling equality axioms (owl:sameAs); and robustness by filtering outliers. We report an implementation of our algorithm with two variants using SPARQL 1.1 and Apache Spark, and their evaluation on real-world and synthetic data.

“…There is a growing body of work on fact validation [6] [20] [10] [5]. This literature can be categorised into two groups in terms of the sources utilised: (1) approaches such as [20] and [10] using internal information (i.e.…”

Section: Introduction (mentioning)

confidence: 99%

“…A low confidence score is assigned to statements if no or only a few web pages support these sentences. The approaches [20] and [5] apply outlier detection methods to identify errors in numerical property values that are extracted from a data repository. The work [5] improves the prior work by lowering the influence of natural outliers.…”

Section: Introduction (mentioning)

confidence: 99%

Measuring Accuracy of Triples in Knowledge Graphs

Liu

d’Aquin

Motta

2017

Lecture Notes in Computer Science

Abstract. An increasing amount of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created from text-based extraction, which could be very noisy. So far, cleaning knowledge graphs are often carried out by human experts and thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. In order to achieve this, previous approaches primarily rely on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and apply different matching methods between the predicates of source triples and target triples. Then based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.
