2014
DOI: 10.1007/978-3-319-11964-9_23

Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection

Abstract: Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior. In this work, we instead propose an approach which combines the outcomes of two independent outlier detection runs to get a more reliable result and to also prevent problems arising from natural outliers which are exceptional values in the dataset but nevertheless correct. Linked Data is especially suited for the application of such an idea, since it provides lar…
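The cross-checking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract is truncated, so the sketch assumes that the second run compares each flagged value against values reported for the same entity by other sources. The function names, the modified z-score test, the threshold, and the sample numbers are all hypothetical choices made for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The median/MAD test is an assumed stand-in; any outlier detector could
    be used for either run.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values are identical: flag anything that differs.
        return {i for i, v in enumerate(values) if v != med}
    return {
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    }


def cross_checked_errors(dataset, linked_values, threshold=3.5):
    """Report a subject only if both outlier detection runs flag its value.

    dataset:       subject -> value of one numerical property (run 1 input)
    linked_values: subject -> values for the same entity taken from other
                   sources (hypothetical input for run 2)
    """
    subjects = list(dataset)
    first_run = mad_outliers([dataset[s] for s in subjects], threshold)

    errors = set()
    for i in first_run:
        subject = subjects[i]
        peers = linked_values.get(subject, [])
        if not peers:
            continue  # nothing to cross-check against; leave undecided
        # Run 2: is the value also an outlier among values for the same entity?
        candidates = [dataset[subject]] + list(peers)
        if 0 in mad_outliers(candidates, threshold):
            errors.add(subject)
    return errors


# Hypothetical population figures: F is a genuinely large city (a natural
# outlier), G carries a wrong value.
dataset = {"A": 52000, "B": 48000, "C": 51000, "D": 50000,
           "E": 49000, "F": 8400000, "G": 5100000}
linked_values = {"F": [8390000, 8410000, 8405000],
                 "G": [51000, 51200, 50900]}

print(cross_checked_errors(dataset, linked_values))
```

In this sketch both F and G are flagged by the first run, but only G remains after the second run, because the other sources agree with F's value. Subjects without any second-source values are left undecided rather than reported.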

citations
Cited by 40 publications
(36 citation statements)
references
References 16 publications
“…To separate the fundamental operations from the complex syntax of querying languages, the generated SQL and SPARQL queries are presented in the form of relational algebra and SPARQL algebra 7 respectively. Note that although the SPARQL algebra has not yet become a W3C standard, it is already supported by several frameworks for querying LOD: all algebraic expressions of SPARQL queries in this section are generated by Apache Jena API.…”
Section: B Results and Discussion (mentioning)
confidence: 99%
“…As the authors of [6] mention, the QB format is not always correctly used in some real-world application cases. To avoid bringing inaccurate information to decision-makers, error-detection methods such as those presented in [7] should be applied to a QB schema before applying the proposed algorithm.…”
Section: From a QB Schema to an Exportation (mentioning)
confidence: 99%
“…Under the assumption that KBs are likely to be noisy and incomplete [18], there exist several approaches that aim to enhance or refine their quality and completeness (see Paulheim [16] for a recent survey). Among the most relevant to our work are: [31] which applies unsupervised numerical outlier detection methods to DBpedia for detecting wrong values that are used as literal objects of a property; and [4] that builds upon [31] by identifying sub-populations of instances where the outlier detection works more accurately, and by using external datasets accessible from the owl:sameAs links. Our work differs from theirs in that they focus on missing property values, not on their cardinality or multiplicity.…”
Section: Related Work (mentioning)
confidence: 99%
“…There is a growing body of work on fact validation [6] [20] [10] [5]. This literature can be categorised into two groups in terms of the sources utilised: (1) approaches such as [20] and [10] using internal information (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…A low confidence score is assigned to statements if no or only a few web pages support these sentences. The approaches [20] and [5] apply outlier detection methods to identify errors in numerical property values that are extracted from a data repository. The work [5] improves the prior work by lowering the influence of natural outliers.…”
Section: Introductionmentioning
confidence: 99%