2018
DOI: 10.1007/978-3-030-00671-6_22

Representativeness of Knowledge Bases with the Generalized Benford’s Law

Abstract: Knowledge bases (KBs) such as DBpedia, Wikidata, and YAGO contain a huge number of entities and facts. Several recent works induce rules or calculate statistics on these KBs. Most of these methods are based on the assumption that the data is a representative sample of the studied universe. Unfortunately, KBs are biased because they are built from crowdsourcing and opportunistic agglomeration of available databases. This paper aims at approximating the representativeness of a relation within a knowledge base. F…

Cited by 18 publications
(18 citation statements)
References 28 publications
“…The query α FSD then calculates for each property the distribution of the first significant digits of the fact number per object. This query is particularly useful for estimating the representativeness of a knowledge base by exploiting Benford's law [34]. We will use this query in Section 5 to evaluate the representativeness of the LOD cloud.…”
Section: Analytical Queries (mentioning)
confidence: 99%
“…This query yields, for each property, a distribution over the frequency of the first significant digit of the number of objects per subject. We used the method proposed in [34] to convert this distribution into a score between 0 and 1 that measures the "representativeness" of the triplestores. A score of 1 means that the data is representative of the distribution in the real world (see [34] for details).…”
Section: Use Case 2: Representativeness of the LOD (mentioning)
confidence: 99%
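
The citation statements above describe computing, per property, the distribution of first significant digits over fact counts and comparing it with Benford's law. The following is a minimal sketch of that idea in Python, not the cited query or the scoring method of [34]: it uses the classic Benford distribution, P(d) = log10(1 + 1/d), rather than the generalized law named in the title, and it does not compute the 0-to-1 representativeness score. The function names and the toy data in `facts_per_object` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def first_significant_digit(n: int) -> int:
    """Return the leading non-zero digit of a positive integer."""
    if n <= 0:
        raise ValueError("count must be a positive integer")
    while n >= 10:
        n //= 10
    return n


def fsd_distribution(counts):
    """Relative frequency of each first significant digit (1..9)."""
    digits = Counter(first_significant_digit(c) for c in counts)
    total = sum(digits.values())
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


def benford_expected():
    """Classic Benford probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Hypothetical toy data: number of facts per object for one property.
facts_per_object = [1, 1, 2, 3, 12, 15, 19, 27, 104, 230]

observed = fsd_distribution(facts_per_object)
expected = benford_expected()

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:.3f}, Benford {expected[d]:.3f}")
```

With real data, a large gap between the observed and expected columns would hint that the counts for a property deviate from Benford's law; how that gap is turned into a score is defined in [34] and is not reproduced here.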