2018
DOI: 10.1007/978-3-030-00668-6_13

DistLODStats: Distributed Computation of RDF Dataset Statistics

Abstract: Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without having a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools that offer such statistics, providing basic information about RDF datasets and vocabu…

Citations: cited by 8 publications (9 citation statements)
References: 14 publications
“…The experimental setup of ABSTAT-HD is in line with the setup used in the only other approach proposed in the state of the art to distribute the computation of knowledge graph profiling, namely, DistLODStat [43], where the scalability of the distributed and centralized version of the same systems are compared.…”
Section: Methods (mentioning)
confidence: 99%
“…Schmachtenberg et al [32] present the status of RDF datasets in the LOD Cloud in terms of size, linking, vocabulary usage, and metadata. LODStats [13] and the large-scale approach DistLODStats [33] report on descriptive statistics about RDF datasets on the web, including the number of triples, RDF terms, properties per entity, and usage of vocabularies across datasets. ExpLOD [25] generates summaries and aggregated statistics about the structure of RDF graphs, e.g., sets of used properties or the number of instances per class.…”
Section: RDF-specific Analyses (mentioning)
confidence: 99%
“…It parallelizes streaming and sorting techniques to efficiently process RDF data. More recent methods either use HDFS (LODOP [14]) or store the data in memory (DistLODStats [33] via Spark). Exact rewriting rules have also been proposed to optimize the execution of such queries with groupings and aggregates in RDF data [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the increase in volume that makes these indicators more necessary also makes them harder to compute. The most recent methods adopt distributed architectures [14,33] that centralize the data, and then execute the indicator queries on that centralized data repository. To compute the exact query result, these approaches thus require the materialization of the entire LOD cloud.…”
Section: Introduction (mentioning)
confidence: 99%