2014
DOI: 10.14778/2732296.2732297

A principled approach to bridging the gap between graph data and their schemas

Abstract: Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to have an accurate description of the structuredness of the data at hand (how well the data conform to the schema). In this paper, we have approached the study of the structuredness of an RDF…
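
The notion of structuredness in the abstract can be made concrete with a simple per-type coverage computation: for each type (sort), measure how many of the type's property slots are actually filled by its instances. The sketch below is a minimal illustration of that idea over plain (subject, predicate, object) triples; it is not the paper's own structuredness function, and the sample data and names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # illustrative shorthand for the rdf:type predicate

def per_type_coverage(triples):
    """For each type, return the fraction of (instance, property) cells that
    are filled, where a type's property set is the union of properties used
    by its instances."""
    instances_of = defaultdict(set)      # type -> subjects of that type
    props_of_subject = defaultdict(set)  # subject -> predicates (minus rdf:type)

    for s, p, o in triples:
        if p == RDF_TYPE:
            instances_of[o].add(s)
        else:
            props_of_subject[s].add(p)

    coverage = {}
    for t, subjects in instances_of.items():
        type_props = set().union(*(props_of_subject[s] for s in subjects))
        if not type_props:
            coverage[t] = 1.0  # degenerate case: a type with no properties
            continue
        filled = sum(len(props_of_subject[s] & type_props) for s in subjects)
        coverage[t] = filled / (len(type_props) * len(subjects))
    return coverage

# Hypothetical toy data: one well-structured type, one loosely structured type.
triples = [
    ("d1", RDF_TYPE, "Drug"), ("d1", "name", "A"), ("d1", "formula", "X"),
    ("d2", RDF_TYPE, "Drug"), ("d2", "name", "B"), ("d2", "formula", "Y"),
    ("p1", RDF_TYPE, "Person"), ("p1", "name", "C"),
    ("p2", RDF_TYPE, "Person"), ("p2", "email", "c@example.org"),
]
print(per_type_coverage(triples))  # {'Drug': 1.0, 'Person': 0.5}
```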

Cited by 9 publications (7 citation statements)
References 11 publications
“…A recent study on structure refinement for RDF data [2] proposed an integer linear programming (ILP)-based algorithm that partitions an RDF dataset into a number of "sorts", where each sort satisfies a predefined structuredness fitting threshold. This approach, which relies mainly on the similarity and correlation between the properties of sorts, may merge subjects describing unrelated entities that happen to share many properties into a single sort (as also shown in their experiment with Drug Company and Sultan), while our solution only merges related CS's by exploiting discriminating properties and the available semantics/ontology information.…”
Section: Related Work (mentioning)
confidence: 99%
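
To illustrate the kind of purely property-overlap-based merging that the statement above critiques, here is a minimal greedy sketch. It is neither the ILP formulation of [2] nor the CS-based method of the citing paper; the names, property sets, and similarity threshold are assumptions chosen for the example.

```python
def jaccard(a, b):
    """Similarity of two property sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def greedy_merge(sorts, threshold=0.6):
    """Greedily merge 'sorts' (name -> property set) whose property sets
    overlap by at least `threshold`. Purely syntactic: two unrelated entity
    groups that share many properties end up in a single sort."""
    merged = dict(sorts)
    changed = True
    while changed:
        changed = False
        names = list(merged)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if jaccard(merged[a], merged[b]) >= threshold:
                    merged[a + "+" + b] = merged.pop(a) | merged.pop(b)
                    changed = True
                    break
            if changed:
                break
    return merged

# Hypothetical example echoing the Drug Company / Sultan observation above:
# unrelated entity groups with mostly shared properties get merged.
sorts = {
    "DrugCompany": {"name", "country", "founded", "homepage"},
    "Sultan":      {"name", "country", "born", "homepage"},
}
print(greedy_merge(sorts))  # {'DrugCompany+Sultan': {... five properties ...}}
```
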
“…3) Based on the design of RBench, a query generation process is proposed to generate different types of queries systematically for any generated benchmark. 4) Three aspects of RBench are explored in experiments: the time and memory complexity of benchmark generation, evaluation of the benchmark datasets, and analysis of query evaluation. We empirically show that benchmark datasets generated by RBench can achieve different scaling factors to fulfil different benchmark generation tasks, are consistent with real datasets as they scale, and address the limitations of the previous application-specific benchmark generator [8].…”
Section: Problem Definition (mentioning)
confidence: 99%
“…Coverage and coherence metrics are introduced in [8] as an intuitive way to combine primitive metrics into a single measure of the structuredness of RDF datasets. A comprehensive study of the structuredness of RDF graphs is also presented in [4]. A framework is proposed in [4] to discover a partitioning of the entities of an RDF graph into subsets which have high structuredness with respect to a specific function chosen by the user.…”
Section: Related Work (mentioning)
confidence: 99%
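
As a companion to the per-type coverage sketch given after the abstract, per-type values can be folded into a single dataset-level score by weighting each type, which is the general shape of the coherence metric attributed to [8]. The sketch below is only an illustration under that assumption, with a simple instance-count weighting; it is not a verbatim reproduction of the metric in [8] or of the structuredness functions of [4].

```python
def coherence(coverage, type_sizes):
    """Combine per-type coverage values into one dataset-level score.
    Here each type is weighted by its instance count; the metric in the
    literature may weight types differently (an assumption of this sketch)."""
    total = sum(type_sizes.values())
    if total == 0:
        return 0.0
    return sum(coverage[t] * type_sizes[t] / total for t in coverage)

# Hypothetical values continuing the earlier toy example.
print(coherence({"Drug": 1.0, "Person": 0.5}, {"Drug": 2, "Person": 2}))  # 0.75
```
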
“…Research efforts on the automatic discovery of data models have focused on the Deep Web, in which web pages are automatically produced by filling web templates with the data of a back-end database [15]. In the context of the Web of Data, the work most closely related to ours is [3], which consists of a framework to define rules that check whether or not a given RDF dataset conforms to a given ontological model; the framework includes a formal language to express these rules. The main difference with respect to our approach is that we are able to discover a conceptual model without user intervention.…”
Section: Related Work (mentioning)
confidence: 99%
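
The conformance checking described in the statement above can be pictured as rules of the form "every instance of class C must have property p". The sketch below shows one way such a rule could be evaluated over plain triples; it is not the formal rule language of [3], and the class and property names are made up for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # illustrative shorthand for the rdf:type predicate

def violations(triples, required):
    """Return (subject, class, missing properties) for subjects typed as a
    class but lacking properties the model requires. `required` maps
    class -> set of mandatory properties (an assumed rule form)."""
    types_of = defaultdict(set)
    props_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of[s].add(o)
        else:
            props_of[s].add(p)

    report = []
    for s, classes in types_of.items():
        for c in classes:
            missing = required.get(c, set()) - props_of[s]
            if missing:
                report.append((s, c, missing))
    return report

# Hypothetical model and data: every Drug must have a name and a formula.
required = {"Drug": {"name", "formula"}}
triples = [
    ("d1", RDF_TYPE, "Drug"), ("d1", "name", "Aspirin"),
    ("d2", RDF_TYPE, "Drug"), ("d2", "name", "Ibuprofen"), ("d2", "formula", "C13H18O2"),
]
print(violations(triples, required))  # [('d1', 'Drug', {'formula'})]
```
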
“…We present two examples, not negligible in practice, of this gap between ontological models and RDF data (see [3] for additional discussion of this topic), namely:…”
Section: Introduction (mentioning)
confidence: 99%