2014
DOI: 10.14778/2732296.2732297

A principled approach to bridging the gap between graph data and their schemas

Abstract: Although RDF graph data often come with an associated schema, recent studies have proven that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide the decision making, it is imperative to have an accurate description of the structuredness of the data at hand (how well the data conform to the schema). In this paper, we have approached the study of the structuredness of an RDF…
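
The notion of structuredness in the abstract can be made concrete with a simple per-type coverage computation: for each type (sort), measure how many of the type's property slots are actually filled by its instances. The sketch below is a minimal illustration of that idea over plain (subject, predicate, object) triples; it is not the paper's own structuredness function, and the sample data and names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # illustrative shorthand for the rdf:type predicate

def per_type_coverage(triples):
    """For each type, return the fraction of (instance, property) cells that
    are filled, where a type's property set is the union of properties used
    by its instances."""
    instances_of = defaultdict(set)      # type -> subjects of that type
    props_of_subject = defaultdict(set)  # subject -> predicates (minus rdf:type)

    for s, p, o in triples:
        if p == RDF_TYPE:
            instances_of[o].add(s)
        else:
            props_of_subject[s].add(p)

    coverage = {}
    for t, subjects in instances_of.items():
        type_props = set().union(*(props_of_subject[s] for s in subjects))
        if not type_props:
            coverage[t] = 1.0  # degenerate case: a type with no properties
            continue
        filled = sum(len(props_of_subject[s] & type_props) for s in subjects)
        coverage[t] = filled / (len(type_props) * len(subjects))
    return coverage

# Hypothetical toy data: one well-structured type, one loosely structured type.
triples = [
    ("d1", RDF_TYPE, "Drug"), ("d1", "name", "A"), ("d1", "formula", "X"),
    ("d2", RDF_TYPE, "Drug"), ("d2", "name", "B"), ("d2", "formula", "Y"),
    ("p1", RDF_TYPE, "Person"), ("p1", "name", "C"),
    ("p2", RDF_TYPE, "Person"), ("p2", "email", "c@example.org"),
]
print(per_type_coverage(triples))  # {'Drug': 1.0, 'Person': 0.5}
```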

Cited by 9 publications (7 citation statements)
References 11 publications
“…A recent study on structure refinement for RDF data [2] proposed an integer linear programming (ILP)-based algorithm that partitions an RDF dataset into a number of "sorts", where each sort satisfies a predefined structuredness fitting threshold. This approach, which relies mainly on the similarity and correlation between the properties of sorts, may merge subjects describing unrelated entities that happen to share many properties into a single sort (as also shown in their experiment with Drug Company and Sultan), while our solution only merges related CS's by exploiting discriminating properties and the available semantics/ontology information.…”
Section: Related Work (mentioning)
confidence: 99%
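
To illustrate the kind of purely property-overlap-based merging that the statement above critiques, here is a minimal greedy sketch. It is neither the ILP formulation of [2] nor the CS-based method of the citing paper; the names, property sets, and similarity threshold are assumptions chosen for the example.

```python
def jaccard(a, b):
    """Similarity of two property sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def greedy_merge(sorts, threshold=0.6):
    """Greedily merge 'sorts' (name -> property set) whose property sets
    overlap by at least `threshold`. Purely syntactic: two unrelated entity
    groups that share many properties end up in a single sort."""
    merged = dict(sorts)
    changed = True
    while changed:
        changed = False
        names = list(merged)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if jaccard(merged[a], merged[b]) >= threshold:
                    merged[a + "+" + b] = merged.pop(a) | merged.pop(b)
                    changed = True
                    break
            if changed:
                break
    return merged

# Hypothetical example echoing the Drug Company / Sultan observation above:
# unrelated entity groups with mostly shared properties get merged.
sorts = {
    "DrugCompany": {"name", "country", "founded", "homepage"},
    "Sultan":      {"name", "country", "born", "homepage"},
}
print(greedy_merge(sorts))  # {'DrugCompany+Sultan': {... five properties ...}}
```
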
“…3) Based on the design of RBench, a query generation process is proposed to generate different types of queries systematically for any generated benchmark. 4) Three aspects of RBench are explored in experiments: the time and memory complexity of benchmark generation, evaluation of the benchmark datasets, and analysis of query evaluation. We empirically show that benchmark datasets generated by RBench can achieve different scaling factors to fulfil different benchmark generation tasks, are consistent with real datasets as they scale, and address the limitations of the previous application-specific benchmark generator [8].…”
Section: Problem Definition (mentioning)
confidence: 99%
“…Coverage and coherence metrics are introduced in [8] as an intuitive way to combine primitive metrics into a single measure of the structuredness of RDF datasets. A comprehensive study of the structuredness of RDF graphs is also presented in [4]. A framework is proposed in [4] to discover a partitioning of the entities of an RDF graph into subsets which have high structuredness with respect to a specific function chosen by the user.…”
Section: Related Work (mentioning)
confidence: 99%
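
As a companion to the per-type coverage sketch given after the abstract, per-type values can be folded into a single dataset-level score by weighting each type, which is the general shape of the coherence metric attributed to [8]. The sketch below is only an illustration under that assumption, with a simple instance-count weighting; it is not a verbatim reproduction of the metric in [8] or of the structuredness functions of [4].

```python
def coherence(coverage, type_sizes):
    """Combine per-type coverage values into one dataset-level score.
    Here each type is weighted by its instance count; the metric in the
    literature may weight types differently (an assumption of this sketch)."""
    total = sum(type_sizes.values())
    if total == 0:
        return 0.0
    return sum(coverage[t] * type_sizes[t] / total for t in coverage)

# Hypothetical values continuing the earlier toy example.
print(coherence({"Drug": 1.0, "Person": 0.5}, {"Drug": 2, "Person": 2}))  # 0.75
```
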
“…Research efforts on the automatic discovery of data models have focused on the Deep Web, in which web pages are automatically produced by filling web templates with the data of a back-end database [15]. In the context of the Web of Data, the work most closely related to ours is [3], which consists of a framework to define rules that check whether or not a given RDF dataset conforms to a given ontological model; the framework includes a formal language to express these rules. The main difference with respect to our approach is that we are able to discover a conceptual model without user intervention.…”
Section: Related Work (mentioning)
confidence: 99%
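
The conformance checking described in the statement above can be pictured as rules of the form "every instance of class C must have property p". The sketch below shows one way such a rule could be evaluated over plain triples; it is not the formal rule language of [3], and the class and property names are made up for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # illustrative shorthand for the rdf:type predicate

def violations(triples, required):
    """Return (subject, class, missing properties) for subjects typed as a
    class but lacking properties the model requires. `required` maps
    class -> set of mandatory properties (an assumed rule form)."""
    types_of = defaultdict(set)
    props_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of[s].add(o)
        else:
            props_of[s].add(p)

    report = []
    for s, classes in types_of.items():
        for c in classes:
            missing = required.get(c, set()) - props_of[s]
            if missing:
                report.append((s, c, missing))
    return report

# Hypothetical model and data: every Drug must have a name and a formula.
required = {"Drug": {"name", "formula"}}
triples = [
    ("d1", RDF_TYPE, "Drug"), ("d1", "name", "Aspirin"),
    ("d2", RDF_TYPE, "Drug"), ("d2", "name", "Ibuprofen"), ("d2", "formula", "C13H18O2"),
]
print(violations(triples, required))  # [('d1', 'Drug', {'formula'})]
```
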
“…We present two examples, not negligible in practice, of this gap between ontological models and RDF data (see [3] for additional discussion of this topic), namely:…”
Section: Introduction (mentioning)
confidence: 99%