2019
DOI: 10.3233/sw-180336

VIG: Data scaling for OBDA benchmarks

Abstract: In this paper we describe VIG, a data scaler for Ontology-Based Data Access (OBDA) benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to s times its size, while preserving certain application-specific characteristics. The advantages of the scaling approach are that the same generator is general, in the sense that it can be re-used on different database schemas, and that users are not required to manually input the…
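
To make the idea of scaling "to s times its size, while preserving certain application-specific characteristics" concrete, here is a minimal Python sketch for a single column. It is not VIG's algorithm: the functions `column_stats` and `scale_column`, the use of synthetic integer values, and the choice of characteristics (NULL ratio and duplicate ratio, which one of the citing papers quoted on this page lists among the features VIG preserves) are assumptions made purely for illustration.

```python
import random


def column_stats(values):
    """Return (null_ratio, duplicate_ratio) for a column; None stands for NULL.

    duplicate_ratio is the share of non-NULL values that repeat an earlier
    value, i.e. 1 - distinct / non_null. (Illustrative definition only.)
    """
    n = len(values)
    non_null = [v for v in values if v is not None]
    null_ratio = (n - len(non_null)) / n if n else 0.0
    dup_ratio = 1 - len(set(non_null)) / len(non_null) if non_null else 0.0
    return null_ratio, dup_ratio


def scale_column(values, s, seed=0):
    """Build a synthetic column about s times as long as `values`.

    The output approximately keeps the input's NULL ratio and duplicate
    ratio. Values are fresh integers, not drawn from the input column.
    """
    null_ratio, dup_ratio = column_stats(values)
    target = round(s * len(values))
    n_null = round(null_ratio * target)
    n_non_null = target - n_null
    n_distinct = round((1 - dup_ratio) * n_non_null)
    if n_non_null > 0:
        n_distinct = max(1, n_distinct)

    rng = random.Random(seed)
    # Use every distinct value once, then fill the rest with repeats.
    column = list(range(n_distinct))
    column += [rng.randrange(n_distinct) for _ in range(n_non_null - n_distinct)]
    column += [None] * n_null
    rng.shuffle(column)
    return column


if __name__ == "__main__":
    original = [1, 1, 2, None, 3, None, 3, 4, 5, 6]
    scaled = scale_column(original, s=10)
    print(len(scaled), column_stats(original), column_stats(scaled))
```

A real scaler for OBDA benchmarks would also have to respect schema dependencies such as foreign keys and the structure induced by the mappings, which this single-column sketch ignores.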

Cited by 17 publications (16 citation statements)
References 25 publications (33 reference statements)

“…To answer these questions, we setup the following experimental studies: Datasets and queries. The GTFS-Madrid Benchmark 9 consists of an ontology, an initial dataset of the metro system of Madrid following the GTFS model, a set of mappings in several specifications, a set of queries according to the ontology that cover relevant features of the SPARQL query language, and a data generator based on a state of the art proposal [20]. We select the tabular sources of this benchmark (i.e., the CSV files) and we scale up the original data in several instances, we use the scale factors 10, 100 and 1000.…”
Section: Discussion (mentioning)
confidence: 99%
“…Figure 2(c) reports the average execution time in seconds of the query Q 1 only with skyline preferences. It was executed 5 times using the state-of-the-art tools [29,47] over our motivating dataset transformed into RDF by means of SDM-RDFizer [27] and scaled-up to a 10.000 scale value by using VIG [35]. These state-of-the-art tools evaluate preferences on top of triplestores [29,47].…”
Section: Motivating Example (mentioning)
confidence: 99%
“…iii) Gas Stations: Patel-Schneider and colleagues [41] presented a running example for gas stations over 18 instances. We have manually converted this data to CSV and then scaled it with VIG [35]. We have created an equivalent table and we have loaded this CSV file into mysql.…”
Section: Benchmarks and Queries I) TPC-H (mentioning)
confidence: 99%
“…In [25], the authors propose a dataset scaling problem for RDF data and provide a solution RBench that scales the original input dataset by preserving 4 features: resource identity (resource name, resource type, resource degree), relationship patterns (subgraphs with only relationship edges), predicate dictionary (frequency counts of the words) and attribute stars (frequency counts of the star structure). In [21], the authors lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings are also taken into account as well. VIG [21] maintains the similarity for OBDA data by preserving the following features: size of columns clusters and disjointness, schema dependencies and column-based duplicates and NULL Ratios.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [21], the authors lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings are also taken into account as well. VIG [21] maintains the similarity for OBDA data by preserving the following features: size of columns clusters and disjointness, schema dependencies and column-based duplicates and NULL Ratios. However, VIG only supports dataset where each table has at most one foreign key only.…”
Section: Related Work (mentioning)
confidence: 99%