VIG: Data scaling for OBDA benchmarks

Lanti, Davide; Xiao, Guohui; Calvanese, Diego

doi:10.3233/sw-180336

Cited by 17 publications

(16 citation statements)

References 25 publications

(33 reference statements)


“…To answer these questions, we setup the following experimental studies: Datasets and queries. The GTFS-Madrid Benchmark 9 consists of an ontology, an initial dataset of the metro system of Madrid following the GTFS model, a set of mappings in several specifications, a set of queries according to the ontology that cover relevant features of the SPARQL query language, and a data generator based on a state of the art proposal [20]. We select the tabular sources of this benchmark (i.e., the CSV files) and we scale up the original data in several instances, we use the scale factors 10, 100 and 1000.…”

Section: Discussion (mentioning)

confidence: 99%

Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV

Chaves-Fraga, Ruckhaus, Priyatna et al. 2020

Preprint


Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational database, CSV, JSON), either by materializing integrated data into RDF or by performing on-the-fly integration via SPARQL-to-SQL query translation. In the specific case of tabular datasets comprised of several CSV or Excel files, query translation approaches have been applied taking as input a lightweight schema with table and column names, and considering each source as a single table that can be loaded into a relational database system (RDB). This naïve approach does not consider implicit constraints in this type of data, e.g., referential integrity among data sources, datatypes, or data integrity; thus, completeness and performance of query processing can be affected. Our work is focused on explicitly enforcing implicit constraints during OBDA query translation over tabular data. We propose Morph-CSV, a framework that enforces constraints and can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV resorts to both a Constraints component and a set of operators that apply each type of constraint to the input with the aim of enhancing query completeness and performance. We evaluate Morph-CSV against a set of real-world open tabular datasets in the domain of the public transport; Morph-CSV is compared with existing approaches in terms of query result completeness and performance.


“…Figure 2(c) reports the average execution time in seconds of the query Q 1 only with skyline preferences. It was executed 5 times using the state-of-the-art tools [29,47] over our motivating dataset transformed into RDF by means of SDM-RDFizer [27] and scaled-up to a 10.000 scale value by using VIG [35]. These state-of-the-art tools evaluate preferences on top of triplestores [29,47].…”

Section: Motivating Example (mentioning)

confidence: 99%

“…iii) Gas Stations: Patel-Schneider and colleagues [41] presented a running example for gas stations over 18 instances. We have manually converted this data to CSV and then scaled it with VIG [35]. We have created an equivalent table and we have loaded this CSV file into mysql.…”

Section: Benchmarks and Queries I) TPC-H (mentioning)

confidence: 99%

Handling qualitative preferences in SPARQL over virtual ontology-based data access

Gonçalves, Chaves-Fraga, Corcho 2022


With the increase of data volume in heterogeneous datasets that are being published following Open Data initiatives, new operators are necessary to help users to find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate approaches as they require the user to assign weights that may not be known beforehand to a scoring function. Unlike the quantitative approach, under the qualitative approach, which includes the well-known skyline, preference criteria are more intuitive in certain cases and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple and heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preferences by directly querying relational databases. Our framework implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our proposed technique in comparison with state-of-the-art techniques. Obtained results suggest that the execution time can be reduced by up to two orders of magnitude in comparison to current techniques scaling up to larger datasets while identifying precisely the result set.
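The skyline mentioned in this abstract can be illustrated with a minimal sketch. It assumes that every criterion is to be minimized and uses a naive pairwise comparison; the function names and the sample gas-station tuples (price, distance) are hypothetical and are not taken from Morph-Skyline++, which translates such preferences into SQL rather than evaluating them in application code.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is at least as good as b on every criterion
    and strictly better on at least one (lower values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical gas stations as (price per litre, distance in km).
stations = [(1.50, 2.0), (1.40, 5.0), (1.60, 1.0), (1.55, 3.0)]
result = skyline(stations)
print(result)
```

Here the station (1.55, 3.0) is dropped because (1.50, 2.0) is both cheaper and closer; the other three are incomparable with each other, so no weights are needed to rank them.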


“…In [25], the authors propose a dataset scaling problem for RDF data and provide a solution RBench that scales the original input dataset by preserving 4 features: resource identity (resource name, resource type, resource degree), relationship patterns (subgraphs with only relationship edges), predicate dictionary (frequency counts of the words) and attribute stars (frequency counts of the star structure). In [21], the authors lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings are also taken into account as well. VIG [21] maintains the similarity for OBDA data by preserving the following features: size of columns clusters and disjointness, schema dependencies and column-based duplicates and NULL Ratios.…”

Section: Related Work (mentioning)

confidence: 99%

“…In [21], the authors lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings are also taken into account as well. VIG [21] maintains the similarity for OBDA data by preserving the following features: size of columns clusters and disjointness, schema dependencies and column-based duplicates and NULL Ratios. However, VIG only supports dataset where each table has at most one foreign key only.…”

Section: Related Work (mentioning)

confidence: 99%
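Two of the features named in the citation statements above, column-based duplicates and NULL ratios, can be sketched as simple per-column statistics. This is an illustrative sketch only: the exact definitions used by VIG are not given on this page, so the formulas below (NULLs over all rows, repeated values over non-NULL rows) are assumptions, as are the function names and the sample column.

```python
from typing import Hashable, Optional, Sequence

Column = Sequence[Optional[Hashable]]


def null_ratio(column: Column) -> float:
    """Fraction of rows whose value is NULL (None). Assumed definition."""
    if not column:
        return 0.0
    return sum(v is None for v in column) / len(column)


def duplicate_ratio(column: Column) -> float:
    """Fraction of non-NULL rows that repeat an earlier value. Assumed definition."""
    non_null = [v for v in column if v is not None]
    if not non_null:
        return 0.0
    return (len(non_null) - len(set(non_null))) / len(non_null)


def scaled_distinct_count(column: Column, scale_factor: int) -> int:
    """Number of distinct non-NULL values a scaled column would need
    so that both ratios stay the same as in the original column."""
    target_rows = len(column) * scale_factor
    non_null_rows = round(target_rows * (1 - null_ratio(column)))
    return round(non_null_rows * (1 - duplicate_ratio(column)))


# Hypothetical column with one NULL and one repeated value.
column = ["a", "a", "b", None]
print(null_ratio(column), duplicate_ratio(column), scaled_distinct_count(column, 10))
```

A generator that keeps these ratios fixed while growing the row count produces scaled columns whose join selectivity and NULL behaviour stay close to the original, which is the kind of similarity the statements attribute to VIG.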

A collaborative framework for tweaking properties in a synthetic dataset

2018


Researchers and developers use benchmarks to compare their algorithms and products. A database benchmark must have a dataset D. To be application-specific, this dataset D should be empirical. However, D may be too small, or too large, for the benchmarking experiments. D must, therefore, be scaled to the desired size. To ensure the scaled D̃ is similar to D, previous work typically specifies or extracts a fixed set of features F = {F1, F2, ..., Fn} from D, then uses F to generate synthetic data for D̃. However, this approach (D → F → D̃) becomes increasingly intractable as F gets larger, so a new solution is necessary. Different from existing approaches, this paper proposes ASPECT to scale D to enforce similarity. ASPECT first uses a size-scaler (S0) to scale D to D̃. Then the user selects a set of desired features F1, ..., Fn. For each desired feature Fk, there is a tweaking tool Tk that tweaks D̃ to make sure D̃ has the required feature Fk. ASPECT coordinates the tweaking of T1, ..., Tn to D̃, so Tn(···(T1(D̃))···) has the required features F1, ..., Fn. By shifting from D → F → D̃ to D → D̃ → F, data scaling becomes flexible. The user can customise the scaled dataset with the features they are interested in. Extensive experiments on real datasets show that ASPECT can enforce similarity in the dataset effectively and efficiently.
