Dynamic data transformation for low latency querying in big data systems

Ordonez-Ante, Leandro; Vanhove, Thomas; Seghbroeck, Gregory Van; Wauters, Tim; Volckaert, Bruno; Turck, Filip De

doi:10.1109/bigdata.2017.8258206

Cited by 1 publication

(7 citation statements)

References 9 publications

(13 reference statements)

“…In this sense, this paper delves further into the approach introduced by Ordonez et al [7], particularly by elaborating on an automatic mechanism for materialized view selection and creation. The mechanism presented in the following sections relies also on syntactic analysis of query workloads issued against a dimensionally modeled data collection.…”

Section: Related Work (mentioning)

confidence: 99%

“…It is noteworthy that the approach in [7] was concerned with defining a data transformation framework on a conceptual level, while the mechanism discussed herein addresses an actual realization of such framework, tackling the problem of materialized view selection on large data collections. Likewise, the approach introduced by Vanhove et al [8] - that served as inspiration for the framework proposed in [7] - is not particularly concerned with reducing query latency, but with enabling live data migration between different data storage technologies, irrespective of the query-workload. In this sense, the contribution of the proposed mechanism lies in three key features: (i) a vector representation that encodes not only the query-attribute usage, but also the basic structure of analytical queries, enabling a more precise and also regular representation of the query set, (ii) a measure of query distance tightly suited to the structure of the formulated feature vector representation providing a more accurate method for estimating query relatedness, instead of plain Hamming distance used in existing approaches, and (iii) a scalable procedure for candidate view generation that relies on a measure of cluster consistency, which in turn uses the above-mentioned query dissimilarity metric to unambiguously identify materializable groups of related queries.…”

Section: Related Work (mentioning)

confidence: 99%
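
The citation statement above contrasts the citing paper's query distance with the "plain Hamming distance used in existing approaches". As a rough illustration of that baseline only, the following Python sketch encodes each query as a binary attribute-usage vector and compares queries by Hamming distance. The schema, the attribute names, and the helper functions `usage_vector` and `hamming` are hypothetical and chosen for illustration; the excerpt does not specify the citing paper's own feature vector or distance measure, so neither is reproduced here.

```python
from typing import Iterable, List, Sequence


def usage_vector(query_attrs: Iterable[str], schema: Sequence[str]) -> List[int]:
    """Encode a query as a 0/1 vector: 1 where the query uses the schema attribute."""
    used = set(query_attrs)
    return [1 if attr in used else 0 for attr in schema]


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


# Hypothetical dimensional schema and three hypothetical analytical queries.
schema = ["customer", "product", "region", "date", "revenue"]
q1 = usage_vector(["region", "date", "revenue"], schema)
q2 = usage_vector(["region", "revenue"], schema)
q3 = usage_vector(["customer", "product"], schema)

# Queries touching similar attributes end up close together.
d12 = hamming(q1, q2)
d13 = hamming(q1, q3)
d23 = hamming(q2, q3)
```

Under these assumptions, q1 and q2 differ in one attribute while q1 and q3 share none, so a clustering step built on this distance would group q1 with q2. Because the vector records only which attributes appear, it cannot tell a grouping attribute from a filtered one, which is the kind of structural information the statement says the citing paper's representation adds.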

“…A key difference between these two types of workloads lies in the data models and structures they operate on: OLTP systems work on top of highly normalized data models, while OLAP workloads run against denormalized schemas featuring precomputed views derived from transactional business data. Results of a previous experimental study [7] evidence that using such read-optimized structures alone is not enough for analytical processing applications to meet strict response time requirements, even for small datasets.…”

Section: Introduction (mentioning)

confidence: 99%

“…Based on the work of Vanhove et al, a framework that serves as conceptual foundation for the mechanism this paper reports on is presented in [7]. The intuition behind that framework was to progressively optimize the schema of a base dataset by applying a sequence of data transformation operations (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

A Workload-Driven Approach for View Selection in Large Dimensional Datasets

et al. 2020

Self Cite

The information explosion the world has witnessed in the last two decades has forced businesses to adopt a data-driven culture for them to be competitive. These data-driven businesses have access to countless sources of information, and face the challenge of making sense of overwhelming amounts of data in an efficient and reliable manner, which implies the execution of read-intensive operations. In the context of this challenge, a framework for the dynamic read-optimization of large dimensional datasets has been designed, and on top of it a workload-driven mechanism for automatic materialized view selection and creation has been developed. This paper presents an extensive description of this mechanism, along with a proof-of-concept implementation of it and its corresponding performance evaluation. Results show that the proposed mechanism is able to derive a limited but comprehensive set of views leading to a drop in query latency ranging from 80% to 99.99% at the expense of 13% of the disk space used by the base dataset. This way, the devised mechanism enables speeding up query execution by building materialized views that match the actual demand of query workloads.

Section: Related Work (mentioning)

confidence: 99%