Users frequently formulate complex data analysis queries to identify interesting trends, make unusual patterns stand out, or verify hypotheses. They also issue complex data manipulation queries to generate tables for use by data mining tools. Being able to express these queries succinctly is of major importance from both the user's and the system's point of view. The Extended Multi-Feature query language (EMF SQL), an extension to SQL, has proven useful for expressing these queries. The succinct representation of complex data manipulation queries leads to a simple and generic evaluation algorithm that is easy to analyze, optimize, scale, and parallelize. The PanQuery tool (PanQ) is a product that uses EMF SQL to combine and aggregate information from several data sources. It implements efficient evaluation and optimization techniques developed specifically for EMF SQL, offering in most cases a performance improvement of at least one to two orders of magnitude over traditional commercial database systems.
Abstract: The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to compute the aggregate results during a scan of the sorted relation. For multi-dimensional OLAP queries with aggregation groups defined by a general θ condition, an appropriate ordering does not exist, though, and hence expensive join-based solutions are required. In this paper we introduce θ-constrained multi-dimensional aggregation (θ-MDA), which supports multi-dimensional OLAP queries with aggregation groups defined by inequalities. θ-MDA is not based on an ordering of the data relation. Instead, the tuples that shall be considered for computing an aggregate value can be determined by a general θ condition. This facilitates the formulation of complex queries, such as multi-dimensional cumulative aggregates, which are difficult to express in SQL because no appropriate ordering exists. We present algebraic transformation rules that demonstrate how the θ-MDA interacts with other operators of a multi-set algebra. Various techniques for achieving an efficient evaluation of the θ-MDA are investigated, and we integrate them into concrete evaluation algorithms and provide cost formulas. An empirical evaluation with data from the TPC-H benchmark confirms the scalability of the θ-MDA operator and shows performance improvements of up to one order of magnitude over equivalent SQL implementations.
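To make the idea of aggregation groups defined by a general condition more concrete, here is a minimal Python sketch of a two-dimensional cumulative sum. It is not the evaluation algorithm from the paper: it uses a naive nested loop, and the names `theta_mda`, `dominated`, `total`, and the `sales` rows are all hypothetical, chosen only for illustration. For each base tuple, it aggregates every detail tuple that satisfies an inequality condition on two dimensions, a grouping that no single sort order of the detail rows would produce.

```python
from typing import Callable, Iterable

Row = dict  # a tuple is modelled as a plain dictionary (assumption)


def theta_mda(
    base: list[Row],
    detail: list[Row],
    theta: Callable[[Row, Row], bool],
    agg: Callable[[Iterable[Row]], float],
) -> list[Row]:
    """For each base tuple b, aggregate the detail tuples r with theta(b, r) true.

    Naive O(|base| * |detail|) nested loop, for illustration only.
    """
    return [{**b, "agg": agg(r for r in detail if theta(b, r))} for b in base]


# Hypothetical detail relation with two dimensions and one measure.
sales = [
    {"month": 1, "price_band": 1, "amount": 10},
    {"month": 1, "price_band": 2, "amount": 20},
    {"month": 2, "price_band": 1, "amount": 5},
    {"month": 2, "price_band": 2, "amount": 7},
]


def dominated(b: Row, r: Row) -> bool:
    # Condition made of inequalities on both dimensions.
    return r["month"] <= b["month"] and r["price_band"] <= b["price_band"]


def total(rows: Iterable[Row]) -> float:
    return sum(r["amount"] for r in rows)


# One base tuple per (month, price_band) combination in the detail relation.
base = [{"month": s["month"], "price_band": s["price_band"]} for s in sales]
result = theta_mda(base, sales, dominated, total)

for row in result:
    print(row)
```

Each output row carries the sum of all amounts whose month and price band are both less than or equal to its own. Replacing `dominated` with another predicate changes the aggregation groups without touching the rest of the sketch.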