Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems 2018
DOI: 10.1145/3196959.3196960

In-Database Learning with Sparse Tensors

Abstract: In-database analytics is of great practical importance as it avoids the costly repeated loop data scientists have to deal with on a daily basis: select features, export the data, convert data format, train models using an external tool, reimport the parameters. It is also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This paper introduces a unified framework for training and evaluating a class of statistical learning models …

Cited by 63 publications (97 citation statements)
References 38 publications (54 reference statements)
“…In-database machine learning algorithms are a growing class of algorithms that aim to learn in time sublinear in the input data, a.k.a. the design matrix [22,2,11,3,18,19]. The trick is that the design matrix J often happens to be the output of some database query Q whose size could be much larger than the size of its input tables T_1, ….”
Section: Related Results
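To make the quoted point concrete, here is a minimal, hypothetical sketch (not code from the cited works): the join output J can be quadratically larger than its input tables, yet an aggregate needed for learning can be computed directly from the inputs by pushing the sum past the join.

```python
# Hypothetical sketch: the design matrix J = T1 JOIN T2 can be much larger
# than the input tables, yet a learning-related aggregate such as SUM(x*y)
# over J can be computed from the inputs alone.

from collections import defaultdict

# Two input tables sharing join key k; each key value repeats.
T1 = [(k, x) for k in range(20) for x in range(10)]   # 200 rows of (k, x)
T2 = [(k, y) for k in range(20) for y in range(10)]   # 200 rows of (k, y)

# Naive route: materialize J (20 * 10 * 10 = 2,000 rows), then aggregate.
J = [(k1, x, y) for (k1, x) in T1 for (k2, y) in T2 if k1 == k2]
naive = sum(x * y for (_, x, y) in J)

# Factorized route: per join key, SUM(x*y) over the per-key Cartesian
# product equals SUM(x) * SUM(y), so only the input tables are scanned.
sx, sy = defaultdict(int), defaultdict(int)
for k, x in T1:
    sx[k] += x
for k, y in T2:
    sy[k] += y
factorized = sum(sx[k] * sy[k] for k in sx if k in sy)

assert naive == factorized
print(len(T1), len(T2), len(J))   # 200 200 2000
```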
“…By pushing machine learning algorithms down to the database engine, we could run some of them in time max_j |T_j| ≪ |J|, hence sublinear in |J|. This, however, often requires the database engine to be capable of efficiently solving a large number of aggregate queries [3,2], many of which can be modeled as FAQs [5] or FAQ-AIs [1]. FAQ-AIs studied in this paper have been used as the building blocks of many in-database algorithms including k-means clustering, support vector machines, and polynomial regression [1,3].…”
Section: Related Results
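The following hedged sketch (synthetic data, not from the cited systems) illustrates the "large number of aggregate queries" view for one of the listed models: every sufficient statistic of least-squares regression is a SUM aggregate over the design matrix, and training uses only those aggregates. The FAQ/FAQ-AI machinery referenced above is what lets such aggregates be evaluated over the join without materializing it.

```python
# Sketch: for least-squares regression, every sufficient statistic is a SUM
# aggregate over the design matrix J -- SUM(x_i * x_j) and SUM(x_i * y).
# Once this batch of d^2 + d aggregates is computed, training never touches
# the data again. The dataset below is synthetic; J is materialized here
# only for simplicity of illustration.

import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(1000, 3))                         # stand-in for the join output
y = J @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

# Aggregate batch: sigma[i, j] = SUM(x_i * x_j), c[i] = SUM(x_i * y).
sigma = J.T @ J
c = J.T @ y

# Model training uses only the aggregates, not J.
theta = np.linalg.solve(sigma, c)
print(theta)   # close to [2.0, -1.0, 0.5]
```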
“…A similar generalization works for factorization machines [5,40]. Categorical attributes can be accommodated as for linear regression, and then each categorical attribute X_j with exponent a_j > 0 becomes a group-by attribute.…”
Section: Applications
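A small, hypothetical illustration of what "categorical attribute becomes a group-by attribute" means in practice: the aggregates that would involve a one-hot indicator column are exactly the groups of a GROUP BY aggregate, so the sparse one-hot encoding never has to be built. The data below is synthetic.

```python
# Hypothetical example: SUM(indicator[color == c] * x) for every category c
# equals a single SUM(x) GROUP BY color aggregate.

from collections import defaultdict

rows = [("red", 1.0), ("blue", 2.0), ("red", 3.0), ("green", 4.0)]  # (color, x)

# One-hot view: one SUM aggregate per indicator column ...
categories = sorted({c for c, _ in rows})
one_hot_sums = {c: sum(x for cc, x in rows if cc == c) for c in categories}

# ... collapses to one group-by aggregate over the categorical attribute:
group_by_sums = defaultdict(float)
for c, x in rows:
    group_by_sums[c] += x

assert one_hot_sums == dict(group_by_sums)
```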
“…Instead of representing such a Cartesian product of two relation parts explicitly, as done by relational database systems, we can represent it symbolically as a tree whose root is the Cartesian product symbol and whose children are the two relation parts. It has been shown that factorization can improve the performance of joins [42], aggregates [9,6], and more recently machine learning [51,41,4,2]. The additive inverse of rings allows data updates (inserts and deletes) to be treated uniformly and enables incremental maintenance of models learned over relational data [28,39,27].…”
Section: Structure-aware Learning
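A minimal sketch, with hypothetical relation parts, of the two ideas in the quote: a Cartesian product kept symbolically still supports aggregates that distribute over its children, and a deletion can be applied as a negatively weighted delta to those aggregates.

```python
# Sketch: aggregates over a symbolic Cartesian product A x B touch only
# |A| + |B| values instead of |A| * |B| tuples; the ring's additive inverse
# turns a delete into a negative-weight insert over the same aggregates.

A = [1, 2, 3]          # one relation part
B = [10, 20]           # the other relation part

# Materialized product would have |A| * |B| = 6 tuples (a, b).
count = len(A) * len(B)                      # COUNT(*)  over A x B
sum_a = sum(A) * len(B)                      # SUM(a)    over A x B
sum_ab = sum(A) * sum(B)                     # SUM(a*b)  over A x B

assert count == len([(a, b) for a in A for b in B])
assert sum_a == sum(a for a in A for _ in B)
assert sum_ab == sum(a * b for a in A for b in B)

# Incremental maintenance: deleting b = 20 from B is an insert with
# multiplicity -1, applied as a delta to the aggregates.
count += len(A) * (-1)
sum_ab += sum(A) * (-20)
assert sum_ab == sum(a * b for a in A for b in [10])
```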