2017
DOI: 10.1145/3059177

Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality

Abstract: Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of vectors) of multidimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines…
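To make the abstract's central quantity concrete, the sketch below counts the vectors that fall inside a range query (the exact cardinality of the selected subspace) and then estimates the count for a new query from previously answered queries alone, without touching the data. It is a minimal illustration under stated assumptions, not the paper's method: the function names `subspace_cardinality` and `predict_cardinality`, the hyper-rectangle query encoding, and the nearest-neighbour averaging rule are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Assumed query encoding: (lower bounds, upper bounds) of a hyper-rectangle.
RangeQuery = Tuple[Sequence[float], Sequence[float]]


def subspace_cardinality(data: List[Vector], query: RangeQuery) -> int:
    """Exact number of vectors lying inside the query's hyper-rectangle."""
    lower, upper = query
    return sum(
        all(lo <= x <= hi for x, lo, hi in zip(vec, lower, upper))
        for vec in data
    )


def predict_cardinality(
    history: List[Tuple[RangeQuery, int]], query: RangeQuery, k: int = 2
) -> float:
    """Estimate a count from past (query, count) pairs only.

    Hypothetical stand-in for a learned model: average the counts of the
    k past queries whose bounds are closest (Euclidean) to the new query.
    """
    if not history or k < 1:
        raise ValueError("need at least one past query and k >= 1")

    def distance(q1: RangeQuery, q2: RangeQuery) -> float:
        flat1 = list(q1[0]) + list(q1[1])
        flat2 = list(q2[0]) + list(q2[1])
        return sum((a - b) ** 2 for a, b in zip(flat1, flat2)) ** 0.5

    nearest = sorted(history, key=lambda item: distance(item[0], query))[:k]
    return sum(count for _, count in nearest) / len(nearest)


# Toy 2-D dataset and three previously executed queries (illustrative values).
data = [(0.1, 0.2), (0.4, 0.5), (0.6, 0.9), (0.8, 0.3), (0.45, 0.55)]
past_queries = [
    ([0.0, 0.0], [0.5, 0.5]),
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.5, 0.0], [1.0, 1.0]),
]
history = [(q, subspace_cardinality(data, q)) for q in past_queries]

new_query = ([0.0, 0.0], [0.5, 0.6])
exact = subspace_cardinality(data, new_query)
estimate = predict_cardinality(history, new_query, k=1)
```

Only `subspace_cardinality` reads the dataset; `predict_cardinality` sees nothing but past queries and their answers, which is the sense in which an estimator of this kind is query-driven.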

Citations: cited by 20 publications (19 citation statements)
References: 42 publications
“…Platforms such as MapReduce [14], Yarn [29], Spark [32] and Mahout [22] are nowadays commonplace. Predictive modeling [26], [23] and exploratory analysis [2,3,6,20] are commonly based on statistical aggregation operators over the results of exploration queries [4,7]. Such queries involve large datasets (which may themselves be the result of linking of other different datasets) and a number of range predicates over multidimensional data vectorial representation, structured, semi- and unstructured data.…”
Section: Introduction (mentioning)
confidence: 99%
“…This has been realized in many previous studies (McConnell and Skillicorn 2005; Tulone and Madden 2006; Goel and Imielinski 2001; Anagnostopoulos and Triantafillou 2014, 2015, 2017a). In this case, analytics tasks are carried out by the back-end system on the cloud only, and not by the SANs or ENs at the edge of the network, despite their increasing computing capacity.…”
Section: Literature Review (mentioning)
confidence: 96%
“…Our work is related to prior work in analytical-query processing and in applied ML research communities and to prior work focusing on the benefits of the query-driven approach in analytical query processing and tuning [5,6,18,24]. Analytical queries nowadays are executed over underlying systems that provide either exact answers [21,26] or approximate answers [4,14,16,22,23] working over large big data clusters in DCs/CS requiring several orders of magnitude longer query response times.…”
Section: Related Work (mentioning)
confidence: 99%
“…Query-driven models are largely being deployed for both aggregate estimation [5,6] and for hyper-tuning [28] database systems. Unlike [5,6] our focus is on a wide variety of aggregate operators and not just COUNT for selectivity estimation. Furthermore, we address the crucial problem of detecting query pattern changes and adapting to them, which (to our knowledge) has not been addressed in this context before.…”
Section: Related Work (mentioning)
confidence: 99%