Efficient schemes for similarity-aware refinement of aggregation queries

Albarrak, Abdullah M.; Sharaf, Mohamed A.

doi:10.1007/s11280-017-0434-4

Cited by 6 publications (2 citation statements)

References 27 publications (72 reference statements)


“…For example, since HC [14] chooses refinement steps based on evaluating the relative error locally, it is vulnerable to getting stuck in a local minimum when query similarity is included in assessing the relative error of each step. While this might not be true for the SW framework [58], it still suffers from high I/O and CPU costs from exhaustively evaluating all cells in the partitioned space when there are no shape-based conditions. This thesis positions itself alongside [78,17,10,2,123,125,11,12], since it shares a similar assumption with all of these works. This assumption reflects a common problem that users often face when performing DE tasks.…”

mentioning

confidence: 66%
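The contrast drawn above between HC, which evaluates refinement steps locally, and exhaustively evaluating every cell can be illustrated with a toy sketch. The cost list is hypothetical: each entry stands for the combined (error plus dissimilarity) cost of one refined query along a single refinement dimension. `hill_climb` and `exhaustive` are illustrative names, not the cited works' algorithms.

```python
def hill_climb(cost, start):
    """Greedy local search: move to the cheaper neighbour until no neighbour improves."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(cost)]
        if not neighbours:
            return pos
        nxt = min(neighbours, key=lambda p: cost[p])
        if cost[nxt] >= cost[pos]:
            return pos  # local minimum: no neighbouring refinement is cheaper
        pos = nxt

def exhaustive(cost):
    """Evaluate every candidate refinement and return the global minimum."""
    return min(range(len(cost)), key=lambda p: cost[p])

# Hypothetical costs of refined queries ordered along one predicate dimension.
cost = [0.9, 0.4, 0.6, 0.8, 0.7, 0.5, 0.1, 0.3]
print(hill_climb(cost, start=2))  # stops at index 1 (cost 0.4)
print(exhaustive(cost))           # finds index 6 (cost 0.1)
```

The exhaustive version always finds the global minimum, but only by evaluating every candidate, which mirrors the I/O and CPU cost noted in the quote.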

“…Accordingly, researchers have proposed highly specialized and optimized DE techniques to support users in their diverse exploration tasks. For example, some of these tasks are to recommend relevant data [30,29], to identify interesting subspaces of data that deviate strongly from the rest of the data or from a reference [124], to explain why outliers show up in the results [104,129], to summarize and present representative sets of potentially huge result sets [28,65], and to formulate or refine queries based on user-defined constraints [33,119,58,125,2].…”

Section: Data Exploration; mentioning

confidence: 99%


Similarity-aware query refinement for data exploration

Albarrak¹


Database users are easily overwhelmed by the sheer size of data found in large-scale scientific and financial databases. Exploring these databases to make sense of the data and to discover interesting insights (i.e., data exploration) has been, and still is, a tedious and labour-intensive task, especially for non-expert users with no solid background in the underlying data. Some three decades ago, the database research community noticed the limitations of traditional DBMSs in supporting users in data exploration tasks. Since then, the research community has proposed and designed various effective and efficient data exploration techniques to assist users in extracting interesting insights from their data. One such technique is query refinement. Query refinement techniques assume that users' queries are imprecise, i.e., the returned result does not meet some user-defined constraints. Accordingly, the goal of query refinement is to automatically refine these imprecise queries to maximize users' satisfaction with the results. In particular, the predicates of the queries are carefully modified so that the returned results satisfy the user-defined constraints. Since users' constraints on query results are diverse, this thesis focuses on two specific forms of constraints in exploring relational and sequential data, namely, 1) user-defined aggregate constraints on the result, and 2) user-defined correlation constraints on time series data. These constraints are common in real-world applications because they provide a higher-level view of the result that is easier to understand and digest than the raw result itself. This thesis addresses the limitations of current query refinement techniques that are oblivious to the similarity of the refined queries to the users' initial queries.
Specifically, users' initial (and imprecise) queries are defined as anchor points against which the similarity of their corresponding refined queries is computed over the whole refinement space. Consequently, the similarity-aware query refinement problem is formulated as a search problem that aims to balance the trade-off between minimizing the deviation from satisfying a constraint on the query result and maximizing the similarity of the refined query to the initial one. Searching for such a trade-off introduces various challenges. A common challenge shared by many query refinement problems is that finding an optimal trade-off i… Lastly, I would like to formally thank my sponsor, Al-Imam Muhammad Ibn Saud Islamic University, for providing the financial support which made this journey possible.
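The search-problem formulation described above can be sketched for the simplest case: a single range predicate `lo <= x <= hi` with an aggregate constraint `COUNT(*) = target`. The weighted cost `alpha * relative_error + (1 - alpha) * (1 - similarity)`, the L1-based similarity between predicate bounds, and all function names are assumptions for illustration; the thesis's own similarity measure and search schemes may differ. The search here is a plain exhaustive scan of a discretized refinement space.

```python
from itertools import product

def count_in_range(data, lo, hi):
    """COUNT(*) of the range query lo <= x <= hi."""
    return sum(1 for x in data if lo <= x <= hi)

def relative_error(actual, target):
    """Deviation of the aggregate result from the user-defined constraint."""
    return abs(actual - target) / target

def similarity(initial, refined, domain_width):
    """Assumed measure: 1 minus the normalized L1 distance between predicate bounds."""
    dist = abs(initial[0] - refined[0]) + abs(initial[1] - refined[1])
    return 1.0 - dist / (2 * domain_width)

def refine(data, initial, target, candidates, alpha=0.5):
    """Score every candidate (lo, hi) and return the one with the lowest weighted cost.

    Candidate bounds are assumed to lie within the data domain, so similarity stays in [0, 1].
    """
    domain_width = max(data) - min(data)
    best, best_cost = None, float("inf")
    for lo, hi in product(candidates, repeat=2):
        if lo > hi:
            continue  # skip invalid ranges
        err = relative_error(count_in_range(data, lo, hi), target)
        sim = similarity(initial, (lo, hi), domain_width)
        cost = alpha * err + (1 - alpha) * (1 - sim)
        if cost < best_cost:
            best, best_cost = (lo, hi), cost
    return best, best_cost

data = list(range(100))
best, best_cost = refine(data, initial=(10, 29), target=30, candidates=range(100))
print(best, count_in_range(data, *best), round(best_cost, 4))
```

Here the initial query `(10, 29)` returns 20 rows; with `target=30` the search settles on a refinement that returns exactly 30 rows while moving the bounds by a total of 10. Raising `alpha` favours satisfying the constraint; lowering it favours staying close to the initial query.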


Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study

2019


When exploring large amounts of data without a clear target, providing an interactive experience becomes very difficult, since this tentative inspection usually defeats any early decision on data structures or indexing strategies. This is also true in the physics domain, specifically in high-energy physics, where the huge volume of data generated by the detectors is normally explored via C++ code using batch processing, which introduces considerable latency. An interactive tool, when integrated into existing data management systems, can add great value to the usability of these platforms. Here, we review the current state of the art of interactive data exploration with respect to three requirements: access to raw data files, stored in a distributed environment, with reasonably low latency. This paper follows the guidelines for systematic mapping studies, which are well suited for gathering and classifying available studies. We summarize the results after classifying the 242 papers that passed our inclusion criteria. While many proposed solutions tackle the problem in different ways, there is little evidence available about their implementation in practice. Almost all of the solutions found in this study cover a subset of our requirements, with only one partially satisfying all three. Solutions for data exploration abound. It is an active research area and, considering the continuous growth of data volume and variety, it is only going to become harder. There is a niche for research on a solution that covers our requirements, and the required building blocks are there.

INDEX TERMS Big data applications, data analysis, data engineering, data exploration, database systems, interactive systems, systematic mapping study.

APPENDIX: RESULTS OF THE MAPPING STUDY. See Tables.


Efficient Query Refinement for View Recommendation in Visual Data Exploration

Sharaf

Ehsan

2021

IEEE Access

Self Cite


The need for efficient and effective data exploration has resulted in several solutions that automatically recommend interesting visualizations. The main idea underlying those solutions is to automatically generate all possible views of the data and recommend the top-k interesting views. However, those solutions assume that the analyst is able to formulate a well-defined query that selects a subset of data containing insights. In reality, it is typically challenging to pose an exploratory query that immediately reveals some insights. To address that challenge, in this work we propose using query refinement as a technique that automatically adjusts the analyst's input query to discover such valuable insights. However, naive query refinement, in addition to generating a prohibitively large search space, also raises other problems, such as deviating from the user's preference and recommending statistically insignificant views. In this paper, we address those problems and propose a novel suite of schemes that efficiently navigate the search space of refined queries to recommend the top-k insights that meet all of the analyst's pre-specified criteria.
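One way to picture recommending top-k views over refined queries is sketched below. It assumes a deviation-based interestingness score (the L1 distance between a refined query's normalized group-by COUNT view and the same view over all data) minus a penalty for drifting from the analyst's initial range predicate. The scoring function, the `alpha` weight, the column names `x` and `g`, and all function names are hypothetical; the paper's schemes, including its statistical-significance checks, are not reproduced here.

```python
import heapq
from collections import Counter

def normalized_view(rows, dim):
    """Group-by COUNT over column `dim`, normalized to a probability distribution."""
    counts = Counter(r[dim] for r in rows)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def deviation(p, q):
    """Assumed interestingness: L1 distance between two normalized views."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def top_k_views(rows, initial, refinements, dim, k=2, alpha=0.5):
    """Score each refined range (lo, hi) on column `x` and return the k best."""
    reference = normalized_view(rows, dim)
    xs = [r["x"] for r in rows]
    width = max(xs) - min(xs)
    scored = []
    for lo, hi in refinements:
        subset = [r for r in rows if lo <= r["x"] <= hi]
        utility = deviation(normalized_view(subset, dim), reference)
        # Penalize refinements that drift from the analyst's initial predicate.
        drift = (abs(lo - initial[0]) + abs(hi - initial[1])) / width
        scored.append((utility - alpha * drift, (lo, hi)))
    return heapq.nlargest(k, scored)

# Hypothetical data: group "a" for x < 50, group "b" otherwise.
rows = [{"x": i, "g": "a" if i < 50 else "b"} for i in range(100)]
candidates = [(10, 89), (0, 49), (50, 89), (20, 60)]
for score, refined in top_k_views(rows, (10, 89), candidates, dim="g"):
    print(refined, round(score, 3))
```

On this toy data, the two refinements that isolate a single group rank highest, and the one closer to the initial predicate comes first. The initial query itself scores zero because its view matches the overall distribution.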
