Alpine

Anagnostou, Antonios; Olma, Matthaios; Ailamaki, Anastasia

doi:10.1145/3035918.3058743

Cited by 4 publications

References 18 publications


Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study

2019


When exploring large amounts of data without a clear target, providing an interactive experience becomes very difficult, since this tentative inspection usually defeats any early decision on data structures or indexing strategies. This is also true in the physics domain, specifically in high-energy physics, where the huge volume of data generated by the detectors is normally explored via C++ code using batch processing, which introduces considerable latency. An interactive tool, when integrated into the existing data management systems, can add great value to the usability of these platforms. Here, we review the current state of the art of interactive data exploration, aiming to satisfy three requirements: access to raw data files, storage in a distributed environment, and reasonably low latency. This paper follows the guidelines for systematic mapping studies, which are well suited for gathering and classifying available studies. We summarize the results after classifying the 242 papers that passed our inclusion criteria. While there are many proposed solutions that tackle the problem in different ways, there is little evidence available about their implementation in practice. Almost all of the solutions found by this paper cover a subset of our requirements, with only one partially satisfying all three. Solutions for data exploration abound. It is an active research area and, considering the continuous growth of data volume and variety, the problem will only become harder. There is a niche for research on a solution that covers our requirements, and the required building blocks are there.

Index terms: Big data applications, data analysis, data engineering, data exploration, database systems, interactive systems, systematic mapping study.

Appendix: Results of the mapping study. See Tables.


Multi-model Investigative Exploration of Social Media Data with BOUTIQUE: A Case Study in Public Health

Guo; Dasgupta; Gupta

2019

2019 15th International Conference on eScience (eScience)


We present our experience with a data science problem in Public Health, where researchers use social media (Twitter) to determine whether the public shows awareness of HIV prevention measures offered by Public Health campaigns. To help the researcher, we develop an investigative exploration system called BOUTIQUE that allows a user to perform a multistep visualization and exploration of data through a dashboard interface. Unique features of BOUTIQUE include its ability to handle heterogeneous types of data provided by a polystore, and its ability to use computation as part of the investigative exploration process. In this paper, we present the design of the BOUTIQUE middleware and walk through an investigation process for a real-life problem.


Slalom

et al. 2017


The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have an increasing number of queries. Typically, each query focuses on a constantly shifting, yet small, range. Minimizing workload latency now requires the benefits of indexing in in-situ query processing. In this paper, we present Slalom, an in-situ query engine that accommodates workload shifts by monitoring user access patterns. Slalom makes on-the-fly partitioning and indexing decisions, based on information collected by lightweight monitoring. Slalom has two key components: (i) an online partitioning and indexing scheme, and (ii) a partitioning and indexing tuner tailored for in-situ query engines. When compared to the state of the art, Slalom offers performance benefits by taking into account user query patterns to (a) logically partition raw data files and (b) build lightweight partition-specific indexes for each partition. Due to its lightweight and adaptive nature, Slalom achieves efficient accesses to raw data with minimal memory consumption. Our experimentation with both micro-benchmarks and real-life workloads shows that Slalom outperforms state-of-the-art in-situ engines (3–10×), and achieves query response times comparable to those of a fully indexed DBMS, offering much lower (∼3×) cumulative query execution times for query workloads with increasing size and unpredictable access patterns.
