2012
DOI: 10.14778/2367502.2367533
Blink and it's done

Abstract: In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees,…
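The "real-time answers along with statistical error guarantees" that the abstract describes can be sketched with a uniform sample and a CLT-based confidence interval. This is a minimal illustration of the general technique, not BlinkDB's implementation; the function name and parameters are assumptions.

```python
import random
import statistics

def approx_avg(data, sample_frac=0.1, z=1.96, seed=0):
    """Estimate AVG(data) from a uniform sample, returning the
    estimate and a ~95% confidence half-width (CLT-based)."""
    rng = random.Random(seed)
    n = max(2, int(len(data) * sample_frac))
    sample = rng.sample(data, n)
    est = statistics.fmean(sample)
    # standard error of the sample mean, scaled by the z-score
    err = z * statistics.stdev(sample) / n ** 0.5
    return est, err

data = list(range(1_000_000))      # true mean = 499999.5
est, err = approx_avg(data)
print(f"AVG \u2248 {est:.1f} \u00b1 {err:.1f}")
```

Reading only a fraction of the data trades a small, quantified error for a large reduction in work, which is the core bargain of sampling-based approximate query processing.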

Cited by 70 publications (4 citation statements) · References 4 publications
“…Beyond these technique-specific solutions, another way to remedy the effects of tailored and, therefore, potentially skewed samples is to combine the chunks of tailorable sampling with a "baseline sampling" that remains constant throughout the analysis. A similar idea is used in BlinkDB [1], where both a uniform sample and a set of stratified samples are maintained. Here, combining multiple samples provides "tighter approximation errors" and "significantly reduces [...] the subset error".…”
Section: Impact After Sampling
confidence: 99%
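The quoted passage describes BlinkDB [1] maintaining both a uniform sample and a set of stratified samples. A minimal sketch of that idea follows; the function, its parameters, and the cap-per-group policy are illustrative assumptions, not BlinkDB's actual sample-creation algorithm.

```python
import random
from collections import defaultdict

def build_samples(rows, key, uniform_frac=0.1, cap_per_group=50, seed=0):
    """Maintain two samples over the same data: a uniform sample that
    preserves the overall distribution, and a stratified sample that
    caps each group so rare groups are guaranteed representation."""
    rng = random.Random(seed)
    uniform = [r for r in rows if rng.random() < uniform_frac]
    strata = defaultdict(list)
    for r in rows:
        strata[key(r)].append(r)
    stratified = {g: rng.sample(rs, min(cap_per_group, len(rs)))
                  for g, rs in strata.items()}
    return uniform, stratified

rows = [("common", i) for i in range(10_000)] + [("rare", i) for i in range(20)]
uniform, stratified = build_samples(rows, key=lambda r: r[0])
# every "rare" row survives in the stratified sample, while the
# uniform sample keeps the overall common/rare proportions
print(len(stratified["rare"]), len(stratified["common"]))
```

A query over a rare group can then be answered from the stratified sample, while the uniform sample serves as the constant "baseline" the passage mentions.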
“…Depending on what task an analyst wants to perform on that data, there are different ways for how to make the sampling of that data most useful: For example, to gain an initial overview of the data, it makes sense to draw a uniform sample that helps depict the distribution of all three attributes. In the sample depicted in subfigure (1), the densely populated region in the center of the plot stands out. On the other hand, to analyze the local distribution of the Boolean attribute, it is more useful to sample the data along a regular grid, such that the density in each grid cell is even throughout the sample, which puts the focus on the Boolean attribute.…”
Section: Introduction
confidence: 99%
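The grid-based sampling the passage contrasts with uniform sampling can be sketched as follows: cap the number of points kept per grid cell so density is evened out across the sample. This is an illustrative sketch only; the function name, cell size, and per-cell cap are assumptions.

```python
import random

def grid_sample(points, cell=1.0, per_cell=2, seed=0):
    """Keep at most `per_cell` points from each grid cell, evening
    out density so sparse regions are not drowned out by dense ones."""
    rng = random.Random(seed)
    cells = {}
    for p in points:
        cells.setdefault((int(p[0] // cell), int(p[1] // cell)), []).append(p)
    return [p for ps in cells.values()
            for p in rng.sample(ps, min(per_cell, len(ps)))]

rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(500)]  # dense cell (0, 0)
pts += [(5.5, 5.5), (9.1, 2.3)]                           # two isolated points
sampled = grid_sample(pts)
print(len(sampled))  # 2 capped from the dense cell + the 2 isolated points
```

A uniform sample of the same data would almost never include the two isolated points, which is exactly the trade-off the passage describes.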
“…Prior to forwarding the query to the list of particular resources, the system implemented a semantic optimization to minimize the overall execution cost (Arens et al, 1994). A query processing algorithm must provide real-time answers along with statistical error guarantees, and must scale to petabytes of data and thousands of resources in a fault-tolerant manner (Agarwal et al, 2012). However, we assume that not all queries are associated with explicit evidence of user feedback, and queries are often semantically associated with similar implicit feedback.…”
Section: Query Processing
confidence: 99%
“…Caching intermediate results is one of the most widely used query optimization techniques [22], extended by Safaeei A et al [23] with multiple sliding windows to improve the execution of overlapping queries with common subexpressions. Laptev et al [24] presented the EARL system and Agarwal et al [25] proposed BlinkDB; both iteratively collect larger samples until the desired accuracy is reached. Shark, presented in [26], caches inter-query data using a shared-memory approach.…”
Section: Related Work
confidence: 99%
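The iterative loop the passage attributes to EARL and BlinkDB, collecting progressively larger samples until the desired accuracy is reached, can be sketched as follows. The function, its parameters, and the doubling schedule are assumptions for illustration, not either system's actual algorithm.

```python
import random
import statistics

def iterative_avg(data, target_err=0.1, start=100, z=1.96, seed=0):
    """Grow the sample until the confidence-interval half-width
    drops below target_err (or the whole dataset is consumed)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)          # a prefix of a shuffle is a uniform sample
    n = start
    while True:
        sample = shuffled[:n]
        err = z * statistics.stdev(sample) / len(sample) ** 0.5
        if err <= target_err or n >= len(shuffled):
            return statistics.fmean(sample), err, len(sample)
        n = min(2 * n, len(shuffled))  # double the sample and retry

data = [random.Random(i).gauss(10, 2) for i in range(50_000)]
est, err, used = iterative_avg(data)
print(f"AVG \u2248 {est:.2f} \u00b1 {err:.2f} using {used} rows")
```

Because each iteration reuses the previous sample as a prefix, the total work stays close to the size of the final sample rather than the full dataset.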