A v ector A of length N is de ned implicitly, via a stream of updates of the form add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a Bbucket piecewise-constant representation histogram H for A from the sketch, such that kA , Hk 1+ kA , Hoptk, where the error kA , Hk is either`1 absolute or`2 rootmean-square error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by polyB;logN; log kAk ; 1= . Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality signi cantly. F rom the robust histogram, we cull a histogram of desired accruacy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under`2 error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.
In this paper we introduce the idea of snapshot queries for energy efficient data acquisition in sensor networks. Network nodes generate models of their surrounding environment that are used for electing, using a localized algorithm, a small set of representative nodes in the network. These representative nodes constitute a network snapshot and can be used to provide quick approximate answers to user queries while reducing substantially the energy consumption in the network. We present a detailed experimental study of our framework and algorithms, varying multiple parameters like the available memory of the sensor nodes, their transmission range, the network message loss etc. Depending on the configuration, snapshot queries provide a reduction of up to 90% in the number of nodes that need to participate in a user query.
We present techniques for computing small space representations of massive data streams. These are inspired by traditional wavelet-based approximations that consist of specific linear projections of the underlying data. We present general "sketch"based methods for capturing various linear projections and use them to provide pointwise and rangesum estimation of data streams. These methods use small amounts of space and per-item time while streaming through the data and provide accurate representation as our experiments with real data streams show.
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a parallel skyline query, where all partitions are examined at the same time, since many data partitions do not contribute to the overall skyline set, resulting in a lot of redundant processing.In this paper we propose a novel angle-based space partitioning scheme using the hyperspherical coordinates of the data points. We demonstrate both formally as well as through an exhaustive set of experiments that this new scheme is very suitable for skyline query processing in a parallel sharenothing architecture. The intuition of our partitioning technique is that the skyline points are equally spread to all partitions. We also show that partitioning the data according to the hyperspherical coordinates manages to increase the average pruning power of points within a partition. Our novel partitioning scheme alleviates most of the problems of traditional grid partitioning techniques, thus managing to reduce the response time and share the computational workload more fairly. As demonstrated by our experimental study, our technique outperforms grid partitioning in all cases, thus becoming an efficient and scalable solution for skyline query processing in parallel environments.
Abstract-Nowadays, most applications return to the user a limited set of ranked results based on the individual user's preferences, which are commonly expressed through top-k queries. From the perspective of a manufacturer, it is imperative that her products appear in the highest ranked positions for many different user preferences, otherwise the product is not visible to potential customers. In this paper, we define a novel query type, namely the reverse top-k query, that covers this requirement: "Given a potential product, which are the user preferences that make this product belong to the top-k query result set?". Reverse top-k queries are essential for manufacturers to assess the impact of their products in the market based on the competition. We formally define reverse top-k queries and introduce two versions of the query, monochromatic and bichromatic. First, we provide a geometric interpretation of the monochromatic reverse topk query to acquire an intuition of the solution space. Then, we study in detail the case of bichromatic reverse top-k query, and we propose two techniques for query processing, namely an efficient thresholdbased algorithm and an algorithm based on materialized reverse topk views. Our experimental evaluation demonstrates the efficiency of our techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.