Join processing in wireless sensor networks is difficult: as the tuples can be arbitrarily distributed within the network, matching pairs of tuples is communication-intensive and costly in terms of energy. Current solutions only work well with specific node placements and/or make restrictive assumptions. In this paper, we present SENS-Join, an efficient general-purpose join method for sensor networks. To achieve efficiency, SENS-Join does not ship tuples that do not join, based on a filtering step. Our main contribution is the design of this filtering step, which is highly efficient so as not to exhaust the potential savings. We demonstrate the performance of SENS-Join experimentally: the overall energy consumption can be reduced by more than 80% compared to the state-of-the-art approach, and the per-node energy consumption of the most loaded nodes can be reduced by more than an order of magnitude.
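The core idea of shipping only tuples that actually join can be illustrated with a semi-join-style filter. This is a minimal sketch under assumed data structures, not SENS-Join's actual filtering protocol (the abstract does not describe its internals); all names and the toy data are hypothetical.

```python
# Illustrative semi-join-style filtering (NOT the actual SENS-Join algorithm):
# one side's join-key values are collected into a compact filter, and only
# tuples whose key appears on the other side are shipped through the network.

def build_filter(tuples, key):
    """Collect the set of join-key values present on one side."""
    return {t[key] for t in tuples}

def filter_non_joining(tuples, key, remote_keys):
    """Keep only tuples whose join key also occurs on the other side."""
    return [t for t in tuples if t[key] in remote_keys]

# Hypothetical sensor readings on two sets of nodes.
r = [{"id": 1, "temp": 20}, {"id": 2, "temp": 25}, {"id": 3, "temp": 30}]
s = [{"id": 2, "hum": 40}, {"id": 4, "hum": 55}]

keys_s = build_filter(s, "id")
shipped = filter_non_joining(r, "id", keys_s)  # only the tuple with id 2 is shipped
```

In a real deployment the filter itself must be cheap to disseminate (e.g., a compact encoding rather than an explicit key set), which is exactly why the abstract stresses that the filtering step must be highly efficient.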
Materialized views (MVs) can significantly improve the query performance of relational databases. In this paper, we consider MVs to optimize complex scenarios where many heterogeneous nodes with different resource constraints (e.g., CPU, I/O, and network bandwidth) query and update numerous tables on different nodes. Such problems are typical for large enterprises, e.g., global retailers storing thousands of relations on hundreds of nodes at different subsidiaries. Choosing which views to materialize in a distributed, complex scenario is NP-hard. Furthermore, the solution space is huge, and the large number of input factors results in non-monotonic cost models. This prohibits the straightforward use of brute-force algorithms, greedy approaches, or proposals from organic computing. For the same reason, all solutions for choosing MVs that we are aware of disregard distributed settings, update costs, or both. In this paper we describe an algorithmic framework which restricts the set of considered MVs so that a genetic algorithm can be applied. To let the genetic algorithm converge quickly, we generate initial populations based on knowledge of database tuning, and devise a selection function which restricts the solution space by taking the similarity of MV configurations into account. We evaluate our approach both with artificial settings and with a real-world RFID scenario from retail. For a small setting consisting of 24 tables distributed over 9 nodes, an exhaustive search needs 10 hours of processing time; our approach derives a comparable set of MVs within 30 seconds. Our approach scales well: within 15 minutes it chooses a set of MVs for a real-world scenario consisting of 1,000 relations, 400 hosts, and a workload of 3,000 queries and updates.
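The overall shape of such a genetic algorithm can be sketched as follows. This is an assumed, heavily simplified illustration: the bitstring encoding, the toy cost model (per-view query benefit minus maintenance cost under a budget), and the operators are placeholders, not the paper's actual cost model or its similarity-based selection function.

```python
# Illustrative genetic algorithm for MV selection (assumed scheme, not the
# paper's): a configuration is a bitstring saying which candidate views to
# materialize; fitness is query benefit minus maintenance cost, with
# over-budget configurations ruled infeasible.
import random

def fitness(bits, benefit, maint_cost, budget):
    """Net benefit of a configuration; infeasible ones score -inf."""
    cost = sum(c for b, c in zip(bits, maint_cost) if b)
    if cost > budget:
        return float("-inf")
    return sum(v for b, v in zip(bits, benefit) if b) - cost

def evolve(benefit, maint_cost, budget, pop=20, gens=100, seed=0):
    rng = random.Random(seed)
    n = len(benefit)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda b: fitness(b, benefit, maint_cost, budget),
                        reverse=True)
        survivors = population[: pop // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]               # one-point crossover
            child[rng.randrange(n)] ^= 1            # single-bit mutation
            children.append(child)
        population = survivors + children
    return max(population, key=lambda b: fitness(b, benefit, maint_cost, budget))

best = evolve(benefit=[10, 1, 8], maint_cost=[3, 5, 2], budget=10)
```

The paper's contributions slot into this skeleton: tuning knowledge shapes the initial population, and the selection function exploits similarity between MV configurations instead of plain elitism.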
Distributed hash tables (DHTs) promise to administer huge sets of (key, value) pairs under high workloads, and they are currently a hot topic of research in various disciplines of computer science. Convincing experimental results require evaluations with large DHTs (i.e., more than 100,000 nodes). However, many studies confine themselves to (less convincing) experimental examinations with far fewer nodes, and information on how to run experiments with many-node DHTs is not available. Based on experience gained with a DHT implementation of our own, this article describes how to carry out such experiments successfully. The infrastructure used is a cluster of 32 commodity workstations. The article starts by compiling requirements for such experiments. We then identify the various bottlenecks that can result from a naive implementation and describe their negative effects. The article proposes various countermeasures, e.g., an experiment clock and a component that maintains persistent network connections between cluster nodes. The features proposed are beneficial: a naive experimental setup allows for at most 10,000 peers and a total of 20 operations per second, while a sophisticated one following our proposal allows for 1,000,000 peers and 150 operations per second. Furthermore, we explain why experimental results gained in this way are meaningful in many situations.
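The idea behind an experiment clock can be sketched as logical time that advances only when every simulated peer has completed the current round, so cluster overload cannot distort measured DHT behaviour. This is a minimal assumed illustration of the concept; the article's actual component and its interface are not described in the abstract.

```python
# Illustrative experiment clock (assumed design, not the article's
# implementation): logical experiment time advances one tick per round,
# and only after ALL simulated nodes have reached the round barrier.
import threading

class ExperimentClock:
    def __init__(self, num_nodes):
        self.barrier = threading.Barrier(num_nodes)
        self.tick = 0

    def step(self):
        """Called once per round by each simulated node."""
        idx = self.barrier.wait()
        if idx == 0:              # exactly one arriving thread bumps the tick
            self.tick += 1
        self.barrier.wait()       # all nodes see the new tick before moving on
        return self.tick

# Hypothetical usage: four simulated peers complete one round together.
clock = ExperimentClock(4)
ticks = []
threads = [threading.Thread(target=lambda: ticks.append(clock.step()))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Decoupling logical from wall-clock time in this way is what lets a 32-machine cluster emulate up to 1,000,000 peers without the host machines' scheduling delays leaking into the results.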
Structured P2P systems in the form of distributed hash tables (DHTs) are a promising approach for building massively distributed data management platforms. However, for many applications the supported key-lookup queries are not sufficient; instead, techniques for managing and querying (relational) structured data are required. In this paper, we argue that in order to cope with the dynamics of large-scale P2P systems, such query techniques should work in a best-effort manner. We describe such operations (namely grouping/aggregation, similarity search, and nearest-neighbor search) and discuss appropriate query evaluation strategies.
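One plausible way to realize grouping/aggregation on top of a DHT is to hash each group key to a responsible peer, which accumulates partial aggregates; a coordinator then merges whatever answers arrive. The following sketch assumes this scheme for illustration (the paper's concrete evaluation strategies are not given in the abstract), and all names and the toy data are hypothetical.

```python
# Illustrative DHT-based grouping/aggregation (assumed scheme): each group key
# is hashed onto the peer ring, that peer accumulates a partial SUM, and the
# coordinator merges the partial results. The answer is "best effort" because
# peers may join or leave while the query runs.
import hashlib

def responsible_peer(group_key, peers):
    """Map a group key to a peer via hashing (simplified ring placement)."""
    h = int(hashlib.sha1(str(group_key).encode()).hexdigest(), 16)
    return peers[h % len(peers)]

def distributed_sum(tuples, group_attr, agg_attr, peers):
    partials = {p: {} for p in peers}            # per-peer partial aggregates
    for t in tuples:
        p = responsible_peer(t[group_attr], peers)
        g = t[group_attr]
        partials[p][g] = partials[p].get(g, 0) + t[agg_attr]
    result = {}                                  # coordinator merges answers
    for groups in partials.values():
        result.update(groups)
    return result

data = [{"g": "a", "v": 1}, {"g": "a", "v": 2}, {"g": "b", "v": 5}]
result = distributed_sum(data, "g", "v", ["p1", "p2", "p3"])
```

Because each group lives on exactly one peer, the merge step is a disjoint union; a departed peer simply contributes nothing, which is what makes the semantics best effort rather than exact.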