Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amounts of data have to be moved. Without partitioning the graph, communication quickly becomes a limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication cost on large graphs, sometimes as high as the future computation itself. Observing that the graph has to be loaded into the cluster, we ask if the partitioning can be done at the same time with a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics are scalable in the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark [32], a distributed computation system, by 18% to 39% for large social networks.
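The idea of partitioning a graph while it is being loaded can be illustrated with a small Python sketch. The abstract does not spell out the heuristics, so the greedy rule used here (prefer the partition that already holds the most neighbors, discounted by how full it is) is an assumption chosen for illustration, and the names `stream_partition`, `edge_cut`, and `capacity` are hypothetical.

```python
def stream_partition(stream, k, capacity):
    """Assign each vertex to one of k partitions in a single pass.

    stream   -- iterable of (vertex, neighbors) pairs, in arrival order
    capacity -- maximum number of vertices per partition (assumed given)

    Assumed greedy rule: score each non-full partition by the number of
    already-placed neighbors it holds, scaled by its remaining capacity;
    break ties in favor of the least-loaded partition.
    """
    assignment = {}
    sizes = [0] * k
    for vertex, neighbors in stream:
        # Count neighbors that have already been placed, per partition.
        counts = [0] * k
        for u in neighbors:
            p = assignment.get(u)
            if p is not None:
                counts[p] += 1
        candidates = [p for p in range(k) if sizes[p] < capacity]
        if not candidates:
            raise ValueError("all partitions are full; need k * capacity >= |V|")
        best = max(
            candidates,
            key=lambda p: (counts[p] * (1 - sizes[p] / capacity), -sizes[p]),
        )
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


def edge_cut(edges, assignment):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Toy input: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = {v: [] for v in range(6)}
for u, v in edges:
    adjacency[u].append(v)
    adjacency[v].append(u)

assignment = stream_partition(((v, adjacency[v]) for v in range(6)), k=2, capacity=3)
cut = edge_cut(edges, assignment)
```

On this toy input the sketch keeps each triangle together and cuts only the bridging edge. The result depends on the arrival order of the stream, which is one reason a real evaluation would compare against baselines such as hashing.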
We present Brahms, an algorithm for sampling random nodes in a large dynamic system prone to malicious behavior. Brahms stores small membership views at each node, and yet overcomes Byzantine attacks by a linear portion of the system. Brahms is composed of two components. The first one is a resilient gossip-based membership protocol. The second one uses a novel memory-efficient approach for uniform sampling from a possibly biased stream of ids that traverse the node. We evaluate Brahms using rigorous analysis, backed by simulations, which show that our theoretical model captures the protocol's essentials. We study two representative attacks, and show that with high probability, an attacker cannot create a partition between correct nodes. We further prove that each node's sample converges to a uniform one over time. To our knowledge, no such properties were proven for gossip protocols in the past.
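One way to draw a uniform sample from a stream in which some ids appear far more often than others is min-wise hashing: keep only the id whose salted hash is smallest. The following Python sketch assumes that approach for illustration; the class name `MinWiseSampler`, the `salt` parameter, and the use of SHA-256 are hypothetical choices, not details taken from the abstract.

```python
import hashlib
import os


class MinWiseSampler:
    """Keeps a single id: the one with the smallest salted hash seen so far.

    The retained id depends only on the set of distinct ids observed,
    not on how often each one appears, so flooding the stream with
    repeats of one id does not increase its chance of being kept.
    """

    def __init__(self, salt=None):
        # A fresh random salt makes each sampler an independent "permutation".
        self.salt = os.urandom(16) if salt is None else salt
        self.best_id = None
        self.best_hash = None

    def _hash(self, node_id):
        return hashlib.sha256(self.salt + str(node_id).encode("utf-8")).digest()

    def observe(self, node_id):
        h = self._hash(node_id)
        if self.best_hash is None or h < self.best_hash:
            self.best_id, self.best_hash = node_id, h

    def sample(self):
        return self.best_id


# Same salt, same distinct ids, very different frequencies.
fair = MinWiseSampler(salt=b"demo-salt")
biased = MinWiseSampler(salt=b"demo-salt")
for node_id in ["n1", "n2", "n3"]:
    fair.observe(node_id)
for node_id in ["n1"] * 1000 + ["n2", "n3", "n1"]:
    biased.observe(node_id)
```

Both samplers end up holding the same id even though the second stream is dominated by `n1`. A node would typically run several such samplers with independent salts to maintain a sample of more than one id, and memory stays constant per sampler.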
This paper presents RaWMS, a novel lightweight random membership service for ad hoc networks. The service provides each node with a partial uniformly chosen view of network nodes. Such a membership service is useful, e.g., in data dissemination algorithms, lookup and discovery services, peer sampling services, and complete membership construction. The design of RaWMS is based on a novel reverse random walk (RW) sampling technique. The paper includes a formal analysis of both the reverse RW sampling technique and RaWMS and verifies it through a detailed simulation study. In addition, RaWMS is compared both analytically and by simulations with a number of other known methods such as flooding and gossip-based techniques.
Reliable broadcast is a basic service for many collaborative applications as it provides reliable dissemination of the same information to many recipients. This paper studies three common approaches for achieving scalable reliable broadcast in ad hoc networks, namely probabilistic flooding, counter-based broadcast, and lazy gossip. The strengths and weaknesses of each scheme are analyzed, and a new protocol that combines these three techniques, called RAPID, is developed. Specifically, the analysis in this paper focuses on the tradeoffs between reliability (percentage of nodes that receive each message), latency, and the message overhead of the protocol. Each of these methods excels in some of these parameters, but no single method wins in all of them. This motivates the need for a combined protocol that benefits from all of these methods and allows trading between them smoothly. Interestingly, since the RAPID protocol relies only on local computations and probability, it is highly resilient to mobility, failures, and even selfish behavior. By adding authentication, it can even be made tolerant of malicious nodes. Additionally, the paper includes a detailed performance evaluation by simulation. The simulations confirm that RAPID obtains higher reliability with low latency and good communication overhead compared with each of the individual methods.
Quorums are a basic construct in solving many fundamental distributed computing problems. One of the known ways of making quorums scalable and efficient is by weakening their intersection guarantee to a probabilistic one. This paper explores several access strategies for implementing probabilistic quorums in ad hoc networks. In particular, we present the first detailed study of asymmetric probabilistic bi-quorum systems, which allow mixing different access strategies and different quorum sizes while guaranteeing the desired intersection probability. We show the advantages of asymmetric probabilistic bi-quorum systems in ad hoc networks. Such an asymmetric construction is also useful for other types of networks with non-uniform access costs (e.g., peer-to-peer networks). The paper includes a formal analysis of these approaches, backed up by an extensive simulation-based study. In particular, we show that one of the strategies, which uses random walks, exhibits the smallest communication overhead, making it very attractive for ad hoc networks.
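The trade-off between the two quorum sizes can be made concrete with a short Python sketch. It assumes the simplest possible model, in which both quorums are chosen uniformly at random and independently from `n` nodes; the access strategies studied in the paper need not be uniform, so this is only a rough guide. The function names `miss_probability` and `min_partner_size` are hypothetical.

```python
from math import ceil, comb, exp, log


def miss_probability(n, q1, q2):
    """Probability that two uniformly random quorums of sizes q1 and q2,
    drawn independently from n nodes, share no node."""
    if q1 + q2 > n:
        return 0.0  # pigeonhole: the quorums must overlap
    return comb(n - q1, q2) / comb(n, q2)


def min_partner_size(n, q1, eps):
    """Size q2 that suffices for a miss probability of at most eps,
    using the standard bound miss <= exp(-q1 * q2 / n)."""
    return min(n, ceil(n * log(1 / eps) / q1))


# Asymmetric example: a small quorum of 10 paired with a larger one.
n, q1, eps = 100, 10, 0.05
q2 = min_partner_size(n, q1, eps)
p_miss = miss_probability(n, q1, q2)
```

Under this model only the product `q1 * q2` matters for the bound, which is what allows one side to use a small quorum when its accesses are expensive, as long as the other side compensates with a larger one.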