This paper discusses approaches and environments for carrying out analytics on Clouds for Big Data applications. It revolves around four important areas of analytics and Big Data, namely (i) data management and supporting architectures; (ii) model development and scoring; (iii) visualisation and user interaction; and (iv) business models. Through a detailed survey, we identify possible gaps in technology and provide recommendations for the research community on future directions for Cloud-supported Big Data computing and analytics solutions.
In the past few years, several DHT-based abstractions for peer-to-peer systems have been proposed. Their main characteristic is that they associate nodes (peers) with keys (objects) and construct distributed routing structures to support efficient object location. These approaches address the load problem, and load balancing is achieved by moving keys. However, the load problem is still not adequately addressed. In this paper we present an analysis of structured peer-to-peer systems that takes into account Zipf-like request distributions. Based on our analysis, we propose a novel approach to load balancing that relies on object popularity. Our approach is based on routing table reorganization in order to balance the lookup traffic load. We have implemented this approach in a Pastry-like system. The obtained results demonstrate a better balance of load, which can lead to improved scalability and performance.
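To illustrate why Zipf-like request distributions matter for load, the following minimal Python sketch computes the expected share of requests each node receives when object popularity follows a Zipf law. It is not the routing-table reorganization described in the abstract: the modulo-based key placement, the function names, and the parameters (1000 objects, 16 nodes, exponent 1.0) are all assumptions chosen for illustration, and the sketch models only which node owns a key, not per-hop lookup traffic.

```python
import hashlib


def zipf_weights(num_objects, alpha=1.0):
    """Request probability of each object, ranked 1..num_objects (Zipf-like)."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def node_for_key(key, num_nodes):
    """Hypothetical placement: hash the key and map it to a node by modulo.

    A Pastry-like system would use prefix routing on node identifiers instead.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def expected_load(num_objects, num_nodes, alpha=1.0):
    """Fraction of all requests that each node is expected to serve."""
    load = [0.0] * num_nodes
    weights = zipf_weights(num_objects, alpha)
    for rank, weight in enumerate(weights, start=1):
        load[node_for_key(f"object-{rank}", num_nodes)] += weight
    return load


load = expected_load(num_objects=1000, num_nodes=16, alpha=1.0)
# Ratio of the busiest node's load to the average load (1.0 means perfectly even).
imbalance = max(load) / (sum(load) / len(load))
print(f"max/mean load ratio: {imbalance:.2f}")
```

Under these assumptions the node that happens to own the most popular object carries more than twice the average load, even though keys are spread uniformly by the hash. This is the kind of skew that a popularity-aware scheme aims to reduce.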
Publish/subscribe systems provide useful platforms for delivering data (events) from publishers to subscribers in a decoupled fashion. Developing efficient publish/subscribe schemes in dynamic distributed systems is still an open problem for complex subscriptions (spanning multidimensional intervals). We propose a distributed R-tree (DR-tree) structure that uses R-tree-based spatial filters to construct a peer-to-peer overlay optimized for scalable and efficient selective dissemination of information. We adapt well-known variants of R-trees to organize publishers and subscribers in balanced peer-to-peer networks that support content-based filtering in publish/subscribe systems. DR-tree overlays guarantee subscription and publication times logarithmic in the size of the network while keeping space requirements low (comparable to distributed hash tables). The maintenance of the overlay is local and the structure is balanced with height logarithmic in the number of nodes. DR-tree overlays disseminate messages with no false negatives and very few false positives in the embedded publish/subscribe system. In addition, we propose self-stabilizing algorithms that guarantee consistency despite failures and changes in the peer population.
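The filtering idea behind R-tree-based overlays can be sketched in a few lines of Python. This is an illustrative simplification, not the DR-tree algorithm itself: subscriptions are modelled as axis-aligned rectangles (one interval per attribute), events as points, and a single parent keeps the minimum bounding rectangle (MBR) of its children. The names `Rect`, `bounding_rect`, and `deliver` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """A subscription: one closed interval (lo, hi) per attribute."""
    bounds: Tuple[Interval, ...]

    def contains_point(self, point: Sequence[float]) -> bool:
        return all(lo <= x <= hi for (lo, hi), x in zip(self.bounds, point))


def bounding_rect(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle that covers every rectangle in rects."""
    per_dimension = zip(*(r.bounds for r in rects))
    return Rect(tuple(
        (min(lo for lo, _ in dim), max(hi for _, hi in dim))
        for dim in per_dimension
    ))


def deliver(event: Sequence[float], subscriptions: Sequence[Rect]) -> List[int]:
    """Return indices of subscriptions matching the event.

    The parent's MBR acts as a coarse filter: an event outside it cannot
    match any child, so it is dropped without visiting the children.
    """
    if not bounding_rect(subscriptions).contains_point(event):
        return []
    return [i for i, s in enumerate(subscriptions) if s.contains_point(event)]


subs = [
    Rect(((0.0, 10.0), (0.0, 10.0))),
    Rect(((20.0, 30.0), (20.0, 30.0))),
]
print(deliver((5.0, 5.0), subs))    # matches the first subscription
print(deliver((15.0, 15.0), subs))  # passes the MBR filter, matches nothing
print(deliver((40.0, 40.0), subs))  # rejected by the MBR filter
```

Because the MBR covers every child rectangle, the coarse filter never rejects an event that some child would accept, which is why no false negatives arise at this level. The event at (15, 15) shows how a false positive can occur: it falls inside the parent's MBR but inside no individual subscription.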