This work presents TIRAMOLA, a cloud-enabled, open-source framework that performs automatic resizing of NoSQL clusters according to user-defined policies. Decisions on adding or removing worker VMs from a cluster are modeled as a Markov Decision Process and taken in real time. The system automatically decides on the most advantageous cluster size according to user-defined policies; it then requests/releases VM resources from the provider and orchestrates them inside a NoSQL cluster. TIRAMOLA's modular architecture and standard API support allow interaction with most current IaaS platforms and enable extensive customization. An extensive experimental evaluation on an HBase cluster confirms our assertions: the system resizes clusters in real time and adapts its performance across different optimization strategies, different permissible actions, and different input and training loads. Besides automating the process, it exhibits a learning capability that allows it to make near-optimal decisions even with input loads 130% larger, or alternating 10 times faster, than the accumulated training information.
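To make the MDP framing concrete, here is a minimal Python sketch of a one-step lookahead that picks among add/remove/no-op actions for a cluster. The reward function, per-VM capacity, and cost values are illustrative assumptions standing in for the policy- and monitoring-derived values a system like TIRAMOLA would use, not its actual model.

```python
# Hypothetical sketch: one-step MDP lookahead for cluster resizing.
# States are cluster sizes; the reward and capacity figures below are
# illustrative assumptions, not TIRAMOLA's learned model.

ACTIONS = {"remove": -1, "no-op": 0, "add": +1}

def reward(size: int, load: float) -> float:
    """Assumed reward: served throughput minus VM cost."""
    throughput = min(load, size * 100.0)   # each VM serves ~100 req/s (assumption)
    vm_cost = size * 10.0                  # flat per-VM cost (assumption)
    return throughput - vm_cost

def best_action(size: int, load: float, min_vms: int = 2, max_vms: int = 20) -> str:
    """Pick the action whose successor state maximizes the expected reward."""
    scores = {}
    for name, delta in ACTIONS.items():
        nxt = size + delta
        if min_vms <= nxt <= max_vms:
            scores[name] = reward(nxt, load)
    return max(scores, key=scores.get)

print(best_action(size=8, load=1200.0))  # "add" under this toy model
```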
NoSQL databases focus on the analytical processing of large-scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance, and directly applies to the philosophy of a cloud-based platform. Yet, the process of adaptively expanding and contracting resources usually involves considerable manual effort during cluster configuration. To date, there exists no comparative study that quantifies this cost and measures the efficacy of NoSQL engines offering this feature over a cloud provider. In this work, we present a cloud-enabled framework for the adaptive monitoring of NoSQL systems. We study the elasticity feature of some of the most popular NoSQL databases over an open-source cloud platform. Based on these measurements, we present a prototype implementation of a decision-making system that enables automatic elastic operations for any NoSQL engine based on administrator- or application-specified constraints.
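As an illustration of constraint-driven elastic operations, the following sketch compares monitored metrics against administrator-specified thresholds to decide whether to expand or contract a cluster. The metric names and threshold values are assumptions made for illustration, not the paper's actual configuration format.

```python
# Hypothetical sketch: constraint-driven elasticity trigger.
# Metric names and thresholds are illustrative assumptions.

def elastic_decision(metrics: dict, constraints: dict) -> str:
    """Return 'expand', 'contract', or 'hold' from monitored metrics."""
    if metrics["latency_ms"] > constraints["max_latency_ms"]:
        return "expand"      # latency constraint violated: add nodes
    if metrics["cpu_util"] < constraints["min_cpu_util"]:
        return "contract"    # cluster underutilized: remove nodes
    return "hold"

decision = elastic_decision(
    metrics={"latency_ms": 180.0, "cpu_util": 0.85},
    constraints={"max_latency_ms": 150.0, "min_cpu_util": 0.30},
)
print(decision)  # "expand"
```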
The focus of this work is on-demand resource provisioning in cloud computing, commonly referred to as cloud elasticity. Although much effort has been invested in developing systems and mechanisms that enable elasticity, elasticity decision policies tend to be designed without quantifying or guaranteeing the quality of their operation. We present an approach towards the development of more formalized and dependable elasticity policies. We make two distinct contributions. First, we propose an extensible approach to enforcing elasticity through the dynamic instantiation and online quantitative verification of Markov Decision Processes (MDPs) using probabilistic model checking. Second, we study various concrete elasticity models and elasticity policies. We evaluate the decision policies using traces from a real NoSQL database cluster under constantly evolving external load. We reason about the behaviour of the different modeling and elasticity policy options and show that our proposal can improve upon the state of the art by significantly decreasing under-provisioning while avoiding over-provisioning.
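To give intuition for online quantitative verification, the sketch below estimates, for candidate cluster sizes, the probability of becoming under-provisioned within a few steps under a toy load model. A real deployment would pose this query to a probabilistic model checker over a dynamically instantiated MDP; the load dynamics and capacities here are pure assumptions.

```python
# Hypothetical sketch: quantitative check of an elasticity decision.
# Computes P(under-provisioned within k steps) under an assumed load
# model; a probabilistic model checker would answer this over the MDP.

LOAD_STEP = {"up": 0.4, "same": 0.4, "down": 0.2}  # assumed load dynamics

def p_underprovision(size: int, load: int, capacity_per_vm: int, k: int) -> float:
    """Probability that demand exceeds capacity at some point within k steps."""
    if load > size * capacity_per_vm:
        return 1.0
    if k == 0:
        return 0.0
    total = 0.0
    for move, prob in LOAD_STEP.items():
        delta = {"up": 50, "same": 0, "down": -50}[move]
        total += prob * p_underprovision(size, max(0, load + delta), capacity_per_vm, k - 1)
    return total

# Compare candidate cluster sizes for the same projected load.
for vms in (8, 9, 10):
    print(vms, round(p_underprovision(vms, load=750, capacity_per_vm=100, k=5), 3))
```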
Efficient RDF data management systems are central to the vision of the Semantic Web. The enormous increase in both user- and machine-generated content calls for scalable solutions in triple data stores. Current systems manage to decentralize some or all of the stages of RDF data management, scaling to arbitrarily large numbers of triples. Yet, these systems prove highly inflexible in adjusting their behavior relative to the query at hand. Queries over triple data include multiple joins with varying degrees of selectivity and cost; in many cases, a join performed on a single centralized compute node is highly preferable. Thus, both informed query planning and adaptive join execution are necessary to achieve optimal performance on both selective and non-selective queries. Towards that direction, we describe H2RDF+, an RDF store that efficiently performs distributed joins over a multiple-index scheme. H2RDF+ materializes six RDF indexes and detailed statistics using HBase. In this work, we emphasize our novel, scalable and efficient MapReduce indexing process that allows H2RDF+ to handle arbitrarily large RDF datasets. Aggressive byte-level compression is also used extensively to reduce the storage space requirements of the system. H2RDF+ can also process both complex and selective queries by adaptively choosing the amount of resources allocated to each join, based on the join complexity estimated through index statistics.
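For intuition, a store that materializes all permutation indexes keeps one sorted key layout per ordering of (subject, predicate, object); the sketch below generates those six keys for a triple. The string-based key encoding is an illustrative assumption, not H2RDF+'s actual byte-level compressed format.

```python
# Hypothetical sketch: the six permutation indexes over an RDF triple.
# H2RDF+ materializes six such indexes in HBase; this string encoding
# is an illustrative assumption (the real keys are byte-compressed).

from itertools import permutations

def index_keys(s: str, p: str, o: str) -> dict:
    """Build one row key per ordering of the triple's components."""
    parts = {"s": s, "p": p, "o": o}
    return {
        "".join(order): "|".join(parts[c] for c in order)
        for order in permutations("spo")
    }

keys = index_keys("alice", "knows", "bob")
for name, key in sorted(keys.items()):
    print(name, "->", key)
# opo orderings omitted for brevity: spo -> alice|knows|bob, pos -> knows|bob|alice, ...
```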
Distributed systems such as peer-to-peer overlays have been shown to efficiently support the processing of range queries over large numbers of participating hosts. In such systems, uneven load allocation has to be effectively tackled in order to minimize overloaded peers and optimize performance. In this work, we identify the two basic methodologies used to achieve load balancing, iterative key redistribution between neighbors and node migration, and describe their relative advantages and disadvantages. Based on this analysis, we propose NIXMIG, a hybrid method that adaptively utilizes these two extremes to achieve both fast and cost-effective load balancing in distributed systems that support range queries. We theoretically prove its convergence and, as a case study, offer an implementation on top of a Skip Graph, where we thoroughly validate our findings on a variety of static, dynamic and realistic workloads. We compare NIXMIG with an existing load-balancing algorithm proposed by Karger and Ruhl [1]; our experimental analysis shows that NIXMIG can be as much as three times faster, requiring only one sixth of the message exchanges and one third of the item exchanges to bring the system to a balanced state.
Index Terms: Peer-to-peer systems, load balancing, range queries.
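A hedged sketch of the hybrid idea: an overloaded peer first tries the cheap option of shifting keys to a neighbor, and falls back to inviting a remote underloaded node to migrate next to it when the neighborhood is itself loaded. The thresholds and structures below are illustrative assumptions, not NIXMIG's actual protocol messages.

```python
# Hypothetical sketch: choosing between neighbor key exchange (NIX)
# and node migration (MIG). Thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Peer:
    load: float
    neighbor_load: float

OVERLOAD = 1.5   # assumed overload threshold relative to average load

def balance_step(peer: Peer, avg_load: float) -> str:
    """Decide how an overloaded peer sheds load in one round."""
    if peer.load <= OVERLOAD * avg_load:
        return "noop"                               # not overloaded
    if peer.neighbor_load < avg_load:
        return "nix: shift keys to neighbor"        # cheap local exchange
    return "mig: invite underloaded remote node"    # neighborhood saturated

print(balance_step(Peer(load=9.0, neighbor_load=2.0), avg_load=4.0))  # nix
print(balance_step(Peer(load=9.0, neighbor_load=7.5), avg_load=4.0))  # mig
```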
In this paper, we present a distributed architecture for indexing and serving large and diverse datasets. It incorporates and extends the functionality of Hadoop, the open-source MapReduce framework, and of HBase, a distributed, sparse NoSQL database, to create a fully parallel indexing system. Experiments with structured, semi-structured and unstructured data of various sizes demonstrate the flexibility, speed and robustness of our implementation and contrast it with similarly oriented projects. Our 11-node cluster prototype kept the full-text indexing time for 150GB of raw content under 3 hours, while the system's response time under a sustained query load of more than 1,000 queries/sec remained in the order of milliseconds.
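A minimal sketch of the kind of job such a system parallelizes: building an inverted index that maps each term to the documents containing it. The in-memory driver below only mimics Hadoop's map/shuffle/reduce phases for illustration; the function names are assumptions, not the paper's API.

```python
# Hypothetical sketch: an inverted-index MapReduce job, with an
# in-memory driver standing in for Hadoop's shuffle phase.
from collections import defaultdict

def map_phase(doc_id: str, text: str):
    """Emit (term, doc_id) pairs, as a Hadoop mapper would."""
    for term in text.lower().split():
        yield term, doc_id

def reduce_phase(term: str, doc_ids):
    """Collapse each term's postings into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))

docs = {"d1": "hbase stores sparse data", "d2": "hadoop and hbase scale out"}

shuffle = defaultdict(list)                 # simulated shuffle/sort
for doc_id, text in docs.items():
    for term, d in map_phase(doc_id, text):
        shuffle[term].append(d)

index = dict(reduce_phase(t, ds) for t, ds in shuffle.items())
print(index["hbase"])  # ['d1', 'd2']
```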