Due to the competitiveness of the computing industry, software developers are pressured to quickly deliver new code releases. At the same time, operators are expected to update and keep production systems stable at all times. To overcome the development-operations barrier, organizations have started to adopt Infrastructure as Code (IaC) tools to efficiently deploy middleware and applications using automation scripts. These automations comprise a series of steps that should be idempotent to guarantee repeatability and convergence. Rigorous testing is required to ensure that the system idempotently converges to a desired state, starting from arbitrary states. We propose and evaluate a model-based testing framework for IaC. An abstracted system model is utilized to derive state transition graphs, based on which we systematically generate test cases for the automation. The test cases are executed in lightweight virtual machine environments. Our prototype targets one popular IaC tool (Chef), but the approach is general. We apply our framework to a large base of public IaC scripts written by operators, showing that it correctly detects non-idempotent automations.

Middleware 2013, LNCS 8275, pp. 368-388, 2013. © IFIP International Federation for Information Processing 2013.

idemN: This coverage parameter specifies a set of task sequence lengths for which idempotence should be tested. The possible values range from idemN = {1} (idempotence of single tasks) to idemN = {1, ..., |A|} (maximum sequence length, covering all automation tasks). Evidently, higher values produce more test cases, whereas lower values carry the risk that problems related to dependencies between "distant" tasks go undetected (see also Section 7.2).

repeatN: This parameter controls the number of times each task is (at most) repeated.
If the automation is supposed to converge after a single run (most Chef recipes are designed that way; see our evaluation in Section 7), repeatN = 1 is usually sufficient, because many idempotence-related problems are already detected after executing a task (sequence) twice. However, certain scenarios may require higher values of repeatN, in particular automations that are repeated continuously in order to eventually converge. The tester then has to use domain knowledge to set a reasonable upper bound on the number of repetitions.

restart: The boolean parameter restart determines whether tasks are arbitrarily repeated in the middle of the automation (restart = false) or the whole automation always gets restarted from scratch (restart = true). Consider our scenario automation with task sequence a1, a2, a3, a4. If we require idemN = 3 with restart = true, each test case re-runs the automation from the first task. If restart = false, we have additional test cases, including ⟨a1, a2, a3, a2, a3⟩, ⟨a1, a2, a3, a4, a2, a3⟩, etc.

forcePre: This parameter specifies whether different pre-states f...
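The interplay of the coverage parameters can be illustrated with a short sketch. The function name and the exact enumeration strategy below are assumptions for illustration, not the paper's actual implementation: it enumerates contiguous task subsequences of the lengths given in idemN and repeats each one repeatN times, either mid-run or after restarting the automation from scratch.

```python
# Hypothetical sketch of test case generation driven by the coverage
# parameters idemN, repeatN and restart (names are illustrative).

def generate_test_cases(tasks, idem_n, repeat_n=1, restart=False):
    """Enumerate task sequences that exercise idempotence.

    tasks    -- ordered automation tasks, e.g. ["a1", "a2", "a3", "a4"]
    idem_n   -- set of subsequence lengths to test for idempotence
    repeat_n -- how many extra times each tested subsequence is executed
    restart  -- True: repetitions restart the automation from scratch;
                False: subsequences are repeated in the middle of a run
    """
    cases = []
    for n in sorted(idem_n):
        for start in range(len(tasks) - n + 1):
            end = start + n
            if restart:
                # full run, then re-run from the first task up to the
                # end of the tested subsequence
                case = tasks + tasks[:end] * repeat_n
            else:
                # repeat the subsequence in the middle of a single run
                case = tasks[:end] + tasks[start:end] * repeat_n + tasks[end:]
            cases.append(case)
    return cases

# With idemN = {1} and repeatN = 1, every single task is executed twice:
# generate_test_cases(["a1", "a2"], {1})
# -> [['a1', 'a1', 'a2'], ['a1', 'a2', 'a2']]
```

Executing each generated sequence from a fresh pre-state and comparing the resulting system states is then what reveals non-idempotent tasks.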
Abstract: Infrastructure-as-a-Service (IaaS) cloud environments expose to users the infrastructure of a data center while relieving them from the burden and costs associated with its management and maintenance. IaaS clouds provide an interface by means of which users can create, configure, and control a set of virtual machines that will typically host a composite software service. Given the increasing popularity of this computing paradigm, previous work has focused on modeling composite software services to automate their deployment in IaaS clouds. This work is concerned with the runtime state of composite services during and after deployment. We propose AESON, a deployment runtime that automatically detects node (virtual machine) failures and eventually brings the composite service to the desired deployment state by using information describing relationships between the service components. We have designed AESON as a decentralized peer-to-peer publish/subscribe system leveraging IBM's Bulletin Board (BB), a topic-based distributed shared memory service built on top of an overlay network.
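The core reconciliation idea in this abstract (detect failed nodes, then drive the deployment back to its desired state) can be sketched in a few lines. The function name and the dictionary-based state representation below are illustrative assumptions, not AESON's actual API:

```python
# Illustrative sketch (not AESON's API): a peer compares the observed
# deployment state against the desired state and derives recovery actions.

def reconcile(observed, desired):
    """Return the recovery actions needed to reach the desired state."""
    actions = []
    for node, state in desired.items():
        if observed.get(node) != state:
            # e.g. re-provision the failed virtual machine
            actions.append(("recover", node))
    return actions

desired = {"web": "running", "db": "running"}
observed = {"web": "running"}          # the "db" node has failed
print(reconcile(observed, desired))    # -> [('recover', 'db')]
```

In a decentralized publish/subscribe setting, each peer would learn the observed state from topic updates rather than from a local dictionary, but the desired-state comparison is the same.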
Abstract: The complexity of today's computer systems poses a challenge to system administrators. Current systems comprise a multitude of inter-related software components running on different servers. In this paper, we propose the use of the stackable storage mechanism as the foundation of centralized systems management. At the management level, we show how this mechanism can be used to implement an infrastructure that allows administrators to perform typical tasks fast and effortlessly. In particular, we find that our prototype could have avoided 40% of the human mistakes observed experimentally by previous research. At the storage level, we identify three key characteristics of stackable storage that allow the definition of different policies with distinct performance and scalability behaviors. We quantitatively compare five storage policies under different workloads and conclude that stackable storage is a viable approach.
Cloud computing and the DevOps movement are two pillars that facilitate software delivery with extreme agility. "Born on the cloud" companies, such as Netflix®, have demonstrated rapid growth to their business and continuous improvement to the service they provide, by reportedly applying DevOps principles. In this paper, we claim that to fulfill the vision of fast software delivery, without compromising the quality of the provided services, we need a new approach to detecting problems, including problems that may have occurred during the continuous deployment cycle. A native DevOps-centric approach to problem resolution puts the focus on a wider range of possible error sources (including code commits), makes use of DevOps metadata to clearly define the source of the problem, and leads to a quick problem resolution. We propose such a continuous quality assurance approach, and we demonstrate it by preliminary experiments in our public Container Cloud environment and in a private OpenStack® cloud environment.
Abstract: Operator mistakes are a significant source of unavailability in Internet services. In our previous work, we proposed operator action validation as an approach for detecting mistakes while hiding them from the service and its users. Previous validation strategies have limitations, however, including the need for instances of correct behavior for comparison. In this paper, we propose a novel model-based validation strategy that addresses these limitations and complements our previous techniques. Model-based validation calls for service engineers to define models of Internet services that can be used to differentiate between correct and incorrect configurations and behaviors. These models are then used to guide the specification of validation assertions that check the correctness of operator actions before they are exposed. We have implemented a prototype model-based validation system for two services, the Web crawler of a commercial search engine (Ask.com) and an academic yet realistic online auction service. Experimentation with model-based validation demonstrates that it is highly effective at detecting and hiding both activated and latent mistakes.