Big data analytics have become widespread as a means to extract knowledge from large datasets. Yet, the heterogeneity and irregularity usually associated with big data applications often overwhelm existing software and hardware infrastructures. In such a context, the flexibility and elasticity provided by the cloud computing paradigm offer a natural approach to cost-effectively adapting the allocated resources to the application's current needs. However, these same characteristics impose extra challenges on predicting the performance of cloud-based big data applications, a key step for proper management and planning. This paper explores three modeling approaches for performance prediction of cloud-based big data applications. We evaluate two queuing-based analytical models and a novel fast ad hoc simulator in various scenarios based on different applications and infrastructure setups. We compare the three approaches in terms of prediction accuracy, finding that our best approaches can predict average application execution times with a 26% relative error in the very worst case and about 7% on average.
Big data analytics have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, challenging existing software and hardware infrastructures to meet their dynamic resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, as it provides flexibility and elasticity, adapting the allocated resources in response to the application's current needs. However, these properties impose extra challenges on the accurate performance prediction of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and a novel fast ad-hoc simulator in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches can predict average application execution times with an average relative error of at most 7%. Moreover, both of them run very fast (at least two orders of magnitude faster than widely used tools while providing slightly better accuracy), making them practical for online prediction.
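To make the idea of a queuing-based analytical performance model concrete, the sketch below predicts the mean response time of a single processing node using the textbook M/M/1 formula. This is only an illustration of the general modeling style, with invented arrival and service rates; it is not the model developed in the paper.

```python
# Illustrative sketch, NOT the paper's model: mean response time of a
# single node modeled as an M/M/1 queue, the simplest queuing-based
# analytical performance model.
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time W = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

# Hypothetical numbers: jobs arrive at 8/s and a node serves 10/s,
# so the predicted mean response time is 1 / (10 - 8) = 0.5 s.
print(mm1_response_time(8.0, 10.0))  # 0.5
```

Real cloud applications need richer models (multiple stations, general service times), but the workflow is the same: estimate rates from measurements, then evaluate closed-form queueing formulas instead of running the application.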
Multiformalism modeling has proven to be a valuable technique for coping with the complexity of the constraints that apply to the specification of state-of-the-art computer-based systems. Multiformalism techniques help modelers and designers by providing a more natural and convenient approach to the specification process and to performance analysis. Although their application does not necessarily provide an advantage in the solution of the models, this paper shows how a compositional multiformalism modeling approach can leverage the power of product-form solutions to offer both efficient solution and convenient specification of models for complex systems.
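As a small, self-contained example of why product-form solutions are efficient, the sketch below evaluates a closed, single-class product-form queueing network with exact Mean Value Analysis (MVA), which exploits the product form to avoid enumerating the state space. The service demands and population are made-up values for illustration, not taken from the paper.

```python
# Hedged illustration of exploiting a product-form solution: exact
# Mean Value Analysis (MVA) for a closed, single-class queueing network
# made of load-independent queueing stations.
def mva(demands, population):
    """Return (throughput, per-station mean queue lengths) at `population`."""
    queues = [0.0] * len(demands)          # Q_m(0) = 0 for every station
    throughput = 0.0
    for n in range(1, population + 1):
        # Residence time at each station: R_m(n) = D_m * (1 + Q_m(n-1))
        residence = [d * (1.0 + q) for d, q in zip(demands, queues)]
        throughput = n / sum(residence)    # X(n) = n / sum_m R_m(n)
        # Little's law applied per station: Q_m(n) = X(n) * R_m(n)
        queues = [throughput * r for r in residence]
    return throughput, queues

# Hypothetical network: three stations with demands 0.2, 0.1, 0.05 s
# and 5 circulating customers.
X, Q = mva([0.2, 0.1, 0.05], population=5)
```

The recursion runs in O(M * N) time for M stations and N customers, which is what makes product-form models attractive compared with brute-force state-space solution.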
The availability of powerful, worldwide-spanning computing facilities offering application scalability by means of cloud infrastructures perfectly matches the resource needs that characterize Big Data applications. Elasticity of resources in the cloud enables application providers to achieve results in terms of complexity, performance and availability that were once considered beyond affordability, by means of proper resource management techniques and a savvy design of the underlying architecture and communication facilities. This paper presents a technique for evaluating the combined effects of cloud elasticity and a Big Data-oriented data management layer on global-scale cloud applications, by modeling the behavior of typical in-memory and in-storage data management.
Exceptions constitute a widely accepted fault tolerance mechanism, suitable for managing both hardware and software faults. In performability analysis it is common practice to exploit software tools capable of describing a system using models expressed in various formalisms. Often these tools provide extensibility features that allow augmenting the primitives of a given formalism, but in most cases they lack exception support. This paper aims at filling this gap by introducing a general mechanism for adding exception-handling support to most of the existing formalisms. The validity of the proposed method is supported by two modelling cases that benefit in clarity and economy.
Hybrid systems (HS) have proven to be a valid formalism for studying and analyzing specific issues in a variety of fields. However, most of the analysis techniques for HS are based on low-level descriptions, where single states of the system have to be defined and enumerated by the modeler. Some high-level modeling formalisms, such as Fluid Stochastic Petri Nets, have been introduced to overcome such difficulties, but simple procedures allowing the definition of domain-specific languages for HS could further simplify the analysis of such systems. This paper presents a stochastic HS language consisting of a subset of piecewise deterministic Markov processes, and shows how SIMTHESys – a compositional, metamodeling-based framework for describing and extending formalisms – can be used to convert a wide number of high-level HS description languages into this paradigm. A simple example applying the technique to solve a model of the energy consumption of a data center, specified using Queuing Networks and Hybrid Petri Nets, is presented to show the effectiveness of the proposal.
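To give a feel for the piecewise deterministic Markov process (PDMP) class mentioned above, the sketch below simulates a minimal PDMP: a continuous level that drains deterministically between exponentially distributed jump times, at which it is refilled. All rates are invented for illustration and bear no relation to the paper's data-center model or to SIMTHESys itself.

```python
# Minimal, illustrative PDMP simulation (hypothetical parameters):
# deterministic linear drift between stochastic jump epochs.
import random

def simulate_pdmp(horizon: float, drain_rate: float = 1.0,
                  jump_rate: float = 0.5, refill: float = 3.0,
                  seed: int = 42) -> float:
    """Return the level at time `horizon`, starting from level 0."""
    rng = random.Random(seed)
    t, level = 0.0, 0.0
    while True:
        dt = rng.expovariate(jump_rate)   # time until the next jump
        if t + dt >= horizon:
            # Drift deterministically for the remaining time, floor at 0.
            return max(0.0, level - drain_rate * (horizon - t))
        # Drift until the jump, then apply the discrete refill.
        level = max(0.0, level - drain_rate * dt) + refill
        t += dt
```

Between jumps the trajectory is fully deterministic (here, a linear drain), while the jump times and their effects carry all the randomness; this separation is exactly what makes PDMPs a convenient target for translating higher-level HS description languages.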