The emerging cloud-computing paradigm is rapidly gaining momentum as an alternative to traditional IT (information technology). However, contemporary cloud-computing offerings are primarily targeted at Web 2.0-style applications. Only recently have they begun to address the requirements of enterprise solutions, such as support for infrastructure service-level agreements. To address the challenges and deficiencies in the current state of the art, we propose a modular, extensible cloud architecture with intrinsic support for business service management and the federation of clouds. The goal is to facilitate an open, service-based online economy in which resources and services are transparently provisioned and managed across clouds on an on-demand basis at competitive costs with high-quality service. The Reservoir project is motivated by the vision of implementing an architecture that would enable providers of cloud infrastructure to dynamically partner with each other to create a seemingly infinite pool of IT resources while fully preserving their individual autonomy in making technological and business management decisions. To this end, Reservoir could leverage and extend the advantages of virtualization and embed autonomous management in the infrastructure. At the same time, the Reservoir approach aims to achieve a very ambitious goal: creating a foundation for next-generation enterprise-grade cloud computing.
Matrix computations are both fundamental and ubiquitous in computational science and its vast application areas. Along with the development of more advanced computer systems with complex memory hierarchies, there is a continuing demand for new algorithms and library software that efficiently utilize and adapt to new architecture features. This article reviews and details some of the recent advances made by applying the paradigm of recursion to dense matrix computations on today's memory-tiered computer systems. Recursion allows for efficient utilization of a memory hierarchy and generalizes existing fixed blocking by introducing automatic variable blocking that has the potential of matching every level of a deep memory hierarchy. Novel recursive blocked algorithms offer new ways to compute factorizations such as Cholesky and QR and to solve matrix equations. In fact, the whole gamut of existing dense linear algebra factorization is beginning to be reexamined in view of the recursive paradigm. Use of recursion has led to using new hybrid data structures and optimized superscalar kernels. The results we survey include new algorithms and library software implementations for level 3 kernels, matrix factorizations, and the solution of general systems of linear equations and several common matrix equations. The software implementations we survey are robust and show impressive performance on today's high performance computing systems.
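The recursive blocking idea described above can be illustrated with a minimal sketch of a recursive Cholesky factorization. This is not the library code the article surveys: it is a plain-Python illustration, and the names `recursive_cholesky` and `solve_row` are hypothetical. It assumes a symmetric positive definite input stored as a list of rows, and it recurses all the way down to 1x1 blocks, whereas a tuned implementation would presumably stop at a block size that fits in cache and call an optimized kernel there.

```python
import math


def solve_row(l11, b):
    """Solve x * L11^T = b for the row vector x by forward substitution."""
    x = []
    for j in range(len(b)):
        partial = sum(l11[j][p] * x[p] for p in range(j))
        x.append((b[j] - partial) / l11[j][j])
    return x


def recursive_cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    if n == 1:
        if a[0][0] <= 0:
            raise ValueError("matrix is not positive definite")
        return [[math.sqrt(a[0][0])]]

    # Split the matrix in half; the recursion produces the variable block sizes.
    k = n // 2
    m = n - k
    a11 = [row[:k] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    # Factor the leading block recursively.
    l11 = recursive_cholesky(a11)

    # Triangular solve: L21 * L11^T = A21, one row at a time.
    l21 = [solve_row(l11, row) for row in a21]

    # Schur complement update: S = A22 - L21 * L21^T.
    s = [
        [a22[i][j] - sum(l21[i][p] * l21[j][p] for p in range(k)) for j in range(m)]
        for i in range(m)
    ]

    # Factor the updated trailing block recursively.
    l22 = recursive_cholesky(s)

    # Assemble [[L11, 0], [L21, L22]].
    top = [l11[i] + [0.0] * m for i in range(k)]
    bottom = [l21[i] + l22[i] for i in range(m)]
    return top + bottom


if __name__ == "__main__":
    matrix = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in recursive_cholesky(matrix):
        print(row)
```

Each level of the recursion halves the problem, so the triangular solve and the Schur complement update operate on blocks of every size from n/2 down to 1. That is the sense in which recursion yields automatic variable blocking without a hand-tuned block-size parameter.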
In order to meet stringent performance requirements, system administrators must effectively detect undesirable performance behaviours, identify potential root causes and take adequate corrective measures. The problem of uncovering and understanding performance anomalies and their causes (bottlenecks) in different system and application domains is well studied. In order to assess progress, research trends and identify open challenges, we have reviewed major contributions in the area and present our findings in this survey. Our approach provides an overview of anomaly detection and bottleneck identification research as it relates to the performance of computing systems. By identifying fundamental elements of the problem, we are able to categorize existing solutions based on multiple factors such as the detection goals, nature of applications and systems, system observability, and detection methods.
Software architecture is undergoing a transition from monolithic architectures to microservices to achieve resilience, agility and scalability in software development. However, with microservices it is difficult to diagnose performance issues due to technology heterogeneity, the large number of microservices, and frequent updates to both software features and infrastructure. This paper presents MicroRCA, a system to locate root causes of performance issues in microservices. MicroRCA infers root causes in real time by correlating application performance symptoms with corresponding system resource utilization, without any application instrumentation. The root cause localization is based on an attributed graph that models anomaly propagation across services and machines. Our experimental evaluation, in which common anomalies are injected into a microservice benchmark running in a Kubernetes cluster, shows that MicroRCA locates root causes well, with 89% precision and 97% mean average precision, outperforming several state-of-the-art methods.
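The idea of correlating a performance symptom with resource utilization can be sketched in a few lines. This is not MicroRCA's algorithm, which according to the abstract relies on an attributed graph of services and machines; it is only a simplified illustration of the correlation step, assuming that a latency series and per-component utilization series sampled at the same timestamps are already available. The function names and the `"service/metric"` keys are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(latency, utilization):
    """Rank components by |correlation| between the symptom and their utilization.

    latency: list of response-time samples for the anomalous service.
    utilization: dict mapping a component name to its utilization samples.
    Returns (name, score) pairs, strongest correlation first.
    """
    scores = {
        name: abs(pearson(latency, series)) for name, series in utilization.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical samples: latency jumps at the fourth sample.
    latency = [10, 11, 10, 50, 55, 52]
    utilization = {
        "cart/cpu": [0.20, 0.21, 0.20, 0.90, 0.95, 0.92],
        "db/cpu": [0.50, 0.40, 0.50, 0.45, 0.50, 0.40],
    }
    for name, score in rank_candidates(latency, utilization):
        print(name, round(score, 3))
```

A ranking like this only flags components whose utilization moves together with the symptom. It says nothing about how an anomaly propagates between services, which is the part a graph-based approach is meant to capture.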