Today, root cause analysis of failures in data centers is mostly done through manual inspection. More often than not, customers blame the network as the culprit. However, other components of the system might have caused these failures. To troubleshoot, huge volumes of data are collected over the entire data center. Correlating such large volumes of diverse data collected from different vantage points is a daunting task even for the most skilled technicians. In this paper, we revisit the question: how much can you infer about a failure in the data center using TCP statistics collected at one of the endpoints? Using an agent that captures TCP statistics we devised a classification algorithm that identifies the root cause of failure using this information at a single endpoint. Using insights derived from this classification algorithm we identify dominant TCP metrics that indicate where/why problems occur in the network. We validate and test these methods using data that we collect over a period of six months in the Azure production cloud.
a b s t r a c tIn unstructured peer-to-peer networks, such as Gnutella, peers propagate query messages towards the resource holders by flooding them through the network. This is, however, a costly operation since it consumes node and link resources excessively and often unnecessarily. There is no reason, for example, for a peer to receive a query message if the peer has no matching resource or is not on the path to a peer holding a matching resource. In this paper, we present a solution to this problem, which we call Route Learning, aiming to reduce query traffic in unstructured peer-to-peer networks. In Route Learning, peers try to identify the most likely neighbors through which replies can be obtained to submitted queries. In this way, a query is forwarded only to a subset of the neighbors of a peer, or it is dropped if no neighbor, likely to reply, is found. The scheme also has mechanisms to cope with variations in user submitted queries, like changes in the keywords. The scheme can also evaluate the route for a query for which it is not trained. We show through simulation results that when compared to a pure flooding based querying approach, our scheme reduces bandwidth overhead significantly without sacrificing user satisfaction.
In practice, inconsistencies between architectural documentation and the code might arise due to improper implementation of the architecture or the separate, uncontrolled evolution of the code. Several approaches have been proposed to detect inconsistencies between the architecture and the code but these tend to be limited for capturing inconsistencies that might occur at runtime. We present a runtime verification approach for detecting inconsistencies between the dynamic behavior of the documented architecture and the actual runtime behavior of the system. The approach is supported by a set of tools that implement the architecture and the code patterns in Prolog, and automatically generate runtime monitors for detecting inconsistencies. We illustrate the approach and the toolset for a Crisis Management System case study.
Abstract. A P2P network that is overlayed over Internet can consist of thousands, or even millions of nodes. To analyze the performance of a P2P network, or an algorithm or protocol designed for a P2P network, simulation studies have to be performed quite often, and simulation studies require the use of appropriate models for various components and parameters of a P2P network simulated. Therefore it is important to have models and statistical information about various parameters and properties of a P2P network. This paper tries to model and obtain the characteristics of some of the important parameters of one widely used P2P network, Gnutella. The methodology to derive the characteristics is based on collecting P2P protocol traces from the Gnutella network that is currently running over the Internet, and analyzing the collected traces. The results we present in this paper will be an important ingredient for studies that are based on simulation of P2P networks, especially unstructured P2P networks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.