Abstract. To reduce mean time to recovery (MTTR) in heterogeneous enterprise environments, it should be possible to quickly determine the root cause of a problem detected at a higher level, e.g., through a response time violation of a transaction category, and to resolve it. Many problem determination applications use a component dependency graph to pinpoint the root cause. However, such graphs are often constructed manually. This paper introduces a simple, nonintrusive technique, based on mining existing runtime monitoring data, for constructing a dynamic dependency graph between the components of an enterprise environment. The graph is traversed to identify the nodes that cause response-time-related problems.
Introduction

Dependency models of system hardware and software are typically analyzed for problem determination and impact analysis in complex enterprise environments. Prior work obtains system dependencies through explicit middleware instrumentation [5] or internal instrumentation of the components (via ARM [2]). These methods are time-consuming and difficult to apply in legacy environments. The main contribution of this paper is to show how the existing performance monitoring infrastructure available in middleware, such as web application servers and database servers, can be used to discover dependencies between the components of a system. Management clients can poll the middleware for performance metrics, such as the total number of requests to a component or the average response time of a component. This paper proposes a data-mining algorithm that uses this performance data to obtain "probabilistic" dependencies between components, together with an online algorithm for discovering and updating these dependencies. Because the dependencies are probabilistic, "false" dependencies may arise; we therefore show how a problem determination application can use the dependencies effectively. Dependencies can be of various types [17], but this paper focuses on runtime software/service dependencies among the following components of a web application: URLs, servlets, EJBs, and SQL statements.
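To make the idea of a "probabilistic" dependency concrete, the following minimal sketch scores each ordered pair of components by how often they were active in the same polling interval. It is an illustration only and not the algorithm proposed in this paper: the co-activity heuristic, the function names `request_deltas` and `dependency_scores`, and the component names and counter values are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import permutations


def request_deltas(samples):
    """Turn cumulative request counters into per-interval request counts."""
    return [
        {name: cur[name] - prev.get(name, 0) for name in cur}
        for prev, cur in zip(samples, samples[1:])
    ]


def dependency_scores(samples):
    """Estimate P(child active | parent active) for every ordered pair.

    Assumed heuristic: a component is "active" in an interval if its
    request counter increased during that interval.
    """
    active_count = defaultdict(int)
    co_active_count = defaultdict(int)
    for delta in request_deltas(samples):
        active = sorted(name for name, n in delta.items() if n > 0)
        for name in active:
            active_count[name] += 1
        for parent, child in permutations(active, 2):
            co_active_count[(parent, child)] += 1
    return {
        pair: count / active_count[pair[0]]
        for pair, count in co_active_count.items()
    }


# Hypothetical polled snapshots: cumulative request totals per component.
samples = [
    {"/checkout": 0, "OrderServlet": 0, "CatalogEJB": 0},
    {"/checkout": 3, "OrderServlet": 3, "CatalogEJB": 0},
    {"/checkout": 3, "OrderServlet": 5, "CatalogEJB": 2},
    {"/checkout": 4, "OrderServlet": 6, "CatalogEJB": 2},
]
scores = dependency_scores(samples)
for (parent, child), score in sorted(scores.items()):
    print(f"{parent} -> {child}: {score:.2f}")
```

In this toy data the pair (`CatalogEJB`, `OrderServlet`) receives the same score as (`/checkout`, `OrderServlet`), even though the direction of the first is implausible. Co-activity alone cannot separate caller from callee, which is one way "false" dependencies can arise and why a consuming application has to treat the scores with care.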