Second International Conference on Autonomic Computing (ICAC'05) 2005
DOI: 10.1109/icac.2005.18

Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization

Abstract: Web applications suffer from software and configuration faults that lower their availability. Recovering from failure is dominated by the time interval between when these faults appear and when they are detected by site operators. We introduce a set of tools that augment the ability of operators to perceive the presence of failure: an automatic anomaly detector scours HTTP access logs to find changes in user behavior that are indicative of site failures, and a visualizer helps operators rapidly detect and diag…

Cited by 55 publications (36 citation statements); references 5 publications.
“…Fa then characterizes the deviation of F from these healthy states to pinpoint the probable cause of the failure. Similar approaches have been proposed by other researchers, e.g., [3,8].…”
Section: Phase II of Diagnosis (mentioning)
“…[7] applies Bayesian-network learning techniques to correlate performance metrics with high-level system behavior. Reference [3] is a recent example of the baselining-based approach, where a heuristic is proposed to capture and represent the baseline behavior of a Web service, and two techniques (one based on the χ² statistical test and another based on naive Bayesian networks) are proposed to detect and categorize deviation from the baseline behavior. In [2], we describe a new clustering algorithm that pays particular attention to the failure data while clustering the healthy data; see Figure 2(b).…”
Section: Related Work (mentioning)
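The excerpt names a χ² test for detecting deviation from baseline behavior but gives no details. The following is a minimal sketch of one way such a test can work, assuming each time window of the HTTP access log has already been summarized as per-page hit counts; it uses a textbook 2×k χ² test of homogeneity. The function names, the input format, and the 5% critical-value table are illustrative assumptions, not the paper's implementation.

```python
from typing import Mapping, Tuple

# Upper 5% critical values of the chi-square distribution, keyed by degrees
# of freedom (illustrative subset; extend the table for more pages).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_homogeneity(baseline: Mapping[str, int],
                           current: Mapping[str, int]) -> Tuple[float, int]:
    """Chi-square homogeneity test on a 2 x k table of page-hit counts.

    Row 0 is the baseline window, row 1 the current window; each column is
    one page. Returns (statistic, degrees_of_freedom).
    """
    # Keep only pages seen at least once, so no expected count is zero.
    pages = sorted(p for p in set(baseline) | set(current)
                   if baseline.get(p, 0) + current.get(p, 0) > 0)
    rows = [[baseline.get(p, 0) for p in pages],
            [current.get(p, 0) for p in pages]]
    row_totals = [sum(r) for r in rows]
    if len(pages) < 2 or 0 in row_totals:
        raise ValueError("need >= 2 pages and hits in both windows")
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(pages))]
    n = sum(row_totals)

    # Sum of (observed - expected)^2 / expected over all cells.
    stat = 0.0
    for i in range(2):
        for j in range(len(pages)):
            expected = row_totals[i] * col_totals[j] / n
            stat += (rows[i][j] - expected) ** 2 / expected
    return stat, len(pages) - 1


def is_anomalous(baseline: Mapping[str, int],
                 current: Mapping[str, int]) -> bool:
    """Flag the current window if its page mix differs at the 5% level."""
    stat, df = chi_square_homogeneity(baseline, current)
    return stat > CRITICAL_05[df]


baseline = {"/home": 500, "/search": 300, "/checkout": 200}
healthy = {"/home": 50, "/search": 30, "/checkout": 20}  # same page mix
broken = {"/home": 50, "/search": 30, "/checkout": 0}    # checkout stops
print(is_anomalous(baseline, healthy))  # False
print(is_anomalous(baseline, broken))   # True
```

The test compares proportions rather than raw volumes, so a quiet period with the usual page mix is not flagged, while a page that suddenly stops receiving hits is. In practice, a library routine such as `scipy.stats.chi2_contingency` would also supply a p-value instead of a fixed critical-value table.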
“…Other related error detection techniques utilize logs of the visited internet pages [11], CPU-instructions and function-calls [12], and messaging between components [13]. In sensor networks, [14] examines simple metrics on network performance.…”
Section: Related Work in Service Composition Learning (mentioning)
“…We believe, however, that human knowledge is still a fundamental part of the diagnosis process, as discussed before, and even if manual work could be partially automated, it cannot be ignored. It is worth mentioning that others, for instance Bodik et al. (2005) and Xu et al. (2008), follow the same assumption. Current automated diagnosis techniques are appropriate only for superficial failures, not for those that require investigating hypotheses about internal logic, and thus human knowledge is required to complement the available information and determine the root cause.…”
Section: Richer Runtime Information Are Needed For Failure Diagnosis (mentioning)
“…Some works (Takada & Koide, 2002; Stearley, 2004; Bodik et al., 2005; Tan et al., 2008) invest in visualization tools to assist manual inspection. These studies aim at solving the visual pollution problem of long log files by condensing the events and generating statistical graphs.…”
Section: Visualization Tools to Assist Manual Diagnosis (mentioning)
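The excerpt describes condensing long logs into statistical graphs only in general terms. The following hypothetical sketch illustrates that condensing step, assuming access-log lines have already been parsed into `(unix_timestamp, page)` pairs. It buckets events by minute and counts hits per page, producing the kind of time × page table from which a heat map could be drawn. The names `condense` and `render` and the one-minute bucket are illustrative choices, not taken from any of the cited tools.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def condense(events: Iterable[Tuple[int, str]],
             bucket_seconds: int = 60) -> Dict[int, Dict[str, int]]:
    """Collapse (unix_timestamp, page) events into per-bucket hit counts."""
    table: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, page in events:
        bucket = ts - ts % bucket_seconds  # start time of the bucket
        table[bucket][page] += 1
    # Convert to plain dicts, ordered by bucket start time.
    return {b: dict(counts)
            for b, counts in sorted(table.items(), key=lambda kv: kv[0])}


def render(table: Dict[int, Dict[str, int]], pages: List[str]) -> List[str]:
    """Format counts as tab-separated rows, one row per time bucket.

    A plotting tool could color each cell by its count to get a heat map.
    """
    lines = ["bucket\t" + "\t".join(pages)]
    for bucket, counts in table.items():
        cells = "\t".join(str(counts.get(p, 0)) for p in pages)
        lines.append(f"{bucket}\t{cells}")
    return lines


events = [(0, "/home"), (10, "/home"), (59, "/search"),
          (60, "/home"), (125, "/checkout")]
for line in render(condense(events), ["/home", "/search", "/checkout"]):
    print(line)
```

Thousands of raw log lines reduce to one small row per minute. A page whose column suddenly drops to zero then stands out visually, which is the kind of cue an operator-facing visualizer aims to provide.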