Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539041

Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition

Cited by 15 publications (12 citation statements)
References 15 publications
“…For the scoring step, we categorize the methods into random-walk-based and regression-based methods. We selected PageRank [40] as a random-walk-based method, following implementations in [15], [16], [24], [25] and HT [13], which is the only regression-based approach. As an exception, RCD does not have a separate scoring phase because it treats the failure as an intervention in the causal structure graph on the root fault metrics.…”
Section: Fault Localization Methods | Citation type: mentioning
confidence: 99%
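The PageRank scoring step mentioned in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the causal graph is given as hypothetical `(cause, effect)` metric pairs, the random walk follows edges backwards (effect to cause) so that score accumulates on upstream metrics, and the damping factor of 0.85 is a conventional default rather than a value taken from the source.

```python
from collections import defaultdict


def pagerank_root_causes(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Rank metrics as root-cause candidates with PageRank (illustrative sketch).

    edges: iterable of (cause, effect) pairs from an estimated causal graph.
    Assumption: the walk moves from each effect to its direct causes, so
    metrics far upstream of the symptoms collect the most probability mass.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return []
    n = len(nodes)
    # Reversed adjacency: each effect points back to its direct causes.
    causes_of = defaultdict(list)
    for cause, effect in edges:
        causes_of[effect].append(cause)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Metrics with no known causes have no outgoing walk edges;
        # their mass is redistributed uniformly over all metrics.
        dangling = sum(rank[v] for v in nodes if not causes_of[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for u in causes_of[v]:
                new[u] += damping * rank[v] / len(causes_of[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    # Highest score first: the top entries are the root-cause candidates.
    return sorted(nodes, key=rank.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical chain: database latency -> API latency -> frontend errors.
    graph = [("db_latency", "api_latency"), ("api_latency", "frontend_errors")]
    print(pagerank_root_causes(graph))
    # → ['db_latency', 'api_latency', 'frontend_errors']
```

Reversing the edges is one common design choice for this scoring step; running the walk on the forward graph would instead favor downstream symptom metrics.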
“…(ii) The anomaly-propagation methods localize root fault metrics by tracing the propagation of fault-induced anomalies in monitoring metrics [10]- [13], [15], [17], [20]- [22], [24]- [26], [35]. With these methods, fault localization is attributed to a source localization problem of signal propagation in complex networks, which is a well-studied problem in the field of network science [36].…”
Section: B. Automated Fault Localization | Citation type: mentioning
confidence: 99%
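To make the idea of tracing anomaly propagation back to its source concrete, here is a deliberately simplified, rule-based sketch. It is not any specific method from the cited works: it assumes a known causal graph of hypothetical `(cause, effect)` metric pairs and a set of metrics already flagged as anomalous, and it reports a metric as a candidate source when it is anomalous but none of its direct causes are.

```python
def trace_anomaly_sources(edges, anomalous):
    """Return anomalous metrics whose anomaly is not explained by an anomalous parent.

    edges: iterable of (cause, effect) pairs (assumed known causal graph).
    anomalous: iterable of metric names flagged by some anomaly detector.
    Note: if every anomalous metric lies on a cycle, no source is reported.
    """
    anomalous = set(anomalous)
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    # A source has no anomalous direct cause that could have propagated to it.
    return sorted(m for m in anomalous if not (parents.get(m, set()) & anomalous))


if __name__ == "__main__":
    graph = [
        ("db_latency", "api_latency"),
        ("cpu_util", "api_latency"),
        ("api_latency", "frontend_errors"),
    ]
    print(trace_anomaly_sources(graph, ["db_latency", "api_latency", "frontend_errors"]))
    # → ['db_latency']
```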
“…Furthermore, incorporating knowledge of the system architecture can improve the accuracy of the estimated causal graph by removing unnecessary or redundant connections between metrics and enforcing connections that are inherent in microservice systems. Some works [15,20] have developed a causal Bayesian network of the system using system knowledge and causal assumptions. However, to the best of our knowledge, no previous studies have combined instance-level variations in metric data with system knowledge to estimate a causal graph at the performance metric level, which is the main contribution of our research.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
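The statement above describes refining an estimated causal graph with system knowledge by removing unnecessary connections and enforcing inherent ones. A minimal sketch of that refinement step, assuming edges are hypothetical `(cause, effect)` metric pairs and that the forbidden and required edge sets have already been derived from the system architecture (for example, from a service call graph):

```python
def apply_system_knowledge(estimated, forbidden=(), required=()):
    """Refine an estimated causal graph with architecture knowledge (hypothetical helper).

    estimated, forbidden, required: iterables of (cause, effect) metric pairs.
    Forbidden edges (e.g. ones contradicting the call direction) are removed;
    required edges (e.g. a service's CPU driving its own latency) are added.
    """
    refined = (set(estimated) - set(forbidden)) | set(required)
    # Drop self-loops, which carry no causal information in this setting.
    return {(c, e) for c, e in refined if c != e}


if __name__ == "__main__":
    # Assumption: service A calls service B, so B's latency can drive A's, not vice versa.
    estimated = {("a_latency", "b_latency"), ("b_latency", "a_latency")}
    print(sorted(apply_system_knowledge(
        estimated,
        forbidden={("a_latency", "b_latency")},
        required={("b_cpu", "b_latency")},
    )))
    # → [('b_cpu', 'b_latency'), ('b_latency', 'a_latency')]
```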
“…Throughout this paper, we consider these broad categories of metrics in our formulation, while individual monitoring metrics can be plugged into the categories. Similar to a previous approach [20], we define certain causal assumptions between the metric categories based on domain knowledge of system engineers to define a causal metric graph (Fig. 2).…”
Section: Metrics Data and Causal Assumptions | Citation type: mentioning
confidence: 99%
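The statement above defines causal assumptions between broad metric categories and plugs individual monitoring metrics into those categories to obtain a causal metric graph. The sketch below shows one way that expansion could work. The category names and the category-level edges are hypothetical placeholders, not the assumptions from the cited paper's Fig. 2.

```python
from itertools import product

# Hypothetical category-level causal assumptions: (cause category, effect category).
CATEGORY_EDGES = {
    ("workload", "resource"),
    ("resource", "latency"),
    ("latency", "error"),
}


def metric_graph(metric_category, category_edges=CATEGORY_EDGES):
    """Expand category-level causal assumptions into a metric-level causal graph.

    metric_category: dict mapping each monitoring metric to its category.
    Returns the set of (cause_metric, effect_metric) edges allowed by the assumptions.
    """
    return {
        (m1, m2)
        for (m1, c1), (m2, c2) in product(metric_category.items(), repeat=2)
        if m1 != m2 and (c1, c2) in category_edges
    }


if __name__ == "__main__":
    metrics = {
        "req_rate": "workload",
        "cpu_util": "resource",
        "p99_latency": "latency",
        "http_5xx": "error",
    }
    print(sorted(metric_graph(metrics)))
    # → [('cpu_util', 'p99_latency'), ('p99_latency', 'http_5xx'), ('req_rate', 'cpu_util')]
```

Adding a new metric then only requires assigning it a category; its candidate causes and effects follow from the category-level assumptions.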