IEEE INFOCOM 2020 - IEEE Conference on Computer Communications 2020
DOI: 10.1109/infocom41043.2020.9155219

Automatically and Adaptively Identifying Severe Alerts for Online Service Systems

Cited by 22 publications (13 citation statements)
References 33 publications
“…Zhao et al [6] reported an approach for handling alert storms consisting of alert storm detection using Extreme Value Theory (EVT), alert filtering using ML Isolation Forest method, alert clustering using Similarity Matrix Construction, and representative alert selection. Furthermore, Zhao et al [17] published another study on enhancing the quality of services by utilizing the monitoring data. Similarly, they analyzed alerts but with aim of identifying the severity level.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Hence, we improve the feedback loop from operations to development by introducing a new element, a smart filter, for optimization of alert to noise ratio. In the design process, we considered the insights gained through interviews, results of the intensive discussions with the development team, and state of the art solutions for alert management [6,17].…”
Section: Research Approach (mentioning)
confidence: 99%
“…Many existing work [4], [44] address this problem by reducing the duplicated or correlated alerts. For example, Zhao et al [45] aimed to recommend the severe alerts to engineers. Lin et al [39] proposed an alert correlation method to cluster semi-structured alert texts to gain insights from the clustering results.…”
Section: B. Incident Management (mentioning)
confidence: 99%
“…In the process of recovering a system, it is critical to conduct accurate and efficient root cause analysis (RCA) [2], the second one of a three-step process. In the first step, anomalies are detected with alerting mechanisms [3]- [5] based on monitoring data such as logs [6]- [10], metrics/key performance indicators (KPIs) [11]- [15], or a combination thereof [16], [17]. In the second step, when the alerts are triggered, RCA is performed to analyze the root cause of these and additional events and propose recovery actions from the associated incident [6], [18], [19].…”
Section: Introduction (mentioning)
confidence: 99%
“…To overcome the limited effectiveness of existing approaches [2], [3], [14], [16], [21]- [31] (as mentioned in Section II) in industrial settings due to the aforementioned complexities, we propose GROOT, an event-graph-based RCA approach. In particular, GROOT constructs an event causality graph, the basic nodes are monitoring events such as performance metrics deviation events, status change event and developer activity events.…”
Section: Introduction (mentioning)
confidence: 99%