Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of 2019
DOI: 10.1145/3338906.3338961

Latent error prediction and fault localization for microservice applications by learning from system trace logs

Cited by 160 publications (72 citation statements)
References 39 publications

“…For online service systems, alerts are a key data source for recording the anomalies generated from various system components. More specifically, monitoring systems continuously collect various data (e.g., metrics [46], logs [19,31], and traces [55,56]) from various service components, and engineers manually define many rules to check these monitoring data to ensure service availability. When a certain rule is violated, an alert would be generated to report the anomaly.…”
Section: Motivation and Problem Formulation 2.1 Background: Alert And I (mentioning)
confidence: 99%
“…Nowadays, online service systems, such as online shopping, Ebank, and search engines, have become an indispensable part in our daily life. Although tremendous efforts have been devoted to software service maintenance (e.g., collecting various monitoring data for a service system such as metrics [44,46,54], logs [19,31,51], traces [55], and alerts [29]), due to their large scale and complexity, incidents (i.e., unplanned interruption/outage to a service [2,16,25]) are still inevitable, which could lead to system unavailability and huge economic loss [32]. For example, according to a recent survey [1], the average cost per hour of server downtime is between $301,000 and $400,000.…”
Section: Introduction (mentioning)
confidence: 99%
“…After parsing, the injector will use the "chaosmonkey" to inject faults. In previous work [38] and [39], researchers collected 22 representative microservice faults and listed the detail description of these faults. For those faults that result in the malfunctioning of system services by raising errors or producing incorrect results, researchers regard them as functional faults.…”
Section: Chaos Engineering (mentioning)
confidence: 99%
“…In a microservice system, each request may result in a series of distributed service invocations executed synchronously or asynchronously. A service can have several to thousands of instances dynamically created, destroyed, and managed by a microservice discovery service (e.g., the service discovery component of Docker swarm) [21,22]. For a microservice system, operation engineers and developers highly rely on trace analysis to understand architectures and diagnose various problems.…”
Section: Introduction (mentioning)
confidence: 99%