2013
DOI: 10.1145/2490301.2451128

Production-run software failure diagnosis via hardware performance counters

Abstract: Sequential and concurrency bugs are widespread in deployed software. They cause severe failures and huge financial loss during production runs. Tools that diagnose production-run failures with low overhead are needed. The state-of-the-art diagnosis techniques use software instrumentation to sample program properties at run time and use off-line statistical analysis to identify properties most correlated with failures. Although promising, these techniques suffer from high run-time overhead, which is sometimes o…



Cited by 6 publications (13 citation statements)
References 41 publications (69 reference statements)
“…The high detector shows a limited performance overhead [30]. The detection results are shown in Figure 5. After adding the high detector, the SDC rate decreased from 20.49% to 4.45%, showing that the high detector's detection effect was clearly positive.…”
Section: Proposed Detection Mechanisms
confidence: 99%
“…In other words, although sampling collects less data from each run at each end-user, to achieve statistical significance, more runs/end-users need to be involved and their data need to be transferred, leading to increased latency for failure diagnosis and delayed patch design. For example, under the common 1/100 or 1/1000 sampling rate, hundreds or thousands more failure runs need to be traced before sufficient predicates get sampled to produce statistically meaningful results [4,19,23]. Furthermore, a whole-program sampling infrastructure may lead to a large baseline overhead (e.g., more than 50%) that cannot be amortized through sampling [6].…”
Section: Problems and Motivation
confidence: 99%
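The sampling arithmetic in the statement above can be illustrated with a minimal sketch. The 1/100 and 1/1000 rates come from the quoted passage; the baseline run count is a hypothetical figure chosen for illustration:

```python
def runs_needed(base_runs: int, sampling_rate: float) -> int:
    """Estimate how many failure runs must be traced under sampling to
    collect, on average, as many predicate observations as full tracing
    would yield from `base_runs` runs."""
    return round(base_runs / sampling_rate)

# Suppose full tracing of 10 failure runs yields statistically
# significant predicate counts (hypothetical baseline).
print(runs_needed(10, 1 / 100))   # 100x more runs at a 1/100 rate
print(runs_needed(10, 1 / 1000))  # 1000x more runs at a 1/1000 rate
```

This is the trade-off the quoted passage describes: lowering the sampling rate reduces per-run overhead but multiplies the number of failure runs that must be collected before diagnosis is possible.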
“…The outcomes of these predicates are obtained through software instrumentation or hardware support [4], and constitute the profile of each run. Finally, a profile consists of a set of predicate counts, each recording the number of times a predicate is observed true during the run.…”
Section: Predicates
confidence: 99%
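The profile structure described in the quoted passage, a set of predicate counts recording how often each predicate was observed true during a run, can be sketched as follows. The predicate names and observed values here are hypothetical:

```python
from collections import Counter

# One run's profile: predicate name -> number of times observed true.
profile: Counter = Counter()

def record(predicate_name: str, observed_true: bool) -> None:
    """Increment the predicate's count when it evaluates true."""
    if observed_true:
        profile[predicate_name] += 1

# Simulated observations from a single run (hypothetical predicates).
for x in [3, -1, 7, -2]:
    record("x_is_negative", x < 0)
    record("x_exceeds_5", x > 5)

print(profile["x_is_negative"])  # observed true twice (-1 and -2)
print(profile["x_exceeds_5"])    # observed true once (7)
```

Off-line statistical analysis then compares such profiles from failing and successful runs to find the predicates most correlated with failure.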