Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2007
DOI: 10.1145/1294261.1294275

Triage

Abstract: Diagnosing production-run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e., diagnosis at the development site with the programmers present. This is insufficient for production-run failures because: (1) it is difficult to reproduce failures offsite for diagnosis; (2) offsite diagnosis cannot provide timely guidance for recovery or security purposes; (3) it is infeasible to provide a programmer to diagnose every production-run failure; and (4) privacy concerns limit the rele…

Cited by 120 publications (11 citation statements). References 37 publications (34 reference statements).
“…Triage [Tucek et al 2007] uses dynamic slicing to diagnose failures at the user's site, which obviates privacy concerns. Despite that, it has limited support for concurrency bugs, being able to provide root cause isolation only for multithreaded programs running on uniprocessors.…”
Section: Related Work (mentioning)
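The statement above credits Triage with using dynamic slicing. The sketch below is a minimal Python illustration of what a dynamic backward slice is. It is not Triage's implementation, and the trace format is assumed. The sketch supposes a hypothetical execution trace in which each executed statement records the variables it writes (`defs`) and reads (`uses`). The slice for a failing statement is the set of earlier trace entries that its inputs transitively depend on. The names `TraceEntry`, `entry`, `backward_slice`, and the sample `TRACE` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """One executed statement in a hypothetical dynamic trace (illustrative only)."""
    stmt: str                       # source text, kept for display
    defs: frozenset = frozenset()   # variables written by this execution
    uses: frozenset = frozenset()   # variables read by this execution


def entry(stmt, defs=(), uses=()):
    """Convenience constructor that turns plain iterables into frozensets."""
    return TraceEntry(stmt, frozenset(defs), frozenset(uses))


def backward_slice(trace, criterion):
    """Return trace positions the criterion depends on through data dependences.

    Walks backwards from the slicing criterion, keeping the set of variables
    whose most recent definition has not been found yet. An earlier entry
    joins the slice if it writes one of those variables; the variables it
    reads are then tracked in turn. Control dependences are ignored.
    """
    pending = set(trace[criterion].uses)
    in_slice = {criterion}
    for pos in range(criterion - 1, -1, -1):
        written = pending & trace[pos].defs
        if written:
            in_slice.add(pos)
            pending -= written          # latest definition found for these
            pending |= trace[pos].uses  # now follow what this entry read
    return sorted(in_slice)


# Hypothetical trace of a failing run: the last statement writes out of bounds.
TRACE = [
    entry("n = read_len()", defs={"n"}),
    entry("i = 0", defs={"i"}),
    entry("limit = 16", defs={"limit"}),
    entry("i = n + 1", defs={"i"}, uses={"n"}),
    entry("log(limit)", uses={"limit"}),
    entry("buf[i] = 0   # crash", defs={"buf"}, uses={"i"}),
]

if __name__ == "__main__":
    for pos in backward_slice(TRACE, 5):
        print(pos, TRACE[pos].stmt)
```

For the crashing statement at position 5, the slice is positions 0, 3, and 5. The overwritten `i = 0` and the unrelated `limit` statements drop out, which is how slicing narrows attention toward a root cause. Real dynamic slicers also follow control dependences and usually track registers and memory addresses rather than source-level variable names.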
“…An open-source flight simulator has been used to assess the proposal. In Tucek et al (2007), the authors propose a system, called Triage, that automatically performs onsite software failure diagnosis. The system makes use of both kernel-level components and multiple re-executions of the target software to support failure diagnosis; during each re-execution, detailed data are collected via dynamic binary instrumentation to analyze the failure that occurred and its causes.…”
Section: Code Instrumentation Approaches (mentioning)
“…In addition, the approach in Hiller et al (2004) requires measuring the error permeability for each input of each module, which limits its scalability, while the tool in Hiller et al (2002a) addresses only single-process software. The system proposed in Tucek et al (2007) uses kernel-level components and dynamic binary instrumentation, which are not allowed in critical production environments (e.g., mission-critical systems) with stringent constraints imposed by certification standards and the use of obsolete kernel versions. Finally, the approaches of Hiller et al (2004; 2002a) and Johansson and Suri (2005) only address data errors, while those presented in Johansson and Suri (2005) and Calhoun et al (2017) are conceived only for OS device drivers and MPI applications, respectively.…”
Section: Code Instrumentation Approaches (mentioning)
“…Software contains latent bugs [Tucek et al 2007]. Although software testing helps identify these bugs, schedule pressure often causes vendors to release software without comprehensive testing.…”
Section: Introduction (mentioning)