2008
DOI: 10.14236/ewic/vecos2008.11

Operating System Support to Detect Application Hangs

Abstract: On-line failure detection is an essential means to control and assess the dependability of complex and critical software systems. In such context, effective detection strategies are required, in order to minimize the possibility of catastrophic consequences. This objective is however difficult to achieve in complex systems, especially due to the several sources of non-determinism (e.g., multi-threading and distributed interaction) which may lead to software hangs, i.e., the system is active but no longer capab…

Citations: cited by 7 publications (4 citation statements)
References: 19 publications (17 reference statements)
“…Finally, the toolset will include ready to use tools that use some more specific algorithms targeting fault detection, proposed by research community. Examples of these works are static threshold analysis [9] and statistical analysis algorithms for on-line fault detection [8].…”
Section: Data Storage and Analysis Tools (mentioning)
confidence: 99%
“…Statistical analysis algorithms have been used in the past for on-line fault detection [8]. This technique overcomes some of the limitations of static threshold analysis, that for instance in [9] monitoring techniques are used to detect application hangs. Works towards certifying OTS components are also not new.…”
Section: Introduction (mentioning)
confidence: 99%
“…We would rather have a system revert to its design or fail-safe state if it is performing the incorrect function or misbehaving than have a lively system that never hangs but executes erroneously. Furthermore, in some cases, the hang detection algorithms require a pre-determined threshold that is tuned during the training phase such as the work by Carrozza et al [15]. This is not very well suited for systems with highly dynamic environments and might require adaptive threshold techniques like the one presented by Bovenzi et al [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…These methods are often concerned with the liveness property of a system versus its correctness. A hang condition can be either a result of infinite loops, also known as active hangs, or a result of permanent or extended wait conditions, known as passive hangs [15]. Irrera et al [31] argue that critical systems are too complex and have highly non-deterministic behavior that renders online fault detection for safety-critical systems too difficult of a task.…”
Section: Introduction (mentioning)
confidence: 99%