This paper develops and validates a methodology for detecting small, incipient faults in software systems. Incipient faults such as memory leaks slowly degrade software performance over time and, if left undetected, usually end in complete system failure. The proposed method combines tools from information theory and statistics: entropy and principal component analysis (PCA). The entropy calculation summarizes the information content of the collected low-level metrics and reduces the computational burden of the subsequent PCA step, which detects underlying patterns and correlations in the multivariate data as well as distortions in those correlations that indicate an incipient fault. We use the technique to detect memory bloat within the Trade6 enterprise application under dynamic workload patterns, showing that small leaks can be detected quickly and with a low false alarm rate. The method is also robust to the periodic/seasonal patterns affecting the metrics used to detect the fault.
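The two-stage pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the entropy estimator, the window length, or the PCA detection statistic, so everything concrete here is an assumption made for illustration: a histogram-based Shannon entropy over non-overlapping windows, a squared prediction error (residual) statistic on the retained principal components, and the hypothetical function names `window_entropy`, `fit_pca`, `spe`, and `detect`. It is not the authors' implementation.

```python
import numpy as np


def window_entropy(metrics, window, bins=8):
    """Shannon entropy (in bits) of each metric over consecutive,
    non-overlapping windows. Assumed estimator: fixed-width histogram
    spanning each metric's full observed range."""
    metrics = np.asarray(metrics, dtype=float)
    n_windows = metrics.shape[0] // window
    n_metrics = metrics.shape[1]
    ranges = [(metrics[:, j].min(), metrics[:, j].max()) for j in range(n_metrics)]
    H = np.zeros((n_windows, n_metrics))
    for i in range(n_windows):
        block = metrics[i * window:(i + 1) * window]
        for j in range(n_metrics):
            counts, _ = np.histogram(block[:, j], bins=bins, range=ranges[j])
            p = counts[counts > 0] / counts.sum()
            H[i, j] = -(p * np.log2(p)).sum()
    return H


def fit_pca(H_base, n_components):
    """Fit PCA on fault-free entropy vectors (rows of H_base).
    Returns the column means, column standard deviations, and the
    loading matrix of shape (n_metrics, n_components)."""
    H_base = np.asarray(H_base, dtype=float)
    mean = H_base.mean(axis=0)
    std = H_base.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (H_base - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def spe(H, model):
    """Squared prediction error: the squared norm of the part of each
    standardized row that the retained components do not explain.
    Large values indicate a distortion of the baseline correlations."""
    mean, std, P = model
    Z = (np.asarray(H, dtype=float) - mean) / std
    resid = Z - Z @ P @ P.T
    return (resid ** 2).sum(axis=1)


def detect(H_new, model, threshold):
    """Flag windows whose residual exceeds a user-chosen threshold."""
    return spe(H_new, model) > threshold
```

Under these assumptions, one would compute `window_entropy` on a fault-free run, call `fit_pca` on the result, choose a threshold such as a high quantile of the baseline `spe` values, and then pass the entropy vectors of new windows to `detect`. Because PCA operates on one entropy vector per window rather than on every raw sample, the matrix it decomposes is shorter by a factor equal to the window length, which is one way to read the abstract's remark about reduced computational burden.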