Fault detection and handling are crucial tasks in cloud systems. As these infrastructures grow and evolve, manual monitoring and intervention become less feasible. To address this, monitoring systems are developed to track the behavior of the various components (e.g., nodes) of cloud systems, as well as the applications served in the virtual environment. Nowadays, most cloud environments provide graphics accelerators to their users, which leads to new kinds of problems. However, applying GPUs to deep learning can also help detect incorrect behavior. This paper gives a short overview of cloud monitoring and fault detection methods, focusing on GPU-enabled nodes.
Fault injection is an effective method for assessing the dependability of computer systems. The technique deliberately introduces faults into a system to assess its resilience and its ability to handle abnormal conditions. This study therefore investigates and simulates different network problems in the CloudSim Plus environment. CloudSim Plus is a simulation framework for modeling and simulating cloud computing environments, allowing researchers and practitioners to evaluate the performance and behavior of cloud-based systems and algorithms. Network fault detection and management are essential tasks in cloud systems, and manual monitoring and intervention have become less feasible as these infrastructures expand and change. This paper briefly introduces network problems and the outcomes of fault injection in CloudSim Plus nodes.
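The idea of network fault injection can be illustrated with a minimal, self-contained sketch. It does not use CloudSim Plus or reproduce the study's setup; the `Link` class, the `inject_fault`, `probe`, and `is_faulty` helpers, and all thresholds and parameter values are hypothetical names and numbers chosen only for illustration. The sketch probes a simulated link, injects extra latency and packet loss, and probes again to see whether a simple threshold check flags the degraded link.

```python
import random
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical network link between two simulated nodes."""
    latency_ms: float = 5.0
    loss_rate: float = 0.0  # probability that a packet is dropped

    def send(self, rng):
        """Return the observed latency in ms, or None if the packet is lost."""
        if rng.random() < self.loss_rate:
            return None
        return self.latency_ms


def inject_fault(link, *, extra_latency_ms=0.0, loss_rate=0.0):
    """Deliberately degrade the link (the fault injection step)."""
    link.latency_ms += extra_latency_ms
    link.loss_rate = loss_rate


def probe(link, rng, n=1000):
    """Send n probe packets; return (loss fraction, mean latency of delivered packets)."""
    results = [link.send(rng) for _ in range(n)]
    delivered = [r for r in results if r is not None]
    loss = 1 - len(delivered) / n
    mean_latency = sum(delivered) / len(delivered) if delivered else float("inf")
    return loss, mean_latency


def is_faulty(loss, mean_latency, max_loss=0.05, max_latency_ms=20.0):
    """Threshold-based detector; the limits are illustrative assumptions."""
    return loss > max_loss or mean_latency > max_latency_ms


rng = random.Random(42)  # fixed seed so repeated runs give the same result
link = Link()

baseline = probe(link, rng)
inject_fault(link, extra_latency_ms=50.0, loss_rate=0.2)
degraded = probe(link, rng)

print("baseline flagged:", is_faulty(*baseline))
print("degraded flagged:", is_faulty(*degraded))
```

Comparing the detector's verdict before and after injection is the core of the technique: the injected fault is known, so any miss or false alarm can be attributed to the detection logic rather than to an unknown system state.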