“…Representative works include failure-aware resource management and scheduling [10,15,20], checkpointing [6,18,24,38], and proactive or adaptive runtime resilience support [14,29]. The advancement of these technologies, however, depends greatly on whether we can predict the occurrence of failures, i.e., failure prediction.…”