As a core component of high-performance computing (HPC) platforms, parallel file systems (PFSes) are growing quickly in scale and complexity, which makes them vulnerable to various failures and anomalies. Identifying PFS anomalies at runtime is thus critically helpful for HPC users and administrators. Analyzing runtime logs to detect anomalies in large-scale systems has proven effective in many recent studies. However, applying existing log analysis to PFSes faces significant challenges due to the large volume and irregularity of PFS logs. This study proposes SentiLog, a new approach to analyzing PFS logs for detecting anomalies. Unlike existing solutions, SentiLog works by training a general sentiment-based natural language model on the logging-relevant source code collected from a set of PFSes. In this way, SentiLog learns the implicit semantic information embedded in PFSes by their developers. Our preliminary results show that SentiLog can accurately predict anomalies and performs better than state-of-the-art log analysis solutions on two representative PFSes (i.e., Lustre and BeeGFS). To the best of our knowledge, this is the first work demonstrating that sentiment analysis could be a promising method for analyzing complex and irregular system logs.
CCS CONCEPTS
• Software and its engineering → Software maintenance tools; • Computing methodologies → Natural language processing; • Computer systems organization → Dependable and fault-tolerant systems and networks.